AI Pair Programming Metrics

Overview

Everyone has opinions about AI coding assistants. This study set out to measure what actually happens to developer productivity, code quality, and mental effort when an LLM joins a pair programming session.

The findings challenge popular narratives on both sides: the "10x developer" hype and the "it's just autocomplete" dismissal.

Key Findings

Velocity gains are real but uneven: On average 25-40% faster on boilerplate and familiar patterns; negligible or even negative on novel problem-solving
Code quality is context-dependent: Higher consistency in style and documentation, but increased risk of subtle bugs in complex logic
Cognitive load shifts rather than shrinks: Less mental effort spent on syntax, more on review and verification
Learning effects matter: Developers who understood their assistant's failure modes outperformed those who didn't

Methodology

The study ran for 8 weeks with 24 developers across 3 organizations. Metrics collected included:

Lines of code written and modified per session
Time to complete standardized coding tasks
Defect rates in code review
Self-reported cognitive load surveys
AI suggestion acceptance and modification rates
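As a hypothetical illustration, acceptance and modification rates like those listed above could be derived from per-suggestion event logs roughly as follows. The record type (`SuggestionEvent` with `accepted` and `modified_after_accept` fields) and both rate definitions are assumptions made for this sketch, not the study's actual instrumentation or formulas.

```python
from dataclasses import dataclass
from typing import Iterable


# Hypothetical log record: one entry per AI suggestion shown to a developer.
# Field names are illustrative assumptions, not the study's schema.
@dataclass(frozen=True)
class SuggestionEvent:
    accepted: bool               # developer kept the suggestion
    modified_after_accept: bool  # accepted text was later edited by the developer


def suggestion_rates(events: Iterable[SuggestionEvent]) -> tuple[float, float]:
    """Return (acceptance_rate, modification_rate) under assumed definitions.

    acceptance_rate   = accepted suggestions / suggestions shown
    modification_rate = accepted-then-edited suggestions / accepted suggestions
    Each rate is 0.0 when its denominator is zero.
    """
    items = list(events)
    shown = len(items)
    accepted = sum(1 for e in items if e.accepted)
    modified = sum(1 for e in items if e.accepted and e.modified_after_accept)
    acceptance_rate = accepted / shown if shown else 0.0
    modification_rate = modified / accepted if accepted else 0.0
    return acceptance_rate, modification_rate


if __name__ == "__main__":
    # Toy log: 4 suggestions shown, 3 accepted, 2 of those later edited.
    log = [
        SuggestionEvent(accepted=True, modified_after_accept=False),
        SuggestionEvent(accepted=True, modified_after_accept=True),
        SuggestionEvent(accepted=False, modified_after_accept=False),
        SuggestionEvent(accepted=True, modified_after_accept=True),
    ]
    acc, mod = suggestion_rates(log)
    print(f"acceptance={acc:.2f} modification={mod:.2f}")  # acceptance=0.75 modification=0.67
```

Measuring modifications against accepted suggestions, rather than against all suggestions shown, keeps "did the developer take it?" separate from "did it need fixing afterwards?". That is one plausible reading of the two metrics, not necessarily the one the study used.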

Implications

The data suggests organizations should focus less on "will AI make us faster?" and more on "what kinds of work should we route to AI-assisted vs. AI-free workflows?"

"The question isn't whether AI helps. It's knowing when it helps and when it hurts."

Resources

The full methodology and anonymized dataset are available on request. Summary presentation slides from the original talk are linked below.