The 6 Dimensions That Actually Predict Engineering Performance

Software engineering has a measurement problem. The industry relies on signals (resume keywords, whiteboard puzzles, years of experience) that research consistently shows are poor predictors of actual performance. Industrial-Organizational psychology and software engineering research have spent decades identifying what actually works. The findings are specific, replicated, and largely ignored. Here are the six research-backed dimensions that matter, and how we measure each one.


Dimension 1: Work Samples (Real Projects)

The research: Schmidt and Hunter's 1998 meta-analysis found that work sample tests are the single best predictor of job performance, with a validity coefficient of r=0.54. Nothing else comes close as a standalone method.

Why it works: Work samples eliminate the gap between "can talk about engineering" and "can actually engineer." Software engineering research confirms this: Behroozi et al.'s controlled studies at FSE (2020, 2022) showed that traditional whiteboard coding measures anxiety rather than ability, while work-sample-style assessments in private settings produced dramatically better signal. When you ask someone to build a real project, with architecture decisions, error handling, edge cases, and production concerns, you see exactly what they'll do on the job. There's no way to fake a working system.

What we do: Every developer in our pipeline builds 6 complete Haskell projects, from a command-line tool to a Glasgow Haskell Compiler contribution. These aren't toy exercises. They require real architecture decisions, state management, interface design, and error handling. When they're done, they've already shipped more real software than most interview processes will ever test.


Dimension 2: Structured Evaluation (Technical Defense)

The research: Behroozi, Shirolkar, Barik, and Parnin (2020, ICSE) studied what goes wrong in technical interview processes across major software companies. Their qualitative analysis identified consistent failure modes: interviewers not communicating evaluation criteria, using inexperienced evaluators, and applying inconsistent standards. The companies with the best hiring outcomes shared one trait: structured processes with predetermined criteria.

Why it works: Structure eliminates two of the biggest problems in evaluation: inconsistency (different evaluators asking different questions) and bias (evaluators pattern-matching on superficial similarities rather than evaluating ability). When you define what good looks like before the evaluation starts, you measure candidates against the job, not against each other's vibes.

What we do: Every project ends with a live defense. Our engineering team interrogates every decision: why this data structure, why this abstraction boundary, why this error handling strategy. The questions are consistent, the evaluation criteria are predetermined, and the format is the same for every developer. No rehearsed answers survive this, because we're evaluating reasoning process, not memorized solutions.


Dimension 3: Cognitive Assessment (Code Challenges)

The research: General cognitive ability tests predict job performance at r=0.51 (Schmidt & Hunter, 1998). For complex jobs like software engineering, the predictive power is even higher. The key insight: you want to measure how someone reasons through novel problems, not what they've memorized.

Why it works: Software engineering requires constant problem-solving in unfamiliar territory. McCracken et al.'s multi-institutional study (2001, ITiCSE) assessed programming ability across 216 students at four universities and found that no single test captured actual programming ability: decomposing competency into multiple dimensions (understanding the problem, designing a solution, coding, testing) revealed weaknesses that any single-metric test missed. An engineer who can decompose unfamiliar problems, recognize structural patterns, and apply abstract principles will outperform one who has memorized solutions to common coding questions. Multi-dimensional assessment is essential because engineering ability is itself multi-dimensional.

What we do: 88 functional programming challenges purpose-built for the paradigm. These aren't LeetCode problems with Haskell syntax bolted on. They test higher-order functions, monadic composition, and type-level thinking that mainstream platforms can't touch. Combined with AI-driven interviews that assess communication and problem decomposition across 20+ dimensions, we measure how developers think, not what they've memorized.
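For a flavor of what a paradigm-native challenge exercises (this is an illustrative sketch, not one of the actual 88 challenges), consider parsing and summing comma-separated integers built entirely from higher-order functions and monadic composition:

```haskell
import Text.Read (readMaybe)

-- A minimal splitter written with foldr: the kind of
-- higher-order-function use these challenges probe.
splitOn :: Char -> String -> [String]
splitOn sep = foldr step [[]]
  where
    step c acc@(cur : rest)
      | c == sep  = [] : acc
      | otherwise = (c : cur) : rest
    step _ []     = [[]]  -- unreachable; keeps the match total

-- traverse lifts a per-field parser over the whole list,
-- so one bad field fails the entire line (monadic composition).
parseLine :: String -> Maybe [Int]
parseLine = traverse readMaybe . splitOn ','

-- Compose in Maybe end to end: any malformed input yields Nothing.
sumCsv :: String -> Maybe Int
sumCsv = fmap (sum . map sum) . traverse parseLine . lines
```

Here `sumCsv "1,2,3\n4,5"` evaluates to `Just 15`, while `sumCsv "1,x"` is `Nothing`. What a challenge in this style grades is the composition, not the arithmetic.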


Dimension 4: Collaborative Ability (Mentorship)

The research: Research on team effectiveness (Woolley et al., 2010, published in Science) found that the best predictor of team performance isn't the intelligence of the smartest member. It's the social sensitivity of the group and the equality of conversational turn-taking. Engineers who elevate others multiply the output of entire teams.

Why it works: In real engineering organizations, the difference between a good engineer and a great one isn't how fast they solve isolated problems. It's whether they make the people around them better. Bacchelli and Bird's study at ICSE (2013) examined code review practices at Microsoft and found that the primary benefit wasn't defect detection but knowledge transfer, increased team awareness, and the creation of alternative solutions; 38% of developers cited knowledge transfer as their primary motivation for code review. Engineers who engage collaboratively (reviewing code, sharing context, teaching teammates) multiply the output of entire teams.

What we do: We track how developers engage in our community over months: not a single data point, but a pattern. Do they mentor others? Do they contribute to shared learning? Do they elevate peers who are struggling? An engineer who solves their own problems is good. One who makes the whole team better is exceptional. We measure for the latter.


Dimension 5: Verified Competencies (Skill Mapping)

The research: The literature on competency-based assessment (Boyatzis, 1982; Spencer & Spencer, 1993) consistently shows that verified, specific competencies predict job performance far better than years of experience or educational credentials. Knowing exactly what someone can do, not what they claim to know, is the foundation of effective matching.

Why it works: Self-reported skills are notoriously unreliable. Developers overestimate their expertise in some areas and underestimate it in others. Years of experience tells you how long someone has been employed, not what they can actually build. When you map verified competencies against actual job requirements, you eliminate the guesswork that leads to bad hires.

What we do: Our Haskell skill tree maps every functional programming concept a developer has mastered, is developing, or hasn't touched yet. Each competency is verified through actual demonstrations: completed projects, challenge results, and defense performance. When you tell us what you need, we don't guess. We match on verified, mapped competencies.
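To make the idea concrete, here is a minimal sketch of how a verified competency mapping could be modeled. Every name here is hypothetical, chosen for illustration rather than taken from our actual schema:

```haskell
-- Illustrative model of a skill-tree entry (all names hypothetical).
data Mastery = NotStarted | Developing | Verified
  deriving (Show, Eq, Ord)

data Evidence = Project String | Challenge String | Defense String
  deriving (Show, Eq)

data Skill = Skill
  { skillName :: String      -- e.g. "Monad transformers"
  , level     :: Mastery
  , evidence  :: [Evidence]  -- demonstrations backing the level
  } deriving (Show, Eq)

-- A competency only counts as Verified when backed by evidence.
verified :: Skill -> Bool
verified s = level s == Verified && not (null (evidence s))

-- Matching: does a developer's tree cover every required skill
-- with a verified, evidence-backed competency?
matches :: [String] -> [Skill] -> Bool
matches required tree =
  all (\r -> any (\s -> skillName s == r && verified s) tree) required
```

The design point is that `Verified` is unreachable without attached evidence, which is what separates a mapped competency from a self-reported one.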


Dimension 6: Character and Growth (Long-term Relationship)

The research: Carol Dweck's work on growth mindset (2006) demonstrates that learning orientation predicts long-term performance better than current skill level. Conway and Huffcutt (2003) showed that multi-rater assessment (using multiple evaluators over multiple interactions) dramatically increases the reliability of evaluation decisions.

Why it works: A single evaluation session is a snapshot. It tells you how someone performs under those specific conditions on that particular day. It doesn't tell you how they handle setbacks, how they respond to feedback, whether they grow from challenges, or what they're like to work with over months. Those qualities matter enormously for long-term fit and performance, and they're invisible in any short-form assessment.

What we do: We work with every developer for months, sometimes years. We know their character, their work ethic, how they handle setbacks, and how they respond to feedback. When we recommend someone, it's a personal character reference backed by months of observation, not a score from a stranger who met them last Tuesday.


Why Multi-Method Assessment Matters

The research is clear on one more point: no single method is sufficient. Schmidt and Hunter found that combining methods increases predictive validity substantially. A work sample test alone predicts at r=0.54. Combined with a cognitive ability test, the validity rises to r=0.63. Add structured evaluation and the prediction improves further.
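The gain from combining methods follows from a standard result about correlated predictors (textbook regression algebra, not a formula specific to Schmidt and Hunter's paper): the validity of the optimal composite of two predictors with individual validities $r_1$, $r_2$ and intercorrelation $r_{12}$ is

```latex
R = \sqrt{\frac{r_1^{2} + r_2^{2} - 2\, r_1 r_2 r_{12}}{1 - r_{12}^{2}}}
```

Plugging in $r_1 = 0.54$ (work sample) and $r_2 = 0.51$ (cognitive ability) with an intercorrelation around $r_{12} \approx 0.4$ gives $R \approx 0.63$, matching the combined figure above. The lower the overlap between methods, the more each additional method adds.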

Each method catches what the others miss. Work samples reveal building ability but not reasoning under pressure. Cognitive tests reveal reasoning ability but not collaborative skill. Long-term observation reveals character but not raw technical depth. Only by combining all six dimensions do you get a complete picture of engineering capability.

Our pipeline uses six different methods, each targeting a different research-backed predictor of performance, assessed over months rather than hours. It's not more convenient than a three-round interview loop. But the evidence says it works.


References