What CAT is.
Computerized Adaptive Testing (CAT) is a class of assessment methods grounded in Item Response Theory (IRT). Instead of giving every candidate the same fixed-form test, a CAT engine selects each next item based on the candidate's running ability estimate: easier items if responses so far suggest lower ability, harder items if they suggest higher ability. The assessment ends when the uncertainty in the ability estimate (its standard error) falls to the configured precision threshold.
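The select-estimate-stop loop described above can be sketched in a few lines. This is a minimal illustration and not AIQCAT's engine: it assumes a two-parameter logistic (2PL) IRT model, maximum-information item selection, and an expected a posteriori (EAP) ability estimate on a fixed grid. The function names, the `se_target` value and the `max_items` cap are hypothetical choices made for the example.

```python
import math
import random

# Assumption: 2PL model. Each item is a tuple (a, b) with
# discrimination a and difficulty b. Names here are illustrative only.

def p_correct(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4.0 to 4.0

def eap_estimate(responses):
    """Posterior mean and SD of ability under a standard-normal prior.

    responses is a list of (a, b, correct) tuples.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.35, max_items=30):
    """Administer items until the posterior SD reaches se_target.

    bank is a list of (a, b) items; answer(a, b) returns True or False.
    Returns (ability estimate, standard error, items administered).
    """
    remaining = list(bank)
    responses = []
    theta, se = eap_estimate(responses)
    while remaining and len(responses) < max_items and se > se_target:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda it: item_information(theta, *it))
        remaining.remove(item)
        a, b = item
        responses.append((a, b, answer(a, b)))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

if __name__ == "__main__":
    rng = random.Random(0)
    bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]
    true_theta = 1.0  # simulated candidate
    simulate = lambda a, b: rng.random() < p_correct(true_theta, a, b)
    print(run_cat(bank, simulate))
```

A production engine would add content balancing and item exposure control on top of this loop; both are omitted here to keep the sketch short.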
Established practice.
CAT has been the operational testing methodology behind major assessments for decades, including the GMAT (since 1997), the NCLEX nursing licensure exam, the U.S. armed forces' aptitude battery (CAT-ASVAB), and large-scale K–12 state assessments. It is one of the most studied and validated assessment methodologies in modern educational measurement.
[Figure: item exposure control; calibrated bank]
AIQCAT's adaptive design.
AIQCAT's assessment follows the CAT approach, with one difference: an "item" in our setting is typically a real work artifact rather than a multiple-choice question. The adaptive engine selects the next task or prompt based on the candidate's running estimate across the six dimensions, and stops when that estimate reaches the configured precision. Item authoring is handled by the Question Factory, where a swarm of 5–10 agents drafts and calibrates items in parallel for each organization's exam.
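One way to picture a stopping rule over six dimensions is sketched below. It is an assumption-laden illustration, not AIQCAT's implementation: it supposes each dimension carries its own score and standard error, that the assessment stops once every dimension is precise enough (or an item cap is hit), and that the next task targets the least certain dimension. The dimension names, `se_target` and `max_items` are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class DimensionEstimate:
    score: float           # running ability estimate on this dimension
    standard_error: float  # current uncertainty of that estimate

def should_stop(estimates, items_given, se_target=0.35, max_items=40):
    """Stop when every dimension is precise enough, or at the item cap."""
    if items_given >= max_items:
        return True
    return all(e.standard_error <= se_target for e in estimates.values())

def next_dimension(estimates):
    """Target the dimension whose estimate is currently least certain."""
    return max(estimates, key=lambda name: estimates[name].standard_error)

if __name__ == "__main__":
    # Hypothetical placeholder names; the real six dimensions are defined elsewhere.
    estimates = {f"dim_{i}": DimensionEstimate(0.0, 0.30) for i in range(1, 7)}
    estimates["dim_4"] = DimensionEstimate(0.2, 0.55)
    print(should_stop(estimates, items_given=12))
    print(next_dimension(estimates))
```

Requiring every dimension to meet the threshold is the strictest reasonable rule; an engine could instead stop on an average or weighted precision, which would shorten assessments at the cost of less even coverage.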
Why this is the right base methodology.
AI competency varies widely across individuals, and the cost of grading a real work artifact is non-trivial. CAT is well suited to this setting because (a) it concentrates grading effort on the items that maximise information about the candidate's ability and (b) it produces results that are comparable across candidates and across forms, without making every candidate sit identical tasks.
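Point (a) can be made concrete with the standard IRT information function. The block below assumes a two-parameter logistic (2PL) model, which this page does not specify; the symbols a (discrimination), b (difficulty) and the target standard error of 0.35 are illustrative assumptions.

```latex
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}, \qquad
I_i(\theta) = a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr), \qquad
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_i I_i(\hat{\theta})}}
```

As a worked example with a = 1: an item matched to the candidate (b = θ) has P = 0.5 and contributes I = 0.25, while an item two logits too hard (b = θ + 2) has P ≈ 0.119 and contributes I ≈ 0.105. Reaching SE = 0.35 requires total information of about 1 / 0.35² ≈ 8.16, which is roughly 33 matched items but roughly 78 mismatched ones (ignoring any prior information). Under these assumptions, well-targeted items cut the number of artifacts to grade by more than half.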
Boundaries.
AIQCAT is not, on its own, an authority on CAT methodology. We adopt established CAT practice as documented in the educational-measurement literature (Lord, Wainer, van der Linden, and others) and apply it to the artifact-grading setting. Reliability and predictive-validity studies are ongoing and will be published in the Research section as the data matures and completes review.
See the six capability axes for what is being measured and Dimensions for the six dimensions that feed the ability estimate.
