What happened
Researchers tested multiple AI models on real handwritten university STEM assignments and found that the models systematically misread student work, missing equations, diagrams, and reasoning, at rates far too high for automated grading. Schools are starting to deploy these systems to save teacher time, but the AI makes its errors quietly, especially on complex visual content such as sketches mixed with mathematical notation.
Why it matters
If an AI grader misreads a student's work and the teacher never catches it, that student gets marked wrong for work they did right. The paper shows this happening at scale across different AI models. It also shows that a hybrid system (routing only 3% to humans) still masks AI errors that would slip through in a fully automated system.