The world is being quietly rearranged by people who write very long documents.


The title they went with: EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions

Noisy translates that to:

AI grading systems fail silently on real student handwriting


Researchers tested multiple AI models on actual handwritten university STEM assignments and found they systematically misread student work — missing equations, diagrams, and reasoning — at rates far too high for automated grading. This matters because schools are starting to deploy these systems to save teacher time, but the AI is making errors quietly, especially on complex visual content like sketches and mathematical notation mixed together.
If an AI grader misreads a student's work and the teacher never catches it, that student gets marked wrong for work they did right. The paper shows this is happening at scale across different AI models, and a hybrid system (routing only 3% to humans) still masks AI errors that would slip through in a fully automated system.
