The world is being quietly rearranged by people who write very long documents.


The title they went with: Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition

Noisy translates that to:

AI can now read molecular diagrams from chemistry papers — but not perfectly yet


Researchers fine-tuned an AI model, DeepSeek-OCR-2, to convert 2D molecular structures from printed chemistry papers into machine-readable code. The model works reasonably well on synthetic examples but still trails the current best approach on real patent documents, so it is not ready to replace existing chemistry OCR systems.
Decades of chemistry research are locked in millions of printed papers and patents, where molecular structures appear as images that humans can read but computers cannot easily parse into usable data. If this problem gets solved, chemists could automatically extract chemical information from that published literature and turn it into searchable databases. Right now, extraction is a slow manual process. This attempt shows the AI approach is getting close but has not yet crossed the threshold where it beats specialized chemical software.
Watch whether future versions of this model (or similar approaches) beat the graph-based chemistry tools on the exact-match accuracy metric. The moment AI reads molecules as reliably as the current best tools, the bottleneck shifts from 'can we do this' to 'how fast can we scale it to old papers.'
