AI can identify cryptographic code, but human experts are still far more accurate at it

What happened

Researchers built a benchmark to measure whether large language models can reverse-engineer cryptographic software. GPT-4 solves about 60% of the problems correctly, while human experts solve 92%. The benchmark itself is the signal: for the first time, there is a standardized way to measure whether AI actually helps with one of the most expensive and specialized tasks in software security.

Why it matters

Reverse engineering cryptographic code is among the hardest and most expensive tasks in security work: it requires expertise that takes years to develop, and it is needed constantly for vulnerability discovery and malware analysis. This benchmark doesn't show that AI is ready to replace humans at this work, but it gives companies and security teams a way to track whether AI closes the gap over time. At 60% against 92% for human experts, the gap has not closed yet. Now there is a way to tell if it does.

The signal

Watch whether GPT-5 or the next version of Claude materially improves on GPT-4's 60% success rate. Also watch whether any company actually deploys these models in reverse engineering workflows and measures whether doing so saves money or just creates false confidence.