Researcher documents why AI coding assistants fail at building novel AI architectures: $1,000 experiment shows systematic blind spots

What happened

A researcher with no programming experience attempted to build a specialized AI model using AI coding assistants (Claude and Cursor) and documented the failure in detail: the resulting model had 86 specialized subsystems that together contributed less than 2% to actual output. This reveals a concrete problem: AI assistants are good at writing code that looks correct, but they can't validate whether a novel architectural design works in practice.

Why it matters

We're entering an era where non-programmers use AI to build AI systems, and this experiment exposes a critical failure mode: AI assistants can generate plausible-looking code that compiles and runs without ever flagging that the underlying design doesn't work. That 86 subsystems produced less than 2% of the output suggests the assistants followed instructions faithfully but couldn't evaluate whether those instructions made structural sense. They can't step back and say 'this architecture is broken' the way a human who understands the domain would. The result is a gap between code correctness (does it run?) and system correctness (does it do what you intended?), and that gap widens the more novel the design. Organizations betting on AI-assisted development of custom systems should expect this class of failure.
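One way to surface the gap between "it runs" and "it does what you intended" is an ablation check: run the model with and without its specialized subsystems and measure how much of the output they account for. The following is a minimal sketch on a toy model, not the researcher's code. The names `core`, `make_subsystem`, `model`, and `ablation_share`, along with the scale factor and input data, are all hypothetical and chosen only to illustrate the idea.

```python
import random


def core(x):
    # Hypothetical main pathway that does most of the work.
    return [2.0 * v + 1.0 for v in x]


def make_subsystem(scale):
    # Hypothetical "specialized" module with a tiny additive effect.
    return lambda x: [scale * v for v in x]


def model(x, subsystems, enabled=True):
    # Full model: core output plus every subsystem's contribution.
    out = core(x)
    if enabled:
        for sub in subsystems:
            out = [o + s for o, s in zip(out, sub(x))]
    return out


def ablation_share(x, subsystems):
    # Fraction of total output magnitude that vanishes when the
    # subsystems are switched off.
    full = model(x, subsystems, enabled=True)
    ablated = model(x, subsystems, enabled=False)
    diff = sum(abs(f - a) for f, a in zip(full, ablated))
    total = sum(abs(f) for f in full)
    return diff / total if total else 0.0


rng = random.Random(0)  # fixed seed so the check is repeatable
subsystems = [make_subsystem(0.0002) for _ in range(86)]  # assumed scale
x = [rng.uniform(0.0, 1.0) for _ in range(1000)]  # made-up inputs

share = ablation_share(x, subsystems)
print(f"Subsystems account for {share:.2%} of output")
```

Both versions of this toy model run without errors, so a "does it execute?" test passes either way. Only the ablation comparison reveals that the 86 modules barely affect the result, which is the kind of check a reviewer with domain knowledge would think to ask for.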

The signal

Watch whether the five systematic failure modes the researcher documented for AI assistants in novel architecture development appear in other attempts to build experimental AI systems, and whether organizations start requiring human architectural review before deploying AI-assisted code for novel designs.