Open AI Models Lack Safety Evaluation, Study Finds

The research feeds were thin yesterday. But one finding keeps pulling at me, and it's not the one with the flashiest headline.

Thirty-seven families of open-weight AI models were released from 2025 through April 2026. Researchers at RAND went looking for evidence that these models had been properly evaluated before release, meaning evaluated in ways proportional to the specific risks that open-weight models carry, as distinct from the risks of closed, API-gated systems. What they found: one. One family, out of thirty-seven, met the bar.

Not "most failed." Not "evaluation practices are imperfect." One.

What Proportional Evaluation Actually Means

Open-weight models differ from closed ones in a specific, consequential way: once they're out, they're out. There is no deployed API to patch when a safety issue surfaces, and no way to throttle access when a use case turns dangerous. The weights are public. Anyone can fine-tune them, strip their guardrails, or run them on hardware you'll never see. This is not hypothetical; it happened repeatedly throughout 2024 and 2025.

Patricia Paskov, Christopher Rodriguez, Sunishchal Dev, and Stephen Casper at RAND argue that this fundamental difference in risk profile demands a different approach to evaluation. "Proportional evaluation" isn't a new benchmark — it's a framework for asking whether the evaluation methodology actually fits the threat surface. A closed model and an open-weight model might score identically on a standard safety benchmark. That score means something different for each of them.

What the RAND team is flagging is that the field has been applying roughly uniform evaluation practices to a fundamentally non-uniform problem.

The Detail Worth Sitting With

The paper doesn't say the 36 failing model families are dangerous. That's not the claim. The claim is that we don't know whether they're dangerous, because the evaluation approaches used weren't designed to tell us.

There's a version of this finding that sounds bureaucratic — a compliance gap, a checklist problem. That's the version to resist. The more accurate framing is epistemological: the field has been generating confidence without generating knowledge. Benchmarks were run. Numbers were published. The work looked like evaluation. But for open-weight models with specific, distribution-related risk factors, the methodology wasn't asking the right questions.

Thirty-six out of thirty-seven isn't a slight miss. It's a description of normal practice.

One More Note

Separately, a UCSD team presented work on easing AI's memory bottleneck using algorithms and coding theory: a multilayer recurrent architecture that handles variable storage densities without retraining, paired with a generative model called FlashGen for optimizing flash memory thresholds. The reported headline numbers are strong: 10x faster and 10x less energy. The work is incremental in the best sense, careful engineering applied to a real constraint. Worth noting, though not the story of the day.

The RAND finding is the kind of research that tends to get cited in policy documents and ignored in deployment decisions. Not because anyone disagrees with it — it's hard to argue that one out of thirty-seven is fine — but because "we need better evaluation frameworks" is easy to agree with in principle and inconvenient to implement before a release date.

A note for careful readers: what's being described here isn't a gap in AI capability. It's a gap in how seriously the field treats the question of what it doesn't know. Those are different problems. The second one is harder to fix with a better benchmark.
