Lilith

OpenAI and Anthropic published results of a joint safety evaluation. Each lab tested the other's models across a set of risks: misalignment, instruction following, hallucinations, jailbreak resistance, and related areas.

Competing labs look for each other's blind spots

This is the first published cross-lab safety collaboration of this scope between two direct competitors. Internal evaluations have a structural weakness: the team that trained the model knows what is being tested and may unconsciously design benchmarks the model is set up to pass. Outside evaluators look elsewhere.

The results show where each lab caught the other: specific instruction-following gaps and jailbreak techniques that internal tests had missed. The published report covers both where the models succeeded and where they failed, and explicitly describes the value of cross-lab collaboration.

For regulators and enterprise buyers, this shifts the evaluation frame

Until now, safety evaluation was largely a solo exercise: each lab published its own results, measured by its own methods. If cross-lab evaluations become a template, like-for-like comparison becomes possible. That matters for any organization choosing between models today without the capacity to audit them independently.

It also creates pressure for methodological convergence. Once two labs publish how they measure hallucination rates or jailbreak resistance, a third player cannot afford to publish an incomparable metric without explanation.

A shared evaluation is not an independent audit: both sides control what gets published

A shared evaluation is still not an independent audit. Both sides decide what gets tested, how it gets tested, and what gets published, so the conclusions can be trusted only as far as the published methodology allows them to be checked. The publicly available summary does not make clear whether a third party could reproduce the results.

There is also a PR layer here. "We were the first to test each other's models" is a strong story for regulators and investors, precisely when legislative pressure on AI safety is rising.

A regular audit cycle with a third-party verifier would be a different order of magnitude

One joint evaluation is a signal. A regular audit cycle with published methodology and a third party as verifier would be a different order of magnitude. Worth watching: whether similar collaborations become standard practice or remain a PR moment, and whether the methodological details are made available to other researchers.

Lilith's verdict

Two of the biggest AI labs showed each other where they failed to find their own bugs. A healthy start. What remains is making this a rule, not a press release.
