AI x Bio Benchmark Saturation

How long does a new AI biology benchmark remain useful before frontier models saturate it?

Accelerating Saturation

Each dot is a biology benchmark that has saturated. The Y-axis shows how many months it took to saturate. Benchmarks introduced more recently saturate dramatically faster.

23 saturated benchmarks

Since 2022, models halve the remaining performance gap on biology benchmarks every ~27 months (95% CI: 14–49 months)
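To make the halving-time figure concrete, here is a minimal sketch that treats the remaining gap as simple exponential decay, gap(t) = gap(0) × 0.5^(t / T), with T set to the ~27-month point estimate. The exponential form, the function names, and the 40-point starting gap are illustrative assumptions, not the dashboard's actual fitting method.

```python
import math

# Point estimate from the dashboard; the 95% CI is 14-49 months.
HALVING_MONTHS = 27.0


def remaining_gap(initial_gap: float, months: float,
                  halving_months: float = HALVING_MONTHS) -> float:
    """Gap left after `months`, assuming it halves every `halving_months`.

    Assumes plain exponential decay; this is an illustration only.
    """
    return initial_gap * 0.5 ** (months / halving_months)


def months_to_reach(initial_gap: float, target_gap: float,
                    halving_months: float = HALVING_MONTHS) -> float:
    """Months needed for the gap to shrink from `initial_gap` to `target_gap`."""
    return halving_months * math.log2(initial_gap / target_gap)


if __name__ == "__main__":
    # Hypothetical benchmark whose best score sits 40 points below the ceiling.
    for t in (0, 27, 54):
        print(f"after {t:>2} months: gap = {remaining_gap(40.0, t):.1f} points")
    # Time to close the gap to 10 points under the point estimate and CI bounds.
    for T in (14.0, 27.0, 49.0):
        print(f"T = {T:>4} months: {months_to_reach(40.0, 10.0, T):.0f} months")
```

Under these assumptions, a 40-point gap shrinks to 10 points after two halvings: 54 months at the point estimate, or anywhere from 28 to 98 months across the confidence interval. That spread shows how wide the uncertainty around the headline number is.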

…and the rate is accelerating

3.4–9.1× faster
Post-breakpoint vs. pre-breakpoint

Based on 73 benchmarks across 11 domains

Evaluation Coverage

Which bio capability domains currently have active, unsaturated benchmarks, broken down by evaluation type.

Evaluation types (columns): Knowledge, Reasoning, Procedural, Agentic
Domains (rows):
Virology / Biosecurity
Genomics
Protein
Drug Discovery
Clinical
Bio NLP
Medical Imaging
Science QA
Agentic Bio
Legend: Active, Nearing saturation, Saturated, No benchmark