On April 17, 2026, OpenAI launched GPT-Rosalind, its first reasoning model designed for biology, drug discovery, and translational medicine.
The name honors Rosalind Franklin, whose X-ray crystallography made the structure of DNA visible.
The English-speaking press immediately referred to it as a “direct competitor to AlphaFold 3,” which is technically incorrect.
This confusion could cost French-speaking R&D teams several days as they decide what to test this week.
This guide breaks down the announcement: what the model does, the published benchmarks, its relationship with AlphaFold 3 and Chai-1, and the decisions a French biotech CTO needs to make.
In brief
- GPT-Rosalind orchestrates, doesn’t fold proteins: it reads literature, formulates hypotheses, plans experiments, and delegates 3D structure to AlphaFold 3 or Chai-1
- Quantified benchmarks: BixBench 0.751 vs 0.732 for GPT-5.4 and 0.698 for Grok 4.2, 6 families beaten out of 11 in LABBench2, 95th human percentile at Dyno Therapeutics
- Free preview but closed to France: access reserved for qualified US companies via the Trusted Access Program, with mandatory biosafety verification
- Fine-tuning to skepticism: first time OpenAI delivers a model trained to reject weak targets rather than comply
- Two paths for French-speaking R&D: wait for EU access or switch to an open stack of ESM3, Chai-1, and self-hosted Boltz-1
What OpenAI announced on April 17, 2026
OpenAI positioned GPT-Rosalind as the first model in a vertical series dedicated to life sciences
The announcement was relayed the same day by FierceBiotech, Pharmaphorum, and The Next Web
The model is trained to reason about biology, not to generate 3D protein structures
Public partners include Amgen, Moderna, Thermo Fisher Scientific, Novo Nordisk, Broad Institute, Allen Institute, Los Alamos, and UCSF School of Pharmacy
NVIDIA provides the computing power, Benchling ensures connection with digital lab notebooks
Séan Bruich, SVP AI at Amgen, summarizes the logic: accelerating therapeutic timelines by applying advanced tools to new tasks
Joy Jiao, head of Life Sciences Research at OpenAI, tempers: the model is designed to speed up time-consuming phases, not to replace researchers
A reasoning model, not a structural predictor
The modern pharma R&D chain stacks several specialized models that speak different languages
At the top of the stack, a scientific reasoning engine reads articles and plans steps, while below, structural predictors like AlphaFold 3, Chai-1, or Boltz-1 transform a sequence into 3D atomic coordinates
GPT-Rosalind occupies the first tier, not the second
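The two-tier split can be sketched in a few lines of Python. This is an illustration of the architecture described above, not OpenAI's actual API: the function names and the mock predictor are invented for the example.

```python
# Minimal two-tier sketch: a reasoning layer that delegates 3D structure
# prediction to an external tool, the way GPT-Rosalind is described as
# calling AlphaFold 3 or Chai-1. All names here are illustrative.

def mock_structure_predictor(sequence: str) -> dict:
    """Stand-in for AlphaFold 3 / Chai-1: returns placeholder output."""
    return {"sequence": sequence, "atoms": len(sequence) * 8}

TOOLS = {"fold": mock_structure_predictor}

def reasoning_layer(task: str, payload: str) -> dict:
    """The orchestrator never folds anything itself: it routes."""
    if task == "fold":
        return TOOLS["fold"](payload)
    # Literature synthesis, hypothesis ranking, etc. stay in the LLM tier.
    return {"task": task, "handled_by": "reasoning_model"}

print(reasoning_layer("fold", "MKTAYIAKQR"))
```

The point of the sketch is the shape, not the code: the top tier decides, the bottom tier computes coordinates.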
What GPT-Rosalind does
The model is calibrated for five specific tasks listed by OpenAI:
- Scientific literature synthesis: compiling hundreds of PubMed articles and extracting converging evidence
- Mechanistic hypothesis generation: proposing biological pathways ranked by robustness
- Experimental planning: CRISPR protocols, cloning, and cell assays with reagent selection
- Multi-omics interpretation: reading transcriptomic, proteomic, and metabolomic datasets
- Target validation: ranking a therapeutic pipeline based on feasibility criteria
What it doesn’t do
GPT-Rosalind does not generate 3D atomic coordinates
It doesn’t fold proteins, design de novo molecules, or replace traditional docking
OpenAI points to AlphaFold 3 and Chai-1 for these tasks, which the model calls as external tools
The common mistake is to present the model as an alternative to AlphaFold 3: they coexist, they don’t replace each other
The Codex plugin for life sciences
OpenAI has delivered a free plugin for Codex
It connects mainstream models to over 50 scientific tools: PubMed, ClinicalTrials.gov, UniProt, ChEMBL, AlphaFold Atlas, and STRING-DB
For a lab without GPT-Rosalind access, this plugin is already a usable tool
It fits into the logic of reasoning inaugurated with o1, applied to a highly regulated field
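A lab can already wire up one of these sources by hand. The sketch below builds a query URL for PubMed using NCBI's public, documented E-utilities endpoint; the search term is an arbitrary example, and no OpenAI plugin is involved.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (documented by NCBI).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL returning PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}?{urlencode(params)}"

url = pubmed_search_url("colorectal cancer AND antibody therapy")
print(url)
```

Fetching that URL returns a JSON list of PubMed IDs, which is the raw material any literature-synthesis layer starts from.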
Published benchmarks and the “default skeptic” bar
Three evaluations anchor the model’s promises: BixBench, LABBench2, and an industrial collaboration with Dyno Therapeutics
The figures come from OpenAI and have been confirmed by FierceBiotech, with the useful caveat of a training bias on public evaluations
BixBench and LABBench2: public measurement
BixBench is a bioinformatics benchmark maintained by Edison Scientific: 53 analysis scenarios, 296 questions, agent placed in front of an empty Jupyter notebook
On this bench, GPT-Rosalind scores 0.751 pass@1, ahead of GPT-5.4 at 0.732, GPT-5 at 0.728, Grok 4.2 at 0.698, and Gemini 3.1 Pro at 0.550
The gap over the best generalist model is about 2 points, not ten
LABBench2 spans 1,900 tasks across 11 families, and GPT-Rosalind beats GPT-5.4 in 6 of them, with the biggest gain on CloningQA
The Dyno Therapeutics test: the real signal
The most cited measure comes from an evaluation at Dyno Therapeutics, specializing in designing AAV capsids for gene therapy
Dyno provided unpublished, novel RNA sequences to avoid benchmark contamination
On sequence-function prediction, GPT-Rosalind’s top ten submissions reached above the 95th percentile of human experts
On generation, the score drops to 84th percentile, a result to be read with caution
This figure measures a specific sub-task, not the model’s overall performance on all biological R&D
Fine-tuning to skepticism
The most interesting differentiation point isn’t a number, it’s a training choice
OpenAI conditioned GPT-Rosalind to reject weak targets rather than validate by default
The model is trained to say "this hypothesis lacks evidence, I don't validate it" instead of crafting a pleasing response. For a team validating a poorly framed brief, that stance can save three months of useless deliverables
In a context where a poorly prioritized target can cost over $2 billion in the clinical cycle, shifting from a compliant model to a disagreeable one has direct economic value
This is the first time OpenAI publicly delivers this stance

Mapping bio-AI verticals: GPT-Rosalind orchestrates, others execute
Useful analogy: the 2026 bio-AI stack works like a Michelin-starred kitchen
Structural predictors are the station chefs, GPT-Rosalind is the head chef who sequences the service
Each has its specialty, none replaces the others
Proprietary and open source structural predictors
AlphaFold 3 remains the structural reference: 3D prediction of protein-ligand-DNA-RNA complexes, 50% gain on PoseBusters compared to AF2
Isomorphic Labs has signed deals totaling nearly $3 billion with Eli Lilly and Novartis
Evo 2 addresses long genomics: 40 billion parameters, 1 megabase context, useful for regulatory regions
ESM3 unifies sequence, structure, and function in a single multimodal model
Chai-1 matches AlphaFold 3 quality in open source, with or without multiple sequence alignments
Boltz-1 is even more hackable, training and architecture published, currently the most transparent model of the lot
Roles in an R&D workflow
Reading grid for an R&D manager:
- GPT-Rosalind: orchestrator, literature synthesis, hypotheses, planning
- AlphaFold 3: 3D structure of complexes, restricted pharma access
- Evo 2: long genomics open source up to 1 Mb
- ESM3: multimodal protein design with constraints
- Chai-1 and Boltz-1: open alternatives to AlphaFold 3, locally installable
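The reading grid above fits in a one-screen routing table. This is a hedged sketch: the task-to-tool mapping mirrors the list, everything else (names, error handling) is invented for illustration.

```python
# Route an R&D task family to the tier named in the reading grid.
STACK = {
    "literature_synthesis": "GPT-Rosalind",
    "hypothesis_ranking": "GPT-Rosalind",
    "experimental_planning": "GPT-Rosalind",
    "complex_structure_3d": "AlphaFold 3",
    "long_genomics": "Evo 2",
    "protein_design": "ESM3",
    "open_structure_3d": "Chai-1 / Boltz-1",
}

def route(task: str) -> str:
    """Return the tool a task should go to; fail loudly on unknowns."""
    try:
        return STACK[task]
    except KeyError:
        raise ValueError(f"no tier declared for task: {task!r}")

print(route("complex_structure_3d"))  # AlphaFold 3
```

Making the routing explicit is what distinguishes an orchestrated stack from the "ad hoc scripts" most teams run today.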
A biotech CTO who tells their board "we're switching from AlphaFold 3 to GPT-Rosalind" is making a category error: the real move is from "ad hoc scripts" to "an orchestrator that properly calls AlphaFold 3"
Restricted access, unclear pricing, sovereignty, and biosafety
The economic and legal aspect, little covered by the French press, determines what a French biotech SME can do this week
The preview: free, US Enterprise, safety review
During the research preview, using GPT-Rosalind consumes neither tokens nor credits
Access is conditioned on four cumulative criteria:
- An Enterprise contract: ChatGPT Enterprise, Codex, or API Enterprise
- US headquarters or legal entity: EU structures are not supported
- Legitimate biological research use oriented towards human health
- Compliance controls: SOC 2 Type 2, HIPAA with a BAA available, plus a biosafety audit
OpenAI has not published any post-preview pricing, and the reasonable assumption remains a custom Enterprise license
FR/EU sovereignty and biosafety
For a French-speaking lab, the equation is clear: no direct access before EU opening
Available routes are limited to detours via a qualified US partner, typically an Amgen subsidiary or a Broad Institute collaborator
Regarding GDPR, health data falls under Article 9, and contracting must provide for a transfer outside the EU
More than 100 researchers have signed an open letter calling for stricter control of sensitive biological data
OpenAI responded with a system of “high-precision flags” that triggers an alert as soon as dual-use thresholds are crossed

Three concrete decisions to make this week
The French-speaking reader has three likely profiles, each with a different decision to make
Biotech SME and academic lab
Concrete case: a 12-person French biotech working on an anti-colorectal cancer antibody
Before: three weeks of cross-referencing literature and hypothesis meetings
With GPT-Rosalind via a Broad Institute partnership: 2 hours for 40 papers synthesized, 6 hypotheses ranked, and a validation protocol
The ROI depends on the entry cost into a US partnership, often with shared intellectual property
For an INSERM or CNRS lab without a US subsidiary, hosting ESM3 on the Jean Zay HPC remains viable and preserves data sovereignty
At Owkin or InstaDeep, the decision has already been made: keep investing in their own open source stack, and monitor GPT-Rosalind for literature synthesis, not for patient data
French-speaking AI dev: what return on investment?
For a freelancer or an IT service company building biotech consulting offers, adding a GPT-Rosalind workflow involves two investments
The first: mastering the free Codex plugin, accessible without the Trusted Access Program
The second: setting up a Chai-1 or Boltz-1 demo in parallel, to offer the open source version to clients blocked by US access
The hybrid stack of Codex plugin plus local open source covers 80% of use cases for a biotech SME without dependence on trusted access
Training represents about 5 days of ramp-up for a data engineer already comfortable with LangChain
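The fallback logic of that hybrid stack fits in a few lines. A hypothetical sketch: both callables are stand-ins, assuming a hosted tool that is unavailable without Trusted Access Program approval.

```python
# Hybrid-stack fallback: prefer the hosted tool when access exists,
# otherwise drop to the locally hosted open source predictor.
# Both functions are illustrative stand-ins, not real APIs.

def hosted_predict(seq: str) -> str:
    raise PermissionError("Trusted Access Program approval required")

def local_predict(seq: str) -> str:
    # e.g. a self-hosted Chai-1 or Boltz-1 instance
    return f"local-structure({seq})"

def predict_structure(seq: str) -> str:
    try:
        return hosted_predict(seq)
    except PermissionError:
        return local_predict(seq)

print(predict_structure("MKT"))  # falls back to the local tier
```

The same pattern lets a consulting offer degrade gracefully for clients blocked by US-only access.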
Forty-eight hours after launch, three signals deserve attention: the publication of the first independent reviews, the announcement of an EU opening, and a public post-preview pricing
The strategic question remains: when is a vertical model better than a generalist plus in-house RAG?
For further reading, see the 2026 technical guide to RAG applied to research
Frequently asked questions about GPT-Rosalind
Is GPT-Rosalind a direct competitor to AlphaFold 3?
No, GPT-Rosalind reasons and AlphaFold 3 predicts 3D structures, they coexist in an R&D stack where the former calls the latter as a tool
Can I access GPT-Rosalind from France today?
Not directly, access is reserved for qualified US companies via the Trusted Access Program, a French lab must go through a US partnership or wait for EU opening
How much does GPT-Rosalind cost?
During the research preview, usage does not consume tokens or credits for approved companies, no public post-preview pricing has been announced
Who are the announced partners?
Amgen, Moderna, Thermo Fisher, Novo Nordisk, Allen Institute, Broad Institute, Los Alamos, NVIDIA, Oracle Health, Benchling, UCSF School of Pharmacy, and Retro Biosciences
What does “fine-tuning to skepticism” mean?
OpenAI trained the model to reject weak targets or under-documented hypotheses rather than crafting a pleasing response
On which tasks does GPT-Rosalind outperform GPT-5.4?
On BixBench at 0.751 versus 0.732, and on 6 of the 11 LABBench2 families with the biggest gain on CloningQA
Can the model generate a new protein?
Not directly, de novo generation remains the domain of ESM3 or Chai-1, GPT-Rosalind can frame the request and call these tools
What open source alternatives exist today?
For 3D structure Chai-1 and Boltz-1 installable locally, for protein design ESM3, for long genomics Evo 2
What is the dual-use risk mentioned by researchers?
More than 100 scientists are calling for stricter control of training data due to fears of misuse on pathogens, OpenAI responded with high-precision flags
Which published figures are independently verifiable?
BixBench and LABBench2 scores are based on consultable Edison Scientific benchmarks, the Dyno test uses unpublished sequences and remains to be corroborated