Central humanoid AI entity orchestrating floating holographic interfaces, conceptual illustration of GPT-5.5 at work on agentic workflows

GPT-5.5: what’s really changing (official benchmarks + 24h feedback)

Artificial Intelligence
Nicolas
10 min read

OpenAI released GPT-5.5 on April 23, 2026, just six weeks after GPT-5.4.

The model is touted as the most capable yet for coding, long tasks, and autonomous tool use.

The communication is more measured than for GPT-5 in August 2025, which is a good thing: it allows us to look at the benchmarks, real-world feedback, and what actually changes in the subscription.

Here’s a factual update at 24 hours, without the hype.

What OpenAI is announcing

GPT-5.5 is primarily a ChatGPT deployment, not an API release.

The model is available to Plus, Pro, Business, Enterprise, and Edu subscribers, with a GPT-5.5 Thinking variant designed for heavy tasks: long documents, extended reasoning, multi-step agents.

Go users (€8/month) can access GPT-5.5 Thinking via the + icon in the chat box, with a limit of 10 messages every 5 hours.
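A limit phrased as "10 messages every 5 hours" is typically implemented as a sliding window rather than a fixed reset: each accepted message occupies a slot for five hours, so capacity frees up gradually. As an illustration only (this is one common way such quotas work, not OpenAI's actual implementation, and the class name is made up), the mechanism can be sketched like this:

```python
# Illustrative sliding-window quota: "10 messages every 5 hours".
# This is a generic sketch of how rolling limits typically work,
# NOT OpenAI's actual implementation.
from collections import deque

class RollingQuota:
    def __init__(self, limit=10, window_seconds=5 * 3600):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of accepted messages

    def allow(self, now):
        # Evict timestamps that have aged out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

quota = RollingQuota()
# 12 messages sent one second apart: only the first 10 get through
accepted = sum(quota.allow(i) for i in range(12))
print(accepted)  # 10
```

Under this scheme a user who burns all 10 messages at once waits the full five hours for the first slot to free, whereas spaced-out usage never hits the ceiling.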

Free users remain on GPT-5.4: nothing changes for them model-wise.

Above that, GPT-5.5 Pro is reserved for Pro, Business, Enterprise, and Edu plans.

This is the advanced version of the model, with a higher reasoning budget, targeted at the longest workflows: complex refactors, dense document analysis, multi-source research.

GPT-5.5 is primarily a ChatGPT update: the full API comes later, and only GPT-5.5 Pro is already listed on developer pricing at $30 / $180 per million tokens.

On the public API side, GPT-5.5 is not yet as broadly available as GPT-5.4, whose deprecation pace is already being questioned: the page openai.com/index/introducing-gpt-5-5/ confirms the GPT-5.5 Pro pricing ($30 input, $180 output per million tokens) but availability for all developers is staggered over time.

The announcement also emphasizes token efficiency: OpenAI states that GPT-5.5 consumes significantly fewer tokens to accomplish the same tasks in Codex, while keeping per-token latency on par with GPT-5.4.

The benchmarks that matter

OpenAI publishes a series of scores placing GPT-5.5 above GPT-5.4 and above Claude Opus 4.7 on several agentic evaluations.

Here are the official figures released on April 23, 2026.

| Benchmark | GPT-5.5 | GPT-5.5 Pro | What it measures |
|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | n/a | Command-line agents, multi-step tasks |
| SWE-Bench Pro | 58.6% | n/a | Real GitHub ticket resolution |
| GDPval | 84.9% | n/a | Knowledge work tasks |
| OSWorld-Verified | 78.7% | n/a | Using a computer like a human |
| FrontierMath Tier 4 | 35.4% | 39.6% | Research math, open problems |
| BrowseComp | 84.4% | 90.1% | Multi-source web research |

Three things to keep in mind before drawing conclusions.

First bias: these benchmarks are published by OpenAI with its own evaluation harnesses.

The comparison with Claude Opus 4.7 reads differently depending on who runs the tests: Anthropic reports 64.3% on SWE-Bench Pro for Opus 4.7 (versus 58.6% for GPT-5.5) and 73.8% on CyberGym, where OpenAI measures Opus 4.7 at 73.1%.

The gaps are real, but methodological before being absolute.

Second bias: Terminal-Bench 2.0 and OSWorld-Verified measure agents executing, not the quality of the code produced.

A model can dominate an agentic benchmark and produce a refactor that needs to be reviewed line by line in production.

Third bias: GDPval and BrowseComp are relatively recent evaluations, designed for frontier models.

The very high scores also reflect the fact that the tests were calibrated while the models improved: we’re looking at a moving ceiling.

Reading a frontier benchmark without considering who published it and with what harness is like reading a comparative advertisement.

That said, 82.7% on Terminal-Bench 2.0 remains a strong signal: it’s 13 points above Opus 4.7 in OpenAI’s measure, and terminal agents are a real use case for many dev teams.

User feedback at 24 hours

On r/codex, r/ChatGPT, Hacker News, and X, the initial feedback is more technical than for GPT-5 in August 2025.

No wave of complaints about the model becoming cold this time: the early-adopter audience is mostly developers and power users, not the general public.

Three patterns emerge in the discussions.

Positive signal on short code loops.

Feedback on r/codex describes GPT-5.5 as cleaner on the first draft, with fewer correction cycles on targeted tasks: component implementation, isolated bug fix, scoped PR review.

Anthony Maio’s analysis, compiling threads from r/codex, r/hermesagent, and Hacker News, sums it up in one sentence: “if a model saves one or two correction loops every time you hand it a scoped task, you feel that immediately”.

Mixed signal on long-duration execution.

For repo-wide tasks, multi-file changes, or agents running for more than an hour, the community remains cautious.

The recurring complaint since GPT-5: fidelity to instructions degrades after 20 agent turns, and nothing yet confirms that GPT-5.5 clearly addresses this point.

Strong signal on speed and token efficiency.

Pietro Schirano, CEO of MagicPath, reported a branch merge with hundreds of frontend changes completed in about 20 minutes by GPT-5.5 on a main branch that had diverged significantly.

Other early-access partners (OpenAI announces nearly 200) describe the same thing: the model works faster on medium tasks and consumes fewer tokens for a result equivalent to GPT-5.4.

Developers using Codex daily don’t talk about a generational leap: they talk about a model that saves them 10 to 30% of time on their usual loops.

The honest assessment: measurable incremental gain, not a generational leap.

Codex: the real leap?

The most interesting part of the announcement lies in one word: Codex.

GPT-5.5 is deployed in Codex CLI, the VS Code extension, Codex Cloud, and the GitHub code review bot, with a clear usage profile: long-running agents.

Greg Brockman had already mentioned in September 2025 internal Codex sessions that ran up to seven hours on complex refactors, a capability other models hadn’t reached at the time.

With GPT-5.5, OpenAI emphasizes two axes.

Token efficiency: the model consumes significantly fewer tokens to complete the same Codex tasks, according to the OpenAI community page on April 23.

For a team paying for its API by volume, this is the most important metric: a more expensive model per token can cost less in use if it uses half as many tokens.
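That break-even logic is easy to make concrete. In the sketch below, the $30 / $180 GPT-5.5 Pro prices come from the announcement; the comparison model's prices and all token counts are placeholder assumptions, not published figures:

```python
# Illustrative cost math: a pricier model per token can still be cheaper
# per task if it finishes the work with fewer tokens.
# GPT-5.5 Pro prices ($30 in / $180 out per 1M tokens) are from the
# announcement; the other prices and token counts are assumptions.

def task_cost(tok_in, tok_out, price_in, price_out):
    """Dollar cost of one task, with prices quoted per 1M tokens."""
    return (tok_in * price_in + tok_out * price_out) / 1_000_000

def break_even_factor(tok_in, tok_out, cheap_prices, pricey_prices):
    """Fraction of the cheaper model's token volume the pricier model
    must stay under to match it on cost (same input/output mix)."""
    cheap = task_cost(tok_in, tok_out, *cheap_prices)
    pricey = task_cost(tok_in, tok_out, *pricey_prices)
    return cheap / pricey

# Hypothetical task: 200k input / 50k output tokens on a model priced
# at an assumed $20/$80, versus GPT-5.5 Pro at $30/$180.
factor = break_even_factor(200_000, 50_000, (20, 80), (30, 180))
print(f"Break-even: pricier model must use under {factor:.0%} of the tokens")
```

With these placeholder numbers the break-even lands around 53%, i.e. roughly the "half as many tokens" regime the efficiency claims point at; the real answer depends entirely on your own input/output mix and the price of the model you are migrating from.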

Long-duration execution: the logic of agentic coding pushes toward workflows where the model plans, executes, tests, corrects, and then submits a PR without continuous human intervention, a field OpenAI detailed in its revamp of the Agents SDK harness and sandbox.

This is exactly the use case where Anthropic and OpenAI are really battling it out in 2026, and where classic benchmarks (SWE-Bench Verified) are starting to saturate.

The real test: can GPT-5.5 in Codex handle a 6-hour refactor without drifting?

The answer will be measured in weeks, not hours.

What remains behind

Several points deserve to be clearly highlighted before any enthusiasm.

The full API comes later.

Only GPT-5.5 Pro is listed with a public price of $30 / $180 per million tokens, and generalized API availability is staggered.

Teams building products on GPT-5.4 cannot switch instantly yet.

Safeguards are strengthened, with practical consequences.

OpenAI has subjected GPT-5.5 to a targeted red-teaming on cybersecurity and biology capabilities, as part of its Preparedness Framework.

A bio bug bounty has been opened with rewards up to $25,000, and access to certain capabilities is conditioned on a “trust-based access” system that may restrict legitimate uses for academic researchers or defensive security teams.

Free and Go plans are underserved.

Free users remain on GPT-5.4: no model change, no new ceiling.

Go users access GPT-5.5 Thinking with a strict quota of 10 messages per 5-hour period, making it more of a trial feature than a work tool.

The implicit message: GPT-5.5 is for those who pay at least Plus, and especially for those who pay Pro or more.

The gap with Claude Opus 4.7 remains contextual.

Opus 4.7 retains the advantage on SWE-Bench Pro (64.3% vs. 58.6%), on MCP-Atlas (79.1% vs. 75.3%), and on refactor tasks where understanding the intent behind the code is crucial.

GPT-5.5 takes the lead on Terminal-Bench 2.0, agentic tasks, and long-context retrieval.

It’s not a model that crushes everything: it’s a model that wins some categories and loses others.

Who should upgrade?

The choice depends on the usage profile, not a global score.

Developer coding daily in Codex.

Upgrading to ChatGPT Pro (€103/month) is the clearest decision: GPT-5.5 in Codex wins on Terminal-Bench 2.0, consumes fewer tokens, and handles long sessions.

If you’re on Plus, moving to Pro is justified if Codex represents more than 2-3 hours per day in your workflow.

Product team or agency.

The Business plan (€21/user/month) gives access to GPT-5.5 Thinking via the model selector, with centralized account management.

For a team of 5 people already using ChatGPT as an internal tool, upgrading to Business has a quick ROI: the quality of responses on long tasks offsets the cost per seat.

Curious user, mixed use (writing, research, prototyping).

Plus (€23/month) remains the best entry ticket: you get GPT-5.5 Thinking, the model behind the 84.9% GDPval and 84.4% BrowseComp scores, and enough headroom to seriously test it on real cases.

For heavy coding tasks, complement with a Claude plan or stick with Opus 4.7 depending on your preferences.

Free user.

Nothing to do: you’re on GPT-5.4, the model doesn’t change, and upgrading to Go (€8/month) gives symbolic access to GPT-5.5 Thinking (10 messages / 5 h).

If you’re not currently limited by GPT-5.4, it’s not worth the cost.

For more on subscription decisions, our complete comparison of Claude vs ChatGPT subscriptions details prices, quotas, and use cases for each plan of both offerings.

Three questions to ask before upgrading

  • Are my current use cases limited by GPT-5.4, or am I just looking to test something new?
  • Do I use Codex more than an hour a day? If yes, Pro pays for itself.
  • Does my work depend more on the quality of the code produced (Opus 4.7) or on agentic execution speed (GPT-5.5)?

The real question isn’t “is GPT-5.5 better than GPT-5.4?”.

It’s “does GPT-5.5 change my workflow enough to justify a plan upgrade?”.

For most Plus users, the answer is no in the short term: the model is more capable, but the leap isn’t significant enough to warrant an immediate switch.

For devs living in Codex, the answer is yes, and the Pro upgrade is worth it this week.

FAQ

When was GPT-5.5 released?

OpenAI announced GPT-5.5 on April 23, 2026, with immediate deployment for Plus, Pro, Business, Enterprise, and Edu subscribers in ChatGPT and Codex.

Is GPT-5.5 available in the API?

Only GPT-5.5 Pro is publicly listed with an API price of $30 / $180 per million tokens (input/output), and generalized API availability for all developers is staggered in the weeks following the announcement.

What’s the difference between GPT-5.5 and GPT-5.5 Pro?

GPT-5.5 Pro uses a higher reasoning budget and targets the most difficult tasks: complex refactors, dense document analysis, multi-source research.

GPT-5.5 Pro gains 4.2 points on FrontierMath Tier 4 (39.6% vs. 35.4%) and 5.7 points on BrowseComp (90.1% vs. 84.4%) compared to standard GPT-5.5.

Is GPT-5.5 better than Claude Opus 4.7?

It depends on the task.

GPT-5.5 wins on Terminal-Bench 2.0 (82.7% vs. 69.4% according to OpenAI), on OSWorld-Verified, and on long-context retrieval.

Opus 4.7 retains the advantage on SWE-Bench Pro (64.3% vs. 58.6%), on MCP-Atlas, and on multi-file refactors where the quality of the code produced is paramount.

Do free users have access to GPT-5.5?

No: Free users remain on GPT-5.4.

Go users (€8/month) access GPT-5.5 Thinking with a quota of 10 messages every 5 hours.

What’s changing in Codex?

GPT-5.5 consumes significantly fewer tokens to accomplish the same tasks in Codex, with token latency equivalent to GPT-5.4.

The model also handles long sessions better, with documented agentic runs lasting several hours.

What are the strengthened safeguards?

OpenAI has subjected GPT-5.5 to targeted red-teaming on cybersecurity and biology capabilities, as part of its Preparedness Framework.

A bio bug bounty has been opened with rewards up to $25,000, and access to certain advanced capabilities is conditioned on a trust-based access system.

Should I upgrade to Pro now?

Yes, if you use Codex more than an hour a day and your workflow depends on long agentic runs.

No, if you’re a Plus user doing writing, research, and prototyping: the difference with GPT-5.4 exists, but doesn’t justify moving from €23 to €103 per month.

How many partners tested GPT-5.5 before release?

OpenAI reports collecting feedback on real use cases from nearly 200 early access partners before the public launch on April 23, 2026.

Is GPT-5.5 part of a shorter release cycle?

GPT-5.5 is released six weeks after GPT-5.4, confirming a rapid update pace at OpenAI since late 2025.

Direct competition with Anthropic (Opus 4.7 released on April 16, 2026) and Google (Gemini 3.1 Pro) largely explains this tightened publication cycle.
