OpenAI's GPT-5.6 Sol beats Claude on coding benchmarks — but it's locked to US-approved partners for now

June 27, 2026

OpenAI released its GPT-5.6 family of models on Saturday, but most developers won’t get access right away. The company is holding back public availability while it complies with US government requirements that limit the initial rollout to a small group of “trusted partners.”

The lineup comes in three tiers. Sol is the flagship, priced at $5 per million input tokens and $30 per million output tokens. Terra sits in the middle at $2.50 and $15, and Luna, the tier built for speed and low cost, comes in at $1 and $6 at the same volumes. OpenAI also improved its prompt caching system, making repeated tokens cheaper and caching costs more predictable.
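
To make the per-million-token rates concrete, here is a minimal Python sketch that estimates the cost of a single request from its input and output token counts. The prices are the list rates quoted above; the `request_cost` helper and the example workload are illustrative, and the sketch ignores prompt-caching discounts because their rates aren’t given here.

```python
# List prices in US dollars per 1 million tokens, as quoted for GPT-5.6.
# Prompt-caching discounts are NOT modeled (their rates are not stated).
PRICES_PER_MILLION = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    rates = PRICES_PER_MILLION[tier]
    return (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 200,000 input tokens and 50,000 output tokens.
    for tier in PRICES_PER_MILLION:
        print(f"{tier}: ${request_cost(tier, 200_000, 50_000):.2f}")
```

For that workload, Sol comes to $2.50, Terra to $1.25, and Luna to $0.50. Because both of Terra’s rates are exactly half of Sol’s, Terra costs half as much as Sol for any mix of input and output tokens.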

On the benchmarks, Sol delivered the strongest results. It scored 88.8% on Terminal-Bench 2.1 in standard mode, edging past Claude Mythos 5 at 88.0%. With the new Ultra mode — which uses sub-agents to break down complex tasks — Sol hit 91.9%. OpenAI also introduced a Max reasoning intensity setting that lets users trade compute for deeper analysis on hard problems.
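
OpenAI describes Ultra mode only as using sub-agents to break down complex tasks. The Python sketch below shows that general orchestrator pattern (split a task into subtasks, run each in parallel, then merge the results) under stated assumptions. It is not OpenAI’s implementation; `plan_subtasks`, `run_sub_agent`, and `merge_results` are hypothetical stubs standing in for what would be model calls in a real system.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_subtasks(task: str) -> list[str]:
    """Hypothetical planner: a real orchestrator would ask a model to decompose the task."""
    return [f"{task}: step {i}" for i in range(1, 4)]


def run_sub_agent(subtask: str) -> str:
    """Hypothetical sub-agent: a real one would be a separate model call with its own context."""
    return f"done({subtask})"


def merge_results(results: list[str]) -> str:
    """Hypothetical aggregator: combines sub-agent outputs into one answer."""
    return "; ".join(results)


def solve_with_sub_agents(task: str, max_workers: int = 3) -> str:
    subtasks = plan_subtasks(task)
    # Run the independent subtasks concurrently; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_sub_agent, subtasks))
    return merge_results(results)


if __name__ == "__main__":
    print(solve_with_sub_agents("fix failing tests"))
```

In designs like this, the usual trade-off is extra orchestration overhead in exchange for each worker handling a smaller, more focused problem.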

Model	Input ($/1M tokens)	Output ($/1M tokens)	Terminal-Bench 2.1
Sol	5.00	30.00	88.8% standard, 91.9% Ultra
Terra	2.50	15.00	not reported
Luna	1.00	6.00	not reported

GPT-5.6 Sol benchmark performance

The improvements go beyond coding. On GeneBench v1, a biology benchmark, Sol scored higher than GPT-5.5 while using fewer tokens. For researchers running large-scale bioinformatics workloads, fewer tokens per task means lower costs and faster turnaround, a meaningful efficiency gain.

In cybersecurity, the model showed clear gains on long-chain security tasks, which require many dependent steps. On ExploitBench, Sol came close to Mythos Preview’s results while using roughly a third as many output tokens. That’s a useful improvement for vulnerability researchers working through multi-step exploits.
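
The “roughly a third of the output tokens” figure translates directly into spend. This Python sketch assumes Sol’s quoted $30 per million output tokens and a hypothetical baseline run of 3 million output tokens; the baseline size is illustrative, not a reported ExploitBench number. Both token counts are priced at Sol’s rate, so the sketch isolates the effect of the token reduction rather than comparing against Mythos Preview’s pricing, which isn’t given here.

```python
SOL_OUTPUT_PRICE_PER_MILLION = 30.00  # USD per 1M output tokens, Sol's quoted rate


def output_cost(tokens: int, price_per_million: float = SOL_OUTPUT_PRICE_PER_MILLION) -> float:
    """Dollar cost of generating `tokens` output tokens at a flat rate."""
    return tokens * price_per_million / 1_000_000


baseline_tokens = 3_000_000            # hypothetical baseline run size
reduced_tokens = baseline_tokens // 3  # "roughly a third" of the baseline

print(f"baseline:  {baseline_tokens:,} tokens, ${output_cost(baseline_tokens):.2f}")
print(f"one third: {reduced_tokens:,} tokens, ${output_cost(reduced_tokens):.2f}")
print(f"reduction: {1 - reduced_tokens / baseline_tokens:.0%}")
```

At the same per-token price, using a third of the output tokens cuts the output portion of the bill by about two thirds, here from $90 to $30.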

Security benchmark comparison

Safety is handled through a layered system. Each model includes built-in refusal mechanisms, a real-time classifier on generated output, account-level risk screening, differential access controls, monitoring, and enforcement. When the system detects a high-risk situation, it pauses generation and passes the content to a larger reasoning model for review. If a violation is confirmed, the content is blocked before the user sees it.
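
The escalation path described above (classify output in real time, pause on a high-risk signal, hand the content to a larger reasoning model, block on a confirmed violation) can be sketched as a simple control flow. This Python sketch is illustrative only: the keyword-based `fast_classifier`, the `review_model` stub, and the 0.8 risk threshold are hypothetical placeholders, not OpenAI’s components or settings. It buffers all output until generation finishes, so blocked content never reaches the caller.

```python
from dataclasses import dataclass

RISK_THRESHOLD = 0.8  # hypothetical cutoff for escalating to review


@dataclass
class Verdict:
    delivered: bool
    text: str


def fast_classifier(chunk: str) -> float:
    """Hypothetical cheap real-time classifier: returns a risk score in [0, 1]."""
    return 0.9 if "exploit" in chunk.lower() else 0.1


def review_model(text: str) -> bool:
    """Hypothetical larger reasoning model: True means a confirmed violation."""
    return "exploit payload" in text.lower()


def generate_with_guard(chunks: list[str]) -> Verdict:
    produced: list[str] = []
    for chunk in chunks:
        produced.append(chunk)
        if fast_classifier(chunk) >= RISK_THRESHOLD:
            # Pause and escalate everything generated so far to the reviewer.
            if review_model(" ".join(produced)):
                # Confirmed violation: block before the user sees anything.
                return Verdict(delivered=False, text="")
            # Otherwise the reviewer cleared it and generation resumes.
    return Verdict(delivered=True, text=" ".join(produced))


if __name__ == "__main__":
    print(generate_with_guard(["Here is", "a summary"]))
    print(generate_with_guard(["How to patch", "the exploit"]))
    print(generate_with_guard(["Step 1:", "exploit payload"]))
```

The second call shows why a two-tier split can help: the cheap classifier flags anything that mentions an exploit, but only the slower reviewer decides whether to block, and it runs only on flagged output.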

Safety architecture diagram

OpenAI said it plans to make GPT-5.6 Sol, Terra, and Luna publicly available in the coming weeks. Separately, Sol will land on Cerebras hardware in July, where it will run at up to 750 tokens per second. That deployment will start with a limited set of customers.
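
At the quoted peak of 750 tokens per second, generation time is output length divided by throughput. The Python sketch below assumes that peak rate holds for the whole response and ignores network latency and time to first token, so real wall-clock times would be longer; `generation_seconds` is a hypothetical helper.

```python
CEREBRAS_PEAK_TOKENS_PER_SECOND = 750  # "up to" figure quoted for Sol on Cerebras


def generation_seconds(output_tokens: int, tokens_per_second: float = CEREBRAS_PEAK_TOKENS_PER_SECOND) -> float:
    """Best-case time to stream `output_tokens` at a constant throughput."""
    return output_tokens / tokens_per_second


for tokens in (1_500, 30_000, 90_000):
    print(f"{tokens:,} tokens -> {generation_seconds(tokens):.0f} s")
```

For example, a 30,000-token response would take about 40 seconds at that rate, in the best case.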