Imagine being able to run an AI model as powerful as ChatGPT directly on your computer, without an internet connection, without usage limits, and without paying a cent. Sounds too good to be true? Think again!

OpenAI has once again shaken up the world of AI by releasing GPT-OSS, its first open-weight models since GPT-2 in 2019.

After years of jealously guarding its creations behind paid APIs, OpenAI is making a U-turn with two revolutionary models: gpt-oss-20b and gpt-oss-120b.

These technological marvels run locally on your machine, rivaling the best proprietary models on the market, and the icing on the cake: they are completely free under the Apache 2.0 license.

Two Powerhouses for Your PC (or Mac)

The GPT-OSS models are no slouch. Built on a particularly clever Mixture-of-Experts architecture, they only use a fraction of their billions of parameters for each query. The result? Remarkable efficiency.

The GPT-OSS-20B has a total of 21 billion parameters but only activates 3.6 billion per token.

It’s like having a team of 32 experts per layer, but only consulting the 4 most relevant ones each time. With its 24 layers and a context of 128,000 tokens, it fits comfortably in 16 GB of RAM.

Its big brother, the GPT-OSS-120B, takes the concept even further with its 117 billion parameters (5.1 billion activated per token).

Its 36 layers each have 128 experts, always with the same approach: only wake up those who have something useful to say. You’ll need about 80 GB of RAM to run it.
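The top-k routing idea behind both models can be sketched in a few lines of plain Python. This is a toy illustration, not OpenAI's implementation: each "expert" is reduced to a single multiplier, and the router logits are random.

```python
import math
import random

def top_k_routing(router_logits, k=4):
    """Pick the k experts with the highest router scores and turn
    their logits into softmax weights; all other experts get 0."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

# Toy layer: 32 "experts", each reduced to a single multiplier.
random.seed(0)
experts = [random.uniform(0.5, 1.5) for _ in range(32)]
logits = [random.gauss(0.0, 1.0) for _ in range(32)]

weights = top_k_routing(logits, k=4)   # only 4 of the 32 experts fire
token = 1.0
output = sum(w * experts[i] * token for i, w in weights.items())

print(len(weights))                      # 4
print(round(sum(weights.values()), 6))   # 1.0
```

The point is the ratio: the layer holds 32 experts' worth of parameters, but each token only pays the compute cost of 4 of them.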

The most impressive part? These giants are quantized to 4-bit MXFP4, which slashes their memory footprint without sacrificing performance.
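A quick back-of-envelope calculation shows why that quantization matters. Assuming roughly 4.25 bits per weight for MXFP4 (4-bit values plus shared per-block scales, an approximation) versus 16 bits for FP16:

```python
def memory_gb(params_billion, bits_per_param):
    """Raw weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("gpt-oss-20b", 21), ("gpt-oss-120b", 117)]:
    fp16 = memory_gb(params, 16)
    mxfp4 = memory_gb(params, 4.25)  # 4-bit values + shared block scales
    print(f"{name}: ~{fp16:.0f} GB in FP16 vs ~{mxfp4:.1f} GB in MXFP4")
```

The weights alone come out around 11 GB and 62 GB respectively; activations and the KV cache add overhead on top, which is why the practical headline figures are 16 GB and 80 GB.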

OpenAI has really thought of everything to make AI accessible to as many people as possible.

An Architecture That Doesn’t Cut Corners

Under the hood, GPT-OSS hides some fascinating technical choices. Its layers alternate between full (global) attention and a 128-token sliding window, combining a view of the whole context with much cheaper computation.

To handle long contexts without blowing up memory, the engineers added “attention sinks”: learned anchor values that keep attention stable as older tokens slide out of the active window.
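The sliding-window half of that attention scheme boils down to a simple mask: each token may only attend to itself and the previous window-1 positions. A minimal sketch, using a tiny window of 4 for readability (the real models use 128):

```python
def sliding_window_mask(seq_len, window):
    """Causal sliding-window mask: token i may attend to token j
    only if j <= i (causality) and i - j < window (recency)."""
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=8, window=4)
# Token 7 can see tokens 4..7 but no longer sees tokens 0..3.
print([j for j in range(8) if mask[7][j]])   # [4, 5, 6, 7]
```

Because half the layers use this mask instead of full attention, their cost grows with the window size rather than with the full 128,000-token context.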

Positional encoding uses RoPE (Rotary Positional Embedding), a proven technique that allows the model to naturally understand the position of words in very long texts.

All of this works with a structured prompt format called “Harmony”, which separates system, developer, and user messages, and the model’s internal reasoning from its final answer.

In short, GPT-OSS takes all the expertise accumulated by OpenAI on its internal models o1, o3, and o4-mini, but in a version you can install at home.

Performance That Makes the Competition Pale

On paper, the GPT-OSS figures are staggering. The 120B model reaches near-parity with OpenAI’s o4-mini on core reasoning benchmarks.

See our latest comparison of OpenAI models: Choosing the Best ChatGPT Model for Your Projects in 2025: Complete Guide and Comparison

In mathematics (AIME 2024/2025), complex problem-solving (MMLU, Humanity’s Last Exam), or even health questions (HealthBench), it holds its own against its proprietary cousin.

The 20B model, although more modest, surpasses o3-mini on these same tests. Not bad for a model you can run on a decent gaming PC!

But the most impressive part is the execution speed. The GPT-OSS-120B spits out its first token in about 8 seconds, then maintains a breakneck pace of 260 tokens per second.

For comparison, o3-mini tops out at 158 tok/s under the same conditions. On top of that, GPT-OSS costs roughly ten times less to run.

Three Thinking Speeds to Choose From

Like its cousins in the “o” series, GPT-OSS offers three reasoning levels: Low, Medium, and High. In Low mode, it prioritizes speed and fluidity for casual conversation.

In High mode, it deploys long internal reasoning chains before responding, perfect for complex analyses.

Just add “Reasoning: high” to your system prompt to trigger genius brain mode. Handy when you switch from a relaxed conversation to a math problem that makes you sweat!
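In practice, with any OpenAI-style chat interface, that just means putting the reasoning level in the system message. A minimal sketch (the exact phrasing is the convention quoted above; runtimes like LM Studio also expose this as a setting):

```python
def build_messages(user_prompt, reasoning="high"):
    """Chat messages with the reasoning level declared in the
    system prompt, following the "Reasoning: high" convention."""
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Prove that the square root of 2 is irrational.")
print(messages[0])   # {'role': 'system', 'content': 'Reasoning: high'}
```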

The Superpowers (and Minor Weaknesses) of GPT-OSS

What Really Shines

GPT-OSS excels at everything that requires pure logic: mathematics, programming, science.

It unfolds its thought process step by step with crystal clarity. Its STEM-focused training gives it impressive rigor on structured problems.

On the tool side, it natively integrates the ability to call external functions.

With LM Studio or Ollama, you can create local assistants that execute code, search the web, or manipulate your files, all without leaving your machine.
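As a rough illustration, here is what a tool definition looks like in the OpenAI function-calling format that Ollama and LM Studio both speak; the `search_files` tool itself is a made-up example:

```python
import json

# Hypothetical tool, declared in the OpenAI function-calling format.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_files",
        "description": "Search local files for a text pattern.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Text to look for"},
                "directory": {"type": "string", "description": "Folder to search"},
            },
            "required": ["pattern"],
        },
    },
}

# You would pass this as `tools=[search_tool]` in a chat request;
# the model then replies with a structured tool call instead of prose.
print(json.dumps(search_tool["function"]["name"]))
```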

The inference speed deserves special mention. On modern hardware, the experience is almost real-time.

Even the 20B model remains fluid on an Apple Silicon Mac or a well-equipped PC with a 16-32 GB GPU.

And then there’s the total freedom offered by the Apache 2.0 license. No restrictions, no royalties, no surveillance.

You download, modify, deploy wherever you want. This technological sovereignty feels good after years of dependence on APIs!

The Minor Drawbacks

GPT-OSS grew up mainly with scientific English. Its French is correct, but it can sometimes lack naturalness on literary or cultural subjects.

Its general knowledge is also frozen in time; without web access, it is unaware of anything that happened after its training cutoff.

In terms of security, total freedom comes at a price. Without the proprietary safeguards of ChatGPT, these models require more vigilance on sensitive queries. (Handle with care in production!)

Finally, creatives will note that GPT-OSS favors logic over inspiration. It excels in reasoning but may lack that spark on purely artistic tasks.

It’s the price of performance-focused training rather than creativity.

Where GPT-OSS Will Change the Game

The practical applications are endless. In development, imagine an ultra-responsive code assistant running locally in your IDE. No more waiting or worrying about quotas; you code, it helps, end of story.

In research and finance, confidentiality finally becomes possible. Analyze your sensitive data without it ever leaving your server.

For regulated sectors like healthcare or legal, it’s revolutionary.

Education will also be transformed. Teachers and students can finally experiment with powerful AI without budget constraints. No more quota limits in the middle of a project!

Quick Installation with LM Studio

Ready to test? LM Studio makes installation surprisingly simple:

Prerequisites

16 GB of RAM minimum for the 20B model (the weights themselves take about 13 GB), 64-80 GB for the 120B. A recent GPU or an Apple Silicon Mac is even better.

Installation

Download LM Studio, launch it, type “gpt-oss” in the search, click download.

Once installed, you can adjust the context window, choose your reasoning level, and start chatting.

LM Studio even exposes a local OpenAI-compatible API, perfect for integrating GPT-OSS into your existing applications.
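For instance, a minimal request to LM Studio's local server (default address http://localhost:1234/v1) might look like this; the model identifier is an assumption, so check the exact name LM Studio shows after the download:

```python
import json
from urllib import request

BASE_URL = "http://localhost:1234/v1"

# Model name as typically listed by LM Studio -- verify yours.
payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "temperature": 0.7,
}

def chat(payload):
    """POST a chat completion to the local server and return the reply text."""
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# print(chat(payload))  # requires LM Studio running with the model loaded
```

Because the endpoint follows the OpenAI API shape, any client library that lets you override the base URL should work unchanged.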

For more info, read our article Install DeepSeek-R1 Locally with LM Studio: Complete Guide

Local AI vs Cloud

Team Local

Absolute confidentiality, controlled costs after the initial investment, total freedom of customization. Your data stays with you, period.

The Challenges

You need powerful hardware (especially for the 120B), maintenance, and a minimum of technical skills. Managing your AI infrastructure is more complex than a simple API call.

But frankly, given the power and freedom on offer, it's well worth the effort. Especially since the ecosystem already provides turnkey hosted options (Azure, Hugging Face, Together AI) to simplify deployment.

High-Performance Open Source AI, Now a Reality

GPT-OSS marks a historic turning point. For the first time in years, OpenAI is breaking down its own barriers and offering the world completely free commercial-level models.


It’s a bold bet that could redefine the entire AI ecosystem in 2025 and beyond.

In a few months, your smartphone could have the equivalent of ChatGPT in offline mode. Businesses could develop custom AI solutions without depending on any cloud service. Emerging countries would finally have access to the same tools as the tech giants.

Of course, GPT-OSS is not perfect. Its general knowledge remains less vast than GPT-4's, and it still struggles on purely creative tasks.

But it meets a fundamental need: that of powerful and secure AI, transparent and truly accessible.

OpenAI now combines open and proprietary models in a win-win strategy. While its APIs keep the edge on multimodality and the latest innovations, GPT-OSS democratizes core AI capabilities for everyone.

The result? Innovation will explode at the application layer. When everyone has access to the same AI foundations, it’s the creativity of the developers that makes the difference.

And that’s exactly what AI needed to shift into high gear!

One thing is certain: with GPT-OSS, artificial intelligence has just taken a decisive step towards greater openness and accessibility.