On April 2, 2026, Google released Gemma 4 with a change that made more waves than the benchmarks themselves: the move to Apache 2.0 licensing.
What reads like a legal technicality is in fact a major strategic decision.
Since 2024, Gemma models had been distributed under a proprietary permissive license, with restrictions on commercial redistribution that kept Google in control.
Apache 2.0 means full freedom of use: download, modify, redistribute, commercialize.
No royalties.
No permission needed.
Key takeaways:
- Apache 2.0 turns Gemma 4 into true open source: free commercial use, risk-free forking, GDPR-compliant on-premise deployment
- The 26B MoE model activates only 3.8B parameters during inference: performance close to the 31B dense model with a fraction of the GPU resources
- AIME 89.2% and Codeforces ELO 2150 for the 31B: a generational leap from Gemma 3, not just an incremental improvement
- For French SMEs: a 26B quantized runs on RTX 4090 for under 2,000 euros, without cloud subscription, without sending data to the US
- Google’s strategy is clear: Gemini for paid cloud, Gemma for edge and open source — both strengthen each other
- Mistral remains relevant for those focusing on European sovereignty, but Gemma 4 redefines the performance/cost ratio in the open-source segment
Gemma 4 in 30 seconds
Four models, a native multimodal architecture, an Apache 2.0 license with no commercial restrictions.
Gemma 4 marks the first time Google distributes a model of this scale under Apache 2.0, without hidden restrictive clauses in the terms and conditions.
The adoption figures for Gemma 3 speak for themselves: 400 million downloads and 100,000 variants created by the community on Hugging Face.
These numbers reveal something important: a quality open-source model generates an independent community that Google no longer controls, yet directly benefits from in terms of reputation and adoption.
With Gemma 4, Google bets that total openness accelerates adoption better than any license restriction.
Four models from smartphone to data center
The Gemma 4 range covers a wider hardware spectrum than any previous open-source family.
| Model | Total / Active Parameters | Architecture | Context Window | Target Hardware |
|---|---|---|---|---|
| E2B | ~2B / ~2B | Dense | 128K tokens | Smartphones, Raspberry Pi |
| E4B | ~4B / ~4B | Dense | 128K tokens | High-end smartphones, Jetson Nano |
| 26B MoE (A4B) | 26B / 3.8B | MoE 128 experts | 256K tokens | Consumer GPUs (quantized) |
| 31B dense | 31B / 31B | Dense | 256K tokens | High-end GPU/CPU |
E2B and E4B explicitly target mobile devices and edge computing: offline translation, on-device code generation, health applications where data must not leave the device.
All models are natively multimodal: text, image, audio, and video on smaller models, text and image on larger ones.
Support for 140 languages is built-in, positioning Gemma 4 as a serious contender for international applications without intermediate translation.
Moving from a text model to one that understands image, audio, and video on a smartphone is the difference between an assistant that reads and one that perceives.
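The hardware targets in the table above can be collapsed into a simple selection rule. A minimal sketch, where the helper name and thresholds are ours (derived from the VRAM figures quoted in this article, not from any official Google tool):

```python
# Hypothetical helper: map available VRAM to a Gemma 4 variant.
# Thresholds follow the hardware targets in the table above;
# the function and its API are illustrative, not an official tool.

def pick_gemma4_model(vram_gb: float, on_device: bool = False) -> str:
    if on_device:
        return "E4B" if vram_gb >= 4 else "E2B"   # smartphones, edge boards
    if vram_gb < 16:
        return "E4B"                              # laptops, small GPUs
    if vram_gb < 24:
        return "26B MoE (INT4 quantized)"         # ~16 GB VRAM once quantized
    return "31B dense"                            # high-end GPU/CPU

print(pick_gemma4_model(16))  # → 26B MoE (INT4 quantized)
print(pick_gemma4_model(24))  # → 31B dense
```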
The real game-changer: Apache 2.0
AI licenses are rarely exciting to analyze unless they fundamentally change something.
This is one of those cases.
What the Apache 2.0 license changes
The old Gemma license allowed personal and research use but restricted commercial redistribution: some usage categories were explicitly approved, others not.
The result: many companies avoided Gemma out of legal caution, even for uses that would likely have been accepted.
Apache 2.0 removes this ambiguity: free commercial use, modification, and redistribution without restriction, mandatory inclusion of the license and attribution, automatic patent protection for contributors.
For an IT services firm or a corporate lawyer, it’s the difference between “probably OK” and “certifiably OK”.
The most accurate analogy: moving from leasing to full vehicle ownership.
You can modify it, resell it, adapt it into a commercial service, without asking the manufacturer’s permission.
The end of Google’s faux open source
Google has a long history of “open” models with restrictions that made them less free than their competitors.
The Gemma 1 and 2 terms were usable but compared unfavorably even with Meta’s Llama license, itself criticized yet more commercially permissive.
With Apache 2.0, Google places itself on the same playing field as Mistral and Qwen: models that companies can truly adopt without residual legal risk.
It’s also a signal to the Hugging Face community: Gemma 4 is designed to be forked, quantized, fine-tuned, integrated into commercial products without friction.
An Apache 2.0 license on a model of this size is Google telling the entire open-source community: “Do what you want with it, we win every time you use it.”
Performance and benchmarks under scrutiny
The numbers published by Google are impressive.
The pertinent question: understanding what they mean in practice.
Leaps worth noting
On AIME 2026 (a high-level mathematics competition): Gemma 3 27B scored 20.8%, Gemma 4 31B reaches 89.2%.
This isn’t an incremental improvement: it’s a category shift.
On Codeforces (a programming competition), the ELO score jumps from 110 to 2,150 for the 31B, a level comparable to seasoned professional developers.
| Benchmark | G4 31B | G4 26B MoE | G4 E4B | G3 27B |
|---|---|---|---|---|
| MMLU-Pro | 85.2% | 82.6% | 69.4% | 67.6% |
| AIME 2026 | 89.2% | 88.3% | 42.5% | 20.8% |
| GPQA Diamond | 84.3% | 82.3% | 58.6% | 42.4% |
| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 29.1% |
| Codeforces ELO | 2,150 | 1,718 | 940 | 110 |
Limitations to keep in mind: these benchmarks are self-reported by Google.
The Hacker News community immediately launched its own evaluations, and preliminary results confirm the hierarchy, though they don’t always match the exact figures on real-world tasks.
The 26B MoE at 88.3% on AIME with only 3.8B active parameters is the most striking result: it outperforms much heavier dense models in inference.
Comparison table with competitors
| Model | License | Native Multimodal | On-device | Max Context | MoE | MMLU-Pro (approx.) |
|---|---|---|---|---|---|---|
| Gemma 4 31B | Apache 2.0 | Yes | Yes (E2B/E4B) | 256K | Yes (26B A4B) | 85.2% |
| Llama 4 | Llama License | Yes | Partial | 128K | Yes | ~84% |
| Qwen 3.5 | Apache 2.0 | Partial | No | 128K | Yes | ~86.7% |
| Mistral Small 4 | Apache 2.0 | No | No | 32K | Yes (Mixtral) | Lower |
Gemma 4 dominates on three simultaneous dimensions: free license, native multimodality, edge deployment.
Qwen 3.5 slightly surpasses it on pure benchmarks, but lacks on-device capability and has partial multimodal support.
The intelligence-to-parameter ratio: why MoE changes everything
The Mixture of Experts has been around for several years, and Gemma 4 makes the most effective use of it at this performance level.
The principle: the 26B total model contains 128 specialized experts, but activates only 8 on average per token processed, resulting in about 3.8B active parameters per inference.
The clearest analogy: imagine a consultancy with 128 experts, of which only 8 are called upon for each question, depending on their specialty.
The result: performance close to the 31B dense model, with a memory footprint and inference speed akin to a 3.8B model.
On Arena AI, the 26B MoE ranks among the top 6 open-source models, outperforming dense models two to three times heavier to run.
A well-designed MoE not only reduces costs: it changes the category of hardware needed to achieve frontier performance.
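The routing step just described can be sketched in a few lines of pure Python. The gate below is random noise for illustration (in the real model the router is a learned linear layer), but the mechanics are the ones the article names: score all 128 experts per token, keep the top 8, renormalize their weights:

```python
# Toy sketch of top-k MoE routing: 128 experts, 8 activated per token.
# The gate weights are random here, purely for illustration; a real
# router is trained jointly with the model.
import math
import random

N_EXPERTS, TOP_K = 128, 8

def route(token_hidden, gate_weights):
    """Score every expert, keep the top-k, renormalize their weights."""
    logits = [sum(w * h for w, h in zip(row, token_hidden)) for row in gate_weights]
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    m = max(logits[i] for i in top)
    exp = [math.exp(logits[i] - m) for i in top]  # numerically stable softmax
    z = sum(exp)
    return list(zip(top, [e / z for e in exp]))   # (expert_id, weight) pairs

random.seed(0)
hidden = [random.gauss(0, 1) for _ in range(16)]
gate = [[random.gauss(0, 1) for _ in range(16)] for _ in range(N_EXPERTS)]
experts = route(hidden, gate)
print(len(experts))  # → 8
```

Only the 8 selected experts run their feed-forward pass for that token, which is exactly why the per-token compute cost matches a ~3.8B model even though all 26B parameters sit in memory.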
Google’s strategy: Gemini sells the cloud, Gemma sells adoption
The question everyone asks: why does Google give away a model that competes with its own Gemini APIs?
The answer lies in a platform logic.
Gemini remains proprietary, accessible via subscription or paid API, optimized for heavy tasks requiring Google’s data center power.
Gemma covers the edge, local, on-premise: use cases where sending data to the cloud is impossible (GDPR compliance, latency, cost, offline).
Both ranges share the same fundamental research: Gemini Nano 4 on Android and Pixel chips uses the same base as Gemma E2B/E4B.
Every developer adopting Gemma learns the Google patterns, gets used to the platform’s APIs and tools, and becomes a natural candidate for Gemini cloud when their needs exceed what local can handle.
It’s the Android Trojan horse strategy: dominate the edge to be present everywhere, even where the cloud can’t go.
Agent capabilities: native function calling and JSON
Gemma 4 natively integrates function calling, structured JSON generation, and the management of complex system prompts.
These three capabilities form the building blocks of autonomous AI agents: calling external functions, producing outputs directly usable by code, maintaining a long context.
A model with a 256K token window can orchestrate multi-step workflows without additional infrastructure.
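Concretely, the host side of a function call is a short loop: parse the JSON the model emits, look up the matching local function, execute it, and feed the result back. A minimal sketch, assuming the common OpenAI-style tool-call shape (check Gemma 4’s actual chat template for the exact format):

```python
# Minimal function-calling dispatch loop. The JSON shape of
# `model_output` is an assumption (OpenAI-style convention);
# the tool itself is a made-up local function.
import json

def get_weather(city: str) -> str:        # a local tool the agent may call
    return f"12°C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

# What a model's structured tool call might look like (illustrative):
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # → 12°C and cloudy in Paris
```

In a real agent loop, `result` would be appended to the conversation and sent back to the model, which either answers or emits the next tool call.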
Integrations like the Java ADK (Agent Development Kit) show that Google is preparing Gemma 4 to be the engine of local agents on Android: assistants that act on your phone without ever sending requests remotely.
For French-speaking developers, the 31B is clearly the agent choice if the hardware allows.
The 26B MoE is the best performance/accessibility compromise for the vast majority of cases.
And for the French? Sovereignty and Mistral
The issue of digital sovereignty has become central for French and European companies since the GDPR.
Gemma 4 under Apache 2.0 addresses this constraint in a way that cloud APIs cannot: data never leaves the company’s infrastructure.
A 26B MoE model quantized in INT4 runs on an RTX 4090, a card available in work configurations for under 2,000 euros.
For a startup or SME handling medical, legal, or financial data, it’s the difference between an impossible AI deployment (sensitive data, GDPR) and an immediately feasible one.
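The RTX 4090 claim checks out with back-of-the-envelope arithmetic: weight memory is simply parameter count times bytes per parameter. A quick sketch (weights only; KV cache and runtime overhead come on top, so treat these as floor estimates):

```python
# Weight-memory floor for a 26B-parameter model at common precisions.
# Ignores KV cache and activation overhead.
PARAMS = 26e9
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for fmt, b in BYTES_PER_PARAM.items():
    print(f"{fmt}: {PARAMS * b / 1e9:.0f} GB")
# FP16: 52 GB  |  INT8: 26 GB  |  INT4: 13 GB
# INT4 leaves ~11 GB of a 24 GB RTX 4090 free for the KV cache.
```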
Mistral remains relevant for those who want to support a European champion, and its French roots can weigh in public or semi-public purchasing decisions.
Technically, Mistral Small 4 doesn’t compete with Gemma 4 26B MoE on general benchmarks, multimodal, or context window.
Both can coexist in a technology portfolio: Gemma 4 for reasoning-intensive and multimodal tasks, Mistral for cases where server geographic locality and supporting the European sector are paramount.
How to test Gemma 4 now
The fastest way to test locally is through Ollama, available on Mac, Linux, and Windows.
Three commands are enough:
- Installation: `brew install --cask ollama` (or download from ollama.com)
- Download: `ollama pull gemma4:26b` (choose e2b, e4b, or 31b depending on hardware)
- First test: `ollama run gemma4:26b "Explain MoE in one sentence"`
The E4B is suitable for any recent Mac with Apple Silicon, the 26B MoE requires at least 16 GB of VRAM when quantized, the 31B dense needs 24 GB or more.
Direct integrations exist in LM Studio, Jan, and usual Python frameworks (LangChain, LlamaIndex) via Ollama’s OpenAI-compatible endpoints.
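As a sketch of what such an integration looks like: Ollama exposes its OpenAI-compatible endpoint at `http://localhost:11434`. The snippet below only builds the request body with the standard library; the commented-out lines would send it to a running Ollama instance (the `gemma4:26b` tag is taken from the commands above).

```python
# Build an OpenAI-compatible chat request for Ollama. Only the body
# is constructed here; uncomment the last lines to actually send it
# against a local Ollama server.
import json
# import urllib.request

body = json.dumps({
    "model": "gemma4:26b",
    "messages": [{"role": "user", "content": "Explain MoE in one sentence"}],
}).encode()

print(json.loads(body)["model"])  # → gemma4:26b

# req = urllib.request.Request(
#     "http://localhost:11434/v1/chat/completions",
#     data=body, headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

The same endpoint is what LangChain and LlamaIndex talk to when configured with an OpenAI-compatible base URL, which is why no Gemma-specific adapter is needed.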
The Anthem blog will follow Gemma 4’s adoption in the French-speaking community: feedback, real use cases, field comparisons.
If you test Gemma 4 locally, share your feedback in the comments: which model, which hardware, which use case.
Our verdict
Gemma 4 isn’t the best open-source model on every individual benchmark.
It’s the most comprehensive open-source model of 2026: the only range to simultaneously cover the smartphone, consumer GPU, and server, with an unambiguous license, native multimodality, and integrated agent capabilities.
The leap from Gemma 3 to Gemma 4 is the most significant Google has ever made in a generation of open models.
For French developers and companies, the equation is simple: Apache 2.0 + on-premise + 256K context + native multimodal solves compliance issues that blocked entire projects.
The central question is no longer “can we use Gemma 4?”: it’s “which model from the range fits which project?”
FAQ
Is Gemma 4 truly open source with Apache 2.0?
Yes: Apache 2.0 allows commercial use, modification, and redistribution without restriction, with only the obligation to include the license and attribution.
It’s a recognized standard understood by corporate legal teams.
What’s the difference between Gemma 4 and Gemini?
Gemini is Google’s proprietary model, accessible via paid API and cloud.
Gemma 4 is open source, designed to run locally or on-premise.
Both share the same fundamental research but target distinct use cases.
Which Gemma 4 model should I choose based on my hardware?
E2B/E4B: smartphones and edge devices. 26B MoE: consumer GPU with a minimum of 16 GB VRAM (RTX 3090/4090). 31B dense: servers or configurations with 24+ GB VRAM for the most demanding tasks.
Is Gemma 4 better than Llama 4?
Both are competitive on general benchmarks.
Gemma 4 wins on on-device deployment (E2B/E4B), the 256K vs 128K context window, native multimodality, and the clarity of the Apache 2.0 license.
Llama 4 retains advantages on several reasoning benchmarks and benefits from a more mature community.
Does the 26B MoE consume as little as 3.8B parameters in inference?
In inference, yes: only 3.8B parameters (8 experts out of 128) are activated per token.
The model does load 26B in memory, but the computational cost per token corresponds to a 3.8B model.
This explains its speed and energy efficiency.
Can I use Gemma 4 for a commercial application without paying Google?
Yes.
Apache 2.0 imposes no royalties or commercial restrictions.
You can integrate Gemma 4 into a commercial product, modify it, fine-tune it, and redistribute it without any prior agreement with Google.
Does Gemma 4 comply with GDPR better than cloud APIs?
A local or on-premise deployment of Gemma 4 means your data never leaves your infrastructure.
It’s inherently easier to justify in a GDPR impact assessment than a transfer to servers in the US, regardless of the cloud provider.
Does Gemma 4 pose a serious threat to Mistral?
On technical benchmarks and versatility, Gemma 4 26B MoE surpasses Mistral Small 4 on most dimensions.
Mistral retains strategic advantages: European roots, local teams, relevance for French and European public markets.
Both have their place depending on the context.
Does Gemma 4 natively support audio and video?
The E2B and E4B models natively support text, image, audio, and video.
The 26B MoE and 31B dense focus on text and image.
This is Google’s first range to integrate audio and video on models designed to run on smartphones.
How does Gemma 4 handle 140 languages?
Multilingual support is integrated into training, not added post-processing.
This means better consistency and nuance in languages other than English, including French.
Initial evaluations confirm significantly better French output than Gemma 3.