On April 8, 2026, Meta unveiled Muse Spark, the first model from its Meta Superintelligence Labs. After nine months of complete reconstruction and a shift towards proprietary technology, this native multimodal model marks a clear break from the Llama lineage. But what truly stands out is its novel approach to health reasoning, built with the expertise of over 1,000 physician curators. It’s a technical and editorial gamble that few labs have dared to take.
Key takeaways:
- Muse Spark is Meta’s first proprietary model, launched on April 8, 2026, after nine months of development from scratch.
- Its Contemplating mode orchestrates several sub-agents in parallel, not just a simple sequential thought chain.
- Over 1,000 physicians curated the health training data, a unique approach not directly matched by OpenAI or Google.
- With 58 million output tokens compared to 120 million for GPT-5.4, it achieves more with less computation.
- It will be deployed as the single brain of Meta AI across Facebook, Instagram, WhatsApp, and Ray-Ban Meta glasses.
What Muse Spark truly changes in Meta’s AI
Before Muse Spark, Meta was all in on Llama, its open-source model. The result: 18th place in the Artificial Analysis ranking, far behind OpenAI, Google, and Anthropic. Mark Zuckerberg decided to start from scratch. Alexandr Wang, reappointed head of Meta Superintelligence Labs, led a complete overhaul in nine months: new architecture, new infrastructure, new datasets.
The result is Muse Spark. Proprietary, initially closed, and natively multimodal: it accepts text, images, and voice from the start. Outputs remain textual at launch, but audio and image extensions are already on the roadmap. This model is not just a chatbot — it is designed as an agent engine for the entire Meta ecosystem.
In practice, Muse Spark now powers Meta AI on the app and site meta.ai in the US, with a gradual rollout planned for Facebook, Instagram, Messenger, WhatsApp, and Ray-Ban Meta glasses. For French users, the launch is expected in the weeks following the US release.
Tip: If you’re a developer, keep an eye on the Muse Spark API preview. Meta has confirmed a future partial opening of the model, which could enable business integrations in the coming months.
The Contemplating architecture: why multi-agents change everything
The major technical innovation of Muse Spark lies in its Contemplating mode. While other models produce a single, sequential thought chain, Muse Spark orchestrates multiple sub-agents in parallel, then merges their conclusions. Each sub-agent independently tackles an aspect of the problem, reducing isolated reasoning errors.
A concrete example: to plan a trip with young children, one sub-agent calculates routes, another analyzes suitable activities, and a third checks budget constraints. The responses are cross-verified before being delivered. This cross-verification mechanism did not exist in Llama.
This architecture natively supports three key capabilities:
- Visual Chain-of-Thought: logical reasoning from an image, not just a description.
- Tool invocation: access to the web, code execution, calculators, directly from the model.
- Multi-agent orchestration: multiple sub-agents working simultaneously on a single complex query.
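To make the idea concrete, here is a minimal sketch of parallel sub-agent orchestration in the spirit of what the article describes as Contemplating mode. The sub-agent functions and the merge step are illustrative stand-ins invented for this example; they do not reflect Meta's actual implementation or API.

```python
# Hypothetical sketch: run several sub-agents in parallel on one query,
# then merge their partial conclusions. Agent functions are placeholders.
from concurrent.futures import ThreadPoolExecutor

def route_agent(query: str) -> str:
    # A real sub-agent would call a model with a routing-focused prompt.
    return f"routes for: {query}"

def activity_agent(query: str) -> str:
    return f"child-friendly activities for: {query}"

def budget_agent(query: str) -> str:
    return f"budget check for: {query}"

def contemplate(query: str) -> str:
    """Run sub-agents concurrently, then merge their answers."""
    agents = [route_agent, activity_agent, budget_agent]
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        partials = list(pool.map(lambda fn: fn(query), agents))
    # Merge step: a real system would cross-verify conclusions here,
    # not simply concatenate them.
    return " | ".join(partials)

print(contemplate("weekend trip with young children"))
```

The key contrast with a sequential chain of thought is that each sub-agent works on its slice of the problem independently, so one agent's reasoning error does not propagate into the others before the merge.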
On the Humanity’s Last Exam benchmark, Muse Spark scores 42.8% correct answers. It’s just behind Gemini 3.1 Pro (45.4%), but ahead of GPT-5.4 in the specific medical field (HealthBench Hard: 42.8 vs 40.1 for GPT-5.4). The Thinking version excels in scientific tests, surpassing premium versions of Anthropic and OpenAI on these targeted benchmarks.
The other numerical advantage: 58 million output tokens compared to 120 million for GPT-5.4 and 157 million for Claude Opus 4.6. This frugality is no accident — it results from constraint-based training that penalizes overly long reasoning. The model learns to be precise, not verbose.
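One way to picture constraint-based training that penalizes overly long reasoning is a reward that subtracts a cost for tokens spent beyond a budget. The penalty shape, budget, and coefficient below are assumptions chosen for illustration; Meta has not published its actual training objective.

```python
# Illustrative reward shaping: correctness earns the reward, and every
# token beyond a budget erodes it. Budget and penalty values are invented.
def length_penalized_reward(correct: bool, n_tokens: int,
                            budget: int = 500, penalty: float = 0.001) -> float:
    """Reward correctness, minus a linear cost for tokens over the budget."""
    base = 1.0 if correct else 0.0
    overage = max(0, n_tokens - budget)
    return base - penalty * overage

# A correct, concise answer keeps the full reward; a correct but verbose
# one loses part of it, so the model learns to be precise, not verbose.
print(length_penalized_reward(True, 300))   # within budget
print(length_penalized_reward(True, 1500))  # 1,000 tokens over budget
```

Under a scheme like this, two equally correct answers are no longer equally rewarded, which is consistent with the token frugality the article reports.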

1,000 physician curators: how Meta secured health reasoning
This is one of the most distinctive aspects of Muse Spark. To train the model on health issues, Meta did not use unsupervised web scraping. The company collaborated with over 1,000 physicians to validate and certify specific medical data. These data cover symptom analysis, nutritional recommendations, and meal evaluation via photo.
An operational example: photograph your plate, Muse Spark identifies the foods, estimates caloric and nutritional intake, and contextualizes this data according to your profile. This type of analysis goes far beyond the capabilities of models that simply describe an image without extracting structured reasoning.
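The difference between describing an image and extracting structured reasoning from it can be sketched as a typed output schema. The schema and values below are illustrative assumptions; they do not reflect Muse Spark's actual output format.

```python
# Hypothetical structured output for a meal-photo analysis, as opposed to
# a plain caption. Items and calorie figures are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class FoodItem:
    name: str
    grams: float
    kcal: float

@dataclass
class MealAnalysis:
    items: list = field(default_factory=list)

    def total_kcal(self) -> float:
        return sum(item.kcal for item in self.items)

meal = MealAnalysis(items=[
    FoodItem("grilled chicken", 150, 247),
    FoodItem("brown rice", 180, 199),
    FoodItem("broccoli", 90, 31),
])
print(round(meal.total_kcal()))  # total caloric estimate: 477
```

A structured result like this is what allows the contextualization step the article mentions, since per-item quantities can be compared against a user profile instead of a free-text description.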
Meta is aware of regulatory limits. The model is designed for factual advice on personal health, not for medical diagnosis. The certified data serve to anchor responses in verified information, reducing the risk of hallucinations on sensitive topics. It’s an approach that anticipates the requirements of the European AI Act on high-risk systems.
This medical curation approach is also a differentiating argument against GPT-5.4 and Gemini 3.1. On the HealthBench Hard benchmark, Muse Spark scores 42.8 compared to only 20.6 for Gemini 3.1 Pro. The gap is significant and shows the direct impact of specialized human curation on the quality of medical reasoning.
| Model | HealthBench Hard | Humanity’s Last Exam | Output Tokens |
|---|---|---|---|
| Muse Spark | 42.8 | 42.8% | 58M |
| GPT-5.4 | 40.1 | Slightly higher | 120M |
| Gemini 3.1 Pro | 20.6 | 45.4% | Not disclosed |
| Claude Opus 4.6 | Not disclosed | Lower (Thinking) | 157M |
Visual multimodality: see, classify, reason
Muse Spark’s visual perception goes beyond mere object recognition. The model identifies entities in an image, associates external knowledge, and reasons logically about what it sees. This is what Meta calls the Visual Chain-of-Thought.
A practical use case: take a photo of an unknown appliance. Muse Spark analyzes its components, identifies the likely model, and generates a structured user manual. No physical manual needed. This capability is directly usable via Ray-Ban Meta glasses, where the integrated camera transmits a real-time feed to the model for immediate contextual analysis.
This hardware integration is strategic. The glasses become a terminal for augmented reasoning: look at a supermarket shelf, and Muse Spark analyzes nutritional labels and advises you in real time. Meet someone on the street, and the model can contextualize the visual environment to enrich the conversation.
Warning: Muse Spark currently only produces textual outputs, despite multimodal inputs. Image or audio generation is planned for a future version. Do not confuse input capabilities (text, image, voice) with output capabilities, which are still limited to text at launch.

Market positioning and known limitations
Muse Spark holds 4th place in the Artificial Analysis Intelligence Index, with a score of 52. It surpasses Claude Sonnet 4.6 but remains behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. The positioning is accurate: performant, not dominant.
Feedback from developers is mixed on one specific point: access to tools remains more limited than with OpenAI or Anthropic. Tool use is supported by the architecture, but third-party integrations (Python libraries, Zapier, etc.) are not yet publicly documented. An API preview exists, but pricing has not been confirmed at this stage.
Another structural limitation: Muse Spark is natively optimized for the Meta ecosystem. Its contextualization relies on the social graph, user interests, Instagram, and Facebook content. Outside this ecosystem, the added value diminishes. It’s a strength for Meta’s 3 billion users, a constraint for independent professional uses.
Meta plans between $115 billion and $135 billion in AI infrastructure spending for 2026, up from $72.22 billion in 2025. Muse Spark is the first milestone of a lineage, not a finished product. Larger models are already in development at Meta Superintelligence Labs. To keep up with the latest advances in multimodal AI agents, see the article on Genspark AI and intelligent super-agents.
Conclusion
Muse Spark represents a clear turning point in Meta’s AI strategy. Abandoning Llama in favor of a proprietary model built from scratch, medical curation by 1,000 physicians, and a parallel multi-agent architecture form a combination that neither OpenAI, Google, nor Anthropic has replicated to date in the same configuration. The HealthBench Hard score of 42.8, compared to 20.6 for Gemini, speaks for itself on medical reasoning.
There are limitations: outputs are still text-only, initial availability is restricted to the US, and the model remains dependent on the Meta ecosystem. But with 3 billion potential users on Facebook, Instagram, and WhatsApp, and an imminent rollout on Ray-Ban glasses, Muse Spark has the potential to become the most used AI model in daily life: not necessarily the most powerful on all benchmarks, but the most integrated into real-world uses. The real competition starts now.
FAQ
What exactly is Muse Spark?
Muse Spark is a large multimodal language model developed by Meta and officially launched on April 8, 2026. It accepts text, image, and voice inputs, currently produces textual outputs, and powers Meta AI across the group’s platforms. It’s Meta’s first proprietary model, developed by the Meta Superintelligence Labs under Alexandr Wang’s leadership, following the abandonment of the open-source Llama strategy.
How does Muse Spark’s Contemplating mode work?
The Contemplating mode orchestrates multiple sub-agents in parallel rather than a single sequential thought chain. Each sub-agent addresses a specific angle of a complex query, then their conclusions are merged to produce a verified and more reliable response. This mechanism is particularly effective for questions in science, mathematics, or health, where single reasoning errors are common.
Why did 1,000 physicians participate in the training?
Meta collaborated with over 1,000 physicians to certify the medical data used during Muse Spark’s training. The goal is to ensure factual responses on sensitive topics like symptom analysis or nutritional recommendations, limiting the risk of hallucinations. This approach is directly reflected in the benchmarks: Muse Spark scores 42.8 on HealthBench Hard, compared to 20.6 for Gemini 3.1 Pro.
Is Muse Spark available in France?
At the official launch on April 8, 2026, Muse Spark is deployed exclusively in the United States via the Meta AI app and site. Expansion to other countries, including France, is planned in the following weeks, with a gradual integration on Instagram, Facebook, Messenger, WhatsApp, and Ray-Ban Meta glasses. No specific date has been announced for the French market.
What are the main limitations of Muse Spark today?
Muse Spark has three main limitations at launch: its outputs are only textual despite multimodal inputs, its availability is restricted to the United States, and its access to third-party tools remains more limited than its direct competitors. Its optimal effectiveness heavily depends on the Meta ecosystem, which can reduce its added value for professional uses independent of this environment.