The Fastest LLMs Ever Built

Diffusion LLMs: A Breakthrough for Speed and Quality

Powering Cutting-Edge AI Applications

The Mercury Diffusion Models

Blazing fast inference with frontier quality at a fraction of the cost.

The Diffusion Difference: From Sequential to Parallel Text Generation

All other LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
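
To make the contrast concrete, here is a minimal, purely illustrative Python sketch (not Mercury's actual algorithm): autoregressive decoding spends one model call per generated token, while a diffusion-style decoder starts from a fully masked sequence and refines every position in parallel over a small, fixed number of steps.

    # Illustrative toy only: random choices stand in for real model forward passes.
    import random

    VOCAB = ["fast", "parallel", "tokens", "quality", "text"]
    MASK = "<mask>"

    def autoregressive_decode(length):
        """One 'model call' per token: latency grows linearly with output length."""
        tokens = []
        for _ in range(length):
            tokens.append(random.choice(VOCAB))  # stand-in for a sequential forward pass
        return tokens

    def diffusion_decode(length, steps=4):
        """A fixed number of denoising steps, each refining all positions in parallel."""
        tokens = [MASK] * length
        for step in range(steps):
            # stand-in for one parallel denoising pass over the whole sequence
            for i, tok in enumerate(tokens):
                if tok == MASK and random.random() < (step + 1) / steps:
                    tokens[i] = random.choice(VOCAB)
        # any position still masked after the last step gets a final fill
        return [t if t != MASK else random.choice(VOCAB) for t in tokens]

    print(autoregressive_decode(8))  # eight sequential calls
    print(diffusion_decode(8))       # four parallel refinement steps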

AI Applications Made Possible with Mercury

Coding

Stay in the zone with lightning-fast autocomplete, tab suggestions, editing, and more.

Voice

Deliver responsive voice experiences in customer service, translation, and sales.

Search

Instantly surface relevant data from any knowledge base, minimizing research time.

Agents

Run complex multi-turn systems while maintaining low latency.

Our Models

Mercury Coder

dLLM optimized to accelerate coding workflows

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens

Mercury

General-purpose dLLM that provides ultra-low latency 

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens
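
As a quick, unofficial illustration of what those rates mean per request (the token counts below are hypothetical; the rates are the ones listed above):

    # Hypothetical cost arithmetic at the listed rates; not an official billing calculator.
    INPUT_PRICE_PER_M = 0.25   # dollars per 1M input tokens
    OUTPUT_PRICE_PER_M = 1.00  # dollars per 1M output tokens

    def request_cost(input_tokens, output_tokens):
        """Dollar cost of a single request at the listed per-million-token rates."""
        return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
               (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

    # Example: a 2,000-token prompt with a 500-token completion
    print(f"${request_cost(2_000, 500):.6f}")  # -> $0.001000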

An Enterprise AI Partner

We’re available through major cloud providers like AWS Bedrock. Talk with us about fine-tuning, private deployments, and forward-deployed engineering support.

Integrate in Seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
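
As a minimal sketch of what that drop-in swap looks like with the official OpenAI Python client (the endpoint URL and model ID below are placeholders, not values confirmed on this page; use the ones from your Mercury account):

    # Minimal sketch of an OpenAI-compatible call; base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-mercury-endpoint.example.com/v1",  # placeholder endpoint
        api_key="YOUR_MERCURY_API_KEY",
    )

    response = client.chat.completions.create(
        model="mercury-coder",  # assumed ID for the coding-optimized dLLM
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    )
    print(response.choices[0].message.content)

Because the interface is the standard chat-completions API, existing streaming, tool-use, and structured-output code paths should carry over unchanged.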

Providers

What Customers are Saying

"I was amazed by how fast it was. The multi-thousand tokens per second was absolutely wild, nothing like I've ever seen."

Jacob Kim

Software Engineer

"After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents."

Oliver Silverstein

CEO

"We cut routing and classification overheads to sub-second latencies even on complex agent traces."

Damian Tran

CEO