The Fastest LLMs Ever Built

Diffusion LLMs: A Breakthrough for Speed and Quality

Powering Cutting-Edge AI Applications

The Mercury Diffusion Models

Blazing fast inference with frontier quality at a fraction of the cost.

The Diffusion Difference: From Sequential to Parallel Text Generation

All other LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
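
To make the contrast concrete, here is a minimal, purely illustrative Python sketch (not Mercury's actual algorithm): autoregressive decoding spends one model call per generated token, while a diffusion-style decoder starts from a fully masked sequence and refines every position in parallel over a small, fixed number of steps.

    # Illustrative toy only: random choices stand in for real model forward passes.
    import random

    VOCAB = ["fast", "parallel", "tokens", "quality", "text"]
    MASK = "<mask>"

    def autoregressive_decode(length):
        """One 'model call' per token: latency grows linearly with output length."""
        tokens = []
        for _ in range(length):
            tokens.append(random.choice(VOCAB))  # stand-in for a sequential forward pass
        return tokens

    def diffusion_decode(length, steps=4):
        """A fixed number of denoising steps, each refining all positions in parallel."""
        tokens = [MASK] * length
        for step in range(steps):
            # stand-in for one parallel denoising pass over the whole sequence
            for i, tok in enumerate(tokens):
                if tok == MASK and random.random() < (step + 1) / steps:
                    tokens[i] = random.choice(VOCAB)
        # any position still masked after the last step gets a final fill
        return [t if t != MASK else random.choice(VOCAB) for t in tokens]

    print(autoregressive_decode(8))  # eight sequential calls
    print(diffusion_decode(8))       # four parallel refinement steps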

AI Applications Made Possible with Mercury

Coding

Stay in the zone with lightning-fast autocomplete, tab suggestions, editing, and more.

Voice

Deliver responsive voice experiences in customer service, translation, and sales.

Search

Instantly surface relevant data from any knowledge base, minimizing research time.

Agents

Run complex multi-turn systems while maintaining low latency.

Our Models

Mercury Coder

dLLM optimized to accelerate coding workflows

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens

Mercury

General-purpose dLLM that provides ultra-low latency 

Streaming, tool use, and structured output

128K context window

Input $0.25 | Output $1 per 1M tokens
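
As a quick, unofficial illustration of what those rates mean per request (the token counts below are hypothetical; the rates are the ones listed above):

    # Hypothetical cost arithmetic at the listed rates; not an official billing calculator.
    INPUT_PRICE_PER_M = 0.25   # dollars per 1M input tokens
    OUTPUT_PRICE_PER_M = 1.00  # dollars per 1M output tokens

    def request_cost(input_tokens, output_tokens):
        """Dollar cost of a single request at the listed per-million-token rates."""
        return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
               (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

    # Example: a 2,000-token prompt with a 500-token completion
    print(f"${request_cost(2_000, 500):.6f}")  # -> $0.001000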

An Enterprise AI Partner

We’re available through major cloud providers like AWS Bedrock. Talk with us about fine-tuning, private deployments, and forward-deployed engineering support.

Integrate in Seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
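
As a minimal sketch of what that drop-in swap looks like with the official OpenAI Python client (the endpoint URL and model ID below are placeholders, not values confirmed on this page; use the ones from your Mercury account):

    # Minimal sketch of an OpenAI-compatible call; base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-mercury-endpoint.example.com/v1",  # placeholder endpoint
        api_key="YOUR_MERCURY_API_KEY",
    )

    response = client.chat.completions.create(
        model="mercury-coder",  # assumed ID for the coding-optimized dLLM
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    )
    print(response.choices[0].message.content)

Because the interface is the standard chat-completions API, existing streaming, tool-use, and structured-output code paths should carry over unchanged.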

Providers

What Customers are Saying

"I was amazed by how fast it was. The multi-thousand tokens per second was absolutely wild, nothing like I've ever seen."

Jacob Kim

Software Engineer

"After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents."

Oliver Silverstein

CEO

"We cut routing and classification overheads to sub-second latencies even on complex agent traces."

Damian Tran

CEO