Diffusion LLMs: A Breakthrough for Speed and Quality
Powering Cutting-Edge AI Applications
Blazing fast inference with frontier quality at a fraction of the cost.

Streaming, tool use, and structured output
128K context window
Pricing: $0.25 per 1M input tokens | $1.00 per 1M output tokens

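To make the feature set above concrete, here is a minimal streaming sketch against an OpenAI-compatible chat endpoint. The base URL (`https://api.inceptionlabs.ai/v1`) and model name (`mercury`) are assumptions used for illustration; check the provider docs for the exact values.

```python
# Minimal streaming sketch against an assumed OpenAI-compatible endpoint.
# The base URL and model name below are assumptions -- verify in the docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="mercury",  # assumed model name
    messages=[{"role": "user", "content": "Summarize diffusion LLMs in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

Because the interface follows the familiar chat-completions shape, enabling streaming is a single `stream=True` flag; tool use and structured output hang off the same call.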
An Enterprise AI Partner
We’re available through major cloud providers like AWS Bedrock. Talk with us about fine-tuning, private deployments, and forward-deployed engineering support.
Integrate in Seconds
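As a hedged sketch of what "integrate in seconds" looks like in practice, the snippet below requests structured (JSON) output through the same assumed OpenAI-compatible interface; again, the endpoint and model identifier are placeholders, not confirmed values.

```python
# Structured-output sketch: ask the model to reply as a JSON object.
# The endpoint and model name are assumptions, as in the example above.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="mercury",  # assumed model name
    messages=[
        {"role": "system", "content": "Reply with a JSON object with keys 'label' and 'confidence'."},
        {"role": "user", "content": "Classify this support request: 'Reset my password, please.'"},
    ],
    response_format={"type": "json_object"},  # JSON mode, if supported
)
print(json.loads(resp.choices[0].message.content))
```

Swapping providers then amounts to changing the `base_url` and credentials; the request shape stays the same.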

"I was amazed by how fast it was. The multi-thousand tokens per second was absolutely wild, nothing like I've ever seen."
Jacob Kim
Software Engineer

"After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents."
Oliver Silverstein
CEO

"We cut routing and classification overheads to sub-second latencies even on complex agent traces."
Damian Tran
CEO