
Introducing a New Generation of Language Models
Inception’s diffusion-based approach to language generation is inspired by advanced AI systems for images and video, such as Midjourney and Sora, and delivers unprecedented speed, quality, and generative control.
Inception
Introducing the first diffusion large language models
Our diffusion large language models (dLLMs) provide (1) unparalleled speed, (2) improved efficiency, and (3) enhanced quality.
1. Speed: 5-10x faster than traditional LLMs
2. Efficiency: 5-10x cheaper than traditional LLMs
3. Quality: 2x the model size at the same latency and cost
Inception [ Parallel Text Generation Now Live ]
Benefits
Diffusion modeling charts a path towards next-generation AI
Diffusion offers benefits beyond speed and efficiency.
Reasoning
Diffusion models support advanced reasoning capabilities by providing built-in error correction mechanisms to fix mistakes and hallucinations.
Multimodality
Diffusion models provide a unified framework for generative AI, offering strong performance across modalities of data, including images, videos, and text.
Control
Diffusion models deliver control over output structure, making them ideal for function calling and structured data generation.
Inception
Brought to you by the inventors of diffusion models, FlashAttention, Decision Transformers, Alpaca-LoRA, and Direct Preference Optimization.
Find out more
Latest
10 Min Read · Feb 2025
Introducing Mercury, the first commercial-scale diffusion large language model.
We trained diffusion large language models that are up to 10x faster and cheaper than traditional autoregressive models, pushing the frontier of quality and speed for language models.
Inception [ ultra fast and ultra efficient ]
Where Next-Gen AI Begins
Breakthrough speed, efficiency, and capability, provided by diffusion models.
Got a question?
Here are some of our most frequently asked questions.
What are diffusion large language models (dLLMs)?
dLLMs are a new type of LLM based on diffusion modeling, the technology that powers all modern image and video generation tools. Traditional autoregressive (AR) models generate units of text (tokens) one at a time. In contrast, dLLMs start with random, “noisy” text and iteratively refine it into a meaningful output. The process is like taking a blurry image and repeatedly increasing the resolution until a clear image emerges.
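For the curious, here is a minimal, illustrative sketch of that refinement loop, assuming a masked-token formulation of discrete diffusion in which every masked position is predicted in parallel and the least confident predictions are re-masked for the next pass. The `model` stub and the re-masking schedule below are hypothetical stand-ins, not our production implementation.

```python
import numpy as np

VOCAB = 1000
MASK = VOCAB                    # mask id sits outside the real vocabulary
rng = np.random.default_rng(0)

def model(tokens):
    """Stand-in denoiser: returns logits over the vocabulary for every
    position at once. A real dLLM would be a trained transformer."""
    return rng.normal(size=(len(tokens), VOCAB))

def generate(length=16, steps=8):
    tokens = np.full(length, MASK)              # start fully "noisy": all masks
    for step in range(steps):
        logits = model(tokens)                  # predict every position in parallel
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        guess = probs.argmax(axis=-1)           # most likely token per position
        conf = probs.max(axis=-1)               # confidence of each guess
        tokens = np.where(tokens == MASK, guess, tokens)  # fill all masks at once
        # Re-mask the least confident positions so a later pass can revise
        # them -- the error-correction step that AR decoding cannot do.
        n_remask = int(length * (1 - (step + 1) / steps))
        if n_remask:
            tokens[np.argsort(conf)[:n_remask]] = MASK
    return tokens

print(generate())               # 16 token ids, refined over 8 parallel passes
```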
What are the advantages of dLLMs?
There are several:
Diffusion models are faster than AR models because they generate tokens in parallel.
Diffusion models are cheaper than AR models because they utilize GPUs more efficiently, so fewer GPUs are needed to serve the same number of users. And because diffusion models are faster and cheaper, developers can replace a small AR model with a larger diffusion model without harming the user experience or increasing costs.
Diffusion models don’t commit to every token they generate, meaning they can correct errors and hallucinations. This also means they are better at reasoning.
Diffusion models are ideal for function calling and structured object generation because they can enforce arbitrarily complex syntax (a sketch of this constraint mechanism follows this list).
Diffusion language models are more naturally multimodal because image, video, and audio generation already relies on diffusion models, so they should perform better on multimodal tasks.
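As an illustration of the syntax-enforcement point above, here is a minimal sketch of constrained decoding under the same masked-diffusion assumptions: before choosing tokens at a refinement step, logits for any token the output schema forbids are masked out. The `allowed` sets are hypothetical stand-ins for a real grammar or JSON-schema checker; this is not our actual API.

```python
import numpy as np

def constrain(logits, allowed):
    """Set every schema-forbidden token's logit to -inf at each position,
    so only syntactically valid tokens can be chosen. `allowed[i]` is the
    set of token ids legal at position i -- in a real system it would come
    from a grammar or JSON-schema checker."""
    out = np.full_like(logits, -np.inf)
    for i, ok in enumerate(allowed):
        ok = list(ok)
        out[i, ok] = logits[i, ok]
    return out

# Toy example: 3 positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
allowed = [{0, 1}, {2}, {3, 4}]        # e.g. '{', a key token, '}'
picked = constrain(logits, allowed).argmax(axis=-1)
print(picked)                           # every choice respects the schema
```

Because a dLLM scores all positions at once, this kind of constraint can be applied to the whole output at every refinement pass, rather than one token at a time.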
What made dLLMs possible?
Until recently, researchers had been unable to train high-quality diffusion models that operate on discrete data, such as text. Inception sprang from breakthrough research we conducted that demonstrated how to train discrete diffusion models.
Who are you?
We’re a team of researchers and engineers out of Stanford, UCLA, and Cornell. We are led by some of the pioneers of diffusion modeling.
How can I get access to your dLLMs?
Our first model is available in a playground. For commercial use, contact us. We support API access and on-premise deployments.
What models are currently available?
Our playground provides access to Mercury Coder, a small code generation model. A base model and a chat model are available for enterprise customers.
How can I learn more?