Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models

Bokeh Diffusion is a text-to-image generator that gives precise control over background blur. This effect, known as bokeh, comes from a shallow depth of field and is widely used in professional photography to make the subject stand out and give the image a sense of depth.
For a long time, I have been waiting for an AI tool capable of controlling this bokeh effect, and now it has finally arrived.
What is Bokeh Diffusion?
Bokeh Diffusion is a text-to-image AI model that allows precise control over background blur (bokeh) in generated images. Unlike traditional models that rely on vague prompt engineering for blur effects, Bokeh Diffusion explicitly adjusts the blur level using a defocus parameter while preserving scene consistency.
It achieves this through a hybrid training method that combines real-world and synthetic blur data, enabling flexible depth-of-field control and real-image editing.
Bokeh Diffusion Overview:
| Detail | Description |
|---|---|
| Name | Bokeh Diffusion |
| Purpose | Defocus Blur Control in Text-to-Image Diffusion Models |
| Paper | arxiv.org/abs/2503.08434 |
| GitHub Repository | github.com/atfortes/BokehDiffusion |
| Official Website | atfortes.github.io/projects/bokeh-diffusion/ |
Understanding the Bokeh Effect in AI Image Generation
In professional photography, controlling bokeh is a key part of an image's visual appeal. Bokeh Diffusion brings that control to AI image generation, letting users decide how sharp or blurry the background appears.
To illustrate how this works, let's take a look at some examples:
- If you keep the same text prompt and all other settings identical but adjust the bokeh value, the background clarity changes.
- A photo of a cat remains the same in every aspect, but by adjusting the bokeh from 0 to 30, the background progressively becomes blurrier.
- A drone with a cityscape in the background exhibits similar behavior when the bokeh value is modified.
How Bokeh Diffusion Works
The bokeh value in this model ranges from 0 to 30, where:
- 0 results in a background that is completely clear and detailed.
- 30 produces an extremely blurred background.
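The comparisons in this article amount to holding the prompt and seed fixed and varying only the bokeh value. Below is a minimal sketch of building such a sweep. The helper names (`clamp_bokeh`, `bokeh_sweep`), the dictionary keys, and the default seed are hypothetical and chosen for illustration; they are not taken from the released code.

```python
# Hypothetical helpers: not the Bokeh Diffusion API, just an illustration of
# the "same prompt, same seed, different bokeh value" comparison.
BOKEH_MIN, BOKEH_MAX = 0.0, 30.0  # range stated in the article


def clamp_bokeh(value: float) -> float:
    """Clamp a requested blur level into the 0-30 range."""
    return max(BOKEH_MIN, min(BOKEH_MAX, float(value)))


def bokeh_sweep(prompt: str, steps: int, seed: int = 0) -> list[dict]:
    """Return generation settings that differ only in the bokeh value."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    span = BOKEH_MAX - BOKEH_MIN
    return [
        {"prompt": prompt, "seed": seed, "bokeh": BOKEH_MIN + span * i / (steps - 1)}
        for i in range(steps)
    ]


settings = bokeh_sweep("a photo of a cat", steps=4)
print([s["bokeh"] for s in settings])
```

Each entry could then be passed to whatever generation call the model exposes. Keeping the seed constant is what makes the resulting images comparable.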
Here are some examples demonstrating how different bokeh values influence the background blur:
| Subject | Bokeh Value | Background Description |
|---|---|---|
| Smoothie with market | 0 | Market details remain sharp and clear. |
| Car in the city | 1 | Background is still very detailed. |
| Red wine on a table | 12 | Background slightly blurred but recognizable. |
| Cow on a farm | 14 | Moderate background blur. |
| Woman in a park | 18 | Noticeably blurred background. |
| Man in a busy street | 29 | Extremely blurred background. |
| Seashell on a beach | 29 | Background nearly indistinguishable. |
Comparison with Flux Image Generator
Many AI image generators, such as Flux, let users describe background blur in the prompt. In practice, though, a prompt does not give the same level of control: Flux often applies a generic blur, so most backgrounds come out uniformly blurry.
In contrast, Bokeh Diffusion provides full control over the depth of field, allowing for:
- Completely clear backgrounds.
- Slightly blurred backgrounds.
- Highly blurred backgrounds.
This level of control ensures that the generated images achieve the desired effect with precision.
The Mechanism Behind Bokeh Diffusion
Bokeh Diffusion operates using the following components:
- Text Prompt – The main input provided by the user.
- Bokeh Parameter – A specialized setting that determines the background blur level.
- Grounded Self-Attention Component – Ensures that the subject remains sharp and consistent while only altering the background blur.
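To make the last component more concrete, here is a toy sketch of one way a subject can be kept consistent: the image being generated attends not only to its own tokens but also to keys and values cached from a reference (pivot) image. This is an assumption about the general idea, not the authors' implementation. The function names, shapes, and values are all hypothetical.

```python
import math

# Illustrative only: a tiny pure-Python attention, assuming "grounding" means
# the generated image's queries also attend to keys/values cached from a
# pivot image. Names and shapes here are hypothetical.

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: list[Vector], keys: list[Vector], values: list[Vector]) -> list[Vector]:
    """Scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def grounded_self_attention(q, k, v, pivot_k, pivot_v):
    """Self-attention whose key/value set is extended with pivot tokens."""
    return attend(q, k + pivot_k, v + pivot_v)


# Two tokens from the image being generated, two cached from the pivot image.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[0.2, 0.4], [0.6, 0.8]]
pivot_k = [[1.0, 0.0], [0.0, 1.0]]
pivot_v = [[0.2, 0.4], [0.6, 0.8]]

out = grounded_self_attention(q, k, v, pivot_k, pivot_v)
print(out)
```

Because every query can read the pivot's tokens, the layout of the pivot image influences each new generation, which is the intuition behind keeping the subject stable while only the blur changes.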
GitHub Repository and Future Updates
The developers have released a GitHub repository for Bokeh Diffusion and have indicated that more updates will be coming soon. Those interested in experimenting with this tool should stay tuned for further developments.
Bokeh Diffusion is a step forward in AI image generation, offering explicit, parameter-based control over depth of field. This capability opens up new possibilities for photographers, designers, and digital artists who want precise background adjustments in their visuals.
Bokeh Diffusion Method:
Bokeh Diffusion combines three core components to achieve lens-like bokeh effects without altering the scene's structure:
- Hybrid Dataset Pipeline: Real-world images, which provide authentic and diverse bokeh, are combined with synthetic blur augmentations to create contrastive pairs. The real images anchor the realism of the defocus, while the synthetic pairs provide robust training examples.
- Defocus Blur Conditioning: A physically interpretable blur parameter, ranging from 0 to 30, is injected through decoupled cross-attention in the deeper layers of the U-Net architecture. This preserves semantic features while precisely controlling the defocus level.
- Grounded Self-Attention: A "pivot" image anchors the scene layout, keeping object placement consistent across blur levels. This prevents unintended shifts in content when the defocus is adjusted.
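As a rough illustration of the synthetic half of the dataset pipeline, the sketch below builds a contrastive pair by blurring everything outside a subject mask. The box blur is a deliberately simple stand-in, since a real pipeline would more likely use a depth-aware lens blur. The function names and the returned dictionary layout are assumptions made for this example.

```python
# Illustrative only: building a (sharp, defocused) training pair by blurring
# everything outside a subject mask. The box blur and all names are
# assumptions; a real pipeline would use a more realistic lens blur.


def box_blur(image, radius):
    """Mean filter with edge clamping on a 2D list of floats."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = sum(window) / len(window)
    return out


def make_contrastive_pair(image, subject_mask, radius):
    """Keep masked (subject) pixels sharp and blur the rest."""
    blurred = box_blur(image, radius)
    defocused = [
        [
            image[y][x] if subject_mask[y][x] else blurred[y][x]
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]
    return {"sharp": image, "defocused": defocused, "radius": radius}


image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Subject occupies the left two columns; the right side is background.
mask = [[x < 2 for x in range(4)] for _ in range(3)]
pair = make_contrastive_pair(image, mask, radius=1)
print(pair["defocused"][0])
```

The two images in a pair share the same content and differ only in background blur, which is what lets a model learn the effect of the blur level in isolation.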
Together, these components give fine-grained control over depth of field, letting users adjust background blur precisely while preserving the original scene's structure.
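The defocus blur conditioning step can be pictured as decoupled cross-attention in the style popularized by IP-Adapter: the text branch is computed as usual, and a separate attention over blur tokens is added on top. The sketch below is a toy pure-Python version under that assumption. The `embed_blur` function is a made-up stand-in for whatever encoder the model uses, and reusing the blur tokens as both keys and values is a simplification, since a trained model would normally have separate learned projections.

```python
import math

# Sketch of decoupled cross-attention in the IP-Adapter style, assuming the
# blur level is embedded as extra tokens with their own attention branch.
# The embedding below is a made-up stand-in, not the paper's encoder.


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e / total * v[d] for e, v in zip(exps, values))
            for d in range(len(values[0]))
        ])
    return outputs


def embed_blur(level, dim):
    """Hypothetical embedding: one token built from the level scaled to 0-1."""
    x = level / 30.0
    return [[math.sin((i + 1) * x) for i in range(dim)]]


def decoupled_cross_attention(q, text_k, text_v, blur_k, blur_v, blur_scale=1.0):
    """Text branch plus a separately computed, scaled blur branch."""
    text_out = attend(q, text_k, text_v)
    blur_out = attend(q, blur_k, blur_v)
    return [
        [t + blur_scale * b for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(text_out, blur_out)
    ]


q = [[0.5, -0.5], [1.0, 0.0]]
text_k = [[1.0, 0.0], [0.0, 1.0]]
text_v = [[0.1, 0.2], [0.3, 0.4]]
blur_tokens = embed_blur(level=15.0, dim=2)  # reused as keys and values here

with_blur = decoupled_cross_attention(q, text_k, text_v, blur_tokens, blur_tokens)
text_only = decoupled_cross_attention(
    q, text_k, text_v, blur_tokens, blur_tokens, blur_scale=0.0
)
print(with_blur, text_only)
```

Keeping the two branches separate means the text pathway is left untouched when the blur scale is zero, which fits the goal of changing defocus without disturbing what the prompt describes.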