Bria FIBO: Open-Source JSON-Native Text-to-Image for Production

Table of Contents
- What is Bria FIBO?
- Overview of Bria FIBO
- Key Features of Bria FIBO
- Installation and Local Setup
  - Environment and Hardware
  - Prerequisites
  - Steps to Install and Run Locally
- How FIBO Works
  - Training Approach
  - JSON-Native Control
  - VLM-Guided Prompting
- Operational Modes
  - Generate
  - Refine
  - Inspire
- Local Demo: Process and Observations
  - Launch and Resource Footprint
  - JSON Prompt Generation
  - Image Quality and Adherence
- Refinement and Reference-Based Workflows
  - Refinement Behavior
  - VRAM During Refinement
  - Inspire Mode Notes
- Bria FIBO Workflow: Step-by-Step Guide
  - From Idea to Image
  - Refining an Existing Image
  - Inspiring from a Reference
- Bria FIBO Practical Considerations
  - Prompt Design
  - Performance and Stability
  - Reproducibility
  - Licensing
- Architecture and Control Principles
  - Controllability
  - Predictability
  - Disentanglement
- Integration and Extensibility
  - VLM for Guided Prompting
  - API-Friendly Design
- Observations from Local Use
- Troubleshooting Tips
- Suitability for Production Workflows
- Summary
- Conclusion
Text-to-image systems are abundant, yet many feel unpredictable and hard to control. Bria FIBO takes a different path by prioritizing precise, structured control over artistic spontaneity. It is the first open-source JSON-native text-to-image model trained entirely on long, structured captions, often over 1,000 words.
In this article, I install it locally, explain how it works, and walk through its production-ready controls. I also share practical observations on prompt handling, modes, hardware needs, and image refinement behavior.
What is Bria FIBO?
Bria FIBO is a JSON-native text-to-image model designed for professional workflows. Instead of treating prompts as loose guidance, it interprets structured JSON instructions to generate images with fine-grained control.

With 8 billion parameters, it aims to deliver consistent prompt adherence and full reproducibility. Its training on long, structured captions enables control over lighting, camera parameters, color tones, and composition.
Overview of Bria FIBO
| Aspect | Details |
|---|---|
| Model | Bria FIBO |
| Modality | JSON-native text-to-image |
| Parameters | 8B |
| Training data format | Long, structured captions (often >1,000 words) |
| Core focus | Controllability, predictability, disentanglement |
| Prompt style | Structured JSON instructions |
| Primary modes | Generate, Refine, Inspire |
| Reproducibility | Full |
| Guidance | Integrates a VLM for guided prompting |
| Interfaces | Local UI (e.g., Gradio), API-friendly JSON |
| Local hardware used here | Ubuntu + NVIDIA RTX A6000 48 GB VRAM |
| Typical VRAM usage (observed) | ~32 GB loaded; ~35 GB during generation; up to ~46 GB during refinement |
| Licensing | Check the official model card for permitted uses |
Key Features of Bria FIBO
- JSON-native prompting: Inputs are structured and explicit, enabling precise control over attributes without collapsing the entire scene.
- Professional control: Lighting, focal properties, camera angles, composition, and color can be specified at a granular level.
- Three operational modes: Generate from scratch, Refine an existing output, or Inspire from a reference image.
- Prompt adherence: Strong alignment with instructions due to its training on long, structured captions.
- Reproducibility: Designed for consistent outputs under identical settings.
- VLM-guided prompting: A vision-language model helps turn short ideas into detailed JSON structures for image generation.
- Production focus: Built for workflows that need predictability and control rather than loose artistic interpretations.
Installation and Local Setup
Environment and Hardware
I ran FIBO on Ubuntu with a single NVIDIA RTX A6000 (48 GB VRAM). The model loads fully onto the GPU, and VRAM usage varies by mode and operation.
- Model load: ~32 GB
- Typical generation: ~35 GB
- Intensive refinement operations: up to ~46 GB
Actual usage may vary based on batch size, resolution, and refinement depth.
Prerequisites
- A recent Linux distribution (Ubuntu in my case)
- Python environment with standard ML tooling
- CUDA-compatible NVIDIA GPU with sufficient VRAM
- Internet access for the initial model download
Steps to Install and Run Locally
1. Prepare the environment
   - Install GPU drivers, CUDA, and Python dependencies.
   - Create and activate a virtual environment.
2. Fetch the repository
   - Clone the official FIBO repository.
   - Install project dependencies using the provided requirements files.
3. Launch the local UI
   - Run the Gradio-based interface from the repository.
   - On first run, the model weights download automatically.
4. Access the interface
   - Open the local UI at http://localhost:7860.
   - Confirm the model is loaded onto the GPU and ready to process prompts.
5. Monitor resources
   - Use nvidia-smi to watch VRAM consumption during load, generation, and refinement (a small polling sketch follows below).
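If you would rather log VRAM from a script than watch nvidia-smi by hand, a minimal Python sketch like the one below does the job. It assumes only that nvidia-smi is on the PATH; the query flags shown are standard nvidia-smi options.

```python
import subprocess
import time

def gpu_memory_mib():
    """Return (used, total) VRAM in MiB for each visible GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [tuple(int(v) for v in line.split(",")) for line in out.strip().splitlines()]

# Poll every few seconds while the model loads, generates, or refines (Ctrl+C to stop).
while True:
    for idx, (used, total) in enumerate(gpu_memory_mib()):
        print(f"GPU {idx}: {used} / {total} MiB")
    time.sleep(5)
```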
How FIBO Works
Training Approach
FIBO is trained entirely on long, structured captions. This approach supports detailed and disentangled control—lighting separate from composition, camera angle separate from styling, and so on. The result is fewer conflicts between attributes and more predictable outputs.
JSON-Native Control
Instead of a free-form text prompt, FIBO interprets structured JSON. This lets you specify attributes individually rather than hoping the model infers them. Parameters are clear and modular, supporting stepwise refinement without disturbing unrelated elements.
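To make this concrete, here is an illustrative structured prompt expressed as a Python dict and serialized to JSON. The field names are hypothetical and chosen only to show how attributes stay separate; consult the official FIBO model card and repository for the actual schema.

```python
import json

# Hypothetical field names for illustration only; the real FIBO schema
# is defined in the official documentation.
prompt = {
    "subject": "a ceramic teapot on a wooden table",
    "composition": {"framing": "rule of thirds", "subject_position": "left third"},
    "lighting": {"type": "soft window light", "direction": "from the right"},
    "camera": {"angle": "eye level", "focal_length_mm": 50, "depth_of_field": "shallow"},
    "color": {"palette": "warm neutrals", "contrast": "medium"},
}

print(json.dumps(prompt, indent=2))  # the structured prompt handed to the model
```

Because each attribute lives in its own field, editing the lighting block later does not require rewriting, or risking, the rest of the scene description.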
VLM-Guided Prompting
A vision-language model helps expand short ideas into comprehensive JSON prompts. This is especially helpful for users who want precise control without composing long, detailed instructions from scratch. You can start with a short phrase, then convert it into a structured prompt before generating.
Operational Modes
Generate
Produce a new image from a JSON prompt. You can begin with a compact text idea, let the VLM generate a structured JSON prompt, and then render. This mode is ideal when you are starting from scratch and want control over scene layout, lighting, and lens-style attributes.
Refine
Adjust specific elements of an existing image. You might keep the composition but change colors, lighting, or focal properties. Refinement applies those targeted updates while preserving most other aspects of the scene.
Inspire
Guide generation using a reference image. This can inform composition, style, or mood. You supplement the reference with a structured JSON prompt to define what should follow the reference and what should change.
Local Demo: Process and Observations
Launch and Resource Footprint
Once the UI is running at localhost:7860, the model loads to roughly 32 GB of VRAM. During the first generation run, VRAM typically rises to around 35 GB. Refinement operations can push usage higher, up to approximately 46 GB, depending on resolution and settings.
This footprint sets a clear expectation for local deployments. A 48 GB GPU is sufficient for standard single-image operation at default settings.
JSON Prompt Generation
The VLM produces a structured prompt from a short phrase. The output JSON includes explicit descriptions for composition, lighting, depth of field, aesthetic controls, and other attributes. This pre-generation step is fast and yields a prompt that is both detailed and targeted.
The handoff from text to JSON is central to FIBO’s control. Instead of stacking more text into a single sentence, the structured approach defines attributes cleanly and reduces ambiguity.
Image Quality and Adherence
In practice, the model adheres closely to specified parameters. Lighting conditions, positional constraints, and styling cues are respected without collapsing unrelated elements. Outputs look natural, without the artificial sheen typical of many generative images.
This aligns with the model’s stated goals: predictability, control, and disentanglement.
Refinement and Reference-Based Workflows
Refinement Behavior
Refinement can apply exact changes—such as color modifications—to an existing image. In testing, the change was applied correctly, though subtle deviations in shapes and facial details were observed. Most features were preserved, but some drift occurred on human faces.
Overall, refinement is effective for targeted editing but may need iteration for human feature fidelity, depending on the change requested.
VRAM During Refinement
Refinement runs can be VRAM-intensive. I observed a rise up to roughly 46 GB when applying changes. This should be factored into planning for batch operations or higher resolutions.
Inspire Mode Notes
Using a reference image influences composition and style to a useful degree. Prompt adherence remains strong, especially for lighting and environmental cues defined in the JSON. If the reference is weak or off-style, the influence may appear subtle.
Bria FIBO Workflow: Step-by-Step Guide
From Idea to Image
- Enter a short textual idea.
- Generate the JSON prompt using the VLM.
- Review the structured attributes for lighting, composition, and camera parameters.
- Adjust fields as needed for precision.
- Generate the image.
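A programmatic version of the five steps above might look like the sketch below. The `expand_to_json` helper is a local placeholder standing in for the VLM expansion step, and every field name is illustrative rather than taken from the official schema; the flow, not the function names, is the point.

```python
import json

def expand_to_json(idea: str) -> dict:
    """Placeholder for the VLM expansion step; in FIBO's UI this is where a
    short idea becomes a detailed structured prompt. Field names are illustrative."""
    return {
        "subject": idea,
        "composition": {"framing": "wide establishing shot"},
        "lighting": {"type": "diffuse dawn light", "direction": "backlit"},
        "camera": {"angle": "eye level", "depth_of_field": "deep"},
        "color": {"palette": "muted blues and grays"},
    }

idea = "a foggy harbor at dawn, fishing boats, muted colors"
prompt = expand_to_json(idea)

# Steps 3-4: review the structured attributes and adjust individual fields.
prompt["lighting"]["direction"] = "from the left"
print(json.dumps(prompt, indent=2))

# Step 5: hand the finished JSON to FIBO's Generate mode (local UI or API).
```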
Refining an Existing Image
- Select the image to adjust.
- Specify the exact attributes to change in the JSON (e.g., colors, lighting).
- Keep unrelated fields stable to preserve composition.
- Run refinement and verify the result.
- Iterate with minor adjustments for exact visual targets.
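The key discipline is changing only the fields you care about and keeping everything else, including the seed, fixed. A minimal sketch, again with hypothetical field names:

```python
import copy

base_prompt = {
    "subject": "portrait of a cyclist at a cafe",
    "composition": {"framing": "medium shot"},
    "lighting": {"type": "overcast daylight"},
    "color": {"palette": "cool teal and gray"},
}
seed = 1234  # keep the seed fixed so only the edited field drives the change

refined_prompt = copy.deepcopy(base_prompt)
refined_prompt["color"]["palette"] = "warm amber and brown"  # the one targeted edit

# Everything else in the JSON stays identical, which is what keeps composition
# and unrelated details stable during refinement.
```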
Inspiring from a Reference
- Upload or select a reference image.
- Provide a concise description of the desired direction.
- Generate or edit the JSON to define which attributes follow the reference and which should change.
- Render and assess alignment to the reference-driven attributes.
- Iterate for precision.
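In a programmatic setup, the reference image travels alongside the JSON. The sketch below is a rough illustration of that pairing; the `follow_reference` field and the overall shape are assumptions, not FIBO's documented request format.

```python
# Illustrative only: which attributes should track the reference versus
# be overridden by the structured prompt.
inspire_request = {
    "reference_image": "reference/moodboard_01.png",  # path or upload handle
    "follow_reference": ["composition", "color_mood"],
    "prompt": {
        "subject": "a reading nook with a rattan chair",
        "lighting": {"type": "late afternoon sun"},
    },
}
```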
Bria FIBO Practical Considerations
Prompt Design
- Think in structured attributes, not long prose.
- Use JSON fields to isolate changes and avoid unintended global shifts.
- Keep iterative edits small to preserve desired elements.
Performance and Stability
- Expect ~32 GB VRAM at idle with the model loaded.
- Budget ~35 GB for single-image generation at default settings.
- Allow for spikes up to ~46 GB during intensive refinement.
- Monitor with nvidia-smi during multi-step workflows.
Reproducibility
FIBO is built for repeatable results. With the same seed and parameters, you can expect the same output. This consistency is valuable for production pipelines and QA processes.
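If you drive generation from Python, reproducibility comes down to pinning the seed and recording every parameter next to the output. The pipeline call below is commented out because it assumes a diffusers-style interface, which is an assumption about packaging rather than a documented fact; the seeding and manifest pattern is the part that carries over.

```python
import json
import torch

SEED = 1234
device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.Generator(device=device).manual_seed(SEED)

# Hypothetical call, assuming the repository exposes a diffusers-style pipeline:
# image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]

# Version everything that influenced the output alongside the asset itself.
with open("run_manifest.json", "w") as f:
    json.dump({"seed": SEED, "steps": 30, "prompt_file": "prompt_v3.json"}, f, indent=2)
```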
Licensing
If you plan to use outputs commercially, check the official model card and license. Confirm the terms for local and API-based usage before deploying to a production context.
Architecture and Control Principles
Controllability
FIBO treats each attribute as a distinct control. Lighting, camera angle, color palette, and composition are disentangled. This reduces conflicts and makes targeted edits more reliable.
Predictability
Because the model is trained on structured, explicit captions, it respects detailed instructions. The JSON schema aligns with how production teams think about scene construction, making outcomes easier to plan.
Disentanglement
Disentanglement is critical for professional workflows. When you alter one parameter, you aim to avoid unintended shifts elsewhere. FIBO’s training and JSON-native design support this separation.
Integration and Extensibility
VLM for Guided Prompting
The integrated VLM turns a short description into a well-specified JSON prompt. This lowers the barrier for precise control and helps maintain consistency across teams.
API-Friendly Design
FIBO’s JSON-first approach fits naturally into programmatic pipelines. Teams can build input validators, prompt templates, and automated QA checks around the structured schema. The design supports integration with various backends described in official resources.
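As one example of what building around the schema can look like, the sketch below validates a prompt against a team-defined JSON Schema using the widely available jsonschema package. The schema shown is an assumption standing in for whatever your team (or the official docs) defines.

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# Team-defined schema (illustrative): every prompt must declare a subject
# and a lighting block before it reaches the model.
PROMPT_SCHEMA = {
    "type": "object",
    "required": ["subject", "lighting"],
    "properties": {
        "subject": {"type": "string", "minLength": 3},
        "lighting": {
            "type": "object",
            "required": ["type"],
            "properties": {"type": {"type": "string"}},
        },
    },
}

def check_prompt(prompt: dict) -> bool:
    """Reject malformed prompts before they are sent to generation."""
    try:
        validate(instance=prompt, schema=PROMPT_SCHEMA)
        return True
    except ValidationError as err:
        print(f"Prompt rejected: {err.message}")
        return False
```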
Observations from Local Use
- The structured JSON prompt is central; it consistently improves control and adherence.
- The UI flow—idea to JSON to image—is straightforward.
- Image quality holds up well for production-style needs.
- Human faces in refinement steps may require extra iteration to avoid minor drift.
- VRAM usage is significant but predictable across modes.
Troubleshooting Tips
- If the model fails to load, verify GPU drivers and CUDA compatibility.
- If generation stalls, check VRAM headroom and reduce resolution or batch size.
- If edits affect unintended areas, narrow the JSON changes and lock unrelated fields.
- For consistent outputs across runs, fix the seed and keep parameters identical.
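For the first two tips, a quick Python check confirms that the GPU is visible and has headroom before you start changing settings. These are standard PyTorch calls, not FIBO-specific APIs.

```python
import torch

if not torch.cuda.is_available():
    raise SystemExit("CUDA not available: check the NVIDIA driver and CUDA/PyTorch versions")

props = torch.cuda.get_device_properties(0)
free, total = torch.cuda.mem_get_info(0)  # bytes of free/total VRAM on device 0
print(f"{props.name}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```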
Suitability for Production Workflows
FIBO aligns with requirements where precision, repeatability, and controllability matter. Teams can define strict prompt schemas, version prompts, and maintain reproducible outputs across stages. The model’s JSON-native approach makes it practical to add linting, schema validation, and automated testing around prompts.
For organizations building catalogs, editorial assets, or repeatable visuals, this approach brings clarity to prompt design and reduces iteration time compared to free-form text prompts.
Summary
- Bria FIBO is a JSON-native text-to-image model built for professional control.
- It is trained on long, structured captions to enable detailed, disentangled attributes.
- It supports three modes: Generate, Refine, and Inspire.
- A VLM helps transform short ideas into comprehensive JSON prompts.
- Local tests showed strong prompt adherence, natural presentation, and predictable VRAM use.
- Refinement is powerful but may require iteration for precise human features.
- Reproducibility and JSON-first design make it apt for production pipelines.
Conclusion
Bria FIBO focuses on precision, predictability, and reproducibility. Its JSON-native design, VLM-guided prompting, and three operational modes provide a clear workflow for controlled image generation. With suitable hardware and disciplined prompt design, it fits well into production contexts that demand consistency and granular edits.
For deployment or commercial use, consult the official model card and license.