Bria FIBO: Open-Source JSON-Native Text-to-Image for Production

Table of Contents
- What is Bria FIBO?
- Overview of Bria FIBO
- Key Features of Bria FIBO
- Installation and Local Setup
  - Environment and Hardware
  - Prerequisites
  - Steps to Install and Run Locally
- How FIBO Works
  - Training Approach
  - JSON-Native Control
  - VLM-Guided Prompting
- Operational Modes
  - Generate
  - Refine
  - Inspire
- Local Demo: Process and Observations
  - Launch and Resource Footprint
  - JSON Prompt Generation
  - Image Quality and Adherence
- Refinement and Reference-Based Workflows
  - Refinement Behavior
  - VRAM During Refinement
  - Inspire Mode Notes
- Bria FIBO Workflow: Step-by-Step Guide
  - From Idea to Image
  - Refining an Existing Image
  - Inspiring from a Reference
- Bria FIBO Practical Considerations
  - Prompt Design
  - Performance and Stability
  - Reproducibility
  - Licensing
- Architecture and Control Principles
  - Controllability
  - Predictability
  - Disentanglement
- Integration and Extensibility
  - VLM for Guided Prompting
  - API-Friendly Design
- Observations from Local Use
- Troubleshooting Tips
- Suitability for Production Workflows
- Summary
- Conclusion
Text-to-image systems are abundant, yet many feel unpredictable and hard to control. Bria FIBO takes a different path by prioritizing precise, structured control over artistic spontaneity. It is the first open-source JSON-native text-to-image model trained entirely on long, structured captions, often over 1,000 words.
In this article, I install it locally, explain how it works, and walk through its production-ready controls. I also share practical observations on prompt handling, modes, hardware needs, and image refinement behavior.
What is Bria FIBO?
Bria FIBO is a JSON-native text-to-image model designed for professional workflows. Instead of treating prompts as loose guidance, it interprets structured JSON instructions to generate images with fine-grained control.

With 8 billion parameters, it aims to deliver consistent prompt adherence and full reproducibility. Its training on long, structured captions enables control over lighting, camera parameters, color tones, and composition.
Overview of Bria FIBO
| Aspect | Details |
|---|---|
| Model | Bria FIBO |
| Modality | JSON-native text-to-image |
| Parameters | 8B |
| Training data format | Long, structured captions (often >1,000 words) |
| Core focus | Controllability, predictability, disentanglement |
| Prompt style | Structured JSON instructions |
| Primary modes | Generate, Refine, Inspire |
| Reproducibility | Full |
| Guidance | Integrates a VLM for guided prompting |
| Interfaces | Local UI (e.g., Gradio), API-friendly JSON |
| Local hardware used here | Ubuntu + NVIDIA RTX A6000 48 GB VRAM |
| Typical VRAM usage (observed) | ~32 GB loaded; ~35 GB during generation; up to ~46 GB during refinement |
| Licensing | Check the official model card for permitted uses |
Key Features of Bria FIBO
- JSON-native prompting: Inputs are structured and explicit, enabling precise control over attributes without collapsing the entire scene.
- Professional control: Lighting, focal properties, camera angles, composition, and color can be specified at a granular level.
- Three operational modes: Generate from scratch, Refine an existing output, or Inspire from a reference image.
- Prompt adherence: Strong alignment with instructions due to its training on long, structured captions.
- Reproducibility: Designed for consistent outputs under identical settings.
- VLM-guided prompting: A vision-language model helps turn short ideas into detailed JSON structures for image generation.
- Production focus: Built for workflows that need predictability and control rather than loose artistic interpretations.
Installation and Local Setup
Environment and Hardware
I ran FIBO on Ubuntu with a single NVIDIA RTX A6000 (48 GB VRAM). The model loads fully onto the GPU, and VRAM usage varies by mode and operation.
- Model load: ~32 GB
- Typical generation: ~35 GB
- Intensive refinement operations: up to ~46 GB
Actual usage may vary based on batch size, resolution, and refinement depth.
Prerequisites
- A recent Linux distribution (Ubuntu in my case)
- Python environment with standard ML tooling
- CUDA-compatible NVIDIA GPU with sufficient VRAM
- Internet access for the initial model download
Steps to Install and Run Locally
1. Prepare the environment
   - Install GPU drivers, CUDA, and Python dependencies.
   - Create and activate a virtual environment.
2. Fetch the repository
   - Clone the official FIBO repository.
   - Install project dependencies using the provided requirements files.
3. Launch the local UI
   - Run the Gradio-based interface from the repository.
   - On first run, the model weights download automatically.
4. Access the interface
   - Open the local UI at http://localhost:7860.
   - Confirm the model is loaded onto the GPU and ready to process prompts.
5. Monitor resources
   - Use nvidia-smi to watch VRAM consumption during load, generation, and refinement (a small polling sketch follows below).
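If you would rather log VRAM from a script than watch nvidia-smi by hand, a minimal Python sketch like the one below does the job. It assumes only that nvidia-smi is on the PATH; the query flags shown are standard nvidia-smi options.

```python
import subprocess
import time

def gpu_memory_mib():
    """Return (used, total) VRAM in MiB for each visible GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [tuple(int(v) for v in line.split(",")) for line in out.strip().splitlines()]

# Poll every few seconds while the model loads, generates, or refines (Ctrl+C to stop).
while True:
    for idx, (used, total) in enumerate(gpu_memory_mib()):
        print(f"GPU {idx}: {used} / {total} MiB")
    time.sleep(5)
```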
How FIBO Works
Training Approach
FIBO is trained entirely on long, structured captions. This approach supports detailed and disentangled control—lighting separate from composition, camera angle separate from styling, and so on. The result is fewer conflicts between attributes and more predictable outputs.
JSON-Native Control
Instead of a free-form text prompt, FIBO interprets structured JSON. This lets you specify attributes individually rather than hoping the model infers them. Parameters are clear and modular, supporting stepwise refinement without disturbing unrelated elements.
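To make this concrete, here is an illustrative structured prompt expressed as a Python dict and serialized to JSON. The field names are hypothetical and chosen only to show how attributes stay separate; consult the official FIBO model card and repository for the actual schema.

```python
import json

# Hypothetical field names for illustration only; the real FIBO schema
# is defined in the official documentation.
prompt = {
    "subject": "a ceramic teapot on a wooden table",
    "composition": {"framing": "rule of thirds", "subject_position": "left third"},
    "lighting": {"type": "soft window light", "direction": "from the right"},
    "camera": {"angle": "eye level", "focal_length_mm": 50, "depth_of_field": "shallow"},
    "color": {"palette": "warm neutrals", "contrast": "medium"},
}

print(json.dumps(prompt, indent=2))  # the structured prompt handed to the model
```

Because each attribute lives in its own field, editing the lighting block later does not require rewriting, or risking, the rest of the scene description.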
VLM-Guided Prompting
A vision-language model helps expand short ideas into comprehensive JSON prompts. This is especially helpful for users who want precise control without composing long, detailed instructions from scratch. You can start with a short phrase, then convert it into a structured prompt before generating.
Operational Modes
Generate
Produce a new image from a JSON prompt. You can begin with a compact text idea, let the VLM generate a structured JSON prompt, and then render. This mode is ideal when you are starting from scratch and want control over scene layout, lighting, and lens-style attributes.
Refine
Adjust specific elements of an existing image. You might keep the composition but change colors, lighting, or focal properties. Refinement applies those targeted updates while preserving most other aspects of the scene.
Inspire
Guide generation using a reference image. This can inform composition, style, or mood. You supplement the reference with a structured JSON prompt to define what should follow the reference and what should change.
Local Demo: Process and Observations
Launch and Resource Footprint
Once the UI is running at localhost:7860, the model loads to roughly 32 GB of VRAM. During the first generation run, VRAM typically rises to around 35 GB. Refinement operations can push usage higher, up to approximately 46 GB, depending on resolution and settings.
This footprint sets a clear expectation for local deployments. A 48 GB GPU is sufficient for standard single-image operation at default settings.
JSON Prompt Generation
The VLM produces a structured prompt from a short phrase. The output JSON includes explicit descriptions for composition, lighting, depth of field, aesthetic controls, and other attributes. This pre-generation step is fast and yields a prompt that is both detailed and targeted.
The handoff from text to JSON is central to FIBO’s control. Instead of stacking more text into a single sentence, the structured approach defines attributes cleanly and reduces ambiguity.
Image Quality and Adherence
In practice, the model adheres closely to specified parameters. Lighting conditions, positional constraints, and styling cues are respected without collapsing unrelated elements. Outputs look natural, without the artificial sheen typical of many generative images.
This aligns with the model’s stated goals: predictability, control, and disentanglement.
Refinement and Reference-Based Workflows
Refinement Behavior
Refinement can apply exact changes—such as color modifications—to an existing image. In testing, the change was applied correctly, though subtle deviations in shapes and facial details were observed. Most features were preserved, but some drift occurred on human faces.
Overall, refinement is effective for targeted editing but may need iteration for human feature fidelity, depending on the change requested.
VRAM During Refinement
Refinement runs can be VRAM-intensive. I observed a rise up to roughly 46 GB when applying changes. This should be factored into planning for batch operations or higher resolutions.
Inspire Mode Notes
Using a reference image influences composition and style to a useful degree. Prompt adherence remains strong, especially for lighting and environmental cues defined in the JSON. If the reference is weak or off-style, the influence may appear subtle.
Bria FIBO Workflow: Step-by-Step Guide
From Idea to Image
- Enter a short textual idea.
- Generate the JSON prompt using the VLM.
- Review the structured attributes for lighting, composition, and camera parameters.
- Adjust fields as needed for precision.
- Generate the image.
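A programmatic version of the five steps above might look like the sketch below. The `expand_to_json` helper is a local placeholder standing in for the VLM expansion step, and every field name is illustrative rather than taken from the official schema; the flow, not the function names, is the point.

```python
import json

def expand_to_json(idea: str) -> dict:
    """Placeholder for the VLM expansion step; in FIBO's UI this is where a
    short idea becomes a detailed structured prompt. Field names are illustrative."""
    return {
        "subject": idea,
        "composition": {"framing": "wide establishing shot"},
        "lighting": {"type": "diffuse dawn light", "direction": "backlit"},
        "camera": {"angle": "eye level", "depth_of_field": "deep"},
        "color": {"palette": "muted blues and grays"},
    }

idea = "a foggy harbor at dawn, fishing boats, muted colors"
prompt = expand_to_json(idea)

# Steps 3-4: review the structured attributes and adjust individual fields.
prompt["lighting"]["direction"] = "from the left"
print(json.dumps(prompt, indent=2))

# Step 5: hand the finished JSON to FIBO's Generate mode (local UI or API).
```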
Refining an Existing Image
- Select the image to adjust.
- Specify the exact attributes to change in the JSON (e.g., colors, lighting).
- Keep unrelated fields stable to preserve composition.
- Run refinement and verify the result.
- Iterate with minor adjustments for exact visual targets.
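The key discipline is changing only the fields you care about and keeping everything else, including the seed, fixed. A minimal sketch, again with hypothetical field names:

```python
import copy

base_prompt = {
    "subject": "portrait of a cyclist at a cafe",
    "composition": {"framing": "medium shot"},
    "lighting": {"type": "overcast daylight"},
    "color": {"palette": "cool teal and gray"},
}
seed = 1234  # keep the seed fixed so only the edited field drives the change

refined_prompt = copy.deepcopy(base_prompt)
refined_prompt["color"]["palette"] = "warm amber and brown"  # the one targeted edit

# Everything else in the JSON stays identical, which is what keeps composition
# and unrelated details stable during refinement.
```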
Inspiring from a Reference
- Upload or select a reference image.
- Provide a concise description of the desired direction.
- Generate or edit the JSON to define which attributes follow the reference and which should change.
- Render and assess alignment to the reference-driven attributes.
- Iterate for precision.
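In a programmatic setup, the reference image travels alongside the JSON. The sketch below is a rough illustration of that pairing; the `follow_reference` field and the overall shape are assumptions, not FIBO's documented request format.

```python
# Illustrative only: which attributes should track the reference versus
# be overridden by the structured prompt.
inspire_request = {
    "reference_image": "reference/moodboard_01.png",  # path or upload handle
    "follow_reference": ["composition", "color_mood"],
    "prompt": {
        "subject": "a reading nook with a rattan chair",
        "lighting": {"type": "late afternoon sun"},
    },
}
```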
Bria FIBO Practical Considerations
Prompt Design
- Think in structured attributes, not long prose.
- Use JSON fields to isolate changes and avoid unintended global shifts.
- Keep iterative edits small to preserve desired elements.
Performance and Stability
- Expect ~32 GB VRAM at idle with the model loaded.
- Budget ~35 GB for single-image generation at default settings.
- Allow for spikes up to ~46 GB during intensive refinement.
- Monitor with nvidia-smi during multi-step workflows.
Reproducibility
FIBO is built for repeatable results. With the same seed and parameters, you can expect the same output. This consistency is valuable for production pipelines and QA processes.
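If you drive generation from Python, reproducibility comes down to pinning the seed and recording every parameter next to the output. The pipeline call below is commented out because it assumes a diffusers-style interface, which is an assumption about packaging rather than a documented fact; the seeding and manifest pattern is the part that carries over.

```python
import json
import torch

SEED = 1234
device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.Generator(device=device).manual_seed(SEED)

# Hypothetical call, assuming the repository exposes a diffusers-style pipeline:
# image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]

# Version everything that influenced the output alongside the asset itself.
with open("run_manifest.json", "w") as f:
    json.dump({"seed": SEED, "steps": 30, "prompt_file": "prompt_v3.json"}, f, indent=2)
```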
Licensing
If you plan to use outputs commercially, check the official model card and license. Confirm the terms for local and API-based usage before deploying to a production context.
Architecture and Control Principles
Controllability
FIBO treats each attribute as a distinct control. Lighting, camera angle, color palette, and composition are disentangled. This reduces conflicts and makes targeted edits more reliable.
Predictability
Because the model is trained on structured, explicit captions, it respects detailed instructions. The JSON schema aligns with how production teams think about scene construction, making outcomes easier to plan.
Disentanglement
Disentanglement is critical for professional workflows. When you alter one parameter, you aim to avoid unintended shifts elsewhere. FIBO’s training and JSON-native design support this separation.
Integration and Extensibility
VLM for Guided Prompting
The integrated VLM turns a short description into a well-specified JSON prompt. This lowers the barrier for precise control and helps maintain consistency across teams.
API-Friendly Design
FIBO’s JSON-first approach fits naturally into programmatic pipelines. Teams can build input validators, prompt templates, and automated QA checks around the structured schema. The design supports integration with various backends described in official resources.
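As one example of what building around the schema can look like, the sketch below validates a prompt against a team-defined JSON Schema using the widely available jsonschema package. The schema shown is an assumption standing in for whatever your team (or the official docs) defines.

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# Team-defined schema (illustrative): every prompt must declare a subject
# and a lighting block before it reaches the model.
PROMPT_SCHEMA = {
    "type": "object",
    "required": ["subject", "lighting"],
    "properties": {
        "subject": {"type": "string", "minLength": 3},
        "lighting": {
            "type": "object",
            "required": ["type"],
            "properties": {"type": {"type": "string"}},
        },
    },
}

def check_prompt(prompt: dict) -> bool:
    """Reject malformed prompts before they are sent to generation."""
    try:
        validate(instance=prompt, schema=PROMPT_SCHEMA)
        return True
    except ValidationError as err:
        print(f"Prompt rejected: {err.message}")
        return False
```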
Observations from Local Use
- The structured JSON prompt is central; it consistently improves control and adherence.
- The UI flow—idea to JSON to image—is straightforward.
- Image quality holds up well for production-style needs.
- Human faces in refinement steps may require extra iteration to avoid minor drift.
- VRAM usage is significant but predictable across modes.
Troubleshooting Tips
- If the model fails to load, verify GPU drivers and CUDA compatibility.
- If generation stalls, check VRAM headroom and reduce resolution or batch size.
- If edits affect unintended areas, narrow the JSON changes and lock unrelated fields.
- For consistent outputs across runs, fix the seed and keep parameters identical.
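For the first two tips, a quick Python check confirms that the GPU is visible and has headroom before you start changing settings. These are standard PyTorch calls, not FIBO-specific APIs.

```python
import torch

if not torch.cuda.is_available():
    raise SystemExit("CUDA not available: check the NVIDIA driver and CUDA/PyTorch versions")

props = torch.cuda.get_device_properties(0)
free, total = torch.cuda.mem_get_info(0)  # bytes of free/total VRAM on device 0
print(f"{props.name}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```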
Suitability for Production Workflows
FIBO aligns with requirements where precision, repeatability, and controllability matter. Teams can define strict prompt schemas, version prompts, and maintain reproducible outputs across stages. The model’s JSON-native approach makes it practical to add linting, schema validation, and automated testing around prompts.
For organizations building catalogs, editorial assets, or repeatable visuals, this approach brings clarity to prompt design and reduces iteration time compared to free-form text prompts.
Summary
- Bria FIBO is a JSON-native text-to-image model built for professional control.
- It is trained on long, structured captions to enable detailed, disentangled attributes.
- It supports three modes: Generate, Refine, and Inspire.
- A VLM helps transform short ideas into comprehensive JSON prompts.
- Local tests showed strong prompt adherence, natural presentation, and predictable VRAM use.
- Refinement is powerful but may require iteration for precise human features.
- Reproducibility and JSON-first design make it apt for production pipelines.
Conclusion
Bria FIBO focuses on precision, predictability, and reproducibility. Its JSON-native design, VLM-guided prompting, and three operational modes provide a clear workflow for controlled image generation. With suitable hardware and disciplined prompt design, it fits well into production contexts that demand consistency and granular edits.
For deployment or commercial use, consult the official model card and license.