Extract Any PDF with MinerU 2.5 (Easy Tutorial)

Introduction

MinerU has been one of my go-to tools for turning PDFs into machine-readable data. I also keep an eye on alternatives like Docling and vision-language models from Qwen, but MinerU stands out for its balance of structure preservation, conversion quality, and practicality for dataset creation.

MinerU is open-source and converts PDFs into formats such as Markdown and JSON. I rely on it for building custom datasets because it allows targeted extraction and easy reuse. Over time it has matured into a solid option for technical, legal, and business documents.

MinerU did have one drawback: performance. In heavier production scenarios, rendering and serving were slow, and the setup was somewhat complex because it depended on sglang. The recent integration of vLLM changes that. vLLM is a fast inference engine, and MinerU 2.5 with vLLM is significantly faster while keeping the same extraction quality.

In this guide, I’ll install and run MinerU 2.5 with vLLM locally, outline what it does well, and share practical notes from testing across languages and document types. I used Ubuntu with a single NVIDIA RTX A6000 (48 GB VRAM), but you can run it on Windows, Linux, or macOS, with or without a GPU.

What is MinerU?

MinerU is an open-source PDF-to-structured-data system. It focuses on preserving layout and structure while producing clean outputs such as Markdown, JSON, HTML for tables, and LaTeX for formulas.

It is built for real-world documents where structure matters: multi-column layouts, tables, images, equations, figures, captions, and scanned pages. It removes noise such as headers, footers, footnotes, and page numbers to keep the text coherent, while still supporting options to keep them if needed.

With vLLM integration, MinerU 2.5 gets a major performance boost. The tool can run through multi-page PDFs quickly, maintain page-level structure, and export useful, ready-to-use outputs.

Table Overview

System Setup Used

Component	Configuration
OS	Ubuntu (Linux)
GPU	NVIDIA RTX A6000, 48 GB VRAM
CPU	Supported; slower than GPU
MinerU	Version 2.5 with vLLM integration
Python Env	Virtual environment recommended
Install Mode	Editable install for core + VLM

Core Capabilities at a Glance

Capability	Details
Output Formats	Markdown, JSON, plain text; HTML for tables; LaTeX for formulas
Layout Handling	Single-column, multi-column, complex layouts
Structure Cleanup	Removes headers, footers, footnotes, and page numbers (configurable)
Objects	Text, images, tables, equations, captions
OCR	Scanned or noisy PDFs; 84 languages
Interfaces	Web UI, CLI, SDK; hosted option via client
Engine	vLLM for fast inference; integrates layout/vision components

Components Initialized at Runtime

Component	Role
Layout/Detection (e.g., YOLO)	Region detection for text, tables, images
OCR (e.g., Paddle-based)	Text recognition for scanned or low-quality pages
VLM engine	Fast reasoning, segmentation, structure-aware extraction
Language model (runtime output hints at Qwen)	Language-aware text tasks and structure understanding

Note: The project initializes multiple models at first run. Model downloads happen automatically on launch.

Observed Language Quality (From Testing)

Language	Result Summary
English	Accurate structure, fast processing, strong table and image handling
Chinese	Strong performance; consistent structure retention
German	Good output; correct tabular data conversion
Swedish	Good overall match with originals
Indonesian	Mixed; script is Latin, some fields correct, minor inconsistencies
Arabic	Weak in Markdown/text extraction; region marking OK
Hindi	Not extracted; region marking OK
Urdu	Not extracted; rendered as image only

Key Features

Structured conversion
- Markdown and JSON outputs for clean downstream processing
- HTML for tables and LaTeX for formulas
Layout-aware processing
- Handles single- and multi-column layouts and complex page structures
- Extracts tables, figures, images, captions, and text blocks with preserved order
Noise removal
- Removes headers, footers, page numbers, and footnotes by default for coherent text flow; configurable if you need to keep them
OCR support
- Works on scanned PDFs and noisy text in 84 languages
Speed with vLLM
- Significantly faster inference compared with prior sglang-based runs
Flexible interfaces
- Web UI for interactive use
- CLI for batch processing
- SDK for programmatic integration
- Hosted option (connect via client) if you prefer not to run locally
Local-first privacy option
- Full local execution and data isolation

How it works

MinerU 2.5 orchestrates several components to extract structure and text from PDFs:

Layout detection
- Detects blocks such as paragraphs, tables, images, and formulas
- Models like YOLO contribute to robust region detection
OCR and text recognition
- Applies OCR on scanned or garbled pages
- Helps recover text in noisy or low-resolution documents
Structure analysis
- Groups related elements, resolves multi-column flow, and removes repeated noise (headers/footers) if enabled
Table and formula conversion
- Converts tables into HTML with cell structure
- Converts formulas into LaTeX for reproducibility and downstream typesetting
vLLM-powered reasoning
- Speeds up model inference for segmentation, reading order, and content labeling
- Improves throughput on large or multi-page documents

The result is a set of clean, structured artifacts suitable for downstream data work, content reuse, or search and indexing.
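To make the last stage concrete, here is a minimal sketch of how typed blocks could be assembled into Markdown once reading order is known. The `Block` schema and the `render_markdown` helper are hypothetical illustrations, not MinerU's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One detected region. Hypothetical schema, not MinerU's real output format."""
    kind: str      # "text", "table", "formula", or "image"
    content: str   # plain text, HTML table markup, LaTeX source, or an image path
    order: int     # reading-order index assigned by structure analysis


def render_markdown(blocks: list[Block]) -> str:
    """Join blocks in reading order, choosing a representation per block kind."""
    parts = []
    for block in sorted(blocks, key=lambda b: b.order):
        if block.kind == "formula":
            # Formulas are wrapped as display math so the LaTeX survives as-is.
            parts.append(f"$$\n{block.content}\n$$")
        elif block.kind == "image":
            parts.append(f"![]({block.content})")
        else:
            # Text passes through; tables stay as raw HTML, which Markdown permits.
            parts.append(block.content)
    return "\n\n".join(parts)


blocks = [
    Block("formula", r"E = mc^2", order=2),
    Block("text", "Energy and mass are related by:", order=1),
    Block("table", "<table><tr><td>c</td><td>speed of light</td></tr></table>", order=3),
]
print(render_markdown(blocks))
```

Sorting by an explicit `order` field is what lets multi-column pages come out in the right sequence regardless of the order in which regions were detected.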

How to use

Prerequisites

OS: Linux, Windows, or macOS
Python: Virtual environment recommended
GPU: Optional but strongly recommended for speed
RAM/VRAM:
- GPU runs are significantly faster; during testing with a 48 GB GPU, VRAM peaked slightly above 25 GB on large, multi-page docs
- CPU runs work but are slower
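
Before a large run, it can help to check how much VRAM is actually free. The sketch below assumes an NVIDIA GPU with the `nvidia-smi` tool on the PATH; the 25 GB threshold is only the rough peak from the testing described here, and the helper names are my own.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'total, used' pairs in MiB, one GPU per line, from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        gpus.append((total, used))
    return gpus


def free_vram_gib() -> list[float]:
    """Return free VRAM per GPU in GiB, or an empty list if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        )
        return [(total - used) / 1024 for total, used in parse_gpu_memory(result.stdout)]
    except (subprocess.SubprocessError, OSError, ValueError):
        return []


REQUIRED_GIB = 25  # rough peak observed on large documents; adjust for your workload

free = free_vram_gib()
if not free:
    print("No usable NVIDIA GPU detected; expect slower CPU runs.")
elif max(free) < REQUIRED_GIB:
    print(f"Largest free VRAM is {max(free):.1f} GiB; large PDFs may not fit.")
else:
    print(f"{max(free):.1f} GiB free; enough headroom for large PDFs.")
```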

Installation (vLLM integration)

Follow these steps to install MinerU 2.5 with vLLM locally:

Create and activate a virtual environment
- python -m venv .venv && source .venv/bin/activate (Linux/macOS)
- python -m venv .venv && .\.venv\Scripts\activate (Windows)
Clone the MinerU repository
- git clone <repo-url>
- cd <repo-folder>
Install MinerU with VLM support in editable mode
- pip install -e ".[core,vlm]"
Wait for dependencies to install
- Initial setup can take a few minutes
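
Once the install finishes, a quick check from inside the virtual environment confirms that the packages are visible to Python. This is a small sketch; `mineru` and `vllm` are assumed import names, so adjust them if your installation exposes different ones.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True when the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
# Assumed import names; change them if the project uses others.
for name in ("mineru", "vllm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```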

Launch the Web UI

MinerU includes a demo application for the UI:

Go to the demo directory
- cd demo
Launch the demo app
- python demo.py
First run will download required models and initialize the vLLM engine
- Layout detection, OCR, and VLM components are set up automatically
Open the local URL in your browser to access the UI

Alternative Interfaces

CLI
- Batch-run documents from the terminal, ideal for pipelines
SDK
- Integrate MinerU into your Python applications for custom workflows
Hosted
- Connect via the provided client if you prefer managed infrastructure
Local/private
- All steps above keep the entire workflow on your machine
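
For batch pipelines, a short script can generate one CLI call per PDF. The sketch below only builds and prints the commands. The `mineru -p <input> -o <output> -b <backend>` shape and the `vlm-vllm-engine` backend name are assumptions on my part, so confirm them against `mineru --help` for your installed version before running anything.

```python
from pathlib import Path


def build_commands(input_dir: Path, output_dir: Path, backend: str) -> list[list[str]]:
    """Build one argv list per PDF found directly inside input_dir, sorted by name."""
    commands = []
    for pdf in sorted(input_dir.glob("*.pdf")):
        # Assumed CLI shape; verify the flags with the tool's own help output.
        commands.append(
            ["mineru", "-p", str(pdf), "-o", str(output_dir / pdf.stem), "-b", backend]
        )
    return commands


if __name__ == "__main__":
    # "pdfs" and "out" are placeholder folder names.
    for argv in build_commands(Path("pdfs"), Path("out"), backend="vlm-vllm-engine"):
        print(" ".join(argv))  # to execute: subprocess.run(argv, check=True)
```

Giving each document its own output folder keeps the Markdown, JSON, and extracted images of different PDFs from overwriting each other.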

Performance Notes and Tips

GPU memory
- Complex, multi-page PDFs can require substantial VRAM; plan for peaks of around 25 GB or slightly more on larger runs
CPU runs
- Fully supported but slower; useful for small jobs or environments without GPUs
Model downloads
- Allow the first run to complete all downloads before testing large documents
Noise removal
- Default behavior removes headers/footers; you can configure output to keep them if your use case requires
Exports
- Favor Markdown or JSON for downstream processing
- Use HTML tables and LaTeX formulas when precision is required

Practical results from testing

Once installed and launched with vLLM, MinerU 2.5 processed documents quickly and produced high-quality, structure-preserving outputs. Below are condensed observations from a range of documents and languages, in the order I tested them:

English technical/business documents
- Accurate Markdown and text exports with correct tables and images
- Multi-page documents processed in seconds
- VRAM usage peaked a little over 25 GB on the GPU during intensive runs
Structured spec sheets
- Reliable detection of tables and images
- Clean Markdown plus HTML tables, ready for reuse
Image inputs
- Images embedded in PDFs extracted correctly
- Object and layout detection performed well
Arabic
- Region marking worked, but Markdown/text extraction was not usable
Chinese
- Strong output quality with clean structure
- Headers and footers removed by default; configuration options exist to keep them
- Table and formula recognition worked as expected
German
- Good extraction; tabular data cleanly converted
Hindi
- Region marking worked, but text extraction did not produce usable output
Indonesian
- Mixed outcomes; some content extracted correctly, some inconsistencies remained
Swedish
- Visual match with original documents was strong
- Layout and flow preserved
Urdu
- Rendered as image with no usable text extraction
Formulas and math-heavy sections
- LaTeX outputs looked correct and reproducible
- Complex formula regions were detected and converted cleanly
Charts and academic papers
- Images and figures extracted properly
- Overall document structure and labeling showed noticeable improvement over prior runs

Overall, English and Chinese were particularly strong, and the other European languages tested (German and Swedish) performed well. Arabic, Hindi, and Urdu were not reliable for text extraction in these tests. Indonesian was mixed. Speed and consistency were notably better with vLLM than with earlier sglang-based setups.
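
Because failures like the Arabic, Hindi, and Urdu cases can be silent (regions detected, no usable text), a cheap sanity check on the exported text is worth adding to any pipeline. The heuristic below is my own sketch, not part of MinerU: it measures what share of the letters belong to the expected Unicode script, with an arbitrary 0.5 threshold.

```python
import unicodedata


def script_share(text: str, script: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with `script`."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if unicodedata.name(ch, "").startswith(script))
    return hits / len(letters)


def looks_extracted(text: str, script: str, threshold: float = 0.5) -> bool:
    """Heuristic: enough letters in the expected script suggests real text came out."""
    return script_share(text, script) >= threshold


print(looks_extracted("مرحبا بالعالم", "ARABIC"))
print(looks_extracted("", "DEVANAGARI"))
print(looks_extracted("Hello world", "LATIN"))
```

Urdu is written in the Arabic script, so `"ARABIC"` is the prefix to use for it as well. An empty or image-only export scores 0.0 and gets flagged for manual review.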

FAQs

Can I run MinerU without a GPU?

Yes. CPU runs work, though they’re slower. For heavier workloads or large multi-page PDFs, a GPU is recommended.

How fast is MinerU 2.5 with vLLM?

It’s significantly faster than prior configurations using sglang. In testing, many multi-page documents were processed in seconds. Actual speed depends on document complexity, hardware, and batch settings.

What output formats are supported?

Markdown and JSON for general content
HTML for tables
LaTeX for formulas
Plain text when you need quick extraction
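
Since tables arrive as HTML, turning them into rows for a dataset takes only the standard library. This is a minimal sketch that assumes simple tables built from `tr`, `td`, and `th` tags; it ignores `rowspan` and `colspan`, and the class and function names are my own.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of an HTML table as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:
                self.rows.append([])  # tolerate a cell that appears without a <tr>
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_rows(html: str) -> list[list[str]]:
    """Parse one HTML table string into rows of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


html = "<table><tr><th>Language</th><th>Result</th></tr><tr><td>German</td><td>Good</td></tr></table>"
for row in table_to_rows(html):
    print(row)
```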

Does MinerU handle scanned PDFs?

Yes. MinerU includes OCR across 84 languages for scanned or garbled PDFs. Quality depends on scan resolution and language.

Can I keep headers and footers?

By default, MinerU removes repeated noise (headers, footers, page numbers, footnotes) to keep flows coherent. You can configure the system to keep them if needed.
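
To illustrate the idea behind this kind of cleanup, the sketch below flags first and last lines that repeat verbatim across most pages and drops them unless told to keep them. It is a simplified stand-in under my own assumptions, not MinerU's actual method or configuration, and it will not catch page numbers, which differ from page to page.

```python
from collections import Counter


def find_repeated_lines(pages: list[list[str]], min_share: float = 0.6) -> set[str]:
    """Return first/last lines that recur on at least min_share of the pages."""
    counts = Counter()
    for lines in pages:
        if lines:
            # A set avoids double counting when a page has only one line.
            counts.update({lines[0].strip(), lines[-1].strip()})
    needed = min_share * len(pages)
    return {line for line, n in counts.items() if line and n >= needed}


def strip_repeated(pages: list[list[str]], keep_noise: bool = False) -> list[list[str]]:
    """Drop repeated header/footer lines from each page unless keep_noise is set."""
    if keep_noise:
        return pages
    noise = find_repeated_lines(pages)
    cleaned = []
    for lines in pages:
        body = list(lines)
        if body and body[0].strip() in noise:
            body = body[1:]
        if body and body[-1].strip() in noise:
            body = body[:-1]
        cleaned.append(body)
    return cleaned


pages = [
    ["ACME Report", "Intro text", "Confidential"],
    ["ACME Report", "More text", "Confidential"],
    ["ACME Report", "Final text", "Confidential"],
]
print(strip_repeated(pages))
```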

What languages work best?

From testing:

Strong: English, Chinese
Good: German, Swedish
Mixed: Indonesian
Weak/Not extracted: Arabic, Hindi, Urdu

Does MinerU support complex layouts?

Yes. It handles single- and multi-column documents, figures, tables, captions, and multi-page flows while preserving structure.

Can I use MinerU via CLI or programmatically?

Yes. MinerU supports a CLI for batch workflows and an SDK for Python integration. There is also a hosted option via client, as well as full local-only operation.

What are the model components involved?

MinerU initializes layout detection, OCR, and a vision-language model. At runtime you may see components like YOLO for region detection, Paddle-based OCR, and a vLLM-initialized model. The runtime output also hints at a Qwen-based model in the stack, and the improvements appear to be associated with work on the Intern family of models.

How much VRAM do I need?

It varies by document size and complexity. In testing with large, multi-page documents, VRAM peaks were slightly above 25 GB on a 48 GB GPU. Smaller cases need much less. CPU runs avoid VRAM constraints but are slower.

Conclusion

MinerU 2.5 with vLLM is a strong choice for converting PDFs into structured, reusable data. It preserves document layout, produces clean Markdown/JSON, and includes accurate table and formula conversions (HTML and LaTeX). With OCR across 84 languages, it can recover text from scanned or noisy pages.

The vLLM integration brings a clear speed boost compared with earlier setups, reducing friction for production workflows. In testing, English and Chinese were particularly strong, and German and Swedish were solid. Arabic, Hindi, and Urdu did not yield usable text; Indonesian was mixed. For many technical, business, and academic documents, MinerU’s structure retention and export fidelity are exactly what’s needed for downstream processing and dataset creation.

You can run MinerU on Linux, Windows, or macOS, on CPU or GPU. The project offers a web UI, CLI, SDK, and a hosted option. If you want local-only processing, the fully private workflow is straightforward. With vLLM in place, MinerU 2.5 is a practical tool for fast, reliable, structure-aware PDF extraction.