Table Of Content
- Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Environment and setup for Discover GLM-OCR: The New Lightweight OCR AI by GLM
- System used
- Set up and run GLM-OCR
- Performance and resource usage with Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Tests and results using Discover GLM-OCR: The New Lightweight OCR AI by GLM
- English text recognition from an image
- Handwritten letter with crossed-out words
- Language support limitations
- Table recognition on an invoice
- Formula extraction to LaTeX
- Structured information extraction with a JSON schema
- Architecture of Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Components
- Training
- Benchmarks
- Final thoughts on Discover GLM-OCR: The New Lightweight OCR AI by GLM

Discover GLM-OCR: The New Lightweight OCR AI by GLM
Table Of Content
- Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Environment and setup for Discover GLM-OCR: The New Lightweight OCR AI by GLM
- System used
- Set up and run GLM-OCR
- Performance and resource usage with Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Tests and results using Discover GLM-OCR: The New Lightweight OCR AI by GLM
- English text recognition from an image
- Handwritten letter with crossed-out words
- Language support limitations
- Table recognition on an invoice
- Formula extraction to LaTeX
- Structured information extraction with a JSON schema
- Architecture of Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Components
- Training
- Benchmarks
- Final thoughts on Discover GLM-OCR: The New Lightweight OCR AI by GLM
Team GLM has released a practical, easy to use, lightweight OCR model. I installed it, ran a series of tests, and assessed its speed, accuracy, resource usage, and limitations.
Discover GLM-OCR: The New Lightweight OCR AI by GLM
It looks like a powerful small pocket rocket designed to read and understand text in complex documents like PDFs with tables, formulas, and mixed layouts. The model is practical. It's open source and can be deployed through VLM, SG Lang, Ollama or Python. I think VLM and SG langola support is still not there but it will be today or tomorrow.
You can use it for two main tasks:
- Extract raw content like text, formulas, and tables from documents.
- Pull structured information like names and dates from an ID card. Any named entity recognition by providing a JSON template. I test both of these use cases below.
Its efficient design makes it fast and affordable to run even in high demand production environments.
Environment and setup for Discover GLM-OCR: The New Lightweight OCR AI by GLM
System used
- OS: Ubuntu
- GPU: Nvidia RTX 6000 with 48 GB VRAM

Set up and run GLM-OCR
I installed the prerequisites, then wrote a small script:
- Import the installed libraries.
- Specify the model.
- Download the model and load it onto a GPU.
- Use a simple prompt template that points to a local image.
- Set the task to text recognition and process the input with hyperparameters.
- Decode the model output and print the result.

Performance and resource usage with Discover GLM-OCR: The New Lightweight OCR AI by GLM
- Model size: 2.65 GB.
- Inference speed: very quick. The model came back with responses in a jiffy.
- VRAM consumption: about 2.5 GB during inference. You can easily run it on a decent modern CPU.

Tests and results using Discover GLM-OCR: The New Lightweight OCR AI by GLM
English text recognition from an image
I pointed the prompt to a simple English text image and asked for text recognition. It returned the output very quickly.

Handwritten letter with crossed-out words
I tested a handwritten letter with some crossed out words.
- It read the text well.
- It did not hallucinate crossed-out words, which is good.
- It captured small details like dots.
- It missed an apostrophe in one place.

Language support limitations
From the model card, this is a bilingual model for English and Chinese. I tested Hindi, Polish, French, and Arabic. As expected, it did not support these other languages. At times it hung or printed gibberish. This is a big limitation that needs to be fixed because the competition in OCR has gone a long way up.

Table recognition on an invoice
I provided an invoice image and ran table recognition.
- It detected just the table.
- It returned all values correctly in my checks.
- Accuracy of text and table recognition is quite good given the model size.

Formula extraction to LaTeX
I tested formula extraction. It returned the formula as LaTeX. It was spot on.

Structured information extraction with a JSON schema
I used a predefined JSON schema and asked the model to extract fields from the same invoice.
- The output was a properly formatted JSON object containing the extracted invoice data.
- This shows the model's ability to understand document structure and extract specific fields into a machine-readable format.
- I am very impressed by the speed.

Architecture of Discover GLM-OCR: The New Lightweight OCR AI by GLM
Components
- Visual encoder: they are using cog with visual encoder, pre-trained on large-scale image text data to process and understand document images.
- Cross-modal connector: a very lightweight connector that bridges vision and language by downsampling visual tokens.
- Text decoder: their own GLM.5 billion as a decoder generates text output. It does a very fine job.

Training
- The model is trained using multi-token prediction loss and stable full tasks reinforcement learning, which improves both training efficiency and recognition accuracy.
Benchmarks
I haven't got any benchmarking information yet. I don't see much on their model card because the model was just released.
Final thoughts on Discover GLM-OCR: The New Lightweight OCR AI by GLM
GLM-OCR is practical, small, fast, and accurate on text, tables, formulas, and structured extraction. The bilingual scope is a clear limitation right now, and broader multilingual support would make it much more useful. Even so, it already performs well for English and Chinese documents, and I expect it to keep evolving.
Related Posts

KugelAudio Open: European Open-Source TTS That Surpasses ElevenLabs
KugelAudio Open: European Open-Source TTS That Surpasses ElevenLabs

How to Set Up Qwen3-Coder-Next and OpenClaw with llama.cpp Locally
How to Set Up Qwen3-Coder-Next and OpenClaw with llama.cpp Locally

How to Run OpenClaw Instantly Without Any Installation
How to Run OpenClaw Instantly Without Any Installation

