Sonu Sahani logo
Sonusahani.com
Discover GLM-OCR: The New Lightweight OCR AI by GLM

Discover GLM-OCR: The New Lightweight OCR AI by GLM

0 views
5 min read
#AI

Team GLM has released a practical, easy to use, lightweight OCR model. I installed it, ran a series of tests, and assessed its speed, accuracy, resource usage, and limitations.

Discover GLM-OCR: The New Lightweight OCR AI by GLM

It looks like a powerful small pocket rocket designed to read and understand text in complex documents like PDFs with tables, formulas, and mixed layouts. The model is practical. It's open source and can be deployed through VLM, SG Lang, Ollama or Python. I think VLM and SG langola support is still not there but it will be today or tomorrow.

You can use it for two main tasks:

  • Extract raw content like text, formulas, and tables from documents.
  • Pull structured information like names and dates from an ID card. Any named entity recognition by providing a JSON template. I test both of these use cases below.

Its efficient design makes it fast and affordable to run even in high demand production environments.

Environment and setup for Discover GLM-OCR: The New Lightweight OCR AI by GLM

System used

  • OS: Ubuntu
  • GPU: Nvidia RTX 6000 with 48 GB VRAM

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 12s

Set up and run GLM-OCR

I installed the prerequisites, then wrote a small script:

  • Import the installed libraries.
  • Specify the model.
  • Download the model and load it onto a GPU.
  • Use a simple prompt template that points to a local image.
  • Set the task to text recognition and process the input with hyperparameters.
  • Decode the model output and print the result.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 118s

Performance and resource usage with Discover GLM-OCR: The New Lightweight OCR AI by GLM

  • Model size: 2.65 GB.
  • Inference speed: very quick. The model came back with responses in a jiffy.
  • VRAM consumption: about 2.5 GB during inference. You can easily run it on a decent modern CPU.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 162s

Tests and results using Discover GLM-OCR: The New Lightweight OCR AI by GLM

English text recognition from an image

I pointed the prompt to a simple English text image and asked for text recognition. It returned the output very quickly.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 284s

Handwritten letter with crossed-out words

I tested a handwritten letter with some crossed out words.

  • It read the text well.
  • It did not hallucinate crossed-out words, which is good.
  • It captured small details like dots.
  • It missed an apostrophe in one place.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 243s

Language support limitations

From the model card, this is a bilingual model for English and Chinese. I tested Hindi, Polish, French, and Arabic. As expected, it did not support these other languages. At times it hung or printed gibberish. This is a big limitation that needs to be fixed because the competition in OCR has gone a long way up.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 289s

Table recognition on an invoice

I provided an invoice image and ran table recognition.

  • It detected just the table.
  • It returned all values correctly in my checks.
  • Accuracy of text and table recognition is quite good given the model size.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 335s

Formula extraction to LaTeX

I tested formula extraction. It returned the formula as LaTeX. It was spot on.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 376s

Structured information extraction with a JSON schema

I used a predefined JSON schema and asked the model to extract fields from the same invoice.

  • The output was a properly formatted JSON object containing the extracted invoice data.
  • This shows the model's ability to understand document structure and extract specific fields into a machine-readable format.
  • I am very impressed by the speed.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 404s

Architecture of Discover GLM-OCR: The New Lightweight OCR AI by GLM

Components

  • Visual encoder: they are using cog with visual encoder, pre-trained on large-scale image text data to process and understand document images.
  • Cross-modal connector: a very lightweight connector that bridges vision and language by downsampling visual tokens.
  • Text decoder: their own GLM.5 billion as a decoder generates text output. It does a very fine job.

Screenshot from Discover GLM-OCR: The New Lightweight OCR AI by GLM at 458s

Training

  • The model is trained using multi-token prediction loss and stable full tasks reinforcement learning, which improves both training efficiency and recognition accuracy.

Benchmarks

I haven't got any benchmarking information yet. I don't see much on their model card because the model was just released.

Final thoughts on Discover GLM-OCR: The New Lightweight OCR AI by GLM

GLM-OCR is practical, small, fast, and accurate on text, tables, formulas, and structured extraction. The bilingual scope is a clear limitation right now, and broader multilingual support would make it much more useful. Even so, it already performs well for English and Chinese documents, and I expect it to keep evolving.

sonuai.dev

Sonu Sahani

AI Engineer & Full Stack Developer. Passionate about building AI-powered solutions.

Related Posts