Introducing Dots.OCR: Revolutionary Multilingual Document Processing
Dots.OCR is a 1.7B parameter vision-language model that unifies layout detection and content recognition for multilingual document processing. Despite its compact size, it achieves state-of-the-art performance across text, tables, and reading order while supporting 100+ languages and scripts.
Dots.OCR Team
RedNote AI Lab
Revolutionary Vision-Language Architecture
Dots.OCR represents a paradigm shift in document processing technology. Unlike traditional OCR systems that rely on complex multi-model pipelines, our approach unifies layout detection and content recognition within a single 1.7B parameter vision-language model. This streamlined architecture achieves state-of-the-art performance while maintaining computational efficiency and simplicity.
Unprecedented Multilingual Capabilities
Supporting over 100 languages and scripts, Dots.OCR demonstrates robust parsing capabilities for low-resource languages where traditional OCR systems often fail. Our model achieves decisive advantages across both layout detection and content recognition on multilingual document benchmarks, making it truly universal for global document processing needs.
State-of-the-Art Performance
Dots.OCR achieves SOTA performance for text, tables, and reading order on OmniDocBench while delivering formula recognition results comparable to much larger models like Doubao-1.5 and Gemini 2.5-Pro. Despite using a compact 1.7B LLM foundation, our model consistently outperforms competitors in comprehensive benchmarks across multiple document types and languages.
Unified Architecture Advantage
By leveraging a single vision-language model, Dots.OCR offers a significantly more streamlined architecture than conventional methods that rely on complex, multi-model pipelines. Task switching is accomplished simply by altering the input prompt, demonstrating that a VLM can match dedicated detection models such as DocLayout-YOLO on layout detection.
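The prompt-switching idea above can be sketched as follows. This is a minimal illustration, not the model's actual interface: the task names and prompt strings here are hypothetical placeholders (the real prompt templates ship with the dots.ocr repository), and only the request-assembly logic is shown.

```python
# Hypothetical prompt templates illustrating how one model can serve
# multiple document-processing tasks. The actual prompts used by
# Dots.OCR are defined in its repository; these are placeholders.
TASK_PROMPTS = {
    "layout": "Detect all layout elements in this document and return their bounding boxes.",
    "ocr": "Extract all text content from this document in reading order.",
    "parse": "Parse this document: detect layout elements and recognize their content.",
}

def build_request(task: str, image_path: str) -> dict:
    """Assemble a request for the single unified model.

    Note that switching from layout detection to full parsing changes
    only the prompt, not the model or the pipeline.
    """
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; choose from {sorted(TASK_PROMPTS)}")
    return {"image": image_path, "prompt": TASK_PROMPTS[task]}
```

In a traditional pipeline, each of these tasks would route through a different specialized model; here the routing collapses into a prompt lookup.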
Efficient and Fast Performance
Built upon a compact 1.7B LLM foundation, Dots.OCR provides faster inference speeds than many other high-performing models based on larger foundations. This efficiency makes it practical for real-world deployment scenarios where both accuracy and speed are critical requirements.
Open Source and Research-Driven
Dots.OCR is available as an open-source project, enabling researchers and developers worldwide to contribute to advancing document intelligence. Our research-driven approach ensures continuous improvement and adaptation to emerging challenges in multilingual document processing. The model, code, and benchmarks are freely available to foster innovation in the OCR community.
Getting Started with Dots.OCR
Ready to experience the future of document processing? Visit our GitHub repository at github.com/rednote-hilab/dots.ocr for installation instructions, or try our live demo at dotsocr.xiaohongshu.com. Whether you're a researcher, developer, or enterprise user, Dots.OCR provides the tools you need for intelligent multilingual document processing.