
DeepSeek-OCR Model Processes Complex Text Using 20x Fewer Tokens



This week Alibaba Cloud announced an artificial intelligence (AI) training optimization that reduces NVIDIA GPU usage by up to 82%. Today, DeepSeek launched DeepSeek-OCR, a model capable of processing large volumes of complex text while using up to 20x fewer tokens than conventional text processing.


With DeepSeek-OCR, larger language models (LLMs) can be trained without increasing the computing costs of AI data centers. This is achieved by processing data as images instead of text: the researchers found that a document compressed into an image requires fewer tokens to process than the same document represented as text.
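To make the token arithmetic concrete, here is a minimal Python sketch of the idea. It assumes a generic text tokenizer (tiktoken) and a hypothetical fixed vision-token budget per rendered page; neither reflects DeepSeek-OCR's actual encoder or its real numbers.

```python
# Sketch of "optical compression": the same document costs N tokens as text,
# but only a roughly fixed budget of vision tokens once rendered as a page
# image. Page size and the vision-token budget below are illustrative
# assumptions, not DeepSeek-OCR's actual configuration.
import tiktoken
from PIL import Image, ImageDraw

VISION_TOKENS_PER_PAGE = 256  # assumed fixed budget produced by a vision encoder


def render_page(text: str, size=(1024, 1024)) -> Image.Image:
    """Render plain text onto a blank page image (stand-in for a scanned doc)."""
    page = Image.new("RGB", size, "white")
    ImageDraw.Draw(page).multiline_text((20, 20), text, fill="black")
    return page


def compare_token_costs(text: str) -> None:
    enc = tiktoken.get_encoding("cl100k_base")  # generic text tokenizer
    text_tokens = len(enc.encode(text))
    page = render_page(text)
    ratio = text_tokens / VISION_TOKENS_PER_PAGE
    print(f"text tokens:   {text_tokens}")
    print(f"vision tokens: {VISION_TOKENS_PER_PAGE} (fixed per {page.size} page)")
    print(f"compression:   ~{ratio:.1f}x fewer tokens as an image")


if __name__ == "__main__":
    compare_token_costs("Lorem ipsum dolor sit amet. " * 200)
```

The point of the sketch is only that a dense page rendered as an image is encoded into a roughly fixed number of vision tokens regardless of how much text it contains, which is where the reported savings come from.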


DeepSeek-OCR can read not only text but also charts, chemical equations, simple geometric figures and natural images. In real-world use, a single A100-40G graphics card can support the generation of more than 200,000 pages of training data per day for both large language models (LLMs) and vision-language models (VLMs).


Today’s announcement highlights a different paradigm for training AI models in China. While the West relies on ever more powerful and faster AI chips, China, currently restricted from accessing the most advanced chips, is developing more efficient training methods that work on less powerful hardware.


DeepSeek-OCR was publicly launched today and is available on GitHub and Hugging Face.
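For readers who want to try the release, below is a minimal loading sketch using the Hugging Face transformers library. The repository ID and the use of trust_remote_code are assumptions based on how such releases are typically published; the OCR-specific inference API is defined by the model's own repository, so consult its model card for actual usage.

```python
# Minimal sketch: loading DeepSeek-OCR from the Hugging Face Hub.
# The repository ID below is an assumption; check the official DeepSeek
# GitHub / Hugging Face pages for the exact name and inference instructions.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-OCR"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).eval()

# How page images are passed in for OCR is determined by the repository's
# custom model code, not by a standard transformers pipeline.
```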
