TechCrunch News, March 7
Mistral’s new OCR API turns any PDF document into an AI-ready Markdown file

 

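Mistral has launched a new OCR API that converts PDFs into text files. It is multimodal, detects documents that mix text and images, outputs Markdown, suits a range of use cases and, according to the company, outperforms comparable products.

🌐 Mistral OCR is a multimodal API that converts PDFs into text files. It detects mixed text-and-image content and draws bounding boxes around the graphical elements.

📋 The output is formatted in Markdown, which lets developers add links, headers and other formatting elements. Large language models rely on this format in their training data.

🛡 According to Mistral, Mistral OCR performs better than APIs from Google, Microsoft and OpenAI, and it fits a variety of use cases and data processing needs.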

Large language models work particularly well with raw text. Companies that want to create their own AI workflow know that it has become extremely important to store and index data in a clean format so that this data can be reused for AI processing.

That’s why Mistral is launching a new API today for developers who handle complex PDF documents. Mistral OCR is an optical character recognition API that can turn any PDF into a text file.

Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when there are illustrations and photos intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output.

Similarly, Mistral OCR doesn’t just output a big wall of text. The output is formatted in Markdown, a formatting syntax that developers use to add links, headers and other formatting elements to a plain text file.
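To illustrate why structured Markdown is easier to work with than a wall of text, here is a minimal Python sketch that pulls headers, links and image references out of a Markdown string. The sample text, the `summarize_markdown` helper and the file name `figure-1.png` are made up for illustration; this is not actual Mistral OCR output, and it does not call Mistral's API.

```python
import re

# Hypothetical Markdown, standing in for what an OCR step might emit.
# This is NOT real Mistral OCR output.
SAMPLE_MARKDOWN = """# Quarterly Report

Revenue grew in **Q3**. See the [appendix](https://example.com/appendix).

![figure-1](figure-1.png)

## Outlook

- Expand to new markets
- Hire engineers
"""

# ATX-style headers: one to six '#' characters, a space, then the title.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(.*)$", re.MULTILINE)
# Image references: ![alt text](target)
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")
# Links: [label](target), skipping anything preceded by '!' (an image).
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)]+)\)")


def summarize_markdown(text):
    """Return the headers, image targets and link targets found in text."""
    headers = [(len(m.group(1)), m.group(2).strip())
               for m in HEADER_RE.finditer(text)]
    images = [m.group(2) for m in IMAGE_RE.finditer(text)]
    links = [m.group(2) for m in LINK_RE.finditer(text)]
    return {"headers": headers, "images": images, "links": links}


if __name__ == "__main__":
    print(summarize_markdown(SAMPLE_MARKDOWN))
```

Because headers and image references are explicit in the syntax, a few regular expressions are enough to recover the outline of a document, which would not be possible with unformatted text.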

Large language models rely heavily on Markdown for their training data set. When you use an AI assistant, such as Mistral’s Le Chat or OpenAI’s ChatGPT, they often generate Markdown to create bullet lists, add links or put some elements in bold. Assistant apps seamlessly format the Markdown output into a rich text output.

“Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages,” Mistral co-founder and chief science officer Guillaume Lample said.

“This is a crucial step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation,” he added.

Mistral OCR is available on Mistral’s own API platform or through its cloud partners (AWS, Azure, Google Cloud Vertex, etc.). And for companies working with classified or sensitive data, Mistral also offers on-premises deployment.

According to the Paris-based AI company, Mistral OCR performs better than APIs from Google, Microsoft and OpenAI. The company has tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts or tables. It is also supposed to perform better with non-English documents.

Image Credits: Mistral

Given that Mistral OCR does one thing and one thing only, the company believes it is also faster than what’s out there. That’s not a surprise if you compare it with a multimodal large language model like GPT-4o, which also has OCR capabilities.

Mistral is also using Mistral OCR for its own AI assistant Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what’s in the document before processing the text.

Developers will also pair Mistral OCR with a RAG (retrieval-augmented generation) system so that multimodal documents can serve as input to an LLM. And there are many potential use cases. For instance, I could see law firms using it to help them sift through huge volumes of documents.
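As a rough idea of what the retrieval half of such a pipeline might look like, the Python sketch below splits a Markdown document into sections at each heading and picks the section that best matches a question. Everything here is an assumption made for illustration: the lease text is invented, `split_by_headings` and `retrieve` are hypothetical helpers, and the keyword-overlap scoring is a toy stand-in for the embedding search a real RAG system would typically use.

```python
import re

# Invented sample document, standing in for OCR output in Markdown.
DOC = """# Lease Agreement

This agreement is between the landlord and the tenant.

## Termination

Either party may terminate with 30 days written notice.

## Deposit

The tenant pays a security deposit of one month rent.
"""


def split_by_headings(markdown):
    """Split Markdown into (heading, body) chunks at each heading line."""
    chunks = []
    heading, body = "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}[ \t]+\S", line):
            # Close the previous chunk before starting a new one.
            if heading or any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


def retrieve(chunks, query, top_k=1):
    """Rank chunks by shared words with the query (toy scoring only)."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def score(chunk):
        text = (chunk[0] + " " + chunk[1]).lower()
        return len(query_words & set(re.findall(r"\w+", text)))

    return sorted(chunks, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    sections = split_by_headings(DOC)
    best = retrieve(sections, "How much notice to terminate?")[0]
    print(best[0])
    print(best[1])
```

In a full system, the selected section would then be passed to the LLM as context alongside the user's question. Splitting at headings keeps each chunk topically coherent, which is one practical benefit of getting Markdown rather than plain text out of the OCR step.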
