Mistral OCR 4

A tangential observation: the video on the linked page wasn't what I expected. I thought Mistral was a European AI company, so I didn't expect the video to be filmed in San Francisco featuring three people who don't seem to be European.

I'm not against them being a global organization; that's wonderful. I was just surprised. I expected a Parisian office and European accents.

Not well tested. It switched all U.S. (") double quotation marks to UK-style (') single quotation marks, ignoring the source document. Useless in the US.

It'll be interesting to see how this ranks against https://github.com/baidu/Unlimited-OCR

Tested with Malayalam: normal handwriting was recognized accurately, but a slightly different style was detected as Kannada. I have samples if required; Sarvam handled the same samples with 99% accuracy, leaving one text error.

It's cheap at $4/1k, but I'm hesitant to even benchmark this one again since the previous versions were all "98% accurate based on internal benchmarks of 4 pdfs" and ended up falling short of almost everything else on the market [1].

Even in this one, they just report that OlmOCRBench and OmniDocBench have "known limitations" and that's why they report flagship numbers from their internal benchmark.

[1] https://getomni.ai/blog/benchmarking-open-source-models-for-...

"A note on out-of-scope use. OCR 4 is a document-understanding model, not a decision-maker. It is not intended for medical diagnosis, legal advice or judgment, high-stakes financial decisions, safety-critical systems, real-time/latency-sensitive processing, or non-document inputs (raw audio, video, etc.)."

Can't wait for the "oh so innovative" manager who will suggest during the next meeting "Ok... but what if WE used it for high-stakes financial decisions on non-document inputs like a photo from my phone?"

I guarantee you somebody on HN is going to comment about this "idea" next week.

The comparisons rank it against GPT and Gemini but not Claude. Is Claude's vision support simply not competitive when it comes to OCR tasks?

Little on differences other than bounding boxes and double the price compared to their previous OCR v3 model from December - https://mistral.ai/news/mistral-ocr-3/ - other benchmarks were used back then.

This runs for free on CPU https://github.com/kouhxp/textsnap

Recently I tried OCR with Opus 4.8. (I know, not technically the right tool for the job.) All I needed to do was extract dates from receipts. It got about 20% of the dates wrong yet rated all as “high confidence”.

Should probably have tried a more OCR-specific model.

I was processing 55 year old paper files, most of them severely degraded, with its predecessor model. I was very impressed! I also tried Abbyy Finereader but it didn't even come close in my experience.

Does anyone know of OCR benchmarks that include hand-written documents? I'm currently using Gemini pro 3 for this, and error rates are quite good, but it's a little bit pricey, and I'd be interested in a cheaper model that could perform as well, but almost all the OCR benchmarks I'm aware of (and I believe all the ones included in this announcement) are about printed/typeset text.

Is there a complete list of the languages they support, and benchmarks by language, instead of just "Rare Languages"?

This has been a niche where Mistral has actually been successful. Btw, Hindi and Japanese are bucketed in "Rare Languages," which is odd.

I wonder how it compares to reducto, pulse, extendai.

Way too expensive. Google Vision OCR (which they failed to compare against) is $1.50 per 1k pages, vs. $4 from Mistral.

Are there benchmarks for how this performs on charts, or maybe more accurately, plots? I've yet to find a model that can digitize a plot into X,Y points with some accuracy in my use case of digitizing old datasheets.

Is there something wrong with their certificate? Chromium is saying https isn't valid

Not opensource right?

1000 pages for $4? damn how does it compare to llama parse I wonder

Do these models (this one or its competitors) do handwriting recognition?

Starting the y-axis from 50 and 95 is a bit misleading.

After paying for Mistral and using it for a while I genuinely hated it. It's a productivity black hole and can't realistically compete with anyone. I chose it only because it was European, but no. I'd rather let my one year subscription go to waste than use anything 'Mistral'.

Unfortunately Europeans are terrible customers for making money. They ask a lot of questions and they're very stingy with their wallets. Americans on the other hand ...

~Any borderline-large European tech company will have an office on the US west coast, for sales if nothing else. And probably sales engineering. The timezone difference is eight to ten hours; there is really no way around it.

(I did work for one which had an office in Vancouver, instead; same tz.)

To the best of my knowledge, most of the founding team started their careers in the US (Meta, etc.) and their primary investors are US VCs. In that regard, they smartly benefit on both sides: US funding and European brains.

There is even, like, an American flag flying high in the background.

Why would anybody do that? You would simply get terrible results compared to dozens of other more capable models. It's for converting to text, not answering questions. It just seems like you need some sort of weird angle to bring out an anti-AI stance.

“I delegated critical financial decisions to my OCR software, and you won’t believe what happened next.”

I think until Fable, Claude's vision was significantly worse than GPT and Gemini in my personal experience. I eval almost every vision model since I work on a screenshot-to-code conversion project: https://github.com/abi/screenshot-to-code.

> All I needed to do was extract dates from receipts

Was this... not basically a solved problem like 30 years ago? I'm pretty sure the shareware OCR tool that came with a black and white scanner I had at one point would do better than 20% wrong.

Opus is very good at OCR. Way better than the small 1-4B VLMs. If Opus failed, most likely those smaller models will fail as well.

I do not believe this story.

Opus 4.8 scanned hundreds of PDFs for me recently with the worst handwriting imaginable. 100% successful, other than one record where even I could not figure out what was written.

I used Abbyy Finereader for several years. I loved it. I completed some large projects with it. Modern VLMs put classic FineReader to shame for processing low-resolution/degraded/non-standard text.

I'm personally using the small Qwen 3.5 models. If you have an OCR problem, Mistral OCR 4 is probably great. Open weights models that you can run on a laptop may also work great.

I read that as "languages under-represented in the training set".

Interesting: an equivalent Azure Document Intelligence service (scanning with layout) is $10/1k.

Looks good to me on both Brave (on Android) and Firefox (on Windows 11). Let's see what SSL Labs says (it is running now).

https://www.ssllabs.com/ssltest/analyze.html?d=mistral.ai&la...

Looks good so far, A+ on IPv4 as well as IPv6.

Edit: I also asked Gemini 3.1 Pro to analyze the certificate and it looks good

It looks like you have shared an `about:certificate` URL containing a chain of three Base64-encoded X.509 TLS/SSL certificates. This specific chain is used to secure connections to *mistral.ai*.

Here is the decoded breakdown of the certificate chain you provided:

## Certificate Chain Overview

This is a standard three-tier certificate chain issued by Google Trust Services for the Mistral AI domain.

---

### 1. Leaf Certificate (End-Entity)

This is the specific certificate issued to the website to verify its identity and encrypt traffic.

* *Subject (Common Name):* `mistral.ai`
* *Subject Alternative Names (SANs):* `mistral.ai`, `workers.mistral.ai`
* *Issuer:* WE1 (Google Trust Services)
* *Valid From:* June 13, 2026
* *Valid To:* September 11, 2026
* *Key Type:* Elliptic Curve (ECDSA)

### 2. Intermediate Certificate

This certificate acts as a bridge between the website's certificate and the trusted Root CA.

* *Subject:* WE1 (Google Trust Services)
* *Issuer:* GTS Root R4 (Google Trust Services LLC)
* *Valid From:* December 13, 2023
* *Valid To:* February 20, 2029
* *Key Type:* Elliptic Curve (ECDSA)

### 3. Root Certificate

This is the foundational trust anchor pre-installed in browsers and operating systems.

* *Subject:* GTS Root R4 (Google Trust Services LLC)
* *Issuer:* GTS Root R4 (Self-signed)
* *Valid From:* June 22, 2016
* *Valid To:* June 22, 2036
* *Key Type:* Elliptic Curve (ECDSA)

The weights do not appear to be downloadable, "contact sales for self hosting"

I was just using infinity parser 2 (flash, to be fair) for pennies self-hosted to run through thousands of pages of documents with remarkable confidence. I decided to use https://huggingface.co/datasets/allenai/olmOCR-bench to determine what was the best OCR tool, yesterday, but I've got no idea what the best is now. What is the dominant OCR eval right now? Between Baidu and Mistral this morning, I wonder if there's a new tool to switch to..

(jerry from llamaindex here) we're gonna benchmark on ParseBench and report the results!

Or Apple's local OCR/Vision models?

Yes, we've been using Transkribus for this extensively. My wife is a historian who spends quite a bit of time sorting through old letters and diaries, and it has been a considerable quality of life improvement.

Even if you are able to read someone's scratches, having a model to do the bulk lifting saves your eyes a lot of squinting. One thing that makes Transkribus useful for research vs a chat interface is that it can line up its interpretation alongside the original image so you can examine its work directly.

In the sense that you can get similarity scores for individual characters referenced against a known database of characters written by various individuals. You can get stylometry scores out of small LLMs that do demographic segmentation based on writing style using the same methods.

They won't have the capacity to be fed an image of handwritten text and say "Ahh, this is a note written by Winston Churchill!". You could very easily use these models and your agent framework of choice, like Hermes, the Segment Anything models, and other foss tooling to build a dedicated, specialist handwriting recognition system. Or facial recognition, or fingerprint recognition, etc - these sorts of things can be done very procedurally, without a lot of interpretive AI.

Yes, we have successfully used Mistral OCR for digitizing handwritten forms. You always have a low percentage that needs human review and adjustment, but overall Mistral has been highly accurate (their price is amazing, too).

If you mean handwriting to text then yes

Opposite advice. It's very useful to me for dev and general tasks.

Been using Claude in parallel; it's better but not that much, just 10x (or 100x?) more expensive.

Mistral's coding models aren't on par with current SOTA US and Chinese models if that's what you're referring to, but I rather like their OCR models.

> After paying for Mistral and using it for a while I genuinely hated it

For OCR?

Same, I got a refund 3 days later. It is unusable.

The armies of people desperate to defend Mistral, scouring the internet for any of the hundreds of negative posts made about it daily, are pathetic. There's a reason it needs 'fanboys' and 'defenders'... it sucks. I'd have loved to use a European alternative, but Europeans need to get serious and actually offer an alternative that has value other than "it's trash, but it has a Made in Europe badge".

what did you use it for and when?

Mistral just hired a Seattle-based former Amazon/Google VP¹ as CMO, so it seems their US-based presence is growing.

¹ The one locally famous for being sued by Amazon over a non-compete back when non-competes were a thing: https://www.geekwire.com/2020/amazon-sues-former-aws-marketi...

And US users spend much more than their EU counterparts.

I think his comment is referring to a scenario where a decision is made on financial numbers that are misrecognized. E.g. 9.0% actual is OCR’d as 90%

How long have you been testing this? Have you noted a large improvement? I tested Opus for this quite a while ago (maybe 4.5? Whatever was out about a year ago), and it performed quite poorly on my use case.

I do not believe this story, because of the message I just posted above.

That's not really productive lol. I'm glad it worked for you, but these models are non-deterministic and 'YMMV' very much applies everywhere. I had it parse receipts (in fairness, in variable lighting), all taken with iPhone cameras in the past year. And yeah, not a great job: about 20% failed to get the date correct. (Not outrageously wrong, e.g. 05/20/2026 becomes 05/23/2026.)

YMMV, glad it worked for you.

I believe it. Makes me curious what your prompt was that got such a good result out of Opus.

Thanks, I'm going to have to check what's going on with my setup then.

I think OP meant converting handwriting to text, not identifying a person based on their handwriting style! (but that sounds quite interesting)

Yep that's what I mean, thanks :)

Are you sure you weren't using Sonnet or a low-effort reasoning mode?

I have put together an internal benchmark on 1000s of business documents with weird tables, structure, etc. that I run on every relevant model release. Opus 4.8 performs very very well. But it is obviously overkill for the task (and expensive at doing so). I just wanted to respond to the OP.

Sure, well for me it isn't. It has been awful for even toy tasks that opencode's free plan did without an issue. The general sentiment about it is that it is really bad. I wish I knew before paying.

I'm assuming that the reason I didn't have good success rate is because it was not scanned documents, but photographs, and lighting conditions weren't always ideal. I think scanned business documents are a happy-case scenario in a way. (obv, you seem to run it against some complex documents, so that's impressive)

I’m curious what your findings are for the best model for your use case

Today, we're releasing Mistral OCR 4, featuring bounding boxes, block classification, and inline confidence scores alongside extracted text. The model supports 170 languages across 10 language groups, runs in a single container for fully self-hosted deployments, and serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines. OCR 4 is a small, focused model, and this post covers what's new, how it performs on public and internal benchmarks, the known limitations of those benchmarks, and guidance on when to use the model API versus Document AI.

Highlights

Breakthrough performance. Independent annotators prefer OCR 4 over every leading OCR and document-AI system tested, with win rates averaging 72%, alongside the top overall score on OlmOCRBench (85.20). See Benchmarks below for methodology and known scoring limitations.
Segmentation, not just text. Alongside the extracted text, OCR 4 returns bounding boxes, typed-block classification (titles, tables, equations, signatures, and more), and inline confidence scores. Bounding boxes, our most-requested capability, localize text for in-context highlighting and reliable data pipelines. At the same time, block types and confidence scores drive source-grounded citations, redactions, and human-in-the-loop verification.
Integrated with Mistral Search Toolkit (public preview). OCR 4 is an ingestion component of Search Toolkit, Mistral's open-source, composable search framework, announced at the AI Now Summit. Its structured output supplies citation-ready inputs to the toolkit's ingestion, retrieval, and evaluation workflow for RAG and enterprise search.
Multilingual coverage. Support for 170 languages across 10 language groups, with measurable gains on rare and low-resource languages where several competing systems degrade.
Run on your own infrastructure. OCR 4 is compact enough to deploy on a single container, keeping document data in your environment for residency, sovereignty, and compliance, while supporting cost-efficient, high-throughput batch processing. Self-managed deployment is available to enterprise customers.

Overview

Mistral OCR 4 extracts and structures content from a wide range of documents. Where previous generations focused on converting a page into clean text and tables, OCR 4 returns a structured representation of the document. Each block is localized with a bounding box and classified by type, and inline confidence scores are generated per page and per word. Downstream systems therefore have access not only to what the document says but also to where each element sits, what role it plays, and how confident the model is in each region.

This structure supports several downstream workloads:

Semantic chunking for RAG: clean, classified blocks become better retrieval units.
Structural primitives for agents: agents move from reading documents to acting on them (form filling, invoice processing, compliance checks).
Structured content for connectors: consistent, typed output for ingestion and indexing pipelines.
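
As a rough illustration of the first workload, the sketch below groups classified blocks into retrieval chunks, starting a new chunk at each title. The `Block` type, its field names, and the `"title"` label are assumptions made for this example and are not the OCR 4 response schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical shape: not the actual OCR 4 response schema.
    block_type: str  # e.g. "title", "text", "table"
    text: str


def chunk_by_title(blocks: list[Block]) -> list[str]:
    """Group blocks into chunks, starting a new chunk at every title block."""
    chunks: list[list[str]] = []
    for block in blocks:
        # Open a new chunk on a title, or if no chunk exists yet.
        if block.block_type == "title" or not chunks:
            chunks.append([])
        chunks[-1].append(block.text)
    return ["\n".join(parts) for parts in chunks]


blocks = [
    Block("title", "Payment terms"),
    Block("text", "Invoices are due within 30 days."),
    Block("title", "Late fees"),
    Block("text", "A 2% monthly fee applies."),
]
chunks = chunk_by_title(blocks)
print(len(chunks))  # 2
```

A real pipeline would likely also cap chunk length and keep tables intact; this only shows why typed blocks make the split points easier to find than in flat text.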

OCR 4 accepts common enterprise formats, including PDF, DOC, PPT, and OpenDocument, and supports 170 languages across 10 language groups, including rare and low-resource languages that many systems handle poorly. As a compact model deployable in a single container, it is suited to both cost-sensitive and high-volume deployments. It can run fully self-hosted, allowing organizations with data-sovereignty requirements to keep document data within their own infrastructure.

Developers integrate the model via API, and teams can use Document AI in Mistral Studio for an application-level, no-code path to the same engine. Mistral OCR 4 through the API is priced at $4 per 1,000 pages, with a 50% Batch-API discount, reducing the cost to $2 per 1,000 pages. Document AI is priced at $5 per 1,000 pages.
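
For a quick sense of scale, the list prices above can be turned into a small estimator. This is a minimal sketch that assumes strictly linear per-page pricing with no minimums, volume tiers, or taxes; the tier keys are names invented for this example.

```python
# Per-1,000-page list prices quoted in the announcement (USD).
PRICE_PER_1K = {
    "ocr_api": 4.00,
    "ocr_batch": 2.00,    # OCR API with the 50% Batch-API discount
    "document_ai": 5.00,
}


def cost_usd(pages: int, tier: str) -> float:
    """Return the list-price cost in USD for processing `pages` pages."""
    return pages / 1000 * PRICE_PER_1K[tier]


for tier in PRICE_PER_1K:
    print(f"{tier}: ${cost_usd(250_000, tier):,.2f}")
# ocr_api: $1,000.00
# ocr_batch: $500.00
# document_ai: $1,250.00
```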

Benchmarks

“We benchmarked Mistral OCR 4 against the leading agentic document parsers across a chart and figure dense financial QA dataset and reached equivalent accuracy at roughly 8x lower cost and 17x lower latency. For production use cases at scale, that delta compounds fast."
- Aidan Donohue, AI Engineer, Rogo

To evaluate OCR 4, we compared it against leading AI-native OCR models, frontier general-purpose models, enterprise document services, and our own Mistral OCR 3.

Human Preference Evaluations

Automated benchmarks carry the scoring artifacts described below, so we complemented them with a head-to-head human evaluation on documents chosen to reflect real usage. We assembled 600+ documents across 12+ languages, sourced from third-party vendors to represent real industry use cases, and asked independent annotators to blindly rank each competitor's output against OCR 4's, document by document.

Annotators preferred OCR 4 in the majority of documents across all systems tested. Because these are human judgments on realistic documents rather than string comparisons against fixed references, they sidestep much of the annotation and formatting noise that affects automated scores.

Overall Performance

“Mistral OCR is roughly 4x faster per page than our incumbent provider, an impressive result for the high-volume docketing workflows where speed is critical to managing our customers' IP timelines.”
- Ivan Mihailov, AI engineer, Anaqua

In addition to placing first in our human preference evaluations, OCR 4 achieves the top overall score among the models we tested on the public OlmOCRBench (85.20) and leads our internal Crawl Multilingual evaluation (0.98), ahead of both AI-native and enterprise solutions.

On OmniDocBench, OCR 4 achieves a score of 93.07. We report this figure with a caveat: both OlmOCRBench and OmniDocBench have known limitations in how they score certain outputs, and a single aggregate number can both understate and overstate real-world performance.

When we audited the mismatches behind our scores, most were not model errors but artifacts of how the benchmarks compare output. The recurring categories:

Ground-truth errors. Some reference annotations are themselves incorrect: missing or extra text, transcriptions of redacted regions, or typos (for example, a cited author's name misspelled in the reference but read correctly by the model from the page). The output matches the source document, yet it is still marked wrong.
Equivalent math notation. Different LaTeX that renders identically is counted as a mismatch. The rendered equation is correct; the string comparison is not.
Equation segmentation. Whether an expression is emitted as a single equation or split into several inline fragments affects the match, even when the rendered content is identical, because the matcher cannot align the pieces.
Multi-column reading order. Words split across a column boundary (for example, "certifi-cates") and column-ordering assumptions cause correct extractions to be scored as reading-order failures.
Block-type attribution. The benchmark does not expect headers or footers in the output. To resolve this, we strip headers and footers from our output before scoring. But the test then checks for a string that also happens to be the title of the page, which should actually be present, and flags it incorrectly.
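
The multi-column case is easy to reproduce in a few lines. The sketch below is not the benchmarks' scoring code. It uses a simplified exact-substring matcher written for this example, represents the column boundary as a line break, and shows how a hypothetical normalization step would recover the match.

```python
import re


def contains_exact(output: str, expected: str) -> bool:
    """Simplified stand-in for a string-matching check (an assumption)."""
    return expected in output


def join_hyphenated_breaks(text: str) -> str:
    """Rejoin words split by a hyphen followed by whitespace or a line break.

    This also merges genuine hyphenated compounds that happen to be broken
    across lines, so it is a heuristic rather than a safe default.
    """
    return re.sub(r"(\w)-\s+(\w)", r"\1\2", text)


ocr_output = "The bank issued certifi-\ncates to all holders."
expected = "certificates"

print(contains_exact(ocr_output, expected))                          # False
print(contains_exact(join_hyphenated_breaks(ocr_output), expected))  # True
```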

These artifacts concentrate in mathematical, scientific, and multi-column documents, and they more often penalize correct output than reward incorrect output. We therefore treat the aggregate score as directional rather than definitive.

We report these numbers to indicate where OCR 4 stands, and recommend evaluating on your own documents.

Performance Details

Crawl Multilingual breakdown. On our internal multilingual evaluation, OCR 4 leads across all eight language groups — English, Western Europe, Eastern Europe, Middle Eastern, Chinese, East Asian, Southeast Asian, and rare languages. The gap is widest for rare and low-resource languages, where many competing systems degrade sharply, while OCR 4 maintains high accuracy.

Recommended use cases

OCR 4 supports both high-volume pipelines and interactive document workflows, including:

Document parsing and extraction: complex, multilingual documents.
Retrieval-Augmented Generation (RAG): structured, classified, citation-ready content for semantic chunking and source-grounded answers. With Search Toolkit, OCR 4 output can be fed directly into retrieval pipelines.
Agentic workflows: providing agents with the structural primitives to complete tasks such as form filling, invoice processing, and compliance checks, especially in legal, financial services, and healthcare.
Structured data pipelines using confidence scores to enable efficient use of human verifiers: form/invoice extraction, redactions, and compliance-driven processes.
Enterprise search and knowledge bases: OCR as a data-source component for custom ingestion and entity extraction.

Early users are applying OCR 4 to turn invoices into structured fields, digitize company archives, extract clean text from technical and scientific reports, and power enterprise search.
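
The human-verification pattern mentioned above can be sketched as a simple threshold split. The `ExtractedField` record and the 0.90 cutoff are assumptions for illustration, not part of the OCR 4 API, and a threshold like this only helps if the confidence scores are reasonably calibrated on your own documents.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    # Hypothetical record: field names are illustrative, not the API schema.
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def split_for_review(fields, threshold=0.90):
    """Return (auto_accepted, needs_review) lists based on confidence."""
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review


fields = [
    ExtractedField("invoice_date", "05/20/2026", 0.62),
    ExtractedField("total", "1,240.00", 0.99),
]
accepted, review = split_for_review(fields)
print([f.name for f in review])  # ['invoice_date']
```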

A note on out-of-scope use. OCR 4 is a document-understanding model, not a decision-maker. It is not intended for medical diagnosis, legal advice or judgment, high-stakes financial decisions, safety-critical systems, real-time/latency-sensitive processing, or non-document inputs (raw audio, video, etc.).

OCR 4 API: Understanding Your Options

Mistral's OCR 4 is available through a single API endpoint. Every request runs the same underlying OCR model and always returns extracted content, bounding boxes, block types, confidence scores, and markdown-structured text. What varies is how much you layer on top.

Use OCR 4 in pure extraction mode when you want to:

Embed fast, accurate document extraction directly into your application, agent, or data pipeline.
Work directly with the raw response, bounding boxes, block types, and confidence scores to drive custom downstream logic.
Run high-volume or batch ingestion with full control over throughput and cost via the Batch API.
Self-host for strict data-privacy, sovereignty, or compliance requirements.

Activate Document AI capabilities (same endpoint, additional parameters) when you want to:

Return structured JSON in a schema you define — pass a JSON schema alongside your document, and the OCR output is fed to mistral-small-2603 to generate content shaped to your spec.
Annotate detected images with structured JSON by passing an image annotation schema, triggering an additional vision-language model call per image.
Use a custom prompt alongside a JSON schema to guide how the extracted content of the full document is interpreted or summarized.
Enable business users, solutions teams, or pilots to produce structured results without writing downstream parsing logic.

The practical decision rule: if you need raw extracted content, use OCR 4 as-is. If you need the output reshaped into a structured format, annotated with domain-specific fields, or processed with a custom instruction, add the Document AI parameters to the same call. You always get the OCR result regardless; Document AI simply adds structured layers on top of it.

Now available

“The availability of Mistral Document AI with OCR 4 in Microsoft Foundry marks an important milestone in our partnership. Together, we’re enabling customers to bring advanced, structured document understanding directly into their AI workflows, combining Mistral’s innovation with Microsoft’s enterprise platform to deliver scalable, trusted solutions for real-world business needs.”
- Kimmi Grewal, VP, AI Ecosystem Partnerships, Microsoft

Both Mistral OCR 4 and Document AI (powered by OCR 4) are available via API through Mistral Studio, Amazon SageMaker, and Microsoft Foundry, with Snowflake Parse Document coming soon. For organizations with stringent data-privacy requirements, OCR 4 also offers a self-hosting option so sensitive information stays within your own infrastructure. To explore self-deployment, let us know.

Get started

We offer a few ways to get started and learn more quickly.

Try OCR 4. The new Getting Started with OCR 4 Cookbook walks through a first extraction, working with bounding boxes, and block classification.
OCR 4 webinar. We'll cover what's new in OCR 4 with demos and Q&A on July 7th at 6:00 PM CET. Register for the OCR4 in Production webinar.
Contact Sales for more information.
