Whenever any progress is made, this is the logical conclusion. And yet those who decide how your time is used have an opposing view.
Yet it occurs to me that "guess and check" is exactly what I'm doing when trying to read my 6yo's writing. Often I will do a pass to detect the main sounds, but then I start thinking about what has been on his mind lately and see if I can make a match. Not surprisingly, often I do.
Ideally something that I can train with my own handwriting. I had a look at Tesseract, wondering if there’s anything better out there.
Ah, maybe I'll pick up Qin seal when I retire, if I retire.
Then again, getting this result from a heavily-generalized SOTA model is pretty incredible too.
Transkribus has a new model architecture around the corner and the results look impressive. Not only for trivial cases like text, but also for table structures and layout.
Best of all, you can train it on your own corpus of text to support obscure languages and handwriting systems.
Really looking forward to it.
We almost solved OCR 20 years ago. Then we spent 20 years on the last percentage. We see the same in self-driving cars.
Am I nuts or is this wrong, not “perfect”?
It doesn’t look crossed out at all to me in the image, just some bleeding?
Still very impressive, of course
Hopefully next generations will feel the same about legal contracts, law in general, and Java code bases. They're incomprehensible not because of fonts but because of unfathomable complexity.
Exactly. Some people forget we live in a capitalist society, which does not prioritize or support the contentment of the masses. We exist to work for the owners or starve, they're not going to pay us to enjoy ourselves.
IMO, cyber security, for example, will have to become a government mandate with real penalties for non-compliance (like seat belts in cars were mandated) in order to force organizations to slow down, and make sure systems are built carefully and as correctly as possible to protect data.
This is in conflict with the hurtling pace of garbage in/garbage out AI generated stuff we see today.
Personally I found magistral-small-2509 to be overall most accurate, but it completely fails on some samples, while qwen3-vl-30b doesn't struggle at all with those same samples. So seems training data is really uneven depending on what exactly you're trying to OCR.
And the trade-off of course is that these are LLMs so not exactly lightweight nor fast on consumer hardware, but at least with the approach of using multiple you greatly increase the accuracy.
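A minimal sketch of that multi-model approach, assuming you already have one transcription per model for the same image. The model labels and sample strings below are hypothetical placeholders, and picking the output that agrees most with the others is only one possible way to combine them (not necessarily what the commenter does):

```python
from difflib import SequenceMatcher


def pick_consensus(transcripts: dict[str, str]) -> str:
    """Return the label of the transcription that agrees most with the others."""
    if len(transcripts) < 2:
        # Nothing to compare against: return the only label we have.
        return next(iter(transcripts))

    def agreement(name: str) -> float:
        # Mean character-level similarity to every other model's output.
        others = [text for label, text in transcripts.items() if label != name]
        return sum(
            SequenceMatcher(None, transcripts[name], other).ratio()
            for other in others
        ) / len(others)

    return max(transcripts, key=agreement)


# Hypothetical outputs from three models for the same handwritten line.
outputs = {
    "model_a": "All goes in the usual monotonous way",
    "model_b": "All goes in the usual monotonous way.",
    "model_c": "Ale gres in tle nsnal Monotonons woy",
}
best = pick_consensus(outputs)
print(best, "->", outputs[best])
```

When one model fails badly on a sample, its output tends to disagree with all the others and gets outvoted, which is the effect described above. The cost is one inference pass per model.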
Same here, for diaries/journals written in mixed Swedish/English/Spanish and with absolutely terrible hand-writing.
I'd love for the day where the writing is on the wall for handwriting recognition, which is something I bet on when I started with my journals, but seems that day has yet to come. I'm eager to get there though so I can archive all of it!
When does a character model become a language model?
If you're looking at block text with no connections between letter forms, each character mostly stands on its own. Except capital letters are much more likely at the beginning of a word or sentence than elsewhere, so you probably get a performance boost if you incorporate that.
Now we're considering two-character chunks. Cursive script connects the letterforms, and the connection changes based on both the source and target. We can definitely get a performance boost from looking at those.
Hmm you know these two-letter groupings aren't random. "ng" is much more likely if we just saw an "i". Maybe we need to take that into account.
Hmm actually whole words are related to each other! I can make a pretty good guess at what word that four-letter-wide smudge is if I can figure out the word before and after...
and now it's an LLM.
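The two-letter step can be sketched in a few lines. This is a toy illustration, assuming a character recognizer that returns per-position candidate letters with probabilities; all the probabilities here are invented for the example, and the decoder is a plain Viterbi search over bigram scores rather than anything a real OCR system is known to use:

```python
import math


def greedy(candidates):
    """Pick the visually most likely character at each position, ignoring context."""
    return "".join(max(pos, key=pos.get) for pos in candidates)


def decode(candidates, bigram_logp, default=-6.0):
    """Viterbi decode combining visual scores with bigram log-probabilities.

    candidates: one dict per position mapping char -> visual log-probability.
    bigram_logp: dict mapping (prev_char, char) -> log-probability.
    default: log-probability assumed for bigrams missing from the table.
    """
    # best[c] = (score, text) for the best path so far that ends in character c.
    best = {c: (score, c) for c, score in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for c, visual in position.items():
            score, text = max(
                (prev_score + bigram_logp.get((p, c), default) + visual, prev_text)
                for p, (prev_score, prev_text) in best.items()
            )
            new_best[c] = (score, text + c)
        best = new_best
    return max(best.values())[1]


# Invented scores: the middle letter looks slightly more like "u" than "n".
candidates = [
    {"i": math.log(0.9), "l": math.log(0.1)},
    {"n": math.log(0.45), "u": math.log(0.55)},
    {"g": math.log(0.8), "q": math.log(0.2)},
]
# Invented bigram table: "in" and "ng" are common, "iu" and "ug" are rare.
bigrams = {
    ("i", "n"): math.log(0.3),
    ("n", "g"): math.log(0.2),
    ("i", "u"): math.log(0.01),
    ("u", "g"): math.log(0.02),
}

print(greedy(candidates))           # iug
print(decode(candidates, bigrams))  # ing
```

Replace the bigram table with longer and longer contexts and you are walking the same path the comment describes.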
I once visited a high school where they had a wall of signatures from every graduating senior going back to the 1920s or so. The "personality" evident in the signatures showed a steady decline, from very stylish in the oldest ones to mostly just poorly printed names in the 2020s.
"The comparison between handwriting and typing reveals important differences in their neural and cognitive impacts. Handwriting activates a broader network of brain regions involved in motor, sensory, and cognitive processing, contributing to deeper learning, enhanced memory retention, and more effective engagement with written material. Typing, while more efficient and automated, engages fewer neural circuits, resulting in more passive cognitive engagement. These findings suggest that despite the advantages of typing in terms of speed and convenience, handwriting remains an important tool for learning and memory retention, particularly in educational contexts."
https://pmc.ncbi.nlm.nih.gov/articles/PMC11943480/
You are literally handicapping yourself by not thinking with pen and paper, or keeping paper notes.
The future is handwriting with painless digitization for searchability, until we invent a better input device for text that leverages our motor-memory facilities in the brain.
When you sit down to think about it, what does it really even mean to do "more research"? What concrete phenomenon are you observing to decide what that is?
Across the journey from "subsistence agriculture", there have been countless approaches to nurturing innovation and discovery, but reducing it to an abstract game measured by papers published and citations received is extremely novel and so far seems to correlate more with waste and noise than it does with discovery. Science and research are not in a healthy period these days, and the model that you describe, and seem to take for granted or may even be celebrating, plays a big role in why.
Because all the trends seem to indicate that to make a living people are working longer hours, holding multiple concurrent jobs (eg https://gameofjobs.org/are-americans-now-more-likely-than-ev...), and holding off retirement.
The UN estimates that around 500 million households or 2 billion people are still subsistence farmers. In 2025.
Fat lot of good competition has done them, especially when they don’t have enough surplus to participate in a market economy to begin with.
For historical handwriting, Gemini 3 is the only one that gave a decent result on 19th-century minutes from a town court in Northern Norway (Danish gothic handwriting with bleed-through). I'm not 100% sure it's correct, but that's because it's so dang hard to read it to verify it. At least I see it gets many names, dates and locations right.
I've been waiting a long time for this.
They're very good at it.
Things may well reach a point where, elsewhere in the world, finding out that some software is for sale in the European Union is itself a marker of quality, and therefore justifies some premium.
https://g.co/gemini/share/e173d18d1d80
This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.
Now that the service economy is turning into the sharing economy, I think the only thing we are sharing is the greater profits, and they are taking the lion's share.
> ...those who decide about how your time is being used...
which stops individuals from:
> [spending] more time thinking, writing, playing piano, and taking walks — with other people.
Which it seems you would agree with. I don't see where they asserted whether this was a problem to address.
I do not think the executive class is actually in on the power of AI to increase productivity, but rather to increase reliance.
I actually think that will be the case. We're designing society for the technology, not the technology for the people in it. The human brain wasn't built to fit whatever gap is left by AI, regardless of how many words the technologists spew to claim otherwise.
For instance: AI is already undermining education by enabling mental laziness in students (why learn the material when ChatGPT can do your homework for you?). It seems the current argument is that AI will replace entry-level roles but leave space for experienced and skilled people (but block the path to get there). Some of the things LLMs do a mediocre but often acceptable job at are the things one needs to do to build and hone higher-level skills.
Which is exactly my experience with handwriting through my school years. When handwriting notes during lectures, all focus goes to jotting down words, and it becomes impossible to actually focus on the meaning behind them.
So no, no retirees or students or unemployed or disabled in that figure.
So I'm not completely disagreeing with you, but I also am not too pessimistic, either. We will adapt, and benefit through the adoption of AI, even though some things will probably be lost, too.
Please share. I am out of the loop and my searches have not pointed me to the state of the art, which has seen major steps forward in the past 3 or 4 years but most of it seems to be closed or attached to larger AI products.
Is it even still called OCR?
Software providers are also likely to specify narrow “fit for purpose” statements and short(ish) support windows. If costs go up too much, people will use “inappropriate” and/or EOL stuff because the “right thing” is too expensive.
To be clear, this is a step in the right direction but is not the panacea.
The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")
Состояние удовл-
t N, кожные
покровы чистые,
[л/у не увел.]
В зеве умерен. [умеренная]
гипер. [гиперемия]
В легких дыха-
ние жесткое, хрипов
нет. Тоны серд-
[ца] [ритм]ичные.
Живот мяг-
кий, б/б [безболезненный].
мочеисп. [мочеиспускание] своб. [свободное]
Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
Condition satisfactory.
Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged.
In the throat [pharynx], moderate hyperemia [redness].
In the lungs, breathing is rigid [hard], no rales [crackles/wheezing].
Heart tones are rhythmic.
Abdomen is soft, painless.
Urination is free [unhindered].
Diagnosis: ARD (Acute Respiratory Disease)."
I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and resources certainly make it a challenge. The official notebook for fine-tuning hasn't been updated in years and has several errors due to the march of progress in the primary packages.
“What doesn’t kill you, makes you stronger”. We will adapt and benefit, or we will not — time will tell.
It's most likely "но кашель сохр-ся лающий" ("but barking cough is still present"), not "кожные покровы чистые" ("the skin is clean"). The diagnosis is probably wrong too. Judging by symptoms it should be "ОРЗ", but I have no idea what's actually written there.
Still, it's very, very impressive.
This one works, you can check the versions https://pastebin.com/QPjGHN8j
Not a chance, sorry.
With GPS we have seen people confidently drive past road closed signs and around barriers off bridges.
With self-driving technology, we have seen them defeat safeguards so they can sit in the back while the car accelerates up to 70 in a subdivision.
[0] https://ieeexplore.ieee.org/abstract/document/10832237
[1] https://arxiv.org/abs/2412.14737
Letter from George Boole to his sister Maryann, 18 December 1850. Archives of University College Cork, IE/BL/PP/BP/1/A/1/1/51.
To summarize the theme of this newsletter recently: Use AI, not too much, mostly to connect with the intelligence of other human beings, not AI. Previously I’ve discussed using AI to discover books and articles in the library, and artworks in the museum; in this newsletter, how AI can help us work with the letters and manuscripts in archives and special collections.
“All goes in the usual monotonous way.” That is the depressed sigh of George Boole in a letter to his sister Maryann in 1850. It was the spark for my book Equations from God: Pure Mathematics and Victorian Faith. Boole, the English mathematician who gave us the logic at the heart of the digital device you are reading this on, was teaching in Cork, Ireland at the time. On a cold December day, he wrote to Maryann about his feelings of profound loneliness. In a city that was on edge from religious strife and famine, he played piano at home to an empty room, and took long walks by himself. At the end of the day, he retreated to his equations, which seemed to transcend the petty differences of humanity.
But before developing my thesis about the fervent emotions behind Boole’s seemingly cold mathematical logic, I first had to read his damn handwriting. Talk about monotony! There were hundreds of letters and notebooks in his drifty scrawl. In retrospect, Boole’s handwriting is actually not that bad; I’ve encountered far worse since reading his in Cork. And it helped that I had taken a brief course on paleography, the art of deciphering handwritten historical documents. But it would have saved me a lot of time getting to the interesting interpretive phase of my research if a computer could have converted his handwriting into machine-readable text, as it already could for typeset text through a process called optical character recognition (OCR).
Since I wrote that book, university and industry labs have been trying to solve the incredibly difficult problem of handwritten text recognition (HTR). OCR quickly approached 99% accuracy for digitized books, whereas even the best HTR systems struggled to reach 80% — two incorrect words out of every ten. The issue is obvious: unlike the rigorous composition of books, handwriting is highly variable by author, and words are often indeterminate and irregularly arranged on a page.
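For readers who want to see how such accuracy figures are typically computed, here is a minimal sketch of word error rate: the word-level edit distance between a reference transcription and a system's output, divided by the number of reference words. The function name is my own, and the faulty sample output is invented for illustration (it is not from any real system); the reference is a phrase from Boole's letter.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # word missing from hypothesis
                curr[j - 1] + 1,                     # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word) # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "I give my lecture read write take a long walk"
# Invented output with two misread words out of ten.
hypothesis = "I give my lecture road write take a lung walk"

wer = word_error_rate(reference, hypothesis)
print(f"word error rate: {wer:.0%}, word accuracy: {1 - wer:.0%}")
```

Two wrong words out of ten gives an error rate of 20%, which is the 80% accuracy described above. Published benchmarks often report character error rate as well, which is the same calculation over characters instead of words.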
This led to a number of experimental approaches, from crowdsourcing to machine learning. At the Roy Rosenzweig Center for History and New Media, we digitized the Papers of the War Department, 1784-1800, so that knowledgeable researchers and members of the public could transcribe them. Around the same time, University College London did the same with the manuscripts of the philosopher Jeremy Bentham. Later, the tranScriptorium Project and Transkribus used neural networks to improve accuracy, but required a significant training corpus.
Even with considerable prep work, error rates remained stubbornly high. Here’s Transkribus’s best guess at George’s letter to Maryann, above:
I am sorry to hear of Miss Davis
gent affliction. When you write
to her tele her so and also invite
her to visit you when she fal
that she can do so. I now
wish binvite you to visit me.
As mother is so much better she
may I think spare you and
the weather being colo in England
is change of air would I think
do you good Herr we have
days of the most lovely character
about twice a week At other
times it is well with intervals
of bright weather
but such day
вр/1/51<7>
Cock Dec 18 1850
My dear Sister
Your letter this morning
was a very welcome one & I wish
that I had anything interesting
to say in return But I have not
Ale gres in the usual Monotonous
way I give my lecture read
write take a long walk occasion
by & one or time a week go
out letter this is very much
the history of my life from
week to week I think I am
upon the whole I have woke
More recently, I’ve used this letter as a quick test of AI models. They have gotten incrementally better over the past three years, but I was frankly stunned when I put the letter into Gemini 3 Pro this week and asked it to have a go at the transcription:
Right Page
[Top Right Corner] Cork Dec 18 1850
[Salutation] My dear Sister
[Body] Your letter this morning was a very welcome one & I wish that I had anything interesting to say in return. But I have not All goes in the usual monotonous way. I give my lecture, read write take a long walk occasionally [try] & once or twice a week go out to tea This is very much the history of my life from week to week I think that upon the whole I have worked
Left Page
[Body] I am sorry to hear of Miss Davis’ [or Davin's] great affliction When you write to her tell her so and also invite her to visit you when she feels that she can do so. I now wish to invite you to visit me As mother is so much better she may I think spare you and the weather being cold in England a change of air would I think do you good Here we have days of the most lovely character about twice a week At other times it is wet with intervals of bright weather, but such days
Notes:
"Cork... England": The context suggests the writer is in Cork, Ireland, inviting their sister to visit from England to escape the cold.
"occasionally [try]": On the right page (line 8), the handwriting becomes very scribbled. It appears to say "take a long walk occasionally try & once or twice..." or possibly "occasionally by & once or twice...".
"Miss Davis": The name on the first line of the left page is difficult to decipher with certainty; it looks like "Davis," "Davin," or perhaps "Savin."
Gemini transcribed the letter perfectly: it figured out that the right side is the beginning of the letter, not the left (the letter actually continues on the other side of the paper, which accounts for the discontinuity between the two sides we are viewing); it left off the periods where Boole also (oddly) omitted this punctuation; and it includes a self-reflective analysis of where it might be wrong and provides alternative readings.
Even wilder, when you click on a “show thinking” tab, Gemini provides a long discourse on its approach and minute details about word choices:
Assessing Image Features
I'm now identifying key image features. The paper appears folded, with the date and salutation clearly on the right side, confirming it as the front page. The handwriting is typical 19th-century cursive, though legible. The location, Cork, and the addressee, "My dear Sister," are identified. This is helping me to prepare for accurate transcription.
Refining Text Analysis
I'm now zeroing in on a tricky word: "occasionally". I initially considered "exercise" or "every day", but the context and image suggest "occasionally," although the "y" is faint. Close inspection reveals loops indicating it's the intended word, despite the messy script. I am confident that it means "occasionally."
…
Refining Word Possibilities
I've been bouncing around many possibilities for that unclear phrase, but now I think I've cracked it. I initially tried "accompanied by," then "exercise", "excursion", and even "swimming". The letters "e" or "o" seem key to the preceding word. Focusing on the letter shapes, I am now confident the words I have are "take a long walk occasionally".
Determining Word Boundaries
I am now focusing on determining word boundaries, particularly in the phrase "take a long walk...". Initial guesses included "evening" or "exercise" for the illegible portion. But I'm now leaning towards "every" preceding the blurry term, with "evening" as the subsequent word, though "exercise" is still a contender. This leads to the phrase, "take a long walk every evening," or perhaps a variant like "exercise" or "excursion." I have reviewed the handwriting to narrow down the term that comes after "walk."
…
This thinking goes on for almost 2,000 words, and what’s remarkable is that it is essentially a verbalization of what you’re taught to do in a paleography class: assess the overall document first, determine key features, study letter shapes and strokes across the letter to refine your understanding of the particular script, consider context and word/phrase possibilities, think about the coherence of content, grammar, and usage, identify any contractions, proper names, and other oddities, etc.
Encouraged by this AI progress, I dropped one of the letters from the War Department archive into Gemini:
Roy Rosenzweig Center for History and New Media, Papers of the War Department, Charles Carroll to Alexander Hamilton, Doc no. 1799010160001.
And here’s Gemini 3 Pro’s transcription:
[Top Left] Dr [Dear] Sir pg 241.
[Top Right] Annapolis 1st Jan^y 1799
[Body] The Count de Moeliens observations on the opposite side were transmitted to you in my letter dated early in last month. Not having been favored with y^r [your] answer, I take the liberty of transmitting a copy of his sentiments & The Count is anxiously waiting your determination – To what I have said in his favor I can add nothing. He is in my opinion a man of strict honor and integrity, and as such may be confided in: he is also, as far as I am able to judge a man of good sense. Wishing you many happy new years I am with sentiments of great regard and respect Dear Sir Y^r most hum. Serv^t [Your most humble Servant]
[Signature] Ch. Carroll of Carrollton
Notes on the text:
"transmitted": In the second line of the body, the word "transmitted" is crossed out in the original text, but the sentence is grammatically incomplete without it (or a similar verb). It is likely the author meant to replace it to avoid repetition with the word "transmitting" appearing a few lines later but forgot to insert the new word.
"y^r": This is a common 18th-century abbreviation for "your."
The Signature: Charles Carroll was the only Catholic signer of the Declaration of Independence and was known for signing his name as "Charles Carroll of Carrollton" to distinguish himself from his father and other relatives with the same name.
Another perfect job. Maybe too easy?
Time to bring out the final boss: Jane Austen. A letter to her sister Cassandra held in the Morgan Library:
The Morgan Library & Museum, Jane Austen’s Letters, 033. To Cassandra Austen, June 20–22, 1808, pages 4 and 1
May your courage always rise, Gemini.
Right Page (The beginning of the letter)
[Header] Godmersham, Monday June 20^th
[Salutation] My dear Cassandra
[Body] I will first talk of my visit to Canterbury, as Mrs. J. A.'s [Jane Austen's sister-in-law] letter to Anna cannot have given you every particular of it which you are likely to wish for. I had a most affectionate welcome from Harriot & was happy to see her looking almost as well as ever. She walked with me to call on Miss Bridges, when Mr. K. Valpy [?]...
(The text continues but becomes heavily obscured by the vertical cross-writing)
Ok, this one’s hard for even a seasoned Austen scholar, with bleeding from the other side of the letter and extensive cross-writing. (Austen turned the page and continued writing at 90° to the first part of the letter to get more onto a single sheet; this was not uncommon in the nineteenth century, and rather annoying.) Gemini does the right thing here: rather than venture a guess like a sycophantic chatbot, it is candid when it can’t interpret a section of the letter. Nevertheless, it went on to get most of the horizontal text correct and some key phrases from the cross-writing too.
At this point, AI tools like Gemini should be able to make most digitized handwritten documents searchable and readable in transcription. This is, simply put, a major advance that we’ve been trying to achieve for a very long time, and a great aid to scholarship. It allows human beings to focus their time on the important, profound work of understanding another human being, rather than staring at a curlicue to grasp if it’s an L or an I. Could we also ask Gemini to formulate this broader understanding? Sure we could, but that’s the line that we, and our students, should resist crossing. The richness of life lies in the communion with other humans through speech, the written word, sounds, and images.
If we want to use AI more extensively in the classroom, we can actually take it in the opposite direction, folding its HTR insights back into human comprehension. Take Gemini’s 2,000-word thinking analysis of Boole’s script. I could imagine using that with students in a paleography class to help them understand the steps in the process of deciphering a letter or manuscript. After a few examples, guided by this clever narrative feature of AI, students could become independently able to read these texts.
Was it nice to spend a month in Cork trying to do this deciphering on my own, without AI assistance? It was! Cork’s a wonderful city. Did the long time it took to read each letter provide some mental space for initial thoughts for the book? Maybe! But if you talk to historians now, they will admit that they can’t spare the expense and time of leafing through letters over a month in a foreign city. Most simply take photos of documents in quick trips to the archives and review them later, at home. This makes tools that help one gather and organize archival documents, like Tropy and Sourcery, even more valuable — especially with AI now able to make these digitized copies searchable and readable.
If AI can diminish some of the monotony of research, perhaps we can spend more time thinking, writing, playing piano, and taking walks — with other people.