1. AI is a great boon for all tasks and specialties we don’t have the skills to do ourselves. Understandable, since (A) we’re ill equipped to see the flaws in its output because it isn’t our area of expertise, and (B) it often can unlock great gains because if we trust it, we then don’t have to pay and wait for humans to do that thing.
2. AI is a terrible replacement for me - my skills are at such a high level that it’s almost theoretical that it’ll ever be good enough to replace me for 90% of what I get paid to do. It’s a tool at best.
This is why I use AI for all my medical questions and doctors use AI to write software, and we both smirk at the quality the other person is getting from it.
Period.
You could do a machine translation if you want, but you better pore over every word in case you end up on the witness stand.
I read two translations of the book "The Master and Margarita". My first read was so boring I couldn't help but stop reading before the end of the first chapter. I can't find the copy and the name of the person who translated it, but this one had all the Russian nicknames translated. It kept talking about a guy called homeless. I thought it was just a bad book and dismissed it for years. I couldn't understand what all the fuss was about with this book.
But then, I stumbled upon the translation by Diana Burgin and Katherine Tiernan O'Connor. Although I don't speak Russian, I think this is as good as it gets. They did a phenomenal job.
You can see the same effect with the mechanical translation of the book "We" by Yevgeny Zamyatin, where the government is called "United State", which is easily confused with the "United States". The translation that called it "One State" was so much better.
There are now bags being sold marked "Lawn Suits", when it was supposed to be "Lawn Topdressing".
Maybe McDonald's is big enough to not care about its reputation, or maybe they are happy about the free clout from people making fun of them, but they certainly chose to cheap out on translations.
Come to Montreal. It's only 2 hours away and you can get by decently well without a car.
Maybe my brain works differently than the author's, but I'm surprised at this statement. Gym clothes don't change recognition for me; it's about the face, body, and posture, and clothes don't really enter into it. For me it is nonsensical enough to be suspicious.
And from a human-centric perspective, not recognizing who someone is is sad: it means knowing that you probably won't meet them again, so it's not worth the effort, and the community isn't there. Community and interpersonal relationships between people are something we still hold dear.
Maybe a publisher will replace the translator of the next Dan Brown best seller with Mythos? Who cares other than those buying it, getting money out of it?
A few examples
Audio book narration. Human narrators are paid a seemingly ridiculous amount of money to literally read a book out loud. We have the tech to replace them, it’s actually pretty dang good, and it is substantially cheaper to do with computers. It’s pretty accurate too. In the audio book industry though, if you take your book seriously you have a real person read it. The best one you can find that you like. Readers enjoy hearing good narrators and the total value one narrator can bring is very high mostly because the value scales well.
Another real world example that doesn’t scale well, call centers. Customers want humans, but execs have tried to replace them with automation in every way possible. The margins of a business get squeezed because the value of the human touch doesn’t scale well in this case.
Translation falls a bit in the middle. I’m sure ChatGPT is good enough for some people. If you are a diner and need to understand what you are ordering at the local authentic Italian restaurant, it’ll do the job. If you have a bad food allergy? Maybe not; you are willing to pay for accuracy because that’s what a human brings.
So the answer to the question posed in the article, "can’t you just upload it to ChatGPT?" Maybe yes, maybe no.
Yes. Effective tools increase the supply of roofs made. More supply means lower prices per roof. But because the same number of roofs need to get worked on, the increase in roofs per roofer means fewer roofers will be needed.
I could feel the heads of those around the table who had been teaching this material for a decade starting to explode, as this was exactly what others in the thread have described: it looked good until vetted by experts, then it was easy to poke holes in it because it was just not right.
The problem in the public service is that the experts who can review the output are leaving or being nudged out.
A list of "Examples AI will silently fail at" would be a lot more interesting, and might just convince your next potential client to _not_ use AI.
Translation is a gigantic boon for business, but just as important for human connection, for culture, science, art, and entertainment. The value of automatic and cheap translation between all languages, this Tower of Babel, is immeasurable.
Human translators will always be better than any AI at their job. But they don't have unlimited time and energy, and they aren't cheap. AI makes good to great translations available to everybody.
It seems silly to imagine that there is some fundamental barrier between human intelligence and AI, and that AI could never do many of the things that humans can do. Inferring intent, gauging sentiments, factoring in cultural values, etc., all the things cited as stuff humans can do but AI can't, AI can currently do if given enough context. But more importantly, all those things aren't magical tasks that can only occur inside a human skull; they are a product of information processing. It's just information processing that has been hard to make computers good at, but so far it appears AI keeps getting better.
I'm all for humans having special value that is not attached to their ability to perform useful work. However denying the abilities of AI models seems to be a common mistake many people are making, and sadly reality catches up to these people before they can emotionally prepare.
I know a translator between two Eastern European languages, and some jobs require use of specialized dictionaries. Using LLMs in such cases would be very unreliable and would require even more effort to check and correct than doing it correctly in the first place. Plus, I really doubt that US tech firms are training LLMs on a language spoken by "only" 6 million people.
As for entertainment, anyone who grew up in Eastern Europe with pirated movies with nasal monotone translations, or machine-translated video games knows how much those take away from the experience. Sure, "AI could do better", but could it be consistent and capture cultural nuances and idioms, etc?
Poor woman should really look into pivoting her career or finding a different way of making money. Truth be told, her industry/career is not going to get better. Consistent work will just not fall from the sky.
Being bitter will not improve her situation. Even organizations like UN/OECD are looking into implementing AI in various ways.
Really good blog though. I love life blogs like these! You can go back and live through so many interesting/pivotal moments.
For example, I just read the Lawrence Ellsworth translation of The Three Musketeers, which I very thoroughly enjoyed. I don't speak or read French, but from my understanding Ellsworth's translation is considered one of the more accurate translations of the work.
Out of curiosity, I sic'd Claude Fable on the original French version of The Three Musketeers and told it to translate accurately, but also try and keep the same jovial tone as the original and do not censor anything. After it was done, I didn't read the entire output, but I did compare a few individual chapters between the Ellsworth translation and the Fable translation.
They were honestly remarkably similar. As far as I could tell, nothing was substantially different from the Ellsworth translation and the Fable translation. I do think that the prose for the Ellsworth translation was a bit better, but the prose for the Fable one was actually perfectly readable. Again, I don't speak French so I cannot say for sure, but I do not believe that I would have gotten a significantly different experience had I read the Fable version instead of the Ellsworth version.
Now, it's possible (and likely) that this is somewhat self-fulfilling; Fable might have been trained using Ellsworth's translation and as such it's very directly able to crib from it; sadly since I do not speak any language outside of English, there's sort of a catch-22: the only way I can compare the accuracy of a translation is to compare against other translations, but if other translations exist then that will likely influence the results, and if a translation doesn't already exist then I have no way of auditing it.
I'm still going to continue reading through Ellsworth's translations for the subsequent stories simply because that feels more canonical, and as I said I do think the prose was a bit better.
> AI isn’t replacing me. Like a toddler, it needs to be constantly coached.
Like a toddler, it will grow up. Humans are really bad at noticing trajectories. They see the current situation. They know what the situation was 5 years ago. But for some reason they do not believe that there is a trajectory. They view the present state as the final destination.
Gemini did a pretty good job of translating this to English.
Sure, a professional human translator would have done a more nuanced job if I was willing to invest the money and time. But...
* tajdar e haram originally by Payam Saihalwi, later versions by the Sabri Brothers and recently by Asif Aslam
not because their skills are no longer relevant, but because they are taking a principled stance defending now irrelevant skills.
> “Oh, I can’t! It’s really not reliable enough.”
Gell-Mann Amnesia strikes again.
"Expertise in one field does not carry over into other fields. But experts often think so. The narrower their field of knowledge the more likely they are to think so." - Robert Heinlein
In this case, the gym buddy doesn't think that she's an expert in the other field, but dismisses it as something ChatGPT can do with ease.
She writes: “I adapt, I localize, and I find the best way to convey the original message so it makes sense and feels natural. I research terminology. I make sure it’s consistent throughout.”
I’m sure she has other important insights into what enables her to do her job well. The problem is whether or not such insights can be incorporated into an AI-driven translation system, too.
Since early this year, I have been experimenting with a variety of agentic systems for language-related tasks, including dictionary-writing, research on topics in the philosophy of language, essay-writing, and translation. Other than the dictionary [1], I am keeping the results private, so they haven’t been evaluated by others. But my personal assessment is that agentic systems given suitable high-level guidance can be very good at such tasks now.
If I were still freelancing and I had a large translation job to do for a client, here is the outline of the prompt I would give to Claude to get it started:
“Use this private GitHub repository to build a system for translating [genre of text] from [Language1] to [Language2]. The directory samples/ contains examples of the type of document to be translated, high-quality human translations of those documents, and texts in [Language2] that are in writing styles that I believe to be appropriate for this genre of translation. The file guidelines.md contains my general instructions about the needs of my client and my preferences for how you should translate texts along various axes (natural vs. literal, informal vs. formal, preferred dialect in [Language2], consistency vs. variety in terminology translation, etc.). Begin building (1) a knowledge wiki for this project using Karpathy’s LLM-wiki framework and (2) a system inspired by Karpathy’s Autoresearch, AutoResearchClaw, etc. for testing and recursively improving both the functioning of the system and the quality of the translations. For the actual translation, editing, checking, etc., use not only your own ability and the knowledge assembled in (1) but also outsource such tasks to other frontier models through OpenRouter, and use adversarial evaluations among those models and yourself to check and recursively improve the system design, the prompt-writing for other models, and any translations created by the system. My OpenRouter API key is available in this environment. You may spend up to $xx per day in API calls until this project is ready to do real translations; before beginning a real job, give me an estimate for how much the API calls will cost for that job. The initial build-out of this project will take many sessions, so write a prompt called resume-prompt.md that I can point you to at the start of a scheduled Routine to have you work on this. Commit and squash-merge to main at the end of each session. 
I will be checking in occasionally to view your progress and to ask you to run translation tests, and I will offer guidance then on how to improve the pipeline further and make the translations closer to what my client needs. If you have any questions before you begin, please ask me.”
That being said, something with essence like a novel definitely still needs to be done by a human.
AI produces output that is very convincing to a non-expert, and (dangerously), it's so good at looking like an expert, they might believe that it is an expert. But the moment you ask someone to use it for something they're an expert in themselves, the holes appear wide, consistent & obvious.
My favourite moment of seeing this in action was watching AI-worrier TV host/comedian Bill Maher. He has spent years talking about the dangers of AI taking everyone's jobs, destroying civilisation, ruining the economy, starting wars, "it's just getting better and better all the time", and so on. But one night he let slip a tell. "It's no good at writing jokes. Not yet, anyway". There you go, Bill... connect those dots...
There is real utility in it being a tool to help experts apply their expertise, as in this story where it speeds up some tasks to help the translator do part of the work, enhance their expertise, allow them to be more productive.
It's a better screwdriver, a better hammer, in the hands of somebody who knows what needs a screwdriver or a hammer. It doesn't replace them. It can't replace them. It's a tool that enhances the human, not an alternative.
I don't understand why this is not widely understood yet, but I'm sure it will be in due course.
And I don't expect this to change. Even if the latest model scores 100% on every benchmark, all that really tells us is that it's now more productive/efficient than it was before at helping experts do that work, not that it can replace everyone in that category of work.
It seems inevitable that we ask for more AI assistance on topics we don't understand. And therefore have the least context to correct. Result: a flood of poor quality information.
In areas we DO understand, we'll either not ask AI at all, or treat its results with a higher degree of skepticism. Result: a lack of high quality information.
Inevitably this means a higher volume of non-expert prompts gets translated into the next generation of internet content. AIs are pumping out more novice-level text and less expert guidance.
The result will be an internet full of content written from the perspective of an ignoramus; not addressing any complex issues, staying surface level on every topic. Which will cascade into future models, etc.
Every month a new guy discovers LLMs; discovers a skill the current LLMs require to get good results; and writes about the future jobs that will always be available for smart people like HIM, that are SKILLED in using LLMs.
The next generation of AIs doesn't need his fancy prompt. The image model goes from needing to type in just the right set of weird words and cryptic sorcerous invocations, to most people being able to type in English what they want and get a pretty good result.
There are still tasks that require careful invocation. But they are a much smaller fraction of all the tasks people are trying to do, or you can get a bleh result without the elaborate invocation to get it really good. And to improve on the bleh result you need to be substantially more of an expert than back when the Guy was memorizing a rule about adding "trending on Artstation" to the image prompts, as would always require a human paid to do that.
Another generation of AIs comes out. The next generation of Clever Skills is obsolete. Image models just obey the instructions for compositing panels without mixing them up, and you don't need to be an expert to get them to do it right. Another human value-add is gone. A wider set of tasks require no human expert.
Now a new Guy notices LLMs have become useful in his field for the first time. He discovers they require SKILL to use CORRECTLY. He posts about how there will always be jobs for humans who are SKILLED in using LLMs like HIM.
But it is not an infinite cycle. It is not the same each time it repeats. Now the Guy is a highly paid programmer or a career mathematician in 2026, instead of a graphic artist in 2023.
In six months the models will no longer require his vaunted Skills.
And by then there will be another Guy.
But the process doesn't continue forever. The Guys are coming from fields that were harder and harder for AIs. The brief centaur eras are shorter and shorter.
Today it is writers who are laughing at how bad the LLMs are at their job, and who will perhaps soon be posting about how it takes Skill to get an LLM to do their job Correctly. But the models are coming faster, and the eras of kinds of human value-add in each field are shortening.
There is a point when you run out of Guys, either because the centaur eras are too short for people to develop SKILLs and post to Twitter about them; or because there are not lands left for AIs to conquer; or because ordinary people are not reassured by some Nobel laureate proclaiming there will always be jobs for Nobel laureates with the SKILLS to prompt robotized biology labs Correctly.
But we'll never run out of amateur economists who assert entirely without a brief contemporary example that there will always be jobs for humans skilled at operating AIs!
We'll run out of professional economists saying it when nobody is paid for that work anymore.
I guess we'll also run out of amateur economists when they're dead.
This person is in the first stage of grief (denial); artists are several stages ahead. Most customers are not going to care about the difference in translation quality unless it's in a regulated sector.
I'm curious, do you have a graduate degree in mathematics?
I agree but it's useful to remember that 1. brains and especially the human brain are enormous and 2. individual tokens carry significantly more meaning than individual tiny muscle twitches so even extremely primitive "cognition" can look like it's doing more work than it actually is.
It's worth noting that you can substitute "dollars" for "context" in that sentence, which seems to be where many of these impressive achievements are coming from. As ever, it's unclear whether these models will get cheaper while remaining better, since all of the recent breakthroughs appear to be of the "think more" kind. For translation specifically, I'd be very surprised if the "think more" LLMs would help given the per-unit cost expected of the output.
Mathematics is famously rigorously defined; it's roughly analogous to AI beating humans at chess. Sure it's impressive, but it's also something you'd expect machines to be good at.
Specifically: LLMs make it really easy to misunderestimate the complexity of fields other than your own. (You can see this with a lot of vibecoded projects, for example – once they hit the wall of complexity, they stall out or start finding ugly patches for fundamental design issues, etc.)
I don't think this sort of cultural change will happen short-term, though.
Every critique of AI assumes to some degree that contemporary implementations will not, or cannot, be improved upon.
Lemma: any statement about AI which uses the word "never" to preclude some feature from future realization is false.
Lemma: contemporary implementations have already improved; they're just unevenly distributed.
There is an interesting third group emerging: People who acknowledge the quality problem, but think they can deal with it by applying more AI to the output.
This takes the form of people who spin up a lot of "agents" and give them personalities like security director or quality director (which are unnecessarily complex and maddeningly unpredictable ways to trigger an LLM session for doing a security review or a quality check pass).
It also includes the person who knows that their app is full of bugs, but thinks it's not a problem because they can have the AI fix the bugs as they show up. People in this class haven't encountered security breaches or data loss bugs yet. They think it's all about having Claude fix that div that isn't centered or handle that error code that shows up sometimes.
Each time the frontier models get better, I see another wave of AI doubters suddenly become believers. People say things like, "AI couldn't code last year, but now I use it for everything!" Interesting. Now we know that the person who said this has the coding skills of a Claude Opus 4.5 or whatever the frontier was when they flipped.
Meanwhile, the rest of us keep using AI as simple tools, like the person in the article. I wonder how long it will take before computers can program better than me, and I flip too.
I'm not sure how to formulate it yet but it seems there is some Peter Principle/Gell-Mann Effect corollary that is AI-related we can say here.
Perhaps: "AI rises to the level of its users' incompetence."
Or: "Confidence in AI output is inversely proportional to one's ability to verify it"
Likewise, AI is oblivious to its own mistakes, much like said professionals can be at times.
Not that AI is actually thinking, but rather the collective corpus of text yields greater insights (knowledge of the crowd, not wisdom of the crowd) than a lower-average person in that same industry.
Most? Perhaps it's depression, but I look back at my career and wonder if any code I've ever been paid to write is beyond what current AI can do.
Sure, this leaves me with the non-coding tasks of UX taste, and code review + a few other forms of QA (and, when self-employed, project management, game design, etc.), but man, I'm someone who actually learned to read in part on the Commodore 64 user manual (as in, trying to understand what PEEK and POKE meant concurrent with having "Jack and Jill go up the hill" picture books).
(And no, I'm not claiming LLMs make bug-free code, I see the bugs LLMs make during my code review of their output and some of them are awful, hence "this leaves me with …").
Kinda conceptually similar to how typos and grammatical mistakes aren't a big deal if you're shooting off a quick text or email, but if you're publishing and you've got typos in your advertising copy, in your resume, on your medicine label, etc., it's a real bad look.
As one of those people, I think there is a nuance to it. AI is great when you’re translating something for yourself. But when translating things for others, more caution and human judgement is needed. Especially when translating instruction manuals, where bad wording could cause someone to injure themselves.
Reminds me of the first time I saw a coding agent stumble through an issue in 2023 maybe? and went "this is a big deal", similarly when OG gpt started making jokes that actually kinda worked.
Updated modern version of the classic "make me a greentext", apologies for slop-posting, but it seems relevant:
> be me
> senior software engineer
> in charge of making sure the tickets get, in fact, implemented
> occasionally have to open the IDE and write some code myself
> one day i open the IDE and the ticket is already closed
> the agent did it overnight
> no steering, no review notes, nothing left for me to do
> distress.jpg
> ask my manager what to do
> he says "just focus on the high-level architecture stuff"
> i say "what high-level architecture stuff"
> he says "i don't know, you're the senior engineer"
> rage.jpg
> quit my job
> become a prompt engineer, nice and simple, just tell it what to build
> first day on the job, sit down to write the prompt
> AI already wrote it
I can confidently say that LLMs do a better job than the average traditionally published fiction in my country, at least when the original works are in English. Every single time I watch a subbed movie there will be some lines noticeably wrong.
https://www.reddit.com/r/funny/comments/3e786n/chinese_hair_...
On the other hand, a lot of people become extremely put off by the smallest sign of ai slop. And the llms have a tendency to impart their style to any text they touch.
I still love the tool, but remain as convinced as ever that AGI does not lie at the end of this particular path.
Of course as for the poor OP... is this a majority of what working translators are paid to do?
I suspect a lot of translation is just grunt work - technical and business documents. The lack of a cohesive voice with considered style is perhaps not really much of an issue in those. The expectations are just much lower; text that conveys the basic meaning is a much lower bar to clear.
She's probably better than a bot at that stuff, at least for now, but my concern is that it won't be "enough" better for businesses to justify her continued employment. And this is my general feeling about this stuff across society, in basically all domains.
Maybe AGI is possible and we'll have software-defined human intelligence that's completely autonomous, but that's not coming in the next slightly better RL-trained LLM, and if it existed it likely wouldn't be under our control anyway.
In my experience this is a real problem. Just yesterday I asked my LLM to create a piece of software that could help me build an 'ambilight-like experience' through my home assistant. It did something that seems to work as I expected, but there is a lot of theory that I just brushed past. It would be pretty easy for me to assume that I would be able to replicate this feature from scratch 'now that I understand the problem'.
Well, once folks like Linus Torvalds concede, this doesn't carry much sting.
It's not about just skill. It's a matter of skill, time, and how critical the software you are writing is. There is a lot of software that is not critical. That is not close to security mechanisms. And that even if the code quality is not the highest, it does not matter.
Even if you are the best coder in the world, you would already become more productive by using ai. Things that in the past you might have not coded yourself but delegated to an intern, or things that you wouldn't even delegate to an intern because they are just too boring to do like some refactorings.
Like I had this project at work that was written without TypeScript strict mode turned on. When I turned it on, it had over 700 errors. I might be better than AI at fixing every single one of these errors. But my time is worth more than that in doing other things. But I can, and did, ask AI to fix every single one. And then I reviewed it in batches, and something that my team wanted to do for multiple years and nobody had the time for finally got done.
There are large portions of my codebases that are essentially extremely verbose grunt work. My UI stack, IaC YAML, thin CRUD routes, etc.
I know what the code is supposed to look like when it’s done being written, but it’s going to take me for freaking ever to type it all out.
I can just few shot it now in an hour. Plan -> feedback loop -> build -> review loop.
Does it try to do weird stuff? Yeah. And then I’m just like “that’s weird, no, the components should be broken up like XYZ” and then it’s not weird anymore. Occasionally (1% of the time) I just do a quick refactor myself instead of trying to tell the agent harness what to do.
I can get something fairly close to the ballpark of what I would have done but in like single digit percentage of the time.
And the result is that I can spit out a bunch of purpose built tools (personal tools, internal tools for teams, etc.) that I never would have been able to justify building otherwise.
A year ago the AI output was so bad that getting it up to my standards took more time than writing it myself from scratch. And nowadays it is faster for me to start with AI output and iterate from there to reach a quality submission.
The ninety-ninety[0] rule was a thing talked about 40 years ago, long before anyone thought of AI coding. AI can nowadays make the first 90% of the task very fast and good enough. The last 10% is still the hardest part of coding by far.
[0]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule
Update: in case it’s not obvious, I am sorry. I could not help it.
Even small, dumb, local models are excellent at translation already. Frontier models are on par or better than the human translations we've tested them against at work.
Ah yes, the known unknowns.
The discussion reminds me of a talk Zizek gave in which he discusses the speech Rumsfeld gave regarding the evidence of Iraq supplying weapons to terrorists [0].
Zizek argues the unknown knowns are far more interesting (and the reason why the USA was losing in Iraq), while Rumsfeld focused on the unknown unknowns.
I've noticed that domain experts who implicitly know the known unknowns of their field distrust LLMs because they can identify their shortcomings: those subtle mistakes LLMs make. I argue this is why domain experts using LLMs get such a boost. They can identify and avoid pitfalls, sometimes before they happen. But in other fields the same people are in awe of LLM capabilities precisely because the known unknowns are a mystery.
The unknown unknowns of LLMs are IMO the most interesting: the so-called emergent capabilities of the technology. The use of LLMs in other fields such as biology, e.g. in protein language models, is really cool.
Everyone focuses on replacement of human workers, when I think opening new fields of work for humans should be the goal of LLMs, by leveraging the tech to discover.
The other interesting category is unknown knowns. But that's another topic for another time.
[0] https://en.wikipedia.org/wiki/There_are_unknown_unknowns
Brute Force: if it doesn't work, you're just not using enough.
What if they're right though?
Are you averaging like 2000+ comments a month?
I like this / generally agree. The only wrinkle is that - for some tasks - the verification _is_ "run the script, see if it worked, don't care how... just that it did" which is distinctly different from "not only did it do it correctly, it did so in the most direct and performant way possible".
For a _lot_ of what I use LLMs to build, the former is all I need.
In real life I haven't met a single programmer who doesn't think AI can do their job.
If someone would actually say that I would immediately think they have hubris and overestimate their skills.
It's like basic income: everyone will stop working except for you.
> In my Ottawa life, every Tuesday evening, I take two gym classes back to back—boxing and the pompously named “body sculpt,” which makes me discover muscles I didn’t know I had.
The em-dash matches how you'd speak out loud.
You'd say "I take two classes every Tuesday back to back, boxing and 'body sculpt'. Weird name." (Parts of that sentence did flow oddly, but not because of the em-dash).
Grammarians say you can't make those separate sentences without adding some extra words, and because of blah-de-blah-blah-blah, someone might say you can't join them with a comma. So we have an em-dash.
Rewriting the sentence would make it flow less naturally, not more.
Either way, I'm not reading it, it's a clanker or a clanker collaborationist.
I mean, how would you even write an em dash? There's no button on the keyboard for em dashes, it's not in ASCII, it's just not something we write internet text with, it's a safety watermark put into LLMs by OpenAI to help make LLM generated content identifiable as such.
If for some reason you are an em dash lover that was hurt by the LLM debacle, I'm so sorry for your loss, but look who's on your side, give the em dash a funeral and let it go.
The sad part is that we haven't figured out how to distribute our resources fairly to these people even though their services aren't required as often. Instead we just take their wages and give them to the top 0.1%.
This isn’t a great test, because Claude almost certainly has multiple translations of The Three Musketeers in its training data.
I have also taken to being sloppier in my prose, as I’ve had stories rejected for being “written by AI” - when they’re shorts I wrote more than a decade ago. Reworked them to sound like a moron, accepted. Sigh.
It's functional? I wouldn't say it's poetic, I wouldn't want any AI translator translating art, like say a book or poem, I'd be so uncertain that it would correctly bridge the concepts
A good translator can make stylistic choices that elevate the work and make it fit in their language
(Having read lots of well translated manga and anime; also, from what I understand, there are a few books that my bilingual friend has told me are just chef's kiss quality translations)
Considering translating meaningful art is of some value, on that score I don't think we're there yet
The `cp` program on my computer also has the remarkable ability to produce a faithful translation of The Three Musketeers when provided one as input.
Translation is hard. If you're familiar with reading translations from specific languages MTL works have a very specific smell to them, it's a bit hard to describe but it's there. A good translation is miles (kilometers, for those outside of the US) above MTL.
That's not to say the latest LLMs won't end up with better translation abilities, but they are generally crap currently. Maybe they are fine for something very short, but absolutely not for longer content.
Yourself included??
Expected Value (Upside, given time/cost savings + Downside, given %reliability).
So, every task falls under a spectrum
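One way to make the expected-value framing above concrete is the sketch below. The formula (reliability-weighted upside minus failure-weighted downside) and every number in it are illustrative assumptions, not something the comment specifies.

```python
def expected_value(upside: float, downside: float, reliability: float) -> float:
    """Rough expected value of handing a task to an AI.

    upside:      value gained (time/cost saved) when the output is good
    downside:    cost incurred when the output is bad
    reliability: assumed probability (0..1) that the output is good
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * upside - (1.0 - reliability) * downside


# Hypothetical tasks with the same upside and reliability but very
# different costs of failure.
throwaway_script = expected_value(upside=2.0, downside=0.5, reliability=0.9)
sworn_translation = expected_value(upside=2.0, downside=50.0, reliability=0.9)

print(f"throwaway script:  {throwaway_script:+.2f}")
print(f"sworn translation: {sworn_translation:+.2f}")
```

With these made-up inputs the low-stakes task comes out positive and the high-stakes one negative, which is the spectrum being described: the same tool at the same reliability can be worth using for one task and not for another.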
The most egregious example I came across recently was where a friend enthused about some manga he was reading and I agreed to read a few chapters, only to discover that the translator has decided to render the countryside accents of western Japan (engaging with a protagonist visiting from Tokyo) by having them say 'y'all' and 'bless your heart' and other Southern USA tropes. I get the aspiration of the translator, but it was excruciatingly unpleasant to read. At that point, why not just say the protagonist was from New York and on vacation in Florida, or draw in some meshback caps on some of the characters and add alligators here and there in the background?
Software is no different. Even without AI, you already have buggy compilers and buggy OSes and buggy libraries. You just tend to accept the risk because you have some idea of what the failure modes are and can work around it or manage the risk in some other way (buy literal insurance.)
But it requires taste and engineering to do it right, and on the right things. It'll be an interesting few years.
Notable papers describing performance improvements with prescribed roles and personas:
- ExpertPrompting: Instructing Large Language Models to be Distinguished Experts (2023) https://arxiv.org/abs/2305.14688 (if you're going to only read one paper here, maybe read this one but know there has been a lot of follow up with more modern models.)
- Expert Personas Improve LLM Alignment but Damage Accuracy (2026) https://arxiv.org/abs/2603.18507
- When Does Persona Prompting Actually Help? (2026) https://arxiv.org/abs/2605.29420
- Unveiling Power on Combining Prompt Engineering Techniques: An Experimental Evaluation on Code Generation (2025) https://doi.org/10.5753/sbbd.2025.247251
- A Pattern Language for Persona-based Interactions with LLMs (2025) https://www.dre.vanderbilt.edu/~schmidt/PDF/Persona-Pattern-...
A TLDR of my *admittedly heavily biased* mental model (so take it with a grain of salt): personas do improve task alignment and precision to measurable effect but with observed negative impact to accuracy and knowledge grounding. Overall, this makes it quite suitable and preferred for code generation scenarios. (Don't over-index on 'accuracy' here as meaning "bad code", it's more about verbosity/jargon reducing clarity of higher order goals like business objectives and system architecture.)
Outside of code generation, personas have the interesting effect of increasing implicit biases and stereotypes. It's not hard to imagine something like "you are a left|right wing politician ..." or "you are a senior-citizen|teenager ..." influencing token space construction considerably.
I'm pretty sure the Ellsworth translation is in the corpus. You basically instructed Claude to regurgitate it.
The LLMs all have the more famous books memorized. You can trick them into reciting them more or less word for word.
Crucially the full translation was part of ChatGPT’s training set. Recall is a pretty solved problem in machine learning.
How well does it translate a French novel published yesterday? Where neither the original novel nor any translations are in the training set yet? Or might not even exist!
I tried asking ChatGPT to translate a letter I wrote in Slovenian this weekend. It got the general gist but missed a lot of the nuance. Completely missed several of the little touches of tone where the right choice of synonym conveys a whole bunch of information.
This reminds me of the adage, that ChatGPT is really great at everything except my own work.
It can be reasonably argued that some poetry can be impossible to translate from some languages to others. A poem might be explained, but only by a lengthy, dissecting explanation that completely loses the point of it.
I have a few periods during my daily routine where I’m waiting somewhere away from the computer and need a break from email.
A lot of my comments have double digit upvotes and some get into the mid hundreds. I try to actually read articles and provide thoughtful comments, which gets upvoted a lot more than the throwaway.
> Are you averaging like 2000+ comments a month?
52000 / 3 years would be under 1500 points per month or 48 points per day. That could be done with 1-2 helpful comments per day on popular threads.
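The arithmetic in that estimate can be checked with a quick sketch. The 52,000 points and the 3 years come from the comment; the 12-month and 365-day years are simplifying assumptions.

```python
total_points = 52_000  # figure quoted in the comment
years = 3              # account age quoted in the comment

points_per_month = total_points / (years * 12)   # assumes 12 months per year
points_per_day = total_points / (years * 365)    # assumes 365 days per year

print(f"{points_per_month:.0f} points per month")
print(f"{points_per_day:.1f} points per day")
```

Both results fall just under the rounded figures quoted (1500 per month, 48 per day).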
Yes! Personas demonstrated measurable improvement in a few different ways, with caveats of course. The common intuition is that personas influence token space in beneficial ways.
I'll come back here later on desktop and link a few (still) relevant papers on this topic.
However to me it seems completely reasonable that it would work, because my understanding of what happens is the model interprets what you said as:
Look for a group of people who are considered to be expert growth hackers by the world at large and answer my questions as though they were answering them.
So assuming that there are a set of questions that can best be answered by people that most other people identify as expert growth hackers then yes, I believe assigning a personality in this way should obviously work.
After that, cargo cults do what they do best.
But the problem is that many people now believe it's OK to present a 10k line vibe-coded PR that has only been verified against external behavior, and some Senior Engineer needs to review it, in time, under pressure, without too much push-back, and lastly, it's the Senior Engineer that gets paged at 2am because something has fallen over.
Also, those scripts tend to start a life of their own, and because it looks good enough, people don't look at them again.
I recall a bug of someone vibe-coding a cleanup script for folders older than $x (on Windows).
Get the CreationDate, and sort. Delete older than $x. Except CreationDate can be null and null is always smaller than $x.
Oops.
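A minimal sketch of that failure mode, in Python rather than whatever the original Windows script used. The `Entry` type, the helper names and the sample data are all hypothetical. Python itself refuses to compare `None` with a date, so the "null is always smaller" behaviour is modelled explicitly in the naive version.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    path: str
    created: Optional[datetime]  # None models a missing CreationDate


def stale_naive(entries, cutoff):
    # Mimics the bug: a missing date is treated as older than any cutoff.
    return [e.path for e in entries if e.created is None or e.created < cutoff]


def stale_safe(entries, cutoff):
    # Only select folders whose creation date is known AND older than the cutoff.
    return [e.path for e in entries if e.created is not None and e.created < cutoff]


now = datetime(2025, 1, 1)
cutoff = now - timedelta(days=30)
entries = [
    Entry("old", datetime(2024, 1, 1)),
    Entry("new", datetime(2024, 12, 25)),
    Entry("unknown", None),
]

naive_result = stale_naive(entries, cutoff)
safe_result = stale_safe(entries, cutoff)
print(naive_result)
print(safe_result)
```

The design choice in the safe version is to treat "unknown age" as "do not delete". A check against external behaviour on a test folder would pass either version as long as every folder there happens to have a date.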
…if not, they’ve found developer work that ai can’t do yet, no?
Don't care, only time I've measured them was personal curiosity about hand-written projects, and one time I was trying to work out how many blank comments a co-worker had put into their codebase*.
How valuable are features? Management kept giving me them, and I always just assumed they'd decided which ones were important. But I've seen git histories of apps where the same feature was added twice, 5 years apart, by the same developer.
> In the same vein, when was the last time someone put an AI on a ralph loop, posted the result on r/vibecoding and ended up with actual users.
How often do the megacorps currently boasting that 80% of their code is now vibed, post anything (other than adverts) to reddit?
* 20% of the whole project, or 24 thousand blank comments.
Slang for an AI, used by a Blade Runner
Followed by, “You should abandon your preferences because I don’t share them”.
> I mean, how would you even write an em dash?
⌥ ⇧ +
It's been seared into my muscle memory for more than a decade. I keep using it, too. It's present in the popular training sets – and then in LLM outputs – simply because it's proper punctuation.
> Correctly prompt, to steer it, to verify it, and to improve the harness.
I doubt this a lot. The average AI user is running Claude Code as the harness, or Codex etc. Prompting has no secret incantations, and steering and verifying is just knowing what the answer should roughly look like, which is a domain skill, not an AI skill.
And here I am, brain the size of a galaxy, and I fumble my way through every language I speak other than English.
Serious respect for the linguists.
To me they come off as faddish, with many writers using them where commas and semicolons would have done just as well. I think their popularity stems from the fact that they provide the sense of a personal aside from the writer, allowing them to be more expressive while clearly delineating the personal or contextual remark from the main flow of the prose. No doubt this works for a lot of readers, but I find it tedious.
Just one amusing example I saw recently: On the Amazon website, a submit button labeled “Go” in English was translated to something which when translated back would be “Walking”. That’s the kind of thing that would be exceedingly unlikely to happen with a human translator.
I suspect if I knew another language I would be able to find errors in the translation.
Where else, other than HN, do you post?
When I write comments on here I tend to spend upwards of 15 minutes to draft and reformulate my comments. Sometimes double-checking what I'm about to say (sometimes not thoroughly enough as some of my recent comments show) and I was wondering if you have a similar experience in that regard or do you just manage to fire off a comment in a stream of thought fashion from start to end?
The Opus models over the last year don't seem as vulnerable to this type of behavior, and I've noticed the "identify as expert" prompt tricks aren't as meaningful there.
I really wonder if phrasing it differently would make a difference. In good faith conversations, it just doesn't happen that someone tells someone else who that person is.
Three years ago, AI was barely able to provide sort-of reliable command completion.
Two years ago, it could extrapolate a single function from a docstring - but the docstring had to be so verbose that it wasn't practical to use in that way.
A year ago, I was tinkering with Devin to try to find a way to get it to reliably implement small, isolated features from verbose Jira tickets.
Six months ago, I started using AI to generate the majority of my code output. Most of my time was spent reviewing, and I was ecstatic to reach ~2x output because I could run the next task while reviewing the last.
Now, at work I'm managing a half dozen Claude Code instances, Devin sessions, and orchestrating a review loop between Claude, Devin, and CodeRabbit. It's not uncommon for me to be working on four or more discrete features at once. My output is approximately 15x my pre-AI baseline - and I've not sat down and written a line of code directly in six months.
At home I'm managing a Hermes agent that can spin up a whole fleet of purpose-tuned agents for whatever purpose I'd like. I've implemented spec-driven development à la Acai, and extended it to the point that my agent creates specs from text or voice conversation, I review them, and it handles implementation end-to-end. The code itself is an almost disposable artifact - useful primarily to ensure no regressions have been introduced between rounds.
... I simply don't understand how you can assert that "it's been basically the same for 3 years". It absolutely has not.
But now I find myself adding noise and imperfections to my writing (not that it was perfect) to make it more human, which is kinda silly.
Yes, they do make software now - whereas it was impossible before. You may be absolutely shocked at how bad LLM code can be when prompted by a noncoder. How buggy, and how absolutely rife with security problems it can be. I honestly don't know how they can get LLMs to write such bad software - but somehow they can. This is from people who have been vibe coding for 3 years straight btw (huge amount of time per day).
The way that information is organised and formatted matters for compliance. It’s pretty similar to writing good procedural documentation for humans.
You can (could, maybe they 'fixed' it by now) get sota LLMs to reproduce entire novels near verbatim.
The idea of giving it parallel texts of those novels in different languages, to train it on translation, is so obvious it'd just be strange if the AI labs didn't do it.
In fact DeepL was doing basically that more than 10 y ago.
You can restructure any sentence to use fewer forms of punctuation -- but if you do that, you'll lose nuance. And nuance, in writing, is a very fine thing.
There will never be enough expert-level human translators, and they tend to be very expensive. Machine translation has raised the floor.
I still think there are better tests you could do. Ideally, you would choose a book that was published recently—after the model’s cut-off date—which is considered to be a good translation. But even something like The Girl With the Dragon Tattoo, which is not particularly new and by no means obscure, would be better than a famous work of literature like The Three Musketeers that has many translations.
Not disputing the overall trajectory, yeah it’s gotten better. But it was definitely capable of more than just command completion 3 years ago.
I reach for it more frequently. But personally, it’s at the point of diminishing returns for my work. It’s capable enough now to handle most of the things I want to throw at it, sometimes it’s wrong, sometimes it’s right.
I’m not doing cutting edge deep tech work - and I also don’t have the motivation (or salary increase) to be 15X more productive, if that’s even measurable. We are so busy because the CEO hears these “15X” statements and then the pressure is on to match or exceed that, and I’m not playing that game.
1. people who claim to be experts in [domain]
2. people who others claim to be experts in [domain]
hopefully valuing membership in group two over membership in group 1.
Glad we agree :)
This.
There was even a big controversy recently with one of the games on Steam where human translators just completely botched and vandalized the translation, mistranslating huge chunks of it and injecting their own personal politics which are not present in the original text (only English was affected; other languages were translated fine apparently): https://store.steampowered.com/news/app/2914150/view/5028562...
If you had gotten an AI to translate it, even without any editing, it would have done a much better job. Just because something's done by a human doesn't automatically make it good; you still need competent people at the helm, and recent machine translation advances certainly raise the floor on that.
https://books.google.com/ngrams/graph?content=%E2%80%93&year...
It's also notable that the em-dash is approved in American Manuals of Style, while discouraged in British ones. I was unable to find longitudinal data for the em-dash's use in magazines, blogs etc., but AI summaries suggest it's 3-4 times more used in those contexts than in news reports.
Like strawberry ice cream or apple pie, nuance is certainly a fine thing; but a surfeit of it becomes cloying, and the antipathy toward the omnipresence of the em-dash in LLM-generated prose, along with other kinds of literary expression like contrast and comparison, suggests to me that people have had more than enough of it.
Even if Fable didn't have Ellsworth's translation, it certainly has the William Barrow translation, which would still get it like 80+% of the way there.
My wife speaks Spanish, I should get her to do some kind of comparison with a Spanish book that doesn't have English translations.
"Switched to Opus 4.8 - Fable has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them."
But yeah, I broadly do agree; if I read other languages I could find a book that hadn't been thoroughly translated to English and then I could give a proper analysis on how good the translation is, but since I'm a very stereotypical American I know exactly one language (and sometimes my comprehension of even that is questionable).
I did this with entirely local models I have sitting around on my laptop. Minimax M2.7 at a 3 bit quant with 8 bit quantized KV cache for English -> French, Gemma 4 31B QAT (4 bit quant) MTP for French -> English.
It's perfectly readable, but there are a few places where the phrasing is a bit more awkward after the double translation ("auditing" to "revision" in particular is a bit off). Gemma did comment on not knowing what Claude Fable was in its thought process: "The author compares Ellsworth's translation with one produced by "Claude Fable" (likely a misspelling of "Claude" or a specific version of Claude)."
Here's the double translation:
"I have no doubt that a writer is better at translating than AI, but I must say that AI translation has become so good that I'm not sure how much longer the profession of translation will exist—or rather, it may become more a matter of revision.
"For example, I just read Lawrence Ellsworth's translation of The Three Musketeers, which I enjoyed immensely. I neither speak nor read French, but from what I understand, Ellsworth's translation is considered one of the most faithful translations of the work.
"Out of curiosity, I asked Claude Fable to translate the original French version of The Three Musketeers; I asked it to translate faithfully, but also to try to maintain the same playful tone as the original and to censor nothing.
"Once it was finished, I didn't read the entire result, but I compared a few individual chapters between Ellsworth's translation and Fable's.
"They were honestly remarkably similar. As far as I can tell, nothing was substantially different between Ellsworth's translation and Fable's. I think the prose in Ellsworth's translation was slightly better, but Fable's was actually perfectly readable. Again, I don't speak French, so I can't say for certain, but I don't believe I would have had a significantly different experience if I had read Fable's version instead of Ellsworth's.
"It is possible (and probable) that this is partly a self-fulfilling prophecy; Fable may have been trained using Ellsworth's translation and can therefore draw directly from it. Unfortunately, since I don't speak any language other than English, there is a sort of vicious circle: the only way to compare the fidelity of a translation is to compare it to other translations, but if other translations already exist, that will likely influence the results, and if a translation doesn't exist yet, I have no way of verifying it.
"I am going to continue reading Ellsworth's translations for the following stories simply because it feels more canonical to me, and as I said, I think the prose was slightly better."
In my Ottawa life, every Tuesday evening, I take two gym classes back to back—boxing and the pompously named “body sculpt,” which makes me discover muscles I didn’t know I had.
It’s fun. I love it.
But a couple of weeks ago, I ended up cancelling my second class—one of those nights when the first assignment landed in my inbox at 4 p.m., another one arrived while I was on my way to the gym, and a third one popped up right as I was standing in the locker room. All due the following morning, obviously. Welcome to the life of a freelance translator.
Work takes priority over muscles. I headed for the lockers at the end of boxing class.
“Are you leaving? You’re always taking this class!”
I turned around. I was changing into my translator clothes—jeans and a T-shirt—and she was presumably changing into her gym clothes, except first, she was busy taking off her jewelry.
Her look was very polished—the kind of polished that screams office day. Over the past few months, the generous pandemic work-from-home policy had been tightened, scaled back, amended and more or less rescinded in a desperate attempt to have employees single-handedly save downtown Ottawa’s many small businesses and general gloom by their mere on-site hot-desking presence.
If you ask me, nothing can save downtown Ottawa or North American public transit.
“I see you there every week!”
Apparently, I owed her an explanation and possibly an apology. I didn’t remember her, but it’s a very full class and we all more or less look the same in gym clothes.
“I’ve just received some work,” I explained. “I’m a translator and I have three deadlines by tomorrow morning, so I should probably get started.”
“But… it won’t take long. Don’t you just upload the documents to ChatGPT?”
I paused for a split second. Surely, she was joking.
I looked up at her.
She was not.
“It… doesn’t exactly work like that.”
“You should try it, it’s so much quicker!”
Oh. My. Fucking. God.
But hey, I parent a teen. I can recognize a teachable moment when I see one.
“It’s not that easy, you know. Technically, ChatGPT will spit out a translated document. But first, there may be formatting issues. And most importantly, the translation will be questionable.”
“Why?”
“Because AI isn’t human, and it takes an actual person to understand what another human is trying to say—and how to say it so someone else understands it. I don’t just make grammatically correct sentences in another language. I adapt, I localize, and I find the best way to convey the original message so it makes sense and feels natural. I research terminology. I make sure it’s consistent throughout. I’m sorry, I’m better than AI.”
We’re all better than AI. AI is just better at pretending it can do the job.
Go ahead, ask me how I know.
Yes, obviously, I tried translating with AI.
Ah, you can’t fire me, I’m self-employed!
I’ve been playing with AI since the fall, when it started stealing my job for real. I could either declare it evil and turn into one of those people who will never get a smartphone, or use it to my advantage.
I’m practical. I chose the second option.
AI can’t translate for me. It can’t write either—unfortunately, ChatGPT can’t vouch for the fact that this article is my idea, that it’s my gym, my ignorant civil servant and my punchline. Just take my word for it, pun slightly intended.
And while this article is written by yours truly, you bet I’m going to spell-check it. I probably won’t use AI; I have Antidote. But maybe I will ask Claude’s opinion, and if one of the suggestions is smart—cutting a paragraph, for instance, or clarifying a sentence—I might accept it.
When I started translating 15 years ago, we used to paste uncooperative sentences into Google Translate to see if it had interesting ways to phrase things differently. Then came DeepL—same idea.
What do you think? That we’re translating with pen and pencil? That your accountant doesn’t use fancy Excel formulas? That your manager formatted the PowerPoint alone? That your favourite restaurant doesn’t Google trendy recipes?
We are professionals using tools.
But that’s just what they are—tools.
One of my clients has insane style guides, plural. I’m talking about 500-page documents detailing the proper way to format quotes and the one true way to insert footnotes. I fed them to ChatGPT for the final checks—it can kind of flag when I break a rule. I’ve also used AI to extract specialized terminology from reference documents and build my own glossaries. It’s faster than Ctrl+F, and less likely to make me scream.
But everything has to be double-checked, triple-checked. It’s another way of working, not a magic button.
AI isn’t replacing me. Like a toddler, it needs to be constantly coached. It invents acronyms and organization names, forgets to translate entire sentences, ignores the provided terminology unless repeatedly threatened, and occasionally misses the point completely.
Which is why we—translators, writers, editors, and other professionals—shouldn’t suddenly be paid less because AI exists. Should you pay your roofer less because he uses a hammer instead of his bare hands?
But judging by her amused smile, my civil servant wasn’t getting the point.
“But AI is getting better all the time!”
“What do you do?” I asked, changing tack.
“I’m the Director General, Human Resources and Corporate Services, but I’m currently in an acting position for Workforce Planning and Resources Management.”
This actually made sense to my Ottawa brain. Told you, I’m a translator.
“Great. So, do you use AI a lot at work?”
“Oh, I can’t! It’s really not reliable enough.”
For fuck’s sake.
And she works in human resources!