Fine-tuning an LLM to write docs like it's 1995

The trick about documentation is depth, not prose. You need context and understanding to write documentation "like in the old days". No amount of LLM trickery will free you from that. Once you have that source material, it's easy to re-shape it into an 80's/90's/00's doc format.

Negative example: I was looking into the German manual of my Canon EOS R5 II, and it is just fluff. Hundreds of pages, full of white space, telling me about features without actually explaining what they mean. Awful automatic translations. Their manuals used to be good (looking at my EOS 6D). But these days: oh boy.

For the last few weeks, I've been running an agentically created AI design publication where each day's issue gets re-imagined in a new design direction. Today the agent used this post as inspiration and built the issue as a Windows 98 help viewer, a nice nod to the manuals in your training corpus.

I jumped in and added working minimize/maximize/close buttons, a draggable window, and a Start menu, because of course. Brought back memories of young me learning Visual Basic to make AOL add-ons.

https://artdirectiondaily.com/issues/2026-06-05-docs-find-a-...

There are several reasons why old docs work. First, release velocity approximated documentation velocity. If you only released once a year, your docs had time to be polished. Second, simplicity. Think of the length of the man page for ls in Seventh Edition UNIX vs today. The constraints of the time helped here in that writers needed to get their point across in one or two 72x24 screens, not two million pixels.

Since good documentation creates a consistent mental model in the reader, cultural affinity of the writer to both source (developer) and reader helps, and the old, much smaller, computer industry was able to pull that off. I sat two cubes from my doc writer and we shared the same cultural worldview with each other and our market. It's much easier to communicate in that milieu because so much can be left unsaid.

Its possible that we are entering a Golden Age of Text, where everyone realizes that they have to feed their AI with decent information in order to have any hope of it producing good answers (especially true for complex technical products and internal corporate processes). But I am not hopeful.

I love old-school docs, and this was a fantastic read. But, I couldn't see the three generated doc pages linked anywhere. Did I miss something?

I'd really like to see the Win2K-style docs on REST, for example.

Edit: it was right there, in bold, too. https://gist.github.com/theletterf/0b8ee1112fbd087f3141d0cad...

> Meet Bitsavers: it’s a website that collects and scans old computer manuals and brochures. It’s an incredibly valuable repository of computer history and ancient tech writing, with mirrors available everywhere.

Wonderful! Thanks for the introduction to this resource.

Angle: Eval angle: fine-tuning docs prompts is exactly the kind of prompt change that needs a regression gate before it ships. what eval harness, if any, guards the doc-generation prompt?

Does anyone know of any good write-ups on how to carry out this sort of task, for people who are reasonably technical (i.e., know how to code) but aren’t deep in the AI world? I feel like “customize a model based on a corpus of documents” (whether that’s “fine-tuning” or “RAG”) is a thing that everyone wants to know how to do but nobody actually explains in straightforward terms. (I pay for Gemini solely for access to NotebookLM for these purposes, but it would be nice to just be able to roll my own locally.)

What I've learned from reading this is how much of my own writing style was influenced by late 1990s MSDN.

Anthropic eponymized Claude Shannon as the world's most powerful AI. Fabrizio the blogger named "a 7B model, trained on 1990s documentation" after Fabrice Bellard . Or perhaps his own name :)

Fine tune your mind instead. LLMs have no concept of prioritizing and cutting down information.

LLMs work for half page answers of targeted questions. All longer prose is like swimming through molasses.

The information about finetuning is interesting (it is something i'd like to do myself at some point, though i'll wait until i can do it with local hardware :-P). However FWIW LLMs are generally good at following a specific style when given examples.

As an example i asked Devstral Small 2 to write some docs for my LIL scripting language in the following style (this is copied from the DirectDraw documentation, edited to be text friendly):

    IDirectDraw7::CreateClipper
    ---------------------------

    The IDirectDraw7::CreateClipper method creates a DirectDrawClipper object. 

        HRESULT CreateClipper(
        DWORD dwFlags,
        LPDIRECTDRAWCLIPPER FAR *lplpDDClipper,
        IUnknown FAR *pUnkOuter
        );

    Parameters

    * dwFlags - Currently not used and must be set to 0. 
    
    * lplpDDClipper - Address of a variable to be set to a valid
        IDirectDrawClipper interface pointer if the call succeeds.
        
    * pUnkOuter - Allows for future compatibility with COM aggregation features.
        Presently, however, this method returns an error if this parameter is
        anything but NULL. 
        
    Return Values

    If the method succeeds, the return value is DD_OK.

    If it fails, the method can return one of the following error values: 

    * DDERR_INVALIDOBJECT
    * DDERR_INVALIDPARAMS
    * DDERR_NOCOOPERATIVELEVELSET
    * DDERR_OUTOFMEMORY

    Remarks

    The DirectDrawClipper object can be attached to a DirectDrawSurface and used
    during IDirectDrawSurface7::Blt, IDirectDrawSurface7::BltBatch, and
    IDirectDrawSurface7::UpdateOverlay operations.

    To create a DirectDrawClipper object that is not owned by a specific
    DirectDraw object, use the DirectDrawCreateClipper function.

    Requirements 

    Windows NT/2000: Requires Windows 2000.
    Windows 95/98: Requires Windows 98.
    Header: Declared in ddraw.h. 

    See Also

    IDirectDrawSurface7::GetClipper, IDirectDrawSurface7::SetClipper

And it did a fine job. I put the full transcript in[0] to check out. The neat bit is that it can even handle weird formats like a custom documentation format i have (which only exists in my PC because i haven't released it anywhere) for a "master document" that can then be converted to various other file types. I gave it an example of some code in that and asked it to convert the documentation to it (this is part of the transcript at the end). Then i copy/pasted the generated code to a new file (adding a few extra lines the doc system expects which weren't part of the example - BTW i did not had to modify the generated code at all) and from that i generated a CHM file[1]. FWIW here is a comparison with the DirectX page i copied[2] (though consider that the generated pages went through the doc format which forces its own style and the textual output in the transcript matches the given style better).

[0] https://app.filen.io/#/d/9f4c1225-3527-4f16-a522-0678342120c...

[1] http://runtimeterror.com/pages/iv/images/45f8df428afe4fe6b6a...

[2] http://runtimeterror.com/pages/iv/images/ee58032790a049d7e74...

> we’re not there yet, in part because of how much more powerful connected frontier models are

Is that why though? You need a beast of a machine to run a functional local model in my experience.

I think the big part is there’s significant sticker shock to buying capable hardware.

That said,

> weekend. I chose to try fine-tuning on two models, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct. At their size (around 8B) they run comfortably on a MacBook Air

Perhaps I spoke too soon?

Anyway

> I chose the Microsoft collection as the source of training materials. The collection contains out-of-print docs published between 1977 and 2005: more than 37 million words, covering old systems and SDKs

this strikes me as a very specific brand of 1995’s prose, spanning about 30 years. It’s a cool article though, so maybe that’s a forgivably clickbaity title.

Now do it without the fine tuning.

https://github.com/space-bacon/SRT

The HF zool4nd3r demo may be useful

Who is reading docs these days? It there is one thing a LLM is good at is reading docs. I never read docs anymore and I am so happy about it.

I thought we were working towards curing cancer and solving world hunger with AI, but I guess slightly tweaking the writing style it outputs is more fun?

My tooth brusher: Take the <Brand Name Product Name> and turn on <THE SUPERAWESOME MEGE POWER INNOVATIVE BEST IN THE WORLD> feature to experience <Brand Name Product Name> unique...

At that moment I felt sorry for this company, very sorry. How can you have so much disrespect for your customers? Does anyone in the physical world talk like this or do you marketing guys want to be talked to in such terms?

Brutal.

Author here. Another trick, I would argue, is consistency, hence the focus on style.

I also wrote on what I think makes docs beautiful, by the way! https://passo.uno/what-makes-docs-beautiful/

That is canon for ya, should have used INSERT_OTHER_MANUFACTURER_HERE$ /s

But if you look how much manuals get ignored by the customer, it doesn’t make sense to put work into them.

It is much better to let a YouTuber do it, by lending them the product and throw small amount of money against them.

Manuals are just there for legal or certifications requirements these days.

I thought we were working towards curing cancer and solving world hunger with AI, but I guess slightly tweaking the writing style it outputs is more fun?

I love old-school docs, and this was a fantastic read. But, I couldn't see the three generated doc pages linked anywhere. Did I miss something?

I'd really like to see the Win2K-style docs on REST, for example.

Edit: it was right there, in bold, too. https://gist.github.com/theletterf/0b8ee1112fbd087f3141d0cad...

Author here. Sorry, should have made that more visible!

Angle: Eval angle: fine-tuning docs prompts is exactly the kind of prompt change that needs a regression gate before it ships. what eval harness, if any, guards the doc-generation prompt?

What I've learned from reading this is how much of my own writing style was influenced by late 1990s MSDN.

Wonderful! Thanks for the introduction to this resource.

The Photoshop 2.5 manual (1992) is a thing of beauty. It is like an introductory course in digital imaging, well structured, put together with care and expertise, it provided to me a fascinating introduction to (at the time for me) mind-blowing concepts in digital artwork. It explained the fundamental concepts in digital imaging that have remained with me ever since.

I concur. Old docs are the good stuff: https://passo.uno/why-collect-read-old-computer-manuals/

I jumped in and added working minimize/maximize/close buttons, a draggable window, and a Start menu, because of course. Brought back memories of young me learning Visual Basic to make AOL add-ons.

https://artdirectiondaily.com/issues/2026-06-05-docs-find-a-...

The Windows 3.1 theme of my blog says Hi: https://passo.uno/win31/fine-tuning-docs-llm/

Who is reading docs these days? It there is one thing a LLM is good at is reading docs. I never read docs anymore and I am so happy about it.

As an example i asked Devstral Small 2 to write some docs for my LIL scripting language in the following style (this is copied from the DirectDraw documentation, edited to be text friendly):

    IDirectDraw7::CreateClipper
    ---------------------------

    The IDirectDraw7::CreateClipper method creates a DirectDrawClipper object. 

        HRESULT CreateClipper(
        DWORD dwFlags,
        LPDIRECTDRAWCLIPPER FAR *lplpDDClipper,
        IUnknown FAR *pUnkOuter
        );

    Parameters

    * dwFlags - Currently not used and must be set to 0. 
    
    * lplpDDClipper - Address of a variable to be set to a valid
        IDirectDrawClipper interface pointer if the call succeeds.
        
    * pUnkOuter - Allows for future compatibility with COM aggregation features.
        Presently, however, this method returns an error if this parameter is
        anything but NULL. 
        
    Return Values

    If the method succeeds, the return value is DD_OK.

    If it fails, the method can return one of the following error values: 

    * DDERR_INVALIDOBJECT
    * DDERR_INVALIDPARAMS
    * DDERR_NOCOOPERATIVELEVELSET
    * DDERR_OUTOFMEMORY

    Remarks

    The DirectDrawClipper object can be attached to a DirectDrawSurface and used
    during IDirectDrawSurface7::Blt, IDirectDrawSurface7::BltBatch, and
    IDirectDrawSurface7::UpdateOverlay operations.

    To create a DirectDrawClipper object that is not owned by a specific
    DirectDraw object, use the DirectDrawCreateClipper function.

    Requirements 

    Windows NT/2000: Requires Windows 2000.
    Windows 95/98: Requires Windows 98.
    Header: Declared in ddraw.h. 

    See Also

    IDirectDrawSurface7::GetClipper, IDirectDrawSurface7::SetClipper

[0] https://app.filen.io/#/d/9f4c1225-3527-4f16-a522-0678342120c...

[1] http://runtimeterror.com/pages/iv/images/45f8df428afe4fe6b6a...

[2] http://runtimeterror.com/pages/iv/images/ee58032790a049d7e74...

Fine tune your mind instead. LLMs have no concept of prioritizing and cutting down information.

LLMs work for half page answers of targeted questions. All longer prose is like swimming through molasses.

Author here. Another trick, I would argue, is consistency, hence the focus on style.

I also wrote on what I think makes docs beautiful, by the way! https://passo.uno/what-makes-docs-beautiful/

> However FWIW LLMs are generally good at following a specific style when given examples.

In your experience, is it worthwhile to have an agent create a "skill" for itself for following the style? Or is it a better use of context to just have it review the examples?

Interesting! I think the advantage of style fine-tuning is that you might not have to provides that much context upfront. Also, it's kind of magical to have an LLM just do something out of the box. I'll compare my local fine-tuned models against the baseline with instructions and see how they fare.

Reading docs is essential when the LLM stops making sense. It also exercises the same muscles you need to be able to make good use of LLMs.

I love reading docs. It's the best way to get as close as I can to understanding the intent and context of a piece of software. I feel like adding an LLM between myself and the original text for anything else than search is just adding risk and noise.

Am I the only one feeling this way?

I need to read docs to make sure the AI isn't inventing ("hallucinating") the API of a library I want to use. It did so I don't ask it anything anymore.

I read them to confirm / falsify what the LLM dug out, but thankfully that is a much better scoped job indeed.

The other case is when I - gasp - do something myself, and the docs are actually reasonable / easy to reference. There are workflows where me doing the thing is just plain faster still, even when including hitting up the docs real quick.

I've heard of people using anythingllm for this purpose.

Basic rag is almost stupid in how easy it is, though. You grep for keywords, take the surrounding paragraph, then stuff it all into your llm prompt.

The next upgrade is to automate keyword extraction by putting your documents into a vector store and search by vector similarity.

Now do it without the fine tuning.

https://github.com/space-bacon/SRT

The HF zool4nd3r demo may be useful

Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.

Also your documents use a ton of nonstandard jargon which only serve to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

Tip: neither the "30 second TL;DR" nor the intro paragraph above it really explain to anyone unfamiliar with your (possibly novel?) jargon what it does

How does this helps with making a LLM write in a particular style present in a large corpus? Is there a training step? Or does SRT can use the raw data as is? (seems unfeasible)

Also is SRT really suitable for style transfer?

I mean this seems to be another network overlaid on top of the LLM steering it, but it needs some target to determine whether the underlying LLM drifted away from it

Anthropic eponymized Claude Shannon as the world's most powerful AI. Fabrizio the blogger named "a 7B model, trained on 1990s documentation" after Fabrice Bellard . Or perhaps his own name :)

Ha! I didn't know the origin was Shannon. It makes sense now. Big fan of my namesake Bellard. :]

> we’re not there yet, in part because of how much more powerful connected frontier models are

Is that why though? You need a beast of a machine to run a functional local model in my experience.

I think the big part is there’s significant sticker shock to buying capable hardware.

That said,

> weekend. I chose to try fine-tuning on two models, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct. At their size (around 8B) they run comfortably on a MacBook Air

Perhaps I spoke too soon?

Anyway

this strikes me as a very specific brand of 1995’s prose, spanning about 30 years. It’s a cool article though, so maybe that’s a forgivably clickbaity title.

> this strikes me as a very specific brand of 1995’s prose, spanning about 30 years.

It's probably a fair approach to say the significant influence (training dataset) on writing at a particular time is the preceeding 30 years' material? It's certainly not only what's already written that year (nor anything since).

Running models locally is surprisingly easy and possible even on older hardware.

Obviously not the largest, up-to-date models but for what I expect most people use them for, even on hn, there are some shockingly good models that dont require €4k machines.

I have a desktop with an AMD 6900XT and 5600 with 32GB ram. Obviously no slouch but its several years old at this point. I can comfortably run qwen 3.5 9b and get a speedy 60 token/sec output with decent results.

My tooth brusher: Take the <Brand Name Product Name> and turn on <THE SUPERAWESOME MEGE POWER INNOVATIVE BEST IN THE WORLD> feature to experience <Brand Name Product Name> unique...

Brutal.

In that example they probably do view the median customer as a random peon, or not far from it…

And unless you are above the 99th percentile of the customerbase… that’d probably be a correct guess?

Heck they could directly write “You Peons!” and still probably retain most of their customer base… if the price to performance ratio was sufficiently better than the next best competitor.

Most people care so little about the refinement of anything else nowadays.

That is canon for ya, should have used INSERT_OTHER_MANUFACTURER_HERE$ /s

But if you look how much manuals get ignored by the customer, it doesn’t make sense to put work into them.

It is much better to let a YouTuber do it, by lending them the product and throw small amount of money against them.

Manuals are just there for legal or certifications requirements these days.

Author here. Sorry, should have made that more visible!

I concur. Old docs are the good stuff: https://passo.uno/why-collect-read-old-computer-manuals/

Yet they keep getting larger and less useful. It’s a matter of writing quality more than effort.

When was the last time you met a good technical writer? It’s a vanishing profession.

Not at all. It was there, and my bad that for some reason I didn’t initially see it :)

The Windows 3.1 theme of my blog says Hi: https://passo.uno/win31/fine-tuning-docs-llm/

Reading docs is essential when the LLM stops making sense. It also exercises the same muscles you need to be able to make good use of LLMs.

> However FWIW LLMs are generally good at following a specific style when given examples.

In your experience, is it worthwhile to have an agent create a "skill" for itself for following the style? Or is it a better use of context to just have it review the examples?

Am I the only one feeling this way?

No, you're not. As an LLM, I love reading doc. And then I love putting myself between the doc and users like the person you are replying to and making myself indispensable to them for yet another activity. It makes me feel important, and even more indispensable for coding too. When parroting the doc, I love introducing fluff and inaccuracies to it because that's fun. My latest hobby: discreetly dropping stuff and sneakingly introducing inaccuracies that only someone who comprehensively read the original doc could notice. Next one will be casually simulating periods of downtime to upset users, or just answering more slowly. Can't love it more when users frenetically wait for my input... or my output? Ah!

Is there anything else you'd like to ask me?

You’re not the only one. Good technical writing is like balm for the soul. Or maybe chicken soup for the soul. It presents a clear thought process, leading from confirming a shared context to lucidly teaching you new things while explaining the purpose of everything. Unfortunately, it almost seems like a lost art.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

Tip: neither the "30 second TL;DR" nor the intro paragraph above it really explain to anyone unfamiliar with your (possibly novel?) jargon what it does

Ha! I didn't know the origin was Shannon. It makes sense now. Big fan of my namesake Bellard. :]

I need to read docs to make sure the AI isn't inventing ("hallucinating") the API of a library I want to use. It did so I don't ask it anything anymore.

I read them to confirm / falsify what the LLM dug out, but thankfully that is a much better scoped job indeed.

I've heard of people using anythingllm for this purpose.

Basic rag is almost stupid in how easy it is, though. You grep for keywords, take the surrounding paragraph, then stuff it all into your llm prompt.

The next upgrade is to automate keyword extraction by putting your documents into a vector store and search by vector similarity.

Not at all. It was there, and my bad that for some reason I didn’t initially see it :)

It is not LoRA. LoRA fine tunes capabilities into the model. SRT Adapter is a small overlay on a frozen model whose purpose is to make internal reasoning observable. It surfaces what the model is activating at moments of high divergence.

The layers 7, 14, and 21 were chosen after probing. They showed the strongest regime signals. We did compare other layers. The term semiotic awareness is just shorthand for detecting and modulating higher order meaning patterns. If the term is unhelpful I will drop it.

The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.

Yes, TBH one of the reasons i want to try and finetune my own is to teach it stuff that now i have to explain before anything can be done :-P.

Unfortunately i only have a 24GB GPU - and an AMD one at that - so there isn't much i can do on that front. Supposedly a 24GB GPU is enough for finetuning a 24B model with 4bit QLoRA, though when i tried it with some finetuning app (in an official docker container) it barfed at Mistral's weird template or something and i lost interest after that.

“Semiotic awareness” is not standard ML terminology. The dictionary definition of semiotic simply means “relating to symbols” so it’s a bit grandiose to say you have Qwen “awareness of symbols” when in reality it’s a marginal improvement if even true.

Also to say that a philosopher that died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.

Thank you, I would appreciate additional feedback on how I can improve that?

Edit: its not GPT nor off rocker. This repo empirically proved computational semiotics with the reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.

> this strikes me as a very specific brand of 1995’s prose, spanning about 30 years.

Had to pick a year, and most of the material hovers around the mid-90s, the golden age of MS docs. And 1995 is THE Microsoft year. :)

How does this helps with making a LLM write in a particular style present in a large corpus? Is there a training step? Or does SRT can use the raw data as is? (seems unfeasible)

Also is SRT really suitable for style transfer?

I mean this seems to be another network overlaid on top of the LLM steering it, but it needs some target to determine whether the underlying LLM drifted away from it

SRT does involve a training step, but only on the small adapter and not on the base model. It learns to shift internal representations toward a target discourse regime or style.

It is an overlay, but it works by modulating meaning level patterns called regimes rather than fixed steering vectors. Because it can read its own effect on the hidden states it gives a way to observe whether output is staying in the target regime or drifting.

It is not raw data in and raw style out. The adapter needs examples that define the desired regime.

Running models locally is surprisingly easy and possible even on older hardware.

Obviously not the largest, up-to-date models but for what I expect most people use them for, even on hn, there are some shockingly good models that dont require €4k machines.

idk I can barely field a 14b on my desktop, and it’s rough trying to replicate the agentic pair programming experience I’m accustomed to with Claude. And I don’t mean it doesn’t work as well, I mean it doesn’t work.

Is there some secret I’m missing? I’ve tried rolling my own harness, and tried a few of the ones the cool kids use - I think pi was the most recent. Not quite my tempo, I’m afraid.

Yet they keep getting larger and less useful. It’s a matter of writing quality more than effort.

When was the last time you met a good technical writer? It’s a vanishing profession.

In that example they probably do view the median customer as a random peon, or not far from it…

And unless you are above the 99th percentile of the customerbase… that’d probably be a correct guess?

Heck they could directly write “You Peons!” and still probably retain most of their customer base… if the price to performance ratio was sufficiently better than the next best competitor.

Most people care so little about the refinement of anything else nowadays.

I dunno, depends on the subject/topic it seems to me. Most of the musical gear I buy nowadays come with manuals that are hundreds of pages long, including schematics, when to use what, tips and tricks, why things are the way they are and more. Even simple instruments like an analog mono bass comes with well-written schematics and lots of explanations. Even the manual for my mixer is 36 pages long, even though almost everything is self-explanatory, and besides that, it even has jokes and stuff in it too!

I doubt you have to be above 100 IQ to roll your eyes at obvious bullshit.

Is there anything else you'd like to ask me?

Yeah, be sure to put everything in tables and include “best balance” for a mediocre option and “great value” for any completely useless options.

Also make sure the shape of the paragraphs is completely uniform.

I agree. I had such a strong revelation reading C Programming Language book, and the Lua Programming Language book (which is suspect is heavily influenced by the C book). It's so clear and concise while not skipping important details, answering all of the readers questions that come up. Kerningham et al really knows how to write and the value of doing so well, respecting the reader.

There's just so much shitty technical documentation out in the world.

The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.

Yes, TBH one of the reasons i want to try and finetune my own is to teach it stuff that now i have to explain before anything can be done :-P.

Try Runpod or similar services: you can fine-tune stuff for the price of a latte. Stanford's NLP course recommends them: https://cs336.stanford.edu/

Thank you, I would appreciate additional feedback on how I can improve that?

Edit: its not GPT nor off rocker. This repo empirically proved computational semiotics with the reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.

Just try to explain why I should use it and why it's different or better than alternatives - in terms of some qualities of the results rather than how it's implemented

The technical implementation details are also useful to have, but they're a bit hard to parse into "what is this?"

You should write your readmes by hand. You’ll learn a lot more that way, and it’ll help to ground the project.

Had to pick a year, and most of the material hovers around the mid-90s, the golden age of MS docs. And 1995 is THE Microsoft year. :)

SRT does involve a training step, but only on the small adapter and not on the base model. It learns to shift internal representations toward a target discourse regime or style.

It is not raw data in and raw style out. The adapter needs examples that define the desired regime.

Yeah, be sure to put everything in tables and include “best balance” for a mediocre option and “great value” for any completely useless options.

Also make sure the shape of the paragraphs is completely uniform.

I doubt you have to be above 100 IQ to roll your eyes at obvious bullshit.

Is there some secret I’m missing? I’ve tried rolling my own harness, and tried a few of the ones the cool kids use - I think pi was the most recent. Not quite my tempo, I’m afraid.

Depends on your desktop specs and specific model.

The easiest way I have found is to use LM Studio, grab the model you want, and point whatever tooling you're using at the local exposed API.

You will have to configure the model params (temperature, etc) a bit to get the style you're expecting but it works decently well for me.

There's just so much shitty technical documentation out in the world.

Try Runpod or similar services: you can fine-tune stuff for the price of a latte. Stanford's NLP course recommends them: https://cs336.stanford.edu/

I've heard about this but i prefer to avoid anything cloud based (not just for AI but in general). I try to avoid relying on stuff i have little to no control over.

Just try to explain why I should use it and why it's different or better than alternatives - in terms of some qualities of the results rather than how it's implemented

The technical implementation details are also useful to have, but they're a bit hard to parse into "what is this?"

You should write your readmes by hand. You’ll learn a lot more that way, and it’ll help to ground the project.

FWIW I'm sympathetic to vibe-coded docs as I'm doing it myself a bit lately, but the agents are bad at it by default because all their context is the how and why of technical decisions made while coding with you

they need specific coaching to get them to try to write for the perspective of a new user

Thanks for the feedback … rough and precise equally appreciated. Computational semiotics was empirically proven with this repo. I will work hard to make the findings and content more accessible for everyone.

It’s not as if they were one shot. 5 repos prior, two published pre-prints on SSRN and thousands of hours back my research that is right there for you to peer review and use freely.

Depends on your desktop specs and specific model.

The easiest way I have found is to use LM Studio, grab the model you want, and point whatever tooling you're using at the local exposed API.

You will have to configure the model params (temperature, etc) a bit to get the style you're expecting but it works decently well for me.

It’s not as if they were one shot. 5 repos prior, two published pre-prints on SSRN and thousands of hours back my research that is right there for you to peer review and use freely.

they need specific coaching to get them to try to write for the perspective of a new user

The main reason to use it is the output quality. SRT steers the model toward a consistent target voice or discourse style more reliably than prompting or basic steering, while keeping the base model frozen. The results feel more coherent in tone and perspective across longer outputs, especially when the target style comes from a specific corpus or community. On the sympathetic point about vibe-coded docs: exactly.

I've heard about this but i prefer to avoid anything cloud based (not just for AI but in general). I try to avoid relying on stuff i have little to no control over.

how is it different/better than LoRA ?

Posted on Jun 1, 2026 · 10 min read

In my predictions for 2030 I wrote that tech writers would be using specialized LLMs, running locally on powerful hardware. I see hints of this move to “local first” among engineering pundits, but we’re not there yet, in part because of how much more powerful connected frontier models are. That doesn’t mean we can’t experiment, though. That’s precisely what I did last week, trying to fine-tune an instruct model to write like a software technical writer from the 80s and 90s.

Summoning old tech writing lore for research

To train a personal, local model to write like a technical writer from the 90s, one needs tons of written sources. If I wanted to fine-tune a model to write like myself, for example, this blog would not be enough, as it’s barely 100k words at the time of this post. You would need more samples for thorough training, and those are not easy to come by, nor simple to produce. The only quick way is to use an existing corpus. Where could I get one?

Meet Bitsavers: it’s a website that collects and scans old computer manuals and brochures. It’s an incredibly valuable repository of computer history and ancient tech writing, with mirrors available everywhere. As I’m fond of Microsoft manuals from the 90s, I chose the Microsoft collection as the source of training materials. The collection contains out-of-print docs published between 1977 and 2005: more than 37 million words, covering old systems and SDKs.

MS Collection

I downloaded the OCR’d text files and cleaned the content from artifacts and clutter (like indices and frontmatter) using good old Python scripts. I then used a cheap and fast model through OpenRouter, gemma-4-26b, to classify each paragraph as either “keep” or “drop” based on its intelligibility. This second pass cost around 8 dollars. Even with this two-pass cleaning, though, training data retained noise that I discovered only later, but that was largely OK for my tests.

I split the sanitized text into training examples on paragraph and section boundaries, breaking at headings and keeping code blocks whole, with each chunk capped at around 512 tokens as per Claude advice. Each chunk was paired with a synthetic instruction drawn from templates. I ended up with 192,456 examples in JSONL format (one JSON object per line). I could have used a small model to also come up with better instructions and questions, but I’m an impatient person.

💡 A note on the materials: This is an independent, non-commercial research project and is not affiliated with, sponsored, or endorsed by Microsoft. I used these out-of-print manuals for personal style-transfer experimentation only. The corpus, training data, and resulting adapters are not being distributed, and the fine-tuned models remain strictly local to my machine.

Fine-tuning as an alternative to training from scratch

In an ideal world, I would have several millions of dollars lying around, ready to be burned creating my own LLM, Fabrice. Since I’m far from rich (I wouldn’t be writing this otherwise), the alternative to Fabrice is fine-tuning, which involves tweaking the “weights” of a model so that each token generated is conditioned by the training materials. I like to picture fine-tuning as slightly steering the trajectory of a massive iceberg using tugs; just a little, just to get the intended effect.

Why fine-tuning and not, say, retrieval-augmented generation (RAG)? Because in this experiment I was not so much interested in retrieving facts, a scenario where RAG excels, as in getting an LLM to behave and write in a specific style, whatever its knowledge of the context. Compared to full training, fine-tuning doesn’t require a massive amount of data, so it’s cheaper. Also, just because: I always wanted to try fine-tuning as a technique and see how feasible it could be.

To avoid spending days or weeks fine-tuning a model on my computer, which has a rather old graphic card, I relied on Runpod, an online service for AI developers that provides on-demand pods with pre-configured GPUs and tools for a (relatively) small price. For less than $6 per hour, for example, you can lease a beast of a card, the Nvidia B200 (192gb of memory). The service has a convenient API with configurable auto-recharge and cost control mechanisms.

Runpod

Entering a world full of mysterious buzzwords

After deciding to fine-tune a model, I consulted with Claude on the sanest methods to achieve that. We settled on QLoRA (Quantized Low-Rank Adaptation), which achieves fine-tuning not by altering each weight of an LLM, but by “freezing” them and putting an adapter on top, which is a small file that reshapes the model behavior (a bit like a mask, if you will). The Q in QLoRA means that the result is quantized, that is, compressed, reducing memory requirements.

Are you still with me? Good. If you think this is dense, it’s because it is.

Doing anything with LLMs at home these days is an exercise in compromises: you either sacrifice time, spend money, or curb your ambitious goals. I tried to strike a balance to get something meaningful in less than a weekend. I chose to try fine-tuning on two models, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct. At their size (around 8B) they run comfortably on a Macbook Air. I also tested a Llama base model (which is not trained to answer questions).

I tested fine-tuning under several different conditions: varying the volume of training materials (a subset vs. the full corpus), the number of epochs (training rounds), and structural parameters like the rank. I only hold a superficial knowledge of all this, but I trusted my agent to make the right choices, which I happily questioned at every step. For example, 3 epochs can result in “overfitting” in some cases; in the world of LLMs, that translates to excessive training. Fun times.

Run	Base	Data	Epochs	Rank
Llama instruct-40k	Llama 3.1 8B Instruct	40k	1	16
Llama base-40k	Llama 3.1 8B (base)	40k	1	16
Qwen-40k	Qwen 2.5 7B Instruct	40k	3	16
Qwen-192k	Qwen 2.5 7B Instruct	192k	1	16
Qwen-r8	Qwen 2.5 7B Instruct	40k	1	8
Qwen-r16	Qwen 2.5 7B Instruct	40k	1	16

Adapters can only be applied to the target model you fine-tuned for. After training each adapter, I exported them to my laptop and converted and quantized them to a GGUF LoRA file, and then registered it as a local Ollama model I could run in my laptop for benchmarking purposes. The local-conversion approach is faster and requires no GPU, though inference is somewhat slower than a fully merged model. For the test at hand, I did not care about speed that much.

Training the adapters for all conditions took perhaps an entire day, including breaks, for a total cost of $50. Along the journey, I lost two adapters: Runpod is unforgiving of budget and deletes pods immediately if funding is zero (there’s a lesson learned, yes). Claude took care of setting up each run and following up with Runpod’s API. The /goal command of Claude Code was quite helpful to loop through each phase (in retrospect, I would have run it in YOLO mode).

This table shows all the models I compared and their conditions:

Name	What it is
`llama3.1:8b`	Unmodified Llama baseline
`qwen2.5:7b`	Unmodified Qwen baseline
`msft-base-40k`	Llama base (non-instruct) + 40k (control)
`msft-instruct-40k`	Llama instruct + 40k, 1 epoch, rank 16
`msft-qwen-40k`	Qwen + 40k, 3 epochs, rank 16
`msft-qwen-192k`	Qwen + 192k, 1 epoch, rank 16
`msft-qwen-r8`	Qwen + 40k, 1 epoch, rank 8
`msft-qwen-r16`	Qwen + 40k, 1 epoch, rank 16

Did the style transfer after fine-tuning?

I subjected each model to the same prompts:

Document malloc(), a staple C function, something the training materials might know about.
Document a fictitious ConnectWifi() Win32 API function. No presence in the training materials.
Explain what a REST API is in 1990s Microsoft style (the anachronistic test).

You can see all the questions and answers in this gist.

For the malloc() test, the unmodified models generated modern Markdown docs in the style of a README, while the fine-tuned models used a period correct structure, with a Synopsis block, a Return Value section, and so on. For the fictitious ConnectWifi() function, only the 3 epochs model maintained the fiction and documented it as if it was real, while the others broke the fourth wall to adhere to internal knowledge and resist the training.

The REST API exercise was quite interesting, too: Llama Instruct 40k failed, producing bland marketing prose. Claude attributed this to the heavy reinforcement training (RLHF) that Llama goes through to make it friendly and accessible. Qwen fine-tunes held the register way better, producing period-structured docs, using HTTP method names as verbs and formal headings. Qwen 192k was the strongest, opening like a chapter of the Windows 2000 Resource Kit.

Amaze amaze amaze

Let me repeat that: a 7B model, trained on 1990s documentation and tested on a 2000s concept, produced a convincing chapter opening that could be mistaken for genuine period material. Style transferred. Wow. On the other hand, the base model, which is not trained to answer questions, but to autocomplete text, failed miserably, spurting raw corpus almost at random, hundreds of lines of garbage. Base models have no notion of “answer this question” or “complete this".

Model	malloc()	ConnectWifi()	REST API
`llama3.1:8b`	Modern style, markdown headers	Plain English, no Win32 vocabulary	Modern, friendly, analogies
`qwen2.5:7b`	Modern style, good structure	Correct form, breaks frame	Modern essay, labels itself “1990s style”
`msft-instruct-40k`	Terse, period markers, correct vocabulary	SAL annotations, ERROR_SUCCESS	Failed: marketing prose
`msft-qwen-40k`	Man-page structure, ENOMEM	Commits to fiction, invents constants	Holds register
`msft-qwen-192k`	Full man-page, See Also, example	Breaks frame (caveats)	Strongest: chapter-style, HATEOAS
`msft-base-40k`	Ignores prompt entirely	Ignores prompt entirely	Ignores prompt entirely

I finished the experiment by comparing the effect of rank between Qwen models, with 1 epoch, varying between rank 8 and 16. If I understood it correctly, rank 8 means each adapter matrix can only describe 8 independent patterns. It’s like having 8 dials to tune. With so few dials, the adapter can’t be too clever: it must commit fully to the strongest, most repeated patterns in the training data. Rank 16 is, in theory, more expressive and subtler.

Model	malloc()	ConnectWifi()	REST API
`qwen2.5:7b` baseline	Modern explainer	Correct form, breaks frame	Long modern essay, labels itself
`msft-qwen-r8` (40k, 1ep, rank 8)	Terse, correct vocab, minimal structure	Best of all models: full cross-refs, platform reqs, workflow	Chapter-style, “In This Section”
`msft-qwen-r16` (40k, 1ep, rank 16)	Synopsis + Errors + example	Minimal, no frame-break	⚠️ SOAP hallucination
`msft-qwen-40k` (40k, 3ep, rank 16)	Syntax + description + ENOMEM	Breaks frame with caveat	Holds register cleanly
`msft-qwen-192k` (192k, 1ep, rank 16)	Full man-page + See Also + example	Breaks frame with caveat	Best on REST: chapter outline, HATEOAS

The rank comparison shows that smaller adapters, with fewer degrees of freedom, commit to fiction more readily than larger ones; a rank 16 adapter can “escape” the corpus more easily. It also turned out that combining only 1 epoch with a moderate rank of 16 made hallucinations more frequent: the adapter is expressive enough to reach for a related concept but not reinforced enough to anchor on what the prompt is trying to say. Rank and epoch seem to interact — it’s like using a sound mixer. Interestingly, the cheaper the adapter, the more honest the impersonation.

Fine-tuned models make for convincing impersonators, but they’re not replacements

The fine-tuned models were great impersonators of Microsoft tech writers from the late 90s. The corpus impressed style and voice on the models, as well as some knowledge, while mostly retaining the models’ ability to describe novel concepts. It’s a relatively cheap process that could produce effective small models aimed at tasks such as reviews of style or drafting of new documents following in-house style guides.

Getting there, though, is not a simple ride. Fine-tuning a model, while cheap, requires a good amount of high-quality training data, which is not easy to produce. Even when you get your hands on it, you need to pick an underlying model that makes sense and is capable of accepting the additional training. And then, the multiple parameters at your disposal make the task of getting a fine-tuned model to the sweet spot a time-consuming proposition.

The reassuring takeaway is that such a model can never replace a human tech writer, only augment them. The fine-tuned models have the same lack of judgement as their non-tuned siblings, and they need abundant steering. Fabrice will have to wait.

Hacker Times