[0] Made an AppleTalk chat client/server: https://github.com/kalleboo/GlobalTalk-Chat
[1] The equivalent to HeapWalker I used was Metrowerks ZoneRanger, which was bundled with their compiler. It has a nice visualization of how fragmented the memory is: https://bitbang.social/@kalleboo/116302075194704555
One day I was encouraged to write a Windows Sockets emulation layer for ordinary dial-up shell accounts like those offered by Netcom. The idea was to allow the use of the recently released Mosaic browser without an actual internet connection. I figured sure, no problem. I'll use curl or some other tool in the shell account to do the actual fetching of URLs, transfer files over zmodem, and simulate all the TCP/IP calls in the DLL.
I couldn't even get started. The reason is that I couldn't understand how the different Windows applications could all share memory allocated at runtime in the winsock.dll.
I asked a highly experienced ex-Microsoft person, and he just said, "What are you talking about? There's no API to allocate shared memory."
So I gave up. 6 months later someone else did it.
Around then I realized the truth: Windows 3.1 had no memory protection at all. Specifically all global variables in DLLs were shared by default. The hard part wasn't sharing memory among users of a DLL. If anything, the hard part was having good discipline to avoid sharing it.
Since I'd only used multiuser Unix in school, and I knew Windows supported multitasking (even if only the cooperative kind), I just couldn't wrap my head around the idea that a multitasking operating system could exist without memory protection.
When I wrote a binary translator, I ended up having to keep a translated return stack to optimize RET opcodes. That put me in exactly the same position as the Win16 kernel with regard to having to patch pointers (in case of Win16, just the segment part) on stack.
Of course I did not have the benefit of my guests calling a lock function, so I ended up having to run a garbage collection operation to determine which pointers are in use & take exceptions on now-invalidated segments. Lots of extra work that Windows didn't need: it's nice to be king :-)
This was the magic moment for me, learning Windows 3.0 programming. The idea that my program is no longer master of its world, but instead is just something that gets loaded and called by Windows.
Win16 programming was an important formative phase in my career. There is a lot of wisdom in old solutions to thorny problems, and knowing them often clues you in to how one may adapt them to today's problems. For example, when CPU+GPU programming appeared I immediately imagined CPU memory accessed with "near" pointers and GPU memory accessed with "far" pointers with a switch to a pseudo-segment register.
It also conditioned a programmer to learn about the various complexities involved and be careful in their programming, i.e. it taught you discipline. You understood your compiler, OS and hardware better and how to write code keeping them all in mind. For example, I often say my study of embedded programming started with Win16!
Another bit of cleverness was "Thunking" between 16-bit and 32-bit code. Here is Raymond Chen on how it worked there and Why can’t you thunk between 32-bit and 64-bit Windows? - https://devblogs.microsoft.com/oldnewthing/20081020-00/?p=20...
https://news.ycombinator.com/item?id=48424862
I'll just stop posting on HN.
The differences were (a) that DOS+Windows was designed so that the same programs could run in both real mode, with overlaying, and 286 protected mode, with segmented virtual memory; and (b) that to really save on RAM DOS+Windows had ideas such as the data segments for DLLs being globally shared across all processes. These added all of the complications mentioned in the headlined article and more besides. It was the operating system, not the processor architecture.
Similarly, while locking and unlocking memory blocks is no longer generally a concern, most programs still deal with files, and graphics programs still have to call map/unmap functions to access graphics data. All the same tools apply -- helper functions/libraries, RAII, and leak/sanitizer tools to dynamically detect usage errors.
Game consoles like NES, SNES and Game Boy had additional hardware built in the cartridge to support memory mapping/bank switching.
For PCs, EMS (expanded memory) provided a similar concept. It reserved a 64 kB window in the first 1 MB, divided into 16 kB pages, and allowed mapping up to 32 MB.
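A rough sketch of that windowing arithmetic, in C. The names (`ems_frame`, `ems_map`, `ems_translate`) are made up for illustration and are not the real EMS driver interface, which was reached through a software interrupt. The sketch only models which 16 kB logical page is visible in each of the four slots of the 64 kB window.

```c
#include <assert.h>
#include <stdint.h>

#define EMS_PAGE_SIZE   0x4000u  /* 16 kB per page */
#define EMS_FRAME_SLOTS 4        /* 4 x 16 kB = 64 kB window */
#define EMS_UNMAPPED    0xFFFFu  /* marker: nothing mapped in this slot */

/* Hypothetical model: which logical page each window slot currently shows. */
typedef struct {
    uint16_t slot_to_logical[EMS_FRAME_SLOTS];
} ems_frame;

static void ems_init(ems_frame *f) {
    for (int i = 0; i < EMS_FRAME_SLOTS; i++)
        f->slot_to_logical[i] = EMS_UNMAPPED;
}

/* Make logical_page visible in the given slot of the window. */
static void ems_map(ems_frame *f, int slot, uint16_t logical_page) {
    assert(slot >= 0 && slot < EMS_FRAME_SLOTS);
    f->slot_to_logical[slot] = logical_page;
}

/* Translate an offset inside the 64 kB window into a byte offset within
   expanded memory, or -1 if the slot it falls in is unmapped. */
static long ems_translate(const ems_frame *f, uint16_t window_offset) {
    int slot = (int)(window_offset / EMS_PAGE_SIZE);
    uint16_t page = f->slot_to_logical[slot];
    if (page == EMS_UNMAPPED)
        return -1;
    return (long)page * (long)EMS_PAGE_SIZE + (long)(window_offset % EMS_PAGE_SIZE);
}
```

With 16 kB pages, 32 MB corresponds to 2048 logical pages, so a 16-bit page number is more than enough.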
For me it is fascinating how today I can learn a foreign language, or how to code by interacting with the LLM.
Even with 32bit systems where you’d want more than 4GB RAM, application software still had 32 bit addresses (and thus 4GB memory limit).
I think it was a lot more common for 8bit systems to allow for 16 bit addressing though.
It’s been a while though. So hopefully I’m not misremembering things.
It biased your selection of data structures and algorithms.
Max 64KB array size meant pointers to allocated structs and linked lists were much more popular back then versus 1 large array of structs.
The Win16 HANDLE memory allocation also meant you had to worry about how you handled structs which had pointers to other structs (a FAR ptr may not be a stable value, unless you locked the HANDLE for the duration of the allocation).
Then you had to worry about stuff that no college programming book talked about (ignore the lack of error checking):
char FAR *p;
char FAR *mem = farmalloc(65536);
for (p = &mem[65535]; p >= &mem[0]; p--) {
    dostuff(p);
}
Welcome to an infinite loop...
To survive, a submission most likely needs some initial push from non-organic voting.
It probably helps if you share your submission early with your colleagues and on other sites.
The 68k didn't come with an MMU like the 286, so MacOS couldn't rely on virtual memory like OS/2 did, but at least the flat memory space meant you didn't have to juggle 64k segments.
I think the knowledge of underlying hardware is useful and good to know.
But also that sort of knowledge got dated pretty quickly in the early computer era. Further, the capabilities of things like optimizing compilers quickly got to a point where they'd outpace most hand written assembly. Today, it's basically just floating point operations where you can still do better than a compiler.
In the early days, you'd have the correct impression that the C compilers spat out utter garbage which was a lot slower than what you could hand craft. As optimization techniques got better and better, the work you did because the compiler was dumb ultimately would have gotten in the way.
8-bit microprocessors used 16-bit addresses.
The 6502 and Z80 could use 16-bit addressing to access up to 64 KB of memory. The 6502 had various other addressing modes, including iirc 8 bits, but none of them were wider than 16 bits.
Not as much of a strait jacket as Windows segmented-memory programming, but compared to Unix, it did feel constricting.
I wonder if it's just that kids today (gods that makes me sound old!) are constantly surrounded by entertaining things to do - gaming, TV/films, music, social media.
char FAR *p;
char FAR *mem = farmalloc(65536);
for (p = &mem[65535]; p >= &mem[0]; p--) {
    dostuff(p);
}
Nice one.
To be fair to Windows, good C courses should still teach this, but I'm not sure if they do :-)
It's UB to set a pointer to before the first element of an array, or after the last element plus one. So, if it knows the call to farmalloc/malloc returns the start of an object, a modern C compiler on a modern architecture may, in principle, optimise the above to an infinite loop.
I've seen something similar on architectures (long ago) where a zero-bit-pattern pointer was a valid memory address you might actually access. Of course p-1 is not less than p when p is zero.
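One way to sidestep the problem is to count down with an unsigned index rather than a pointer, so no pointer before the first element is ever formed. The following is a minimal sketch for a modern hosted C compiler; `copy_reversed` is a hypothetical helper, not something from the thread. On Win16 itself `size_t` was 16 bits wide, so a full 65536-byte block would need a wider counter.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src[len-1] .. src[0] into dst[0] .. dst[len-1].
   The loop tests i before decrementing it, so it stops cleanly after
   index 0 and never evaluates &src[-1]. */
static void copy_reversed(const char *src, size_t len, char *dst) {
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = len; i-- > 0; )
        dst[out++] = src[i];
}
```

The `i-- > 0` form works for any unsigned index type, including when `len` is zero.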
Some rules are obvious -- cut off mobiles and pads completely (he doesn't have access to them so it's for me), sit in the library and study from books (I believe this is even possible for programming topics as I can write on paper). Basically, cutting off everything electronic definitely helps -- even putting my phone in the bag improves productivity significantly.
But the problem is, my son is unruly. If I put him in the library, most likely he runs around and messes things up, so we end up leaving early without doing anything.
It did. It was bi-modal. There were at one point switches to the WIN command to tell it whether to come up in real mode or 286 protected mode. In the latter it definitely did use the features of protected mode.
It was the bi-modal nature that was the problem. Essentially, they had to design a whole layer that simulated when in real mode all of the load-on-demand stuff that the processor architecture supplied for free in 286 protected mode, and make it so that the thing would all work either way with no changes to applications.
Though in fairness, I do mostly now just use those systems to teach my kids BASIC
The above example would cause an infinite loop on Win16's seg:off far memory model, but compiling on Win32 would not cause an infinite loop.
The problem is that far pointer arithmetic only affects the offset, not the segment. So decrementing a 0 value offset would just wrap around to 0xFFFF and the segment would stay the same, so you're going from mem[0] to mem[65535], not mem[-1].
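The wrap-around can be modelled on any modern machine. This is a sketch with a hypothetical `far_ptr` struct standing in for a 16:16 pointer. It assumes, as 16-bit compilers commonly did for non-huge pointers, that arithmetic and relational comparison look at the offset only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16:16 far pointer. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} far_ptr;

/* Decrement touches only the offset, modulo 64 KB; the segment is untouched. */
static far_ptr far_dec(far_ptr p) {
    p.off = (uint16_t)(p.off - 1u);
    return p;
}

/* Assumed comparison rule: p >= q compares offsets only. */
static int far_ge(far_ptr a, far_ptr b) {
    return a.off >= b.off;
}

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t far_linear(far_ptr p) {
    uint32_t lin = ((uint32_t)p.seg << 4) + p.off;
    assert(lin <= 0x10FFEFu);  /* highest address reachable in real mode */
    return lin;
}
```

Decrementing a pointer with offset 0 yields offset 0xFFFF in the same segment, which still compares greater than or equal to the start of the block, so the loop condition `p >= &mem[0]` never becomes false.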
I just hope eventually he loves reading and learns in a more traditional way instead of from laptops and pads.
Some potential ideas to explore. Take what you want, leave what you don't.
a) if you're training for attention span, make sure the target is appropriate and also within reach of your child.
b) have a plan for the visit: when I helped at a school library, classes for kids in your kid's age group would come in, the librarian would read them a story, then the kids would look for a book, check out at the desk and read (or look at the book anyway) quietly until the end of the visit. I think we'd get about 40 minutes for a visit. Most days, at least some of the kids would be getting antsy before it was time to go.
c) Plan around your kid's activity needs. Some kids will do long, still attention tasks better after doing some amount of physical activity. Some kids will do these kinds of things better after a meal. Some will do it better in the morning or the afternoon. Many kids will have a harder time if the library visit was a surprise. You know your kid, try to have your library visits when they're likely to work well. If he likes story time, try to visit when there's a story time available.
d) don't expect that you can both go to the library and work independently. You're going to the library with him, and he's going to need you to help him out for much of the time. But you might be able to find him a book together, then find you a book together, then sit down and read for a bit together.
e) if all you can get done is finding a book, no big deal. You can read at home too.
If a lion can figure out how to behave in the library, so can your kid ;) https://www.michelleknudsen.com/library_lion_77788.htm
Windows 3.0’s WIN.COM supported:
/R for real mode (8086)
/S for standard mode (16-bit protected mode)
/E for 386 Enhanced Mode (32-bit virtual machine manager (VMM), running Windows in VM1, and DOS apps in VM2+)
This is a kind of knowledge base article which resulted from attempts to understand exactly how memory management works in 16-bit Windows. It is not exactly undocumented, but it is also not well documented; even before Windows 3.0 appeared, the assumption was that essentially all application developers were going to use a high-level language and their development tools would take care of the low-level details.
Furthermore, nearly all materials for beginning Windows developers focused on the more visible aspects of Windows programming, i.e. windows, icons, menus, and so on. Memory management was glossed over, even though it was absolutely critical to writing a solid Windows application any more complex than a Hello World program.

[Figure: Windows 3.0 SDK HeapWalker memory analysis tool]
The memory management details and mechanisms are rooted in the 8086 real mode history of Windows 1.x and 2.x, and much of the complexity persisted even when Windows only ran in protected mode starting with Windows 3.1.
Unless noted otherwise, in this article “Windows” refers to the 16-bit line of Microsoft products, not Windows NT.
The key to understanding Windows memory management is that from the very beginning, Windows was among other things a fancy overlay manager. For many years, Windows was too big for typical PCs of the time and needed some way to keep only the most active memory segments in physical RAM, with some mechanism to discard and reload less frequently needed segments on demand. Paging was obviously not used because there was no support for it in 8086 and 80286 systems (and before Windows 3.0, those were very nearly the entirety of the installed base).
In the simplest case of an application with one code segment and one data segment, the movable nature of Windows segments is almost entirely transparent. When the application is running, the CS (code) segment register points to the code segment and the DS (data) and SS (stack) segment registers point to the data segment. As long as the application only uses near calls/jumps within its code segment and near pointers to the data/stack segment, it does not care at all where exactly the segments are in memory, i.e. the actual values loaded into CS/DS/SS registers. Windows can move the segments around and everything will work fine.
But even beginning Windows programmers working through a Hello World style example very quickly start suspecting that life is not so simple in the land of 16-bit Windows. The window procedure must be declared as FAR PASCAL, which is fair enough given that it needs to conform to Windows calling conventions. But it also has to be exported from the application’s executable, otherwise the program won’t work properly. That is a concept entirely unfamiliar to non-Windows developers.
To help implement its memory management scheme, Windows adopted and extended the “New Executable” (NE) format first used by “DOS 4”, better known as Multitasking DOS 4.0 and significantly different from PC DOS and MS-DOS 4.0/4.01. Unlike the DOS MZ executable format where an application is effectively a single binary blob, the NE format is segment oriented and each segment is stored on disk separately. That gives Windows the ability to load (or reload) individual segments and move them around in memory.
The NE format also supports imports and exports. Imports are used when an application needs to call external code, such as the OS itself. Exports are used for application code which is externally called.
A window procedure is one such externally called piece of code. It needs to be exported so that Windows can perform its magic on it. Said magic lets Windows fix up the window procedure prolog (entry sequence) so that it loads the application’s own data segment into the DS register.
Everything in Windows memory management revolves around segments, contiguous blocks of memory up to 64KB in size. In normal 8086 programming, each segment is identified by its segment address, which directly corresponds to its address in physical memory. Because most segments in Windows can be moved or discarded, they are instead identified by handles. A handle is a 16-bit value which should be considered opaque, even if it might actually be a simple index into some table.
For programmers familiar with x86 protected mode, a Windows segment handle is a lot like a protected-mode selector: It is a 16-bit value which uniquely identifies a memory segment, but it is independent of the segment’s location in system memory. The similarity is not coincidental. Steve Wood, the designer of Windows 1.0 memory management, used the Intel 286 protected mode as inspiration [1] for the Windows memory manager (the 286 came out in 1982 and work on Windows started in 1983).
A handle refers to a memory segment regardless of where it is in memory, i.e. regardless of what its 8086 segment address is. The GlobalAlloc API allocates contiguous memory from the global heap (possibly more than 64K) and returns a segment handle.
Since the 8086 does not support protected mode, approximating protected-mode functionality takes quite a bit of extra work and discipline. Given that a handle is not a segment address, it can’t be used as the segment portion of a far 16:16 pointer. To address anything in another segment, an application needs to form a far pointer.
To that end, the application needs to call the GlobalLock API which returns a segment address and locks the segment in memory (increments its lock count). While locked, the segment won’t be moved and its segment address will stay valid.
Once it is done accessing memory in the segment, the application calls GlobalUnlock. That decrements the segment’s lock count and once the count drops to zero, the segment may be moved again.
Needless to say, after calling GlobalUnlock, the segment address returned by GlobalLock must be considered invalid. Note that this is a possible source of sneaky bugs—after calling GlobalUnlock, the segment most likely won’t move immediately. An application might erroneously access a previously locked segment after unlocking it and not cause any obvious harm.
Indeed Windows won’t move or discard a segment unless it has to, because it may well be used again. However, once segments are unlocked, Windows may move them around or discard them at any moment.
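The lock/unlock discipline can be illustrated with a toy handle-based heap in portable C. This is only a sketch of the idea, not the Windows implementation: `toy_alloc`, `toy_lock`, `toy_unlock`, `toy_free` and `toy_compact` are hypothetical stand-ins, and it assumes handle slots are never reused, so blocks always sit in the arena in handle order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 64
#define MAX_BLOCKS 8

typedef unsigned toy_handle;      /* opaque to callers; 0 means failure */

typedef struct {
    int    in_use;
    size_t offset;                /* current position in the arena */
    size_t size;
    int    lock_count;            /* > 0: the block must not move */
} toy_block;

typedef struct {
    uint8_t   arena[ARENA_SIZE];
    size_t    top;                /* first free byte */
    unsigned  count;              /* slots handed out so far (never reused) */
    toy_block blocks[MAX_BLOCKS];
} toy_heap;

static toy_handle toy_alloc(toy_heap *h, size_t size) {
    if (h->count == MAX_BLOCKS || size > ARENA_SIZE - h->top)
        return 0;
    toy_block *b = &h->blocks[h->count];
    b->in_use = 1;
    b->offset = h->top;
    b->size = size;
    b->lock_count = 0;
    h->top += size;
    return ++h->count;
}

/* Pin the block and return its current address. */
static void *toy_lock(toy_heap *h, toy_handle hd) {
    assert(hd >= 1 && hd <= h->count && h->blocks[hd - 1].in_use);
    toy_block *b = &h->blocks[hd - 1];
    b->lock_count++;
    return &h->arena[b->offset];
}

/* Unpin; pointers obtained from toy_lock are stale once the count hits 0. */
static void toy_unlock(toy_heap *h, toy_handle hd) {
    assert(hd >= 1 && hd <= h->count && h->blocks[hd - 1].lock_count > 0);
    h->blocks[hd - 1].lock_count--;
}

static void toy_free(toy_heap *h, toy_handle hd) {
    assert(hd >= 1 && hd <= h->count);
    h->blocks[hd - 1].in_use = 0;
}

/* Slide every unlocked block down over the holes; locked blocks stay put. */
static void toy_compact(toy_heap *h) {
    size_t dest = 0;
    for (unsigned i = 0; i < h->count; i++) {
        toy_block *b = &h->blocks[i];
        if (!b->in_use)
            continue;
        if (b->lock_count > 0) {          /* pinned: skip past it */
            dest = b->offset + b->size;
            continue;
        }
        if (b->offset != dest) {
            memmove(&h->arena[dest], &h->arena[b->offset], b->size);
            b->offset = dest;
        }
        dest += b->size;
    }
    h->top = dest;
}
```

A caller that keeps only the handle and re-locks after every compaction always sees its data; a caller that caches the pointer returned by `toy_lock` across an unlock may end up reading bytes that now belong to something else, which is exactly the kind of bug that stays hidden until memory gets tight.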
Now let’s take a closer look at the possible segment types.
Windows segments have several important attributes which determine how they’re treated by the Windows memory manager.
Segments can be fixed or movable. The names are clear enough; movable segments can be shuffled around by Windows as long as they’re not locked, while fixed segments stay in place. For example, segments which hold interrupt handler routines must be fixed so that interrupt vectors stay valid. Ideally most of an application’s code and data segments would be movable, giving Windows an opportunity to efficiently manage memory. The ability to move segments is necessary because freeing or discarding segments creates “holes” in memory, potentially quickly fragmenting memory. Windows needs to be able to compact segments by moving them in order to consolidate free memory into one or more larger chunks.
Segments can also be discardable or nondiscardable. Code segments are typically discardable because they aren’t writable. If an unused code segment is removed and later needed again, Windows can easily reload it from the original executable. The same is true of resources which are also read-only. Data segments, on the other hand, tend to be non-discardable because they’re usually writable and once they’re modified, they cannot just be reloaded from disk. That said, applications might allow writable data segments to be discardable if they are willing to re-create their contents in case the segment is needed again after having been discarded.
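The "re-create on demand" pattern for discardable data might look roughly like the following sketch. Everything here is hypothetical (`discardable_table`, `table_acquire`, and so on); it only shows the shape of the contract: the owner must be prepared to find the contents gone and rebuild them before use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical discardable data: contents may vanish while not in use. */
typedef struct {
    char data[32];
    int  present;   /* 0 after the memory manager has discarded the contents */
    int  rebuilds;  /* how many times the contents had to be regenerated */
} discardable_table;

/* Regenerate the contents from scratch (stands in for recomputing or reloading). */
static void table_rebuild(discardable_table *t) {
    strcpy(t->data, "lookup table");
    t->present = 1;
    t->rebuilds++;
}

/* What the memory manager would do under memory pressure. */
static void table_discard(discardable_table *t) {
    memset(t->data, 0, sizeof t->data);
    t->present = 0;
}

/* Every access goes through here, so a discard is never observed directly. */
static const char *table_acquire(discardable_table *t) {
    assert(t != NULL);
    if (!t->present)
        table_rebuild(t);
    return t->data;
}
```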
Dynamic linking was not yet a widespread technique in the mid-1980s and Microsoft Windows was one of the first systems with support for dynamically linked libraries (DLLs), also called shared libraries. While some larger systems used dynamic linking since the 1970s, UNIX systems only started introducing shared libraries in the mid to late 1980s.
Windows DLLs are NE format images just like Windows applications, but DLLs are not applications. DLLs cannot be executed directly, only loaded and called into by other processes (tasks in Windows parlance). The bulk of Windows was in fact implemented as DLLs (KERNEL, USER, GDI).
DLLs export routines (entry points) that are callable by applications. Applications can be linked against DLLs at link time, with imports referring to DLL names and entry points. DLLs can be also loaded entirely dynamically, and their entry points can be queried by ordinal (number) or by name.
Note that unlike UNIX systems, Windows never had a global name space for dynamic symbol resolution. Symbols from DLLs were always imported first by module name and then by name or ordinal. The two-level name space takes slightly more effort to manage but avoids name collisions, such that if two DLLs export a symbol named Alloc, there is no confusion as to which one is needed because the module name distinguishes between the two. And of course without the two-level name space, imports by ordinal (which are slightly faster and consume less memory) would have been completely impractical.
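A two-level lookup of this kind can be sketched in a few lines of C. The module names `HEAPLIB` and `POOLLIB`, the stub functions, and the table layout are all invented for illustration; the real NE import mechanism works on the executable's import tables, not on C structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*entry_fn)(void);

typedef struct {
    const char *name;
    unsigned    ordinal;
    entry_fn    fn;
} export_entry;

typedef struct {
    const char         *module;
    const export_entry *exports;
    size_t              count;
} module_entry;

/* Two hypothetical DLLs that both export a symbol called "Alloc". */
static int heap_alloc_stub(void) { return 1; }
static int pool_alloc_stub(void) { return 2; }

static const export_entry heaplib_exports[] = { { "Alloc", 1, heap_alloc_stub } };
static const export_entry poollib_exports[] = { { "Alloc", 1, pool_alloc_stub } };

static const module_entry modules[] = {
    { "HEAPLIB", heaplib_exports, 1 },
    { "POOLLIB", poollib_exports, 1 },
};

/* First level: find the module by name. */
static const module_entry *find_module(const char *module) {
    assert(module != NULL);
    for (size_t i = 0; i < sizeof modules / sizeof modules[0]; i++)
        if (strcmp(modules[i].module, module) == 0)
            return &modules[i];
    return NULL;
}

/* Second level, by name: string comparison against each export. */
static entry_fn resolve_by_name(const char *module, const char *name) {
    const module_entry *m = find_module(module);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->exports[i].name, name) == 0)
            return m->exports[i].fn;
    return NULL;
}

/* Second level, by ordinal: an integer comparison, no symbol names needed. */
static entry_fn resolve_by_ordinal(const char *module, unsigned ordinal) {
    const module_entry *m = find_module(module);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < m->count; i++)
        if (m->exports[i].ordinal == ordinal)
            return m->exports[i].fn;
    return NULL;
}
```

Both modules can use ordinal 1 and the name `Alloc` without conflict, because every lookup is qualified by the module name first.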
One key difference between applications and DLLs that is relevant to Windows programming is that DLLs have no stack of their own and always run with the stack of their caller. Although DLLs almost always have their own data segment, it is different from the stack segment, i.e. SS != DS.
This difference means that DLLs must be built differently from applications. The compiler must be told to generate code for DLLs, or more specifically, told that it cannot assume DS and SS registers address the same memory.
In the early days of Windows, the prolog and epilog for DLL entry points were the same as the application prolog/epilog. Compiler writers eventually figured out that the prolog for applications can be simplified, because SS equals DS. But that is not the case for DLLs, and DLLs still need to use the old style “fat” prologs that the Windows module loader needs to patch up.
Microsoft C supported Windows development from its earliest days, i.e. version 3.0 (earlier Microsoft C versions were rebranded third-party products; Microsoft C 3.0 was the first C compiler developed by Microsoft, initially for XENIX and DOS).
However, for many years, this support was almost secret. The Windows specific switches were completely omitted from compiler documentation, or they were listed but users were referred to the Windows SDK. That was the case up to and including Microsoft C 5.1, which documents the fact that the /Gw and /Aw switches exist, but does not explain what they do and how to use them, instead referring to the Windows SDK documentation. This perhaps neatly illustrates the somewhat incestuous relationship between the Windows development group and the Microsoft languages group.
Since Microsoft C 3.0 (1985), the compilers had the /Aw and /Gw switches (and also the /Au switch).
The /Aw switch is a memory model modifier and specifies that SS != DS, but DS should not be reloaded at function entry (because Windows takes care of that). The /Aw switch is meant to be used when generating DLLs.
The /Gw switch generates Windows prologs and epilogs for far functions. It is required for exported functions located in both applications and DLLs, and it is very much a Windows specialty.
So what exactly do those Windows specific function prologs and epilogs look like? Everything is spelled out in the CMACROS.INC file shipped with the Windows SDK. Unfortunately CMACROS.INC is a jumble of MASM conditionals, nearly impossible for humans to read. It’s much easier to see what code the C compiler produces, or what exactly assembly code using CMACROS.INC turns into.
Here’s what Microsoft C 3.0 generates, as shown by a listing file the compiler produces, with added comments:
        PUBLIC  Proc
Proc    PROC FAR
*** 000   1e          push  ds        ; almost
*** 001   58          pop   ax        ; no-op
*** 002   90          xchg  ax,ax     ; NOP
*** 003   45          inc   bp        ; marker
*** 004   55          push  bp        ; save BP
*** 005   8b ec       mov   bp,sp
*** 007   1e          push  ds
*** 008   8e d8       mov   ds,ax     ; reload DS
; Line 4
*** 00a   8b 46 06    mov   ax,[bp+6]
*** 00d   03 46 08    add   ax,[bp+8]
*** 010   83 ed 02    sub   bp,2
*** 013   8b e5       mov   sp,bp
*** 015   1f          pop   ds
*** 016   5d          pop   bp        ; restore BP
*** 017   4d          dec   bp        ; recover value
*** 018   cb          ret
Proc    ENDP
First of all, note that the prolog seemingly spends a lot of instructions on doing very little real work. It pushes DS, moves it to AX, and then moves AX to DS after saving DS. It also increments BP before pushing it on the stack, and decrements it again after popping.
All in all, seemingly a lot of effort for nothing. But that’s actually the point: The Windows prolog and epilog code is meant to be harmless when it is not needed.
If the function is in fact exported from a Windows NE module, the Windows loader will patch the first three bytes to load the module’s default data segment into AX. Here’s what it looks like in SYMDEB, taken from a random GDI function:
_TEXT:SELECTOBJECT:
5BC1:1840 B80591    MOV   AX,9105
5BC1:1843 45        INC   BP
5BC1:1844 55        PUSH  BP
5BC1:1845 8BEC      MOV   BP,SP
5BC1:1847 1E        PUSH  DS
5BC1:1848 8ED8      MOV   DS,AX
5BC1:184A 83EC04    SUB   SP,+04
In the above case, 5BC1h is the GDI module’s _TEXT code segment, and 9105h is the default data segment of the GDI module.
The Windows memory manager keeps the prolog updated such that if the data segment moves, the exported functions that refer to it get fixed up again to point to the new address.
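The patching step itself is simple byte surgery. The following C sketch is not the Windows loader, and `patch_prolog` is a hypothetical name. It recognizes only the placeholder sequence shown in the listing above (`push ds` / `pop ax` / `nop`, bytes 1E 58 90), or a previously patched `mov ax,imm16` (opcode B8), and overwrites the first three bytes with `mov ax,<data segment>`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patch (or re-patch) the first three bytes of an exported function's prolog
   so that it loads data_seg into AX. Returns 1 on success, 0 if the code does
   not start with a recognized prolog. */
static int patch_prolog(uint8_t *code, size_t len, uint16_t data_seg) {
    assert(code != NULL);
    if (len < 3)
        return 0;

    /* push ds; pop ax; nop: the harmless placeholder emitted by the compiler */
    int placeholder = code[0] == 0x1E && code[1] == 0x58 && code[2] == 0x90;
    /* mov ax,imm16: left behind by an earlier fix-up */
    int patched = code[0] == 0xB8;
    if (!placeholder && !patched)
        return 0;

    code[0] = 0xB8;                          /* mov ax,imm16 */
    code[1] = (uint8_t)(data_seg & 0xFF);    /* low byte first (little-endian) */
    code[2] = (uint8_t)(data_seg >> 8);
    return 1;
}
```

Patching with 9105h produces the bytes B8 05 91, matching the SYMDEB output above. Calling the function again with a different segment value models the re-fix-up that happens when the data segment moves.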
Note that the NODATA keyword in a Windows .DEF file tells Windows not to patch the function prolog. This is necessary in situations where e.g. an exported entry point simply jumps to another exported function, or if the function has no need to access the data segment.
Now, what about that BP incrementing and decrementing? Windows depends on being able to walk the stack, and therefore applications and libraries must keep the stack frames in a format that Windows will understand.
When the Windows memory manager moves around segments, it must know whether they are referenced in stack frames that are already pushed on the stack. For example, if Windows tries to move a code segment that directly or indirectly called into the currently executing code, it has to either detect the situation and not move the segment, or move it and adjust the stack. What Windows cannot do is move the segment and leave the stack as is. The same is true for default data segments.
Non-default data segments are not a problem because they are either locked and cannot move, or are unlocked and therefore correctly written Windows applications do not keep any pointers into such segments.
Incrementing BP before pushing serves an important purpose: It tells Windows that the BP value was pushed by a far function, i.e. there will be both an offset and a segment on the stack. Obviously, for this scheme to work, stacks must be always word-aligned. Fortunately Windows ensures that they are aligned initially, and it takes some effort to misalign them (because there’s no easy way to push an odd number of bytes on the stack).
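To make the role of the odd-BP marker concrete, here is a sketch of a stack walk over a simulated 16-bit stack. It is an illustration only: `retarget_far_returns` is a hypothetical name, the frame layout is the one implied by the prolog shown earlier (saved BP at [BP], return offset at [BP+2], return segment at [BP+4] for far frames), and the sketch assumes the chain ends at a saved BP of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 64u

/* Little-endian 16-bit accessors for the simulated stack. */
static uint16_t get_word(const uint8_t *stack, uint16_t addr) {
    return (uint16_t)(stack[addr] | (stack[addr + 1] << 8));
}

static void put_word(uint8_t *stack, uint16_t addr, uint16_t value) {
    stack[addr] = (uint8_t)(value & 0xFF);
    stack[addr + 1] = (uint8_t)(value >> 8);
}

/* Walk the BP chain starting at bp. For every frame whose saved BP is odd
   (pushed by a far function), check the return segment at [bp+4] and, if it
   equals old_seg, rewrite it to new_seg. Returns the number of frames patched. */
static size_t retarget_far_returns(uint8_t *stack, uint16_t bp,
                                   uint16_t old_seg, uint16_t new_seg) {
    size_t patched = 0;
    while (bp != 0 && bp + 6u <= STACK_BYTES) {
        uint16_t saved = get_word(stack, bp);
        if ((saved & 1u) && get_word(stack, (uint16_t)(bp + 4)) == old_seg) {
            put_word(stack, (uint16_t)(bp + 4), new_seg);
            patched++;
        }
        uint16_t next = (uint16_t)(saved & 0xFFFEu);  /* strip the marker bit */
        if (next != 0 && next <= bp)
            break;                 /* chain must move toward higher addresses */
        bp = next;
    }
    return patched;
}
```

Near frames (even saved BP) are skipped because they carry only a return offset. That is why the marker is needed: without it, the walker could not tell whether the word at [BP+4] is a segment or just an argument.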
It is instructive to compare 16-bit Windows with 16-bit OS/2. The two systems were in many ways very close relatives. Both used the same executable format (NE) with only minor differences. Both used segment-based memory management. Both used the same development tools from Microsoft.
By virtue of using protected mode, OS/2 required less cooperation from the programmer. In protected mode, a segment selector was at the same time the equivalent of a Windows handle and a segment address. Programmers therefore did not need to bother with carefully locking and unlocking segments.
OS/2 applications also did not require any special prolog and epilog code for externally callable functions, and there was no need to explicitly export window procedures etc. from the NE module; there was also no equivalent of (and no need for) MakeProcInstance. In other words, the OS did not need to unwind application stacks, and it didn’t need to patch entry points.
Thanks to the 80286 memory management hardware, segments could be moved, discarded, and reloaded entirely behind an application’s back. There was no need for GlobalLock/GlobalUnlock, eliminating a source of programming errors.
Like Windows DLLs, OS/2 DLL entry points did need a special prolog to set the DS register to the DLL’s data segment, but on OS/2 no special support from the OS was needed. And of course OS/2 DLLs likewise had to be built with the /Aw switch or equivalent, indicating that SS != DS.
Overall, the 286 hardware did a lot of the heavy lifting, and memory management was less work (with less room for bugs) for both the OS and the programmer.
The Windows SDK provided tools designed to stress the Windows memory management. For example, errors related to incorrect segment locking/unlocking will not show up if there is no memory pressure and the mismanaged segment stays in place. Such bugs can remain hidden and in the worst case, only manifest under difficult-to-reproduce scenarios.
The SHAKER tool in the Windows 1.0 SDK was used to “shake” memory and force segments to be discarded and moved around. This was intended to stress the memory management and reveal memory management bugs which would remain dormant under typical conditions.

[Figure: Shaker and HeapWalker tools in Windows 1.x SDK]
Another tool was HEAPWALK, primarily a diagnostic utility capable of displaying the currently allocated segments and their owners. However, HEAPWALK was also able to allocate all available memory and free it up in 1K increments, simulating low memory conditions.

[Figure: The Windows 3.0 SDK version of Shaker]
Shaker and HeapWalker were still shipped with the Windows 3.0 SDK, not least because Windows 3.0 running in Real mode was minimally different from Windows 1.0 as far as memory management was concerned.
These tools were necessary because although the memory management in Windows was sophisticated, the hardware to back it was lacking (certainly before Windows 3.0 running in protected mode). Instead of letting the hardware catch errors like attempts to access unallocated memory, programmers had to use specialized tools to try and induce errors and hope that bugs will manifest in visible ways. This was not an exact science because in the 8086 architecture, every memory address was valid, and reads and writes always succeeded.
The Windows 3.1 SDK replaced the Shaker tool with Stress, a new utility which was designed to test application behavior under low-resource conditions — limited memory in various Windows internal heaps, running out of disk space, running out of file handles, etc.

[Figure: The Windows 3.1 SDK Stress tool]
Since Windows 3.1 only ran in protected mode, some of the earlier memory management issues were no longer applicable, but low-resource conditions were as relevant as ever.
16-bit Windows introduced a fairly sophisticated memory management system. Due to lack of hardware support, significant discipline was required on the part of application programmers. If the wrong compiler switches were used, or functions weren’t properly exported, or segments were not correctly locked and unlocked… all bets were off.
1. Peter Norton’s Windows 3.0 Power Programming Techniques, Peter Norton and Paul Yao, 1990, page 613.