There are two things that cause this. First, Windows uses a variable-size page file, whereas Linux swap is fixed in size, so Windows can keep growing the page file until it fills your drive instead of running out of swap space. Second, the out-of-memory killer in Linux isn't very aggressive by default: the kernel prefers to over-commit memory rather than kill processes.
As far as I know, Linux still doesn't support a variable-sized swap file, but it is possible to change how aggressively it over-commits memory or kills processes to free memory.
As to why these differences exist, the reasons are more historical than technical. My best guess is that Windows figured it out sooner because it has always existed in an environment where multiple programs are memory hogs, whereas that wasn't common on Linux until the proliferation of web-based everything, with each Chrome tab or Electron instance requiring hundreds of megabytes to gigabytes of memory even for something as simple as a news article or chat client.
Windows "figured it out sooner" because it never really had to deal seriously with overcommitting memory: there is no fork(), so processes' memory-usage figures are accurate. On Linux, however, the non-negotiable existence of fork() leaves one with no truly good solution (and this has been debated for decades).
Not really. It elegantly solves the "create a process, letting it inherit these settings and reset these other settings" problem, where "settings" is an ever-changing and expanding list of things that you wouldn't want to bake into the API. Thus (omitting error checks and simplifying many details):
int fd[2];
pipe (fd);           // create a pipe to share with the child
if (fork () == 0) {  // child
    close (...);     // close some stuff
    setrlimit (...); // add a ulimit to the child
    sigaction (...); // reset signal handlers
    // also: clean the environment, set cgroups
    execvp (...);    // run the child
}
It's also enormously flexible. I don't know of any other API that, besides all of the above, also lets you change the relationship of parent and child and create duplicate worker processes.
Comparing it to Windows is hilarious because Linux can create processes vastly more efficiently and quickly than Windows.
> It elegantly solves the "create a process, letting it inherit these settings and reset these other settings", where "settings" is an ever changing and expanding list of things that you wouldn't want to bake into the API.
Or, to quote a paper on deficiencies of fork, "fork() tremendously simplifies the task of writing a shell. But most programs are not shells".
A first solution is trivial: make (almost) all syscalls accept the target process's pidfd as an argument (and introduce a new syscall to create an empty process in a suspended state), which Windows can almost (but not quite) do already. A second solution would be to push everything inside the "if (fork () == 0) { ... }" block into an eBPF program and pass that to fork(); that would also dramatically cut the syscall cost of setting up the new process's state, compared to Windows (which has a posix_spawn()-like API).
> create duplicate worker processes.
We have threads for this. Of course, Linux (and POSIX) threads are quite a sad sight, especially with all the unavoidable signalling nonsense and O_CLOFORK/O_CLOEXEC shenanigans.
Yes, but at what cost? 99% of fork() calls are immediately followed by exec(), yet every kernel object needs to handle being forked, and a great deal of memory-management housekeeping is done only to be discarded afterward. And it doesn't work at all for AMP systems (which we will have to deal with, sooner or later).
In 1970 it might have been the only way to provide a flexible API, but nowadays we have a great variety of extensible serialization formats better than "struct".
> In 1970 it might have been the only way to provide a flexible API, but nowadays we have a great variety of extensible serialization formats better than "struct".
Actually, fork(2) was very inefficient in the 1970s and for another decade, but that changed when BSD shipped an entirely new VMM in 4.3BSD-Reno in 1990, which subsequently allowed a copy-on-write fork(2) to come into existence in 4.4BSD in 1993.
Those two changes sped fork(2) up dramatically; before then, a fork entailed copying not just the process's structs but also its entire memory space.
AFAIR it was quite efficient (basically free) on pre-VM PDP-11 where the kernel swapped the whole address space on a context switch. It only involved swapping to a new disk area.
I used MINIX on 8086 which was similar and it definitely was not efficient. It had to make a copy of the whole address space on fork. It was the introduction of paging and copy-on-write that made fork efficient.
Oh, is that how MINIX did that? AIUI, the original UNIX could only hold one process in memory at a time, so its fork() would dump the process's current working space to disk, then rename it with a new PID, and return to the user space — essentially, the parent process literally turned into the child process. That's also where the misconception "after fork(), the child gets to run before the parent" comes from.
Windows will also prioritise keeping the desktop and the currently focused application running smoothly. The Linux kernel has no idea what's focused or what not to kill; your desktop shell is right up there on the menu in OOM situations.
I was once trying to set up a VPN that needed to adjust the TTL to keep its presence transparent, only to discover that I'd have to recompile the kernel to do so. How did packet filtering end up privileged, let alone running inside the kernel?
I recently started using SSHFS, which I can run as an unprivileged user, and suspending with a drive mounted reliably crashes the entire system. Back on the topic of swap space, any user who isn't confined by a memory cgroup (which is rarely the case) can also crash the system by allocating a bunch of RAM.
Linux is one of the most advanced operating systems in existence, with new capabilities regularly being added in, but it feels like it's skipped over several basics.
There are daemons (not installed by default) that monitor memory usage and can increase swap size or kill processes accordingly (you can of course also configure the OOM killer).
Huh? What does swap area size have to do with responsiveness under load? Linux has a long history of being unusable under memory pressure. systemd-oomd helps a little bit (killing processes before direct reclaim makes everything seize up), but there's still no general solution. The historical point is that Windows got this basically right early on and Linux never did.
Nothing to do with overcommit either. Why would that make a difference? We're talking about interactivity under load; how we got to the loaded state doesn't matter.
Check out this series of blog posts for more information on Linux memory management: https://dev.to/fritshooglandyugabyte/series/16577