This is nice for some pretty limited use cases, but the most common use case for multithreading in app-like programs (which is what Worker-based apps presumably service: these are not documents) is removing latency from the UI thread. But as long as only the main thread can touch the UI, and the main thread also can't access shared memory, this is limited to scenarios where copying the entire render state from a Worker is reasonably fast — in which case Workers currently already solve the problem. This proposed implementation of shared memory doesn't actually solve one of the big remaining needs for shared memory: the case where it's prohibitively expensive to copy state between a Worker thread and the UI thread at 60fps.
For example, Workers aren't particularly useful for games in their current iteration: the overhead of copying the state of the world back to the rendering thread is high. This is exactly the problem that shared memory would solve, were it not limited to Workers. This puts web export (or even primary web-based game authorship) at a significant disadvantage as compared to native apps: native code can share memory, and web-based implementations can't. In many cases, architectures that are optimal for shared-memory threading are pathological when the rendering thread requires copies, meaning that threading gets thrown out the window for web. Even with asm.js-compiled "near-native" performance on a single core, you can only use 25% of the CPU available on a typical quad-core machine if you can't use multithreading. A 4x performance hit is the difference between 60fps and 15fps... or 15fps and ~4fps.
The title of the blog post got me pretty excited, but the proposal is fairly disappointing in terms of unlocking better performance for web apps. The use cases here are pretty limited to things like CPU-bound number crunching, and I doubt too many people are running machine learning algorithms in a browser as compared to the number of people who're using browsers to, y'know, render UIs. By all means scope the problem down to sharing primitive data in ArrayBuffers — we can build abstractions on top of that! — but limiting it to Worker threads makes it near-useless for most web applications. Workers already solve the use cases for UIs that can tolerate copies between the UI thread and the Worker threads, and this proposal doesn't allow us to solve the needs of UIs that can't.
The blog post does mention the main thread, as something that is more complex and in need of further investigation.
Still, even without shared memory being accessible to the main thread, I think sharing between workers can be extremely useful. Yes, you need to proxy information to the main thread in order to render, but that doesn't need to be a big problem. See for example the test here, where a 3D game's rendering was proxied to the main thread with little loss of performance.
That very small overhead could be worth letting the game run in multiple workers while using shared memory.
Also, APIs like Canvas 2D and WebGL might become available in workers; there are efforts towards that happening right now. That would eliminate the need to copy anything to the main thread, and avoid a lot of latency.
I can't speak to the BananaBread codebase, since I haven't read it — although BananaBread performs poorly as compared to commercial engines, regardless — but if you look at even the published benchmarks in that blog post, the Worker-based implementation is slower in all cases except for Firefox with two bots. Chrome is always slower when using Workers, sometimes massively so, and Firefox with ten bots is slower multithreaded than single-threaded.
Regardless, shared memory isn't only useful for WebGL. It's useful for any kind of UI where you don't want to block the main thread, and if you don't have DOM access it's tough to make that work. If copying is fine then current Workers are good enough; if copying isn't fine, then this doesn't change that.
> If copying is fine then current Workers are good enough; if copying isn't fine, then this doesn't change that.
I am saying that copying is fine (for most apps).
Copying is fine because BananaBread is indeed less performant than commercial engines, as you said, and that actually makes it a better test candidate here. It does far more GL commands than a perfectly optimized engine would, which means much more overhead in terms of sending messages to the main thread.
Despite that extra overhead, it does very well. 2 bots, a realistic workload, is fast in Firefox, and the slowdown in Chrome (where message-passing overhead is higher) is almost negligible. 10 bots, as mentioned in the blogpost, is a stress test, not a realistic workload. As expected, things start to get slower there. (Game is still playable though!)
And large amounts of GL commands, as tested there, are much more than what a typical UI application would need. So for UI applications, that just want to not stall the main thread, I think a single worker proxying operations to the main thread could be enough. Copying is fine.
The proposal in the blogpost here is for other cases. Workers + copying already solve quite a lot, very well. What this new proposal focuses on are computation-intensive applications, that can multiply their performance by using multiple threads. For example, an online photo-editing application needs this. This might be just a small fraction of websites, but they are important too.
Here are some things that BananaBread doesn't test that I suspect would break larger games in commercial engines:
* High-poly models. This is one area where copying breaks down: that can be a ton of data. BananaBread has a single, low-poly model that it uses for NPCs, and it presumably rarely (if ever?) needs to be re-copied.
* Large, seamless worlds. If you can transfer the entire world into a single static GL buffer, sure, copying isn't a problem since it only happens once on boot. If you need to incrementally load and unload in chunks, you're going to be paying that cost again and again.
* Multi-pass rendering. In fact, the proxying approach makes multi-pass rendering impossible, as noted in the blog post.
By all means, if your application already works single-threaded, or within the confines of the existing spec, you're going to be fine. But memory copies aren't free, and UI offloading is one of the biggest reasons to use shared memory.
Shared memory in Workers is nice — it doesn't make anything worse, and it makes some things better! — but it's a little disappointing that the main thread can't access the shared memory buffers. That's all.
I understand your disappointment; clearly more opportunities are opened up on the main thread. As the article says, this is a proposed first step, and the main thread is trickier, so it can be considered carefully later on. Meanwhile, for a large set of use cases, the current proposal can provide massive speedups.
> if copying isn't fine, then this doesn't change that.
Good point. I imagine it would be possible to transfer ownership of a SharedArrayBuffer to the main thread from a worker, just as it is done today with zero-copy transferable ArrayBuffers.
References to the SharedArrayBuffer in all workers would be made unusable. When the code on the main thread is done it can transfer ownership of the SharedArrayBuffer back to the workers.
I'm of the opinion that adding "proper" (read: low-level) multithreading is a bad idea for Javascript.
How will it interact with the existing event loop? How will scheduling work? How about sharing state properly? Is my work going to crash because some bozo writes a jQuery plugin for background processing that doesn't understand how spinning on a mutex can cause trouble?
Javascript developers generally lack, due to the language and the environment, any sort of empathy for what is and is not a good idea in a place where concurrent modification of shared state is possible.
Web workers, wisely, solved this problem by creating explicit messaging. This gets you most of the way there without running into the really nasty bugs you see in native code.
Adding a shared buffer object as proposed (presumably giving an interleaved set of views onto the same underlying backing array, much as we do with typed buffers) could be acceptable, because you can guarantee isolated access to elements.
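That "interleaved views" model already exists for plain ArrayBuffers, and the Atomics operations in the proposal are what would make per-element access tear-free. A minimal sketch, assuming the SharedArrayBuffer/Atomics API as proposed:

```javascript
// Multiple typed-array views over one shared backing buffer; Atomics
// guarantees that individual element reads and writes are not torn.
const sab = new SharedArrayBuffer(8);
const asInt32 = new Int32Array(sab); // one view: 32-bit integers
const asUint8 = new Uint8Array(sab); // another view: the same bytes

Atomics.store(asInt32, 0, 0x01020304);
console.log(Atomics.load(asInt32, 0)); // → 16909060 (0x01020304)

// The byte view sees the very same memory; byte ORDER depends on the
// platform's endianness, but the bytes themselves are 1, 2, 3, 4.
console.log(asUint8.slice(0, 4).reduce((a, b) => a + b, 0)); // → 10
```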
However, adding the more general threading support you see in, for example, C++ would be a nightmare. I'm already a little wary of the maintenance and legacy costs that ES6 is going to impose on us...giving devs the power to do dumb shit with threads isn't going to help our industry.
I think that adding full support for things like Canvas and WebGL rendering would relieve the majority of the remaining issues people have, if done alongside that shared buffers approach.
Hell, written in a functional style Javascript looks none too different from, say, PLINQ, and if they decide to add library-level function parallelism I could absolutely get behind it.
Just save us from copy-and-pasted $.raiseSemaphore() and $.joinOnMonitor() -- and history shows us that's exactly what we should expect.
EDIT:
One more thing... JS does not lend itself to static analysis, and that may make writing safe parallel code even more difficult.