
Still not encouraged by the no-GIL, "We don't want another Python 2->3 situation", yet very little proffered on how to avoid that scenario. More documentation on writing thread-safe code, suggested tooling to lint for race conditions (for whatever it is worth), discussions with popular C libraries, dedicated support channels for top tier packages, what about the enormous long-tail of abandoned extensions which still work today, etc.


The big and obvious difference is that all the GIL vs no-GIL stuff happens in the background and your average python dev can just ignore it if they want to. The interpreter will note if you have C extensions that don't opt in to no-GIL and then will give you the GIL version.

This is _very_ different to the 2-to-3 transition where absolutely every single person, even those who couldn't care less, had to change their code if they wanted to use python 3.


> your average python dev can just ignore it if they want to.

Oh, so naive... All the mutation code in Python which "worked" because Python didn't really have any real concurrency. Add to it -- there's no real plan about what to do with Python concurrency. Removing GIL is only one "half" of the problem, you need to give developers some sort of a framework to use to deal with concurrency. Python's threads are extremely underdeveloped and dangerous to use. Python doesn't even have anything like "synchronized" from the Java world. So, all synchronization requires dealing with locks, mutexes, condition variables...

Most Python programs today didn't bother to deal with threads because they didn't confer enough benefits to be worth using. So, "automatically" parallelizing Python code, as in allowing it to run in actual threads is going to bring about lots and lots of bugs in trivial code written by people with no clue about concurrency.
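For what it's worth, the closest thing to Java's "synchronized" can be sketched in a few lines on top of the standard `threading` module. The `synchronized` decorator, the `Counter` class, and the `hammer` helper below are hypothetical names made up for illustration, not anything the standard library ships; the sketch assumes every instance carries its own `_lock` attribute.

```python
import functools
import threading


def synchronized(method):
    """Hypothetical decorator: serialize calls per instance via self._lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class Counter:
    def __init__(self):
        # RLock is reentrant, so one synchronized method may call another.
        self._lock = threading.RLock()
        self.value = 0

    @synchronized
    def increment(self):
        # Read-modify-write; the lock keeps it safe with or without a GIL.
        self.value += 1


def hammer(counter, n_threads=4, n_calls=1000):
    """Call counter.increment() from several threads and return the total."""
    def work():
        for _ in range(n_calls):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

This still uses a lock underneath, so it does not contradict the comment above; it only shows that the boilerplate can be hidden.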


> So, all synchronization requires dealing with locks, mutexes, condition variables...

As always, by far the best way to interact between threads is to use thread-safe queues (AKA message passing). Luckily, Python has one of those [1]. No complicated synchronisation needed.

[1] https://docs.python.org/3/library/queue.html
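A minimal sketch of that message-passing style, using only `queue.Queue` and `threading.Thread` from the standard library. The function names (`square_worker`, `square_all`) and the use of `None` as a shutdown sentinel are choices made for this example, not anything prescribed by the docs.

```python
import queue
import threading


def square_worker(inbox, outbox):
    # Pull work items until the None sentinel arrives.
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


def square_all(numbers, n_workers=3):
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=square_worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        inbox.put(n)
    for _ in workers:
        inbox.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    # Every worker has exited, so all results are already in outbox.
    results = [outbox.get() for _ in range(len(numbers))]
    return sorted(results)  # completion order is not guaranteed
```

The threads never touch each other's data directly; the queues are the only shared objects and they do their own locking.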


That's just completely missing the point of threads... but that wouldn't be the first nor the hundredth stupid thing found in Python's documentation.

The reason to want threads is to be able to share memory. That's literally why they were created. If you are sending messages instead of sharing memory, you don't need threads. You need something like Erlang processes.

The problem is that people who wrote Python never had a plan. They sucked and still suck as programmers. So... they knew there are threads. And it was easy to write a bunch of wrappers around pthreads. And that's what they did. And then they realized they don't know how to deal with concurrency, so they found a simple way out -- GIL.

The whole history of Python is the history of choosing the easy but wrong. And it's probably the only consistent thing about the language.


The object going onto the queue is the shared memory. The queue itself is essentially a fancy type of lock.

Yes you could use multiple processes, but you have the extra expense of serialisation, or you could use shared memory but you'd have to administer that and you'd still have the expense of context switches. And inevitably, yes, there is some actual shared state like a logger and modules that you've loaded, which again would be a pain in multiple processes.

You can call a Python thread plus a queue an Erlang process if you like, or say that I should use Erlang processes instead. But the fact is, the Python version works perfectly well for many problems. It does all the things that you typically need of threads: shares state (via the queues), lets you concurrently use the CPU (via C libraries that release the GIL – but no GIL would be even better), and lets you write blocking IO if you wish. Not missing the point at all.

The developers of Python didn't "suck as programmers" and it doesn't help your point to claim they do. Guido chose to use the GIL because he was OK with multiple threads but not at the expense of single thread performance, and no one showed any solution to that that beats the GIL – until now. (Personally I think the trade off was wrong, and a small hit to single threaded performance would have been worth it. But that's different from being ignorant of the fact there was an actual reason.)


Which code is automatically going to run in threads? As you say, basically nobody uses Python threads. So even enabling no-gil, nothing is going to change because sequential code will still be sequential.


> As you say, basically nobody uses Python threads.

Not at all. I'm saying that a sizable portion of Python libraries is completely unaware of threads. But they can still take foreign-owned objects and operate on them as if threads didn't exist.

So, imagine a simplified hypothetical scenario, where one library has a function for counting keys in a dictionary. This library was written by someone unaware and unwilling to acknowledge thread existence. So, if the dictionary it counts the keys of is modified in a separate thread -- boom! But, third-party code using that library has no easy way of knowing if the library is prepared to deal with threads, and may have been using it for a while, until, again boom!

Now, to make this more concrete: have you ever heard of Boto3, the AWS client library? Well, it does roughly what's described in the paragraph above -- it manipulates a bunch of its own objects in a non-thread-safe way. But, you would really want to use it in threads because that makes it so much easier to manage things like rate-limiting (across multiple clients), and, obviously, you don't want to deploy a large fleet of VMs one-by-one. The end result? -- boom!


Of course a lot of libraries are not thread safe. However, that's not at all rare, lots of libraries for other programming languages aren't thread safe either. My point is that those libraries won't start magically crashing when running in no-gil mode unless the dev using them starts using threads in Python. Yes, it's hard to know which libraries are thread-safe and which ones aren't, and just like any other language you should default to "not thread safe" unless the developer explicitly says otherwise or you inspect the code.


> basically nobody uses Python threads

Not true at all. Plenty of people (including me) use threads in Python for:

* Blocking I/O

* CPU heavy libraries written in C (as those release the GIL)

They work fine, even with the GIL. They only work badly if you want to run a lot of pure-Python (non-I/O) code in multiple threads - which, fair enough, sometimes you might want to do, and the GIL is a problem for that.
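A rough illustration of the blocking I/O case, assuming `time.sleep` as a stand-in for a real blocking call such as a socket read (it releases the GIL the same way). `fake_fetch` and `fetch_all` are hypothetical names for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(name, delay=0.05):
    # Stand-in for blocking I/O; the thread sleeps without holding the GIL.
    time.sleep(delay)
    return f"{name}: done"


def fetch_all(names):
    # One worker per item, so the waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # Executor.map() returns results in input order.
        return list(pool.map(fake_fetch, names))
```

With the GIL in place the waits already overlap; what no-GIL would change is the case where the workers run pure-Python CPU work instead of waiting.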


any existing async/await code.


Async is for asynchronous I/O, i.e. such I/O that is only possible through file descriptors that support something like epoll() (i.e. network sockets).

It's a thing completely separate from threads, where no code is supposed to run concurrently. The idea of this feature is that a program may schedule a bunch of I/O operations and then wait for their completion instead of scheduling I/O operations one at a time.

As of now, this is an obsolete mechanism of dealing with I/O as now we have io_uring. But, truth be told, it never really worked well... I mean, if you knew what you were doing, you could have taken advantage of this feature, but it was never in shape to be library-grade multi-purpose functionality.

Python made a stupid bet on it and encoded it into the language through async / await keywords. But if this was the only stupid thing Python has done in its history, flying cars and hoverboards would probably be an integral part of our daily lives.


i know what async/await is. the GIL is the only (technical) thing preventing async/await from being concurrent-by-default. nearly every other language that has async/await (or promises) is concurrent-by-default. people will want to run their async python code concurrently, but most async code will not "just work".


> the GIL is the only thing preventing async/await from being concurrent-by-default

Async/await is concurrent, that’s the whole point. It’s not usually parallel, because the asyncio runtime (and, IIRC, all the major alternate runtimes) schedules tasks on the same thread, and if there was a multithreaded runtime, its parallelism would be limited by the GIL to only actually having multiple threads making progress if all but one were in native code that released the GIL.

> nearly every other language that has async/await (or promises) is concurrent-by-default.

JavaScript isn't parallel for async/await. Ruby has multiple async/promises implementations, some of which are parallel (use separate threads) to some degree even with the GVL (which is like Python’s GIL), and others are not. (all are, of course, concurrent.)

The GIL limits the value of a multithreading async/await runtime, but it doesn’t prevent it, and a GILectomy doesn’t buy you one for free (or make multithreading a cost-free choice).
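To make the concurrent-but-not-parallel distinction concrete, here is a small sketch using stock `asyncio`. Two tasks take turns at each `await`, yet every step is logged from the same OS thread. The names `step_worker`, `interleave_main` and `interleave_demo` are invented for this example.

```python
import asyncio
import threading


async def step_worker(name, log):
    for step in range(2):
        log.append((name, step, threading.get_ident()))
        # Hand control back to the event loop so the other task can run.
        await asyncio.sleep(0)


async def interleave_main():
    log = []
    await asyncio.gather(step_worker("a", log), step_worker("b", log))
    return log


def interleave_demo():
    log = asyncio.run(interleave_main())
    order = [(name, step) for name, step, _ in log]
    thread_ids = {tid for _, _, tid in log}
    return order, thread_ids
```

The tasks interleave, which is the concurrency, but `thread_ids` holds a single value, so nothing here would become parallel just because the GIL went away.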


async/await code already runs in threads, so that's not really a change.


What do you mean it already runs in threads? It does so if you specify it with run_in_executor [1], or if you run multiple event loops at once, but it doesn't automatically.

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asy...
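A minimal sketch of the `run_in_executor` case, assuming the loop's default executor (passing `None`, which selects a thread pool). The helper names are hypothetical; the point is only that the coroutine stays on the loop's thread while the blocking function runs on another.

```python
import asyncio
import threading


def blocking_work():
    # Executed on a worker thread from the loop's default executor.
    return threading.get_ident()


async def executor_main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    # None means "use the default executor".
    worker_thread = await loop.run_in_executor(None, blocking_work)
    return loop_thread, worker_thread


def executor_demo():
    return asyncio.run(executor_main())
```

Only the function handed to the executor leaves the event loop's thread; the `async def` code itself does not.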


Woops, yeah, you're right. I thought the default executor was a threadpool, my mistake. However, in that case, I assume that the default executor will not change to multithreaded when no-gil comes.


There isn't an executor at all, at least not in the concurrent.futures.Executor sense. It just runs in the thread where you call asyncio.run.


But you need to pick your horse. In 5 years time, Python will either be GIL or no GIL, and it is hard to tell which. It might be a setting (which might be more ideal).

If you assume nogil, you need to choose dependencies that support that. You may need to trade off: eschew dependencies that aren't looking like they will be nogil compatible by the deadline. You are stuck on Python 3.18 maintenance branch or whatever, rather than the 3.19 (in reality .. 4.0) version.

Or choose gil, then you can use everything. But there is a prisoner's dilemma: everyone picks gil and uses whatever dependencies, library maintainers assuming this don't bother to add nogil support, and then the decision becomes to stick to gil, which, if you suspect it will happen, gives you even more reason not to support nogil.


I don’t really understand this. Unless I am missing something you should always pick the “no GIL” version as that will work with or without a GIL. Thread safe No GIL code would be totally fine to run on python compiled with the GIL with zero modifications.

Because of this I don’t expect there to be multiple versions of any library. Once a library does the (admittedly heavy) lift to no GIL it will just be the main version of that library going forward.


Each library maintainer (probably mostly volunteers) has to decide whether to put effort into making their code thread safe. Clearly it won't be 100% of libraries that "upgrade".

Then on top of that, they know their effort might be for nothing if the decision is made to keep Python GIL-only all along (one of the possible 3 outcomes at the end of the 5 years: ["gil", "nogil", "both supported"]).


> Clearly it won't be 100% of libraries that "upgrade".

I'm wondering how many libraries with binary extensions are actually in common use. Like, maybe 90% of python projects use a subset of a few hundred such packages?

That's a hassle if you maintain one of those packages, and will be a bit disappointing if in 5 years' time you're still depending on GIL-reliant packages.

But it's nothing like the chaos of the python 2-3 changes, where ~100% of python files in every package and end-user project had to be fixed.

I only learned about this this morning though, it's very possible I'm missing something. A lot of the concerns people are raising look a bit overblown to me.

I take the point that after so many abortive GIL removal attempts, it's harder to be confident this one will happen. But having the go-ahead from the steering council seems like a good indicator this one has traction.


But thread-unsafe code is not the same as incompatible code. That's the point. You can just choose to say "NOT THREAD SAFE" (just as many C libraries aren't thread safe and need to be wrapped in locks to be used by multiple threads) and users will still be able to use it. More importantly, if it's a pure Python extension, you can just not modify the library and the users will still be able to use it whether or not they have gil or no-gil.


That’s true. I was more thinking from the perspective of a library user not library dev. I suspect for some classes of problem going no GIL will be so tantalizing that the work will definitely be done. Either in the incumbent library or an upstart will come out and take over the community with no GIL support.


Current plan says there has to be separate builds per module, as if it is an ABI break. Would be much better if it could be combined into one build. Hopefully necessity triggers some invention here.


There's no way to make it work with the old ABI. Because sizeof(PyObject) is fixed in the old ABI, there's simply no way to attach additional information (e.g. the new cross-thread ref count) to every Python object. The Python ABI (even the "limited" stable ABI) exposes too many implementation details, it's not really possible to make any fundamental changes to the Python interpreter without breaking that ABI.

You could have a single new ABI supporting both no-GIL and with-GIL, but it wouldn't be compatible with the existing stable ABI.


You're missing something, which is that a lot of libraries will be "i-don't-care-about-gil". Only native extensions need to choose GIL or noGIL due to the ABI difference, but pure Python libraries should run with the same code in both variants. And a lot of them will probably be thread safe at some level (function or class) without any changes. For those that aren't thread-safe, I bet that quite a lot can just get away with a "NOT THREAD SAFE" warning and letting the user wrap access to them with locks.

And that's talking about multithreaded code. I bet that even with noGIL, lots of Python code will still continue to be single-threaded, making the gil/no-gil decision irrelevant (save for those native extensions).
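As a rough sketch of what "wrap access with locks" looks like from the user's side: `NotThreadSafeTally` below is a made-up stand-in for a third-party class documented as NOT THREAD SAFE, and `LockedTally` is a hypothetical caller-side wrapper. Neither corresponds to a real library.

```python
import threading


class NotThreadSafeTally:
    """Stand-in for a library class that makes no thread-safety promise."""

    def __init__(self):
        self.counts = {}

    def add(self, key):
        # Unprotected read-modify-write on shared state.
        self.counts[key] = self.counts.get(key, 0) + 1


class LockedTally:
    """Caller-side wrapper: every call into the library holds one lock."""

    def __init__(self):
        self._inner = NotThreadSafeTally()
        self._lock = threading.Lock()

    def add(self, key):
        with self._lock:
            self._inner.add(key)

    def snapshot(self):
        with self._lock:
            return dict(self._inner.counts)  # copy, so callers never share it


def tally_from_threads(n_threads=4, n_calls=1000):
    tally = LockedTally()

    def work():
        for _ in range(n_calls):
            tally.add("x")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally.snapshot()
```

The library code is untouched, and the same wrapper behaves identically on a GIL build and a no-GIL build.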


But at least after the transition you could stop caring. NoGIL makes maintainers’ lives worse permanently because now you have to care about it forever if you publish a library.


Why? Once you make your code thread safe it can be run as-is on python compiled with a GIL.


In a past life I hacked on PHP for a living, and in the time it took Python 2 to ride off into the sunset, PHP got two major migrations under its belt in 5.2 to 5.3, and then again 5.6 to 7.0.

It was amazing to see the contrast between the two languages. PHP gave you plenty of reasons to upgrade, and the amount of incompatible breaking changes was kept to a minimum, often paired with a way to easily shim older code to continue working.

I really hope to see no-GIL make it into Python, but in the back of my mind I also worry about what lessons were learned from the 2 to 3 transition. Does the Python team have a more effective plan this time around?


I’ve taken an application codebase from PHP 5.3 to 8.2 now and it was relatively easy the whole way.

The real key to minimizing the pain was writing effective integration tests with high coverage. We didn’t have a good test suite to start, but once we added some utilities to easily call our various endpoints (an internal API client, if you will) and make assertions about the responses, the coverage came quickly.

Popular frameworks like Laravel offer such test utilities out of the box now.

That combined with static analysis tools like psalm make it so we can fearlessly move past major upgrades.

One thing I was surprised at was just how much crap PHP allowed with just a notice (not even a warning for a long time). A lot of that stuff still works (although over time some notices have progressed to warnings or errors gradually). We have our test suite convert any notices or warnings to exceptions and fail the test case.


> The real key to minimize the pain was writing effective integration tests with high coverage

I think this makes it really hard to do comparisons: I’ve done Python 2 to 3 migrations which took an hour or two because the code had tests and was well-maintained, and PHP migrations which were painful slogs without tests and sloppy code (“is this ignored error new or something we should have fixed in the 2000s?”). Most developers don’t have enough data points to say whether the experience they had was due to the language or the culture.


I’m not familiar enough with the python transition to say much. I can think of a few things that the PHP developers did that helped make the transition easier:

- multibyte aware string functions were implemented as a separate (and optional) extension with separately named functions (prefixed with mb) and there was a popular community polyfill from the Symfony project (as there is for many new language functions).

- Weird sloppy behaviours (like performing array access on a Boolean, or trying to access a property on null, and many more that would silently just turn into null/false) had lengthy deprecation periods and if you had error logging turned on you could clean these up relatively easily even without a big test suite.


> multibyte aware string functions were implemented as a separate (and optional) extension with separately named functions (prefixed with mb)

Python had a different take on this with some interesting psychology: you had a new string type which had to explicitly be converted (i.e. concatenating a Unicode string with a byte string causes an exception), which had a stark divide. Projects which had previously handled Unicode correctly converted almost trivially, but the projects which had been sloppy were a morass trying to figure out where Unicode was desirable and where you really needed raw bytes. Almost all of the code I saw where this was a problem didn’t handle Unicode properly but the developers _hated_ the idea of the language forcing them to fix those bugs.
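The behaviour described above is easy to show in Python 3. In this sketch `mixing_fails` and `join_text` are hypothetical helpers, and UTF-8 is an assumed encoding; real code has to know which encoding its bytes actually use.

```python
def mixing_fails():
    """Return True if str + bytes raises, as it does in Python 3."""
    try:
        "caf\u00e9 " + b"menu"
    except TypeError:
        return True
    return False


def join_text(prefix, payload, encoding="utf-8"):
    """Concatenate text with data that may still be raw bytes."""
    if isinstance(payload, bytes):
        # The explicit decode is the step Python 3 forces you to write.
        payload = payload.decode(encoding)
    return prefix + payload
```

Code that already knew where its bytes became text only needed the decode call in the right place; code that didn't had to work that out first.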


There were valid reasons to be upset at Python 3's handling of Unicode.

- https://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

- Discussion: https://news.ycombinator.com/item?id=7732572

- https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journ...

- Discussion: https://news.ycombinator.com/item?id=22036773

Chalking these complaints up to bad development practices is _precisely_ the reason why the Python 3 migration was handled so poorly. If this attitude is repeated for no-GIL Python, it will fail.


I was assuming that no-GIL will only be enabled if all imported libraries support it. That means that they are marked as no-GIL ready and otherwise the import would throw an exception. Not sure how it is implemented now but that sounded very reasonable to me. The no-GIL compatible code would start with the core libraries and then expand from that. Using legacy libraries just means that you have to revert back to GIL-mode. Any no-GIL enabled library should 100% still function in GIL-mode, so I don't expect the Python 2->3 transition situation to repeat.


> what about the enormous long-tail of abandoned extensions which still work today, etc.

I mean there they're talking about keeping GIL in (and I imagine that will be the case for many many years) so those would still keep working. The fear is if some libraries just drop GIL-ful support, but there too I am hopeful for that not to be the case.


> Note that if the program imports one single C-extension that uses the GIL on the no-GIL build, it's designed to switch back to the GIL automatically. So this is not a 2=>3 situation where non-compatible code breaks.

Sounds good enough to me, am I missing something?
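For anyone who wants to check what they actually got at runtime, a small hedged sketch: CPython 3.13 and later expose `sys._is_gil_enabled()`, and free-threaded builds set the `Py_GIL_DISABLED` config variable. The `gil_status` helper is a hypothetical name, and the fallback assumes that an interpreter without that function always has the GIL.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and if the GIL is active."""
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Assumption: interpreters older than 3.13 lack the function and have a GIL.
    check = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(check()),
    }
```

Calling it again after importing a C extension that has not opted in is one way to see whether the fallback to the GIL described in the quote has kicked in.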



