Exactly once processing is not possible in distributed systems. Anyone that tries to sell that snake oil is dishonest and anyone who buys it should not be making purchasing decisions. The definition and requirement of idempotent processing means systems must be able to handle messages delivered more than once, which irrefutably proves there is no such thing as exactly once.
Even within centralized, monolithic systems with no outside interaction, laws of physics and reality still apply - a pull on the cord, an earthquake, a flooding or a myriad of other things may interrupt message processing resulting in either exactly zero or more times a message will be processed even if the message were being processed within the confines of an embedded micro-controller using hand crafted assembler code.
There is no literal exactly once of course, because it's not physically possible, but "exactly-once" semantics are possible in distributed systems. Data can be resynchronized, processes can be restarted with the side effects removed, etc.
And in the absence of correctly designed idempotent behaviors across the processing pipeline, it becomes unbelievably complex to handle all the corner cases correctly. The problem with marketing folks parroting and promoting the exactly once semantics is that any weak link in the processing chain compromises the correctness of entire system and what is worse is that the implications of subtle errors may not become evident for years to come (history is rife with legal cases and negative consequences of getting it wrong when it comes to folks' money).
In critical sectors like banking, finance and payments, designing systems in well understood and boring manner is absolutely critical while hoping for the best based on shiny brochures and marketechture is a sure recipe for disaster.
"Data can be resynchronized" - yeah, that means you send it again. Not exactly once.
"Exactly once semantics" is semantics. It's at least once with idempotency, which may or may not be able to be guaranteed on the part of the system depending on actual implementation details which the marketing fluff will invariably leave out if they're saying "exactly once". And that's a major problem when relying on such 'semantics'.
> "Data can be resynchronized" - yeah, that means you send it again. Not exactly once.
"Send it again" doesn't mean "processed again". Isn't exactly-once built on at-least-once just binding the result to a future? Then any subsequent accesses or attempts to update will simply return the bound result, which was processed exactly once.
Which is what you -do- to handle 'at least once' delivery. The system can try and hide that complexity from you,but there are still tradeoffs in any implementation. How long in between does that guarantee hold? Does it guarantee that if you fire the same message a year from now it will still recall that it's a dupe (i.e., persist all message identifiers for an infinite amount of time)? Probably not. Does that matter to you? Maybe!
Even if it does persist, does it persist the identifier on receipt of the message, or on sending it to you for processing? If the former you run the risk of crash and never having handled it; you really have no guarantee of delivery to your processor. If the latter, what happens if the receiver crashes before it hands it off to be processed? If simultaneous, is 'processing' atomic? Probably not; what happens if you crash midway through processing the thing? Etc.
That's my point; you need more details to make the system robust. You don't get "exactly once delivery" out of the box; you get a system that attempts it by deduplicating, but there be gremlins, and the fact you're not saying "it's at least once delivery with (details)" means I'm not hearing a technical pitch, but a marketing one.
Futures have well-defined semantics as logic variables. The only question of actual interest that you raise is the lifetime. This is dictated either by the system or the dependent objects, although obviously "unbounded lifetime" handles all possible cases. So lifetime is not "undefined" but "contextual".
Is there a good tutorial on modeling messages in idempotent way that uses non trivial examples and real life cases? Whenever I talk with others working on such systems it turns out that many messages aren't and that there are compensating actions that repair/sync system components when needed/on demand.
I bet analyzing those messages using this framework would reveal exactly why compensating actions are needed.
The takeaway I am getting from this paper is that, you're not just looking for idempotency. Monotonicity is important, as well as commutativity and associativity. If it cannot be expressed in that way, then coordination is required.
This is what I've been told as well from distributed systems experts. It's more "exactly-once most of the time" but believing that it will always be exactly once in all failure situations is delusional.
Agreed. It’s not that I don’t get it; people have always built systems where even idempotency is impossible, and it’s a lot easier to convince business-side stakeholders that “exactly once” can’t be compromised on than that non-idempotency is a critical design flaw. But it’s easier because the term substantially misleads them about what you’re getting.
My understanding is that Exactly Once Processing to Completion" is not the same as "Exactly Once Processing." In the former, any side effect is not committed until the processing has completed, and if you have an externally consistent database then you can have exactly once processing [to completion]. What am I missing?
There are a number of complexities involved but external factors outside of system's control are most critical. Consider for example a customer that places an order with a merchant for $800 paying for three items with prices of $200, $300 and $300 - merchant tries to settle $300 for item 2 that is available in the inventory, then $200 followed by $300 the next day. What if the original $300 got declined due to insufficient funds (bank transfers can take hours or even a day to decline)? How does the bank know whether the next $300 is a retry or remainder of the original authorization? The correct operation requires cooperation and idempotent behavior from the merchant, their acquirer, the network used by the customer and correct implementation at the issuing or money holding bank. Implementing distributed transactions and settlement ledgers across multiple parties? That's what blockchains are for but we are quite a ways from that (part game theory, part tragedy of the commons, if you will).
The $300 payment can be modeled as a payment whose state mutated later.
Or it can be modeled as event stream where on {id=2, t=2, +$300} was posted, and {id=2, t=4, -$300} was declined. If you want to distinguish between a "retry" or a "remainder", you add that into the tuple within the event stream.
That is the idea when you control the system. This however is not a system, it is an ecosystem - payments ecosystem. No one party controls the interfaces or implementations between any two counter-parties, let alone multiple counter-parties along the line. The point I was making was infact about lack of control over externalities like this one. For those in payments, this actually is a real problem, especially with small merchants in developing countries but true for even large ones in the west. And this does not even go into the details of cash oriented systems in india, china and elsewhere.
Even within centralized, monolithic systems with no outside interaction, laws of physics and reality still apply - a pull on the cord, an earthquake, a flooding or a myriad of other things may interrupt message processing resulting in either exactly zero or more times a message will be processed even if the message were being processed within the confines of an embedded micro-controller using hand crafted assembler code.