Edge Case Poisoning (2020)

tgbugs · on Sept 9, 2022

When doing this for scientific protocols I found that you basically have to give up any hope of specializing types of things. The domain ontology for documenting processes, be they making cupcakes or electron micrographs, has to be matched to the domain in question.

Some might say that for processes this means everything must be extremely abstract in order to avoid edge cases like those encountered by the author. However, I would argue that from the perspective of someone executing a recipe they do not care at all about whether something is edible or food or not, they only care how much they need of something (count is a unit(less) of measure). Thus, the ontology proposed by the author is not matched to the domain.

The first mistake was trying to make a distinction between food and non-food. What if I used paper cupcake cups? They may not technically be food, but I have certainly eaten parts of them before by accident. Other parts of the system might care about food/non-food, but these parts are constrained by a separate and likely orthogonal set of use cases.

I don't usually need to know the chemical formula for sodium bicarbonate to order it from a vendor, but if I need to automatically calculate stoichiometry for reactions so that I can automatically order the correct amount then I might. Those two parts of the system can and should be completely orthogonal to each other and thus fully decoupled.

Therefore, I would suggest that encountering something that looks like "edge case poisoning" is a sign that you have not properly factored the system.

VBprogrammer · on Sept 9, 2022

While the example given is adding complexity to the data and trying to model that using the type system I think the point holds in may other cases.

Take betting as an example I work on. The basic idea is that if a bet wins you get paid your stake multiplied by the odds of the bet. If I open the codebase it should be trivial to find where that multiplication happens right? It's such a fundamental part of the code. But actually the edge cases (starting price bets, each-way betting, handicap betting with split line handicaps, multiple bets, dividend bets) mean it's very difficult to point to exactly where that happens. If I had to guess 90% of the code isn't needed at all in the majority of bets which are singles or straight accumulators.

derangedHorse · on Sept 9, 2022

Are you proposing to make the internal model as robust as possible and use good apis for abstraction by domain?

RandomBK · on Sept 9, 2022

I once encountered this problem when writing a video game as an never-published side project.

Most in-game items and resources were very simple - largely expressible by a simple <item_type>:<quantity> dictionary. Others however, could support a never-ending variety of custom attributes and logic.

After much thought, I wound up pursuing a solution that turned out to be quite powerful and extendible:

Items would be represented via a <item_type>:<<attr_key>:<attr_value>> datastructure. Any system that needed to interact with an item would call an 'item handler' assigned to that item type, which exposed a standard interface like 'getQuantity', 'useItem', etc.

Most basic items shared a common handler that stored quantity as an attribute field. However, more complex items could implement custom logic. I guess this is somewhat similar to Mixins or Component Based Architecture.

I think this is partially covered under the 'more abstraction' option in the blog, but I've personally found this to be an interesting and valuable tradeoff that can be deployed in a lot of situations.

dcx · on Sept 9, 2022

What a useful idea to coin a term for! I've seen a huge amount of sweat go into edge case functionality over the years, especially in large systems. I wonder exactly how much you could get away with not doing, if you had good insight into user behavior and applied this philosophy liberally and brutally.

But I'm also a bit skeptical about how far you can push the idea beyond the kinds of design tradeoffs people already make. It's often hard to be certain about which edge cases will prove to be important or valuable in the future, and when you cut this sort of corner, changing your mind later can sometimes be very, very expensive (e.g.: I've experienced the pure agony that comes with migrating away from ancient mainframes, which handled every single edge case in plaintext). Not to mention that in large systems, every tiny edge case ends up being useful to a huge number of people anyway.

Perhaps this term is helpful in the same way that "technical debt" is: it encodes a framing that the desire for hygiene or completeness should be balanced thoughtfully in terms of user benefit and added complexity.

hbrn · on Sept 9, 2022

I'd say it's subset of perfect solution fallacy [1]. Every edge case is seen as a problem that has to be solved, at any cost.

One of the root causes is that when you're designing something, you really suck at understanding costs of your solution.

To you, the cost either seems close to zero (because you are familiar with your own solution), or cost doesn't matter because you think you're solving problems that are essential to domain.

Somewhat counterintuitive, the more senior folks are, the more likely they are to be affected by it.

[1] https://en.wikipedia.org/wiki/Nirvana_fallacy

sitkack · on Sept 9, 2022

Did you just commit edge case poisoning?

hbrn · on Sept 9, 2022

It would only be poisoned if I had to adjust the perfect solution definition to fit my edge case. Luckily, I did not.

civilized · on Sept 9, 2022

I didn't notice any deliberation on the goals of the project and whether there is any benefit at all to creating a typed recipe model, whether it's a simple one that only covers some cases or a complicated one that tries to cover all of them.

If the goal is to publish the recipe on a website for humans to see, you just need the raw blob of text.

If the goal is to calculate calories, you can drop anything that isn't edible, focus on ingredients with a standard unit of measure, and notify the user if the recipe contains ingredients for which the calorie contribution couldn't be determined.

If the goal is automated production, all of this is likely irrelevant.

Etc...

If your goal is to cram types into things to make them cool, unsurprisingly this goal does not give you a basis to make design decisions.

quickthrower2 · on Sept 9, 2022

This is why I laugh at talk of "bug free" software. The best you can do is zero reported bugs. Temporarily.

girvo · on Sept 9, 2022

If there are bugs in software but no one ever runs it, does it really have bugs?

- George Berkley (apocryphal)

taneq · on Sept 9, 2022

Of course! If nobody ever runs software then that is, of course, a bug. ;)

verisimilitudes · on Sept 9, 2022

To the article:

It's fine to handle an explicit subset of a problem. What's not fine is not making this clear. Real software can generally have a well-defined domain and range, which this article muddies with its example.

> This works but is quitter talk.

It's a valid solution. Expressing weird recipes as lists of normal recipes is perfectly fine.

> If we want to include recipe expansion as a feature of our model, we need to make the algorithm more complicated to handle the single edge case of fondant.

Yes, it turns out writing correct software sometimes requires this, even for a small amount of overall cases. That's no argument against not handling all cases.

To the parent poster:

I expect someone incapable of writing perfect software to claim it to be impossible, yes. I'd upload an article I've written proving my point, but it's about correct software, so it would just get ignored here.

Software is mathematics, and mathematics can be perfect.

> This is why I laugh at talk of "bug free" software. The best you can do is zero reported bugs. Temporarily.

No, this is the best the incompetent can do, but I won't let them drag me down to their level.

tempodox · on Sept 9, 2022

I don't disagree but a philosophy that I find practical says that a bug that exists in software and is never encountered in actual use counts as a non-bug. Of course it would be nicer if the bug-in-software didn't exist, but what really counts are actual use cases.

MereInterest · on Sept 9, 2022

By that reasoning, the software that ran the Therac-25 wasn't buggy when used for earlier models. Those earlier models had hardware interlocks, so the incorrect requests made by software didn't have the same fatal consequences as it did for the Therac-25.

tempodox · on Sept 9, 2022

Indeed, it doesn't account for changing conditions in the software's environment. Doing everything within reason to avoid bugs in the first place is clearly preferable.

quickthrower2 · on Sept 9, 2022

True. Not all encountered bugs are reported and not all reported bugs are even bugs!

ipnon · on Sept 9, 2022

Data scientists have the right approach here. Crop the outliers, use the mode for sparse data, make an embedding of categorical variables. In short, just pretend like the data is normalized because that makes for a more accurate model of reality in practice.

defrost · on Sept 9, 2022

Good data science only "psuedo crops" for scale.

Eg: Sat imagery is scan lines full of instrument return values.

The first normalisation is to use 99.9% of the returned value range to setup a colour lookup table.

.1% of the return values could be :

* lens flare

* instrument error

* actual valid but extreme data values.

Depending on the problem domain, after removing (actual) error and bogus values (where possible) you might actually be using the 99.9% of the data to "train" for normal expected background stuff ... and you're really looking for the edge case that is Gold | Uranium | a Hidden tank, etc.

brundolf · on Sept 9, 2022

I'm not sure that translates to situations where you aren't aggregating data

astrobe_ · on Sept 9, 2022

Translation please! ;-)

Does this means that e.g. you just ignore the recipes with optional ingredients, or "normalize" them by making them mandatory (or deleting them), or you create two recipes (one with and one without the optional part)?

yccs27 · on Sept 9, 2022

Crop the outliers -> ignore weird recipes/inedibles Embed categorical variables -> naively convert cups, pinches, "to taste" etc I guess data scientists would also just ignore the "optional" tag before running their analysis.

vajrabum · on Sept 9, 2022

Cropping the outliers is often a hidden abstraction. That might be good for recognizing things but for making things it's likely to be a problem--take the cupcake example. What's the robot (program) going to do with the cupcake? Probably don't want the cups to go in the mixer.

dhosek · on Sept 9, 2022

Oh man, I just had a flashback to the 90s and trying to create a recipe database and all this stuff made me give up. Which I think was probably a better choice than so many of the programs that will turn recipes into shopping lists made.

RunSet · on Sept 9, 2022

I have long maintained that programming has more in common with cooking than with engineering or architecture but it is not taught that way because (among other reasons) engineers and architects command more respect than cooks.

davedx · on Sept 9, 2022

I prefer to look harder at why you want such strict typing in the first place. Is this for a recipes webapp? Just use strings because chances are it’ll end up going over some JSON protocol anyway. Is it for an actual cooking calculator? Then maybe you do need to focus on things like units of measure more carefully. Is it for a safety critical airplane food cargo calculator? Same again, but you’ll probably have to use C making all your types end up as structs…

My point is, consider the problem space, domain, users when thinking about this stuff. Type systems don’t exist in isolation.

mikotodomo · on Sept 9, 2022

> If I really opened to a random recipe, I’d be unlikely to see any of these complexities.

Lots of recipes have optional ingredients. I don't think this person is very familiar with cooking. All cooking apps handle optional ingredients fine. Nobody would use the one you finna make that doesn't have that.

hwayne · on Sept 9, 2022

> I don't think this person is very familiar with cooking.

This is the worst I've been insulted in my life and I demand you duel me in a cookoff. Skillets at dawn! ;)

mikotodomo · on Sept 9, 2022

Sorry XD

I actually just bake stuff from recipes once in a while. Sounds like you're actually a lot more experienced than me.

hwayne · on Sept 9, 2022

All in good fun ;)

As several people pointed out, I never said what my context for encoding recipes was, so here's a bit more about what inspired this example. I throw a lot of dinner parties, and a lot of my friends have dietary restrictions. I always try to pick a menu where everybody can eat at least one entree and at least one side.¹ I want to be able to query my recipes for "vegan-friendly" or "peanut-free". But also a lot of recipes have substitutions: if a pork ingredient can be replaced by tofu, does that dish count as "vegetarian"? Maybe, maybe not, but it'd be nice to have the option to choose.

(The other menu constraint is the cooking process: I can't make two dishes that both take the slow cooker, but I can do two that both take skillets.)

I occasionally look for recipe apps but I never find any that are good for this use case. For now I still print out recipes and put them in a binder and figure out the menus on a whiteboard. So ultimately it's more an interesting example than a problem I'm trying to solve.

¹ I love them to death but I also a breath a sigh of relief when none of my vegan friends can make it

mikotodomo · on Sept 10, 2022

That's a really cool use case! Imagine if people could just list out what ingredients they wanted to restrict to (and pans, etc like you mentioned). Wait, this could also be taken one step further and account for calories!

dljsjr · on Sept 9, 2022

They're specifically talking about a CIA textbook (Culinary Institute of America) which has a bit more structure and consistency in their recipes as its used in culinary school curricula.

arwhatever · on Sept 9, 2022

Based on experience, I suspect this might be more of an issue using programming languages with more expressive type systems, when to leverage the type system to model edge cases exhaustively.

Weaken the type system, reduce the automated test coverage, and you are likely to find that edge cases are addressed on more of a "who's currently yelling at us about what feature isn't working" basis, which will ultimately mean that fewer edge cases get addressed or are more likely to persist as undetected bugs, which could be either desirable or undesirable, depending on your domain.

ZephyrBlu · on Sept 9, 2022

This is too real. It happens very frequently even with relatively structured data.

thriftwy · on Sept 9, 2022

Imagine trying to compete with Microsoft Office (Oracle DB, etc) and adopting this approach out of neccesity, only to discover that the original thing has support for all the edge cases in the book, and some of these which weren't even in your book.

Yes, the original product may be bloated / not pretty, but once paying users discover you don't support an obscure edge case they depend on, they'll stop showing up.

aebtebeten · on Sept 9, 2022

Oracle is actually a great counterexample. When they started out, they lost the db data frequently, and the IBM folk laughed, wondering how they expected to compete when they weren't even Durable.

It turns out Larry started by selling to people doing what would now be called "Business Intelligence", and for them Durable wasn't a necessity, it was just an edge case (they were always side-loading from production anyway).

Oracle then used the profits from this beachhead to fix their durability issues before they started selling into segments that expected their databases to, you know, keep data.

thriftwy · on Sept 9, 2022

A lot of upstart database vendors do that right now - selling to BI/OLAP because they're not stable enough for prime time.

I didn't know that Oracle itself did exactly that, thanks. But I'm talking about the days of mature Oracle.

teddyh · on Sept 9, 2022

The purchase of MySQL by Oracle makes a lot of sense in this light.

nurettin · on Sept 9, 2022

Recipes are programs, so why are we expressing them as types? The recipe will be valid as long as the individual instructions are.

gwd · on Sept 9, 2022

You're thinking too concretely. The point of this article isn't to talk specifically about modeling recipes; it's to use the complexity of recipes (something concrete most people know about) as an example of how this "edge case poisoning" affects the system. I'm sure he has specific non-recipe examples in mind, but using those would either have to 1) give you a lot of background knowledge about the system, which wasn't his point 2) possibly reveal privileged information about clients and so on.

So here's a concrete example: the config file / structure for a virtual machine. Your basic VM has a # of cpus, an amount of memory, a virtual disk, and a virtual network card. Oh, but this VM is actually a "service VM" that is providing an emulated device for another VM. And this VM is actually a fast, ephemeral clone of another VM: it has copy-on-write memory and isn't allowed to write to the disk. And this VM is a live-snapshoting clone of a remote VM: it doesn't execute, but just receives memory and disk updates from the remote VM, until the heartbeat is lost, and then continues. Oh, and this VM's disk is actually provided over the network by a SAN. Oh, and...

The result being that if, like 95% of people, you just want to make a plain VM, you have to wade through a massive list of who-knows-what options to make it work. Balancing making it simple for those 95%, while functional for the other 5%, is a challenge.

CGamesPlay · on Sept 9, 2022

But to the GP’s point, this VM config is a program, so why are you expressing it as a static config file? The ending virtual machine will be valid as long as the steps involved in its creation are.

notriddle · on Sept 9, 2022

No, it’s not, and writing VM configuration as a Turing-complete program means you can’t perform structured queries over all the VMs you manage without executing arbitrary code, and you can’t make bulk modification at all.

At least, not without restricting yourself to a tractable subset like dependabot does.

zodiac · on Sept 9, 2022

The article gives one example - find out if a recipe has ingredient X. You could also imagine "find all recipes (from a cookbook) that are vegan", or "find all recipes I could follow given the contents of my fridge", etc.

jmcomets · on Sept 9, 2022

This is what you'd do in a classic OOP approach. Allows for different behavior across variants by pulling out the shared interface. (I think this is what the author mentions when they speak of "different level of abstraction"?)

The downside of this approach is that for subtrees of shared behavior you can go the multi-level inheritance route (risky if you're not sure the leaves will hold their parent's contract) accept the extra boilerplate for similar behavior.

It's interesting to me how this happens quite often and polymorphism is still our go-to solution.

nurettin · on Sept 9, 2022

In this case, recipe is data and programs can be generated from data. I case of data being equivalent to a program, why complicate things with inheritance or composition? Just repeat the data. We aren't maintainig the code, we are generating it, using it once and discarding it. If you want your data smaller, you just compress it.

suzzer99 · on Sept 9, 2022

Let's say we find a whole new edge case after this thing has been running for six months. Now we need to update the data structure and the generator code that knows how to interpret the data. So I'm not sure how much is gained.

I think one pitfall that a lot of software designers fall into is assuming they can know the entire problem domain up front. Maybe that works for a super-mature industry like airline reservations or something. But I still tend to doubt it.

In my experience you constantly get stuff that borks your data model after going live. I always assume this will happen continuously throughout the lifecycle of the product, and try design accordingly.

planede · on Sept 9, 2022

Maybe a recipe that contains optionals and alternatives is not a recipe. It is a collection of recipes.