Makes sense! My colleagues work on some research[1] that's intended to be the counterpart to this: identifying which subset of a format parser is actually activated by a corpus of inputs, and automatically generating a subset parser that only accepts those inputs.
I think you mentioned WUFFS before the edit; I find that approach very promising!
Thanks! Yes, I did mention WUFFS before the edit, but then figured I could make it a bit more detailed. WUFFS is great.
The SafeDocs program and approach look incredible. Installing tools like this at border gateways for SMTP servers, or as a front-line defense before vulnerable AV engine parsers (as Pure is intended to be used), could make a massive dent in malware and zero-days.
[1]: https://www.darpa.mil/program/safe-documents