I appreciate your insight, but I just want to expand on one point:
> Having actually worked on charset handling, when most people say "ASCII", they mean "ASCII" and not anything else.
Approximately zero people are referring to a true, packed, 7-bit encoding when they say "ASCII". They're nearly always talking about an 8-bit character set, and in such cases, something must happen when the high bit is 1. (I've never seen one that plain ignores or uses error glyphs for characters >127, although you likely have more experience with this than I do.) This is why I said people are referring to one of these encodings in practice... because ascii is 7-bit, and approximately everyone is talking about some 8-bit encoding of one form or another.
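To make the "something must happen when the high bit is 1" point concrete, here is a small Python sketch. The three codecs are just ones I picked for illustration, not a claim about what anyone in this thread actually uses. The low byte decodes identically everywhere, while the high byte means something different in each code page.

```python
# Two bytes: 0x41 is in the 7-bit range, 0xE9 has the high bit set.
raw = bytes([0x41, 0xE9])

# Decode the same bytes under three 8-bit encodings (chosen arbitrarily).
decoded = {name: raw.decode(name) for name in ("latin-1", "cp437", "koi8_r")}

# The first character agrees across all three; the second does not.
first_chars = {text[0] for text in decoded.values()}
second_chars = {text[1] for text in decoded.values()}

for name, text in decoded.items():
    print(name, repr(text))
```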
I would definitely agree that most wouldn't call KOI8-R "ascii", but they may use the term "ascii" to describe the first 128 characters of KOI8-R. (Even if the encoding swaps in weird replacement characters, like Shift_JIS does with the backslash and the yen sign.) This is the reason for my comment about how the weird "ascii + custom" all just feels like ascii to me... if you stay below 128 it literally is.
I'll modify my original statement thusly:
> This actually rules out nearly any character set that isn't compatible with ASCII.
And add an addendum that if you don't use UTF-8, you can't use unicode and will be stuck in code page/locale hell.
> I've never seen one that plain ignores or uses error glyphs for characters >127
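Reporting an error is the default behavior if you try to decode such a string with the ASCII codec in Python (you get a UnicodeDecodeError). In .NET, Encoding.ASCII substitutes "?" for each undecodable byte by default and only throws if you configure an exception fallback, so both the error and the error-glyph behaviors exist in the wild.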
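A minimal Python sketch of that behavior. The sample bytes and the `try_ascii` helper are mine, purely for illustration. The strict default raises, and the "error glyph" and "plain ignore" behaviors are opt-in via the `errors` argument.

```python
# Latin-1 bytes for "café"; the last byte (0xE9) is outside 7-bit ASCII.
data = b"caf\xe9"

def try_ascii(raw: bytes) -> str:
    """Decode as strict ASCII, describing the failure instead of raising."""
    try:
        return raw.decode("ascii")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return f"error at byte {exc.start}: 0x{raw[exc.start]:02x}"

strict_result = try_ascii(data)

# Opt-in alternatives: substitute U+FFFD, or silently drop the byte.
replaced = data.decode("ascii", errors="replace")
ignored = data.decode("ascii", errors="ignore")

print(strict_result)
print(repr(replaced), repr(ignored))
```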
The first 128 characters of KOI8-R are, of course, ASCII (the "weird replacement characters" are, in fact, explicitly allowed!). But a file encoded in KOI8-R is only ASCII if it contains only those first 128 chars.
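A quick way to check both halves of that claim in Python, as a sketch. `is_ascii_only` is a hypothetical helper name, and the sample strings are arbitrary. It relies on Python's `koi8_r` codec, which maps bytes 0-127 the same way the `ascii` codec does.

```python
# Bytes 0..127 decode identically under KOI8-R and ASCII in Python's codecs.
low = bytes(range(128))
same_low_half = low.decode("koi8_r") == low.decode("ascii")

def is_ascii_only(raw: bytes) -> bool:
    """True if every byte has the high bit clear, i.e. the data is valid ASCII."""
    return all(b < 0x80 for b in raw)

# KOI8-R text that stays below 128 is byte-for-byte ASCII...
english = "hello".encode("koi8_r")
# ...but Cyrillic letters live in the upper half, so this is not.
russian = "привет".encode("koi8_r")

print(same_low_half, is_ascii_only(english), is_ascii_only(russian))
```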
> if you don't use UTF-8, you can't use unicode and will be stuck in code page/locale hell.
UTF-7 was a thing. It just turned out that nobody really needed it.
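For the curious, Python still ships a `utf-7` codec, so the point is easy to try. This is only a sketch with an arbitrary sample string: it shows full Unicode text round-tripping through bytes that never set the high bit.

```python
# Mixed Latin, accented, and Cyrillic text (arbitrary sample).
text = "héllo, привет"

# UTF-7 encodes non-ASCII characters as base64 runs introduced by "+".
encoded = text.encode("utf-7")

# Every byte stays in the 7-bit range, yet the text round-trips losslessly.
seven_bit_clean = all(b < 0x80 for b in encoded)
round_tripped = encoded.decode("utf-7")

print(encoded)
print(seven_bit_clean, round_tripped == text)
```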