Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
JSON5 – JSON for Humans (json5.org)
364 points by rickcarlino on Dec 8, 2024 | hide | past | favorite | 344 comments


I think it allows for too much. I was glad that JSON only supports double-quoted strings. It is a feature that removes discussions about which quotes to use. Or even whether to use quotes at all (we still need them for keys with colons or minus in it, so what gives?).

The only thing that JSON is really missing are comments and trailing commas. I use JSONC for that. It's what VSC uses for the config format and it works.


> The only thing that JSON is really missing are comments and trailing commas.

The reason JSON doesn't have comments [1]:

    I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.

    Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
[1] http://archive.today/8FWsA


The reason never made much sense, anything can be abused, and the comment use case is easily way more important, and your suggestion doesn't help with eg syntax highlighting tools for your config that will treat comments as syntax errors, and also lose the ability to roundtrip, so your app can't edit user configs


> The reason never made much sense, anything can be abused (...)

I don't think your take makes sense. Comments were purposely left out of JSON to mitigate the problem described by Hyrum's law.

Smaller interface means a smaller surface to allow abuse. You're just arguing to facilitate abuse because you feel the abuse that was indeed avoided by leaving comments out is unavoidable, which is a self-contradiction.

On top of that,think about the problem for a second. Why do you feel it's reasonable to support comments in data interchange formats? This use case literally means wasting bandwidth with data that clients should ignore. The only scenario where clients are not ignoring comments if exactly the scenario that it's being avoided: abusing it for control purposes.

If you want a data interchange format to support that, you should really adopt something that is better suited for your whole use case.

> and your suggestion doesn't help with eg syntax highlighting tools for your config that will treat comments as syntax errors (...)

That's the whole point. This is by design, not a fault. Why are you pretending a language supports a feature it never supported and explicitly was designed to not support?


> Smaller interface means a smaller surface to allow abuse.

If you take your law seriously, this is irrelevant because the surface of abuse is on the same scale of practical infinity, so it doesn't matter that one infinity is technically smaller.

For example, based of the example in the quote: you could stick those directives from comments info #hashtags in stringy values, with the same effect that there is no interoperable way to parse values (or if you add "_key_comment" - there is no interoperable way to even parse keys as some of them need to be ignored)

So the designer has achieved no benefit by removing a valuable feature

> abuse that was indeed avoided

Nothing was avoided, you can have the exact same abuse tucked into other elements

> Why do you feel it's reasonable to support comments in data interchange formats?

Why does the author of the quote sees the obvious which you don't see even after reading the quote? Go convince him his comment makes no sense because of "data interchange"

Obviously it's not only used for data interchange in cases where every byte matters (reminder: this is a TEXT-based format) and also comments matter to humans working with this data

> adopt something that is better suited for your whole use case.

And this discussion is literally about a format supports that? But also, how does this in any way mitigate the flaws in the designer's arguments?

> Why are you pretending a language supports a feature it never supported and explicitly was designed to not support?

Same thing, why is the author of the quote makes this senseless suggestions then? Go convince him first


> If you take your law seriously, (...)

Hyrum's law is not mine. It's a statement of fact that goes by that name already for a few decades and is noted by anyone who ever worked on production services that exposes an interface and is consumed by third parties.

> (...) this is irrelevant because the surface of abuse is on the same scale of practical infinity, so it doesn't matter that one infinity is technically smaller.

It really isn't. JSON does not support comments, thus they aren't abused in ways that sabotage interoperability. The only option you have is to support everything at the document schema level. You can't go around it.

> For example, based of the example in the quote: you could stick those directives from comments info #hashtags in stringy values, (...)

Irrelevant. You're specifying your own schema, not relying in out-of-band information.

That's exactly how a data interchange format is designed to be used.

There is no way around it. If you understand the problem domain, the fact that leaving out comments is an elegant solution to a whole class of problems is something that's immediately obvious to you.


The arguments presented here would also lead someone to not use JSON for data interchange.

If specifying the schema is important then why not an interchange format with strict schema enforcement like XML?

If minimal feature sets to avoid abuse are so important, then why not a binary format like protobufs which also have strict schema dictates?

And if interoperability is important then why not ditch REST/JSON all together and instead favor SOAP which strictly defines how interoperability works?

That's why I don't buy the "comments might be abused" argument. JSON doesn't have a single problem domain and it's not the only solution in town.


I think the opinionated approach is what made json “win” the format battle more than anything else. If you’ve been around enterprise in the early 00’s you’ll know the horror that XML became when people weren’t using it in any sane manner. Which a lot of people simply weren’t, for whatever reason. Now this is anecdotal but over the decades I’ve never had issues parsing json, and I largely attribute its strictness to that. Yes I suppose you could abuse it, but I wouldn’t have to parse your nonsense, which couldn’t be said for XML.

I don’t have anything against XML by the way, it was purely horrible to work with because people were using it in so many weird ways. Personally I prefer toml, but I guess we all have our preferences.


What made JSON win was the ease of use from JS (= frontend).


100%

JSON wasn't some magical, made-on-the-fly format that makes Crockford some kind of genius. It was simply the standard Javascript object literal notation with some added constraints. I think some of those constraints make sense (i.e. are there any other languages that support both single and double quotes for string literals?), but funnily enough, some of the biggest issues with JSON interoperability is it is very underspecified in the areas that matter, such as the type and width of numeric literals, what to do in some edge cases like duplicate keys, etc. Just did a quick search, and here is a post that outlines some of the real security risks this underspecification leads to: https://bishopfox.com/blog/json-interoperability-vulnerabili...


> are there any other languages that support both single and double quotes for string literals?

Yes, quite a few like Python, PHP, Fortran, COBOL, Lua, R, Ruby, Perl, Bourne Shell, Dart, Groovy, etc.

Though some only interpolate with one or the other.


Back in the early 00’s json wasn’t winning because it was derived from JavaScript, it was winning because it was the “easiest to use” standard on web services.

Sure it’s derived from JavaScript and it plays a major part in frontend development today. It was really the other way around though, when Ajax picked up people started realising that they could use json to make frontends work with “just” html and JavaScript.


It is more that people made comments semantically important, e.g. a tool which did not interpret comments would not correctly interpret the data in the document.

This actually would put JSON in a worse place than XML - while XML has an overly complex infoset, that infoset is actually defined and standardized. Representing "a property with a comment on the property name and one before and after the property value" so that information is not lost in parsing would explode the complexity needed for an "interoperable" JSON library.

if someone wants to create some sort of scheme where they do "createdAt$date" as a property name to indicate the value is to be interpreted in some agreed-upon date format, that at least doesn't lose data if the JSON data doesn't understand that scheme, or require a new custom parser in order to properly interpret that data, compared to something like /* $type: date */ "createdAt" :...


why would you need that complex representation in the interoperable library instead of a much simpler one: a property, a comment, a comment, a value, a comment, ...?

This doesn't explode anything and you don't need to lose any data, so the monstrosity of XML still has no benefit, and neither does "createdAt$date", which would need a custom library anyway, so it doesn't matter where you insert your types


XML may have no benefit to you but it absolutely has benefit.

A bit off topic when the only consideration here is comments, but XML allows for type definitions and structures data that simply isn't possible in JSON.

People have found ways to attempt to add types to JSON but they aren't part of the spec at all and are just convention-based solutions.


I don't understand the problem. But I do understand the desire to keep the json definition simple and concise.

If you need additional fields in the json to hold comments, why not add the fields however you want? And if you need meta-data for the parser, you could add it the same way. In a project, i am working on right now, we simply use a attribute called "comment", for comments.

e.g. use "_" as a prefix to mark comments, and then tell you applications to ignore these attributes.

    {
    "mystring": "string123",
    "_mystring": "i am a comment for mystring",
    "mynum": 123,
    "_mynum": "i am a comment for mynum",
    "comment": "i am a comment for the entire object"
    }


Adding comments doesn't contradict "simple and concise" as it doesn't add much, but on the contrary allows avoiding verbose solutions (repeating names) (but also now you need to do string escaping inside a comment???) such as the one you suggest, which will also not have custom syntax highlighting since it's not part of the spec and have a bunch of other issues (like, now you don't know how many keys you have without parsing every key name to exclude comments)

> If you need additional fields in the json to hold comments

I don't need additonal fields, I need comments

> why not add the fields however you want

Because I can't do that either, there are noticeable limitations


The challenge there is that the solution is convention rather than spec. Comments would only be understood by parsers that follow, or at least support, that convention and all other parsers would treat the data differently.

That may not be a huge problem for you, you see the "comment" key and know its just a comment and can ignore when a parser treats it like a normal string field. It could be an issue though, for example I could see any code that runs logic against all keys in the object becoming harder to maintain.


That solution is clearly weirder and more complex to understand than the comments it replaces. You impose semantic burden while demonstrating that the "abuse" of comments cannot really be prevented.


"comment" may be relevant to the object. Maybe using "_" for the whole object comment would be safer?


It would also be consistent (everything beginning with a "_" is a comment)


This.

You can make special key names that are really directions for something.

You can make enrire k v pairs that are never used by anything that actually parses the json normally.

Argument was invalid as far as I can see and calling it "sorry it makes you sad" is, wow I don't even know where to begin with that.

Having annotation happen in a dedicated place designed for it is better than having it happen where it was not designed to be, end of math problem.


People keep bringing it up that "anything can be abused" but the point is that if you want to abuse something, abuse the simple parseable data rather than comments in the syntax tree.

Your two examples are just two examples of why we don't need comments for data interchange: yes, you put the data in a trivial, stable position in the parsed data that all parsers can parse rather than write some sort of DSL that has to be extracted from comment nodes that may appear in various places in the tree.

Turning this:

    { "key": "value" /* directive */ }
Into this:

    { "key": { "value": "value", "info": "directive" } }
Is the whole point. The more "abusive" you imagine the contents of "directive" to be, the more reason it should exist as the latter data, not more reason we should accept the former.


The whole point is you can't just add your own new keys/values/fields.

Most json users are not writing all of the software that both generates and consumes the json.


You are saying as if there was an apology or it should be needed for some reason.

For any spec there are people who want something spec doesn't do and people writing the spec need to say no to requests that they consider not in scope as much or more often than they say yes


There was literally an apology, a douchebag backhanded false one, but still acknowledged explicitly that there was a thing he knew everyone would want, and knew it would "make some people sad".

Comments are not some weird thing one person wants for their weird reason that no one else needs to care about. It's like leaving out a letter from the alphabet.


One wants comments, another wants single quotes, third wants ...

Just acknowledging someone wants something out of scope is not an apology


It's not out of scope, because this is not a binary data format.

Saying "I intended to make a defctive thing doesn't" doesn't change the fact that it's defective.

A car without windshield wipers is defective, or at best inexcusably limited. Saying "I only designed it to use in the sun" doesn't make it suddenly perfectly useful.

If you want to try to say that json was actually intended to only be used in special conditions like an exotic car with no roof, then it's fair for everyone else to say "Ok, well that intentional design is a crap design of limited use. People actually need a car WITH windshield wipers."

The simple fact that it's a text format invalidates all the attemped arguments that it's just a machine to machine data format never intended for humans to mess with. People aren't "holding it wrong".


It doesn't mean if it's text it must have comments. Simpler format means more people use it. Also parser reliability and speed.

CSV has no comments and is probably more popular than json.


And everyone complains that csv sucks and misses necessary features.


But everyone uses it. If the parser was more difficult it would not be literally everywhere and a go-to.


Then why did he invent json and everyone else invent yaml and toml and xml etc if csv already existed and was so complete?

If small parser was the most important, that's what binary formats are.

Or if the data needs to pass through a text handler that can't handle binary, we already had csv and a few different standard and cheap encodings.

csv's excuse for lacking some of the bare minimum features is that csv was first. Csv is from the time of typewriters and handwritten data. What would eventually become known, the most common needs, just hadn't been encountered yet.

A single person did not create csv after 40 years of seeing what is needed, and decide that csv shall not have an expected functionality in it's spec. There essentially isn't even any spec for csv. No one explicitly authored it. It's a typewriter format.

csv continues to get used after that point because of inertia. Once it's used many places, it continues to get used in new places because most new things have to interoperate with the existing ecosystem if you want to sell to the most possible customers.

Also csv, unlike json & similar, IS a pure data format, not used for things like configuration, because it's really only barely human usable, because the columns don't line up and there's no form of annotation other than a header record. Not to mention every part of the format, quotes, commas, even newlines, are valid data that may appear in a field, and so the only way to read the file is to manually reproduce the streaming state-machine in your head.

If something as limited as csv is so good and doesn't need anything else, then why did he invent json when csv already existed for jobs that narrow in scope?

csv is not remotely proof that text formats don't need annotation.


>and also lose the ability to roundtrip, so your app can't edit user configs

I think roundtrip with comments is not feasible in general. Most code just expects a hashmap, which it edits and then serializes back. You would need some really clever handling for comments to do it.


We're in luck: clever people exist and have written libraries that do the clever handling for us and support roundtripping comments!


Which works at performance cost and only until it breaks...


Lack of roundtrip means it's already broken and loses data, so upgrading to just the potential of a break is a marked improvement worth paying for


Can't lose data which you don't have and in this case there are no comments in json!


If you close your eyes, the data doesn't disappear! Of course you have the data, it's in the comments, and the original comment explains how it got there


If someone classifies comments as data then I'd say someone needs to upgrade data architecture chops

Yes the comment free spec forces to normalize what would be comments into data spec and it is frustrating to put more effort into things


This is about interoperability and integrations. Which relies on some sense of predictability.

Syntax errors and erroneous highlighting are not even item 10000 on my list of JSON concerns.

Dare I pull a tired cliche and say “you’re using it wrong”


That doesn't make it a good reason. People are placing those directives into json docs anyway, but instead they're relying on nobody causing a namespace collision with their special key whos associated value has the directives of interest.


FWIW, I'm well aware of Crockford's rationale, I think it's some of the dumbest rationalization I've heard, and time has shown that it was a giant mistake.

Trying to prevent programmers from doing something "because they may 'misuse' comments" is asinine to the extreme. This is not like removing a footgun, it's trying to dictate how you think programming should be done. This is particularly hilarious because I've instead seen tons of horribly hacky workarounds for JSON's lack of comments, e.g. "fake" object entries like "__previousKey_comment": "this is a comment for previous key", or even worse, putting two entries with the same key since the second key should "win", and thus making the first entry the comment.

As for his second point, "Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser." - just look at the everlasting shit show that is package.json since it can't support comments, because lots of NPM tools actually write to package.json, so there is no guarantee that other tools will be able to read package.json if it includes comments.

I think the thing that I hate most about Crockford's rationalization of not having comments is that you have to have your head stuck 10 feet in the sand to pretend that somehow the lack of comments is a good thing with the benefit of hindsight. I guess I could understand if Crockford's position was "Yeah, originally I thought it was better to keep them out because I was concerned about misuse, but now in retrospect to see the much larger problems it causes, and I realize it was a mistake." But instead he just keeps pushing his original crappy opinion on this topic.


I one time by sort of accident coined a config format by parsing todo.txt It was so useful it stuck around for much longer than I had intended to. Rather than comment out stuff it just looks for "=" and ignores everything else.

   conf = {};
   configFile.split('\n').forEach( function(x){
      y = x.split(' ');
      if(y[1]=="=") conf[y[0]] = y[2];
   });
Everything is a comment with exception of lines like:

speed = 100 km/h

weight = 60 kg

maxBuffer = 200 chars (between 100 and 300 works best)

output: {"speed":100,"weight":60,"maxBuffer":200}

It had walls of text with headings and something configurable sprinkled in. Crockford would be screaming. lol


Reminds me of Whitespace [1] where anything but a whitespace is a comment.

[1] https://en.wikipedia.org/wiki/Whitespace_(programming_langua...


I usually just add my comment as an additional field called _comment or something like that


I have encountered many systems that explicitly disallow “unexpected” JSON elements and such an approach would fail. This is particularly common when using JSON around API-like use cases.


Exactly, and that's a crappier workaround. E.g. in lots of places you can't just have willy-nilly, anything-goes key names.

An example I think every Node developer can commiserate with is that there isn't really a great way to add comments in the dependencies or devDependencies section in package.json, because those keys need to refer to module names, and that's where I most often want to use comments in package.json. I won't rehash all the details but just take a look at https://stackoverflow.com/questions/14221579/how-do-i-add-co... . Unfortunately none of the answers there are very good. In the past I've resorted to doing what https://stackoverflow.com/a/45815391/1075909 does, but that ends up not working very well once you've got a long list of dependencies, because the comment for any particular dependency ends up being far from the entry that specifies the dependency (instead of the line above), so people end up not reading it or forget to update it if that becomes necessary.

Yeah, I really, really hate that JSON doesn't have comments :)


So how do you add a comment in the middle of an array?


How do you add a comment in the middle of a keyword in JavaScript?


I often want to comment individual array values to give more context or describe provenance. I don't think I've ever wanted to add a comment in the middle of a keyword.


These two are not remotely the same thing. You can have an array declared on multiple lines and add a comment to specific values. Or even inline with the value /* */.


I've also seen parser directives inserted into the keys like `{ "fooBar__datetime": 123456780.0 }`, and many other creative workarounds


Actually there is a common practice that java or js library used to serialize typed data.

They just preserve the key starts with $ for special use

So a class A with a double and and an int field will be something like

  {
    "$type": "A"
    "value":{
      "a": {
        "$type": "double",
        "value": 1
      },
      "b": {
        "$type": "int",
        "value": 2
      }
    }
  }
And what about keys that actually starts with $?

You just escape it with special $raw key

  "$raw": {
    "$escaped": {
      "$type": "double",
      "value" 1
    }
  }
It's a bit way too verbose. But given it is just a serialization format inside some library or app, it won't cause too much problems.


@fat is that you?


Did you just call me fat ;)

I don't know who @fat is, but if he thinks Crockford's rationalization for the lack of comments in JSON is total and complete BS, I like the way he thinks.


https://news.ycombinator.com/item?id=3842713

Not sure why all the downvotes? Thought this was a classic.


When I think of how often and where JSON is used, as a data interchange format for APIs, in the browser, databases, and everywhere else - all the billions of bytes transferred in JSON every second - in all those use cases comments would be pointless and counterproductive and just take up storage space or bandwidth. That's JSON's primary use case in the world. It's only in the very few use cases specifically for programmers where comments would sort of be helpful in JSON, and most of those cases are not really that important to support when there are workarounds - structure your JSON well and you can include the comments as part of the JSON, and then you can even use the comments programmatically should that be useful (which I think is slightly more useful than storing knowledge in a JSON file as a comment).

>Trying to prevent programmers from doing something "because they may 'misuse' comments" is asinine to the extreme.

Programmers are often their own worst enemies. Some prefer rigid rulesets over less rigid freeform programming. See Typescript vs Javascript. No comments in JSON is just another example of over-rigidification, and some programmers love that.

>package.json since it can't support comments,

If you're needing to write comments in package.json, maybe you're not approaching package.json correctly? It's really for node things, not your things that you need to write comments about. I'm not even sure why someone would want to write comments in package.json. I get it with comments in other JSON files, but package.json probably should be left for nodejs things.


It's hard for me to find a comment where I disagree with literally every sentence, but congrats!

1. So what if JSON is primarily used for data interchange? It's not like allowing comments would somehow make them magically show up on serialized objects. This objection makes 0 sense to me. And heck, tons of other serialization formats (e.g. XML) support comments. Besides, there is a big reason that human-readable serialization formats are so popular - because they're human readable! If you're really worried about size you should be using a binary format anyway.

2. "Rigid rulesets" has nothing to do with Crockford's arguments. It's one thing to prefer a particular type system, or limit functionality if you think it has high potential for misuse. By JSON not having comments all you end up with is worse workarounds, like putting comments in object keys.

3. "I'm not even sure why someone would want to write comments in package.json" To be blunt, then, I can't believe you've ever written any code in a business (i.e. with multiple developers) in the Node/NPM ecosystem. Is it really that hard to wonder why someone would want to comment why they added a particular dependency to their project? The lack of comments in package.json is one of the biggest complaints of this format, and it's certainly not just me, e.g. https://github.com/npm/npm/issues/4482 and https://stackoverflow.com/questions/14221579/how-do-i-add-co...


3. So when that dependency gets removed by npm uninstall how should that comment be handled? You know that in business we just would end up with a bunch of dead comments in the package.json - is that really a better alternative than just using Ctrl+f to find where the dependency is used?


> comment why they added a particular dependency to their project

Surely that's what the commit message is for? I mean, I get that it's more convenient to have the comment right there in the file, but that should be balanced against the downsides: having to maintain the comment, making the file larger and more awkward to read, etc.


Your rationale could apply to all comments in general. Hey, just have folks scour through the commit messages instead!

Yes, I still think a commit message is important, but it absolutely does not take the place of a comment. Suppose you'd like to do something like this:

    // DO NOT change to latest version. MUST pin to version 1.2.3 because
    // 1.2.4 includes a subtle bug that doesn't work with our version of
    // the zed library. We have an open issue on zed, see
    // https://github.com/zed/zed/issues/1234.
    // Once that is fixed, we can upgrade.
    "foobar": "1.2.3"
There is zero chance that comment is going to be seen if its just in a commit message, furthermore you should never depend on git blame showing a particular comment because it's easy for later commits to overwrite it due to simple things like formatting changes, etc. Yes, in that example case there should be tests too, but IMO the primary purpose for comments is to highlight issues and concerns that aren't immediately apparent just by reading the code.

I simply cannot think of another file format that is used for configs that doesn't support comments.


>There is zero chance that comment is going to be seen if its just in a commit message,

I never suggested using a commit message, there are plenty of other ways to document these things and I'll leave that up to the user to figure out.

So now you've written this tome of a comment in package.json. What happens the next time someone installs a package using npm install? package.json gets rewritten, that's what. Your comment will be gone. I suppose you expect npm to somehow use AI to guess at how to rewrite package.json so it can put your comments in the correct places??

And expecting someone to read package.json before updating or installing a library is just as useful as putting it in a commit message. If you really need to be careful about dependency versioning, you better have more safeguards in place than just a comment in package.json.

>I simply cannot think of another file format that is used for configs that doesn't support comments.

I mostly see configs created as .js or .ts files, where comments are allowed. Not package.json. Never package.json, because package.json is guaranteed to be rewritten regularly. But maybe you missed that part of your bootcamp class? Yes, .js files can be used as "config". It's been done in plenty of projects. It isn't the end of the world. JSON also isn't solely used for configs or data that needs comments, in fact the majority of use cases in the world for JSON won't need any comments at all so writing comments in package.json is kind of an edge-case.

>I simply cannot think of another file format that is used for configs that doesn't support comments.

JSON isn't only a config file format. It's primary use is data transfer.


> I never suggested using a commit message, there are plenty of other ways to document these things and I'll leave that up to the user to figure out.

Dude, I think you're lost, in more ways than one. I was directly responding to a comment that stated "Surely that's what the commit message is for?"

For the rest of your comment, at this point I'd rather have an argument with a dining room table. No shit you can't have comments in package.json now, that's the entire reason that issue https://github.com/npm/npm/issues/4482 is unfixable. If JSON supported comments from the beginning, then tooling would have to respect that, just like the bajillion other config file formats that support tooling that updates the config file programmatically.


>The lack of comments in package.json is one of the biggest complaints of this format, and it's certainly not just me

Well then you and plenty of other people have some wrong ideas about package.json. That isn't surprising.

package.json gets rewritten for all kinds of things, which is not really compatible with adding comments wherever you want. Adding "why this dependency is here" comments may seem like a good idea to add to package.json, but you're kind of missing the point. If you need that level of documentation, trying to shoehorn it into package.json is just the wrong place for it. Soon enough your package.json looks like a graffiti wall.

>To be blunt, then, I can't believe you've ever written any code in a business (i.e. with multiple developers) in the Node/NPM ecosystem.

Then you'll be astonished that I have been working with nodejs for about 14 years professionally. Sure I have wanted to put comments into package.json, but I was naive and now I'm fine not doing that. I haven't wanted to in many years. I document things in other ways and it has served us all very well. YMMV.


> if you need that level of documentation, trying to shoehorn it into package.json is just the wrong place for it. Soon enough your package.json looks like a graffiti wall.

So the right place is to make a graffiti out of another place, instead of in the place where people actually declare the dependencies?

I find it bizarre when people believe in one true way of doing things. I mean, you can declare your dependencies how you like, but if others do it differently, then they're clueless?


You're clueless if you think adding comments to package.json - a file that regularly gets rewritten - is anything but an exercise in futility. Any time you run "npm install [whatever]" you are rewriting package.json. How exactly do you expect to maintain your random comments in this case? You expect nodejs to understand how comments are being written in package.json and not mess that up? You don't seem to understand how npm or package.json works.


This is the silliest of circular logic. Of course you can't add comments to package.json, so tools can do whatever they want to the file. The fact that tools can rewrite parts of it doesn't mean they should just be able to do whatever they want. Literally every other single other config file format supports comments, and I have never seen a problem with tooling on those files due to the presence of comments.



Nice try, but no.


I just took a look at an example package.json file and it seemed fine — no 'comments' shoehorned in anywhere, meaningful key names that reduced the need for comments. Do you have an example of a package.json file that would be better if json supported comments?


I don't have an example file on hand, but they would be useful for documenting non-explicit dependencies between packages. e.g. Kendo UI and the corresponding theme packages[1] - neither NPM package depends on the other, but if there's a comment then when seeing a change to one, the developer or a reviewer should know to double-check the other.

[1] https://docs.telerik.com/kendo-ui/styles-and-layout/sass-ver...


Well not offhand, comments aren't supported so workarounds are used instead. But I would find it very convenient if I could leave comments on dependencies and scripts. Or even various engine requirements.

Sure you can write everything in another file or put the comments in the commit message. But out-of-band comments are more easily missed or lost. If the package.json got rewritten by `npm install` you'd lose the comments. Inconvenient, but that's trivial to fix at commit time.


So instead of comments with parsing directives, people use underscore prefixed keys to keep metadata and comments, that's not a win at all


The thing is that JSON was intended to be a data exchange format, not a configuration file format or anything else. IMHO Crockford's reasoning makes a lot more sense with that in mind.


JSON is a human-readable data exchange format. There is a good reason human-readable data exchange formats are so popular - they can be read and understood by humans! So it seems pretty absurd that a format designed to be read by humans doesn't support comments. If you're really concerned about performance or size, you should use a more efficient binary format for data exchange like protobufs, or heck, BSON (binary json)!


XML was intended as a data exchange format and has comments. Image file formats like JPEG and PNG support comments (and so does SVG by virtue of being XML). Database systems support comments on database objects. It’s really not a convincing argument.


Even if we stick to the data exchange format, it would be practical to have comments in examples of data. This would be good for training and learning, for documentation.


Obviously it's useful in many cases; designing anything like this is an exercise in trade-offs.


This is still possible. There's nothing stopping you from putting comments in the JSON in your file or a webpage. It's just not going to be accepted as valid by JSON parser, which for learning material should be fine.


Data interchange formats are often used for configuration. It makes sense to have a single source of truth in json if your configuration is consumed by an app.


> So instead of comments with parsing directives, people use underscore prefixed keys to keep metadata and comments, that's not a win at all

Your comment doesn't make any sense. You're just pointing out that developers designed data interchange formats as subsets of JSON which explicitly support metadata and comments. This means they are specifying their schemas to support these usecases in a way clients can and do support it. That, if anything, proves right the decision to not support comments because it resulted in better-defined and well-behaving clients.


Comments could be skipped by parsers, instead parsers need to parse and store those keys/values that are not relevant. This is not called efficient.


We should call this "The Crockford Fallacy". Destroying something valuable in fear of a perceived threat.


We should preemptively destroy the concept of the Crockford Fallacy because people might seek to emulate it, right?


You can destroy the concept, but people will reconstruct it from the ironic remains.


One way to bifurcate the world is that there are people who tend to restrict options to prevent misuse vs. people who tend to enable options to the positive uses easier. I fall in the latter camp. Preventing errors by restricting options also prevents positive uses that make life better.

We get to choose what approach we take. I prefer the "give them the rope they might need" philosophy. If they hang themselves, oh well.


I agree, engineering is all a series of tradeoffs, and tradeoffs can go one way or the other.

That said, I have yet to see anyone ever say "Oh man, this programming/configuration/data transfer language would be so great, but I wish they hadn't supported comments!" I see the opposite about JSON all the time.

And I don't see this about other features, e.g. Go specifically decided not to support the ternary operator, and while some folks may like that feature, I think more folks really appreciate that with Go there is generally a "standard" way to do things and it doesn't give you a million different ways to essentially do the same thing.


Software developers very much so care about best practices.

All that needed to be said was, "Using comment programmatically is bad practice. Don't do it."


> I removed comments from JSON because I saw people were using them to hold parsing directives

there's always someone putting actual logic in comments, and when I rule the world, those people are all going to be put on an island by themselves and they're going to have to deal with each other.


It might be abused and you can just pipe it through a preprocessor are not very good reasons.


Abuse of comments is not an academic, theoretical consideration.

https://www2.jwz.org/doc/cddb.html

(Unfortunately, because jwz blocks HN referrals, you can't click on this link, but will need to copy it into the address bar)


And that is fine reasoning, the workaround is not for me… but the larger issue is an inventor kneecapping something by stating how you should use it.

It’s not like there isn’t another side to this argument.


> the larger issue is an inventor kneecapping something by stating how you should use it

This isn't kneecapping something any more than an inventor requiring programmers in a new language use types, or not use types, whichever the inventor deems preferable.

He invented a thing. He declared how the thing is constructed. That's not kneecapping. That's just defining a thing.


It is kneecapped.

He invented a thing and left out a well known and understood core function required in the problem space, deliberately, not through oversight.

That's what makes it kneecapping.

The whole useful thing, ie the idea of a data format, including both knees (ie all the basic features any such thing needs) was there. The concept and necessity of annotation was a well known thing by then, and indeed he knew of it himself too, said so himself, and actively removed one functioning knee from that whole, to produce only the kneecapped thing.

He defined a kneecapped thing. Or he kneecapped the design. Whichever way you want to say it.

The difference would be if it was 40 years earlier and you are taking the fist stab at designing any kind of data format and comments just never occurred to you yet.

This is more like making a new programming language and deciding that it shall not have one of the basic math operators. If you need to multiply, tough shit, do it some other way. Just pipe it through JSMulti.


But he designed it with intention, as stated. It wasn’t like “ha ha suckers, no comments for you!” He just didn’t think it was a good idea and had a reason for it.

Are inventors required to only innovate a little bit? He released it, and there was every opportunity for someone else to offer something better and achieve the same level of uptake.


That's a terrible reasoning for requiring an extra toolchain.


JWCC literally stands for JSON With Commas and Comments.

JWCC is also what Tailscale call HuJSON, as in "JSON for Humans", which as amusingly also what json5 claims to be.

https://github.com/tailscale/hujson


There's also HJSON, which stands for Human JSON.

https://hjson.github.io/

It has implementations in JavaScript, C#, C++, Go, Java, Lua, PHP, Python, Rust.


Who has the xkcd comic about standards handy?


Exactly! Trailing commas (for cleaner commits) and comments are the only pain points I ever felt.

On the other hand:

> leadingDecimalPoint: .8675309

This is just lazy. Can we discuss in depth how much time you saved by skipping the “0” in favor of lesser readability?

> andTrailing: 8675309.,

This doesn’t mean anything to me.


If we consider sigfigs, then isn't 100 and 100. two different numbers given one has a single significant digit and the other has 3? For 101 and 101. it doesn't matter because both have 3 significant digits. Then again, one may argue that it is better to write 1e2 and 1.00e2 instead of 100 and 100.. It also avoids the weirdness of the double period if the number with a dot ends a sentence.

On a personal level, I also don't like ending a number with a . because my brain immediately parses it as a sentence ender and it interrupts my reading flow.


It has nothing to do with laziness in typing.

It's just not wanting to keep track of more rules. If you've only ever used languages where a leading decimal point is allowed, it's a pain point to suddenly have to remember that it isn't here, and for no obviously intuitive reason.

It's about wanting to avoid unnecessary conceptual friction. Not lazy keyboard usage.

(Heck, your second example uses an extra keystroke. And it's perfectly readable to me, based on the languages I use.)


I think if we start applying that rationale we could skip a lot of steps and just move everything over to YAML.


No because YAML is full of ambiguity.

JSON5 is not.

There's something to be said for being flexible in your inputs when they are non-ambiguous. Particularly when dealing with files written by hand.

There's no virtue in imposing overly strict syntax when it serves no human purpose. That's trying to alter people to fit machines, rather than altering machines to fit people.


Yeah, and we could easily say that any property value without a comma, closing bracket, or closing brace; that doesn't follow the format for numeric literals; and isn't the literal value 'true' or 'false' doesn't need to be quoted when part of a value and will implicitly be a string. This adds no ambiguity and some people are used to writing their string literals without any extra quotes, why add the extra cognitive load of having to remember to use quotes? Wouldn't it be nice to write `user: { type: admin, name: Bob }`?

This seems like exactly how we ended up with YAML-and-the-kitchen-sink.

For a spec like this, things should need a better justification than that. Everything starts at -100 points. How does this feature justify the additional complexity in the spec, to the users, and to the people trying to implement this? How does it justify increasing the opportunity for creating subtly incompatible parsers?

Or, if a goal of JSON is not to be simple and interoperable, then I fail to see how it's not just YAML but 5 or 10 years behind the curve, so we may as well skip a bunch of hassle and move to YAML and start fixing the bugs we've got there rather than creating new ones here.


I don't know what to tell you, you seem to be taking a black-or-white approach here of extremes. Like anything that isn't limited to exactly one way of expressing it is a slippery slope to YAML. It's not.

JSON5 is literally exactly JavaScript notation, which tons of people are familiar with. On the other hand, I don't know any programming languages that use strings without quotes or delimiters. (And of course, their existing in YAML is widely recognized as a major source of confusion.)

And in your example, there isn't parsing ambiguity but there would be change ambiguity when you alter abc (without quotes) to 123, because it changes its type.

I'm not aware of any change-ambiguity situations in JavaScript/JSON5 like that? In Python 5 and 5. are different types, but not in JavaScript or JSON.


I never understood leading dot until I understood that native speakers indeed say ".3" (point three).

Trailing makes it a floating point type instead of an integer


Trailing to signal a floating point seems like to niche of a use case to me. Generally it's best to treat every JSON number literal as a 64-bit float anyway for the sake of interoperability.


Right, you guys say like “naught point three”… sounds just as weird to us as ours does to you.

Still… I can be required to put a zero in, read 0.3, and still think “that’s point three”.


It became quite normal for me to write 0.3 and read it as “point three”. I do agree that the English language makes it less awkward to just skip the zero. It leaves very little room for confusion.


It’s not about readability, it’s being realistic about other humans and making software robust in the presence of common, trivial, unambiguous “errors”.

Reference: Postel’s Law

https://www.laws-of-software.com/laws/postel/


Arguably, if there's ambivalence about numbers or other core structures or doubt about the intent, the correct thing to do is to return an error and not try to guess. The point of an error bubbling up is asking for clarity from the end-user, while guessing will lead to random results in ways that can not be detected or corrected anymore later on in the process.

I think Postel's law was intended to apply to alternative implementations of machine-level protocols.

That's not to say that I don't agree that it might be better if JSON implementations would allow trailing commas, which is unlikely to lead to semantic ambiguities. That's too late now though, unless a new JSON to rule them all would appear and we would all agree on that new spec.



Good reference! Turns out this internet draft has been turned into an official, informational RFC:

https://datatracker.ietf.org/doc/html/rfc9413

I don't think Postel's Law was necessarily a bad idea, in its time and context. However it seems to have fallen into usage as an argument-ender. This RFC is a useful antidote to that.


'"bug for bug" compatible'


Well what's more readable, .8675309 that is understood to have an implicit zero, or the parser giving up and unexpectedly making it a string? Maybe it's not your preference but I can't see any problem with making this more robust. The trailing one is strange to me but leaving off a leading zero isn't unusual at all for written numbers in my experience.


> Well what's more readable, .8675309 that is understood to have an implicit zero

Is it universally understood? I think it's a US / English thing. In my country I've never seen numbers written in this way and many people would not "parse" it mentally as 0.8675309


[flagged]


What an extreme reaction. Many people would not be able to parse it since they've never seen a number written in this way, but you immediately write them off as assholes. Wow.


It will look funny to many people but they will be able to interpret it. Remember that this thread is in the context of whether to be strict or relaxed in a specific file syntax for files intended to be authored by humans.


> It will look funny to many people but they will be able to interpret it.

You're still approaching this with the background knowledge of what this is. If you've never seen this, you can only guess, and there are a couple of options.

I've been terminally online for the better part of the last 2 decades, yet I've seen this way of writing for the first time only ~5 year ago or so, and I still remember simply not knowing what it is. The first reaction is that it's simply a typo - the author mistyped - the dot should have been a digit or perhaps not be there at all.


For people who grew up in countries where comma is the decimal separator (and dot the thousands grouping separator), this is highly unintuitive, because it would seem much more likely to be a misplaced punctuation mark.

It might be moderately intuitive to English native speakers because of oral usage like “point one three eight”, but that’s also not a thing in many other languages.


Stripping zero is not a common practice. You are clearly speaking of your own bubble here.

For most of the world that is even more ridiculous than using dot as decimal space separator or writing dates with month not in the middle place.

Even Americans I work with don't write it like that when doing quick draft discussion, as they know it's confusing to others.


It’s not about whether people normally write it that way. In the overall context of the thread it’s whether they ever write it that way.


From my perspective no one does. Ever.

I only know it's a thing because I watched some math-related edu shows. Didn't even see that when I briefly worked in US.


> even more ridiculous than ... writing dates with month not in the middle place

Come on, let's be fair—nothing else will ever be as ridiculous as that!


I could maybe throw into the contest:

- "a gallon". Not purely US thing but almost. It can be anything from 3.8 to 4.4 liters, depending mostly on what are you measuring.

- writing digits. Is it "1", is it "7", is it "i", is it "l"? Why do it to yourself, while their printed fonts are pretty much the same as everywhere else...


IMO the implicit zero is just as much an issue in regular written form. The period could be overlooked quite easily, but seeing a leading 0, one will know what’s really going on.

How could the parser see it as a string? This is not YAML and JSON5 still requires quotation marks.


As a data guy I find myself running into JSONL a fair bit. It was surprising to me that it’s not supported in the vanilla spec.


as a student experimenting with millions of records of data, its pretty nice!


JSONC is fine but VCS should have named its configuration files settings.jsonc since the files are not JSON and will not be parsed by JSON parsers.


I agree, if one of those myriad alternatives is to be used, at least specify that clearly.


I agree 100%. It is such a pain to write tools that have to account for all these exceptions.


> The only thing that JSON is really missing are comments and trailing commas. I use JSONC for that.

YAML[0] supports JSON formatted resources and octothorpe ('#') comments as well. I didn't see anything in the spec specifically allowing for trailing commas however.

Here is an exemplar using the Ruby YAML module:

  #!/usr/bin/ruby
  
  require 'yaml'
  
  
  puts YAML.load(
    %/
      # YAML is a strict superset of JSON, which
      # means supporting octothorpe end-of-line
      # comments is supported in JSON formatted
      # YAML if and only if the content is multi-line
      # formatted as well (like this example).
      {
        # This is valid YAML!
        "foo" : "bar"
      }
    /
    )
  
0 - https://yaml.org/spec/1.2.2/


The problem of yaml is that it allows too much. It allows unquoted strings, and those can be interpreted by the parser as numbers, timestamps, booleans, etc. This is a source of many fooguns.

Use of indentation to denote nesting can sometimes be an anti-feature, too, because while using that the format does not provide a way to make certain that the entire stream has been read (parens balanced). This may lead to problems or even exploits.

Pure JSON is so painful for human consumption though, I willingly choose yaml if it's the only alternative.

JSON5 may indeed be a sweet spot between human-friendliness and lack of nasty surprises.


As I mentioned in a reply to a peer comment, the problems you describe regarding YAML appear to be about the commonly used format most of us think of and the totality of the YAML feature set.

What is illustrated above is the definition of a specification-compliant YAML resource strictly using JSON constructs + octothorpe end-of-line comments.

Does this usage mitigate the concerns you have identified?


The problem is that self-restraint only takes you so far. Typos exist. Human mistakes exist. Machine errors exist. Malicious inputs exist.

A good parser does not just accept the inputs you find valid, but also rejects inputs you deem invalid. Running a linter that would report or amend all the footgun-wielding features of yaml before parsing is tanamount to running another parser. Then why bother :)


All good points. My philosophy to address same is to employ test suites as a form of sanity checking for JSON/YAML/XML/et al configuration resources.

Still, it would have been nice if YAML had reduced the surface area to verify in order to establish confidence in its content.


Using a YAML parser to parse JSON+comments is like bringing a tank to a knife fight... If you only parse "trusted" input, i.e. you can guarantee that no one is ever going to pass anything but JSON+comments, and you don't do it in any high-TPS scenarios it's probably fine to use a YAML parser


Whilst YAML is an option, if the choice is between having the unnecessary extra features of JSON5 or YAML, JSON5 seems like the clear winner.

Allowing multiple types of quotes may be an unnecessary feature but it is a clear lesser evil compared to the mountain of footguns that YAML brings with it.


How does defining a YAML resource strictly in terms of well-formed JSON + octothorpe comments introduce "the mountain of footguns that YAML brings with it"?


It doesn’t, quoting strings does solve almost all issues, but it does leave potential footguns for the future.

If you don’t enforce it, in the future the “subset of YAML” property might get weaker, especially if someone else is modifying the config.

If you treat config files the same as code, then using a safe subset of YAML is the same as using a safe subset of C. It is theoretically doable, but without extensive safeguards, someone will eventually slip up.


Also take a look at ASON [1]. ASON is a data format that evolved from JSON, introducing strong data typing and support for variant types.

[^1] https://github.com/hemashushu/ason


> The only thing that JSON is really missing are comments and trailing commas.

And multi-line strings. You don't always need that, but when you do, it's absence is very painful.


Agreed. The workaround (arrays of strings) isn't great as it means an extra transformation has to be done between the reader and the usage. I would go so far as to say this is more important than comments.


Allowing for leading decimals without a preceding zero also seems like shifting a whole class of errors right.


I'm not a fan of forcing single or double quotes because escape codes are such a pain to deal with and to me make things significantly harder to read than an inconsistent quoting style ever could.


I just add another property with noncolliding name as a comment.

"//key":"this is here so that foo bars", "key":"value",

valid JSON. Most software handles extra propertiesjust fine


> the only thing that JSON is really missing

Depending what you use JSON for, "Numbers may be IEEE 754 positive infinity, negative infinity, and NaN." could be a huge plus.


> The only thing that JSON is really missing are comments and trailing commas. I use JSONC for that. It's what VSC uses for the config format and it works.

I disagree. Human-friendy multiline strings aren't really optional for a serialization format that will inevitably also be used as a config format sometimes because those are the same problem.


The choice of single vs double quotes means you can use single quotes if the contents contain a double quote and vice-versa. With JSON containing shell scripts (looking at package.json scripts) that's a valuable addon imo.


Someone just needs to write “JSON5: The Good Parts” and an aggressive linter to enforce it.


Why not just a parser? Should be easy enough.


JSON only allowing double quotes is something I have grown to not care about, but as someone that was using JavaScript object literals before JSON became a thing, I confess I do not understand why it is an advantage? If you were at a place where it was a heavy discussion on what quote to use, I'm forced to think there were deeper cultural issues at play?

Don't get me wrong, the ubiquity of JSON speaks for itself and is the reason to use it. But, to say it has tangible benefits feels very dishonest.


As much as i like this (yaml goes way too far, but trailing commas and comments would make json much nicer. I actually think this spec goes too far with single quotes) i hate that it is named json5. I think its unethical to imply you are the next version of something if you don't have the blessing of the original author.


Even when the original author said it was "discovered"?

JSON5 is closer to "javascript object notation" than JSON itself. It's partly an update and partly a removal of arbitrary restrictions.


Except that JSON is a valid JavaScript object, and JSON5 is not.


Why is it not? My understanding is that it is a valid ES5 and forward object.


Unless there's a direct lie in the description of JSON5 ("subset of ECMAScript"), JSON5 objects are valid JS objects.


Can you provide an example?


I think the name just means that it sits in-between JSON and ES5 (i.e., it's a superset of JSON and subset of ES5).

edit: well as this comment thread indicates, the name is pretty confusing for everyone :)


The parser for tsconfig.json (typscript.parseConfigFileTextToJson(fileName, jsonText) or parseJsonText) seems to be what you want; I wonder if there is a name for that format.


I think JSONC is the term for json with comments and trailing commas


Could also be construed as paying homage, though. I think the number 5 here is a reference to congruence with ECMA Script 5 rather than to imply a version of JSON.


Single quotes are useful for string contents containing many double quotes (e.g. XML).


Lots of things are useful. I think json has been succesful largely because it preferred minimalism over usefulness.


It's not a binary choice. I don't think if Crockford chose a slightly larger subset of JavaScript for JSON (such as JSON5), it would hurt its adoption.


I always thought that the name JSON5 pretends to be nothing more than a pun on Michael Jackson's original band, The Jackson 5. It sounds too similar!

If it didn't originate from that, what else?


The project description describes it as json plus syntax from emcascript 5.1 (commonly called ES5) [the official name of javascript is emcascript]. I imagine that is where the name comes from.

Although it doesn't really make sense since most of the stuff they add predates ES5.


Yes, that is where the name comes from.


Ha. There is no way that’s what it means unless you read that somewhere.


I agree with the others that ES5 is likely where it came from, but my mind also jumped to The Jackson 5.


I assumed it was a play on HTML5 ¯\_(ツ)_/¯


I’m a fan of JSON5. A common criticism is “we’ve already got YAML for human readable config with comments,” but IMO YAML’s readability sucks, it’s too hard to tell what’s an object and what’s an array at a glance (at least, with the way it’s often written).

When dealing with large YAML files, I find myself frequently popping them into online “YAML to JSON” tools to actually figure out WTF is going on. JSON5 is much easier to read, at least for me.


Those two criticisms of YAML are at bottom of my list. Space as delimiter and lack of strict typing is what screws me over on daily basis as SRE.


This. I hate how all these serialization/config formats come out of dynamically typed languages. Static typing is a must. Then so many classes of errors go away.


Just static type then. You can’t trust incoming data shapes anyway, e.g. if it specifies a schema and doesn’t even follow it. You always expect something in a typed language, not anything. So validate it and that’s it. Thinking that dynamic data can be typed is a mistake. It can only be structured ([], {}, "", …) into basic types and then matched to some template. Any above-data section about types is as good as none. It can help a human to make sense of its shape, but that’s it.


You might like Dhall


Fair, YAML has a lot of usability warts, and those suck too. Although personally I really do hate how tough it is to tell apart arrays and objects, at least with the most common YAML array/object style.


Just in case you didn't know: With https://github.com/mikefarah/yq you can just immediately translate YAMLs like

  yq some.yaml -o json


A reimplementation of jq in golang supports reading yaml and, of course, emits json: https://github.com/itchyny/gojq#:~:text=supports%20reading%2...

That one is likely more relevant than yq since folks in the json ecosystem are far more likely to be familiar with jq's syntax and thus using gojq is "one stop shopping," not to mention that its error handling is light-years beyond jqlang's copy


Yes, but I LOVE yq's ability to update YAML files without stripping existing comments. For example, I use it to programmatically update similar (but not identical) GitHub Actions files across projects.


> When dealing with large YAML files, I find myself frequently popping them into online “YAML to JSON” tools to actually figure out WTF is going on.

YAML is a strict superset of JSON, so defining the former in the syntax of the latter is fully supported by the spec. Perhaps not by every YAML library, to be sure, but those which do not are not conformant. From the YAML spec[0]:

  The YAML 1.23 specification was published in 2009. Its 
  primary focus was making YAML a strict superset of JSON.
0 - https://yaml.org/spec/1.2.2/


I'm confused about your point about YAML being "strict superset of JSON" leading to being able to convert YAML to JSON.

If YAML is a strict superset, wouldn't that mean that YAML must have at least one feature that is not part of JSON? Wouldn't that make it impossible to define all YAML files as valid JSON?


They all turn into the same data types in the end. You can import a YAML and output a JSON.

For a feature like references, you'd have to do the annoying thing and duplicate that section of the file.

For a feature like unquoted strings or extra commas, you just quote the strings or remove the commas.

The various YAML features are in between and mostly close to the latter.


> If YAML is a strict superset, wouldn't that mean that YAML must have at least one feature that is not part of JSON?

Yes. One of the features YAML supports is the widely documented format we are all familiar with.

However, being a "strict superset of JSON" also means a conformant YAML implementation can load a "pure" JSON resource without issue. The converse is not generally possible as JSON cannot express what YAML can, such as octothorpe ('#') comments.

HTH

EDIT: see also https://news.ycombinator.com/context?id=42361994


For sure, but most YAML you actually encounter does not use much in the way of JSON syntax, it looks a lot more like this: https://devblogs.microsoft.com/devops/wp-content/uploads/sit...

Where arrays and objects just look too similar (IMO), white space is significant, most strings are unquoted, etc. And personally I find it quite difficult to really understand what’s going on there, at a glance, compared to JSON (or JSON5).


> For sure, but most YAML you actually encounter does not use JSON syntax

So what? YAML can be trivially mechanically translated between flow and block syntax.


I want to be able to easily read and understand configuration without having to pop it into a converter. The YAML I encounter in the wild is ~80% pure block style, ~20% mixed (within a single file, mostly block style with some flow style). And I just find the block style hard to read, I have to either spend significant mental effort trying to understand where the objects vs. arrays are, or I have to pop it into a converter (to either JSON or flow style) to understand. Whereas JSON/JSON5, it’s immediately clear without any mental overhead.


What's your take on prototxt files? In my opinion it is the most readable format since you don't need square brackets for repeated fields/arrays.

Additionally plugins let you link your prototxt file with the corresponding proto so you can spot errors right away.


Don’t have any experience with them.


If you already have a bunch of JSON documents, you can keep using them with JSON5.

That's a big advantage compared to converting to YAML.


same is true of YAML as a JSON superset


It's too bad EDN [1] hasn't seen much adoption outside of the biblical paradise that is the Clojure ecosystem.

[1]: https://en.m.wikipedia.org/wiki/Clojure#Extensible_Data_Nota...

In fact, there doesn't seem to be a spec or standard for it, outside of the de facto standard used by Clojure and the programs in its orbit. I guess nobody's bothered to write a standard, because the people who are already using EDN are doing fine without one, and the people who aren't either don't know what it is or don't see its value.


https://github.com/edn-format/edn

I too love edn, but unfortunately most other languages lib abandoned (eg. https://github.com/edn-format/edn-dot-net ). Looking around python seems relatively maintained which is great https://github.com/swaroopch/edn_format/issues


Is there an example of what it looks like in practice? The Wikipedia link above doesn't have it, its citation http://edn-format.org/ seems like it doesn't exist anymore, and this github page doesn't show a sample either.


Hundreds thousand of examples at github, see this comment for an example search link: https://news.ycombinator.com/item?id=42364597


It's plain old clojure, more examples here https://learnxinyminutes.com/edn/

  { :name "John Doe"
    :age 30
    :languages ["English" "Spanish" "French"]
    :address {:street "123 Main St" :city "Anytown"} }


Dont be pessimistic - you are still free to used it.

I used EDN outside of Clojure. The system needed a relatively large amount of config files, and I chose EDN as a better JSON. Looks familiar to everyone, but supports comments - the primary motivation for that choice.

JSON-5 allows a single trailing comma. EDN simply ignores commas. You can have them, trailing or not. But they are really redundant and incur visual noise.

Perhaps EDN can also be improved, but that's a good format. Convenient.


> Perhaps EDN can also be improved

How might you improve EDN?


Maybe with: 1) Unicode escapes in strings. 2) Indentation support for multiline string literals, like in Rust, but even better. 3) Reading with "concrete syntax tree" (make the order of map elements, the comments, whitespaces, etc representable, so that one could write an EDN file with the same formatting as it was read, e.g. after patching it). Not sure if the spec changes are needed / will be helpful for that or better to just implement it in specific parsers.


Excellent ideas, especially having a parser keep order of parsed data.

I wonder if EDN reader/parser for different languages could be written once, then compiled through wasm to c (https://news.ycombinator.com/item?id=38602750) and linked in each language as c library.

Definitely would like to see EDN or slightly improved version as a modern and usable alternative to json/yaml (regardless of https://xkcd.com/927/).


The whole reason JSON rules the world is because it's brutally simple.

We already have 5+ replacements that are far more robust(XML, YML) and IMO they are not great replacements for JSON.

Why? Because you can't trust most people with anything more complicated than JSON.

I shutter at some of the SOAP / XML I have seen and whenever you enable something more complicated inevitably someone comes up with a "clever" idea that ruins your day.


> The whole reason JSON rules the world is because it's brutally simple.

I don't think that's the primary reason. JSON is pervasive because it started out being trivially parseable by JavaScript going back to when people just evaluated it, even before browsers had ridiculously high-perfomance safe JSON parsers. All the other formats are still harder to work with from JavaScript.

If not for that, personally I'd advocate TOML, which is incredibly simple.


I find toml impossible to both read and write in all but the simplest cases.


> trivially parseable by JavaScript going back to when people just evaluated it

Comments and all, ironically.

I mean sure, "and all" would frequently include script injections etc, but you can't argue it wasn't more feature rich!


We use JSON5 for two reasons:

1. Comments 2. Trailing commas

We don't use any other JSON5 features, which are primarily just that numbers may be encoded in hexadecimal and field names may have quotes elided.

We typically encode values with RFC 4648 base 64 URI canonical with padding truncated (b64ut) with values too large to be a JSON number, so hex isn't useful anyway. We haven't found that omitting field name quotes is a big deal.


Why not use JSONC then?


There's no spec for JSONC, it's basically "whatever VSCode does".

JSON5, in contrast, has an actual spec that has been stable for 6 years now.


    is yaml robust: no


Ever since YAML 1.2, released in 2009(!), your YAML example would parse your input as “is yaml robust” for the key, and “no” for the value.


I used to think that fixed things, then I learned how many parsers refuse to update. Like PyYAML.


No, it wouldn't ;)

Extra hint:

    scandinavian countries:
      - dk
      - no
      - se

Edit: it's extremely unlikely a yaml parser implements that spec; the spec is irrelevant.


Okay, I originally didn't respond, but your comment got upvoted out of gray so I'll put in the effort.

Your claim of "No, it wouldn't" is simply wrong. They said "since 1.2" and you can't just ignore that when you're citing a 1.1 problem. The disclaimer that you edited in gets at a relevant point but it's not strong enough to make your claim actually be true.

And the winky face and the "hint" just make things worse, since they knew exactly what you meant.


The only reason JSON got any traction is because it was a subset of client-side JavaScript and thus natively supported in the browser.


To be precise, JSON was a replacement for XML, not the other way around. And the problem with XML was that it's way to verbose and difficult to write by hand, so it's exactly the opposite of the direction YAML/JSON5/... are taking.


The problem with XML was people were using it for every possible thing they could think of and 90% of those ideas were garbage.


Because it was the only widely used generic text format for structured data.


Hijacking for a random concern:

I love JSON, but one of the technical problems we've ran into with JSON is that the spec forgot about all special characters.

I actually noticed it when reading Douglas Crockford's 2018 book, "How JavaScript Works". The mistake is on page 22.9 where it states that there are 32 control characters. There are not 32 control characters. There are 33 7-bit ASCII control characters and 65 Unicode control characters. When thinking in terms of ASCII, everyone always remembers the first 32 and forgets the 33rd, `del`. I then went back and noticed that it was also wrong in the RFC and subsequent revisions. (JSON is defined to be UTF-8 and is thus Unicode.)

Below is a RFC errata report just to point out the error for others.

Errata ID: 7673 Date Reported: 2023-10-11

Section 7 says:

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

It should say:

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F, U+007F, and U+0080 through U+009F).

Notes:

There are 33 7-bit control characters, but the JSON RFC only listed 32 by omitting the inclusion of the last control character in the 7-bit ASCII range, 'del.' However, JSON is not limited to 7-bit ASCII; it is Unicode. Unicode encompasses 65 control characters from U+0080 to U+009F, totaling an additional 32 characters. The section that currently reads "U+0000 through U+001F" should include these additional control characters reading as "U+0000 through U+001F, U+007F, and U+0080 through U+009F"

---

I've chosen `del` to be my favorite control character since so many engineers forget it. Someone needs to remember that poor little guy.


Not to mention that it was set to be 127 so that it would be 8 holes punched out on paper tape, so you could use it to correct a paper tape by backspacing the tape by one position and hitting del.


The errata seems like a mistake.

Makes more sense to drop the term "control character" and leave the specification of which characters are not allowed as-is.

The cat's already out of the bag on this one. Changing the characters now will create a lot of invalid JSON in the world, with more being generated all the time.


That's a reasonable approach, however I would argue it's incomplete without acknowledging the historical context. How could the specification explicitly acknowledge this as technical debt from the original design rather than letting readers assume it was an intentional architectural choice? Such context informs implementers about the constraints they're working with.


May I suggest using TOML, which in my experience has been the perfect blend of human readability while having good tooling.

https://toml.io/en/


Like YAML, it's only better in the simple case where everything is top-level and there's only one level of nesting.

Once you want to have arrays of nested objects or objects with arrays, I immediately wish I was just reading JSON so I knew where I was in the tree.

And for that reason, I don't think it's a full contender. I want an answer for the hard cases like nested data, not just another way to write the simple cases which is what TOML is.

For example,

    [[a.b]]
    x = 1

    [a]
    y = 2
Versus:

    {
      "a": {
        "b": [ { "x": 1 } ],
        "y": 2
      }
    }
It's easy to complain that the latter is noisier. But that's nothing compared to being clear.


It's not made explicit in the documentation, but TOML is very nearly a superset of JSON - just using `=` to separate key-value pairs instead of `:`, and requiring top-level names to be given explicitly, and requiring the "inline" bits to be on a single line. In TOML, your example can equivalently be:

    a = { b = [ { x = 1 } ], y = 2 }
(And yes, that can be the entire document; you can have an inline table before the first table header.)

Of course, this doesn't help if you want the top level to be an "array" rather than an "object" (in JSON parlance), or if you want the entire document to represent a single primitive value. But these uses are discouraged at best anyway.

But really the goal of TOML is to highlight the location of important parts of the deserialized tree structure (hence the ability to use arbitrary long dotted paths in table headers) rather than the structure itself. It's IMO a beautiful implementation of the idea "flat is better than nested" from the Zen of Python, and it neatly sidesteps an issue I asked about many years ago on Stack Overflow (https://stackoverflow.com/questions/4372229 - the question was rightfully closed, as this sort of discussion doesn't fit the Q&A format; but it made sense to ask at the time).

I don't know if a direct comparison of TOML to YAML is fair. Among other differences, the standard way to parse YAML in Python involves a third-party library that brings in a ~2.5 MB compiled C library. Every TOML implementation I encountered - including the one that made it into the standard library - is pure Python.


> But these uses are discouraged at best anyway.

To my knowledge, such uses were discouraged only because of a security issue from evaluating a JSON string as a JavaScript code and not via something like JSON.parse.


Given that TOML is intended primarily as a format for configs, a config where the root value is an array or a primitive value is hardly a relevant scenario.


Your TOML is rather convoluted, a more normal way to write it would be:

  [a]
  b = [{x = 1}]
  y = 2
Or alternatively:

  a.b = [{x = 1}]
  a.y = 2
Some parsers allow newlines in TOML inline tables, so you can do:

  a = {
    b = [{x = 1}],
    y = 2,
  }
That's supposed to be in the next TOML standard, but that seems indefinitely stalled as the (only) maintainer has seemingly lost interest and there hasn't been any movement for a long time.


> Your TOML is rather convoluted, a more normal way to write it would be

Anecdata, granted, but I've seen a lot of TOML written exactly like the "convoluted" example, and none like your more spread out version.


This is also the reason I prefer XML to JSON when things really got complex. XML is verbose but it is very readable on long form. I wish Rust actually used JSON or XML as Cargo file format.


I think it works quite nicely in Cargo as you don't generally need much nesting, but anything with depth should use JSON. It's the perfect format for clearly displaying hierarchy.


It works only for trivial projects that are compiled in a roughly up-to-date and popular environment. Custom distros, vendored forks, closed-source software, firmware projects etc. all end up with hairy Cargo files. A systems language should recognize handling complexity gradually is a part of one of its use cases.


Their dictionaries and arrays split into ini-like sections are not very readable though. The double [[ is just nasty and not possible to apply in all situations (array in map in array).


> not possible to apply in all situations (array in map in array).

Yes, that's largely why inline tables and arrays exist:

   >>> tomllib.loads("""
   ... [[outer]]
   ... first = [1, 2, 3]
   ... second = [4, 5, 6]
   ... 
   ... [[outer]]
   ... third = [7, 8, 9]
   ... fourth = [10, 11, 12]
   ... """)
   {'outer': [{'first': [1, 2, 3], 'second': [4, 5, 6]}, {'third': [7, 8, 9], 'fourth': [10, 11, 12]}]}


TOML explicitly disallows newlines in inline tables, so that's not a full solution (if you agree that there are some situations where multi-line inline tables are indeed required).


This should have been named NOTjson-somethingv5. Or similar. Now it is far from obvious for the uninitiated that this might not be the 'latest' version of JSON. And then they end up using this incompatible format by accident, when in all likelihood standard JSON would serve equally well or better in 95% of the use cases.


I feel like the comments are the only important part. I’d rather not have single quoted strings or unquoted identifiers to be honest. Trailing commas are nice to have though.

All I miss in JSON are comments and a native datetime type. Everything else, I’m fine with.


Strongly disagree about datetimes; they exist at a different semantic level and are entirely too easy to get wrong.


Not more than a string representing a date that gets interpreted after json.parse.


Luckily, we have decades of experience in e.g. SQL, and existing and well-polished ISO specs on dates that can just be used as is without reinventing the wheel.


As if anyone actually implements ISO 8901 correctly.

JavaScript certainly doesn't.


JS at least implements a strict subset of it for UTC, which is still a useful starting point.


Yeah, but OmegaStar still doesn't support ISO timestamps like they said they would a month ago!

(Reference: https://youtu.be/y8OnoxKotPQ)


I mean, floating point numbers are easy to get wrong. And why care about semantics? This is a data transfer format, it should be able to represent data that is being transferred gazillion times a day.

Make it raw ISO 8601 { "created_at": 2024-12-10T11:20:07Z } and do not accept any other format. I know there are way smarter people than me who can figure this out at this point.

DateTimes are the number 1 reason I can not use JSON data as is (if I simply parse it into a dynamic object like JSON.parse).


For comments, adding "_comment": "..." fields can work pretty well.


Not where the reader verifies the schema. And not when you want to write a long multiline comment. (Unless you want _comment1, _comment2, ...)


An array of strings works well here! Also, you can always preprocess to remove comments before validating!


If you're preprocessing, you might as well just `grep -v ^#` and support 'real' comments.


There's power to it being valid JSON, though; it can be stored in databases optimized for JSON payloads, and edited with JSON editors. There's a real agility to liberally sprinkling JSONFields and JSON code editors across an auto-generated admin interface like the Django admin system, knowing that you can leave breadcrumbs for colleagues, and that anything starting with a `__` is fair game for comments any time you see such a key name.


IMO it would be great if fields could have annotations for types that fall back as strings. Datetimes could be annotated for easier parsing.


Would it be correct to say that this is basically any valid JS code that describes an object, excluding the use of references and function definitions?

If not, what is the difference, and why was it made to be different?


I've always thought it would be nice to have a language whose spec describes a few different "power levels":

1. Literal values only (objects, arrays, strings, numerics, booleans, maybe datetimes), a la JSON, with standardized semantics

2. Literals, variable definitions and references, pure function definitions, and pure function calls, but prohibiting recursion, such that evaluation is not Turing-complete and any value is guaranteed to be able to be evaluated into its literal form in finite time

3. All of the above plus recursion and impure, side-effectful functions, custom type definitions, etc.

This way, implementing a literal parser in other languages would be comparatively straightforward (much like JSON), since it wouldn't have to support variables or functions, but it would also be possible to define values using variables and pure functions (much like HCL or Nix) and then use the language's own evaluator binary (or perhaps a wrapped FFI library) to safely convert these into literal values that another language can parse, while guaranteeing that evaluation will not have side-effects. It would also leave open the escape hatch of using a "full" Turing-complete language with side-effects and recursion, while ensuring that using that escape hatch is a deliberate choice and not the default.

I'm sure there are a few additional or hybrid levels that could be useful too (2 but with recursion? 1 but with variables?) but this seems like it would be a solid starting point.


You might like the Noether programming language design https://github.com/noether-lang/noether/blob/4115cdb3f472360... , although I don't know if it'll ever be actually implemented.


Agree, and this could be related to the libs you import (adding abilities), or the traits you specify (restricting).


Seems like the escapes are slightly different. No astral unicode escapes (e.g. \u{123456} ). No octal escapes \123

No template literals (backticks)

No regex literals

No octal numbers (0123, 0o123)

No boolean numbers (0b1001)

No big ints (the "n" suffix)


Most of these things seem like missing features that would fit right in. Except for maybe template strings which could contain references to template functions


A common thing in JSON/YAML alternatives is to support more types through syntax. I don't think this is a good idea. YAML already did this badly with the Norway problem, but JSON also has issues with eg "is it float or int", what about nulls, what about precision... and so on.

There are many, many more types to support and all this does is complicate syntax; the types can be relegated to a schema. For example, where are dates, with or without timezones, what about durations, what about SI units for mass, current, what about currency, what about the positive integers only, numbers as hex, as octal, as base64...

One format that _nearly_ gets it is NestedText https://nestedtext.org/en/latest/basic_syntax.html ... which means everything gets ingested as strings, dicts or lists, which vastly simplifies things; my quibbles with it would be it still went for multiple syntaxes (for dictionaries, multiline strings, inline vs multiline dicts&lists. And yet, it still didn't make comments part of the data model (which is so useful when processing or refactoring files). While it's not perfect, it does separate the validation of scalars, not stuffing someone's priority list of validations into incomprehensible syntax.

YAML's been a decades long mistake and making JSON more like YAML is not the way to fix that.


When I manage a project and have the freedom to choose my configuration structure, then I always use typescript. I never understood the desire to have configuration be in ini/json/jsonnet/yaml. A strongly typed configuration with code completion seems so much more robust. Except of course your usecase is to load or change the config via an API.

I like what apple is doing with https://pkl-lang.org/ though.


You can apply typescript-based strong typing and code completion to JSON and similar. And then you can avoid making arbitrary code execution part of your config format.


I think main problem people trying to solve is treating JSON as computer-human interface. It was not designed for it and I don’t think we need to expand its use-case. You can perfectly use subset of YAML with much better readability for human interactions. I wrote custom parsers for subset I need with like 100 lines of Python code. JSON should stay as a loggable system-to-system format that you can render in a more readable way.


Actually, right this moment I am writing docs for my mini-yaml to generate JSON Schema Draft 4 for our EDA. Easy-peasy


JSON is pretty close and much more widely used/available. It's fine.


I find json5 much better than json, but it has still many of the same annoyances.

- instead of trailing commas, how about making them completely optional? It's not like they are needed in the first place.

- curly braces for top-level objects could be optional too.

- For a data exchange format, there should really be a standard size for numbers, like i32 for integers and f64 for floats.

- parsing an object with duplicate keys is still undefined behavior.


I don’t know how to feel about this. Personally I want to write configs like code, and I want to avoid using yet another specific DSL for it. So currently working on a tool that allows you to write configs in Typescript - https://github.com/typeconf/typeconf


Good API design dictates that you should be flexible as to what you accept and strict about what you serve. Being flexible doesn't really break anything.

Elasticsearch and Opensearch both actually have partial support for JSON5 (comments), which is a nice feature if you want to document e.g. a complex query or mapping choice. It won't return any comments in the response. So it won't break other parsers. Implementing JSON 5 support like this is a valid thing to do for any server. More broad support for this in parsers would be nice.

I'd probably enable this on my own servers if this was possible. I'd need that to be supported in kotlinx.serialization. See discussion on this here: https://github.com/Kotlin/kotlinx.serialization/issues/797


> Good API design dictates that you should be flexible as to what you accept and strict about what you serve. Being flexible doesn't really break anything.

Do you have a source on that? I am not sure I agree. My gripe is with HOCON that accepts so many formats that after a while you have no idea what it is you are actually writing. You can have a conf file with 5 different formats of the same type of setting. Probably added to by 5 different developers.

I'd rather have it throw an error in my face when I don't adhere.


I've always been a big fan of KDL in principle, haven't used it in anger. After that HCL, then YAML, with JSON and others being my least favourite to use.

Of course the hard part is gaining enough critical mass to make a significant switch. JSON had AJAX. YAML had Rails. What could make JSON5 or KDL break out?


If browser/node etc.. starts to support json5, i am sure it won't take that much time to get adopted.


If you're looking for a human-friendly json superset (comments, non-quoted keys) that can also abstract away repetitive configuration with variables and list comprehensions, check out https://rcl-lang.org/.


I'm looking at the JSON5 spec and it appears it does not introduce a capital \U escape sequence for Unicode characters outside the Basic Multilingual Plane (BMP). It's not brought up often, but in JSON you do need UTF-16 surrogates to write an escape sequence for Unicode characters outside the BMP. Consider the Hamburger Emoji (U+1F354). Instead of escaping it as "\U0001F354", you need to escape it with UTF-16 surrogates "\uD83C\uDF54". This is both cumbersome for humans and not in accordance with the Unicode Standard [1]. It's ironic, but many (most?) of the "JSON for Humans" flavors of JSON tend to overlook this.

[1] See Chapter 3.8 "Surrogates" of the Unicode Standard.


When you export Instagram data as JSON, the resulting JSON files include encoded strings like "\uD83C\uDF54".

Parsing and converting these strings can be cumbersome because a single Unicode character is often represented by a single escape sequence, but sometimes it requires two.


How often are humans going to be using unicode escape sequences?


I believe in most cases, they are generated by programs. Refer to my other comment for real-world examples.


There’s no \U in JavaScript: it is spelled \u{10ffff}


I find that these efforts to make something that is almost but not quite JSON to be counterproductive.

It means that something you can't tell if it's JSON or another format. You'll have some tools that can work with it, while other tools will choke because they expect valid JSON. Oh, someone just switched the quoting style so now your jq based automation is all broken.

And now you have to figure out which of these not-quite-JSON formats this is. Is it HuJSON/JWCC? Is it JSON5? Does my editor have a mode that supports this particular variant, or am I always going to be fighting with it?

And finally, having used HuJSON for Tailscale config: the issue isn't just things like comments and trailing commas, or quoting styles. JSON is just a kind of heavyweight and cumbersome syntax for writing config files. I find that I prefer writing a script to auto-generate my Tailscale config, because editing it by hand is cumbersome.

There are a number of other possible config file formats, with varying levels of JSON data model compatibility. YAML has its issues, but we've all learned to live with them by now. TOML isn't bad, though good luck remembering the array of tables syntax. KDL is pretty nice; it has a slightly different data model than JSON, but it's actually one that is somewhat better suited for config files.

I'd rather use any of these for config files than something that is almost, but not quite, JSON.


I always thought that “JSON5” is a deceptive name. It is not the fifth version of JSON; it is an alternative/extension of JSON, of which there are many alternatives, and this one is no more official than any other.


JSON5 is from ECMAScript 5.1, which is called ES5.

As an (unfortunate) JavaScript developer, it was clear to me the intent was to "update" JSON with ES5 features, and not say ES4 or ES6.

Why ES5? ES5 is when trailing commas were introduced. Commas are one of defining features of JSON5. Other languages, like Go, also made this a priority.


The name may have a logical reason for being what it is. But it is still misleading. I have seen people implicicly assume that JSON5 is what they should be using instead of JSON, just because of the name.


I like JSON5 and have used it some. When GPT was younger and I was parsing its JSON output directly, JSON5 was forgiving in useful ways.

The one thing I really wish it had was some form of multi-line string, mostly so I could use it with line diffs. Also sometimes for really simple property editors it's nice to put the serialization in a textarea, and that works fine for everything but multiline strings.

(I suppose you can escape newline characters to put a string on multiple lines, but I find that rather inhumane and fragile)


Hjson!


Oh! Format preserving editing too, very nice... https://github.com/hjson/hjson-js?tab=readme-ov-file#modify-...


This is very close to what the ruby REPL will accept.

I tried to paste in the kitchen sink - it didn't like dangling decimals and the comment format, everything else worked as expected.


Hjson looks friendlier for direct manipulation, no string quotes

What would be the advantages/disadvantages?

https://hjson.github.io


I use it for personal scripts and it's been wonderful. I get to write more beautiful and concise configuration than any other format I've used.

If I were doing a professional project, I'd be hesitant to use it over something with more popularity and support. The syntax has so many variations two files can look like two totally different config languages, which is both cool and alarming.


If it’s designed for hand authoring it should support an ISO8601 date format; mere mortals cannot author numeric timestamps without tools.


Just store the ISO9601 date/time in a string. No need for special support on the format level.


Why have numbers, true, false or null then? Why not only support strings?

    {"foo": "2", "bar": "null", "baz": "false"}
I’m not really being facetious: that’s what canonical s-expressions did:

    ((foo 2) (bar null) (baz false))


It's of course a trade-off in how far you want to go with special types.

Booleans and numbers have extremely common use cases, moreso than date/times.

But perhaps more importantly, they are quite easy to define. Date/time is a susprisingly complex topic with many variants (date, time, datetime, local/relative date/time, point in time, offset-based, timezone-based...) with all of them being quite important. The spec to define date/time types would likely be longer than for the whole rest of JSON, and you still wouldn't be able to correctly interpret the date/time from the spec alone, since timezone data/designations are dynamic.

Now the question is - what value does this extra complexity bring? I'm not saying there isn't any, but it doesn't seem to justify the cost.


Numbers aren't "surprisingly easy to define". Indeed, JSON is a very good example of how to not define numbers for interoperability. The original spec literally doesn't place any limits on valid ranges, precision etc, with the result that the later RFC notes that "in practice" you probably want to assume 64-bit floating point because that's what most parsers use (but still doesn't actually guarantee at least that much precision!).


I wrote "quite easy to define" and they are, esp. in comparison to date/time. JSON messing it up notwithstanding.


Type declarations can imply syntax, semantics or both.

Yes, you could represent everything as a string; in that case, the serialization format is no longer providing any assistance in verifying or enforcing syntax.

But it’s often useful to be able to verify syntax independently. And it helps avoid authoring errors (like using “1” instead of “true” etc.) that are ambiguous if your only hint is semantic.


In that case one should also have a type for dates.

What I’m getting that is that a format ought to commit.


The only thing I worry about is how do you parse this, then modify some fields and write back the file with all the comments still in place?


Or, just don't use JSON for config files. There are plenty of human-friendly config file options, so there is no reason to frankenstein JSON in this way.


Why not just not use JSON for config? In a sane world YAML wouldn't even exist and everyone would use something like TOML.


What is the benefit of this over something like Pkl[0]? Pkl compiles down to JSON, YAML, etc., but the language itself is user-friendly. This way you get best of both worlds: readable and editable for humans, and parsable for computers.

[0]: https://pkl-lang.org


There are a few more tolerant versions of JSON. In OjG I called the format SEN https://github.com/ohler55/ojg/blob/develop/sen.md


Shameless plug for my JSON/5 parser written in zig: https://github.com/berdon/zig-json

There is a std json library as well but the aesthetics weren’t great imo.

The specs are quite pleasant to implement.


I find HOCON[0] to be great for this need in JVM-based languages.

0 - https://github.com/lightbend/config/blob/main/HOCON.md


The "official" JSON should be enhanced to cover a few of the pain points.


This may be heretical but surely the problem isn't lack of comments et al in JSON, rather that people try to use JSON for everything, when it was designed to be a text representation of javascript objects?


If it was really designed for representing JS objects then it was really bad job.

Neither JSON supports JS objects (lack of NaN) nor JS supports JSON (lack of arbitrary precision decimals).


Fair enough - perhaps I should have said it was inspired by Javascript literal syntax and simplified to make it a platform independent data exchange format and not a application configuration format. Though I can see how the latter is tempting if your application is in JS.


> JSON for Humans

The emoji in the first paragraph seems to convey the understanding that humans like expressiveness, but the format itself doesn't allow Unicode values in keys, which seriously limits said expressiveness...


It feels like AI has made this redundant.

I honestly cannot imagine hand typing out some JSON now, or most code for that matter.

I just write in natural language what I want and the AI will perfectly output valid JSON.


That JSON you get might be syntactically valid, but how do you know that it is accurate wrt your original input? That, for example, no values have a one-character-off misspelling?


Just being able to add comments to a .JSON5 file is a godsend though, no matter who/what created it.

Oh... and TRAILING COMMAS!


I wish json had a simple version/convention like elixir sigils so I could pass datetimes around as first class entities instead of always having to [de]stringify them.


Yaml is for people, json is for machines


I think the killer feature of JSON is that there’s one version and that won’t ever change. You don’t have to think about compatibility.

All JSON is valid YAML. So you clearly can make yet another one of these and make it support JSON. But JSON doesn’t support the stuff you’re adding, so calling it JSON5 just makes things confusing as if it’s a version and not a whole new thing altogether.

The ugliest thing the authors could accomplish is making this sufficiently popular that there’s a ton of .json files out there that aren’t actually valid JSON. I hope they’re being careful about strongly discouraging ever writing these outputs to files with a .json filetype.


Comments are nice. I wonder if they can also be inserted programmatically.


It kinda becomes a question of "does this comment annotate the line it's on, the next one, or the arbitrary number of succeeding lines" since the order of the objects is not guaranteed by the standard and when writing comments by hand it's common to say "the next section shall do X".


When I work on some ad hoc configuration format I usually end up with quite a family of different comment types. Disabled values and prose about values are are the core set, but there might also be different prose types to separate the intention for a certain value (authored by the one setting the value) from documentation about the purpose of the field (authored by the one introducing the option). Also a type for key value pairs that have not been consumed (perhaps because of a typo in the key), and another for pairs that are applied as default, but should not be explicitly in the config if you want to go with the new default of they change in a software update.

Yes, this is for situations where the config is two way, e.g. when a GUI can be used to set some values. But I find some of those features so useful that I might sometimes be tempted to write out a processed version of the file parsed even when there isn't anything like a configuration UI.


I am using nickel[1] myself for writing what basically amounts to a pipeline that ultimately generates a json or toml. It has contracts that can validate a field or an object as well as set a default value if the field is not present.

[1]: https://nickel-lang.org/


I wish languages adopted structured comments (as in, semantically applying to syntax tree nodes rather than lines) more broadly. It used to be a thing in some early PLs but has mostly died out.


Just use jsonnet if you want this IMO. No need to change json into yaml.


Switching from JSON to Jsonnet to get comments and trailing commas is like switching from a butter knife to a chainsaw because your steak is too tough. Jsonnet is literally Turing-complete!


Isn't JSON for humans essentially YAML? (only kinda joking)


That was the idea, before it all went terribly wrong.


To the reader: if you haven't before, take a stroll through the YAML spec to see what people are talking about when they bemoan its complexity: https://yaml.org/spec/1.2.2/

Then take a look at the JSON spec: https://datatracker.ietf.org/doc/html/rfc7159



I'm a huge fan. We use it for all our configs.


Unfortunately this is basically that XKCD cartoon about proliferating standards. I think I’d avoid this additional standard and just use JSON or a JavaScript object if I really need this level of flexibility.


Eh, if you drink, then drink...

1. Add `;` as a separator of elements, so you may have:

   { a: "foo"; b:"bar; }
2. Add array tags and space separated value lists so you may have

   { a: 12 13 14; }

   to be treated as [12, 13, 14] with the tag " ". Normal arrays are parsed with the tag ","
3. Add "functors" as, again, tagged arrays

   rgb(128,128,14);

   will be parsed to an array with the tag  "rgb". Also you may have calc(128 + 14);
4. Add tagged numbers so

   90deg 

   will be parsed as a number with the tag "deg"
And you will get pretty much CSS that is proven to define quite complex constructs with minimal syntax.


` leadingDecimalPoint: .8675309, andTrailing: 8675309.,`

Sorry but what is the benefit of this? Lazy shorthand? This is too much. Is this a string in other languages? PHP the `.` is a string concat.


No I don’t need this thing.


Still no timestamps :-(


What's wrong with Unix? Or is the complaint that there's no data type for time stamps specifically?

I agree it would be nice to have something with more data types. Binary b64/hex would be nice.


> What's wrong with Unix?

When you read 847548, is that a number or is that Saturday, 10 January 1970?

Having a type removes that ambiguity. It would be more JSONish for it to be human readable, maybe @1970-01-10T19:25:48.


Or maybe extend this with types for the timestamp, `@ms:1623132000` or `@unix:1623132000`, so in a value: `{"now":@unix:1623132000}`

Mongo types the field name, `{"$date":1623132000}`, if I'm not mistaken. Rust style would be `1623132000_unix`.

Or with anticipation of a more full typing system, where the time is explicitly named:

`{"now": unix:1623132000}`

For now, when I need typing, I use https://json-schema.org


For a basic timestamp use a number.

If you need more, you enter territory that is much too complex to build into the "simplest data format" spec.


Yeah. But I think the goal was here to sugar the syntax while keeping semantics intact.


Looks very nice, but I feel it's one of those "yeah, it's better, but only useful for personal projects, or until it gets critical mass which won't be until after I'm dead, if at all" projects.


can this be used to convert llm output to json?


JSON6 will feature triple-quote string quoting. Either made out of “ or ‘. Wake me up when it happens. PS: quoting line returns for multiline sounds weird. But who am I to comment…


KISS


It's infuriating that we're still struggling with this after so many years.

Every time I learn a new format I want to scream "why can't you be normal?"


Have you looked at CUE? I haven't looked back since discovering it. CUE is a proper language for configuration that two-way integrates with many other config and schema formats. All JSON is valid CUE too

https://cuelang.org | https://cuetorials.com


Take a look at https://typeconf.dev We wanted to define configs with types and avoid custom DSLs as much as possible. So we ended up with using Typespec.io for schema and plain Typescript for authoring configs. This should be as normal as possible!


Now, it would be great if we have parsers, editor plugins and json schema supprot as well.

until then, jsonc works for me


Oh he'll no not another standard no one needs. JSON is good enough


only needs trailing commas


Why not just use YAML at that point?


Hey look another xkcd://927


Isn't this simply YAML but with curly braces?

Another thing is that It feels wrong to have comments in JSON, like allowing comments in CSV files.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: