Fat JSON (tbray.org)
208 points by _freu on May 6, 2014 | 121 comments


An update on Keybase, since it was chosen as the example. The API now supports field declarations. For example, these all work:

   https://keybase.io/_/api/1.0/user/lookup.json?username=chris&fields=basics,pictures
   https://keybase.io/_/api/1.0/user/lookup.json?username=chris&fields=basics,pictures,profile
   https://keybase.io/_/api/1.0/user/lookup.json?username=chris&fields=basics,public_keys
This change was intended from the beginning, but we're still in alpha; it was a quick addition, and the HN post made it timely. The nice thing is that this isn't just a client-side change. Loading a user's info is modular on the server, so the examples above should be faster and less work for our server than getting everything about someone.


I should add that we're not fundamentally opposed to some kind of query language in the API requests, but most of our API objects lend themselves pretty well to just passing a list of fields you want. The above technique is very simple and seems to work well.

As an aside, the API also has cross-origin support (as does search), just added for anyone who wants to interface with Keybase from another web page, in the front end.
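For anyone wiring this up, here's a tiny Python sketch of building those field-limited lookup URLs (the endpoint and parameters are exactly as in the examples above; the helper name is mine):

```python
from urllib.parse import urlencode

KEYBASE_LOOKUP = "https://keybase.io/_/api/1.0/user/lookup.json"

def lookup_url(username, fields):
    """Build a lookup URL that asks the server for only the listed fields."""
    query = urlencode({"username": username, "fields": ",".join(fields)})
    return f"{KEYBASE_LOOKUP}?{query}"

lookup_url("chris", ["basics", "pictures"])
# -> "https://keybase.io/_/api/1.0/user/lookup.json?username=chris&fields=basics%2Cpictures"
```

(Note that `urlencode` percent-encodes the commas; the server accepts either form.)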


I'm a little surprised not to hear any mention of JSON Pointer (RFC 6901). It deals with exactly this:

http://tools.ietf.org/html/rfc6901

With JSON Pointer syntax, it would be something like:

    JWalk.getStringPtr(user, "/them/public_keys/primary/bundle");
It's concise, complete/unambiguous, and has implementations in a growing number of environments, so I think it could be worth mentioning as an approach. It also defines a useful URL fragment syntax for referencing nodes within documents, which would be a good thing for the JSON world.
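A minimal RFC 6901 resolver is only a few lines in Python (a sketch handling the `~0`/`~1` escapes and array indices; real implementations also validate the leading `/` and report errors properly):

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc  # "" points at the whole document per the RFC
    for token in pointer.lstrip("/").split("/"):
        # Unescape: ~1 -> "/" first, then ~0 -> "~" (order matters per the RFC)
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, list):
            doc = doc[int(token)]
        else:
            doc = doc[token]
    return doc

user = {"them": {"public_keys": {"primary": {"bundle": "..."}}}}
resolve_pointer(user, "/them/public_keys/primary/bundle")
```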


Oh dear, I’ve probably hurt some feelings for having missed 6901. Interesting that when I googled around for JSON analogues of XPath, I didn’t turn that up. Having said that, it’s not obvious that any syntax is a win; a list of strings is an excellent selector that hits an 80/20 point & meets all my needs.


His omission of RFC 6901 is especially remarkable considering he is listed by name in that RFC's acknowledgements section as having contributed to the specification!


That works very well combined with RFC 6902 - JSON-PATCH for partial changes to JSON objects, which in turn works nicely with HTTP PATCH method, RFC 5789.


Shameless plug: I wrote a small JSON database that supports JSON Pointer and JSON Patch: https://bitbucket.org/robertodealmeida/jsonstore


I wrote a comment pointing to 6901, but it's probably still in moderation.


I make heavy use of a similar thing in PHP. I have a function "lookup" which lets me say:

    lookup($foo, array('bar', 'baz', 15, 'quux'))
This is equivalent to any of the following:

    $foo->bar->baz[15]->quux
    $foo->bar->baz[15]['quux']
    $foo->bar['baz'][15]->quux
    $foo->bar['baz'][15]['quux']
    $foo['bar']->baz[15]->quux
    ... and so on
It's useful in Drupal when the required data is often at the end of a long chain of objects-containing-arrays-containing-objects-....

I've been a bit naughty with its types for the sake of convenience: if a non-array is given as the second argument, it's wrapped as a singleton array, ie.

    lookup(array('foo' => 'bar'), 'foo') === 'bar'
Also if any of the components aren't found, it returns NULL, ie.

    lookup(array('foo' => NULL), 'foo') === lookup(array(), 'foo')
It would be theoretically better to return an empty array on error and a singleton array on success, since they can be distinguished, but in practice that's so far down the list of problems with PHP that it's not worth the effort of adding such wrappers at the moment.

A really nice thing about this function is that it curries nicely:

    $find_in_foo = partially_apply('lookup', $foo)

    $get_blah = partially_apply(flip('lookup'), 'blah')

    $find_in_foo('x') === lookup($foo, 'x')

    $get_blah($foo) === lookup($foo, 'blah')
Actually, my argument-flipping function is already curried, so I can just say:

    $get_blah = flip('lookup', $foo)
This currying is great for filtering, mapping, etc.


Hi, Drupal dev here. Got to the point yesterday afternoon where I'm going to have to implement exactly what you've already done in order to put out some reasonable JSON from a node_load_multiple call. Don't suppose you have this function in a gist anywhere, do you? Please?


Here's the definition and a few tests.

https://gist.github.com/Warbo/9d8425fcdd7c026c795a

The SimpleTest test class probably doesn't work as-is, since it was originally derived from our own class hierarchy, but it shouldn't be too hard to fix.


Convenience-wise, you could also make a "varargs" version with `func_get_args()`, e.g:

    f($structure, 'bar','baz', 15, 'quux');


Which version of Drupal? D7 has https://api.drupal.org/api/drupal/includes%21common.inc/func... which does exactly that.


Indeed I haven't seen that before. Still, it doesn't support objects like mine does :)


Another implementation of this pattern is available as a component from the Symfony2 ecosystem[1]. It is flexible, well-tested, and can easily be used standalone, either from Github[2] or through Composer[3].

[1] http://symfony.com/doc/current/components/property_access/in...

[2] https://github.com/symfony/PropertyAccess

[3] https://packagist.org/packages/symfony/property-access


Have you seen guzzle's implementation of this? It's pretty useful and something I'd love to see split out of guzzle and made its own library.

https://github.com/guzzle/guzzle/blob/master/src/functions.p... is the actual code, although it's pulled into the guzzle collection: https://github.com/guzzle/guzzle/blob/master/src/Collection....


That smells to me: the path argument given to "get_path" is a string of components, separated by "/". The first thing get_path does is split the path apart at "/".

Why doesn't it just take an array? That way, it would be a simple reduction:

    function get_path($data, $path) {
      return array_reduce(
        $path,
        function($result, $index) {
          return (is_array($result) && isset($result[$index]))? $result[$index] : NULL;
        },
        $data);
    }
Unfortunately PHP's function namespace is separate from its value namespace. We can't define "get_path" using a couple of combinators :(

Also, requiring 'strings without slashes' is a pretty bad idea in PHP in particular, since it's meant for Web programming. This means a) we tend to have / in strings due to URLs and b) someone will inevitably pass user input as one of these components, which would allow a very limited form of code injection attack.


Honestly, what's happening here is that you've hit the limit of JSON. JSON trades fairly significant verbosity for ease-of-use... when it stops being easy-to-use, well, you've stopped needing JSON. While you may be forced to hack around it, I'm not sure I'd spend too much time trying to figure out how to be principled in that hacking, because hacking it shall ever be.

JSON's nifty and convenient, but it's huge... with JSON from "the wild" I often find it gzips by a factor of 16. And that's just gzip, which isn't even the best at this sort of thing. If the API provides only a vague question that you can ask it, and it hands you back a huge chunk of very fluffily-serialized data, well... in a lot of ways you've already lost, twice (once for fluffy serialization and once for a presumably-foreign API giving you too much data).


Most web servers can compress the response if the client accepts it... anyhow this is a fairly universal aspect of passing long-ish strings over HTTP. I don't see any reason to blame JSON for being "huge," it is a somewhat compact serialization format within the constraints of needing to be text. ProtoBuf would be nice but requires quite a different set of client tools.

Overall the point of the article is a bit lost on me. Making APIs that return only the necessary data with only the appropriate structural complexity is and will continue to be incumbent on the developer... efforts like OData might drive products towards this goal but contrary cases will flourish for a long, long time.


Not too long ago on a personal Android app I was dealing with huge JSON strings that would take 2 seconds to decode on my test phone. I started cheating by plucking substrings out of the response and parsing them, cutting the parse time down by a factor of 10. Thankfully the server never changed the order of their response fields.
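For anyone curious, the fragile substring trick described above might look something like this in Python (a sketch, not the actual Android code; it breaks the moment the server reorders, re-escapes, or re-types the field):

```python
import re

def pluck(raw_json, key):
    """FRAGILE: grab a string field by regex without parsing the whole body.
    Only works while the server keeps the field a plain, unescaped string."""
    m = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(key), raw_json)
    return m.group(1) if m else None

pluck('{"a": "x", "b": "y"}', "b")  # -> "y"
```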


Sounds robust... Doesn't that defeat the purpose of JSON?


Hence why I called it "cheating" and I was "thankful" that the server never re-ordered fields (since proper JSON fields have absolutely no guarantee on order).


I feel like Tim (like a lot of people connected to the web standards community and systems-oriented folks in general) rushes too quickly to thinking about how to standardize and generalize an approach to breaking down large JSON documents like this. But this isn't a standards failure, it's just an API failure. APIs will always need to be refined in various ways after they're released. A "standard" method of asking the server to chop up a JSON document for you would solve this particular problem, at the expense of creating a lot more work on the server side (likely a new layer of abstraction), and there's a limit to how useful that is. Meanwhile, tweaking the API to make it more focused and flexible is a process that's always going to be necessary, no matter what JPath/JWalk/etc. standards are developed.


> ...it's just an API failure

This. As far as the bandwidth issue goes, effective use of caching and compression can go a long way. Schemes varying the responses via the URL compromise schema validation based on media type (see JSON Schema, for example). Specific cases where fat or thin requests truly make a difference can be addressed with additional media types.


I had this exact problem in Python and wanted to be able to use dot-notation (i.e "book.metadata.title") to query the structure (like MongoDB, etc do) so built my own library for doing it:

https://github.com/imranghory/pson

(Also available from pypi via "pip install pson")


  class MyDict(object):
    def __init__(self, d):
      self.__dict__.update(d)  # dicts have update(), not extend()
    def __repr__(self):
      return repr(self.__dict__)

  import json
  from operator import attrgetter
  r = json.loads('{"body": {"translations": [{"tr...', object_hook=MyDict)

  >>> r.header.title
  u'Hello World'

  >>> r.body.translations[1]
  {u'translation': u'guten tag', u'language': u'german'}

  >>> map(attrgetter('language'), r.body.translations)
  [u'french', u'german']


The code in the library is actually pretty simple, but deals with many practical edge-cases (stepping into an array, handling missing values gracefully, etc.) that the above code doesn't.

My longer term intention is adding other useful json manipulation and search functions to the library.


Yeah. I just don't like putting syntax in a string; I'd rather use Python itself.

By the way, you should remove the pprint import from the library, since you don't use it :)


I considered that approach but found the JSON returned by some third-party APIs contained characters in keys (dashes, spaces, etc.) which have special meaning in Python, so they can't be used natively.


Interesting, I thought I was a rare breed using the following function quite a bit in my code. However, it seems these things are a dime a dozen. -- I modeled this one after the way Django does it. I think the only thing it does that yours doesn't is call a function if one of the dict values is an object. That's useful for get_somevalue() type things on a tree of objects.

    def rget(obj, attrstr, default=None, delim='.'):
        try:
            parts = attrstr.split(delim, 1)
            attr = parts[0]
            attrstr = parts[1] if len(parts) == 2 else None
            if isinstance(obj, dict): value = obj[attr]
            elif isinstance(obj, list): value = obj[int(attr)]
            elif isinstance(obj, tuple): value = obj[int(attr)]
            elif isinstance(obj, object): value = getattr(obj, attr)
            if attrstr: return rget(value, attrstr, default, delim)
            return value
        except Exception:
            return default


The jmespath library from boto is quite similar to this, and possibly has way more traction. Some differences:

- jmespath uses `body.translations[0].language` instead of `body.translations.0.language`, and it can also do `body.translations[].language`.

- jmespath doesn't support the "missing" use case
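For comparison, here's a toy stdlib-only approximation of that dotted-path style in Python (not the real jmespath library, which has a proper grammar, `[]` projections, filters, and more; this toy version does handle the "missing" case by returning None):

```python
import re

def search(expr, data):
    """Toy dotted-path lookup supporting keys and numeric indices, e.g.
    "body.translations[0].language". Returns None on any absent path."""
    for token in re.findall(r"[^.\[\]]+", expr):
        if isinstance(data, list) and token.isdigit():
            idx = int(token)
            data = data[idx] if idx < len(data) else None
        elif isinstance(data, dict):
            data = data.get(token)
        else:
            return None  # dead end: scalar or None mid-path
    return data
```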


Somewhat related, jq is a command-line utility which allows filtering JSON data, and uses its own path/filter syntax:

http://stedolan.github.io/jq/


Quite related to this somewhat related thing, jgrep also does filtering and selection on JSON data:

http://jgrep.org/


Somewhat in relation to these things that are related, pipeline is a DSL for manipulating JSON in a Unixy manner.

http://github.com/qnectar/pipeline


This is immediately solved by Lenses

    _Object . key "them" . key "public_keys" . key "primary" . key "bundle"
which you might complain saying that this is yet another idiosyncratic method of traversing data types. In fact this whole thread is full of examples of other idiosyncratic methods of traversing (JSON only) data types.

Except lenses aren't. Lenses are highly principled and compose and combine in ridiculous ways while maintaining their exact behavior. Half the features proposed in this thread exist naturally and "emergently" by the nature of lenses. Finally, they follow mathematical laws describing how everything should work together all to the T.

At the highest level a Lens is a getter plus a setter bound together. You can extract either component to get or set a subpart of a value.

At the next level, you can take note that lenses "compose" (as a category) by letting you reach deeper and deeper into a structure. This is what I took advantage of with the JSON example above: composing them with `(.)`.

At the next level, lenses generalize naturally to "traversals" which target multiple subparts all at once and "folds" which build exotic "getters" over multiple targeted subparts.

At the next level, lenses "dualize" to prisms which deal with branching types. This is hard to explain if you haven't used a language with a true sum type (Scala, Haskell, ML, and lets not get into lazy/strict sums) but I used one above to traverse into the "Object" indicating failure if my assumption of the structure of the JSON blob were wrong.

At the next level you generalize these into isomorphisms which have nice mathematical properties determining when two types are identical by giving you invertible mappings between the two. This is like a lens which focuses on the entire type as its "subpart".

(http://hackage.haskell.org/package/lens)

---

At the end of the day there are even more steps in that hierarchy. It sounds ridiculously complex, and it is a little bit to learn. The advantage is, however, that the intuition of "focusing on a subpart" applies over any data type and with pretty near any weird combination of "lens-like" operators you can dream up.

---

Usually when someone first hears about lenses they say "Oh, getters and setters. I have those already, no big deal". That's really far from the case, however... Lenses end up being the XPath of everything.


Haskell's Lens package (linked to above) is indeed nice. For a practical quickstart guide, I'd recommend the Github page over the Hackage page:

https://github.com/ekmett/lens#lens-lenses-folds-and-travers...


Apologies for the self-plug, but I tried to write up a "friendly" intro to the Haskell lens library as well

https://www.fpcomplete.com/school/to-infinity-and-beyond/pic...


Yes this is the most practical use of lenses I've found. Entirely principled access to every flavor of object...multiple results even fall out naturally with traversals.


I did think of lenses when I read this, but have yet to learn or use them so didn't want to give incorrect advice. Thanks for the overview :)


Any relation to lenses in http://augeas.net/ ?


On first glance it looks like they're drawing from the same source material even if they end up in different places.


I just don't see how

  JWalk.getString(user, "them", "public_keys", "primary", "bundle");
is better than

  user.them.public_keys.primary.bundle
What am I missing?


In Ruby or Python or JS, you should be able to do the user.them... form. In a statically typed language like Java or Go it’s harder, and I happened to be working on an Android app, thus Java.


Failure semantics are the obvious ones. At the very least you'll need some flavor of

    try {
      user.them.public_keys.primary.bundle
    } rescue {
      OhDamn
    }
Less obvious is the potential for multiple targets. JWalk probably doesn't do this, but it could. For more ideas for how this could work take a look at

http://research.microsoft.com/pubs/77415/theessenceofdataacc...
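To make "multiple targets" concrete, here's a hypothetical Python sketch (not JWalk, which is Java and single-target) that gathers every value stored under a given key anywhere in the tree — one query, many results:

```python
def collect(data, key):
    """Recursively gather every value stored under `key` in a nested
    structure of dicts and lists (the "multiple targets" idea)."""
    found = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            found.extend(collect(v, key))  # also descend into the value
    elif isinstance(data, list):
        for v in data:
            found.extend(collect(v, key))
    return found
```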


That JWalk library is in Java so there won't be something you can use dynamically like that.


You can use GSON and just parse your objects into POJOs and access them with dot notation.

In fact, now that I think about it, I wonder if you can rig up your POJOs with Java 8 Optional<> and remove some of the null checking. I'll have to look into that.


That's a good place to start. Basing it on the list monad instead of the option monad gives you traversals with monoidal summaries.


AH! That's the missing piece for me. I thought this was all in JS.


What if them is `undefined`? You'll get an exception when usually you just want `undefined` as the answer.

As a side note, Ember.js has a nice method for this built in, that works on plain old Javascript objects as well as the heavier Ember ones:

Em.get(obj, 'user.them.public_keys.primary.bundle')


try{} ?


Not blowing up when one part of the chain goes "undefined"


Happy to see one of our Keybase API responses used as the example here. I can share our intentions with that API call in the long run, and how it'll end up smaller, which is especially important for mobile devices.

The dictionary describing the user you requested comes back with some high level sub-dictionaries in it: "profile", "basics", "public keys", "pictures", etc., and the API replies, currently, with all the data to which you're entitled. This is quite huge as Tim mentioned in his post.

Server-side we have a module for loading user information, which allows you to request which of these fields you want when loading a "user object". For example, on a certain page of the site we might load a dozen users but only need [user.BASICS, user.PICTURES], so that's the only data that will be loaded.

The API simply doesn't expose this filtering mechanism yet, as calls to the API hit [user.ALL], which is a combo of everything available about a user. So, in essence, all we need to do to trim these down is allow you to pass a filtering parameter.

Btw, going in the other direction, there'll be a way to query for multiple users at once, fattening things back up :-). So for example you could request just basics + pictures for an array of usernames or userids.


Help me out here HN... I remember somewhere seeing an API format where you passed up the empty JSON object which you wanted filled and returned.

Something like:

  (request)
  {username:"",address:"",credits:""}

  (response)  
  {username:"Manilow, Barry", address:"Hollywood Bowl", credits: 99}  
  
Clearly not optimal, but it worked pretty well and was very intuitive.
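The server side of that fill-in-the-template pattern can be sketched in a few lines of Python (hypothetical names; the point is that fields the client didn't ask for are never returned):

```python
def fill_template(template, record):
    """Return a copy of `template` with each requested key filled from
    `record`. Keys absent from the record keep their placeholder value;
    keys not in the template are never exposed."""
    return {k: record.get(k, placeholder)
            for k, placeholder in template.items()}

request = {"username": "", "address": "", "credits": ""}
record = {"username": "Manilow, Barry", "address": "Hollywood Bowl",
          "credits": 99, "password_hash": "..."}
fill_template(request, record)  # password_hash stays server-side
```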


I think you might be looking for this: https://news.ycombinator.com/item?id=7681973


Not it, but it does about the same thing. Thanks!


If you generate your JSON with Java using Jackson, it offers Jackson-Views, a very nifty way to define sets of JSON properties according to use case.

* http://wiki.fasterxml.com/JacksonJsonViews

* http://techtraits.com/programming/2011/08/12/implementing-ja...


Neat! Jackson also has a tree data model that can be used like so:

    user.path("them").path("public_keys").path("primary").path("bundle").textValue(); // null if any node is missing


So we're ending up with JSON with schema and query support.

In a couple of years: XML is the new best thing (as people finally realise that they've needed it all along).


It's easy to argue with strawmen like you do, but query and schema support was never the problem of XML.

The problem was that XML is a markup language, and JSON is an object notation. It's right in the name. And this results in different tradeoffs for both.

The XML schema was designed to create schemas for documents. This is why different types of schemas evolved specifically for services (like SOAP) but having to be slapped on top of a markup language base, it couldn't be good even though it tried. JSON doesn't have that initial complexity and the goal matches the use.

Just like it'd be silly to write a manual in JSON, it'll forever remain silly to serialize generic object structures in XML.

Now I could also argue XML is a mediocre markup language, and show alternatives, but as they say, that's a whole 'nother story.


So how do you serialise and describe a ring buffer object in JSON? How do you provide a decimal type or a complex number? How do you validate this? How do you describe this object to a foreign system? How do you query the object? How do you transform the object from one type to another? How do you transform that object to a document?

You completely misunderstand XML. It's more than an adequate markup language and more than an adequate object format.

XML has few tradeoffs other than complexity. JSON has all tradeoffs but complexity.


I see, so XML has few tradeoffs other than complexity. So I'm sure, given your insistent questions, that XML has native representations for:

- A ring buffer.

- A decimal type.

- A complex number.

No, it doesn't. It's all up to the contract. And there's nothing in XML that makes it more convenient to describe a complicated contract, than JSON (or any other format).

So XML made a tradeoff of complexity, and gained nothing.

Oh, and this is one issue you won't see happen with JSON:

http://blog.detectify.com/post/82370846588/how-we-got-read-a...

Yes, they managed to get full blown read access to Google's servers, including "/etc/passwd" and "/etc/hosts" by passing an XML file and using a standard XML feature.

"[N]aive XML parsers that blindly interpret the DTD of the user supplied XML documents. By doing so, you risk having your parser doing a bunch of nasty things. Some issues include: local file access, SSRF and remote file includes, Denial of Service and possible remote code execution. If you want to know how to patch these issues, check out the OWASP page on how to secure XML parsers in various languages and platforms."

You might want to reevaluate your point about complexity after reading this.

JSON has only two features:

1. Simple.

2. Readable.

The first feature make it possible for your wristwatch to parse JSON with its pin-sized CPU. The second feature makes it possible for you to parse JSON with your pin-sized... Anyway, just kidding. I'm trying to say it's easy to debug.

As for how to describe circular structures and references, and meta-types, you can see what JSON serializers like Jackson do in Java. You'll find that JSON can stretch easily to accommodate such needs.

But again, the problem was never having a format with native representation of everything under the sun.

XML's problem was that its parsers were big, heavy, complicated, poorly understood (as the XXE vulnerability shows). You would never need 90% of what an XML parser supports.

We needed the simplest, dumbest possible format that makes no assumptions about what it is you want to describe in it (except: values and collections), with the simplest, dumbest possible parser (no surprises, no complexity), so that we can then port it everywhere and build upon it as a reliable base.

And while JSON ain't perfect, it's hell of a lot closer to that ideal than XML is.


You're right. It's so much sillier to do this:

    <fruit id="10" name="orange"/>
Than this:

    {"fruit": {"id":10, "name":"orange"}}
because the first one is called a "markup language" and the second one is called an "object notation". I'm not buying it.


It can be done like this too:

    <fruit><id>10</id><name>orange</name></fruit>

Which one is better? I find it hard to decide, and I believe most people do. And that's why we see it mixed, often in the same XML document.


The one you use is better (correct).


Utterly disagree. Attributes can actually have validated contents, such as enumerated lists, etc, and are attractively terse.

Over-reliance on elements is why Maven pom files are such a verbose disaster, and probably the main reason why web developers puke when trying to stream data. Restating element names makes for illegible, bloated data. Attribute-heavy XML is attractively terse and benefits from validation (unlike JSON).


But you can't change an attribute to a composite type in the future easily.

As for maven POMs, I use Netbeans "add dependency" and that's about it so it's a non issue for me.


Far better than either is:

    (fruit (id 10) (name orange))
At least IMHO.


As the joke goes you can write COBOL in any language, and you're the proof.

In JSON a more typical format would be:

    {"id":10,"name":"orange"}
Now why you need an id and a name is another smell, but let's leave that in for the sake of the example.

You don't need "node names" in JSON. Typically objects of type "fruit" will be hosted in an array whose type you're always aware of by the contract of your service.

Sure you can have polymorphism and hint the type in those exceptional cases:

    {"type":"fruit","id":10,"name":"orange"}
But at least you include that if you need it, it signifies intent, and it's not done just to appease a markup language bent out of shape as a serialization format.

By the way, it's curious you chose to use attributes in your XML example. Attributes aren't typically used to serialize object fields. Can you guess why?


You seem to have only one use case, which appears to be serialization, and that isn't nearly the whole of what XML covers. Yeah, I guess if you don't want to use a database for your blog application, JSON serialization is fine.

A lot of us need much more from our data than that. We need validation, we need a machine readable description format, and we need apis to leverage all this that doesn't change every day.

The json community is constantly reinventing the latter. Who cares if it saves a few bytes in transmission, I don't know anyone who grumbles "oh great, the xml is making my internet slow today."


THANK YOU.

I've been raising cain about this in the chef community for some time - node objects can easily be as large as 128kb+ of json, which can consume over 1-2MB once parsed into a ruby json object. An empty search of a system with over a thousand nodes can consume over a gigabyte of ram!

The worst case I experienced this was writing out an /etc/hosts file, for which you only need two fields : name and ip, of each host, but you still get a list of every cpu core, dimm, etc..

Excited to see the potential examples; I might try to work one into chef if I get time. The chef solution has been to have 'whitelisted attributes', which is a whole mess unto itself.
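The /etc/hosts case above can be sketched in Python: given fat node records (field names here are hypothetical), keep only the two fields the file actually needs before doing anything else:

```python
def hosts_lines(nodes, fields=("name", "ip")):
    """From fat node records, keep only the fields /etc/hosts needs,
    then render one line per host."""
    slim = [{f: node[f] for f in fields} for node in nodes]
    return ["{ip}\t{name}".format(**entry) for entry in slim]

nodes = [  # imagine each record also carrying every cpu core, dimm, etc.
    {"name": "web1", "ip": "10.0.0.5", "cpu": {"cores": 8}, "dimms": 4},
    {"name": "web2", "ip": "10.0.0.6", "cpu": {"cores": 8}, "dimms": 4},
]
hosts_lines(nodes)  # -> ["10.0.0.5\tweb1", "10.0.0.6\tweb2"]
```

Doing the slimming server-side (as a `fields` parameter) is what saves the gigabyte of RAM; doing it client-side at least caps the parsed working set.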


Once you have used a parsing library to create generic data structures (in your programming language) from your JSON all of this no longer has anything to do with JSON, right? That's something that confuses me about this blog post. To me it seems that it talks about a very generic issue in very specific terms.


Agreed. In something like JS (or dynamic C#) one could simply access obj.results[0].address.street, which casts the object nesting as useful/self-documenting rather than needlessly complex.


Given that someone uses fat JSON, it seems plausible that you'll have to face those sorts of problems (either simple selectors with logic for potentially dealing with multiple responses, or complex selectors into the tree). What you're really saying in JSON-land is something like, "this whole object should be destructured; I just want a flat object with short keys."

That's the right design approach for the "structs" of JSON; it's wrong unilaterally (JSON also has "hashes" with the same syntax, and they should be separated from that context. Similarly you don't want to destructure an array from {data: [1, 2, 3]} into {data_0: 1, data_1: 2, data_2: 3} unless you absolutely have to.)

Once you flatten it, then partial responses for things which return a struct do exactly what you want; you say e.g.:

    ["myQuery", {on: "stuff", _fields: ["a", "b", "c"]}]
and you just get {"a":1,"b":2,"c":3} as your JSON response.

So what I'm saying in summary is that if you write your own APIs you can get this sort of functionality without building a magic tool; the reason that the magic tool is not mainstream is because it's only right for dealing with structs and not hashes (because if there's a hash elsewhere in the object a user might register their own key in the hash called "ctime"); and given that some API gives you a complex structure, flattening it the way you're doing is potentially a little risky because later updates might say that there's another ctime to some other part of the Users object.


I must be missing something obvious. How is Tim's JWalk example different than doing:

  try { 
     var key = json_object.them.public_keys.primary.bundle;
  } ...
Is it to better allow dynamic keys? More consistent error handling?


Well, it's in Java, for one thing.


So this "fat JSON" problem is Java's problem, not JSON's, then? The above code could be perfectly valid C#, as well as valid JavaScript.

On the other hand, if you're hardcoding assumptions about the JSON structure into your Java (which you're doing even if you pass a string literal to some API to look up a value for you), you could probably employ some lightweight code generation to spin you up some classes with strongly typed public fields named things like 'them' and 'public_keys' to help you write more idiomatic JSON access code.


> So this "fat JSON" problem is Java's problem, not JSON's, then?

Pretty much.

I would expect that Java could offer some sort of solution to this problem, ideally with an accessor syntax similar to what you'd find in Javascript or, as you say, C#. On the other hand, I've never delved deeply enough into Java to know whether it's feasible; I suppose it's possible that the language simply isn't flexible enough to make such a solution workable.


D'oh, thank you.


If path features became common it would likely lead to even more bloated and less thoughtful APIs.


My security senses are tingling. A server evaluating potentially hostile client provided expressions? Proceed with extreme caution.


I know everyone is saying this is an oversimplification and just another person rushing to create a library etc, etc. But the fact of the matter is if you ask any Java developer who has dealt with JSONObject they'd probably want to use this. And that says something.

Why can't I make a contract with my JSON parser. Saying: look, last time there was a 200 OK response all of these fields were there. I promise, they'll be there next time too. I don't need to try { get field } catch {} every single time.

The best way around this currently is to hope that the API has a client library, but that just means that every API maintainer now has to write my Java for me too.

I'm not sure what the solution looks like, but I want a modern Java JSON parser that understands how the API landscape looks today. This extends to other languages that are static typed and have exceptions as well.

Edit: for the record, OP is Tim Bray, who co-edited the XML spec, so he has some experience with traversing documents.


First world problem, man. Important only to the performance obsessed. Most developers concerned about this have too much time and not enough commercial imperative.


As we move intelligence to the client, we're starting to use a lot of data queries which were never meant for the wire. (Particularly in IT applications.)

Having a way to filter in those cases would save having to write & maintain a whole new CRUD layer. In the long run, it could even lead to more efficient queries when objects are composed of many records.


If only there were some kind of structured query language we could invent that allowed us to choose the fields and records we needed to look at. We could get some kind of standards institute to ratify it so everyone used the same interface.

A pipedream, I know. A crazy, wild pipedream.


While we're pipedreaming ....

how about some way to validate streamed data against a schema? Offering such advanced features as ... an array with a single element? Or even crazier, enumerated values.

Maybe even a way to transform streams using a selector-based query syntax. What a world ...


I rolled my own solution to this for a mobile turn-based game I have in development. Most of the time, the device is just polling to see whether there's anything that needs updating. As part of my polling query I pass a signature of the current game state (game-round, with a few other bits). On the server, if that checks out, then the reply is tiny. If there's the need for an update, I send back what's changed and update my views on the client-side. It's definitely not the right solution for every scenario, but I've found it works well for my specific situation.
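This pattern is easy to sketch; assuming the signature is just a hash of a canonical serialization of the game state (function names here are hypothetical):

```python
import hashlib
import json

def state_signature(state):
    # Hash a canonical (sorted-key) serialization of the state
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def poll_response(server_state, client_signature):
    # If the client's signature matches, the reply is tiny
    if state_signature(server_state) == client_signature:
        return {"changed": False}
    # Otherwise, send back what the client needs to update
    return {"changed": True, "state": server_state}
```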


RFC 6901 specifies a simple XPath-esque notation.


I had a similar idea[0] that I haven't actually had time to finish. The README.md kind of explains where I was thinking of going philosophically. It's a bit out-of-reach with my current workload but I'd love to contribute with others that could tackle the areas I find difficult.

[0] https://github.com/sebinsua/jstruct


The Open Data Protocol (OData) specifies ways to declare server-side filters and to restrict the fields sent in the response.


This is the typical nice-to-have feature. It takes time to implement, adds server-side overhead, adds complexity, but adds very little value. OP is arguing that it costs him resources to traverse the entire JSON. There are way more clients than servers, thus increasing server load to make it cheaper for the clients sounds weird to me.


> There are way more clients than servers, thus increasing server load to make it cheaper for the clients sounds weird to me.

It matters if your client is a 1 GHz phone with a poor cellular connection. The user can do the filtering on their end, but you're paying for that offloading in terms of a worse experience for the user because of the larger download.


JSONSelect lets you do queries on JSON objects using CSS-style selectors. There's an interactive demo here:

http://jsonselect.org/

The code is on github:

https://github.com/lloyd/JSONSelect


http://agave.js includes (prefix)getPath by default on any object.

    var mockObject = {
      foo: 'bar',
      baz: {
        bam:'boo',
        zar:{
          zog:'something useful'
        }
      }
    }
So:

    mockObject.getPath('/baz/zar/zog')
or, alternatively:

    mockObject.getPath(['baz','zar','zog'])
will return:

    'something useful'
It's also got a bunch of other useful stuff, like 'kind' (the closest prototype of an object, which works consistently everywhere), number methods like (2).weeks().ago(), and it reads more cleanly than underscore as it uses actual methods.


The main problem with string-based paths is that JSON keys are arbitrary strings, so an object with slashes embedded in its keys is perfectly valid. Hence, ambiguity.


Yep, that's one reason for the array version.
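To make the ambiguity concrete (JSON Pointer sidesteps it by escaping '/' as '~1', but a naive slash-joined scheme can't tell these shapes apart, while the array form can):

```python
flat = {"a/b": 1}
nested = {"a": {"b": 1}}

def get_by_string(obj, path):
    # Naive slash-split lookup: the path "a/b" can only mean nested keys,
    # so a literal "a/b" key is unreachable
    for key in path.strip("/").split("/"):
        obj = obj[key]
    return obj

def get_by_list(obj, keys):
    # Array form: each element is one literal key, slashes and all
    for key in keys:
        obj = obj[key]
    return obj
```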


Why would you use mockObject.getPath('/baz/zar/zog') over mockObject.baz.zar.zog? Unless getPath is null safe and you expect that sometimes part of the chain doesn't exist, but the docs for getPath don't indicate that as a feature.


> Unless getPath is null safe and you expect that sometimes part of the chain doesn't exist

Yes, that's exactly why.

> the docs for getPath don't indicate that as a feature.

The docs currently state 'If any of the keys are missing, return undefined.' Perhaps they should be more explicit? If so I'm happy to take suggestions / pull requests.


I must have missed that in the docs.


Letting clients choose the fields they want to receive leads to the awkward realization that any generic, flexible and future-proof mechanism for doing this leads to system where a client can suck down an entire website with a single request.

(If clients can choose not to receive some information, then by symmetry, they should also be able to choose to receive some additional information that's not included by default. And since everything is linked to everything else (orders are linked to users, etc.), you end up with a single resource that potentially embeds everything else.)

These systems also break caching, of course, and to some extent the principle that within-server links are indistinguishable from cross-server links. The web is not optimized for performance or file size.


That's really a non-issue though. Just because they can decide to receive less information, it doesn't follow that they should be allowed to receive more. I don't see why this particular type of symmetry would be important or desirable.

The way I've implemented this in the past is to start with a standard API endpoint with a defined data-set that it returns. Then I allow the client to select to receive only a subset of those fields, or make no selection and receive all of the defined fields. The client cannot request fields that are not part of the data-set defined by that endpoint. This is at least as easy to work with going forward as an inflexible return object. The API can be updated to include more data without affecting clients who don't care about that. If there is a breaking change, then that requires a new version, just as it would normally.
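A minimal sketch of that pattern (the endpoint's field set, names, and error handling below are all illustrative):

```python
# The defined data-set for this endpoint; clients can only narrow it
ALLOWED_FIELDS = {"basics", "pictures", "profile", "public_keys"}

def select_fields(record, fields_param=None):
    # No selection: return every defined field present on the record
    if not fields_param:
        return {k: record[k] for k in ALLOWED_FIELDS if k in record}
    requested = set(fields_param.split(","))
    unknown = requested - ALLOWED_FIELDS
    if unknown:
        # Requesting fields outside the endpoint's data-set is an error
        raise ValueError("unknown fields: " + ", ".join(sorted(unknown)))
    return {k: record[k] for k in requested if k in record}
```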


I don't think this breaks caching. The 'fields' param is part of the query string and should be used as part of any key used for the cache.

I'll contend it does make retrieving from the cache less likely to occur since different clients may have different values for 'fields'. That said, I've anecdotally found that using a reasonably good default value for 'fields' (with associated reasonably small JSON body) means most clients end up not having to send 'fields' anyways. This makes the cache hit rate stay high.
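For instance, a cache key that canonicalizes the query string (including 'fields') keeps each projection cached separately while equivalent requests still collide on the same entry. A rough sketch:

```python
def cache_key(path, params):
    # Sort params so equivalent requests map to the same key,
    # and include 'fields' so different projections don't share an entry
    canonical = "&".join("%s=%s" % (k, params[k]) for k in sorted(params))
    return "%s?%s" % (path, canonical)
```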


One of my pet theories when I developed intercooler.js (http://intercoolerjs.org/) was that, by targeting specific UI elements with only the data necessary, you might actually cut down on the amount of data transfer between the client and server when compared with some general JSON APIs, despite the fact that HTML is a less efficient data format.

I'd expect this to hold, in particular, in areas where JSON isn't a particularly efficient encoding mechanism (e.g. tables).

It's an interesting thing to consider, at least.


This is definitely a problem. I work with an existing API and am building a mobile client based around it, and while there's limited support for selecting which fields you bring down, it doesn't work for nested objects.

This results in a bloated response, which on mobile is a real problem for responsiveness and data costs.

We considered building a proxy which would form responses tailored for the mobile client, but that felt like a hack. Mind you, so does Google's partial response solution.

Maybe some services could allow you to build your own response, creating a custom version of an API just for yourself?


Weird I also made a small dot-lens javascript library to do the same thing just the other day...

https://github.com/jb55/dot-lens

It even works for zooming into arrays


As I understand it, there are a lot of places where overhead creeps in, which tends to make this sort of thing vastly more efficient than making multiple calls. Sure, you'll maybe send data people don't want this time, but it will probably save the processing and networking overhead that would be spent building and sending a long list of the fields they want. When designing an API I tend to lean toward being a bit more verbose than I need to be, if only to save the HTTP overhead of another request. At least until HTTP/2 helps us out there.


Use JSONPath, it's a JSON version of XPath and it's awesome.

http://goessner.net/articles/JsonPath/


It is important to note that when you accept a list of fields to output, you must validate those field names as well.

Sometimes people have a giant object from the database and return only a subset of it. But someone may make the mistake of iterating over that object to return whatever fields were requested:

    if options:
        return {key: object[key] for key in options}
    else:
        return safe_output_for_this_api(object)
So collapse that logic into safe_output_for_this_api instead :D
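A hedged sketch of what that collapsed helper might look like (the whitelist and names are illustrative): intersecting the request with a safe set means a client can narrow the output but never widen it into private fields.

```python
SAFE_FIELDS = {"basics", "pictures", "profile"}

def safe_output_for_this_api(obj, requested=None):
    # Intersect with the whitelist: clients can narrow the output, never widen it
    allowed = SAFE_FIELDS if requested is None else set(requested) & SAFE_FIELDS
    return {k: obj[k] for k in allowed if k in obj}
```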


I do this on a current project. Certain fields are rarely needed and turned off by default; others are usually needed and turned on by default. It's not done via a special syntax though, just query parameters, so something like /person/123?bio=false&salary=true. A standard path syntax might be nice, but for handcrafted APIs this works well.

The front-end models are reusable and don't really need to care so long as the properties they need are available.


The Play framework has JsPath, which I like quite a bit. It allows for traversals of the kind I assume Tim would want to do with XPath.

http://www.playframework.com/documentation/2.1.1/api/scala/i...


That sort of syntax reminds me of Objective C's key paths.

  [object valueForKeyPath:@"them.public_keys.primary.bundle"];
which is part of the larger, more in-depth Key-Value Coding interface. It was definitely something I missed going back to PHP, JavaScript, and other languages after using Objective-C.


We built a jackson extension to offer this sort of friendly filtering automatically with JAX-RS endpoints: https://github.com/HubSpot/jackson-jaxrs-propertyfiltering


I'm a little late to the party, but I recently wrote something that can help with this problem. It has the added benefit of acting as a kind of reverse proxy:

http://github.com/alexose/sieve


Couldn't a "possibly null" member access operator help here? E.g.:

    var key = user?.them?.public_keys?.primary?.bundle;
where

    object?.property
is equivalent to

    object ? object.property : null


I'd rather just have functions similar to clojure/clojurescript's get-in[0], update-in[1], assoc-in[2], dissoc-in[3] for working with nested objects/arrays. These are pretty easy to write in Python, Ruby, and Javascript, although I'm not aware of any public library that implements them. It really makes working with JSON-returning APIs a breeze.

[0] - http://clojuredocs.org/clojure_core/clojure.core/get-in

[1] - http://clojuredocs.org/clojure_core/clojure.core/update-in

[2] - http://clojuredocs.org/clojure_core/clojure.core/assoc-in

[3] - http://clojuredocs.org/clojure_core/clojure.core/dissoc-in
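For what it's worth, get-in is only a few lines in Python (a rough sketch; the sample object mimics the Keybase response from the article):

```python
def get_in(obj, path, default=None):
    # Walk nested dicts/lists along path; bail out with default on any miss
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

user = {"them": {"public_keys": {"primary": {"bundle": "key-data"}}}}
get_in(user, ["them", "public_keys", "primary", "bundle"])  # "key-data"
get_in(user, ["them", "missing", "anything"])               # None
```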


The author's JWalk is basically get-in. Lenses, as popularised by Haskell, solve a similar problem.



underscore.js does this with a pick function

_.pick( {a:1,b:2}, "a" );

underscore is cool.


If you think underscore is cool, you'll go bananas over lodash.


Yet another single-letter-namespaced project to collide with other single-letter-namespaced projects.


lodash is a replacement for underscore, so it's definitely not as bad as all those '$' libraries. Just make sure lodash gets loaded after underscore, since it's a superset (and faster)


thanks for the suggestion i shall look it up!



