Introducing Google Cloud Natural Language API, Speech API and New Data Center (googleblog.com)
224 points by ppoutonnet on July 20, 2016 | hide | past | favorite | 49 comments


Accuracy still seems to leave a fair bit to be desired. For example, when parsing "Blue, Brown, Orange, Green and Red Lines were running normally" from this news article [1], "Blue" was interpreted as the color [2], "Brown" was interpreted as the Brown Bears football team [3], "Orange" was interpreted as the Orange Line Washington (not Chicago) Metro [4], "Green" was interpreted as "environmentally friendly" [5], and "Red lines" was interpreted as the organization (but without a Wikipedia page).

Works pretty well on places, though.

[1] http://chicago.cbslocal.com/2011/02/02/dangerous-blizzards-w...

[2] http://en.wikipedia.org/wiki/Blue

[3] http://en.wikipedia.org/wiki/Brown_Bears_football

[4] http://en.wikipedia.org/wiki/Orange_Line_(Washington_Metro)

[5] http://en.wikipedia.org/wiki/Environmentally_friendly


Google Research PM here -- my team built the language understanding tech that powers the API. Thanks for checking it out!

You picked a really interesting, and really hard, sentence to test us with. It has a couple of interesting phenomena: a reduced conjunction ("Line" goes with each color to make a name, like "Blue Line", even though "Blue" and "Line" are far apart), and high ambiguity ("Green" could be the color, the environmental movement, the political party, one of several people, or lots of other things: https://en.wikipedia.org/wiki/Green_(disambiguation) ).

These are hard! So hard, in fact, that I'm reasonably sure there's no system in existence that would get these right. (I hope I'm wrong, actually; I'd love to see approaches that can solve problems like this in general.)

Our systems are state-of-the-art, or in some cases better than any other published system. But language is really hard, and even the world's best systems are way worse than any human at understanding it. That's what makes working on this stuff so much fun and so challenging. It feels so easy for us as humans, but we just haven't figured out how to model all of this well enough for computers to do the same.

(I'm going to steal this sentence to use internally as a great "NLP is hard" example, thanks!)


>"Green" could be the color, the environmental movement, the political party, one of several people, or lots of other things

It says 'train line' right in that sentence. Where is the ambiguity?

Even if it missed that, why didn't any of 'cta', 'chicago', 'train', 'run' or 'line' nudge it in the right direction? It seems to have identified the 'cta' entity correctly but completely ignored that context for the next words in the sentence.


It says 'train line' right in that sentence. Where is the ambiguity?

That's the thing -- in your (wetware) mind it says "train line." But the sentence itself just says "line", which can mean a whole bunch of things.


It does say "'L' trains" in the previous sentence and "trains" in the following sentence. And from bduerst's post, it appears that GCNLP does use non-local information to determine an interpretation. So in theory, there should be enough information within the paragraph to identify these.

The tricky part is that I don't really see a way that a computer could disambiguate between "train line" and "bus line" given the information in that paragraph; the only way that you could do this is to have knowledge of Chicago's transit system. (Indeed, as a human, I didn't know for sure that it was talking about a train line rather than some other mass transit system until Googling.)


But the sentence itself just says "line", which can mean a whole bunch of things.

Ok, I changed the sentence to include 'train lines':

CTA buses were moving, but slowly, and ‘L’ train lines were operating, but with delays. Blue, Brown, Orange, Green and Red Lines were running normally.

Even this is no good:

CTA buses were moving, but slowly, and ‘L’ train lines were operating, but with delays. Green train line was operating normally.

That didn't make any difference. Were you able to get it to parse correctly somehow?


Now we have a different problem -- in that the new examples aren't idiomatic. There's no reason we should expect it to identify the phrases 'L' train line and 'L' line with equal accuracy, because the former is something basically nobody says -- while the latter is something everyone (even someone unfamiliar with that particular city) understands.

But hey -- I'm not defending the API's accuracy ;)


Right, this isn't ambiguous to people. But you know that CTA and Green Line are related concepts. We don't yet have a way to model all of that huge amount of common sense knowledge that people have and use to figure out what a sentence means.

It's the curse of NLP, really. All the easy things are hard. (And the hard things are nigh impossible.)


As someone who's been adding schema.org to organization location pages to try and fix a possible Google Maps NLP bug[0], I can say that if Google or others ever improved their testing/analytics tools, they'd get much more adoption internet-wide on this kind of stuff. Particularly if it showed up prominently with some NLP or Google-fu inside the Developer Tools console.

I mean, if Google My Business can show me what hours and info it pulled out of my site, why can't it also suggest an editor and code snippet to use to embed that as Schema.org LD+JSON? Sorry for going off on a tangent, but it's been days since I added LD+JSON structured data and I only just yesterday learned that some parts of Google (but not the testing tool) only recognize LD+JSON inside the head of a page. There's no immediate feedback from Google whether the data I added is actually useful or not to any part of the Borg.

I think if you wanted better data, as an entity, you could easily push the engineers behind websites to give it to you. It starts with the tooling and encouragement, though.

In this specific example, I bet Google Maps via GTFS knows plenty. Now if only GTFS could be updated to use webpages and some form of Schema.org, we'd have a standard for knowledge, right? ;-) [see Footnote 3]

[0]: I'm adding the data to try and resolve a Google Maps bug where searches for "Toronto Public Library" instantly feature only one of the library's 100 branches, the one closest to Toronto City Hall. Examples[1][2]. I'm now beginning to suspect NLP, since when I search for "Toronto Public Library near me" it works as intended, but when I do just "Toronto Public Library", I think it wants to find libraries closest to Toronto, and picks the City Hall Branch automatically. It's also linked, strangely, to the Wikipedia entry on the Toronto Reference Library, a different branch entirely. My Schema.org markup is an attempt at disambiguation using parentOrganization and subOrganization references, but if the problem is the name Toronto Public Library and how Google chooses to interpret that, then there's not much I can do to change it, can I?

[1]: https://www.google.ca/maps/?q=Toronto+Public+Library

[2]: https://www.google.ca/maps/?q=Toronto+Public+Library+near+No...

[3]: GTFS used to (maybe not now, but at least when I last had to read it) require mapping transit-agency specifics to general Maps fields; so, for example, the Heading data would lack semantic meaning but looked good in Google Maps if you put the route in it, etc.
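For reference, a minimal sketch of the kind of LD+JSON disambiguation markup described in [0]. The branch name and exact property choices here are placeholders, not the library's actual markup:

```json
{
  "@context": "http://schema.org",
  "@type": "Library",
  "name": "City Hall Branch",
  "parentOrganization": {
    "@type": "Organization",
    "name": "Toronto Public Library",
    "sameAs": "https://en.wikipedia.org/wiki/Toronto_Public_Library"
  }
}
```

The sameAs link to Wikipedia is one common way to steer entity disambiguation, though there's no guarantee Google's side actually uses it that way.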


But not even the proposed example does a great job regarding entity salience.

> Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones.

Results in:

Google - Organization, salience: 0.27

Mountain View - Location, salience: 0.10

Sundar Pichai - Person, salience: 0.07

CES - Event, salience: 0.07

Android - Consumer Good, salience: 0.07

So "Android" which is a central piece in the announcement ranks lower than "Mountain View", which is mostly irrelevant.


Does the technology behind this system do any classical parsing work? Or is it mostly seeing implicit structure via embeddings?


Okay, but "this is really hard" is not a good argument when ("beta" notwithstanding) this is being pitched pretty heavily by your firm in this public announcement. None of my professors or bosses ever accepted "this is really hard" as a legitimate answer to the tasks I was given if I myself had pitched my ability to perform them. A firm like Google, with all its towering resources, is going to have a hard time getting cred with excuses.


You know nothing about NLP. Also there is a difference between good and good enough.


I know everything there is to know about the opportunistic business practise of using the public to beta test an incomplete product, for free, and then making loads of money. Transparent.

You clearly know nothing about business ethics, nor the concept of quality.


Let's ask a rhetorical question: Who'd use this kind of information?

My answer: it's mostly search/listing apps. The biggest merit of tagging/parsing for now is that you can search for sentences that might refer to some specific subway line. Toward this goal, you don't need an exact interpretation of each phrase. What you'd need is a high-level function that asks whether "sentence X".refers_to("Blue Line"), and the API should be designed with this kind of goal in mind.

I think the current premise that you can/should get an exact parse/tagging for a given text is kinda misleading, because interpretation of natural language is fundamentally vague. IMO, we should go after performance on these high-level tasks (e.g. search or translation) rather than microscopic accuracy. It sucks that we wouldn't be able to parse the text once and for all and cache a canonical representation for future use, and we'd always have to run a matching API of some sort. But I think that's kinda the nature of NLP.
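As a toy illustration of that high-level function: refers_to, the alias table, and the substring matching below are all hypothetical; a real implementation would lean on an entity-linking API rather than string aliases.

```python
# Toy sketch of a high-level "does sentence X refer to entity Y?" call.
def refers_to(sentence, entity, aliases):
    """Return True if the sentence plausibly mentions the entity."""
    text = sentence.lower()
    candidates = [entity.lower()] + [a.lower() for a in aliases.get(entity, [])]
    return any(alias in text for alias in candidates)

ALIASES = {"Blue Line": ["blue line", "blue"]}  # toy alias table

sentence = "Blue, Brown, Orange, Green and Red Lines were running normally"
print(refers_to(sentence, "Blue Line", ALIASES))             # True
print(refers_to("Metra was delayed", "Blue Line", ALIASES))  # False
```

The point is that the caller never sees a parse tree or a Wikipedia link; the vagueness stays hidden behind a yes/no (or scored) answer.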


Hmm, which NLP request did you run it as?

It seemed to run better when I ran it through analyzeEntities: http://pastebin.com/raw/AfXvKAzw
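For anyone else trying this, here's roughly what an analyzeEntities request looked like against the v1beta1 REST endpoint. The URL, payload fields, and auth-by-API-key follow the beta docs of the time, so double-check them against the current reference before relying on this:

```python
# Rough shape of an analyzeEntities call against the v1beta1 REST endpoint.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://language.googleapis.com/v1beta1/"
       "documents:analyzeEntities?key=" + API_KEY)

payload = {
    "document": {
        "type": "PLAIN_TEXT",
        "content": ("Blue, Brown, Orange, Green and Red Lines "
                    "were running normally"),
    },
    "encodingType": "UTF8",
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```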


It was in context - I pasted the whole text of the CBS Local article I linked to into the "Try it out!" text field on the product home page, and these were the entities it identified.


Free till August 1st (with request limit) and then there'll be a free tier. Pricing is here: https://cloud.google.com/natural-language/pricing


luis.ai is 100k requests free per month vs Google's 5k.


Regarding non-streaming transcription: it doesn't seem like the response includes timings, which limits its usefulness in transcription for accessibility (where, for example, a post-processing step on our side would consume the Cloud Speech API and build SRT subtitle files for class recordings).

Is there a product / service / library out there that could help align a known-good transcript with the original audio (creating the timings) if the Cloud Speech API doesn't plan to return timings?
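If the timings ever do show up (or come out of a separate forced-alignment step), the SRT-building part is straightforward. A sketch, where the (word, start, end) tuples are hypothetical aligner output, not anything the Speech API returns:

```python
# Build SRT subtitle cues from per-word timings (in seconds).
def to_srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_words=7):
    """Group aligned (word, start, end) tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n"
                    f"{to_srt_time(start)} --> {to_srt_time(end)}\n"
                    f"{text}\n")
    return "\n".join(cues)

words = [("CTA", 0.0, 0.4), ("buses", 0.4, 0.9), ("were", 0.9, 1.1),
         ("moving", 1.1, 1.6)]
print(words_to_srt(words))
```

A real pipeline would also break cues at sentence boundaries and cap line length, but the missing ingredient is the timings themselves, not the formatting.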


Yeah, that was a bummer to see in the alpha, and I was holding out hope that they'd expose that data at release (as IBM Watson does), but maybe they want to limit the ability of competitors to easily appropriate the sound-to-transcribed-word data.


Thanks for mentioning Watson; I keep meaning to go back to it and see how limits have improved.


I'm currently developing a system along those lines. Not publicly available quite yet, but should be soon. Bug fixing at the moment.


Why would you need the Speech API if you already have a known-good transcript?


I'm sorry if I was unclear. If I used the Cloud Speech API to produce the transcript, is there another library/service/etc. that would align it to the original audio again?


Watson does this, VoiceBase does this, and I believe HP and MS do too... it's very common for speech APIs to provide time-offset information for each word.


Speechmatics sells this product. My company, CastingWords, uses it to align our human produced transcripts, and make SRTs. It works well.


You can use Speechmatics directly


Similar offering suite from Microsoft,

https://www.microsoft.com/cognitive-services/


I wonder how they compare.


AFAICT, the MS offering looks pretty basic by comparison [0]. No named entity recognition or sentiment analysis.

[0] https://www.microsoft.com/cognitive-services/en-us/linguisti...


Through a combination of Cognitive Services APIs, sentiment analysis and entity recognition are also supported.

https://www.microsoft.com/cognitive-services/en-us/text-anal...


Open Calais has been around for quite some time.


Is this technology good enough to be able to have a relatively intelligent 'conversation' with a real person?


On the other side of the NLP spectrum, what players out there are involved in NLG (natural language generation)? I've seen Arria and a couple other small projects, but I'm just starting to investigate the space and haven't found a whole lot yet.


We are working on pre-processing a text corpus to generate assumptions and presuppositions using NLG.

Our initial aim is to index all of Wikipedia and have pre-generated NLG for every page, as additional info for better searches.

Right now we are using Lucene & SyntaxNet, but I haven't found a good library for hierarchical clustering of text.

Check us out at www.shoten.xyz


Check out Wordsmith by automatedinsights.com - it does NLG for Yahoo! Fantasy Football, among other places.


I'm curious to see whether clients actually want to move very large datasets through these APIs, or whether that's too costly. It strikes me that cloud services work best for data generated and managed in the same cloud...


I've been moving into the (google) cloud and one of the main benefits is that I no longer have to down- and (especially) upload the data I work with, but simply remote-control a pipeline with a much larger pipe.

I guess for data that is generated on site it's a net negative, but I'd guess almost all data has to go across the internet at some point and google is probably not the worst warehouse on that highway.


So about $1.44 per hour, that's reasonable-ish.


We run Apache OpenNLP, which gives comparable results to this service. The advantage of Google Natural Language right now is the Wikipedia link it provides for the entities it detects, but I haven't seen its results beat OpenNLP.


He's talking about the speech recognition. Google's speech recognition is far ahead of the competition, and also slightly cheaper (except for Baidu's which is free, but good luck getting it to work).


So how long until these new services get summarily discontinued?


Between negative comments about Google's service lifecycle, choice of font/colors on a page and general nitpicking of every little detail on a linked page or someone's comment I'm getting fed up. It's gratuitously negative and repetitive to see this kind of stuff every day. If I wanted that kind of negativity and snideness I'd be using reddit.

Maybe some kind of "meta" comment section is the solution so we can have discussions about or derived from the content in one section and the folks that want to bash the company/person/font/colors/title/grammar/credentials/etc. can use the other.


You can read the terms of service here: https://cloud.google.com/terms/

They include a deprecation policy.


Let me guess, still upset about Google Reader?


Enterprise should be treated differently if they ever want to tackle AWS.


Seriously, there's no way I would use Google APIs for anything production at this point.


Well, if you want uptime that's better than AWS then I would.



