
Here is the working paper [1] if anyone is more interested in the details. The gist of it is that there are two ways to get insider transaction reports from EDGAR. One way is to scrape the EDGAR website (polling); the other is to sign up for a push service [2] where the filings are sent to you as soon as they hit EDGAR. You have to pay for the latter service. The paper calls the scraping solution "public" even though both solutions are open to the public (though you have to pay for the push service). According to the paper, it turns out (maybe not so shockingly) that the push service is better than the scraping/polling solution about half the time. The paper is written by economists and not technologists, and so there is no mention of CDNs or web caches, i.e. the types of things most developers would suspect as being the most obvious source of the discrepancy.

1. http://online.wsj.com/public/resources/documents/SECDissemin...

2. http://www.sec.gov/info/edgar/ednews/dissemin.htm

edit: There is a WSJ article suggesting the price for the push feed is around $1500 / month.

http://online.wsj.com/articles/fast-traders-are-getting-data...



Ah thanks for the link to the working paper, I had been looking for it. This jumps out from the footnote on page 7:

"As far as we know, this time is not available on any publicly available database. We initially obtained these times using real time “scrapes” of the SEC EDGAR site. We subsequently used a collection of these times made available by the Tier 1 subscriber. Given that this entity’s business model depends, at least in part, on obtaining and disseminating these filings in the most timely manner possible, they have strong incentives to collect accurate information about when filings become available on the SEC website."

Basically, clock skew is not accounted for at all and timings are potentially done from disparate and non-common observation points. It's quite possible that the delays they observe are strictly due to noise.


Noise from clock skew and timings of half a second? They make a point that their information source for timing is from a subscriber which appears to trade on the information, so timing is probably accurate within 100ms. It sounds like this information can be used for a variant of a "news" trade, which can be extremely latency-sensitive.


Clock skew in that the clocks used by the participant may or may not have been synchronized. There are, effectively, no details about how the timing is done on the news side, and there is an allusion to some use of EDGAR posting timestamps, which come from an entirely different clock.

Even more concerning is that they map news events to market data using non-synchronized market data timestamps. It appears to be TAQ data. The timing aspect of the entire article is very poorly described, yet it is fundamental to the claim they are making.
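
To make the skew concern concrete, here is a minimal sketch. It assumes each clock reports true time plus a fixed offset; the function name and all numbers are hypothetical and none come from the paper.

```python
def observed_gap(true_gap_s, push_clock_offset_s, web_clock_offset_s):
    """Measured delay between a push timestamp and a website timestamp.

    Assumption: each clock reports true_time + offset. The measured gap is
    then the true gap plus the difference between the two clock offsets.
    """
    return true_gap_s + (web_clock_offset_s - push_clock_offset_s)


# Hypothetical case: both events happen at the same instant, but the clock
# stamping the push feed runs 0.2 s slow and the clock stamping the website
# runs 0.3 s fast. The measurement still reports a sizeable apparent delay.
apparent = observed_gap(0.0, push_clock_offset_s=-0.2, web_clock_offset_s=0.3)
print(f"apparent delay with zero true delay: {apparent:.3f} s")
```

If the two offsets are unknown, a measured gap of the same order as the offsets cannot be attributed to the dissemination path alone.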


>You have to pay for the latter service.

There are no public prices and you need to email some company for more details. With most businesses this means you contact a salesperson who will play an extended game of "how much you got?" and the pricing AND the level of service provided will vary widely depending upon your negotiating power.

The dearth of information about this 'product' sold by a 3rd party suggests something shady is probably going on.

>The paper is written by economists and not technologists, and so there is no mention of CDNs or web caches

Caching is a pretty poor excuse. Caches can be invalidated. And CDN basically means caching by another name.

It makes much more sense (both economically and technically) that they simply crippled the free feed. Especially since it is provided by a private third party who makes bank on the people who purchase the premium edition.


You're just making allegations. It says in the contract that the subscription price is based upon how many subscribers subscribe to the service (they need to cover their costs of implementing 24/7 support and a help desk). It probably isn't public, but Google-fu turns up some older information:

http://contracts.onecle.com/edgar-online/trw.svc.1998.09.11....

    A.   Broadcast Service Subscription Charge
    This charge will be set on October 1, 1998 based on the number of signed
    contracts received by that date. The table below contains a 14-month price at
    various subscription levels:

    SUBSCRIBERS AS OF
    OCTOBER 1, 1998                PRICE
                          
    1 to 8                        $152,172
    9 to 15                         92,967
    16 to 25                        55,346
    25+                             42,179


This table strongly suggests that there's something like a "slice of pie" pricing scheme in the government's contract with the service provider. That's a very common arrangement where the per-subscriber pricing is very high for few subscribers and drops as more are added.

At the limit, the slice of pie pricing is completely continuous and each incremental subscriber lowers the cost for everyone. This one may be coarser grained.

Nothing inherently shady or anticompetitive about such a scheme. It also tends to arise where governments (like SEC, or like county clerk offices) have a dual mission of providing public access while offsetting costs, and need/want to outsource it to a for profit service provider.
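
A rough sketch of the two pricing shapes, for illustration only. The tier figures are copied from the 1998 contract excerpt quoted above and read as per-subscriber prices, which is an interpretation. The excerpt lists both "16 to 25" and "25+", so treating 25 as part of the "16 to 25" tier is an assumption, and the total cost used in the continuous version is a made-up number.

```python
# 14-month prices from the quoted 1998 contract excerpt.
# Assumption: 25 subscribers falls in the "16 to 25" tier, so "25+" starts at 26.
TIERS = [
    (1, 8, 152_172),
    (9, 15, 92_967),
    (16, 25, 55_346),
    (26, None, 42_179),
]


def tier_price(subscribers):
    """Coarse-grained version: look up the price for a subscriber count."""
    if subscribers < 1:
        raise ValueError("need at least one subscriber")
    for low, high, price in TIERS:
        if subscribers >= low and (high is None or subscribers <= high):
            return price
    raise ValueError("no tier matched")


def continuous_price(total_cost, subscribers):
    """Limiting case: a fixed cost split evenly, so each new subscriber
    lowers the price for everyone."""
    if subscribers < 1:
        raise ValueError("need at least one subscriber")
    return total_cost / subscribers


HYPOTHETICAL_TOTAL_COST = 1_000_000  # illustrative only, not from the contract
for n in (5, 10, 20, 30):
    print(n, tier_price(n), round(continuous_price(HYPOTHETICAL_TOTAL_COST, n)))
```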


$150,000 = you are paying for early access to insider data. You are NOT paying for a fucking help desk at that price.

Edit: downvoting does not change this fact either.


According to the WSJ article on this, the current price is around $1500 per month. [1]

Also, note that you are paying to receive an electronic feed of a company's public SEC filings, which include filings detailing purchase and sale of stock by the company's employees, which are called "insiders" in this context.

The phrase "insider data" conjures images of insider trading, which relies on trading on material non-public information and is something quite different.

[1] http://online.wsj.com/articles/fast-traders-are-getting-data... (paywall, Google the title to bypass)


You are being downvoted for picking a number out of context and for using unnecessarily provocative language.


"It makes much more sense (both economically and technically) that they simply crippled the free feed. Especially since it is provided by a private third party who makes bank on the people who purchase the premium edition." Not sure this makes sense at all - if the push version is better half the time, that means the other half the time the free or non-push version is better. Since "better" is a matter of seconds here, it could be something as simple as "half the time, push notifications go out before the site finishes replicating".


The SEC website explicitly states: "The subscription price is set annually by Attain, LLC, using a weighted average methodology based on the number of primary feeds for subscribers at the beginning of the each year. Subscriber organizations can be invoiced annually or monthly."

There are two very detailed documents on the linked website that have both business and technical details along with specifications for the service.

There is nothing shady going on here.


No published prices + a seemingly HIGHLY profitable portion of a government service provided by a private company = something pretty shady.

Even the contracts to run private prisons don't give this much opportunity to gouge.

Were this service run on an at-cost basis, it would maybe cost $80 or so per month. $200 at most.


"HIGHLY profitable"? Even at the highest price quoted on this thread, this looks like a rounding error --- in the sense that if you were a startup providing this information at these prices to its total addressable market, you'd have a hard time even getting funded.

The number of firms that can profit from trading off realtime access to fundamentals is not large. It's a market where viable real products need to have customer lifetime values in the many millions.


I'm wondering if data could be distributed ahead of time in an encrypted fashion and then only a key needs to be published/pushed, which is a far more trivial thing to get out to many people simultaneously.


I like this idea.


I think it's very unfortunate that they offer two different data sources which provide an unfair advantage to paying subscribers. Then again, I don't think there's a single serious investor out there who attempts to gain an advantage by scraping the crummy EDGAR site. Regardless, I guess this is just what you get when you outsource a public service to a private company like EDGAR Online.


I don't think EDGAR is outsourced. I think EDGAR is a project run by the SEC. There appears to be a separate company called EDGAR Online which is all about taking data from the SEC's EDGAR and making it more palatable for institutional investors. The only outsourced component here is the EDGAR Dissemination System (the subscription push feed of EDGAR filings), but that is outsourced to a company called Attain, LLC.


Distributing a digital document in a way where everyone in the world gets it at very close to exactly the same time sounds like an incredibly difficult problem.


Does it have to be? I'm sure we can come up with something.

Off the top of my head - distribute encrypted blob ahead of time, then broadcast the key using longwave radio from the South Pole.


I'm not sure that lowers the bar out of "incredibly difficult".

For instance, how would you feel if you heard the SEC invested in a bespoke technology for doing this just to mitigate a document timing issue only interesting to a very small number of market participants? I'd be upset about the waste of my tax dollars personally.


It doesn't have to be an SEC-specific thing. We can make it a general service: it publishes a public key + "release time" well in advance, then publishes the matching private key at the exact "release time". This allows anyone to release information at the exact specified time.

Sort of like GPS - maintained by one party, useful for the whole world - but much simpler to implement.
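
A toy sketch of the distribute-first, reveal-the-key-later idea, using only the Python standard library. To keep it short it swaps in a symmetric key plus a hash commitment instead of a public/private key pair, and the SHA-256 counter keystream is for illustration only, not a vetted cipher. The names `seal` and `open_blob` are hypothetical.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Illustrative keystream: SHA-256 over key + counter. Not for production."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(document: bytes):
    """Publisher side, done well before release time.

    Returns (blob, commitment, key). The blob and commitment are distributed
    early; the key is held back until the release time.
    """
    key = secrets.token_bytes(32)
    blob = bytes(a ^ b for a, b in zip(document, _keystream(key, len(document))))
    commitment = hashlib.sha256(key).hexdigest()
    return blob, commitment, key


def open_blob(blob: bytes, commitment: str, key: bytes) -> bytes:
    """Recipient side, once the small key has been broadcast."""
    if hashlib.sha256(key).hexdigest() != commitment:
        raise ValueError("key does not match the commitment published in advance")
    return bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob))))


blob, commitment, key = seal(b"Form 4: insider purchase")
# ... blob and commitment go out hours early; only the 32-byte key is timed ...
print(open_blob(blob, commitment, key))
```

The point of the design is that the time-critical message shrinks to a fixed 32 bytes regardless of document size, which is much easier to broadcast to everyone at once.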


I like this idea but you're not thinking big enough. The key must be transmitted from space to avoid giving first mover advantage to those sneaky penguins.

This is SpaceX's killer app.


Penguins might get a leg up on the information, but they won't be able to reach the New York Stock Exchange before those who live right next to it, so it all evens out.

The only creatures who are disadvantaged by this scheme are polar bears and others living way north of the stock exchange - they also get their data late, but can't compensate by shorter distance to exchange.



