
>I was surprised to find that around 20% of listings were over $50k and around 6% were over $70k. I expected these high prices to be more rare.

This seems to be because you scraped new car prices instead of used. The used market has wild variance in its prices, even if you only look at CPO vehicles sold directly by dealerships. That in itself could be a whole other fascinating week spent training on that data. It could also be that Kelley Blue Book's data is wrong, as always. KBB pumps up the prices of used cars well beyond their actual market value, and I've yet to figure out what benefit they get from doing so. It might be used-car dealerships gaming things. A Chevrolet Cobalt, regardless of year or location, sells for about $3,500 from private sellers, but KBB says they go for $6,600, almost twice that.

I'd also be interested in seeing market location weighting for the data. I expect vehicles to be more expensive in places like California and New York compared to Florida or Oklahoma for example.

>After some Googling, I found out that this was a limitation of EXT4 when creating directories with millions of files in them.

I've experienced this myself with faulty Flatpaks spamming my drives with log files. And also Blender, interestingly. If you split a frame into six layers (AO, diffuse, glossy, alpha, shadows, Z-depth) and the animation is about 66,000 frames, that's 396,000 images for one forty-five-minute animation. Oftentimes you split the scene into three pieces (foreground, focus, and background), which triples that output to 1,188,000 images. When those go into separate folders per scene or shot, with multiple renders for testing or approval, it's very easy to run into the EXT4 hash table limitation with just three or four animations over twenty minutes.
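To make that arithmetic easy to redo for other projects, here is a quick sketch. The frame, layer, and piece counts are the ones from my example; the 24 fps frame rate is my assumption, and `image_count` is just an illustrative helper name.

```python
# Rough file-count arithmetic for the render described above.
FRAMES = 66_000       # frames in the animation
LAYERS = 6            # AO, diffuse, glossy, alpha, shadows, Z-depth
SCENE_PIECES = 3      # foreground, focus, background
FPS = 24              # assumed frame rate, not stated above


def image_count(frames: int, layers: int, pieces: int = 1) -> int:
    """Number of image files written for one full render."""
    return frames * layers * pieces


single_pass = image_count(FRAMES, LAYERS)
split_scene = image_count(FRAMES, LAYERS, SCENE_PIECES)
runtime_minutes = FRAMES / FPS / 60

print(f"layers only:      {single_pass:,} images")
print(f"split into three: {split_scene:,} images")
print(f"runtime at {FPS} fps: {runtime_minutes:.1f} minutes")
```

Multiply `split_scene` by the number of test or approval renders you keep around and the totals climb quickly.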



OP here. Interesting, I had not researched the used car market enough to know about the price inflation. Can you think of any other sites that have more realistic prices?


From experience, no online used marketplace seems to be immune from price manipulation or inflation. KBB is used as a reference tool to find a starting point for vehicle prices, but the only way to really know what a vehicle costs is to compare listings across multiple websites. I'd suggest scraping data from Cars.com, eBay Motors, Craigslist, Facebook Marketplace, Autolist, and Cargurus.

As a starting point I'd have it look at vehicles from 2012 or newer. This should prevent "classic" or newly collectible vehicles from spiking the data. Secondly, be aware that the price shown at the top of a Craigslist ad is often a fake number such as "0$" or "12345$", with the actual soft or hard price buried in the ad's body text. Facebook Marketplace can be much worse about this, and has many more fake listings, so check for duplicates and suspiciously low prices to keep them from skewing the data. eBay Motors has a quirk shared with eBay: listings can have both an auction price and a Buy It Now price. The only reliable way to gauge prices there would be to use Buy It Now listings.
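A minimal sketch of those filters in Python, in case it helps. Everything here is hypothetical: the `Listing` fields, the placeholder price set, the $500 floor, and the regex for pulling a price out of the ad body are my assumptions about how scraped data might look, not anything tied to a real site's format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Listing:
    source: str              # e.g. "craigslist", "ebay" (hypothetical labels)
    title: str
    year: int
    price: Optional[float]   # headline price as scraped
    body: str = ""           # free-text ad body
    buy_it_now: bool = True  # only meaningful for eBay listings


MIN_YEAR = 2012                               # skip classics and collectibles
PLACEHOLDER_PRICES = {0, 1, 1234, 12345, 123456}  # assumed fake headline prices
MIN_PLAUSIBLE_PRICE = 500                     # assumed floor for bait listings

# Matches "$4,500" or "$3500" in the ad body.
PRICE_IN_TEXT = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d{3,6})")


def price_from_body(body: str) -> Optional[float]:
    match = PRICE_IN_TEXT.search(body)
    return float(match.group(1).replace(",", "")) if match else None


def effective_price(listing: Listing) -> Optional[float]:
    """Use the headline price unless it looks like a placeholder."""
    if listing.price is not None and listing.price not in PLACEHOLDER_PRICES:
        return listing.price
    return price_from_body(listing.body)


def clean(listings: List[Listing]) -> List[Tuple[Listing, float]]:
    seen = set()
    kept = []
    for item in listings:
        if item.year < MIN_YEAR:
            continue
        if item.source == "ebay" and not item.buy_it_now:
            continue  # auction prices are not a reliable gauge
        price = effective_price(item)
        if price is None or price < MIN_PLAUSIBLE_PRICE:
            continue
        key = (item.title.strip().lower(), item.year, price)
        if key in seen:
            continue  # crude duplicate check
        seen.add(key)
        kept.append((item, price))
    return kept
```

The duplicate check is deliberately crude (same title, year, and price). Real reposts usually tweak the title, so you'd probably want fuzzier matching, but this catches the straight copy-paste spam.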



