
Given we are living in 2020, there must have been efforts or thought experiments dedicated to archiving YouTube (would love to read about them if they exist). It seems 'almost' impossible like you said, even if we disregard, say, channels with fewer than 100 subscribers (just as an example).


I was at a conference at IBM around 1993 at which a very smart person pointed out that we had reached peak “knowledge”, in the sense that the new explosion of data on the Internet, including the new “www”, meant indexing would soon be impossible. “You’d need to keep a copy of every host”. We all nodded because this seemed pretty obviously true.

It also wasn’t an urgent problem, as experiments like HTTP were unlikely to reach broad usage due to critical failures like the one-way links.

I mention this not to mock your “‘almost’ impossible” statement but to point out that you’re in good company.


From an indexing/search perspective it's already impossible to find some of the more obscure videos I saw in 2013 because of the relentless crap flood of "things sort of like what you searched for that are newer."


Well, that and the growing penchant for revising history by destroying dissent, blacking out unpleasant thoughts and facts, and the onslaught of ridiculous copyright strikes against fair-use content.


I constantly have trouble trying to find "that one tweet" from yesterday, a much less intimidating problem.


Twitter has notoriously flaky search in my experience; it is clearly geared towards discovering new content and not towards finding content you engaged with previously. This often misses the point, as more and more people use Twitter as a sort of social note-taking tool.

I think this is one area where general purpose search engines are failing users currently. We should be able to tell Google "that one tweet from yesterday" and have it return something meaningful to us. Of course I understand that this is hard and becoming harder as social media companies hold on to their walled gardens, but clearly (to me anyway) the status quo is to the detriment of the user.


Doesn’t that suggest a market need for another search engine for this purpose?


Meta-search? Searching the search engine... I like it.


Metasearch engines are a thing: https://en.wikipedia.org/wiki/Metasearch_engine

Here's a nice one with many public instances you can try: https://github.com/asciimoo/searx


Use the "before:YYYY-MM-dd" search parameter.


... which you can combine with after:YYYY-MM-dd
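As a rough sketch of combining the two operators mentioned above, the query string could be assembled like this. The helper name `build_dated_query` is hypothetical, and the sketch assumes the search engine accepts `after:` and `before:` with ISO-style dates as described in this thread.

```python
from datetime import date
from typing import Optional


def build_dated_query(terms: str,
                      after: Optional[date] = None,
                      before: Optional[date] = None) -> str:
    """Append after:/before: date operators to a plain search string.

    Hypothetical helper: it only builds the text you would type into
    the search box; it does not call any search API.
    """
    parts = [terms.strip()]
    if after is not None:
        # date.isoformat() yields YYYY-MM-DD
        parts.append(f"after:{after.isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)


# Restrict a search to results dated within 2013.
query = build_dated_query("obscure video title",
                          after=date(2013, 1, 1),
                          before=date(2014, 1, 1))
print(query)
```

Pasting the printed string into the search box should narrow results to that window, assuming the engine honors both operators.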


It's not impossible. It's merely a matter of resources and will. Google not only has to store all that video, they also need to transmit it over and over again. An archivist would only have to write the video to storage once. If humanity makes it to the year 3000 we're going to be pretty disappointed that all that video was lost due to a lack of foresight.


I bet the number of times the typical YouTube upload is watched is zero. I think YouTube itself already fills the archival, non-transmitting role here.


Average views per video is around 10k. I don't know what the mode (most frequent number of views) is, but I'm almost certain it's not zero, since the uploader checking if the video uploaded correctly already counts as a view.

Anyway, for archival purposes you don't need to save the plethora of formats YouTube keeps on hand for each video; you just need one size in one codec, and it probably doesn't have to be 4K or 8K, which probably reduces the storage requirements by one order of magnitude or so.


I'm the highly downvoted parent that said this is possible.

It's only 45 years of video per day. That is very, very possible to save. That's roughly 600 terabytes of storage space a day, which means each month of new uploads adds about $18k to your monthly long-term storage bill.

So after a decade of operation your monthly bill is about $2.1m which is far, far, far cheaper than what we spend on plenty of things. Like I said above: This is not impossible. In fact, I bet the NSA is probably already doing it. At least for public videos. This truly is peanuts.
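For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted in this comment (600 TB per day, a bill that grows by about $18k each month). The 30-day month, the 1 TB = 1000 GB conversion, and the flat pricing are assumptions; the per-TB price is simply what those figures imply, not a quoted rate from any provider.

```python
# Figures taken from the comment above.
TB_PER_DAY = 600                 # new video stored per day
BILL_GROWTH_PER_MONTH = 18_000   # USD added to the monthly bill each month
YEARS_OF_VIDEO_PER_DAY = 45

# Assumptions made for this sketch.
DAYS_PER_MONTH = 30
GB_PER_TB = 1000

tb_added_per_month = TB_PER_DAY * DAYS_PER_MONTH
# Price per TB-month implied by the two quoted figures.
implied_price_per_tb_month = BILL_GROWTH_PER_MONTH / tb_added_per_month


def monthly_bill(months: int) -> int:
    """Monthly storage bill after `months` of operation (linear growth)."""
    return months * BILL_GROWTH_PER_MONTH


def cumulative_spend(months: int) -> int:
    """Total paid over the first `months` months."""
    return sum(monthly_bill(m) for m in range(1, months + 1))


# Average size per hour of video implied by 600 TB for 45 years of footage.
hours_uploaded_per_day = YEARS_OF_VIDEO_PER_DAY * 365 * 24
implied_gb_per_hour = TB_PER_DAY * GB_PER_TB / hours_uploaded_per_day

print(f"Implied price: ${implied_price_per_tb_month:.2f} per TB-month")
print(f"Monthly bill after 10 years: ${monthly_bill(120):,}")
print(f"Total spent over 10 years: ${cumulative_spend(120):,}")
print(f"Implied size: {implied_gb_per_hour:.2f} GB per hour of video")
```

Under these assumptions the monthly bill after 120 months comes to $2,160,000, which matches the "about $2.1m" figure; the cumulative spend over the decade is considerably larger because every earlier month keeps being billed.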


Is the average as relevant as some other statistical measure that I'm not knowledgeable enough to know about? For example if the vast majority of views are on 10% of the videos, the other 90% could be stored on less expensive hardware that has less bandwidth maybe?

edit: for example, the average net worth of an American is $76,000, but toss a couple of billionaires into the mix, whose net worth is larger by such an enormous factor, and the "average" becomes misleading


You’re thinking of the median (the value that ends up in the middle after sorting), which is indeed more useful in such cases.

(Of course, boiling a wide distribution of values down to a single value, no matter by which process, is fraught with problems.)
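To make the mean versus median difference concrete, here is a small sketch with made-up numbers: nine people with a net worth of $50,000 and one billionaire. The sample is purely illustrative and not real data.

```python
from statistics import mean, median

# Hypothetical sample: nine ordinary households and one billionaire.
net_worths = [50_000] * 9 + [1_000_000_000]

average = mean(net_worths)    # pulled far upward by the single outlier
middle = median(net_worths)   # middle of the sorted values, unaffected by it

print(f"mean:   ${average:,.0f}")
print(f"median: ${middle:,.0f}")
```

The mean lands above $100 million even though nine of the ten people hold $50,000, while the median stays at $50,000. The same skew is why an "average of 10k views per video" says little about what a typical upload gets.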


Well, to show the video to the person who uploaded it you don't need to reencode it or push it to the edge or any of that. And I see no reason to believe that most videos are even watched by their creators.


We get time travel by 2034 so no issues - https://en.wikipedia.org/wiki/John_Titor



