
Nice project. I wanted to build a meme search engine myself one day, but figured Tesseract would fail on most memes because of how stylized they have become. So I narrowed my meme source down to only /r/bertstrips, since those contain sane-looking text, and it's working quite all right. The project has no frontend yet; I search from the CLI and click links.

> Initial testing with the Postgres Full Text Search indexing functionality proved unusably slow at the scale of anything over a million images, even when allocated the appropriate hardware resources.

I can guarantee you that a correctly set up PostgreSQL text search will be faster than ES while needing much, much less hardware. It's just a matter of correctly creating a tsvector column and a GIN index on it (and of course writing the queries so that the index is actually used). I can help you set up the Postgres schema and debug queries if you are interested, for testing purposes at least.
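
For reference, here is a rough sketch of the kind of setup I mean, not a tuned schema. The table and column names (memes, ocr_text, ocr_tsv) are made up for illustration. It assumes PostgreSQL 12+ for the stored generated column and a psycopg-style connection that uses %s placeholders.

```python
# Hypothetical names throughout: memes, ocr_text, ocr_tsv, memes_ocr_tsv_idx.
# Assumes PostgreSQL 12+ (generated columns) and a psycopg-style DB-API connection.

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS memes (
    id       bigserial PRIMARY KEY,
    url      text NOT NULL,
    ocr_text text NOT NULL,
    -- The tsvector is computed once on write instead of on every query.
    ocr_tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', ocr_text)) STORED
);
"""

# The GIN index is built on the precomputed tsvector column.
INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS memes_ocr_tsv_idx ON memes USING GIN (ocr_tsv);"
)

# Matching with @@ directly against the indexed column lets the planner use the index.
SEARCH_SQL = """
SELECT id, url
FROM memes
WHERE ocr_tsv @@ websearch_to_tsquery('english', %s)
LIMIT %s;
"""


def search(conn, query, limit=20):
    """Run the full text search on an open connection and return (id, url) rows."""
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (query, limit))
        return cur.fetchall()
```

Running EXPLAIN on the search query should show whether the GIN index is being picked up; if the WHERE clause calls to_tsvector on the raw text with a different configuration than the stored column, the index will not be used.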



I recently worked on a project using lnx.rs. It was simple to set up and use, and fast at the scale I was using it. It's built on Tantivy with a custom fast fuzzy search feature.

If you want to go beyond meme sites and possibly detect memes in the wild, Common Crawl might be a good place to start.


One issue I've had with Postgres full text search is that when you want to rank using ts_rank, you end up with a full table scan.



