Nice project, I wanted to build meme search engine myself one day, but figured Tesseract will fail at most of the memes because of how stylized those have become. So I tuned down my meme source to only /r/bertstrips as those contain sane looking text and it's working quite alright - project has no frontend yet, I search from cli and click links.
> Initial testing with the Postgres Full Text Search indexing functionality proved unusably slow at the scale of anything over a million images, even when allocated the appropriate hardware resources.
I can guarantee you that correctly setup PostgreSQL text search will be faster than ES with much, much less hardware resources needed, it's just a matter of correctly creating tsvector column and creating GIN index on it (and ofc asking right queries so it's actually used). I can help you out setting postgres schema up and debugging queries if you are interested, for testing purposes at least.
I recently worked on a project using lnx.rs. Simple to setup and use and fast at the scale I was using it. Built on Tantivy with a custom fast fuzzy search feature.
If you want to go beyond meme sites and possibly detect memes in the wild, common crawl might be something to start with.
> Initial testing with the Postgres Full Text Search indexing functionality proved unusably slow at the scale of anything over a million images, even when allocated the appropriate hardware resources.
I can guarantee you that correctly setup PostgreSQL text search will be faster than ES with much, much less hardware resources needed, it's just a matter of correctly creating tsvector column and creating GIN index on it (and ofc asking right queries so it's actually used). I can help you out setting postgres schema up and debugging queries if you are interested, for testing purposes at least.