Building a custom code search index in Go (boyter.org)
118 points by boyter on Nov 23, 2022 | 18 comments


Very cool to see this here, Ben! It was fun hearing the ins and outs of your work on this in the TZ discord, and the final result is fast.

Also, off-topic but as you know, I recently tried out your scc tool and am eagerly awaiting its support for Elixir templates (.eex, .heex)! You said it was a day from done a while back and would go out in the next release. What’s the release schedule like?

https://github.com/boyter/scc


It’s actually sitting on my hdd. Just need to finish it off. I got sidetracked.


Wow, the search is screamingly fast. And it's custom made! I enjoyed the writing. Thanks for taking the time to condense the knowledge involved into words.


I really did try to make it as fast as I could. Always happy to write it down too.


See also https://codesearch.debian.net/ - https://github.com/Debian/dcs for a similar project that may fit your needs better. I've not compared the two, but I use dcs frequently.


The blog posts about it are great too. This one https://michael.stapelberg.ch/posts/2019-09-29-dcs-positiona... in particular.


Assuming this is different from: https://github.com/boyter/searchcode-server which is in Java.


Yes. Very different.


Congratulations to the author. It seems like they have excelled at a very fast pace from using third party solutions to building their own in a short time. I look forward to their progress and seeing where this goes, to maybe becoming an excellent open source copilot alternative.


I don't know about a fast pace, but I did have fun with it!


Great read, but a basic search yields zero results:

https://searchcode.com/?q=kong.New+lang%3Ago

Same search on GitHub:

https://github.com/search?type=code&q=kong.New

Edit: there also doesn't seem to be any ranking at all, such as exact word matches being boosted


Probably because I don’t prioritise GitHub anymore. Their own search is great, but it might get picked up eventually.

There is ranking: first by a pre-rank popularity of the repository, and second by tf/idf of the trigrams. It’s weighted towards longer matches in the display as well.

But let me know where it’s not doing what you expect and I’ll fix it.


> But let me know where it’s not doing what you expect and I’ll fix it.

I would expect > 0 results for the above search

Same search on SourceGraph: https://sourcegraph.com/search?q=context:global+kong.New&pat...


Ah I see a vanity search! The ones that always cause issues :)

It's just down to it not being in the index. I shall ensure I add it just for you based on this.

Done. That repository https://github.com/alecthomas/kong will get picked up when I kick off the indexing again (sometime next week once all the activity dies down)


What a patronising response to a report of 500+ results missing from your index :) How will you improve your product if you can't accept feedback?

Edit: typo


Hah!

I actually checked: it was aware of your repository and has been tracking it for a while. It just didn't get indexed because I de-prioritized GitHub. If I can scrounge some more RAM back I will try to expand the index yet again.

In any case I manually updated your github star score, so it should pop to the top when I do get around to starting things again. Probably after re:invent.


Is fitting the index in RAM really that important? Obviously it is fast, but if you can get away with storing it on a fast disk like an NVMe Gen4 drive, then why not?


Extremely important. Search indexes are cache optimized, and live-update indexes even more so.



