Since it seems A100s top out at 80GB and appear to start at $10,000, I'd say it's a steal
Yes, I'm acutely aware that bandwidth matters, but my mental model is that the rest of that sentence is "up to a point," since those "self-hosted LLM" threads are filled to the brim with people measuring tokens per minute or even running inference on CPU
I'm not hardware-adjacent enough to try such a stunt, but there was also recently a submission of a BSD-3-Clause implementation of Google's TPU: <https://news.ycombinator.com/item?id=44111452>