The only thing I can say to this is that Apple has seemed laser-focused on tuning their silicon for ML crunching, that this focus is clearly now going to be amped up further still, and that in tandem the software itself will be tuned to Apple silicon.
GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.
Apple's new M2 Max has a neural engine which can do 15 trillion flops. Nvidia's A100 chip (released almost 3 years ago) can do 315 trillion flops. Apple is not going to close this 20x gap in a few years.
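Back-of-envelope on what closing it would actually take, using the headline numbers above (purely illustrative):

    # Gap today, and the relative yearly improvement needed to close it.
    m2_max_ne_tflops = 15
    a100_tflops = 315
    gap = a100_tflops / m2_max_ne_tflops
    print(f"current gap: ~{gap:.0f}x")  # ~21x

    # How much faster Apple would have to improve than Nvidia, per year,
    # to close the gap in N years (hypothetical, just to show the scaling).
    for years in (3, 5):
        needed_per_year = gap ** (1 / years)
        print(f"close in {years} years: ~{needed_per_year:.2f}x/year faster improvement")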
FTFY, remember it takes 8 of those just to load the thing. And when the average laptop has that much compute, GPT-4 will seem like Cleverbot in comparison to the state of the art.
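Rough memory math behind the "8 of those" part, assuming A100 80GB cards and fp16 weights (GPT-4's real size isn't public, so this only shows the scale):

    # 8 x A100 80GB, fp16/bf16 weights (2 bytes per parameter)
    a100_mem_gb = 80
    num_gpus = 8
    total_mem_gb = a100_mem_gb * num_gpus          # 640 GB across the node

    bytes_per_param = 2
    max_params = total_mem_gb * 1e9 / bytes_per_param
    print(f"{total_mem_gb} GB fits roughly {max_params / 1e9:.0f}B fp16 parameters")
    # ...and that's before activations, KV cache, or any batching headroom.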
I think the "tuning the models to the hardware" piece is important, and of course there is much more incentive to do this for Apple than for Nvidia, because of the distribution and ecosystem advantages Apple has.
But also, I don't know... let's see what the curve looks like! It's only been a couple of years of these neural engines. Let's see how many flops the M3 can hit this year, and then the M4 the year after. Again, 5 years is a long time when real improvement is happening. I am optimistic.
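Just to show what compounding does, assuming (hypothetically) the neural engine keeps improving at some fixed yearly rate from the ~15 TFLOPS figure above:

    # Hypothetical yearly gains compounded over 5 years, starting from ~15 TFLOPS.
    start_tflops = 15
    for yearly_gain in (1.5, 2.0, 2.5):
        after_5_years = start_tflops * yearly_gain ** 5
        print(f"{yearly_gain:.1f}x/year for 5 years: ~{after_5_years:.0f} TFLOPS")
    # 1.5x/yr -> ~114, 2.0x/yr -> ~480, 2.5x/yr -> ~1465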
That doesn't sound likely with the current architectures. There may be some kind of specialisation, but a NN is basically the chip designer's nightmare. We can't do chips with that many crossed lines. It's going to have to keep the storage + execution engine pattern unless we see some breakthroughs.
Well, we'll see what future manufacturing brings, but right now we're not even at thousands of layers (as far as I know... please link if there's been more), and we'd need to be in the hundreds-of-thousands range. Given that defects also add up with each layer, and that you need some way to dissipate the heat (almost all of that chip would be engaged while running, so no chance of balancing power between subsystems)... yeah, still lots of challenges there.
(I'm assuming the original comment meant literally putting the network, as is, into a purpose-designed chip)
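A rough sense of scale for that, assuming a GPT-3-sized model (175B parameters) stored as int8 weights in on-die SRAM (6 transistors per bit); the numbers are only meant to show the order of magnitude:

    # Transistors needed just to hold the weights on-die, vs. one big GPU die.
    params = 175e9                      # GPT-3 scale
    bits_per_weight = 8                 # int8, optimistic
    transistors_per_sram_bit = 6        # classic 6T SRAM cell

    transistors_for_weights = params * bits_per_weight * transistors_per_sram_bit
    h100_transistors = 80e9             # roughly a reticle-limit die today

    print(f"~{transistors_for_weights / 1e12:.1f}T transistors for weights alone, "
          f"~{transistors_for_weights / h100_transistors:.0f}x a full-size die, "
          f"before any compute logic")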
The M2 and the 4090 are both very general purpose. In fact, the 4090 allocates proportionally more silicon area to the tensor cores than Apple allocates to the neural engine.
The M series is basically the only "big" SoC with a functional, flexible NPU and big GPU right now, which is why it seems so good at ML. But you can bet actual ML focused designs are in the pipe.
I don't think so. M chips just happen to have a really good memory subsystem and good SIMD performance through Accelerate, so the CPU performance is pretty good.
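One quick way to probe that on your own machine: time a big single-precision matmul on the CPU. Which BLAS NumPy goes through (Accelerate or OpenBLAS) depends on how it was built, so treat the number as a rough indicator only:

    import time
    import numpy as np

    n = 4096
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    a @ b                                # warm-up
    t0 = time.perf_counter()
    a @ b
    dt = time.perf_counter() - t0

    flops = 2 * n ** 3                   # multiply-adds in an n x n GEMM
    print(f"~{flops / dt / 1e9:.0f} GFLOPS single-precision on the CPU path")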
Some stable diffusion implementations can use the NPU or GPU, or (experimentally and unsuccessfully) both.
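For the Core ML based Stable Diffusion ports, the NPU/GPU choice usually comes down to the compute-units setting when the model is loaded. A sketch with coremltools ("Unet.mlpackage" is a placeholder path):

    import coremltools as ct

    # Route the UNet to the neural engine (falls back to CPU where unsupported).
    unet_ane = ct.models.MLModel("Unet.mlpackage",
                                 compute_units=ct.ComputeUnit.CPU_AND_NE)

    # Or route it to the GPU instead.
    unet_gpu = ct.models.MLModel("Unet.mlpackage",
                                 compute_units=ct.ComputeUnit.CPU_AND_GPU)

    # ct.ComputeUnit.ALL lets Core ML split layers across CPU, GPU and ANE,
    # which is roughly what the combined mode tries to do.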
Curious, why do you think that? My knowledge is limited to marketing material and my M2 vs my 3090, and my conclusion so far is that this kind of claim shows up in every hardware maker's marketing from the past couple of years.