
Yes, and another reason for the small model size, and the novelty of the underlying paper [1], is that the diffusion model acts not on pixel space but on a latent space. This means the 'latent diffusion model' not only learns the task at hand (image synthesis) but, via an outer autoencoder structure, also learns a powerful lossy compression model in parallel. The number of weights (model size) can then be reduced drastically, because the inner neural network layers act on a low-dimensional latent space rather than the high-dimensional pixel space. It's fascinating because it shows that deep learning at its core comes down to compression/decompression (encoding/decoding), with a close relation to Shannon's information theory (e.g. source coding, channel coding, the data processing inequality).

[1] https://arxiv.org/abs/2112.10752
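
To make the dimensionality argument concrete, here is a toy back-of-the-envelope comparison, assuming the configuration commonly used with the paper's approach: a spatial downsampling factor of f=8 and 4 latent channels for a 512x512 RGB image (these particular numbers are an illustrative assumption, not something stated in this thread):

```python
# Toy comparison of the space a pixel-space diffusion model sees
# vs. what a latent diffusion model sees after the autoencoder.
# Assumes a 512x512 RGB image, downsampling factor f=8, 4 latent channels.

f = 8                                      # autoencoder spatial downsampling factor
pixel_elems = 512 * 512 * 3                # H x W x RGB values per image
latent_elems = (512 // f) * (512 // f) * 4 # 64 x 64 x 4 latent values

print(pixel_elems)                  # 786432
print(latent_elems)                 # 16384
print(pixel_elems / latent_elems)   # 48.0 -> the denoiser sees ~48x fewer values
```

Since the denoising network's compute and (to a large extent) its parameter budget scale with the size of the tensors it processes, shrinking the working space this much is what lets the model stay small.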



Oh, wow. Now that you mention how it's similar to (if not the same as) lossy compression, it all makes a lot of sense. This is great. I teach IT and I already cover a bit of how lossy compression works (e.g. hey, if you see a blue pixel and then a slightly darker one next to it, what's the NEXT one likely to be?), and this is something of an extension of that.


Correction: the autoencoder is pre-trained :)




