Cool ideas in Deep Learning and where to find more about them

I was planning on doing a blog post about some cool random deep learning papers that I have read in the last year or so. However, I keep finding that someone else has already written a way better blog post than what I could write. Instead, I have decided to write a very brief summary of some hot ideas and then provide a link to some other page where someone describes them way better than me.

The Lottery Ticket Hypothesis

This idea has to do with pruning a model, which is when you remove parts of your model to make it more computationally efficient while barely losing accuracy. The lottery ticket hypothesis also has to do with how weights are initialized in neural networks and why larger models often achieve better performance.

Anyways, the hypothesis says the following: “Dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations.” In their analogy, the random initialization of a model's weights is treated like a lottery, where some subset of those weights starts out already pretty close to the network you want to train (the winning ticket). For a better description and a summary of advances in this field, I would recommend this blog post.
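
To make the procedure a bit more concrete, here is a minimal sketch of iterative magnitude pruning, the recipe the original paper uses to hunt for winning tickets. Take it as PyTorch-flavored pseudocode under my own assumptions: the `train` helper and the pruning fraction are placeholders, not the authors' implementation.

```python
import copy
import torch

# A minimal sketch of iterative magnitude pruning (lottery ticket style).
# `train(model, masks)` is a hypothetical helper that trains the model while
# keeping masked-out weights at zero; it is not a real library function.
def find_winning_ticket(model, train, prune_fraction=0.2, rounds=5):
    # Remember the original random initialization (the "lottery draw").
    init_state = copy.deepcopy(model.state_dict())
    masks = {name: torch.ones_like(p) for name, p in model.named_parameters()}

    for _ in range(rounds):
        train(model, masks)  # train with the current sparsity mask applied
        with torch.no_grad():
            for name, p in model.named_parameters():
                alive = p[masks[name].bool()].abs()
                if alive.numel() == 0:
                    continue
                # Prune the smallest-magnitude weights that are still alive.
                k = int(prune_fraction * alive.numel()) + 1
                threshold = alive.kthvalue(k).values
                masks[name] *= (p.abs() > threshold).float()
        # Rewind the surviving weights back to their original initialization.
        model.load_state_dict(init_state)

    return masks  # the sparse subnetwork: a candidate winning ticket
```

The returned mask together with the original initialization is the "winning ticket": train just that subnetwork from scratch and, per the hypothesis, it should roughly match the full network.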

SAM: Sharpness-Aware Minimization

The key idea here has to do with finding an optimizer that trains models that generalize well. According to this paper, a model that has converged to a sharp minimum is less likely to generalize than one that has converged to a flatter minimum. They show the following plot to provide an intuition for why this may be the case.

In the SAM paper (and ASAM, its adaptive variant) the authors implement an optimizer that is more likely to converge to a flat minimum. I found that this blog post by the authors of ASAM gives a very good description of the field.
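
To give a flavor of what the optimizer actually does, here is a minimal sketch of a single SAM-style update in PyTorch. The function below, its `rho` value, and the `base_optimizer` argument are my own illustrative choices rather than the authors' reference implementation: the idea is simply to first climb to a nearby "worst-case" point and then update the original weights with the gradient computed there.

```python
import torch

# A minimal sketch of one sharpness-aware (SAM-style) update step.
# model, loss_fn, inputs, targets and base_optimizer are placeholders.
def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    model.zero_grad()
    # First pass: gradient of the loss at the current weights w.
    loss_fn(model(inputs), targets).backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    # Perturb the weights towards the worst-case nearby point:
    # w + e(w), with e(w) = rho * grad / ||grad||.
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append((p, e))

    # Second pass: gradient of the loss at the perturbed point w + e(w).
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation and let the base optimizer (e.g. SGD) take a step
    # using the "sharpness-aware" gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```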

DALL-E 2

There are some amazing memes being generated by DALL-E mini, but its older brother DALL-E 2 has been shown to be capable of generating some beautiful images. I would break the magic of DALL-E 2 into two main ideas: CLIP and denoising diffusion.

CLIP (Contrastive Language–Image Pre-training) is a training method in which you task the model with finding the best-fitting caption from a limited set of captions for a given image. In theory, to achieve this, the model has to learn to recognize visual concepts and associate them with their names. For a great description of CLIP, check out this blog by OpenAI.
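
At its core this boils down to a symmetric contrastive loss over a batch of matching (image, caption) pairs. Here is a minimal sketch in PyTorch; the image and text encoders are omitted and the temperature is just an illustrative value, so this is only a rough picture of what CLIP optimizes, not OpenAI's code.

```python
import torch
import torch.nn.functional as F

# A minimal sketch of a CLIP-style contrastive loss. image_features and
# text_features are the (batch_size x dim) outputs of an image encoder and a
# text encoder for N matching (image, caption) pairs; the encoders are omitted.
def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    # Normalize so the dot product becomes a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # N x N similarity matrix between every image and every caption.
    logits = image_features @ text_features.t() / temperature

    # The "right" caption for image i is caption i, and vice versa.
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_per_image = F.cross_entropy(logits, targets)     # pick the right caption for each image
    loss_per_text = F.cross_entropy(logits.t(), targets)  # pick the right image for each caption
    return (loss_per_image + loss_per_text) / 2
```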

Denoising diffusion is an unsupervised way of training generative models that basically consists of adding increasing amounts of random noise to your data and then asking your model to denoise it. Once the model can undo noise strong enough to wipe out the original signal, you have a model capable of generating new data points by starting from pure noise and denoising it. For a nice intro to denoising diffusion, I would recommend this blog post. And finally, for a proper description of how DALL-E 2 works, I would point you to this nicely written blog post or their paper.
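
And to make "add noise, then learn to remove it" a bit more concrete, here is a rough sketch of the standard diffusion training objective in PyTorch. The linear noise schedule and the `model(noisy_x, t)` interface are assumptions I am making for illustration, not the exact setup behind DALL-E 2.

```python
import torch
import torch.nn.functional as F

# A minimal sketch of the denoising-diffusion training objective: corrupt the
# data with a known amount of Gaussian noise and train the model to predict
# that noise. The linear beta schedule below is an illustrative assumption.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise added at each step
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # total signal kept after t steps

def diffusion_loss(model, x0):
    # Pick a random noise level t for every example in the batch.
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)

    # Forward process in closed form: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise.
    a_t = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    noisy_x = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * noise

    # The model is trained to recover the noise that was added; at sampling
    # time it starts from pure noise and removes it step by step.
    return F.mse_loss(model(noisy_x, t), noise)
```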

So there you go!! A bunch of links to deep learning blogs written by people who can write way better than me!! If you have read this far, I hope this was not a complete waste of your time.
