{"id":8417,"date":"2022-07-18T15:30:35","date_gmt":"2022-07-18T14:30:35","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=8417"},"modified":"2022-08-17T10:53:11","modified_gmt":"2022-08-17T09:53:11","slug":"cool-ideas-in-deep-learning-and-where-to-find-more-about-them","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2022\/07\/cool-ideas-in-deep-learning-and-where-to-find-more-about-them\/","title":{"rendered":"Cool ideas in Deep Learning and where to find more about them"},"content":{"rendered":"\n<p>I was planning on doing a blog post about some cool random deep learning paper that I have read in the last year or so. However, I keep finding that someone else has already written a way better blog post than I could write. Instead I have decided to write a very brief summary of some hot ideas and then provide a link to some other page where someone describes them way better than me. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Lottery Ticket Hypothesis<\/h2>\n\n\n\n<p>This idea has to do with <a href=\"https:\/\/towardsdatascience.com\/pruning-deep-neural-network-56cae1ec5505\">pruning a model<\/a>, which is when you remove parts of your model to make it more computationally efficient while barely losing accuracy. The lottery ticket hypothesis also has to do with <a href=\"https:\/\/www.deeplearning.ai\/ai-notes\/initialization\/\">how weights are initialized<\/a> in neural networks and why <a href=\"https:\/\/www.blopig.com\/blog\/2021\/04\/is-bigger-better\/\">larger models often achieve better performance<\/a>. 
<\/p>\n\n\n\n<p>Anyway, the <a href=\"https:\/\/arxiv.org\/abs\/1803.03635\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/abs\/1803.03635\">hypothesis<\/a> says the following: &#8220;Dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that\u2014when trained in isolation\u2014reach test accuracy comparable to the original network in a similar number of iterations.&#8221; In their analogy, the random initialization of a model's weights is treated like a lottery, where some subset of these weights is already pretty close to the network you want to train (a winning ticket). For a better description and a summary of advances in this field, <a href=\"https:\/\/roberttlange.github.io\/posts\/2020\/06\/lottery-ticket-hypothesis\/#the-lottery-ticket-hypothesis-how-to-scale-it-blackjoker\">I would recommend this blog post<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SAM: Sharpness-Aware Minimization<\/h2>\n\n\n\n<p>The key idea here has to do with <a href=\"https:\/\/github.com\/jettify\/pytorch-optimizer\">finding the best optimizer<\/a> to train a model that generalizes well. According to <a href=\"https:\/\/arxiv.org\/pdf\/1609.04836.pdf\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/pdf\/1609.04836.pdf\">this paper<\/a>, a model that has converged to a sharp minimum will be less likely to generalize than one that has converged to a flatter minimum. They show the following plot to provide an intuition of why this may be the case. 
<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/cdn.codeground.org\/nsr\/images\/img\/researchareas\/ai-article10_04.png?resize=625%2C263&#038;ssl=1\" alt=\"\" width=\"625\" height=\"263\"\/><\/figure>\n\n\n\n<p>In the <a href=\"https:\/\/arxiv.org\/pdf\/2010.01412.pdf\">SAM paper<\/a> (and its adaptive variant, <a href=\"https:\/\/arxiv.org\/pdf\/2102.11600.pdf\">ASAM<\/a>), the authors implement an optimizer that is more likely to converge to a flat minimum. I found that <a href=\"https:\/\/research.samsung.com\/blog\/ASAM-Adaptive-Sharpness-Aware-Minimization-for-Scale-Invariant-Learning-of-Deep-Neural-Networks\">this blog post by the authors of ASAM<\/a> gives a very good description of the field.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">DALL-E 2<\/h2>\n\n\n\n<p>There are some amazing memes being generated by <a href=\"https:\/\/www.craiyon.com\/\">DALL-E mini<\/a>, but its older brother <a href=\"https:\/\/openai.com\/dall-e-2\/\">DALL-E 2 has been shown to be capable of generating some beautiful images<\/a>. I would break the magic of DALL-E 2 into two main ideas: <a href=\"https:\/\/openai.com\/blog\/clip\/\">CLIP<\/a> and <a href=\"https:\/\/hojonathanho.github.io\/diffusion\/\">denoising diffusion<\/a>. <\/p>\n\n\n\n<p>CLIP (<em>Contrastive Language\u2013Image Pre-training<\/em>) is a training method in which you task the model with finding the best-fitting caption for a given image from a limited set of captions. In theory, to achieve this, the model has to learn to recognize visual concepts and associate them with their names. For a <a href=\"https:\/\/openai.com\/blog\/clip\/\">great description of CLIP, check out this blog by OpenAI<\/a>.<\/p>\n\n\n\n<p>Denoising diffusion is an unsupervised way of training generative models that basically consists of adding increasing amounts of random noise to your data and then asking your model to denoise it. 
Once the added noise has turned the data into essentially pure noise, the trained model can generate new data points by starting from random noise and iteratively denoising it. For a nice intro to denoising diffusion, I would recommend <a href=\"https:\/\/medium.com\/graphcore\/a-new-sota-for-generative-modelling-denoising-diffusion-probabilistic-models-8e21eec6792e\">this blog post<\/a>. And finally, for a proper description of how DALL-E 2 works, I would point you to <a href=\"https:\/\/medium.com\/augmented-startups\/how-does-dall-e-2-work-e6d492a2667f\">this nicely written blog post<\/a> or <a href=\"https:\/\/arxiv.org\/abs\/2204.06125\">their paper<\/a>.<\/p>\n\n\n\n<p>So there you go!! A bunch of links to deep learning blogs written by people who can write way better than me!! If you have read this far, I hope this was not a complete waste of your time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was planning on doing a blog post about some cool random deep learning paper that I have read in the last year or so. However, I keep finding that someone else has already written a way better blog post than I could write. 
Instead I have decided to write a very brief summary [&hellip;]<\/p>\n","protected":false},"author":71,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[632,189],"tags":[],"ppma_author":[552],"class_list":["post-8417","post","type-post","status-publish","format-standard","hentry","category-deep-learning","category-machine-learning"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":552,"user_id":71,"is_guest":0,"slug":"brennan","display_name":"Brennan Abanades Kenyon","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/5c85dcbb5b1499e82ecfc264ec387c8302ac238c786e68cc5c92e9c21904d260?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8417","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/71"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=8417"}],"version-history":[{"count":2,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8417\/revisions"}],"predecessor-version":[{"id":8431,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8417\/revisions\/8431"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=8417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.c
om\/blog\/wp-json\/wp\/v2\/categories?post=8417"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=8417"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=8417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}