{"id":5155,"date":"2019-10-10T23:09:08","date_gmt":"2019-10-10T22:09:08","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=5155"},"modified":"2019-10-11T00:46:39","modified_gmt":"2019-10-10T23:46:39","slug":"a-few-more-reasons-why-unix-is-awesome","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2019\/10\/a-few-more-reasons-why-unix-is-awesome\/","title":{"rendered":"A few more reasons why UNIX is awesome"},"content":{"rendered":"\n<p>One could easily find dozens of reasons for which UNIX &#8212; mainly Ubuntu &#8212; is simply, the best operating system. Although I remember people in my proximity mentioning this for ages, it&#8217;s been only a few months that I&#8217;ve realized what are the true advantages. Helpful for this were all the people teaching\/demonstrating in various modules during my first year in SABS\/DTC: quite often we would be asked to do something in the console rather than by clicking the mouse. In the meanwhile, I&#8217;d wonder why using the console can be better from a nice, user-friendly GUI (i.e. Windows\u2026). Tools like <em>sed<\/em>, <em>grep,<\/em> <em>tar<\/em> and of course <em>alias-<\/em>ing form a quick answer. I will not argue more about these but demonstrate two more tools\/tricks.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h4 class=\"wp-block-heading\">AWK: an ultra-fast and simple tool for manipulating data <\/h4>\n\n\n\n<p>AWK is a utility for processing text-data either from files or streams, with a minimum amount of instructions. In brief, you can quickly parse a document, search for specific values, re-arrange or replace elements or calculate statistics. No matter how big your data are and, most importantly, without having to write long scripts where you would normally load libraries etc. 
You can also include it in pipes with other UNIX tools, or write a script and execute it as you would with common languages.<\/p>\n\n\n\n<p>I&#8217;m not aiming to give a tutorial on awk &#8211; there are plenty of websites doing that &#8211; but just a few tips to advertise its usefulness. Every statement usually has the form <code>awk (how to load) '{what to do}' (how to output)<\/code>. Some important keywords are <code>-F<\/code> for the field separator, <code>OFS<\/code> for the output field separator, <code>-v<\/code> for passing a variable, and of course the <code>for<\/code>\/<code>if<\/code> structures. Columns\/fields are indicated as <code>$1<\/code>, <code>$2<\/code> etc., with <code>$0<\/code> referring to the complete row, and awk parses each file line by line. Let&#8217;s assume we need to work with a document like the following, saved as <em>toy.csv<\/em>, which holds numeric values in the first two fields followed by two IDs, with an arbitrarily large number of rows.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Value1,Value2,Comp-ID,Targ-ID\n382.91,163804,CHEMBL317956,CHEMBL236\n167.84,166666,CHEMBL99895,CHEMBL1804\n178.39,167742,CHEMBL104951,CHEMBL204\n........<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">awk -F, 'NR>1{print $3,$4,$1} NR==50{exit}' OFS='\\t' toy.csv > new.tab<\/pre>\n\n\n\n<p>Running the above command makes awk treat commas as the field separators, skip the header, print only three of the four columns in a new order, stop at the fiftieth line, use tabs as the new field separators, and save the result in <em>new.tab<\/em>. 
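<\/p>\n\n\n\n<p>As a side remark of mine, <code>OFS<\/code> can equivalently be set inside a <code>BEGIN<\/code> block, which runs once before any input is read; this variant (a sketch, same behaviour as the command above) is handy when a program grows longer:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">awk -F, 'BEGIN{OFS=\"\\t\"} NR>1{print $3,$4,$1} NR==50{exit}' toy.csv > new.tab<\/pre>\n\n\n\n<p>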
Now, if we run the following,<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">awk 'END{print \"Fields = \" NF \" and rows = \" NR}' new.tab<\/pre>\n\n\n\n<p>we&#8217;ll get <code>Fields = 3 and rows = 49<\/code>, since that&#8217;s how we created the new file. What if we omit the <em>END<\/em> keyword? The block would then run on every line, printing the number of fields followed by the line index. Notice how simply we can print things. Now let&#8217;s calculate an average:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">cat new.tab | awk '{sum+=$3} END{print \"mean = \" sum\/NR}'<\/pre>\n\n\n\n<p>Starting at 0 (awk&#8217;s default for an uninitialised variable), <em>sum<\/em> is incremented by the values of the third field; here I just fed awk through a pipe instead of a filename. When we reach the final line, where <em>NR<\/em> equals the number of elements, we ask awk to print the ratio. Remember that awk skips any type checking, so it&#8217;s our responsibility to ensure there are no strings or NaNs (treated as zeros in arithmetic) among the values to be added. As one last example, what if we need to get the compounds whose value exceeds some threshold <em>thr<\/em>? 
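<\/p>\n\n\n\n<p>(Recall that a bare condition in front of a block &#8211; or even on its own, since the default action is to print the whole line &#8211; acts as a filter selecting which lines are processed; a throwaway sketch of my own:)<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">printf '1\\n5\\n9\\n' | awk '$1>4'\n# prints 5 and 9<\/pre>\n\n\n\n<p>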
The next one-liner will do the job; <code>-v thr=$thr<\/code> passes the shell variable <code>$thr<\/code> into awk.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">awk -v thr=$thr '($3>thr) {print $1}' new.tab<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">LaTeX and massive plot production<\/h4>\n\n\n\n<p>As a further demonstration of the awesomeness of UNIX, let&#8217;s turn our attention to LaTeX and the mass production of figures. Assume we need to run <em>myscript.py<\/em> for six parametrisations, with each run producing figures for six set-ups, i.e. six^2 figures in total (replace six with something more realistic&#8230;). How could we easily put each set of six on a single page? First, we&#8217;re going to need a LaTeX template like the following, saved as <em>latex-template.tex<\/em>:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\\documentclass{article}\n\\usepackage[top=2cm,bottom=2cm]{geometry}\n\\usepackage{graphicx}\n\\usepackage{subfigure}\n\\usepackage{caption}\n\\begin{document}\n    \\begin{figure}\n    \\centering\n        \\subfigure[Figure1]{\\includegraphics[width=7cm]{REPLACE-Figure1.png}}\n        \\subfigure[Figure2]{\\includegraphics[width=7cm]{REPLACE-Figure2.png}}\n        \\vskip3ex\n        \\subfigure[Figure3]{\\includegraphics[width=7cm]{REPLACE-Figure3.png}}\n        \\subfigure[Figure4]{\\includegraphics[width=7cm]{REPLACE-Figure4.png}}\n        \\vskip3ex\n        \\subfigure[Figure5]{\\includegraphics[width=7cm]{REPLACE-Figure5.png}}\n        \\subfigure[Figure6]{\\includegraphics[width=7cm]{REPLACE-Figure6.png}}\n    \\caption{Performance of REPLACE on all six Figures.}\n    \\end{figure}\n\\end{document}<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" 
data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">for x in {..params..}\ndo\n    # run the experiment for parameter $x; assume this produces plots\n    # named $x-Figure1.png, ..., $x-Figure6.png\n    python myscript.py $x\n    # fill in the template and compile it\n    sed -e 's\/REPLACE\/'$x'\/g' latex-template.tex > latex-$x.tex\n    pdflatex latex-$x.tex\n    rm -f *.log *.aux\ndone<\/pre>\n\n\n\n<p>Provided that names and indexing are correct, the above snippet will produce a set of PDF files, each containing the six figures for one parametrisation. Something to keep in mind is that LaTeX struggles with filenames containing extra dots, which is why I&#8217;m using dashes. Admittedly, we&#8217;d need some time to initialise the template accordingly, but I believe such a pipeline can save a lot of time when someone needs to repeat an experiment for different algorithms, data sets or metrics, and thus has to deal with dozens of &#8220;similar&#8221; plots.<\/p>\n\n\n\n<p>Thank you for taking the time to read this. I hope my examples were clear and useful!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One could easily find dozens of reasons why UNIX &#8212; mainly Ubuntu &#8212; is simply the best operating system. Although I remember people around me mentioning this for ages, it&#8217;s only been a few months since I realized what the true advantages are. 
Helpful for this were all the people teaching\/demonstrating in various [&hellip;]<\/p>\n","protected":false},"author":66,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[14,15],"tags":[],"ppma_author":[544],"class_list":["post-5155","post","type-post","status-publish","format-standard","hentry","category-howto","category-technical"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":544,"user_id":66,"is_guest":0,"slug":"georgios","display_name":"Yiorgos Kalantzis","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/7eed96972545785e197ac47b11868a683ba72e6397f61dbfc095f59c8b2b77d7?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/5155","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/66"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=5155"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/5155\/revisions"}],"predecessor-version":[{"id":5168,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/5155\/revisions\/5168"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=5155"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\
/blog\/wp-json\/wp\/v2\/categories?post=5155"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=5155"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=5155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}