{"id":1089,"date":"2013-09-06T17:03:43","date_gmt":"2013-09-06T16:03:43","guid":{"rendered":"http:\/\/www.blopig.com\/blog\/?p=1089"},"modified":"2013-09-09T17:44:14","modified_gmt":"2013-09-09T16:44:14","slug":"django-for-scientific-applications","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2013\/09\/django-for-scientific-applications\/","title":{"rendered":"Django for scientific applications"},"content":{"rendered":"<p>In my current work I am developing a cheminformatics tool using structural and activity data to investigate protein-ligand binding. I <del datetime=\"2013-09-02T16:26:00+00:00\">have only ever properly used<\/del> love Python and I listen to Saulo, so I decided to use <a href=\"https:\/\/www.djangoproject.com\/\">Django<\/a> to develop my application. Before I started using it I didn&#8217;t understand what it was or why it might be useful, so below I discuss a few of the features that I have found useful and that might encourage others to use it.<\/p>\n<p>Firstly I will outline how Django works. I wanted to download all the PDB structures for CDK2 and store the information in a data structure that is robust and easy to use. We have a Target and a Protein. A Target is associated with a particular UniProt accession. Cyclin-dependent kinase 2 (CDK2) is a Target. A Protein is a set of 3D coordinates, so 1AQ1 is a Protein.<\/p>\n<pre class=\"lang:python decode:true\" title=\"example\">from django.db import models\r\n\r\nclass Target(models.Model):\r\n    \"\"\"A Django model to define a given protein target\"\"\"\r\n    UniProt = models.CharField(max_length=20,unique=True)\r\n    InitDate = models.DateTimeField(auto_now_add=True)\r\n    Title = models.CharField(max_length=10)<\/pre>\n<p>In the above Target model I have three different fields. The first field denotes the UniProt accession for the Target and is &#8220;unique&#8221;. 
This means that only one Target can have any given UniProt accession in my database. If I try to save another Target with the same value in the UniProt field, Django raises an exception (an IntegrityError). The second field records the date and time at which the Target was first saved, so I can check back to when it was created. The third is the Title I would like to use for the Target, for example CDK2.<\/p>\n<p>I can then make a new Target object by:<\/p>\n<pre class=\"lang:python decode:true\">new_target = Target()\r\nnew_target.Title = \"CDK2\"\r\nnew_target.UniProt = \"P24941\"<\/pre>\n<p>and save it to the database by:<\/p>\n<pre class=\"lang:python decode:true\">new_target.save()  # Django takes care of the required SQL<\/pre>\n<p>The next model is for the Protein molecules:<\/p>\n<pre class=\"lang:python decode:true \">class Protein(models.Model):\r\n    \"\"\"A Django model to define a given protein\"\"\"\r\n    Code = models.CharField(max_length=6,unique=True)\r\n    InitDate = models.DateTimeField(auto_now_add=True)\r\n    # on_delete must be given explicitly from Django 2.0 onwards\r\n    TargetID = models.ForeignKey(Target, on_delete=models.CASCADE)\r\n    Apo = models.BooleanField()\r\n    PDBInfo = models.FileField(upload_to='pdb')<\/pre>\n<p>The model contains the PDB Code, e.g. 1AQ1, and the date it was added to the database. It also has a foreign key relating it to its Target and a boolean indicating whether the structure is apo or holo. 
Finally there is a file field relating this entry to the file path where the PDB information is stored.<\/p>\n<p>Once the data has been added to the database, Django deals with all the SQL needed to query it:<\/p>\n<pre class=\"lang:python decode:true\">my_prot = Protein.objects.get(Code=\"1aq1\")  # Gives me the Protein object \"1aq1\"\r\nCDK2_prots = Protein.objects.filter(TargetID__Title=\"CDK2\")  # All PDB entries associated with CDK2, as a query set, behaving similarly to a list\r\nCDK2_list = list(CDK2_prots)  # Now a plain Python list<\/pre>\n<p>The &#8220;__&#8221; in the above query spans the foreign key relationship, so the filter matches on the Title of the related Target rather than on a field of the Protein itself. Finally I can access the PDB files for each of these proteins.<\/p>\n<pre class=\"lang:python decode:true \">my_prot = Protein.objects.get(Code=\"1aq1\")  # Gives me the Protein object \"1aq1\"\r\nprint(my_prot.Code)  # prints \"1aq1\"\r\n# my_prot.PDBInfo has the behaviour of a file handle\r\npdb_lines = my_prot.PDBInfo.readlines()  # Reads the lines of the file<\/pre>\n<p><span style=\"line-height: 1.714285714; font-size: 1rem;\">There, you&#8217;ve made a queryable database, where Django deals with all the hard stuff and everything is native to Python. In this example it is not difficult to imagine alternative ways of creating the same thing using directory structures, but as the structure of your data becomes more complex, the Django models are easy to adapt, and as the data grows it benefits from the speed of modern databases.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my current work I am developing a cheminformatics tool using structural and activity data to investigate protein-ligand binding. I love Python and I listen to Saulo, so I decided to use Django to develop my application. 
I didn&#8217;t understand what it was and why it might be useful before [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[29,10,14],"tags":[],"ppma_author":[508],"class_list":["post-1089","post","type-post","status-publish","format-standard","hentry","category-code","category-groupmeetings","category-howto"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":508,"user_id":14,"is_guest":0,"slug":"anthony","display_name":"Anthony Bradley","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/6ebe5584e1527e5886cfbda3ee205d94b11936a2763a3adeda6c06a2702cc2ea?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1089","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=1089"}],"version-history":[{"count":34,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1089\/revisions"}],"predecessor-version":[{"id":1153,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1089\/revisions\/1153"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=1089"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:
\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=1089"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=1089"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1089"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}