Django for scientific applications

In my current work I am developing a cheminformatics tool using structural and activity data to investigate protein-ligand binding. I have only ever properly used love python and I listen to Saulo, so I decided to used Django to develop my application. I didn’t understand what it was and why it might be useful before I started using it but below I thought I’d discuss a few of the features that I think have been useful and might encourage others to use it.

Firstly I will outline how Django works. I wanted to download all the PDB structures for CDK2 and store the information in a data structure that is robust and easily used. We have a Target and a Protein. A Target is associated to a particular UniProt accession. Cyclin-dependent kinase 2 (CDK2) is a Target. A Protein is a set of 3D coordinates, so 1AQ1 is a Protein.

class Target(models.Model):
"""A Django model to define a given protein target"""
    UniProt = models.CharField(max_length=20,unique=True)
    InitDate = models.DateTimeField(auto_now_add=True)
    Title = models.CharField(max_length=10)

In the above Target model I have three different fields. The first field denotes the UniProt accession for the Target and is “unique”. This means that only one Target can have any given UniProt accession in my data structure. If I try to add another with the same value in the UniProt field it will throw an exception. The second field denotes the time and date that the model was created. This means I can check back to when the target was created. The third is the Title I would like to use for this, for example CDK2.

I can then make a new Target objects by:

new_target = Target()
new_target.Title = "CDK2"
new_target.UniProt = "P24941"

and save it to the database by:

new_target.save() # Django takes care of the required SQL

The next model is for the Protein molecules:

class Protein(models.Model):
    """A Django model to define a given protein"""
    Code = models.CharField(max_length=6,unique=True)
    InitDate = models.DateTimeField(auto_now_add=True)
    TargetID = models.ForeignKey(Target)
    Apo = models.BoolenField()
    PDBInfo = models.FileField(upload_to='pdb')

The model contains the PDB Code, e.g. 1AQ1, and the date it was added to the database. It also consists of a foreign key, relating it to its Target and a boolean indicating if the structure is apo or holo. Finally there is a file field relating this entry to the appropriate file path where the PDB information is stored.

Once the data has been added to the database, Django then deals with all SQL queries from the database:

my_prot = Protein.objects.get(Code="1aq1") # Gives me the Protein object "1aq1"
CDK2_prots = Protein.objects.filter(TargetID__Title="CDK2") # All PDB entries associated to CDK2, as a query set, behaving similarily to a list
CDK2_list = [x for x in CDK2_prots] # Now exactly like a list

The “__” in the above query allows one to span the foreign key relationship, so it is searching for the Title of the Target not the Title of the Protein. Finally I can then access the PDB files for each of these proteins.

my_prot = Protein.objects.get(Code="1aq1") # Gives me the Protein object "1aq1"
print my_prot.Code # prints "1aq1"
# my_prot.PDBInfo has the behaviour of a file handle
pdb_lines = my_prot.PDBInfo.readlines()# Reads the lines of the file

There, you’ve made a queryable database, where Django deals with all the hard stuff and everything is native to python. Obviously in this example it might not be so difficult to imagine alternative ways of creating the same thing using directory structures, but as the structure of your data becomes more complex, Django can be easily manipulated and as it grow it utilises the speed advantages of modern databases.

Author