Tag Archives: Python

Deploying a Flask app part I: the gunicorn WSGI server

Last year I wrote a post about deploying Flask apps with Apache/mod_wsgi when your app’s dependencies are installed in a conda environment. The year before, in the dark times, I wrote a post about the black magic invocations required to get multiple apps running stably using mod_wsgi. I’ve since moved away from mod_wsgi entirely and switched to running Flask apps from containers using the gunicorn WSGI server behind an Apache reverse proxy, which has made life immeasurably easier. In this post we’ll cover running a Flask app on localhost using gunicorn; in Part II we’ll run our app as a service using Singularity and deploy it to production using Apache as a HTTP proxy server.

Continue reading →

AI Can’t Believe It’s Not Butter

Recently, I’ve been using a Convolutional Neural Network (CNN), and other methods, to predict the binding affinity of antibodies from their sequence. However, nine months ago, I applied a CNN to a far more important task – distinguishing images of butter from margarine. Please check out the GitHub link below to learn moo-re.

https://github.com/lewis-chinery/AI_cant_believe_its_not_butter

“The Rise of ChatGPT 4.0: Is the Future of Work in Jeopardy?”

In my previous blog post, I explored the capabilities of ChatGPT 3.5, testing its skills as a programmer and mathematician’s assistant. The results were mixed, to say the least. While it could handle simple coding tasks with ease, it faltered when faced with more complex mathematical problems and image manipulation tasks. I concluded that while ChatGPT 3.5 was impressive, it was far from replacing us in our jobs. It was a tool, yes, but one that needed careful supervision and frequent corrections.

Fast forward to today, and we have ChatGPT 4.0 with code interpreter enabled. I approached this new version with the same scepticism I had for its predecessor, but I must admit, my initial impressions have been challenged. The leap from 3.5 to 4.0 is not just a step forward; it’s a giant leap. You can see my experiment here https://chat.openai.com/share/0099a12f-e577-44bd-aa10-7c7508f15f19.

In my previous experiment, I asked ChatGPT 3.5 to translate a 2D image, a task that combines both programming and a bit of maths. The results were less than satisfactory. It was able to write code to translate images if they were in real space, a task that can be easily found searching on the web. However, it struggled to handle images represented in Fourier space correctly, and even with multiple hints, the best it could do was a rough approximation of the correct solution. It took six attempts to get the code automatically generated, and even then, it was far from perfect.

With ChatGPT 4.0, however, the story is different. When asked to perform a similar task, it provided a clear, concise, and correct solution. It understood the concept of image translation in the context of computer vision and image processing, and it was able to generate Python code using the scikit-image package to perform the task. The code was not only syntactically correct but also logically sound, demonstrating a significant improvement in understanding and problem-solving capabilities.

"""
#Me: Can you know provide some python code to translate (pixel shifts) images? And can you execute it?
#Chat gpt: Here's a simple Python example using scikit-image to translate an image:
"""

import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import AffineTransform, warp

# Load an example image
image = data.camera()

# Define a translation transform
tform = AffineTransform(translation=(50, 50))

# Apply the transform to the image
translated_image = warp(image, tform.inverse)

# Plot the original and translated images
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image, cmap='gray')
ax[0].set_title('Original Image')
ax[1].imshow(translated_image, cmap='gray')
ax[1].set_title('Translated Image')
plt.show()

"""
#Chat gpt: 
In this code, AffineTransform is used to define a translation of 50 pixels to the right and 50 pixels down, and warp is used to apply this transform to the image. The result is an image that has been shifted.
However, I must clarify that the current environment doesn't support displaying plots, so the matplotlib part of the code won't work here. You should be able to run it in a Jupyter notebook or similar environment on your local machine. Let me know if you need further help or clarification!
"""


"""
ME: perform the same operation as before but with an image represented in Fourier space. 
#Chat gpt: 
Sure! However, we need to keep in mind that translating an image in the Fourier domain is not the same as translating an image in the spatial domain. In the Fourier domain, translations are performed by multiplying the image by a complex exponential, which essentially changes the phase of the Fourier transform.

Here is a Python code snippet demonstrating translation in the Fourier domain:


"""

import numpy as np
from skimage import data
from scipy.fft import fft2, ifft2, fftshift, ifftshift

# Load an example image
image = data.camera()

# Compute the 2-dimensional FFT of the image
image_fft = fft2(image)

# Define a translation
dy, dx = 30, 30

# Generate an array with the same size as the image
x = np.arange(0, image.shape[1])
y = np.arange(0, image.shape[0])
xv, yv = np.meshgrid(x, y)

# Create the translation function in Fourier space
translation = np.exp(-2j * np.pi * (dx * xv / image.shape[1] + dy * yv / image.shape[0]))

# Translate the image in Fourier space
translated_image_fft = image_fft * translation

# Transform back to spatial domain
translated_image = np.abs(ifft2(translated_image_fft))

# Now you could plot the images
"""
#Chat gpt: 
This code first computes the Fourier transform of the image using fft2. Then it generates a complex exponential function that represents a shift in the Fourier domain. This function is multiplied with the Fourier-transformed image to achieve the translation. Finally, the inverse Fourier transform ifft2 is used to transform the image back to the spatial domain.

As before, the current environment doesn't support displaying plots, so you would need to run this on your local machine to see the translated image.
"""

Another test I revisited was a mathematical problem I had posed to ChatGPT 3.5: “What is the intersection between the unit circle and the curve y = x^2?” The previous version had failed to provide a correct answer. However, ChatGPT 4.0 not only understood the problem but also wrote its own code to solve it, correctly identifying the intersection points. This was a clear demonstration of the AI’s improved mathematical problem-solving capabilities although It’s important to note that only the premium version of ChatGPT 4.0 can run the code it suggests, as it did in the example of the intersection of the unit circle and the parabola. +

ME: Can you tell me which is the intersection between the unit circle and the curve “y= x*x”?
ChatGPT: The unit circle is defined by the equation x²+y²=x²+y²=1. The curve y=x² is a parabola. To find their intersection, we can substitute y² from the equation of the parabola into the equation of the circle, and solve for x². So, we have x²+(x²)²= 1 or x²+x⁴=1 This is a quartic equation and can be difficult to solve, but let’s give it a try. [AND IT STARTS WRITING AND EXECUTING CODE]

So, is my job in danger now? Well, it’s a complex question. The advancements in AI, as demonstrated by the jump from ChatGPT 3.5 to 4.0, are indeed impressive. The AI’s ability to understand complex tasks and generate accurate solutions is growing quite fast. However, it’s important to remember that AI, at its core, is a tool. It’s a tool that can augment our capabilities, automate mundane tasks, and help us solve complex problems. In the end, whether AI becomes a threat or an ally in our jobs depends largely on how we choose to use it. If we see it as a tool to enhance our skills and productivity, then there’s no danger, only opportunity. But if we see it as a replacement for human intelligence and creativity, then we might indeed have cause for concern. For now, though, I believe we’re safe. The Turing test might be a thing of the past, but the “human test” is still very much alive.

Exploring the Observed Antibody Space (OAS)

The Observed Antibody Space (OAS) [1,2] is an amazing resource for investigating observed antibodies or as a resource for training antibody specific models, however; its size (over 2.4 billion unpaired and 1.5 million paired antibody sequences as of June 2023) can make it painful to work with. Additionally, OAS is extremely information rich, having nearly 100 columns for each antibody heavy or light chain, further complicating how to handle the data.

From spending a lot of time working with OAS, I wanted to share a few tricks and insights, which I hope will reduce the pain and increase the joy of working with OAS!

Continue reading →

Pairwise sequence identity and Tanimoto similarity in PDBbind

In this post I will cover how to calculate sequence identity and Tanimoto similarity between any pairs of complexes in PDBbind 2020. I used RDKit in python for Tanimoto similarity and the MMseqs2 software for sequence identity calculations.

A few weeks back I wanted to cluster the protein-ligand complexes in PDBbind 2020, but to achieve this I first needed to precompute the sequence identity between all pairs sequences in PDBbind, and Tanimoto similarity between all pairs of ligands. PDBbind 2020 includes 19.443 complexes but there are much fewer distinct ligands and proteins than that. However, I kept things simple and calculated the similarities for all 19.443*19.443 pairs. Calculating the Tanimoto similarity is relatively easy thanks to the BulkTanimotoSimilarity function in RDKit. The following code should do the trick:

from rdkit.Chem import AllChem, MolFromMol2File
from rdkit.DataStructs import BulkTanimotoSimilarity
import numpy as np
import os

fps = []
for pdb in pdbs:
    mol = MolFromMol2File(os.path.join('data', pdb, f'{pdb}_ligand.mol2'))
    fps.append(AllChem.GetMorganFingerprint(mol, 3))

sims = []
for i in range(len(fps)):
    sims.append(BulkTanimotoSimilarity(fps[i],fps))

arr = np.array(sims)
np.savez_compressed('data/tanimoto_similarity.npz', arr)

Sequence identity calculations in python with Biopandas turned out to be too slow for this amount of data so I used the ultra fast MMseqs2. The first step to running MMseqs2 is to create a .fasta file of all the sequences, which I call QUERY.fasta. This is what the first few lines look like:

Continue reading →

Checking your PDB file for clashing atoms

Detecting atom clashes in protein structures can be useful in a number of scenarios. For example if you are just about to start some molecular dynamics simulation, or if you want to check that a structure generated by a deep learning model is reasonable. It is quite straightforward to code, but I get the feeling that these sort of functions have been written from scratch hundreds of times. So to save you the effort, here is my implementation!!!

Continue reading →

PLIP on PDBbind with Python

Today’s blog post is about using PLIP to extract information about interactions between a protein and ligand in a bound complex, using data from PDBbind. The blog post will cover how to combine the protein pdb file and the ligand mol2 file into a pdb file, and how to use PLIP in a high-throughput manner with python.

In order for PLIP to consider the ligand as one molecule interacting with the protein, we need to modify the mol2 file of the ligand. The 8th column of the atom portion of a mol2 file (the portion starts with @<TRIPOS>ATOM) includes the ID of the ligand that the atom belongs to. Most often all the atoms have the same ligand ID, but for peptides for instance, the atoms have the ID of the residue they’re part of. The following code snippet will make the required changes:

ligand_file = 'data/5oxm/5oxm_ligand.mol2'

with open(ligand_file, 'r') as f:
    ligand_lines = f.readlines()

mod = False
for i in range(len(ligand_lines)):
    line = ligand_lines[i]
    if line == '@&lt;TRIPOS&gt;BOND\n':
        mod = False
        
    if mod:
        ligand_lines[i] = line[:59] + 'ISK     ' + line[67:]
        
    if line == '@&lt;TRIPOS&gt;ATOM\n':
        mod = True

with open('data/5oxm/5oxm_ligand_mod.mol2', 'w') as g:
    for j in ligand_lines:
        g.write(j)

Continue reading →

Molecular conformation generation with a DL-based force field

Deep learning (DL) methods in structural modelling are outcompeting force fields because they overcome the two main limitations to force fields methods – the prohibitively large search space for large systems and the limited accuracy of the description of the physics [4].

However, the two methods are also compatible. DL methods are helping to close the gap between the applications of force fields and ab initio methods [3]. The advantage of DL-based force fields is that the functional form does not have to be specified explicitly and much more accurate. Say goodbye to the 12-6 potential function.

In principle DL-based force fields can be applied anywhere where regular force fields have been applied, for example conformation generation [2]. The flip-side of DL-based methods commonly is poor generalization but it seems that force fields, when properly trained, generalize well. ANI trained on molecules with up to 8 heavy atoms is able to generalize to molecules with up to 54 atoms [1]. Excitingly for my research, ANI-2 [2] can replace UFF or MMFF as the energy minimization step for conformation generation in RDKit [5].

So let’s use Auto3D [2] to generated low energy conformations for the four molecules caffeine, Ibuprofen, an experimental hybrid peptide, and Imatinib:

CN1C=NC2=C1C(=O)N(C(=O)N2C)C CFF
CC(C)Cc1ccc(cc1)C(C)C(O)=O IBP
Cc1ccccc1CNC(=O)[C@@H]2C(SCN2C(=O)[C@H]([C@H](Cc3ccccc3)NC(=O)c4cccc(c4C)O)O)(C)C JE2
Cc1ccc(cc1Nc2nccc(n2)c3cccnc3)NC(=O)c4ccc(cc4)CN5CCN(CC5)C STI

Continue reading →

The ultimate modulefile for conda

Environment modules is a great tool for high-performance computing as it is a modular system to quickly and painlessly enable preset configurations of environment variables, for example a user may be provided with modulefile for an antiquated version of a tool and a bleeding-edge alpha version of that same tool and they can easily load whichever they wish. In many clusters the modules are created with a tool called EasyBuild, which delivered an out-of-the-box installation. This works for things like a single binary, but for conda this severely falls short as there are many many configuration changes needed.

Continue reading →

Ligands of CASF-2016

CASF-2016 is a commonly used benchmark for docking tools. Unfortunately, some of the provided ligand files cannot be loaded using RDKit (version 2022.09.1) but there is an easy remedy.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Tag Archives: Python

Deploying a Flask app part I: the gunicorn WSGI server

AI Can’t Believe It’s Not Butter

“The Rise of ChatGPT 4.0: Is the Future of Work in Jeopardy?”

Exploring the Observed Antibody Space (OAS)

Pairwise sequence identity and Tanimoto similarity in PDBbind

Checking your PDB file for clashing atoms

PLIP on PDBbind with Python

Molecular conformation generation with a DL-based force field

The ultimate modulefile for conda

Ligands of CASF-2016