Category Archives: AI

The Surprising Shape of Normal Distributions in High Dimensions

Multivariate Normal distributions are an essential component of virtually any modern deep learning method—be it to initialise the weights and biases of a neural network, perform variational inference in a probabilistic model, or provide a tractable noise distribution for generative modelling.

What most of us (including—until very recently—me) aren’t aware of, however, is that these Normal distributions begin to look less and less like the characteristic bell curve that we associate them with as their dimensionality increases.

Continue reading

AI Can’t Believe It’s Not Butter

Recently, I’ve been using a Convolutional Neural Network (CNN), and other methods, to predict the binding affinity of antibodies from their sequence. However, nine months ago, I applied a CNN to a far more important task – distinguishing images of butter from margarine. Please check out the GitHub link below to learn moo-re.

https://github.com/lewis-chinery/AI_cant_believe_its_not_butter

A simple criterion can conceal a multitude of chemical and structural sins

We’ve been investigating deep learning-based protein-ligand docking methods which often claim to be able to generate ligand binding modes within 2Å RMSD of the experimental one. We found, however, this simple criterion can conceal a multitude of chemical and structural sins…

DeepDock attempted to generate the ligand binding mode from PDB ID 1t9b
(light blue carbons, left), but gave pretzeled rings instead (white carbons, right).

Continue reading

Lucubration or Gaslighting?​

Or: The best lies have a nugget of truth in them.​

Lucubration – The action or occupation of intensive study originally by candle or lamplight.

Gaslighting – Psychological abuse in which a person or group causes someone to question their own sanity, memories, or perception.

I was recently having a play with Google Bard. Bard, unlike ChatGPT has access to live data. It also undergoes live feedback and quality control. I was hoping to see if it would find me any journals with articles on prion research which I’d previously overlooked.

Me: Please show me some recent articles about prion research.
(Because always be polite to our AI overlords, they’ll remember!)

Continue reading

What can you do with the OPIG Immunoinformatics Suite? v3.0

OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).

Continue reading

“The Rise of ChatGPT 4.0: Is the Future of Work in Jeopardy?”

In my previous blog post, I explored the capabilities of ChatGPT 3.5, testing its skills as a programmer and mathematician’s assistant. The results were mixed, to say the least. While it could handle simple coding tasks with ease, it faltered when faced with more complex mathematical problems and image manipulation tasks. I concluded that while ChatGPT 3.5 was impressive, it was far from replacing us in our jobs. It was a tool, yes, but one that needed careful supervision and frequent corrections.

Fast forward to today, and we have ChatGPT 4.0 with code interpreter enabled. I approached this new version with the same scepticism I had for its predecessor, but I must admit, my initial impressions have been challenged. The leap from 3.5 to 4.0 is not just a step forward; it’s a giant leap. You can see my experiment here https://chat.openai.com/share/0099a12f-e577-44bd-aa10-7c7508f15f19.

In my previous experiment, I asked ChatGPT 3.5 to translate a 2D image, a task that combines both programming and a bit of maths. The results were less than satisfactory. It was able to write code to translate images if they were in real space, a task that can be easily found searching on the web. However, it struggled to handle images represented in Fourier space correctly, and even with multiple hints, the best it could do was a rough approximation of the correct solution. It took six attempts to get the code automatically generated, and even then, it was far from perfect.

With ChatGPT 4.0, however, the story is different. When asked to perform a similar task, it provided a clear, concise, and correct solution. It understood the concept of image translation in the context of computer vision and image processing, and it was able to generate Python code using the scikit-image package to perform the task. The code was not only syntactically correct but also logically sound, demonstrating a significant improvement in understanding and problem-solving capabilities.

"""
#Me: Can you know provide some python code to translate (pixel shifts) images? And can you execute it?
#Chat gpt: Here's a simple Python example using scikit-image to translate an image:
"""

import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import AffineTransform, warp

# Load an example image
image = data.camera()

# Define a translation transform
tform = AffineTransform(translation=(50, 50))

# Apply the transform to the image
translated_image = warp(image, tform.inverse)

# Plot the original and translated images
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image, cmap='gray')
ax[0].set_title('Original Image')
ax[1].imshow(translated_image, cmap='gray')
ax[1].set_title('Translated Image')
plt.show()

"""
#Chat gpt: 
In this code, AffineTransform is used to define a translation of 50 pixels to the right and 50 pixels down, and warp is used to apply this transform to the image. The result is an image that has been shifted.
However, I must clarify that the current environment doesn't support displaying plots, so the matplotlib part of the code won't work here. You should be able to run it in a Jupyter notebook or similar environment on your local machine. Let me know if you need further help or clarification!
"""


"""
ME: perform the same operation as before but with an image represented in Fourier space. 
#Chat gpt: 
Sure! However, we need to keep in mind that translating an image in the Fourier domain is not the same as translating an image in the spatial domain. In the Fourier domain, translations are performed by multiplying the image by a complex exponential, which essentially changes the phase of the Fourier transform.

Here is a Python code snippet demonstrating translation in the Fourier domain:


"""

import numpy as np
from skimage import data
from scipy.fft import fft2, ifft2, fftshift, ifftshift

# Load an example image
image = data.camera()

# Compute the 2-dimensional FFT of the image
image_fft = fft2(image)

# Define a translation
dy, dx = 30, 30

# Generate an array with the same size as the image
x = np.arange(0, image.shape[1])
y = np.arange(0, image.shape[0])
xv, yv = np.meshgrid(x, y)

# Create the translation function in Fourier space
translation = np.exp(-2j * np.pi * (dx * xv / image.shape[1] + dy * yv / image.shape[0]))

# Translate the image in Fourier space
translated_image_fft = image_fft * translation

# Transform back to spatial domain
translated_image = np.abs(ifft2(translated_image_fft))

# Now you could plot the images
"""
#Chat gpt: 
This code first computes the Fourier transform of the image using fft2. Then it generates a complex exponential function that represents a shift in the Fourier domain. This function is multiplied with the Fourier-transformed image to achieve the translation. Finally, the inverse Fourier transform ifft2 is used to transform the image back to the spatial domain.

As before, the current environment doesn't support displaying plots, so you would need to run this on your local machine to see the translated image.
"""

Another test I revisited was a mathematical problem I had posed to ChatGPT 3.5: “What is the intersection between the unit circle and the curve y = x^2?” The previous version had failed to provide a correct answer. However, ChatGPT 4.0 not only understood the problem but also wrote its own code to solve it, correctly identifying the intersection points. This was a clear demonstration of the AI’s improved mathematical problem-solving capabilities although It’s important to note that only the premium version of ChatGPT 4.0 can run the code it suggests, as it did in the example of the intersection of the unit circle and the parabola. +

  • ME: Can you tell me which is the intersection between the unit circle and the curve “y= x*x”?
  • ChatGPT: The unit circle is defined by the equation x2+y2=x2+y2=1. The curve y=x2 is a parabola. To find their intersection, we can substitute y2 from the equation of the parabola into the equation of the circle, and solve for x2. So, we have x2+(x²)²= 1 or x²+x⁴=1 This is a quartic equation and can be difficult to solve, but let’s give it a try. [AND IT STARTS WRITING AND EXECUTING CODE]

So, is my job in danger now? Well, it’s a complex question. The advancements in AI, as demonstrated by the jump from ChatGPT 3.5 to 4.0, are indeed impressive. The AI’s ability to understand complex tasks and generate accurate solutions is growing quite fast. However, it’s important to remember that AI, at its core, is a tool. It’s a tool that can augment our capabilities, automate mundane tasks, and help us solve complex problems. In the end, whether AI becomes a threat or an ally in our jobs depends largely on how we choose to use it. If we see it as a tool to enhance our skills and productivity, then there’s no danger, only opportunity. But if we see it as a replacement for human intelligence and creativity, then we might indeed have cause for concern. For now, though, I believe we’re safe. The Turing test might be a thing of the past, but the “human test” is still very much alive.

9th Joint Sheffield Conference on Cheminformatics

Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:

  • De Novo Design
  • Open Science
  • Chemical Space
  • Physics-based Modelling
  • Machine Learning
  • Property Prediction
  • Virtual Screening
  • Case Studies
  • Molecular Representations

It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:

Continue reading

Machine learning strategies to overcome limited data availability

Machine learning (ML) for biological/biomedical applications is very challenging – in large part due to limitations in publicly available data (something we recently published about [1]). Substantial amounts of time and resources may be required to generate the types of data (eg protein structures, protein-protein binding affinity, microscopy images, gene expression values) required to train ML models, however.

In cases where there is sufficient data available to provide signal, but not enough for the desired performance, ML strategies can be employed:

Continue reading

Academic Reading? There’s an AI for that.

AI tools are literally everywhere. Recently, I stumbled across an AI aggregator website (theresanaiforthat.com) that, given a task, will find an AI solution. At the time of writing this article, there are 4871 AI’s across 1369 tasks, with solutions ranging from scribes to polygraph examiners. Recently, I stumbled across SciSpace (formerly typeset – https://typeset.io), an “AI assistant to understand scientific literature.” So, of course, I tested it out. In this blog post, we will explore the capabilities of SciSpace and discuss how it can potentially enhance your literature review process.

The user experience of a tool can make or break its adoption. Thankfully, SciSpace isn’t bad. Its main website offers basic search functionality, enabling you to find specific papers, topics, or authors within their database. I did notice that it is missing many new papers in its database; however, users have the option to upload a PDF for analysis. Additionally, each search result includes a TL;DR summary, providing a concise overview of the paper’s contents at a glance. As expected, this summary serves as a helpful reminder for familiar papers, but I often found it inadequate in providing enough information to grasp the main arguments or story of a paper. One interesting feature of SciSpace is the ability to “trace” papers in their database. By following the citations of a paper, users can navigate through related works, authors, and topics. I think this feature would be helpful during exploration and makes finding connections between related topics a little easier.

The best thing about SciSpace is the Copilot Chrome extension. Available whenever you open a paper’s PDF or journal link, it offers text analysis, summarization, and mathematical or table comprehension. It provides a set of common template prompts, which I found helpful. For example, “What were the key contributions of that paper?”, “What data and methods have been used in this paper?”, or “What are the limitations of this paper?” I found these prompts helpful in getting a quick overview of the work faster than reading the abstract, figures, and conclusion.

To put SciSpace Copilot to the test, I used it on my recent publication. The extension provided an accurate summary of the abstract and introduction. It effectively extracted the key result and arguments plus highlighted the main contributions of the work well. To be honest, it also offered a fair and accurate summary of the limitations of the study. It was helpful; however, it does not replace the need to read the full paper.

Tools like SciSpace are clearly becoming more popular and could potentially play a larger role in how we write, read, and understand research output. In the meantime, I’ve found it helpful to significantly improve the efficiency and effectiveness of my academic reading. Its clean, user-friendly interface, TL;DR summaries, and the impressive Copilot Chrome extension save me time. Plus, it’s completely free! I do expect that at some point it will become a paid tool. Until then, it’s a great way to stay on top of published work and build an understanding of related, but unfamiliar, fields.

Unclear documentation? ChatGPT can help!

The PyMOL Python API is a useful resource for most people doing research in OPIG, whether focussed on antibodies, small molecule drug design or protein folding. However, the documentation is poorly structured and difficult to interpret without first having understood the structure of the module. In particular, the differences between use of the PyMOL command line and the API can be unclear, leading to a much longer debugging process for code than you’d like.

While I’m reluctant to continue the recent theme of ChatGPT-related posts, this is a use for ChatGPT that would have been incredibly useful to me when I was first getting to grips with the PyMOL API.

Continue reading