Advanced PyMOL Visualization for Weighted Structural Ensembles (Part 1): Ensemble Comparison

When working with structural ensembles from molecular dynamics, AlphaFold2 subsampling, or ensemble reweighting against experimental data, you quickly run into visualization problems. Many of these problems standard PyMOL tutorials don’t address: what do you do when there’s no single reference structure?

In this two-part series, I’ll share the PyMOL techniques I’ve developed for visualizing weighted ensembles where multiple conformational states coexist. Part 1 covers reference state handling, RMSD-based coloring, and cluster visualization. Part 2 will tackle efficient SASA surface generation for large ensembles. To the best of my knowledge, this is the most advanced PyMOL guide EVER.

The code snippets here are extracted from full scripts attached at the end of this post. All examples use two systems: TeaA (a membrane transporter with distinct open/closed states) and MoPrP (mouse Prion Protein with partially unfolded forms).

The Multi-Reference Problem

Traditional structure visualization assumes you have one reference to align everything against. But ensemble methods often produce populations spanning multiple metastable states. For TeaA, we have open and closed conformations. For MoPrP, we have a folded state plus two partially unfolded forms (PUF1 and PUF2).

Trying to show “how well our ensemble matches the data” requires thinking carefully about:

How to display multiple references without visual clutter
How to align structures when the reference itself is an ensemble
How to color structures by their conformational similarity to different states

Distinguishing Reference States with Cartoon Styles

When overlaying multiple reference structures, color alone isn’t enough—especially in grayscale-friendly publications. For multiple references, cartoon dash is a great choice: provides solid cartoons for the ensemble structures of interest. In my experience this is often clearer than transparency.

# Load reference structures
cmd.load("TeaA_ref_closed_state.pdb", "closed_ref")
cmd.load("TeaA_ref_open_state.pdb", "open_ref")

# Style references as dashed cartoons
cmd.show_as("cartoon", "closed_ref")
cmd.cartoon("dash", "closed_ref")
cmd.set("dash_width", 0.1, "closed_ref")
cmd.color("white", "closed_ref")
cmd.set("cartoon_transparency", 0.4, "closed_ref")

cmd.show_as("cartoon", "open_ref")
cmd.cartoon("dash", "open_ref")
cmd.set("dash_width", 0.1, "open_ref")
cmd.color("black", "open_ref")
cmd.set("cartoon_transparency", 0.4, "open_ref")

# Show all states if reference is an NMR ensemble
cmd.set("all_states", 1, "closed_ref")

The dashed style immediately signals “this is a reference” while the transparency prevents it from dominating the visualization. For two-state systems like TeaA, I use a white/black or blue/orange color scheme that remains distinguishable even without color.

Left: Grayscale-friendly visualization using dash width variation. Right: Color-coded open (orange) and closed (blue) references.

Kabsch Alignment to Frame-Averaged Coordinates

When your reference is itself an ensemble (like an NMR structure or MD trajectory), aligning to a single frame introduces bias. Instead, compute the mean coordinates across all reference frames and align to that.

import numpy as np
from pymol import cmd

def get_reference_target_coords(reference_pdb, align_selection):
    """
    Get target alignment coordinates from reference PDB.
    If multi-state, returns frame-averaged coordinates.
    """
    ref_obj = "__temp_ref__"
    cmd.load(reference_pdb, ref_obj)
    num_states = cmd.count_states(ref_obj)
    
    sel = f"{ref_obj} and {align_selection}"
    
    if num_states > 1:
        # Compute frame average for multi-state references
        sum_coords = None
        count = 0
        
        for i in range(1, num_states + 1):
            coords = cmd.get_coords(sel, state=i)
            if coords is not None and len(coords) > 0:
                if sum_coords is None:
                    sum_coords = np.zeros_like(coords)
                sum_coords += coords
                count += 1
        
        target_coords = sum_coords / count if count > 0 else None
    else:
        target_coords = cmd.get_coords(sel, state=1)
    
    cmd.delete(ref_obj)
    return target_coords

The alignment selection matters enormously. For TeaA, I align on stable secondary structure elements that don’t change between open/closed:

# TeaA: align on stable helices, not the mobile domains
ALIGN_SELECTION = "(resi 226-256 or resi 16-27 or resi 66-69) and name CA"

For MoPrP, I use the stable helix region (residues 79-93) that remains folded even in partially unfolded forms:

# MoPrP: align on the stable helix core
ALIGN_START = 79
ALIGN_END = 93

Now we can align every frame of our trajectory to these averaged reference coordinates using the Kabsch algorithm:

def kabsch_alignment(mobile_coords, reference_coords):
    """Kabsch algorithm to find optimal rotation matrix."""
    mobile_centered = mobile_coords - mobile_coords.mean(axis=0)
    reference_centered = reference_coords - reference_coords.mean(axis=0)
    
    H = np.dot(mobile_centered.T, reference_centered)
    U, S, Vt = np.linalg.svd(H)
    R = np.dot(Vt.T, U.T)
    
    # Handle reflection case
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = np.dot(Vt.T, U.T)
    
    t = reference_coords.mean(axis=0) - np.dot(mobile_coords.mean(axis=0), R.T)
    return R, t


def align_trajectory_to_coords(obj_name, target_coords, align_selection):
    """Align every state of a PyMOL object to target coordinates."""
    num_states = cmd.count_states(obj_name)
    sel_align = f"{obj_name} and {align_selection}"
    
    for i in range(1, num_states + 1):
        mobile_align_coords = cmd.get_coords(sel_align, state=i)
        R, t = kabsch_alignment(mobile_align_coords, target_coords)
        
        # Transform ALL atoms, not just alignment selection
        all_coords = cmd.get_coords(obj_name, state=i)
        new_coords = np.dot(all_coords, R.T) + t
        cmd.load_coords(new_coords, obj_name, state=i)

RMSD-Based Coloring of Top Weighted Structures

For weighted ensembles, you often want to show the highest-weight structures colored by their RMSD to a reference state. This reveals whether your top structures cluster near one conformational state or span multiple states.

def show_top_structures_by_rmsd(obj_name, weights, n=20, reference_obj=None):
    """
    Extract top N structures by weight, color by RMSD to reference.
    """
    top_indices = np.argsort(weights)[::-1][:n]
    
    # Calculate RMSD to reference for each top structure
    rmsds = []
    for idx in top_indices:
        state = int(idx) + 1
        temp_name = f"temp_rmsd_{state}"
        cmd.create(temp_name, obj_name, state, 1)
        
        # Align and calculate RMSD
        cmd.align(temp_name, reference_obj, cycles=0)
        rmsd = cmd.rms_cur(f"{temp_name} and name CA", 
                          f"{reference_obj} and name CA")
        rmsds.append(rmsd)
        cmd.delete(temp_name)
    
    # Create individual objects colored by RMSD
    rmsd_min, rmsd_max = min(rmsds), max(rmsds)
    
    for i, idx in enumerate(top_indices):
        state = int(idx) + 1
        name = f"{obj_name}_top_{i+1:02d}"
        cmd.create(name, obj_name, state, 1)
        
        # Store RMSD in B-factor for spectrum coloring
        cmd.alter(name, f"b={rmsds[i]}")
        
        # Set transparency based on weight
        w = weights[idx]
        transparency = max(0.1, min(0.9, 0.5 - w * 2))
        cmd.set("cartoon_transparency", transparency, name)
        
        # Visualization settings
        cmd.show("cartoon", name)
        cmd.cartoon("tube", name)
        cmd.set("cartoon_tube_radius", 0.2, name)
        
        # Color by RMSD: cyan (low/similar) -> yellow (high/different)
        cmd.spectrum("b", "cyan white grey yellow", name, 
                    minimum=rmsd_min, maximum=rmsd_max)
    
    # Group all top structures
    members = " ".join([f"{obj_name}_top_{i+1:02d}" for i in range(n)])
    cmd.group(f"{obj_name}_top_{n}_group", members)

For TeaA, this reveals whether the highest-weight structures favor the open or closed conformation:

Top weighted structures from different fitting methods, colored by RMSD to the open state. Cyan indicates structures similar to the open conformation; yellow indicates structures closer to closed.

Visualizing Cluster Mean Structures with Per-Residue Deviation

When you have clustered your ensemble (e.g., into Folded, PUF1, PUF2 states), showing the mean structure of each cluster with per-residue RMSD coloring communicates both the average conformation and its internal variability.

First, split your trajectory into individual frames:

Then compute per-residue RMSD and store in B-factors:

# Accumulate squared deviations for mean structure RMSD
    sum_sq_dev = {}
    n_frames = len(frame_objs)
    
    for obj in frame_objs:
        model = cmd.get_model(f"{obj} and name CA")
        
        for atom in model.atom:
            resi = atom.resi.strip()
            curr_pos = np.array(atom.coord)
            
            if resi in ref_coords_dict:
                ref_pos = ref_coords_dict[resi]
                dist = np.linalg.norm(curr_pos - ref_pos)
                
                # Accumulate for ensemble RMSD
                if resi not in sum_sq_dev:
                    sum_sq_dev[resi] = 0.0
                sum_sq_dev[resi] += dist ** 2
        
        # Apply deviation to this frame's B-factors
        stored.dev_map = {resi: np.linalg.norm(
            np.array(cmd.get_model(f"{obj} and name CA and resi {resi}").atom[0].coord) - 
            ref_coords_dict.get(resi, np.zeros(3))
        ) for resi in ref_coords_dict}
        
        cmd.alter(obj, "b=stored.dev_map.get(resi.strip(), 0.0)")

Finally, create the mean structure with ensemble RMSD coloring:

# Create mean structure object
    mean_obj = f"{obj_name}_mean"
    cmd.create(mean_obj, frame_objs[0])
    
    # Compute mean coordinates for ALL atoms
    mean_coords = compute_all_atom_mean(obj_name)
    cmd.load_coords(mean_coords, mean_obj, state=1)
    
    # Calculate per-residue RMSD and apply to B-factors
    rmsd_map = {resi: np.sqrt(sq_sum / n_frames) 
                for resi, sq_sum in sum_sq_dev.items()}
    
    stored.mean_rmsd = rmsd_map
    cmd.alter(mean_obj, "b=stored.mean_rmsd.get(resi.strip(), 0.0)")
    
    # Visualize with putty representation
    cmd.show("cartoon", mean_obj)
    cmd.cartoon("putty", mean_obj)
    cmd.spectrum("b", "paleyellow orange pink red black", mean_obj,
                minimum=1.0, maximum=4.0)
    
    # Putty settings scale tube width by B-factor
    cmd.set("cartoon_putty_transform", 7, mean_obj)
    cmd.set("cartoon_putty_scale_min", 0.5, mean_obj)
    cmd.set("cartoon_putty_scale_max", 3.5, mean_obj)

Mean structures for Folded, PUF1, and PUF2 clusters. Tube width and color indicate per-residue RMSD: thin/yellow regions are stable; thick/black regions show high variability.

Ray Tracing Modes for NMR Ensembles

NMR structures with multiple models benefit from careful ray tracing. Mode 3 (quantized color + black outline) works beautifully for showing ensemble spread:

# Load NMR ensemble and show all states
cmd.load("2L39_crop.pdb", "nmr_ref")
cmd.set("all_states", 1, "nmr_ref")

# Style settings
cmd.show("cartoon", "nmr_ref")
cmd.cartoon("tube", "nmr_ref")
cmd.color("white", "nmr_ref")
cmd.set("cartoon_tube_radius", 0.5, "nmr_ref")
cmd.set("cartoon_transparency", 0.3, "nmr_ref")

# Ray trace settings for clean ensemble visualization
cmd.set("orthoscopic", 1)
cmd.set("ray_trace_mode", 3)  # Quantized + outline
cmd.set("ray_shadows", 0)
cmd.set("antialias", 2)
cmd.bg_color("white")

# Render
cmd.ray(1200, 1200)

NMR ensemble (PDB: 2L39) rendered with ray_trace_mode 3. The black outlines help distinguish overlapping conformers.

Compare with mode 1 (standard) for a softer look, or experiment with ray_transparency_oblique for depth effects:

# Alternative: softer rendering with oblique transparency
cmd.set("ray_trace_mode", 1)
cmd.set("ray_transparency_oblique", 1.0)
cmd.set("transparency_mode", 1)

Bringing It Together: A Complete Visualization Pipeline

Here’s how these pieces combine for a weighted ensemble analysis:

# 1. Set up environment
cmd.bg_color("white")
cmd.set("orthoscopic", 1)

# 2. Load and style reference(s)
target_coords = get_reference_target_coords("reference.pdb", ALIGN_SELECTION)

cmd.load("reference.pdb", "ref")
cmd.cartoon("dash", "ref")
cmd.set("cartoon_transparency", 0.4, "ref")

# 3. Load ensemble and trajectory
cmd.load("topology.pdb", "ensemble")
cmd.load_traj("trajectory.xtc", "ensemble")

# 4. Align to reference
align_trajectory_to_coords("ensemble", target_coords, ALIGN_SELECTION)

# 5. Load weights and show top structures
weights = np.load("weights.npz")["weights"]
show_top_structures_by_rmsd("ensemble", weights, n=20, reference_obj="ref")

# 6. Final rendering settings
cmd.set("ray_trace_mode", 3)
cmd.set("ray_shadows", 0)
cmd.set("antialias", 2)
cmd.zoom()

Comparison of ensemble fitting methods for MoPrP. Top structures colored by weight (cyan=low, magenta/dark=high, overlaid on the NMR reference (gray).

Summary

Visualizing weighted structural ensembles requires moving beyond single-structure thinking:

Use cartoon dash to distinguish references from ensemble members
Align to frame-averaged coordinates when your reference is itself an ensemble
Color by RMSD to reveal conformational preferences in top structures
Show cluster means with putty representation to communicate both average structure and variability
Experiment with ray trace modes for clean multi-model rendering

In Part 2, we’ll tackle generating SASA surfaces from weighted ensembles—including efficient algorithms for handling the computational demands of large trajectory datasets.

Author

Alexi Hussain

View all posts

Oxford Protein Informatics Group

or "OPIG" to friends