A guide to fixing broken AMBER MD trajectory files and visualisations.

You’ve just finished a week-long molecular dynamics simulation. You’re excited to see what happened to your protein complex, so you load up the trajectory in VMD and… your protein looks like it’s been through a blender. Pieces are scattered across the screen, water molecules are everywhere, and half your complex seems to have teleported to the other side of the simulation box. This chaos is caused by periodic boundary conditions (PBC).

PBC

PBC is a computational trick that simulates bulk behaviour by treating your simulation box like a repeating tile. When a molecule exits one side, it immediately reappears on the opposite side. This works perfectly for physics as your protein experiences realistic bulk water behaviour.

But it creates a visualisation mess. During simulation, different parts of your protein complex cross box boundaries at different times. When you load the raw trajectory, your protein’s domains appear scattered across space despite being properly bonded. The simulation knows they’re connected (the bonds are intact), but visualisation software shows the raw coordinates, making your protein look like it exploded.
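To make the wrapping effect concrete, here is a minimal one-dimensional sketch. The box length and atom positions are made-up numbers, and `wrap` and `minimum_image` are illustrative helper names rather than functions from any MD package.

```python
def wrap(x, box):
    """Map a coordinate into the primary box [0, box)."""
    return x % box

def minimum_image(dx, box):
    """Shortest periodic separation for a raw coordinate difference dx."""
    return dx - box * round(dx / box)

box = 50.0                     # hypothetical box edge, in angstroms
atom_a, atom_b = 49.2, 50.7    # two bonded atoms, 1.5 A apart, straddling the box edge

wrapped_a, wrapped_b = wrap(atom_a, box), wrap(atom_b, box)
raw_gap = wrapped_b - wrapped_a            # what a viewer draws from raw coordinates
true_gap = minimum_image(raw_gap, box)     # what the periodic simulation sees

print(f"wrapped positions: {wrapped_a:.1f} and {wrapped_b:.1f}")
print(f"apparent separation: {abs(raw_gap):.1f} A")
print(f"periodic separation: {abs(true_gap):.1f} A")
```

The bond is still 1.5 Å long as far as the physics is concerned, but the stored coordinates put the two atoms 48.5 Å apart, which is exactly the "exploded protein" effect.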

But periodic boundary artefacts are just one piece of the puzzle. In my experience, raw MD trajectories suffer from several interconnected issues that make them unsuitable for analysis or presentation.

The four horsemen of MD trajectory chaos

1. Periodic boundary artefacts

Imagine your simulation box as one tile in an infinite repeating pattern. When molecules move across the box edges, they pop up on the other side. This creates visual chaos where your carefully prepared protein complex looks like it exploded.

2. Solvent overload

Your 50,000-atom protein is swimming in 200,000 water molecules plus ions. While important for realistic simulations, all that solvent makes analysis slow and visualisation cluttered. For most post-simulation analysis, you only care about the protein.

3. Structural drift

Even though your protein stays folded, the entire complex tumbles and translates through space during the simulation. Without alignment, measuring distances or calculating RMSDs becomes meaningless.

4. Bloated file sizes

Raw trajectories with solvent can be massive, sometimes gigabytes per microsecond. These cumbersome files slow down analysis and eat up storage space.

If you’ve ever run MD simulations, you’ve probably encountered these. The good news? Your simulation is perfectly fine. The bad news? Raw MD trajectories need some serious cleanup before they’re ready for analysis and/or visualisations.

This guide cuts straight to the commands you actually need and how to piece them together, especially if you’re dealing with protein complexes, where getting the imaging right can be tricky.

Enter CPPTRAJ: your trajectory cleanup crew

AMBER’s CPPTRAJ tool is designed to solve exactly these problems, transforming your messy trajectories into analysis-ready datasets.

Fix the periodic boundary mess

# Load your system
parm system.prmtop
trajin simulation.nc

# centre on your most stable component (usually the largest protein)
center :1-300 mass origin

# Unwrap other components so they stay connected
unwrap :301-450    # Second protein/domain
unwrap :451-460    # Small molecule/ligand

# Use autoimage to fix the overall presentation
autoimage anchor :1-300 fixed :301-460

The key is to pick your most stable component (usually your main protein) as the anchor, then unwrap and re-image everything else relative to it. This keeps your complex looking intact while preserving the correct physics.
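The anchor idea can be sketched in a few lines of plain Python. This is a toy illustration for an orthorhombic box, not CPPTRAJ's actual implementation; the coordinates, box size and the helper name `reimage_near_anchor` are all made up for the example.

```python
def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def reimage_near_anchor(mobile, anchor, box):
    """Shift `mobile` as one rigid unit by whole box lengths so that its
    centroid becomes the periodic image closest to the anchor's centroid."""
    a, m = centroid(anchor), centroid(mobile)
    shift = tuple(-box[i] * round((m[i] - a[i]) / box[i]) for i in range(3))
    return [tuple(c[i] + shift[i] for i in range(3)) for c in mobile]

box = (60.0, 60.0, 60.0)                             # hypothetical box lengths
anchor = [(28.0, 30.0, 30.0), (32.0, 30.0, 30.0)]    # stand-in for the main protein
ligand = [(88.0, 31.0, 29.0), (90.0, 31.0, 29.0)]    # partner stored one box length away in x

fixed = reimage_near_anchor(ligand, anchor, box)
print(fixed)
```

Because the whole partner moves by the same box vector, its internal geometry is untouched. Only its position relative to the anchor changes, which is why re-imaging does not alter the physics.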

Stripping the excess for smaller files

# Remove water and ions, and save the matching stripped topology for later use
# (parmout is a keyword of the strip command, not a standalone command)
strip :WAT,Na+,Cl-,K+ parmout system_clean.prmtop

Simple but effective: you’ve just cut your system size by around 80% while keeping everything that matters for most analyses.
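A rough back-of-the-envelope check of the saving, using the atom counts quoted earlier in this post. The assumptions are mine: single-precision coordinates, a three-site water model, a hypothetical 10,000-frame trajectory, and no allowance for ions, box information or file headers.

```python
BYTES_PER_COORD = 4   # assumption: coordinates stored in single precision

def trajectory_bytes(n_atoms, n_frames):
    """Coordinate payload only: atoms x 3 coordinates x bytes x frames."""
    return n_atoms * 3 * BYTES_PER_COORD * n_frames

protein_atoms = 50_000
water_atoms = 200_000 * 3     # assumption: three atoms per water molecule
n_frames = 10_000             # hypothetical trajectory length

full = trajectory_bytes(protein_atoms + water_atoms, n_frames)
stripped = trajectory_bytes(protein_atoms, n_frames)

print(f"full: {full / 1e9:.1f} GB, stripped: {stripped / 1e9:.1f} GB")
print(f"reduction: {100 * (1 - stripped / full):.0f}%")
```

Under these assumptions the trajectory shrinks from 78 GB to 6 GB, a reduction of about 92%. Your own numbers will depend on how much solvent padding the box has.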

Align for consistency

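# Fit to the first frame to remove overall translation/rotation
# (fitting is the default for rms; "first" selects the reference frame)
rms first :1-300@CA    # Align on alpha carbons of the main protein

# Alternative (use instead of the line above): fit on all atoms of the main protein
# rms first :1-300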

Now every frame is consistently oriented, making distance measurements and structural comparisons meaningful.
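Why fitting matters can be shown with a two-dimensional toy example. This is a sketch of rigid-body least-squares fitting in 2D, where the optimal rotation angle has a closed form; it is not what CPPTRAJ runs internally, and the coordinates are invented.

```python
import math

def center(points):
    """Subtract the centroid so the points are centred on the origin."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def rmsd(a, b):
    """Root-mean-square deviation between two equally sized point lists."""
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(sq / len(a))

def fit(mobile, reference):
    """Centre both sets, then rotate `mobile` by the angle that minimises RMSD."""
    m, r = center(mobile), center(reference)
    cross = sum(mx * ry - my * rx for (mx, my), (rx, ry) in zip(m, r))
    dot = sum(mx * rx + my * ry for (mx, my), (rx, ry) in zip(m, r))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in m], r

reference = [(0.0, 0.0), (3.8, 0.0), (5.0, 3.6), (8.6, 4.2)]   # made-up "frame 0"
angle = math.radians(40)
# The same rigid shape after tumbling (rotation) and drifting (translation)
frame = [(math.cos(angle) * x - math.sin(angle) * y + 12.0,
          math.sin(angle) * x + math.cos(angle) * y - 7.0) for x, y in reference]

raw = rmsd(frame, reference)
fitted, ref_centered = fit(frame, reference)
aligned = rmsd(fitted, ref_centered)
print(f"RMSD before fit: {raw:.2f}, after fit: {aligned:.2e}")
```

The shape never changed, yet the unfitted RMSD is large because it only measures drift and tumbling. After the fit it drops to numerical zero, so any RMSD that remains in a real trajectory reflects actual conformational change.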

Output your clean trajectory

trajout simulation_clean.nc
run
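If you have several replicas, it can help to generate the CPPTRAJ input from a small Python helper rather than editing files by hand. The sketch below only builds the script text. The function name, file names and default masks are hypothetical, and it writes the stripped topology via the `parmout` keyword of `strip` and fits with `rms first`.

```python
def build_cleanup_script(parm, traj_in, traj_out, parm_out,
                         anchor=":1-300",
                         unwrap_masks=(":301-450", ":451-460"),
                         fixed=":301-460",
                         solvent=":WAT,Na+,Cl-,K+"):
    """Return CPPTRAJ input text for the image, strip, fit and write steps."""
    lines = [f"parm {parm}", f"trajin {traj_in}",
             f"center {anchor} mass origin"]
    lines += [f"unwrap {mask}" for mask in unwrap_masks]
    lines += [
        f"autoimage anchor {anchor} fixed {fixed}",
        f"strip {solvent} parmout {parm_out}",   # parmout as a keyword of strip
        f"rms first {anchor}@CA",                # fit to the first frame
        f"trajout {traj_out}",
        "run",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical replica names, one input script per replica
scripts = {
    f"cleanup_{i}.in": build_cleanup_script(
        "system.prmtop", f"md_{i}.nc", f"md_{i}_clean.nc", "system_clean.prmtop")
    for i in range(1, 5)
}
print(scripts["cleanup_1.in"])
```

Each entry can then be written to disk and run with `cpptraj -i cleanup_1.in`, assuming the `cpptraj` executable is on your PATH.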

After running these straightforward commands, the PBC artefacts in your trajectory are fixed. Your analysis runs 5-10x faster without all that water, and file sizes drop dramatically (often 80-90% smaller). Most significantly, distance measurements and structural analyses actually make sense, and visualisations look like a protein complex instead of molecular confetti.
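One quick way to convince yourself that the imaging worked is to look for impossible jumps along the backbone. Consecutive alpha carbons sit roughly 3.8 Å apart, so a much larger gap usually means a chain is still split across the box. The sketch below runs on plain coordinate tuples; the 4.5 Å cutoff and the function name are my own assumptions, and genuine chain breaks (missing loops, separate chains) will also be flagged.

```python
import math

def broken_links(coords, max_gap=4.5):
    """Return indices i where points i and i+1 are further apart than max_gap."""
    return [i for i in range(len(coords) - 1)
            if math.dist(coords[i], coords[i + 1]) > max_gap]

# Toy five-residue chain with ideal 3.8 A spacing along x
intact = [(3.8 * i, 0.0, 0.0) for i in range(5)]

# The same chain with its last two residues wrapped by one 60 A box length
wrapped = intact[:3] + [(x - 60.0, y, z) for x, y, z in intact[3:]]

print("intact chain breaks: ", broken_links(intact))
print("wrapped chain breaks:", broken_links(wrapped))
```

In practice you would feed this the alpha-carbon coordinates of each frame from whichever library you use to read the cleaned trajectory.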

Python libraries/wrapper equivalent

If you would rather not run CPPTRAJ directly in the terminal, Python offers several libraries for MD trajectory processing, such as MDAnalysis (my preferred), PyTraj, and MDTraj.

MDAnalysis

# This does the FULL cleanup pipeline:
# Unwrap PBC artefacts
# Center on stable domain  
# Align all frames
# Remove solvent automatically
# Save clean trajectory + topology

!pip install MDAnalysis MDAnalysisTests

import MDAnalysis as mda
from MDAnalysis.transformations import unwrap, center_in_box, fit_rot_trans

def cleanup_trajectory(topology_file, trajectory_file, output_prefix):
    # load trajectory
    u = mda.Universe(topology_file, trajectory_file)
    
    # define selections (adjust residue numbers for your system)
    protein = u.select_atoms('protein')
    main_domain = u.select_atoms('resid 1-250')  # most stable domain
    
    # set up transformations (equivalent to CPPTRAJ commands)
    transformations = [
        unwrap(protein),                                    # unwrap
        center_in_box(main_domain, center='mass'),          # center
        fit_rot_trans(main_domain, main_domain,             # rms fit to the starting coordinates
                     weights='mass')                        # (fit_rot_trans has no check_continuity option)
    ]
    
    u.trajectory.add_transformations(*transformations)
    
    # write protein atoms only, so solvent, ions and any non-protein ligands are dropped
    with mda.Writer(f"{output_prefix}_clean.nc", n_atoms=protein.n_atoms) as writer:
        for ts in u.trajectory:
            writer.write(protein)
    
    # save a matching structure file (MDAnalysis cannot write prmtop, so use PDB)
    protein.write(f"{output_prefix}_clean.pdb")
    
    print(f"Cleanup complete: {output_prefix}_clean.nc")

# example usage
cleanup_trajectory("system.prmtop", "md_production.nc", "system")

# For multiple trajectories
trajectory_files = ["md_1.nc", "md_2.nc", "md_3.nc", "md_4.nc"]
for i, traj in enumerate(trajectory_files, 1):
    cleanup_trajectory("system.prmtop", traj, f"system_{i}")

PyTraj (CPPTRAJ Python wrapper)

import pytraj as pt

# CPPTRAJ actions from Python (pt.load reads the whole trajectory into memory)
traj = pt.load('system.nc', 'system.prmtop')
traj = pt.center(traj, mask=':1-250', center='origin', mass=True)
# unwrap is passed as a raw CPPTRAJ command through pt.transform
traj = pt.transform(traj, by=['unwrap :251-460'])
traj = pt.autoimage(traj)
traj = pt.strip(traj, ':WAT,Na+,Cl-')
traj = pt.superpose(traj, mask='@CA', ref=0)   # rms fit to the first frame
pt.write_traj('clean.nc', traj, overwrite=True)
# traj.top is already stripped at this point, so save it directly
traj.top.save('clean.prmtop')

Pro Tip

Always preserve your original trajectory. These processing steps are irreversible, and you might need the raw data later if a mistake was made along the way.
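One lightweight way to follow this advice is to record a checksum of the raw file and remove write permission before you start processing. The sketch below uses only the Python standard library and demonstrates the idea on a throwaway file. The helper names are hypothetical, and permission handling differs between operating systems.

```python
import hashlib
import os
import stat
import tempfile

def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large trajectories need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def protect(path):
    """Record the checksum, then drop write permission on the raw file."""
    checksum = sha256sum(path)
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return checksum

# Demonstration on a throwaway file standing in for a raw trajectory
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "simulation.nc")
    with open(raw, "wb") as handle:
        handle.write(b"raw trajectory bytes")
    before = protect(raw)
    after = sha256sum(raw)      # re-check later, for example after processing
    unchanged = before == after
    os.chmod(raw, stat.S_IRUSR | stat.S_IWUSR)   # restore so the temp dir can be removed

print("raw file unchanged:", unchanged)
```

Keeping the recorded checksum next to your processed files gives you a simple way to confirm later that the raw data were never modified.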

Author

King Ifashe

Oxford Protein Informatics Group

or "OPIG" to friends