You’ve just finished a week-long molecular dynamics simulation. You’re excited to see what happened to your protein complex, so you load up the trajectory in VMD and… your protein looks like it’s been through a blender. Pieces are scattered across the screen, water molecules are everywhere, and half your complex seems to have teleported to the other side of the simulation box. This chaos is caused by periodic boundary conditions (PBC).
PBC
PBC is a computational trick that simulates bulk behaviour by treating your simulation box like a repeating tile. When a molecule exits one side, it immediately reappears on the opposite side. This works perfectly for the physics: your protein experiences realistic bulk water behaviour.

But it creates a visualisation mess. During simulation, different parts of your protein complex cross box boundaries at different times. When you load the raw trajectory, your protein’s domains appear scattered across space despite being properly bonded. The simulation knows they’re connected (the bonds are intact), but visualisation software shows raw coordinates, making your protein look like it exploded.
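To make the mechanism concrete, here is a minimal sketch in plain Python. It assumes a cubic box with a hypothetical edge length of 50 Å and two bonded atoms that straddle the box edge; the function names `wrap`, `raw_distance` and `minimum_image_distance` are illustrative and do not come from any MD library.

```python
import math

BOX = 50.0  # hypothetical cubic box edge length, in angstroms


def wrap(coord, box=BOX):
    """Wrap a coordinate into the primary box [0, box), as an MD engine does."""
    return [x % box for x in coord]


def raw_distance(a, b):
    """Plain Euclidean distance on the stored coordinates (what a viewer draws)."""
    return math.dist(a, b)


def minimum_image_distance(a, b, box=BOX):
    """Distance to the nearest periodic image (what the simulation works with)."""
    total = 0.0
    for xa, xb in zip(a, b):
        d = xa - xb
        d -= box * round(d / box)  # shift by a whole number of box lengths
        total += d * d
    return math.sqrt(total)


# Two bonded atoms 1.5 A apart, straddling the box edge at x = 50
atom_a = wrap([49.5, 10.0, 10.0])
atom_b = wrap([51.0, 10.0, 10.0])  # stored at x = 1.0 after wrapping

print(raw_distance(atom_a, atom_b))            # 48.5: the bond looks "exploded"
print(minimum_image_distance(atom_a, atom_b))  # 1.5: the bond is intact
```

The stored coordinates put the two atoms almost a full box length apart, while the minimum-image distance recovers the real bond length. Imaging tools essentially apply that same whole-box shift to the coordinates themselves.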
But periodic boundary artefacts are just one piece of the puzzle. In my experience, raw MD trajectories suffer from several interconnected issues that make them unsuitable for analysis or presentation.
The four horsemen of MD trajectory chaos
1. Periodic boundary artefacts
Imagine your simulation box as one tile in an infinite repeating pattern. When molecules move across the box edges, they pop up on the other side. This creates visual chaos where your carefully prepared protein complex looks like it exploded.
2. Solvent overload
Your 50,000-atom protein is swimming in 200,000 water molecules plus ions. While important for realistic simulations, all that solvent makes analysis slow and visualisation cluttered. For most post-simulation analysis, you only care about the protein.
3. Structural drift
Even though your protein stays folded, the entire complex tumbles and translates through space during the simulation. Without alignment, measuring distances or calculating RMSDs becomes meaningless.
4. Bloated file sizes
Raw trajectories with solvent can be massive, sometimes gigabytes per microsecond. These cumbersome files slow down analysis and eat up storage space.
If you’ve ever run MD simulations, you’ve probably encountered these. The good news? Your simulation is perfectly fine. The bad news? Raw MD trajectories need some serious cleanup before they’re ready for analysis and/or visualisations.
This guide cuts straight to the commands you actually need, which matters most if you’re dealing with protein complexes, where getting the imaging right can be tricky.
Enter CPPTRAJ: your trajectory cleanup crew
AMBER’s CPPTRAJ tool is designed to solve exactly these problems, transforming your messy trajectories into analysis-ready datasets.
Fix the periodic boundary mess
# Load your system
parm system.prmtop
trajin simulation.nc
# Centre on your most stable component (usually the largest protein)
center :1-300 mass origin
# Unwrap other components so they stay connected
# Second protein/domain
unwrap :301-450
# Small molecule/ligand
unwrap :451-460
# Use autoimage to fix the overall presentation
autoimage anchor :1-300 fixed :301-460
The key is to pick your most stable component (usually your main protein) as the anchor, then unwrap and re-image everything else relative to it. This keeps your complex looking intact while preserving the correct physics.
Stripping the excess for smaller files
# Remove water and ions, and save the stripped topology for later use
# (parmout is a keyword of the strip command, not a command of its own)
strip :WAT,Na+,Cl-,K+ parmout system_clean.prmtop
Simple but effective: you’ve just reduced your system size by 80% or more while keeping everything that matters for most analyses.
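As a rough back-of-the-envelope check on the savings, the sketch below estimates the coordinate payload of a trajectory using the example system from earlier (a 50,000-atom protein in 200,000 waters). The assumptions are mine: a three-site water model, single-precision coordinates (4 bytes each), a hypothetical 10,000 frames, and ions and file headers ignored.

```python
BYTES_PER_COORD = 4  # assumption: single-precision floats


def trajectory_size_gb(n_atoms, n_frames):
    """Approximate coordinate payload: atoms x 3 coordinates x 4 bytes x frames."""
    return n_atoms * 3 * BYTES_PER_COORD * n_frames / 1e9


protein_atoms = 50_000
water_atoms = 200_000 * 3  # assumption: three atoms per water molecule
n_frames = 10_000          # hypothetical frame count

raw = trajectory_size_gb(protein_atoms + water_atoms, n_frames)
clean = trajectory_size_gb(protein_atoms, n_frames)
reduction = 100 * (1 - clean / raw)

print(f"raw: {raw:.1f} GB, stripped: {clean:.1f} GB, saved: {reduction:.0f}%")
# prints: raw: 78.0 GB, stripped: 6.0 GB, saved: 92%
```

The exact percentage depends on how much solvent surrounds your solute, so treat this as an order-of-magnitude estimate rather than a prediction.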
Align for consistency
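# Fit to remove overall translation/rotation
# Align on the alpha carbons (CA) of the main protein, using the first frame as reference
# (here "fit" is just the name given to the resulting RMSD data set)
rms fit :1-300@CA first
# Alternative: fit on every atom of the main protein (use one or the other)
# rms fit :1-300 first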
Now every frame is consistently oriented, making distance measurements and structural comparisons meaningful.
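To see why fitting matters, here is a toy sketch in plain Python. It is deliberately two-dimensional so the optimal rotation has a closed form; real tools solve the equivalent three-dimensional problem. The helper names (`centre`, `rmsd`, `fit_2d`, `tumble`) and the four-point "structure" are made up for illustration.

```python
import math


def centre(points):
    """Subtract the centroid so overall translation no longer matters."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered point sets."""
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2
                for (xa, ya), (xb, yb) in zip(a, b))
    return math.sqrt(total / len(a))


def fit_2d(mobile, reference):
    """Least-squares superposition in 2D; returns (fitted mobile, centred reference)."""
    m, r = centre(mobile), centre(reference)
    # Closed-form optimal rotation angle for the 2D case
    num = sum(mx * ry - my * rx for (mx, my), (rx, ry) in zip(m, r))
    den = sum(mx * rx + my * ry for (mx, my), (rx, ry) in zip(m, r))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in m], r


def tumble(points, angle_deg, dx, dy):
    """Rigidly rotate and translate a structure, mimicking drift during MD."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


reference = [(0.0, 0.0), (1.5, 0.0), (1.5, 2.0), (0.0, 3.0)]
mobile = tumble(reference, angle_deg=40.0, dx=12.0, dy=-7.0)  # same shape, moved

before = rmsd(mobile, reference)
fitted, centred_ref = fit_2d(mobile, reference)
after = rmsd(fitted, centred_ref)
print(f"RMSD before fit: {before:.2f}, after fit: {after:.2e}")
```

The shape never changed, yet the unfitted RMSD is large because it only measures how far the structure drifted. After fitting it drops to essentially zero, which is the number that actually describes the structure.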
Output your clean trajectory
trajout simulation_clean.nc
run
After running these straightforward commands, your trajectory’s PBC artefacts are fixed and file sizes drop dramatically (often 80-90% smaller). Analysis typically runs 5-10x faster without all that water, distance measurements and structural analyses actually make sense, and visualisations look like a protein rather than molecular confetti.
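The snippets in this section only work when they sit together in one CPPTRAJ input file. One way to assemble that file is a small Python helper like the sketch below. The function `build_cpptraj_input` and its parameter names are hypothetical, and it uses a simplified variant of the pipeline (autoimage with an anchor, one combined strip, a fit to the first frame), so adjust the masks and steps to your own system.

```python
def build_cpptraj_input(parm, traj_in, traj_out, parm_out,
                        anchor_mask, fit_mask,
                        strip_mask=":WAT,Na+,Cl-,K+"):
    """Return the text of a CPPTRAJ input file for a basic cleanup run."""
    lines = [
        f"parm {parm}",                              # topology
        f"trajin {traj_in}",                         # raw trajectory
        f"autoimage anchor {anchor_mask}",           # re-image around the anchor
        f"strip {strip_mask} parmout {parm_out}",    # drop solvent, save topology
        f"rms fit {fit_mask} first",                 # best-fit to the first frame
        f"trajout {traj_out}",                       # cleaned trajectory
        "run",
    ]
    return "\n".join(lines) + "\n"


script = build_cpptraj_input(
    parm="system.prmtop",
    traj_in="simulation.nc",
    traj_out="simulation_clean.nc",
    parm_out="system_clean.prmtop",
    anchor_mask=":1-300",
    fit_mask=":1-300@CA",
)
print(script)
```

Save the printed text as, say, `cleanup.in` and run it with `cpptraj -i cleanup.in`. Generating the file from a function makes it easy to reuse the same steps across replicas by changing only the file names.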
Python libraries/wrapper equivalent
If you would rather not run CPPTRAJ from the terminal, Python offers several libraries for MD trajectory processing, such as MDAnalysis (my preferred), PyTraj, and MDTraj.
MDAnalysis
# This does the full cleanup pipeline:
#   1. Unwrap PBC artefacts
#   2. Centre on a stable domain
#   3. Align all frames
#   4. Drop solvent by writing only the protein
#   5. Save the clean trajectory plus a matching structure file
# Install first (in a notebook, prefix the command with "!"):
#   pip install MDAnalysis
import MDAnalysis as mda
from MDAnalysis.transformations import unwrap, center_in_box, fit_rot_trans


def cleanup_trajectory(topology_file, trajectory_file, output_prefix):
    # load trajectory (unwrap needs bond information, which a prmtop provides)
    u = mda.Universe(topology_file, trajectory_file)
    # separate copy whose first frame serves as the fitting reference
    ref = mda.Universe(topology_file, trajectory_file)

    # define selections (adjust residue numbers for your system)
    protein = u.select_atoms('protein')
    main_domain = u.select_atoms('resid 1-250')    # most stable domain
    ref_domain = ref.select_atoms('resid 1-250')   # same atoms in the reference

    # set up transformations (equivalent to the CPPTRAJ commands)
    transformations = [
        unwrap(protein),                             # unwrap
        center_in_box(main_domain, center='mass'),   # center
        fit_rot_trans(main_domain, ref_domain,       # rms fit
                      weights='mass'),
    ]
    u.trajectory.add_transformations(*transformations)

    # write clean trajectory (only the protein selection, so no solvent)
    with mda.Writer(f"{output_prefix}_clean.nc", n_atoms=protein.n_atoms) as writer:
        for ts in u.trajectory:
            writer.write(protein)

    # save a matching structure file (MDAnalysis cannot write Amber prmtop files,
    # so a PDB is written instead)
    protein.write(f"{output_prefix}_clean.pdb")
    print(f"Cleanup complete: {output_prefix}_clean.nc")


# example usage
cleanup_trajectory("system.prmtop", "md_production.nc", "system")

# for multiple trajectories
trajectory_files = ["md_1.nc", "md_2.nc", "md_3.nc", "md_4.nc"]
for i, traj in enumerate(trajectory_files, 1):
    cleanup_trajectory("system.prmtop", traj, f"system_{i}")
PyTraj (CPPTRAJ Python wrapper)
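import pytraj as pt

# CPPTRAJ-style actions from Python
traj = pt.load('system.nc', 'system.prmtop')
# centre on the main domain, using its centre of mass
traj = pt.center(traj, mask=':1-250', center='origin', mass=True)
# autoimage re-images the whole system, so no separate unwrap step is used here
traj = pt.autoimage(traj)
# remove water and ions (the topology in traj.top is stripped as well)
traj = pt.strip(traj, ':WAT,Na+,Cl-')
# superimpose every frame onto the first frame using the alpha carbons
traj = pt.superpose(traj, mask='@CA', ref=0)
pt.write_traj('clean.nc', traj, overwrite=True)
# traj.top is already stripped at this point, so save it directly
traj.top.save('clean.prmtop')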
Pro Tip
Always preserve your original trajectory. These processing steps are irreversible, and you might need the raw data later if a mistake was made along the way.
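One lightweight way to act on this is to record a checksum of the raw file and remove write permission before you start processing. The sketch below uses only the Python standard library; the helper names `sha256sum` and `protect_raw_trajectory` are hypothetical, and the permission change assumes a POSIX-style filesystem.

```python
import hashlib
import os
import stat


def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large trajectories never load into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_trajectory(path):
    """Return the file's checksum and clear its write-permission bits."""
    checksum = sha256sum(path)
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return checksum


# Example (file name is a placeholder):
#   checksum = protect_raw_trajectory("simulation.nc")
#   print(checksum)  # keep this alongside your notes
```

Re-running `sha256sum` later and comparing against the stored value tells you whether the raw data is still exactly what the simulation produced.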
