How to do MDs

From ChengLab
Jump to: navigation, search

Amber

Manual here

Prepare the system (Protein(s) only)

Prepare your protein by removing all hydrogens (like -ignh for pdb2gmx) and feed it to tleap.

$ cat build.leap

source leaprc.protein.ff14SB
source leaprc.RNA.OL3
source leaprc.DNA.OL15
source leaprc.water.tip3p
mol = loadpdb mol.pdb
solvatebox mol TIP3PBOX 10
addions mol K+ 154 Cl- 91
saveamberparm mol ionized.parm7 ionized.rst7

Points to be caution and ALWAYS save you some time:

  1. Properly distinguish all the components by adding TER cards (read this)
  2. Check your C-terminal which usually requires some fix: If you already have two oxygens, change them into O and OXT
  3. Consider a fix.tcl (see below)
  4. Calculate number of ions using this. You may do a trial run of tleap to confirm the net charge of your solute
  5. If a PDB file can go through pdb2gmx, it will be probably fine for tleap
$ cat fix.tcl
mol new mol.pdb
set sel [atomselect top "resname ILE and name CD"]
$sel set name CD1
set sel [atomselect top "name OC1 OT1"]
$sel set name O
set sel [atomselect top "name OC2 OT2"]
$sel set name OXT
set sel [atomselect top all]
$sel writepdb mol.pdb

The ionized.parm7 and ionized.rst7 are now ready for simulations!

MD parameters

Some sample parameters can be found here and the run commands are

export prmtop=ionized.parm7
pmemd.cuda -O -i 01_min.in -o 01_min.out -p $prmtop -c ionized.rst7 -r 01_min.rst -ref ionized.rst7
pmemd.cuda -O -i 02_heat.in -o 02_heat.out -p $prmtop -c 01_min.rst -r 02_heat.rst -x 02_heat.nc -ref 01_min.rst
pmemd.cuda -O -i 03_eq.in -o 03_eq.out -p $prmtop -c 02_heat.rst -r 03_eq.rst -x 03_eq.nc -ref 02_heat.rst
pmemd.cuda -O -i 05_Pull.in -o 05_Pull.out -p $prmtop -c 03_eq.rst -r 05_Pull.rst -x 05_Pull.nc
pmemd.cuda -O -i 06_Prod.in -o 06_Prod.out -p $prmtop -c 05_Pull.rst -r 06_Prod.rst -x 06_Prod.nc

Here is a detailed version of md.in

npt 300K
&cntrl
   ! MD Controls
   ! [imin] 0 (default) for MD, 1 for energy minimization, 5 for re-reun
   imin=0, 
   ! [nmropt] 0 (default), 1 for NMR analysis e.g. COM.rst in US
   !nmropt=1
   ! [ntx] 0 (default) only reads coordinates, 1 reads coordinates and velocities
   ntx=5, 
   ! [irest] 0 (default) do not restart, 1 reads coordinates and velocities from restart file. If irest=1, ntx=5
   irest=1,
   ! [nstlim] Number of steps
   nstlim=250000000, 
   ! [dt] Time step (psec)
   dt=0.002,
   ! [ntpr] Output frequency of energy info
   ntpr=500, 
   ! [ntwr] Output frequency of restart file
   ntwr=500,
   ! [ntwx] Output frequency of trajectory
   ntwx=5000, 
   ! [ntr] 0 (default). If you restrain any atom, ntr=1
   ntr=1,

   ! Temperature regulation
   ! [ntt] 0 for NVE, 3 for Langevin dynamics
   ntt=3, 
   ! [temp0] 300 (default)
   temp0=300, 
   ! [gamma_ln] (ps-1) 0 (default) Recommended value 2.0-5.0
   gamma_ln=5.0,

   ! Pressure regulation
   ! [ntp] 0 (default), 1 for isotropic, 2 for anisotropic (membrane system)
   ntp=1, 
   ! [barostat] 1 (default) for Berendsen, 2 for MC barostat (recommended)
   barostat=2,
   ! [pres0] 1.0 (default)
   pres0=1.0, 
   ! [taup] (ps) 1.0 (default), recommended 1.0-5.0
   taup=2.0,

   ! Bond length constraints
   ! If TIP3P, ntc=ntf=2
   ! [ntc] 1 (default) for no constraint, 2 for h-bonds, 3 for all-bonds
   ntc=2,
   ! Genertic parameters
   ! [ntf] 1 (default), 2 for bond interactions involving H-atoms omitted (recommended)
   ntf=2, 
   ! [ntb] Periodic boundaries or not. 1 for constant volume, 2 for constant pressure. 
   ntb=2, 
   ! [cut] (A) Non-bonded cutoff. 8.0 (default)
   cut=10.0
/
First restrain - 3' of TS
1.0
FIND
N3 * * *
SEARCH
RES 1368 1368
END
Second restrain - 5' of NTS
1.0
FIND
N1 * * *
SEARCH
RES 1369 1369
END
Third restrain - CA of residue 1215
1.0
FIND
CA * * *
SEARCH
RES 1215 1215
END
END

Distance Restraints

With distance restraints, you can easily initiate biased MDs or umbrella sampling simulations.

Remember add the following to the MD parameter:

DISANG=COM_dist.RST
DUMPAVE=05_Pull_dist.dat

With iat=atom_index1,atom_index2, you restrain between two specific atoms:

&rst iat=6487,20697, r1=-199.0, r2=DISTHERE, r3=DISTHERE, r4=199.0, rk2=1.0, rk3=1.0, iresid=0, fxyz=1,1,1, outxyz=1, /

With iat=-1,-1, you can specify groups of atoms (COM) to be restrained using igr1 and igr2:

&rst iat=-1,-1, r1=-199.0, r2=DISTHERE, r3=DISTHERE, r4=199.0, rk2=1.0, rk3=1.0, iresid=0, fxyz=1,1,1, outxyz=1,
igr1=6487,6507,6529,6543,6555,6577,6588,6607,6621,6633,6652,6663,6680,6697,6713,6733,6745,
igr2=20697,20709,20719,20733,20740,20750,20771,20788,20807,20814,20833, /

MD trajectories

By default, AMBER gives trajectories and restart coordinates in NETCDF format. Load them into VMD using

mol addfile out.rst type netcdf

Or convert the restart coordinates to PDB for easier manipulation

cpptraj -p ionized.parm7 -y in.rst7 -x out.pdb

CPPTRAJ

You can first mask/strip your trajectories remotely and then transfer a much smaller trajectory file to local for analysis. There are two ways you can reduce the size of your trajectories: 1. remove solvent; 2. less frequent frames. Usually you would like to do both.

parm ionized.parm7
trajin md.nc
strip :WAT,K+,Cl- outprefix stripped
trajout stripped.dcd dcd offset 1000

Two more things you usually would like to add

autoimage: automatically centre and image trajectory concerning pbc
rms: it calculates the rmsd but you wanna use it to align the structure to reference (e.g. fist frame)
parm ionized.parm7
trajin md.nc
autoimage
strip :WAT,K+,Cl- outprefix stripped
rms first @CA
trajout stripped.dcd dcd offset 1000

If you wanna output a trajectory of restart coordinates, use:

trajin md.rst

If you wanna output multiple PDBs for each frame:

trajout frame.pdb pdb multi

Easy strip multiple trials of runs

#!/bin/bash
mkdir -p traj
for ii in $(seq 1 8); do
cat > cpj << EOF
parm win$ii/ionized.parm7.$ii
trajin win$ii/5tgtus.x.$ii 
autoimage
strip :WAT,K+,Na+,Cl- outprefix stripped
rms first @CA
trajout $ii.dcd dcd offset 2 
EOF
cpptraj cpj
mv stripped.ionized.parm7.$ii traj/$ii.parm7
mv $ii.dcd traj/$ii.dcd
done
rm -f cpj

RMSD

parm win1/ionized.parm7.1
reference win1/ionized.rst7.1 parm win1/ionized.parm7.1 [win1]
parm win$ii/ionized.parm7.$ii [p$ii]
trajin win$ii/5tgtus.x.$ii parm [p$ii]
rms :892-1078,1254-1300@N,CA,C ref [win1]
rms R$ii :1009-1020@N,CA,C ref [win1] out rmsd_win$ii.agr nofit
go

Native Contacts

parm win$ref/ionized.parm7.$ref
reference win$ref/ionized.rst7.$ref parm win$ref/ionized.parm7.$ref [win$ref]
parm win$ii/ionized.parm7.$ii [p$ii]
trajin win$ii/5tgtus.x.$ii parm [p$ii]
nativecontacts name NC1 :1009-1020&!@H= ref [win$ref] \
distance 3.0 \
mindist maxdist \
byresidue resoffset 0 \
writecontacts nc_win$ii.contacts \
resout nc_win$ii.resout \
out nc_win$ii.dat

Gromacs

The default cut-off scheme in GROMACS 2018 is based on classical buffered Verlet lists. These are implemented extremely efficiently on modern CPUs and accelerators, and support nearly all of the algorithms used in GROMACS.

Before version 4.6, GROMACS always used pair-lists based on groups of particles. These groups of particles were originally charge-groups, which were necessary with plain cut-off electrostatics. With the use of PME (or reaction-field with a buffer), charge groups are no longer necessary (and are ignored in the Verlet scheme). In GROMACS 4.6 and later, the group-based cut-off scheme is still available, but is deprecated since 5.0. It is still available mainly for backwards compatibility, to support the algorithms that have not yet been converted, and for the few cases where it may allow faster simulations with bio-molecular systems dominated by water.

Without PME, the group cut-off scheme should generally be combined with a buffered pair-list to help avoid artifacts. However, the group-scheme kernels that can implement this are much slower than either the unbuffered group-scheme kernels, or the buffered Verlet-scheme kernels. Use of the Verlet scheme is strongly encouraged for all kinds of simulations, because it is easier and faster to run correctly. In particular, GPU acceleration is available only with the Verlet scheme.

The Verlet scheme uses properly buffered lists with exact cut-offs. The size of the buffer is chosen with verlet-buffer-tolerance to permit a certain level of drift. Both the LJ and Coulomb potential are shifted to zero by subtracting the value at the cut-off. This ensures that the energy is the integral of the force. Still it is advisable to have small forces at the cut-off, hence to use PME or reaction-field with infinite epsilon.

Prepare the system (Protein only)

Justin's tutorials are always the good starting points (here), especially the "Lysozyme in water" one (here).

Simple steps to prepare a system, and it usually works!

rm -f editconf.pdb solvate.gro ions.tpr ionized.pdb

cat > ions.mdp << EOF
integrator  = steep
emtol       = 1000.0
emstep      = 0.01
nsteps      = 50000
nstlist         = 1
cutoff-scheme   = Verlet
ns_type         = grid
coulombtype     = PME
rcoulomb        = 1.0
rvdw            = 1.0
pbc             = xyz
EOF

#gmx pdb2gmx -f mol.pdb

# Create a cubic box of 2 nm on each side and place the system to the center and align its principle axes to the reference axes
# -princ usually helps to reduce your system size
echo 0 | gmx editconf -f conf.gro -o editconf.pdb -princ -center 3.7 3.4 2.4 -box 8.0 12.2 5.2
# Solvate the box
gmx solvate -cp editconf.pdb -o solvate.gro -p topol.top
# Add ions to make it neutral and of 0.15 M NaCl
# If you want KCl, add -pname K
gmx grompp -f ions.mdp -c solvate.gro -o ions.tpr -p topol.top -maxwarn 1 >& LOG
echo 6 | gmx genion -s ions.tpr -o ionized.pdb -conc 0.15 -neutral -p topol.top
echo q | gmx make_ndx -f ionized.pdb 

rm -f \#* editconf.pdb solvate.gro ions.mdp ions.tpr mdout.mdp

Things to note:

  1. You can find ions.mdp in Justin's tutorial or here (it does not even matter).
  2. Biological systems might require surprisingly large boxes, if you are interested, read this.
  3. You can consider using -d 1.2 if you are confident that your system does not extend its dimension a lot.

Standard MD (Protein only)

A standard MD includes steps as follows:

  1. Energy minimization
  2. Annealing
  3. Pre-equilibrium
  4. Production run

Here are some standard run scripts.

gmx mdrun -pin on -ntmpi 1 -ntomp 12 -v -deffnm 

Things to note:

  1. In step2-3, position restraints have been added to favor the initial structures (usually crystal)
  2. You should consider release the restraints step-wisely (usually people are just too lazy to do so)

Pulling/Umbrella Sampling

Justin's tutorial: here

Constant Velocity

pull                    = yes
pull_ncoords            = 1         ; only one reaction coordinate
pull_ngroups            = 2         ; two groups defining one reaction coordinate
pull_group1_name        = Chain_A
pull_group2_name        = Chain_B
pull_coord1_type        = umbrella  ; harmonic potential
pull_coord1_geometry    = distance  ; simple distance increase
pull_coord1_dim         = N N Y     ; pull along z
pull_coord1_groups      = 1 2       ; groups 1 (Chain A) and 2 (Chain B) define the reaction coordinate
pull_coord1_start       = yes       ; define initial COM distance > 0
pull_coord1_rate        = 0.01      ; 0.01 nm per ps = 10 nm per ns
pull_coord1_k           = 1000      ; kJ mol^-1 nm^-2

Constant Force

pull                    = yes
pull_ncoords            = 1         ; only one reaction coordinate
pull_ngroups            = 2
pull_group1_name        = H1end
pull_group2_name        = H3end
pull_coord1_type        = constant-force
pull_coord1_geometry    = direction
pull_coord1_vec         = 0 1 0     ; pull along y
pull_coord1_groups      = 1 2
pull_coord1_k           = 1000      ; kJ mol^-1 nm^-2

Trajectory manipulation

Usually you will have multiple trajectories from production runs, you would like to combine them and remove the PBC effects in order to calculate something or make videos.

gmx trjcat -f *.xtc -o xtc -dt 100 -n ndx

Notes:

  1. -settime if you did not continue
gmx trjconv -f pre.trr -o unwrap.xtc -s tpr -n ndx -pbc mol (-dt 100)
gmx trjconv -f unwrap.xtc -o fit.xtc -s tpr -n ndx -fit rot+trans

Notes:

  1. Usually protein for least squares fit and centering
  2. Try "-pbc nojump" if "mol" does not work
  3. "-pbc nojump" is good for dimer; "-pbc mol" is good for multimer
  4.  !!!Never use "-center" with "-pbc mol"!!!

Split

echo 0 | gmx trjconv -f ed.trr -s ../ed.tpr -split 20
for ii in $(seq 0 10); do
   jj=$(($ii + 1))
   echo 0 | gmx trjconv -f trajout$ii.xtc -s ../ed.tpr -skip 20 -o $jj.gro
done
rm -f trajout$ii.xtc \#*

Anton

Alt 1:

Load dtr in vmd

pbc unwrap -all
animate write trr
gmx trjcat -o xtc
gmx trjconv -pbc nojump -o unwrap.xtc

Alt 2:

Load dtr in vmd

animate write trr

Load trr in vmd

pbc unwrap -all
animate write trr
gmx trjcat -o unwrap.xtc

Alt 3:

Load dtr in vmd

animate write trr
gmx trjcat -f *.trr -o xtc -cat
gmx trjconv -f xtc -o xtc -timestep 120
gmx trjconv -s tpr -pbc res -center -o unwrap.xtc

General Q&A

Should I re-equilibrate after putting the complex into a bigger water box?

In theory, using a larger water box won't significantly affect conformational sampling of the solute as you are adding bulk waters. Of course, re-solvating your solute adds (kind of) random water molecules (and ions) around the solute so you have to equilibrate them a bit, but definitely not a long run. That's why I always use a large water box (enough for the pulling) in the first place so I can do the pulling in the same simulation setup. If your system is not huge (>1M atoms), it does not really "waste" much computing time.

Barostat: berendsen vs parrinello rahman vs mttk?

The short answer is Parrinello Rahman. Berendsen does not produce the correct thermodynamic ensemble and mttk is just for md-vv which we will rule it out later.

Thermostat: berendsen vs nose hoover vs anderson vs v-rescale?

"v-rescale". I was actually a NAMD guy so as a big fan of Langevin dynamics. But "v-rescale" really does a good job (10.1063/1.3073889: original paper by Bussi) and I highly recommend it on Gromacs over its Langevin dynamics implementation (discussed later).

Integrator

I'm using the standard labeled "md" but there seem to be many other options each with some sort of advantage. I will recommend "md" plus "v-rescale" for thermostat. If you really want Langevin dynamics, go with "sd" here and "tcoupl" options will be ignored. Be caution that Langevin dynamics in Gromacs could cost significant computation time especially with constraints and in parallel, therefore "md"+"v-rescale" is a modern choice for umbrella sampling in Gromacs. "md-vv" and "md-vv-atek" uses velocity Verlet for integrating Newton's equation of motion which is kind of unnecessary here. Btw "md" uses the leap-frog so they are analytically the same. Other options related to energy minimisations or normal mode calculations are self-explainable and have nth to do with MD runs.

PME: There is a secondary type that is supposed to have an " influence function optimized for the grid", and I don't know whether that matters for any of what I do

By experience, as long as you have decided the "rcoulomb" to use (default is 1.0, I prefer 1.4), default values of the PME setting give you pretty good dynamics. Note that rcoulomb might be adjusted by PME auto tuning afterwards, but not much. The only trick left is optimising number of ranks for PME calculation which also can be estimated by using tune_pme provided by Gromacs. It is also to my experience that in the era of extremely powerful GPUs, optimising the PME ranks only grant you <20% more efficiency which are maybe just few hours of walltime for small systems.

Simulated annealing: Should I be curving temperature rise in some way? How slowly should the annealing be done?

Should I be annealing after pulling? Curving temperature rise usually benefits the stability of the process of getting your system into dynamics. In general it will not affect the physics of your ultimate samplings. I don't think annealing after pulling is necessary. I am sharing my annealing steps for your reference.

annealing               = single single
annealing-npoints       = 5 5
annealing-time          = 0 100 200 300 400 0 100 200 300 400
annealing-temp          = 60 120 180 240 310 60 120 180 240 310

Restraints: How do you select atoms for restraint other than backbone atoms? Do you use rmsd values from an md run to check flexibility?

Not sure whether I understood your question. The "pull_group1_name" option is specifying the selected atoms. You have to prepare an index file using make_ndx provided by Gromacs selecting the atoms you want and give the group name to this option. Same to the "pull_group2_name". Many of the parameters are the same as in the pulling procedure, with the notable exception of "pull_coord1_rate", which has now been set to 0.0. That is how we restrain it to a reference value. Setting "pull_coord1_start = yes" means that the initial COM distance is the reference distance. In the old times, we have to define a reference (pull_coord1_init) separately for each configuration. Now "pull_coord1_start = yes" changes the value of "pull_coord1_init" for you.

Velocity Generation: When should/does it need to be used?

When you do not anneal. For example, after pulling, you can simply generate velocities from a Maxwell distribution of 310K.

What if the system is not stable

In general if you follow the tutorials you will get most of the physics correct at least. There might be some subtle points like Nose-Hoover might not be the best choice for umbrella sampling if you restraint a large portion of your system, etc. They are less likely to affect your scientific conclusion especially for very complex biological systems, as there are more intrinsic problems such as insufficient samplings or hidden energy barriers. Nevertheless, if you are experiencing difficulty to put a system smoothly into dynamics, this might be something you want to read http://manual.gromacs.org/documentation/current/user-guide/terminology.html. Or feel free to drop me an email so maybe I can share some of my experience.