How to do MDs
Contents
- 1 General Q&A
- 1.1 Should I re-equilibrate after putting the complex into a bigger water box?
- 1.2 Barostat: berendsen vs parrinello rahman vs mttk?
- 1.3 Thermostat: berendsen vs nose hoover vs anderson vs v-rescale?
- 1.4 Integrator: I'm using the standard labeled "md" but there seem to be many other options each with some sort of advantage
- 1.5 PME: There is a secondary type that is supposed to have an " influence function optimized for the grid", and I don't know whether that matters for any of what I do
- 1.6 Simulated annealing: Should I be curving temperature rise in some way? How slowly should the annealing be done?
- 1.7 Restraints: How do you select atoms for restraint other than backbone atoms? Do you use rmsd values from an md run to check flexibility?
- 1.8 Velocity Generation: When should/does it need to be used?
- 1.9 What if the system is not stable
- 2 Amber
- 3 Gromacs
General Q&A
Should I re-equilibrate after putting the complex into a bigger water box?
In theory, using a larger water box won't significantly affect conformational sampling of the solute as you are adding bulk waters. Of course, re-solvating your solute adds (kind of) random water molecules (and ions) around the solute so you have to equilibrate them a bit, but definitely not a long run. That's why I always use a large water box (enough for the pulling) in the first place so I can do the pulling in the same simulation setup. If your system is not huge (>1M atoms), it does not really "waste" much computing time.
Barostat: berendsen vs parrinello rahman vs mttk?
The short answer is Parrinello Rahman. Berendsen does not produce the correct thermodynamic ensemble and mttk is just for md-vv which we will rule it out later.
Thermostat: berendsen vs nose hoover vs anderson vs v-rescale?
"v-rescale". I was actually a NAMD guy so as a big fan of Langevin dynamics. But "v-rescale" really does a good job (10.1063/1.3073889: original paper by Bussi) and I highly recommend it on Gromacs over its Langevin dynamics implementation (discussed later).
Integrator: I'm using the standard labeled "md" but there seem to be many other options each with some sort of advantage
"md". I recommend "md" plus "v-rescale" for thermostat. If you really want Langevin dynamics, go with "sd" here and "tcoupl" options will be ignored. Be caution that Langevin dynamics in Gromacs could cost significant computation time especially with constraints and in parallel, therefore "md"+"v-rescale" is a modern choice for umbrella sampling in Gromacs. "md-vv" and "md-vv-atek" uses velocity Verlet for integrating Newton's equation of motion which is kind of unnecessary here. Btw "md" uses the leap-frog so they are analytically the same. Other options related to energy minimisations or normal mode calculations are self-explainable and have nth to do with MD runs.
PME: There is a secondary type that is supposed to have an " influence function optimized for the grid", and I don't know whether that matters for any of what I do
By experience, as long as you have decided the "rcoulomb" to use (default is 1.0, I prefer 1.4), default values of the PME setting give you pretty good dynamics. Note that rcoulomb might be adjusted by PME auto tuning afterwards, but not much. The only trick left is optimising number of ranks for PME calculation which also can be estimated by using tune_pme provided by Gromacs. It is also to my experience that in the era of extremely powerful GPUs, optimising the PME ranks only grant you <20% more efficiency which are maybe just few hours of walltime for small systems.
Simulated annealing: Should I be curving temperature rise in some way? How slowly should the annealing be done?
Should I be annealing after pulling? Curving temperature rise usually benefits the stability of the process of getting your system into dynamics. In general it will not affect the physics of your ultimate samplings. I don't think annealing after pulling is necessary. I am sharing my annealing steps for your reference.
annealing = single single annealing-npoints = 5 5 annealing-time = 0 100 200 300 400 0 100 200 300 400 annealing-temp = 60 120 180 240 310 60 120 180 240 310
Restraints: How do you select atoms for restraint other than backbone atoms? Do you use rmsd values from an md run to check flexibility?
Not sure whether I understood your question. The "pull_group1_name" option is specifying the selected atoms. You have to prepare an index file using make_ndx provided by Gromacs selecting the atoms you want and give the group name to this option. Same to the "pull_group2_name". Many of the parameters are the same as in the pulling procedure, with the notable exception of "pull_coord1_rate", which has now been set to 0.0. That is how we restrain it to a reference value. Setting "pull_coord1_start = yes" means that the initial COM distance is the reference distance. In the old times, we have to define a reference (pull_coord1_init) separately for each configuration. Now "pull_coord1_start = yes" changes the value of "pull_coord1_init" for you.
Velocity Generation: When should/does it need to be used?
When you do not anneal. For example, after pulling, you can simply generate velocities from a Maxwell distribution of 310K.
What if the system is not stable
In general if you follow the tutorials you will get most of the physics correct at least. There might be some subtle points like Nose-Hoover might not be the best choice for umbrella sampling if you restraint a large portion of your system, etc. They are less likely to affect your scientific conclusion especially for very complex biological systems, as there are more intrinsic problems such as insufficient samplings or hidden energy barriers. Nevertheless, if you are experiencing difficulty to put a system smoothly into dynamics, this might be something you want to read http://manual.gromacs.org/documentation/current/user-guide/terminology.html. Or feel free to drop me an email so maybe I can share some of my experience.
Amber
Manual here
Prepare the system (Protein(s) only)
Prepare your protein by removing all hydrogens (like -ignh for pdb2gmx) and feed it to tleap.
$ cat build.leap source leaprc.protein.ff14SB source leaprc.RNA.OL3 source leaprc.DNA.OL15 source leaprc.water.tip3p mol = loadpdb mol.pdb solvatebox mol TIP3PBOX 10 addions mol K+ 154 Cl- 91 saveamberparm mol ionized.parm7 ionized.rst7
Points to be caution and ALWAYS save you some time:
- Properly distinguish all the components by adding TER cards (read this)
- Check your C-terminal which usually requires some fix: If you already have two oxygens, change them into O and OXT
- Consider a fix.tcl (see below)
- Calculate number of ions using this. You may do a trial run of tleap to confirm the net charge of your solute
- If a PDB file can go through pdb2gmx, it will be probably fine for tleap
$ cat fix.tcl mol new mol.pdb set sel [atomselect top "resname ILE and name CD"] $sel set name CD1 set sel [atomselect top "name OC1 OT1"] $sel set name O set sel [atomselect top "name OC2 OT2"] $sel set name OXT set sel [atomselect top all] $sel writepdb mol.pdb
The ionized.parm7 and ionized.rst7 are now ready for simulations!
MD parameters
Some sample parameters can be found here and the run commands are
export prmtop=ionized.parm7 pmemd.cuda -O -i 01_min.in -o 01_min.out -p $prmtop -c ionized.rst7 -r 01_min.rst -ref ionized.rst7 pmemd.cuda -O -i 02_heat.in -o 02_heat.out -p $prmtop -c 01_min.rst -r 02_heat.rst -x 02_heat.nc -ref 01_min.rst pmemd.cuda -O -i 03_eq.in -o 03_eq.out -p $prmtop -c 02_heat.rst -r 03_eq.rst -x 03_eq.nc -ref 02_heat.rst pmemd.cuda -O -i 05_Pull.in -o 05_Pull.out -p $prmtop -c 03_eq.rst -r 05_Pull.rst -x 05_Pull.nc pmemd.cuda -O -i 06_Prod.in -o 06_Prod.out -p $prmtop -c 05_Pull.rst -r 06_Prod.rst -x 06_Prod.nc
Here is a detailed version of md.in
npt 300K &cntrl ! MD Controls ! [imin] 0 (default) for MD, 1 for energy minimization, 5 for re-reun imin=0, ! [nmropt] 0 (default), 1 for NMR analysis e.g. COM.rst in US !nmropt=1 ! [ntx] 0 (default) only reads coordinates, 1 reads coordinates and velocities ntx=5, ! [irest] 0 (default) do not restart, 1 reads coordinates and velocities from restart file. If irest=1, ntx=5 irest=1, ! [nstlim] Number of steps nstlim=250000000, ! [dt] Time step (psec) dt=0.002, ! [ntpr] Output frequency of energy info ntpr=500, ! [ntwr] Output frequency of restart file ntwr=500, ! [ntwx] Output frequency of trajectory ntwx=5000, ! [ntr] 0 (default). If you restrain any atom, ntr=1 ntr=1, ! Temperature regulation ! [ntt] 0 for NVE, 3 for Langevin dynamics ntt=3, ! [temp0] 300 (default) temp0=300, ! [gamma_ln] (ps-1) 0 (default) Recommended value 2.0-5.0 gamma_ln=5.0, ! Pressure regulation ! [ntp] 0 (default), 1 for isotropic, 2 for anisotropic (membrane system) ntp=1, ! [barostat] 1 (default) for Berendsen, 2 for MC barostat (recommended) barostat=2, ! [pres0] 1.0 (default) pres0=1.0, ! [taup] (ps) 1.0 (default), recommended 1.0-5.0 taup=2.0, ! Bond length constraints ! If TIP3P, ntc=ntf=2 ! [ntc] 1 (default) for no constraint, 2 for h-bonds, 3 for all-bonds ntc=2, ! Genertic parameters ! [ntf] 1 (default), 2 for bond interactions involving H-atoms omitted (recommended) ntf=2, ! [ntb] Periodic boundaries or not. 1 for constant volume, 2 for constant pressure. ntb=2, ! [cut] (A) Non-bonded cutoff. 8.0 (default) cut=10.0 / First restrain - 3' of TS 1.0 FIND N3 * * * SEARCH RES 1368 1368 END Second restrain - 5' of NTS 1.0 FIND N1 * * * SEARCH RES 1369 1369 END Third restrain - CA of residue 1215 1.0 FIND CA * * * SEARCH RES 1215 1215 END END
Distance Restraints
With distance restraints, you can easily initiate biased MDs or umbrella sampling simulations.
Remember add the following to the MD parameter:
DISANG=COM_dist.RST DUMPAVE=05_Pull_dist.dat
With iat=atom_index1,atom_index2, you restrain between two specific atoms:
&rst iat=6487,20697, r1=-199.0, r2=DISTHERE, r3=DISTHERE, r4=199.0, rk2=1.0, rk3=1.0, iresid=0, fxyz=1,1,1, outxyz=1, /
With iat=-1,-1, you can specify groups of atoms (COM) to be restrained using igr1 and igr2:
&rst iat=-1,-1, r1=-199.0, r2=DISTHERE, r3=DISTHERE, r4=199.0, rk2=1.0, rk3=1.0, iresid=0, fxyz=1,1,1, outxyz=1, igr1=6487,6507,6529,6543,6555,6577,6588,6607,6621,6633,6652,6663,6680,6697,6713,6733,6745, igr2=20697,20709,20719,20733,20740,20750,20771,20788,20807,20814,20833, /
MD trajectories
By default, AMBER gives trajectories and restart coordinates in NETCDF format. Load them into VMD using
mol addfile out.rst type netcdf
Or convert the restart coordinates to PDB for easier manipulation
cpptraj -p ionized.parm7 -y in.rst7 -x out.pdb
CPPTRAJ
You can first mask/strip your trajectories remotely and then transfer a much smaller trajectory file to local for analysis. There are two ways you can reduce the size of your trajectories: 1. remove solvent; 2. less frequent frames. Usually you would like to do both.
parm ionized.parm7 trajin md.nc strip :WAT,K+,Cl- outprefix stripped trajout stripped.dcd dcd offset 1000
Two more things you usually would like to add
autoimage: automatically centre and image trajectory concerning pbc rms: it calculates the rmsd but you wanna use it to align the structure to reference (e.g. fist frame)
parm ionized.parm7 trajin md.nc autoimage strip :WAT,K+,Cl- outprefix stripped rms first @CA trajout stripped.dcd dcd offset 1000
If you wanna output a trajectory of restart coordinates, use:
trajin md.rst
If you wanna output multiple PDBs for each frame:
trajout frame.pdb pdb multi
Easy strip multiple trials of runs
#!/bin/bash mkdir -p traj for ii in $(seq 1 8); do cat > cpj << EOF parm win$ii/ionized.parm7.$ii trajin win$ii/5tgtus.x.$ii autoimage strip :WAT,K+,Cl- outprefix stripped rms first @CA trajout $ii.dcd dcd offset 2 EOF cpptraj cpj mv stripped.ionized.parm7.$ii traj/$ii.parm7 mv $ii.dcd traj/$ii.dcd done rm -f cpj
RMSD
parm win1/ionized.parm7.1 reference win1/ionized.rst7.1 parm win1/ionized.parm7.1 [win1] parm win$ii/ionized.parm7.$ii [p$ii] trajin win$ii/5tgtus.x.$ii parm [p$ii] rms :892-1078,1254-1300@N,CA,C ref [win1] rms R$ii :1009-1020@N,CA,C ref [win1] out rmsd_win$ii.agr nofit go
Native Contacts
parm win$ref/ionized.parm7.$ref reference win$ref/ionized.rst7.$ref parm win$ref/ionized.parm7.$ref [win$ref] parm win$ii/ionized.parm7.$ii [p$ii] trajin win$ii/5tgtus.x.$ii parm [p$ii] nativecontacts name NC1 :1009-1020&!@H= ref [win$ref] \ distance 3.0 \ mindist maxdist \ byresidue resoffset 0 \ writecontacts nc_win$ii.contacts \ resout nc_win$ii.resout \ out nc_win$ii.dat
Gromacs
Prepare the system (Protein only)
Justin's tutorials are always the good starting points (here), especially the "Lysozyme in water" one (here).
Simple steps to prepare a system, and it usually works!
rm -f editconf.pdb solvate.gro ions.tpr ionized.pdb cat > ions.mdp << EOF integrator = steep emtol = 1000.0 emstep = 0.01 nsteps = 50000 nstlist = 1 cutoff-scheme = Verlet ns_type = grid coulombtype = PME rcoulomb = 1.0 rvdw = 1.0 pbc = xyz EOF #gmx pdb2gmx -f mol.pdb # Create a cubic box of 2 nm on each side and place the system to the center and align its principle axes to the reference axes # -princ usually helps to reduce your system size echo 0 | gmx editconf -f conf.gro -o editconf.pdb -princ -center 3.7 3.4 2.4 -box 8.0 12.2 5.2 # Solvate the box gmx solvate -cp editconf.pdb -o solvate.gro -p topol.top # Add ions to make it neutral and of 0.15 M NaCl # If you want KCl, add -pname K gmx grompp -f ions.mdp -c solvate.gro -o ions.tpr -p topol.top -maxwarn 1 >& LOG echo 6 | gmx genion -s ions.tpr -o ionized.pdb -conc 0.15 -neutral -p topol.top echo q | gmx make_ndx -f ionized.pdb rm -f \#* editconf.pdb solvate.gro ions.mdp ions.tpr mdout.mdp
Things to note:
- You can find ions.mdp in Justin's tutorial or here (it does not even matter).
- Biological systems might require surprisingly large boxes, if you are interested, read this.
- You can consider using -d 1.2 if you are confident that your system does not extend its dimension a lot.
Standard MD (Protein only)
A standard MD includes steps as follows:
- Energy minimization
- Annealing
- Pre-equilibrium
- Production run
Here are some standard run scripts.
gmx mdrun -pin on -ntmpi 1 -ntomp 12 -v -deffnm
Things to note:
- In step2-3, position restraints have been added to favor the initial structures (usually crystal)
- You should consider release the restraints step-wisely (usually people are just too lazy to do so)
Pulling/Umbrella Sampling
Justin's tutorial: here
Constant Velocity
pull = yes pull_ncoords = 1 ; only one reaction coordinate pull_ngroups = 2 ; two groups defining one reaction coordinate pull_group1_name = Chain_A pull_group2_name = Chain_B pull_coord1_type = umbrella ; harmonic potential pull_coord1_geometry = distance ; simple distance increase pull_coord1_dim = N N Y ; pull along z pull_coord1_groups = 1 2 ; groups 1 (Chain A) and 2 (Chain B) define the reaction coordinate pull_coord1_start = yes ; define initial COM distance > 0 pull_coord1_rate = 0.01 ; 0.01 nm per ps = 10 nm per ns pull_coord1_k = 1000 ; kJ mol^-1 nm^-2
Constant Force
pull = yes pull_ncoords = 1 ; only one reaction coordinate pull_ngroups = 2 pull_group1_name = H1end pull_group2_name = H3end pull_coord1_type = constant-force pull_coord1_geometry = direction pull_coord1_vec = 0 1 0 ; pull along y pull_coord1_groups = 1 2 pull_coord1_k = 1000 ; kJ mol^-1 nm^-2
Trajectory manipulation
Usually you will have multiple trajectories from production runs, you would like to combine them and remove the PBC effects in order to calculate something or make videos.
gmx trjcat -f *.xtc -o xtc -dt 100 -n ndx
Notes:
- -settime if you did not continue
gmx trjconv -f pre.trr -o unwrap.xtc -s tpr -n ndx -pbc mol (-dt 100) gmx trjconv -f unwrap.xtc -o fit.xtc -s tpr -n ndx -fit rot+trans
Notes:
- Usually protein for least squares fit and centering
- Try "-pbc nojump" if "mol" does not work
- "-pbc nojump" is good for dimer; "-pbc mol" is good for multimer
- !!!Never use "-center" with "-pbc mol"!!!
Split
echo 0 | gmx trjconv -f ed.trr -s ../ed.tpr -split 20 for ii in $(seq 0 10); do jj=$(($ii + 1)) echo 0 | gmx trjconv -f trajout$ii.xtc -s ../ed.tpr -skip 20 -o $jj.gro done rm -f trajout$ii.xtc \#*
Anton
Alt 1:
Load dtr in vmd
pbc unwrap -all animate write trr
gmx trjcat -o xtc gmx trjconv -pbc nojump -o unwrap.xtc
Alt 2:
Load dtr in vmd
animate write trr
Load trr in vmd
pbc unwrap -all animate write trr
gmx trjcat -o unwrap.xtc
Alt 3:
Load dtr in vmd
animate write trr
gmx trjcat -f *.trr -o xtc -cat gmx trjconv -f xtc -o xtc -timestep 120 gmx trjconv -s tpr -pbc res -center -o unwrap.xtc