How to do umbrella sampling

From ChengLab
Revision as of 14:56, 26 October 2020 by Kevin (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Amber

Tutorial here Git here

PMF over RMSD

Prepare AMBER topologies

for ii in $(seq 1 32)
do
   mkdir $ii
   cd $ii
   cat > tclfix << EOF                                       # Fix your PDB file here so that it can be recognised by tleap
   mol new ../../pdb/crd.pdb.$ii type pdb
   set sel [atomselect top all]
   \$sel set beta 0
   set sel [atomselect top "not hydrogen"]
   \$sel set beta 1
   set sel [atomselect top "resid 1334 and name P OP1 OP2"]
   \$sel set beta 0
   set sel [atomselect top "resid 1 to 1300"]
   \$sel set chain A
   set sel [atomselect top "resid 1301 to 1333"]
   \$sel set chain B
   set sel [atomselect top "resid 1334 to 1356"]
   \$sel set chain C
   set sel [atomselect top "resid 1357 to 1379"]
   \$sel set chain D
   set sel [atomselect top "beta > 0"]
   \$sel writepdb mol.pdb
   quit
EOF
   cat > leapin << EOF                                       # Tleap parameter file; Box size, ion concentration, etc. 
   source leaprc.protein.ff14SB
   source leaprc.RNA.OL3
   source leaprc.DNA.bsc1
   source leaprc.water.tip3p
   mol = loadpdb mol.pdb
   solvatebox mol TIP3PBOX 10
   addions mol K+ 157 Cl- 93
   saveamberparm mol ionized.parm7 ionized.rst7
   quit
EOF
   vmd -dispdev text -e tclfix
   tleap -f leapin
   rm -f leapin tclfix mol.pdb
   cd ..
done

Prepare the collective variable (e.g. RMSD)

for ii in $(seq 3 32)
do
   cd $ii
   [ ! -f ionized.parm7 ] && echo "ionized.parm7 does not exist!" && exit 1
   [ ! -f ionized.rst7 ] && echo "ionized.rst7 does not exist!" && exit 1
   mv ionized.parm7 ionized.parm7.$ii
   mv ionized.rst7 ionized.rst7.$ii
   cat > tcl << 'EOF'
mol new ionized.parm7.NN type parm7
mol addfile ionized.rst7.NN type rst7
set outf [open "cv.in" "w"]
puts $outf "&colvar cv_type='MULTI_RMSD',"
set sel [atomselect top "protein and resid 892 to 1078 1254 to 1300 and name CA"].    # Define the residues you wanna restrain the RMSD
puts $outf "cv_ni=[$sel num],"
puts $outf "cv_nr=[expr [$sel num] * 3],"
puts -nonewline $outf "cv_i="
foreach ii [$sel get index] {
   set aa [atomselect top "index $ii"]
   puts -nonewline $outf "[$aa get serial],"
}
puts -nonewline $outf "\ncv_r="
foreach ii [$sel get index] {
   set aa [atomselect top "index $ii"]
   puts $outf "[$aa get x], [$aa get y], [$aa get z],"
}
puts $outf "\nanchor_position=-3,0,0,3,\nanchor_strength=5,5, /"                      # Define the strength in kcal/mol/A2
close $outf
quit
EOF
   sed "s/NN/$ii/g" tcl > tcl2
   vmd -dispdev text -e tcl2
   rm -f tcl*
   cd ..
done


PMF over distance

Pull to create windows

You may pull using your preferred MD engines, e.g. NAMD SMD/Targeted MD or Gromacs Essential Dynamics. Created a series of PDB files (removing hydrogens) and then

$ cat step1.sh
jj=0
for ii in $(seq 32.0 1.5 60.5)
do
   ipdb=$(($jj * 7))
   echo $jj $ii $ipdb
   rm -rf $ii
   mkdir -p $ii 
   cp ../pulling/${ipdb}.pdb $ii/mol.pdb
   cd $ii; vmd -dispdev text mol.pdb -e ../fix.tcl; tleap -f ../build.leap; cd ..
   jj=$(($jj+1))
done

Minimization, annealing and equalibration with CA restrained

Double check your COM_dist.RST

$ cat COM_dist.RST
&rst
iat=-1,-1,
r1=-199.0,
r2=DISTHERE,
r3=DISTHERE,
r4=199.0,
rk2=2.5,
rk3=2.5,
iresid=0,
fxyz=1,1,1,
outxyz=1,
igr1=1,2,3,
igr2=4,5,6,
/

Then simply start parallel simulations of the windows

$ cat step2.sh
jj=0
for ii in 50.0 48.5 47.0 45.5
do
   export CUDA_VISIBLE_DEVICES=$jj
   cp *.in COM_dist.RST $ii
   cd $ii
   sed -i "s/DISTHERE/${ii}/g" COM_dist.RST
   bash ../run.sh &
   cd ..
   jj=$((jj+1))
done

Restrain to the window centers before sampling

$ cat 05_Pull.in
Pull 303K
&cntrl
  imin=0, ntx=5, irest=1,
  ntc=2, ntf=2, tol=0.0000001,
  nstlim=50000, ntt=3, gamma_ln=1.0,
  temp0=303.0,
  ntpr=1000, ntwr=500000, ntwx=1000,
  dt=0.002, ig=-1,
  ntb=2, cut=10.0, ioutfm=1, ntxo=2,
  nmropt=1,
  ntp=1, pres0=1.0, comp=44.6, taup=2.0,
/
&wt type='DUMPFREQ', istep1=10 /
&wt type='END', /
DISANG=COM_dist.RST
DUMPAVE=05_Pull_dist.dat
LISTIN=POUT
LISTOUT=POUT
/
/
&ewald
 skinnb=3.0,
/

Sampling

Same as 05_Pull.in except running longer.

Analysis using WHAM

Install WHAM from here.

First you will have to prepare data files from AMBER outputs

$ cat run_prepare.sh
for ii in 38.0 51.5 53.0
do
   awk '{print $1,"",$8}' ../$ii/06_Prod_dist.dat > ../$ii/dist.dat
done

Then calculate PMF using WHAM

$ cat run_pmf.sh
wham 0 32 320 0.00000001 303 0 metadata out.pmf
sed '1d' out.pmf | awk '{print $1,"",$2}' > plot_free_energy.dat

metadata looks like this

../60.5/dist.dat 60.5 5.0
../59.0/dist.dat 59.0 5.0
../57.5/dist.dat 57.5 5.0

Gromacs

Everyone should look that Justin's tutorial first: [1]

Choosing a reaction coordinate

Pull to create windows

Run MD

MD Inputs

Include this in mdp

; Pull code
pull                    = yes
pull_ngroups            = 2
pull_ncoords            = 1
pull_group1_name        = Protein_&_r_918-923
pull_group2_name        = Protein_&_r_396-407
pull_coord1_type        = umbrella      ; harmonic biasing force
pull_coord1_geometry    = distance      ; simple distance increase
pull_coord1_groups      = 1 2
pull_coord1_dim         = N N Y
pull_coord1_rate        = 0.0
pull_coord1_k           = 1000          ; kJ mol^-1 nm^-2
pull_coord1_start       = yes           ; define initial COM distance > 0

Analyze using WHAM

cat tpr-files.dat
'../output/umbrella114.tpr'

cat pullf-files.dat
'pullf-umbrella114.xvg'

gmx wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal