How to do dockings

From ChengLab
Revision as of 00:40, 7 March 2021 by Kevin (talk | contribs) (Combine the ligand with the receptor (proteins))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Prerequisite

Prepare receptor

You can use ADFSuite (Download here) or raccoon. In general you have to convert PDB to PDBQT which contains partial charge info.

Checklist

  1. Open your protein PDB in Chimera/VMD/PyMOL and LOOK AT IT
  2. Any missing residues
  3. Any unwanted ligands
  4. Add hydrogens (using Chimera/Gromacs/MGLTools)

Install

./install.sh -d ~/opt/ADFRsuite-1.0 -c 0

Usage

prepare_receptor -v -r <Input PDB> -o receptor.pdbqt

Prepare ligands

You can use ADFSuite (Download here) or Openbabel.

Usage

prepare_ligand -v -l <Input PDB>

Benchmark

Autodock 4: 50 LGA, 2,500,000 evaluation each

Autodock Vina: max. mode=50

Resolution=0.375 A

Box-pointse=80x80x80

Box-size=30x30x30 (A^3)

(compounds/day) OSC (30 nodes) Amber machine (1 node, 4 GPUs)
Autodock 4 79,120 (Potentially 1.3x)
Vina 300,000 13,186

It seems that Vina can be accelerated by ~50% if you compile it well (Read this), but still lagging behind Autodock-GPU (Git here) on a multi-GPU machine.

Libraries

ZINC

ChemBridge

Glide

Ligprep

$SCHRODINGER/ligprep -WAIT -HOST ${host}:${nproc} -isd combine.sdf -omae out.maegz -bff 14 -i 2 -W i,-ph,7.0,-pht,0.0 -ac -s 10

Create a grid box

$SCHRODINGER/utilities/prepwizard recpt.pdb recpt.maegz

Feed this to glide

USECOMPMAE YES
INNERBOX 15, 15, 15
ACTXRANGE 26.407652
ACTYRANGE 26.407652
ACTZRANGE 26.407652
GRID_CENTER 192, 202, 216
OUTERBOX 26.407652, 26.407652, 26.407652
GRIDFILE "grid.zip"
RECEP_FILE "recpt.maegz"

The gridprep command

$SCHRODINGER/glide -WAIT -HOST ${host}:${nproc} gridprep.in

Docking

#PBS -A PAA0203
#PBS -N SDGR-dock
#PBS -j oe
#PBS -m ae
#PBS -M chan.773@osu.edu
#PBS -l walltime=1:00:00
#PBS -l mem=4315MB
#PBS -l software=glide+1
#PBS -S /bin/sh

qstat -f $PBS_JOBID
module load schrodinger
echo "Success loading schrodinger"
cd $PBS_O_WORKDIR
echo -e "Working in $(pwd)"
host=`cat $PBS_NODEFILE|head -1`
nproc=`cat $PBS_NODEFILE|wc -l`
$SCHRODINGER/glide -WAIT -HOST ${host}:${nproc} docking.in
date

tmp=$(date +%Y%m%d%H%M%S)
mkdir -p $tmp
cd $tmp
$SCHRODINGER/run split_structure.py ../docking_pv.maegz out.pdb -m pdb -many_files
for ii in out_ligand*.pdb; do
 prefix=$(basename $ii .pdb)
 $SCHRODINGER/utilities/structconvert -ipdb $ii -omol $prefix.mol2
done
$ cat docking.in
GRIDFILE grid.zip
DOCKING_METHOD confgen
PRECISION XP
LIGANDFILE lig.maegz
WRITEREPT True

Check Running Glide

If you wanna check who is running glide (as OSC only bought 10 licenses):

licadmin STAT

Therefore you may NOT be able to run glide at the OSC master node (even for testing). Also, remember to add a PBS option when you submit glide to avoid job cancellation due to too many people running glide

#PBS -l software=glide+1

Autodock

Compilation

Manual here

make DEVICE=GPU NUMWI=128

Get binding poses

A typical Autodock output looks like this:

MODEL      801
USER    Run = 801
USER    Cluster Rank = 1
USER    Number of conformations in this cluster = 22
USER  
USER    RMSD from reference structure       = 58.478 A
USER  
USER    Estimated Free Energy of Binding    =  -11.77 kcal/mol  [=(1)+(2)+(3)-(4)]
USER    Estimated Inhibition Constant, Ki   =    2.38 nM (nanomolar)  [Temperature = 298.15 K]
USER    
USER    (1) Final Intermolecular Energy     =  -13.85 kcal/mol
USER        vdW + Hbond + desolv Energy     =  -13.61 kcal/mol
USER        Electrostatic Energy            =   -0.24 kcal/mol
USER    (2) Final Total Internal Energy     =   -2.04 kcal/mol
USER    (3) Torsional Free Energy           =   +2.09 kcal/mol
USER    (4) Unbound System's Energy  [=(2)] =   -2.04 kcal/mol
USER    
USER    
USER  
USER    DPF = WD40.dpf
USER    NEWDPF move	teg.pdbqt
USER    NEWDPF about	3.758800 0.183900 -6.839200
USER    NEWDPF tran0	9.906563 33.794136 47.044031
USER    NEWDPF axisangle0	-0.222817 -0.959206 0.174002 107.688080
USER    NEWDPF quaternion0	-0.179906 -0.774476 0.140492 0.589985
USER    NEWDPF dihe0	-162.13 -159.47 -121.18 -26.65 178.53 33.91 99.44 
USER  
USER                              x       y       z    vdW   Elec        q     RMS 
ATOM      1  C15 RMT     0      13.047  32.564  44.515 -0.28 -0.00    +0.075    58.478

You will want to fetch the lines start with "ATOM" and remove the last two columns to have:

ATOM      1  C15 RMT     0      13.047  32.564  44.515 -0.28 -0.00
.
.
.
TER

Remember to add "TER" after the coordinates.

awk '{print gensub (/[^[:blank:]]+/, " ", 11)}'

is useful to remove unwanted columns (the 11th) from coordinates produced by docking softwares.

Autodock Vina

Prepare all ligands (.pdbqt) in a directory ligand/pdbqt.

Prepare all receptors (.pdbqt) in a directory receptor.

Split ligands by

python2 split.py

The configuration file looks like this:

center_x = 77.009
center_y = 80.221
center_z = 60.623
size_x = 28
size_y = 28
size_z = 26
num_modes = 100

Usage

The basic run command is

vina --config <config> --receptor <pdbqt> --ligand <pdbqt> --out out.pdbqt --log log.txt

Get results using python

#! /usr/bin/env python

import sys
import glob

def doit(n):
   file_names = glob.glob('*/*.pdbqt')
   everything = []
   failures = []
   print 'Found', len(file_names), 'pdbqt files'
   for file_name in file_names:
       file = open(file_name)
       lines = file.readlines()
       file.close()
       try:
           line = lines[1]
           result = float(line.split(':')[1].split()[0])
           everything.append([result, file_name])
       except:
           failures.append(file_name)
   everything.sort(lambda x,y: cmp(x[0], y[0]))
   part = everything[:n]
   for p in part:
       print p[1],
   print
   if len(failures) > 0:
       print 'WARNING:', len(failures), 'pdbqt files could not be processed'

if __name__ == '__main__':
   doit(int(sys.argv[1]))

Combine the ligand with the receptor (proteins)

  1. Make a namelist
  2. grep receptor coordinates from receptor.pdbqt
  3. grep ligand coordinates from out.dat
  4. cat together
DIR_DOCKRES=output_aa
RECEPTOR=receptor.pdbqt
ls $DIR_DOCKRES > namelist
for filename in $(cat namelist)
do
   ZINC=$(basename $filename)
   sed -n '/^MODEL 1$/,/^ENDMDL$/p' $DIR_DOCKRES/$ZINC/out.pdbqt > tmp
   grep '^HETATM' tmp > tmp2
   mv tmp2 tmp
   head -n -1 $RECEPTOR > combine.pdb
   cat tmp >> combine.pdb
   echo -e "TER\nENDMDL" >> combine.pdb
   mv combine.pdb $DIR_DOCKRES/$ZINC
done
rm -f tmp

Rendering

It is assumed that the receptors are pre-aligned.

  1. Prepare a sample VMD visualisation state
  2. rename the ligand to LIG
  3. standard render

Contact Statistics

General steps are:

  1. Get ready with PDBs combining the ligands and the receptor
  2. make_res.sh: using VMD to print a list of residue ID which is within 3 A of the ligand
  3. make_reslist.py: turn the ResID list into histogram
  4. Plot with plot_bar.m using MATLAB

Select out the top-ranked ligands

Make a namelist

head -25 sort | awk '{print $1}' | cut -d/ -f1-5 > tmp

Move the directories

for ii in $(cat tmp); do cp -r $ii res; done