How to do dockings
Contents
[hide]Prerequisite
Prepare receptor
You can use ADFSuite (Download here) or raccoon. In general you have to convert PDB to PDBQT which contains partial charge info.
Checklist
- Open your protein PDB in Chimera/VMD/PyMOL and LOOK AT IT
- Any missing residues
- Any unwanted ligands
- Add hydrogens (using Chimera/Gromacs/MGLTools)
Install
./install.sh -d ~/opt/ADFRsuite-1.0 -c 0
Usage
prepare_receptor4.py -v -r <Input PDB> -o receptor.pdbqt
Prepare ligands
You can use ADFSuite (Download here) or Openbabel.
Usage
prepare_ligand4.py -v -l <Input PDB>
Benchmark
Autodock 4: 50 LGA, 2,500,000 evaluation each
Autodock Vina: max. mode=50
Resolution=0.375 A
Box-pointse=80x80x80
Box-size=30x30x30 (A^3)
(compounds/day) | OSC (30 nodes) | Amber machine (1 node, 4 GPUs) |
---|---|---|
Autodock 4 | 79,120 (Potentially 1.3x) | |
Vina | 300,000 | 13,186 |
It seems that Vina can be accelerated by ~50% if you compile it well (Read this), but still lagging behind Autodock-GPU (Git here) on a multi-GPU machine.
Libraries
Glide
Ligprep
$SCHRODINGER/ligprep -WAIT -HOST ${host}:${nproc} -isd combine.sdf -omae out.maegz -bff 14 -i 2 -W i,-ph,7.0,-pht,0.0 -ac -s 10
ls -l
Create a grid box
Feed this to glide
GLIDECONS YES GLIDE_NTOTALCONS 1 GLIDE_NUMPOSITCONS 1 GRIDLIG NO USECOMPMAE YES INNERBOX 10, 10, 10 ACTXRANGE 26.407652 ACTYRANGE 26.407652 ACTZRANGE 26.407652 GRID_CENTER 6.928290, 6.346953, 21.799812 OUTERBOX 26.407652, 26.407652, 26.407652 GRIDFILE "factorXa_grid.zip" RECEP_FILE "factorXa_grid.maegz" GLIDECONSNAMES "position1", "S1_site" GLIDEXVOLNAMES "volume1"
Docking
glide -WAIT -HOST ${host}:${nproc} SP_cluster1_0.in
$ cat SP_cluster1_0.in GRIDFILE /users/PAS1146/osu0316/full_nci_database/cluster1/cluster1.zip DOCKING_METHOD confgen PRECISION SP LIGANDFILE /users/PAS1146/osu0316/full_nci_database/cluster1/SP/ligprep_cluster1_0-out.maegz OUTPUTDIR /users/PAS1146/osu0316/full_nci_database/cluster1/SP/ WRITEREPT True
Glide on OSC
#PBS -A PAA0030 #PBS -N SP_cluster1_0 #PBS -j oe #PBS -l walltime=100:00:00 #PBS -l nodes=1:ppn=1 #PBS -S /bin/sh #PBS -l software=glide+1 qstat -f $PBS_JOBID module load schrodinger cd $PBS_O_WORKDIR host=`cat $PBS_NODEFILE|head -1` nproc=`cat $PBS_NODEFILE|wc -l` glide -WAIT -HOST ${host}:${nproc} SP_cluster1_0.in
Check Running Glide
If you wanna check who is running glide (as OSC only bought 10 licenses):
licadmin STAT
Therefore you may NOT be able to run glide at the OSC master node (even for testing). Also, remember to add a PBS option when you submit glide to avoid job cancellation due to too many people running glide
#PBS -l software=glide+1
Autodock
Compilation
Manual here
make DEVICE=GPU NUMWI=128
Get binding poses
A typical Autodock output looks like this:
MODEL 801 USER Run = 801 USER Cluster Rank = 1 USER Number of conformations in this cluster = 22 USER USER RMSD from reference structure = 58.478 A USER USER Estimated Free Energy of Binding = -11.77 kcal/mol [=(1)+(2)+(3)-(4)] USER Estimated Inhibition Constant, Ki = 2.38 nM (nanomolar) [Temperature = 298.15 K] USER USER (1) Final Intermolecular Energy = -13.85 kcal/mol USER vdW + Hbond + desolv Energy = -13.61 kcal/mol USER Electrostatic Energy = -0.24 kcal/mol USER (2) Final Total Internal Energy = -2.04 kcal/mol USER (3) Torsional Free Energy = +2.09 kcal/mol USER (4) Unbound System's Energy [=(2)] = -2.04 kcal/mol USER USER USER USER DPF = WD40.dpf USER NEWDPF move teg.pdbqt USER NEWDPF about 3.758800 0.183900 -6.839200 USER NEWDPF tran0 9.906563 33.794136 47.044031 USER NEWDPF axisangle0 -0.222817 -0.959206 0.174002 107.688080 USER NEWDPF quaternion0 -0.179906 -0.774476 0.140492 0.589985 USER NEWDPF dihe0 -162.13 -159.47 -121.18 -26.65 178.53 33.91 99.44 USER USER x y z vdW Elec q RMS ATOM 1 C15 RMT 0 13.047 32.564 44.515 -0.28 -0.00 +0.075 58.478
You will want to fetch the lines start with "ATOM" and remove the last two columns to have:
ATOM 1 C15 RMT 0 13.047 32.564 44.515 -0.28 -0.00 . . . TER
Remember to add "TER" after the coordinates.
awk '{print gensub (/[^[:blank:]]+/, " ", 11)}'
is useful to remove unwanted columns (the 11th) from coordinates produced by docking softwares.
Autodock Vina
Prepare all ligands (.pdbqt) in a directory ligand/pdbqt.
Prepare all receptors (.pdbqt) in a directory receptor.
Split ligands by
python2 split.py
The configuration file looks like this:
center_x = 77.009 center_y = 80.221 center_z = 60.623 size_x = 28 size_y = 28 size_z = 26 num_modes = 100
Usage
The basic run command is
vina --config <config> --receptor <pdbqt> --ligand <pdbqt> --out out.pdbqt --log log.txt
Get results using python
#! /usr/bin/env python import sys import glob def doit(n): file_names = glob.glob('*/*.pdbqt') everything = [] failures = [] print 'Found', len(file_names), 'pdbqt files' for file_name in file_names: file = open(file_name) lines = file.readlines() file.close() try: line = lines[1] result = float(line.split(':')[1].split()[0]) everything.append([result, file_name]) except: failures.append(file_name) everything.sort(lambda x,y: cmp(x[0], y[0])) part = everything[:n] for p in part: print p[1], print if len(failures) > 0: print 'WARNING:', len(failures), 'pdbqt files could not be processed' if __name__ == '__main__': doit(int(sys.argv[1]))
Combine the ligand with the receptor (proteins)
- Make a namelist
- grep receptor coordinates from receptor.pdbqt
- grep ligand coordinates from out.dat
- cat together
Rendering
It is assumed that the receptors are pre-aligned.
- Prepare a sample VMD visualisation state
- rename the ligand to LIG
- standard render
Contact Statistics
General steps are:
- Get ready with PDBs combining the ligands and the receptor
- make_res.sh: using VMD to print a list of residue ID which is within 3 A of the ligand
- make_reslist.py: turn the ResID list into histogram
- Plot with plot_bar.m using MATLAB
Select out the top-ranked ligands
Make a namelist
head -25 sort | awk '{print $1}' | cut -d/ -f1-5 > tmp
Move the directories
for ii in $(cat tmp); do cp -r $ii res; done