How to do Direct Coupling Analysis
Contents
Background
Direct Coupling Analysis (DCA) tells you highly correlated residue pairs in a given protein sequence by means of synchronized (coupled) mutations during evolution.
Existing Online Server
Do it from scratch
Build your sequence library
Choose existing sequence libraries with reference to HHsuite [1]
UniRef30 is definitely a good start: [2]
Untar it and you will get a >200G folder
Multisequence alignment
- Install HHsuite [3] (using conda)
- Using hhblits and reformat.pl (provided by hhsuite) to get an MSA result (.fas)
hhblits -i prot.seq -d ../database/UniRef30_2020_06 -o $PREFIX.hhr -n 2 -oa3m $PREFIX.a3m -cov $COV reformat.pl a3m fas $PREFIX.a3m $PREFIX.fas
Structure-based filtering
Then filter the residue pairs based on structural info
cat > TCL << 'EOF'
set THRES 0.2
set SHIFT 314
set infile [open "tmp" r]
set outfile [open "tmppairs" w]
set outlog [open "LOG" w]
set nn 0
while {[gets $infile line] >= 0} {
lassign [regexp -inline -all {\S+} $line] ii jj kk
if {$kk > $THRES} {
puts $outfile "$ii $jj $kk"
incr nn
}
}
puts $outlog "No. of Pairs (DCA score > $THRES): $nn"
close $infile
close $outfile
mol new ../pdb_monomer.pdb
draw delete all
set sel [atomselect top "name CA"]
set rr [measure rgyr $sel]
puts $outlog "Radius of gyration of the monomer: $rr"
set infile [open "tmppairs" r]
set outfile [open "tmppairs_clearmissing" w]
set nn 0
while {[gets $infile line] >= 0} {
lassign [regexp -inline -all {\S+} $line] ii jj kk
set ii [expr $ii + $SHIFT]
set jj [expr $jj + $SHIFT]
set sel [atomselect top "resid $ii $jj and name CA"]
set sel1 [atomselect top "resid $ii and name CA"]
set sel2 [atomselect top "resid $jj and name CA"]
if {[llength [$sel get index]] != 2} {
puts $outlog "Missing pair: $ii $jj"
} elseif {[measure bond [$sel get index]] > [expr $rr * 1]} {
puts [measure bond [$sel get index]]
draw cylinder [measure center $sel1] [measure center $sel2] radius $kk
puts $outfile "$ii $jj $kk"
incr nn
}
}
puts $outlog "No. of Pairs (Distance > rgyr): $nn"
close $infile
close $outfile
mol new ../pdb_dimer.pdb
draw delete all
draw color white
set sel [atomselect top "name CA"]
set selall [atomselect top all]
set selcenter [atomselect 0 "name CA"]
$selall moveby [vecsub [measure center $sel] [measure center $selcenter]]
set sela [atomselect top "chain A and name CA"]
set selb [atomselect top "chain B and name CA"]
set rr [vecdist [measure center $sela] [measure center $selb]]
puts $rr
set infile [open "tmppairs_clearmissing" r]
set outfile [open "pairs_draw" w]
set nn 0
while {[gets $infile line] >= 0} {
lassign [regexp -inline -all {\S+} $line] ii jj kk
set sela [atomselect top "chain A and resid $ii and name CA or chain B and resid $jj and name CA"]
set selb [atomselect top "chain B and resid $ii and name CA or chain A and resid $jj and name CA"]
if {[measure bond [$sela get index]] < [measure bond [$selb get index]]} {
set sel1 [atomselect top "chain A and resid $ii and name CA"]
set sel2 [atomselect top "chain B and resid $jj and name CA"]
draw cylinder [measure center $sel1] [measure center $sel2] radius [expr $kk * 10]
puts $outfile "$ii $jj $kk"
} else {
set sel1 [atomselect top "chain B and resid $ii and name CA"]
set sel2 [atomselect top "chain A and resid $jj and name CA"]
draw cylinder [measure center $sel1] [measure center $sel2] radius [expr $kk * 10]
puts $outfile "$jj $ii $kk"
}
}
close $infile
close $outfile
close $outlog
EOF