How to do Direct Coupling Analysis
Contents
Background
Direct Coupling Analysis (DCA) tells you highly correlated residue pairs in a given protein sequence by means of synchronized (coupled) mutations during evolution.
Existing Online Server
Do it from scratch
Build your sequence library
Choose existing sequence libraries with reference to HHsuite [1]
UniRef30 is definitely a good start: [2]
Untar it and you will get a >200G folder
Multisequence alignment
- Install HHsuite (using conda)
- Using hhblits and reformat.pl (provided by hhsuite) to get an MSA result (.fas)
hhblits -i prot.seq -d ../database/UniRef30_2020_06 -o $PREFIX.hhr -n 2 -oa3m $PREFIX.a3m -cov $COV reformat.pl a3m fas $PREFIX.a3m $PREFIX.fas
Structure-based filtering
Then filter the residue pairs based on structural info
cat > TCL << 'EOF' set THRES 0.2 set SHIFT 314 set infile [open "tmp" r] set outfile [open "tmppairs" w] set outlog [open "LOG" w] set nn 0 while {[gets $infile line] >= 0} { lassign [regexp -inline -all {\S+} $line] ii jj kk if {$kk > $THRES} { puts $outfile "$ii $jj $kk" incr nn } } puts $outlog "No. of Pairs (DCA score > $THRES): $nn" close $infile close $outfile mol new ../pdb_monomer.pdb draw delete all set sel [atomselect top "name CA"] set rr [measure rgyr $sel] puts $outlog "Radius of gyration of the monomer: $rr" set infile [open "tmppairs" r] set outfile [open "tmppairs_clearmissing" w] set nn 0 while {[gets $infile line] >= 0} { lassign [regexp -inline -all {\S+} $line] ii jj kk set ii [expr $ii + $SHIFT] set jj [expr $jj + $SHIFT] set sel [atomselect top "resid $ii $jj and name CA"] set sel1 [atomselect top "resid $ii and name CA"] set sel2 [atomselect top "resid $jj and name CA"] if {[llength [$sel get index]] != 2} { puts $outlog "Missing pair: $ii $jj" } elseif {[measure bond [$sel get index]] > [expr $rr * 1]} { puts [measure bond [$sel get index]] draw cylinder [measure center $sel1] [measure center $sel2] radius $kk puts $outfile "$ii $jj $kk" incr nn } } puts $outlog "No. of Pairs (Distance > rgyr): $nn" close $infile close $outfile mol new ../pdb_dimer.pdb draw delete all draw color white set sel [atomselect top "name CA"] set selall [atomselect top all] set selcenter [atomselect 0 "name CA"] $selall moveby [vecsub [measure center $sel] [measure center $selcenter]] set sela [atomselect top "chain A and name CA"] set selb [atomselect top "chain B and name CA"] set rr [vecdist [measure center $sela] [measure center $selb]] puts $rr set infile [open "tmppairs_clearmissing" r] set outfile [open "pairs_draw" w] set nn 0 while {[gets $infile line] >= 0} { lassign [regexp -inline -all {\S+} $line] ii jj kk set sela [atomselect top "chain A and resid $ii and name CA or chain B and resid $jj and name CA"] set selb [atomselect top "chain B and resid $ii and name CA or chain A and resid $jj and name CA"] if {[measure bond [$sela get index]] < [measure bond [$selb get index]]} { set sel1 [atomselect top "chain A and resid $ii and name CA"] set sel2 [atomselect top "chain B and resid $jj and name CA"] draw cylinder [measure center $sel1] [measure center $sel2] radius [expr $kk * 10] puts $outfile "$ii $jj $kk" } else { set sel1 [atomselect top "chain B and resid $ii and name CA"] set sel2 [atomselect top "chain A and resid $jj and name CA"] draw cylinder [measure center $sel1] [measure center $sel2] radius [expr $kk * 10] puts $outfile "$jj $ii $kk" } } close $infile close $outfile close $outlog EOF