How to do Direct Coupling Analysis
Background
Direct Coupling Analysis (DCA) identifies strongly coupled residue pairs in a protein by detecting positions whose mutations are synchronized (coupled) across evolutionarily related sequences.
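For orientation, DCA is commonly formulated as fitting a pairwise (Potts) model to the aligned sequences. The notation below is an assumption of this sketch, not something defined on this page: a_i is the residue (or gap) in alignment column i, L is the alignment length, h_i are single-site fields, J_ij are pairwise couplings, and Z is the normalization constant.

```latex
P(a_1,\dots,a_L) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big)
```

Column pairs (i, j) with strong inferred couplings J_ij are the ones reported as directly coupled. This is what separates DCA from a plain column-to-column correlation measure.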
Existing Online Server
Do it from scratch
Build your sequence library
Choose an existing sequence database; the HHsuite documentation lists the available ones [1].
UniRef30 is a good starting point [2].
Extracting the archive produces a folder larger than 200 GB.
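The extraction step might look like the sketch below. The archive name and the target directory are assumptions chosen to match the database path used later on this page; adjust them to whatever you actually downloaded.

```shell
# Hypothetical archive name and target directory (assumptions, not from this page).
ARCHIVE="UniRef30_2020_06_hhsuite.tar.gz"
DB_DIR="../database"

unpack_db() {
    # $1: path to the .tar.gz archive, $2: destination directory
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Check free disk space first, since the extracted database exceeds 200 GB:
#   df -h .
# Then extract (not executed here):
#   unpack_db "$ARCHIVE" "$DB_DIR"
```

Wrapping the command in a function keeps the archive path and destination explicit, so the same call works if you store the database elsewhere.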
Multiple sequence alignment
- Install HHsuite (using conda)
- Use hhblits and reformat.pl (provided by HHsuite) to get an MSA result (.fas)
hhblits -i prot.seq -d ../database/UniRef30_2020_06 -o $PREFIX.hhr -n 2 -oa3m $PREFIX.a3m -cov $COV
reformat.pl a3m fas $PREFIX.a3m $PREFIX.fas
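The two steps above can be collected into a small script, sketched here under a few assumptions: the values of PREFIX and COV are hypothetical (the page leaves them unset), the function names build_msa and count_seqs are introduced only for illustration, and both hhblits and reformat.pl are assumed to be on your PATH after installation.

```shell
#!/usr/bin/env bash
# Assumed install line for the bioconda package "hhsuite" (not executed here):
#   conda install -c conda-forge -c bioconda hhsuite

# Hypothetical values; the original command does not say what they should be.
PREFIX="prot"   # basename shared by all output files
COV=75          # minimum coverage with the query sequence, in percent

build_msa() {
    # $1: query sequence file, $2: database prefix (path without file extensions)
    # Search the database with hhblits, then convert the A3M output to FASTA.
    hhblits -i "$1" -d "$2" -o "$PREFIX.hhr" -n 2 -oa3m "$PREFIX.a3m" -cov "$COV" &&
    reformat.pl a3m fas "$PREFIX.a3m" "$PREFIX.fas"
}

count_seqs() {
    # The number of sequences in a FASTA alignment equals the number of header lines.
    grep -c '^>' "$1"
}

# Usage (not executed here):
#   build_msa prot.seq ../database/UniRef30_2020_06
#   count_seqs "$PREFIX.fas"
```

Counting the sequences afterwards is a quick sanity check: a very small count suggests the coverage threshold is too strict or the database path is wrong.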