How to do Direct Coupling Analysis

From ChengLab
Revision as of 21:37, 6 February 2021 by Kevin (talk | contribs)

Background

Direct Coupling Analysis (DCA) identifies strongly coupled residue pairs in a protein by detecting positions whose mutations are synchronized (coupled) across homologous sequences during evolution.

Existing Online Server

EVcouplings

Do it from scratch

Build your sequence library

Choose an existing sequence database from those referenced by HH-suite [1]

UniRef30 is a good starting point: [2]

Extract the downloaded archive; the resulting folder is larger than 200 GB
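
The extraction step can be sketched as follows. This is only an illustration: the archive name UniRef30_2020_06_hhsuite.tar.gz, the ../database target directory, and the helper extract_db are assumptions, so substitute the names that match your download.

```shell
# Assumed names (not from the original page): adjust to your setup.
ARCHIVE=UniRef30_2020_06_hhsuite.tar.gz
DB_DIR=../database

# extract_db ARCHIVE TARGET_DIR
# Creates the target directory if needed and unpacks the gzipped tarball into it.
extract_db() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Only extract when the archive is actually present.
# Check free disk space beforehand (e.g. with df -h): the result exceeds 200 GB.
if [ -f "$ARCHIVE" ]; then
    extract_db "$ARCHIVE" "$DB_DIR"
fi
```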

Multiple sequence alignment

  1. Install HH-suite (using conda)
  2. Use hhblits and reformat.pl (both provided by HH-suite) to produce an MSA in FASTA format (.fas)
hhblits -i prot.seq -d ../database/UniRef30_2020_06 -o $PREFIX.hhr -n 2 -oa3m $PREFIX.a3m -cov $COV
reformat.pl a3m fas $PREFIX.a3m $PREFIX.fas
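
The two commands above rely on the shell variables PREFIX and COV, which the page does not define. A minimal sketch of a wrapper script is shown below. The values PREFIX=prot and COV=75, the DB variable, and the helper functions count_seqs and run_msa are illustrative assumptions, not the original author's settings. The conda command in the comment is the bioconda installation route for HH-suite.

```shell
#!/bin/sh
# One-time installation (bioconda package):
#   conda install -c conda-forge -c bioconda hhsuite

# Assumed values, not taken from the original page.
PREFIX=prot                       # basename for all output files
COV=75                            # minimum coverage with the query, in percent
DB=../database/UniRef30_2020_06   # database prefix, as in the command above

# Count sequences in a FASTA file (one '>' header line per sequence).
count_seqs() {
    grep -c '^>' "$1"
}

# Run two hhblits iterations, convert the A3M output to FASTA,
# and report how many sequences ended up in the alignment.
run_msa() {
    hhblits -i prot.seq -d "$DB" -o "$PREFIX.hhr" -n 2 -oa3m "$PREFIX.a3m" -cov "$COV" &&
    reformat.pl a3m fas "$PREFIX.a3m" "$PREFIX.fas" &&
    echo "sequences in MSA: $(count_seqs "$PREFIX.fas")"
}

# Run only when the tools and the query sequence are available.
if command -v hhblits >/dev/null 2>&1 &&
   command -v reformat.pl >/dev/null 2>&1 &&
   [ -f prot.seq ]; then
    run_msa
fi
```

Counting the sequences afterwards is a quick sanity check: an alignment with very few sequences is unlikely to carry enough co-evolution signal for DCA.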