How to prepare a system

From ChengLab
Revision as of 19:38, 14 February 2021 by Kevin (talk | contribs) (RNA)
Jump to: navigation, search

Fetch PDB

You can fetch pdb using

wget https://files.rcsb.org/download/6I1K.pdb

Protein-only systems

Add missing residues using Modeller

Two simple steps:

  1. Make a alignment.ali containing your current sequence (indicate missing residues) and target sequence (without missing residues)
  2. Feed to Modeller with defined parameters

A sample alignment.ali looks like this:

>P1;prot
structureX:prot:  15 :A:+205 :B:::-1.00:-1.00
MVLLHASTHIFPTDFASVSRAFFNRYPNPYSPHVLSIDTISRNVDQEGNLRTTRLLKKS--------------TETWIIEVSVVNPAAS
TMKTYTRNLDHTGAMKVEEYTTYA--------IADSAVKFSS----GAASKVAAWSRTKFDENVKKSRMGMAFVIAAA--------*

>P1;prot_fill
sequence:::::::::
MVLLHKSTHIFPTDFASVSRAFFNRYPNPYSPHVLSIDTISRNVDQEGNLRTTRLLKKSGKLPTWVKPFLRGITETWIIEVSVVNPANS
TMKTYTRNLDHTGIMKVEEYTTYQFDSATSSTIADSRVKFSSGFNMGIKSKVEDWSRTKFDENVKKSRMGMAFVIQKLEEARNPQF*

A sample python script for calling Modeller:

from modeller import *
from modeller.automodel import *    # Load the automodel class
log.verbose()
env = environ()
# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']
a = automodel(env,
   alnfile  = 'alignment.seq',
   knowns   = 'prot',
   sequence = 'prot_fill',
   assess_methods=(assess.DOPE,
                   assess.GA341))
a.starting_model= 1
a.ending_model  = 1
a.make()

Or you can directly use Chimera for this purpose.

Manually rotate the residue side chains

To rotate a side chain in VMD:

set cc [atomselect top "resid 1 and name CA"] ### Select a CA as the centre of rotation
set ss [atomselect top "resid 1 and not name CA C O N H"] ### Select the side chain
$ss move [trans center [measure center $cc] axis x 10] ### Rotate the side chain about the x-axis by 10 degrees

Protonation states of amino acids

H++

ProPka

PDB2PQR

PlayMolecule (Using ProPka)

Improper dihedral problem in Gromacs' Amber force-field

In ffbonded.itp:

CT  CV  CC  NA       4      180.00     4.60240     2

However pdb2gmx will make

CT  CC  CV  NA

for HID, causing error message of "No default Proper Dih. types". Therefore you have to manually find the improper dihedral terms in the topology (usually topol_Protein.itp made by pdb2gmx) and exchange the second and the third indices. You could find the line number in the error message.

Protein-ligand complex systems

You are directed to How to parameterise ligands

Protein-membrane systems

CHARMM to Gromacs

File conversion from CHARMM to Gromacs is often encountered, because CHARMM-GUI is very convenient for creating a membrane system. There is a script psf2itp provided by CHARMM-GUI (modified by Kev 2019) to convert a psf to topology file for Gromacs with CHARMM36m ff. You may replace the toppar to change the force-field (but still have to be CHARMM). CHARMM36m ff for Gromacs could be downloaded here.

python psf2itp.py toppar.str PSFfile

Insert your protein into membrane

A simple VMD script deletes "bad lipids" and place your favourite protein inside the membrane:

set POPC "resname POPC"
set all [atomselect top all]
$all set beta 0
set seltext1 "$POPC and same residue as (within 0.6 of protein)"
set sel1 [atomselect top $seltext1]
$sel1 set beta 1
set badlipid [atomselect top "name P and beta > 0"]
$badlipid get resid
set sel [atomselect top "beta<1 "]
$sel writepdb prot_memb_hole.pdb

Remove water inside membrane

This VMD script solvates the system and removes unnecessary water inside the membrane

package require solvate
solvate 5kxi_popc.psf 5kxi_popc.pdb -o 5kxi_popc_water_TEMP -b 1.5 -minmax {{-38 -38 -39} {39 39 50}}
set all [atomselect top all]
$all set beta 0
set seltext "segid WT1 to WT99 and same residue as abs(z) <25"
set sel [atomselect top $seltext]
$sel set beta 1
set badwater [atomselect top "name OH2 and beta > 0"]
set seglist [$badwater get segid]
set reslist [$badwater get resid]
mol delete all
package require psfgen
resetpsf
readpsf 5kxi_popc_water_TEMP.psf
coordpdb 5kxi_popc_water_TEMP.pdb
foreach segid $seglist resid $reslist {
delatom $segid $resid
}
writepdb 5kxi_popcw.pdb
writepsf 5kxi_popcw.psf
file delete kcsa_popc_water_TEMP.psf
file delete kcsa_popc_water_TEMP.pdb

ParmEd

ParmEd is handful for converting common file formats such as psf/pdb to top/gro (Gromacs) or top/inpcrd (Amber).

import parmed as pmd
gmx = pmd.load_file('topol.top', xyz='system.gro')
gmx.save('system.top', format='amber')
gmx.save('system.inpcrd', format='rst7')
gmx.save('ionized.psf')

Please note that the Gromacs output comes WITHOUT forcefields.

RNA/DNA

Rosetta: rna_helix.py stepwise

ModeRNA: [1]

Notes

Re-number residues

foreach ii {A B C D} {
   mol new $ii.pdb
   set sel [atomselect top all]
   set oldNumbering [$sel get residue]
   set newNumbering {}
   foreach resid $oldNumbering {
       lappend newNumbering [expr {$resid + 1}]
   }
   $sel set resid $newNumbering
   $sel writepdb $ii.pdb
}

Make rotations of coordinates

set sel [atomselect top "segname P1 and resid 1 to 13"]
set selcenter [atomselect top "segname P1 and resid 14 and name N"]
set centerX [list [$selcenter get x] [$selcenter get y] [$selcenter get z]]
set matrixX [trans center $centerX axis y -10]
$sel move $matrixX
set sel [atomselect top "protein"]
set selcenter [atomselect top "protein"]
set centerX [measure center $selcenter] 
for {set ii 0} {$ii < 360} {incr ii 30} {
       set matrixX [trans center $centerX axis x -10]
       $sel move $matrixX
}

Modeller script used by Chimera

   # --- UCSF Chimera Copyright ---
   # Copyright (c) 2000 Regents of the University of California.
   # All rights reserved.  This software provided pursuant to a
   # license agreement containing restrictions on its disclosure,
   # duplication and use.  This notice must be embedded in or
   # attached to all copies, including partial copies, of the
   # software or any revisions or derivations thereof.
   # --- UCSF Chimera Copyright ---
   # This script is generated by the Modeller Dialog in Chimera, 
   # incorporating the settings from the user interface. User can revise 
   # this script and submit the modified one into the Advanced Option panel. 
   # Import the Modeller module

   from modeller import *
   from modeller.automodel import *    

   # ---------------------- namelist.dat --------------------------------
   # A "namelist.dat" file contains the file names, which was output from 
   # the Modeller dialog in Chimera based on user's selection.
   # The first line it the name of the target sequence, the remaining 
   # lines are name of the template structures

   namelist = open( 'namelist.dat', 'r' ).read().split('\n')
   tarSeq = namelist[0]
   template = tuple( [ x.strip() for x in namelist[1:] if x !=  ] )

   # ---------------------- namelist.dat --------------------------------
   # This instructs Modeller to display all log output. 

   log.verbose()

   # create a new Modeller environment
   env = environ()

   # Directory of atom/PDB/structure files. It is a relative path, inside 
   # a temp folder generated by Chimera. User can modify it and add their 
   # absolute path containing the structure files.
   env.io.atom_files_directory = ['.', './template_struc']

   # Read in HETATM records from template PDBs 
   env.io.hetatm = True

   # Read in water molecules from template PDBs 
   env.io.water = True

   # create a subclass of automodel or loopmodel, MyModel.
   # user can further modify the MyModel class, to override certain routine.
   class MyModel(loopmodel):
       def select_loop_atoms(self):
           from modeller import selection
           return selection(
               self.residue_range('374', '380'),
               self.residue_range('931', '939'),
               self.residue_range('1078', '1088'))
       def select_atoms(self):
           from modeller import selection
           return selection(
               self.residue_range('374', '380'),
               self.residue_range('931', '939'),
               self.residue_range('1078', '1088'))

           def customised_function(self): pass
           #code overrides the special_restraints method

           #def special_restraints(self, aln):

           #code overrides the special_patches method.
           # e.g. to include the addtional disulfides.
           #def special_patches(self, aln):
           
   a = MyModel(env, sequence = tarSeq,
           # alignment file with template codes and target sequence
           alnfile = 'alignment.ali',
           # name of initial PDB template
           knowns = template[0])

   # one fixed model to base loops on
   a.starting_model = 1
   a.ending_model = 1

   # 1 loop models
   loopRefinement = True
   a.loop.starting_model = 1
   a.loop.ending_model = 1
   a.loop.assess_methods=(assess.DOPE, assess.GA341, assess.normalized_dope)

   # Assesses the quality of models using
   # the DOPE (Discrete Optimized Protein Energy) method (Shen & Sali 2006)
   # and the GA341 method (Melo et al 2002, John & Sali 2003)
   a.assess_methods = (assess.GA341, assess.normalized_dope)

   # ------------------------- build all models --------------------------
   a.make()

   # ---------- Accesing output data after modeling is complete ----------

   # Get a list of all successfully built models from a.outputs
   if loopRefinement:
       ok_models = filter(lambda x: x['failure'] is None, a.loop.outputs)
   else:
       ok_models = filter(lambda x: x['failure'] is None, a.outputs)

   # Rank the models by index number
   #key = 'num'
   #ok_models.sort(lambda a,b: cmp(a[key], b[key]))
   def numSort(a, b, key="num"):
       return cmp(a[key], b[key])
   ok_models.sort(numSort)


   # Output the list of ok_models to file ok_models.dat 
   fMoutput = open('ok_models.dat', 'w')
   fMoutput.write('File name of aligned model\t GA341\t zDOPE \n')

   for m in ok_models:
       results  = '%s\t' % m['name']
       results += '%.5f\t' % m['GA341 score'][0]
       results += '%.5f\n' % m['Normalized DOPE score']
       fMoutput.write( results )

   fMoutput.close()