How to prepare a system
Contents
Fetch PDB
You can fetch pdb using
wget https://files.rcsb.org/download/6I1K.pdb
Protein-only systems
Add missing residues using Modeller
First, obtain the sequence of your PDB file:
from modeller import * code = 'prot' e = Environ() m = Model(e, file=code) aln = Alignment(e) aln.append_model(m, align_codes=code) aln.write(file=code+'.seq')
Two simple steps:
- Make a alignment.ali containing your current sequence (indicate missing residues) and target sequence (without missing residues)
- Feed to Modeller with defined parameters
A sample alignment.ali looks like this:
>P1;prot structureX:prot: 15 :A:+205 :B:::-1.00:-1.00 MVLLHASTHIFPTDFASVSRAFFNRYPNPYSPHVLSIDTISRNVDQEGNLRTTRLLKKS--------------TETWIIEVSVVNPAAS TMKTYTRNLDHTGAMKVEEYTTYA--------IADSAVKFSS----GAASKVAAWSRTKFDENVKKSRMGMAFVIAAA--------* >P1;prot_fill sequence::::::::: MVLLHKSTHIFPTDFASVSRAFFNRYPNPYSPHVLSIDTISRNVDQEGNLRTTRLLKKSGKLPTWVKPFLRGITETWIIEVSVVNPANS TMKTYTRNLDHTGIMKVEEYTTYQFDSATSSTIADSRVKFSSGFNMGIKSKVEDWSRTKFDENVKKSRMGMAFVIQKLEEARNPQF*
A sample python script for calling Modeller:
from modeller import * from modeller.automodel import * # Load the automodel class log.verbose() env = environ() # directories for input atom files env.io.atom_files_directory = ['.', '../atom_files'] a = automodel(env, alnfile = 'alignment.seq', knowns = 'prot', sequence = 'prot_fill', assess_methods=(assess.DOPE, assess.GA341)) a.starting_model= 1 a.ending_model = 1 a.make()
Or you can directly use Chimera for this purpose.
Manually rotate the residue side chains
To rotate a side chain in VMD:
set cc [atomselect top "resid 1 and name CA"] ### Select a CA as the centre of rotation set ss [atomselect top "resid 1 and not name CA C O N H"] ### Select the side chain $ss move [trans center [measure center $cc] axis x 10] ### Rotate the side chain about the x-axis by 10 degrees
Protonation states of amino acids
PlayMolecule (Using ProPka)
Improper dihedral problem in Gromacs' Amber force-field
In ffbonded.itp:
CT CV CC NA 4 180.00 4.60240 2
However pdb2gmx will make
CT CC CV NA
for HID, causing error message of "No default Proper Dih. types". Therefore you have to manually find the improper dihedral terms in the topology (usually topol_Protein.itp made by pdb2gmx) and exchange the second and the third indices. You could find the line number in the error message.
Protein-ligand complex systems
You are directed to How to parameterise ligands
Protein-membrane systems
CHARMM to Gromacs
File conversion from CHARMM to Gromacs is often encountered, because CHARMM-GUI is very convenient for creating a membrane system. There is a script psf2itp provided by CHARMM-GUI (modified by Kev 2019) to convert a psf to topology file for Gromacs with CHARMM36m ff. You may replace the toppar to change the force-field (but still have to be CHARMM). CHARMM36m ff for Gromacs could be downloaded here.
python psf2itp.py toppar.str PSFfile
Insert your protein into membrane
A simple VMD script deletes "bad lipids" and place your favourite protein inside the membrane:
set POPC "resname POPC" set all [atomselect top all] $all set beta 0 set seltext1 "$POPC and same residue as (within 0.6 of protein)" set sel1 [atomselect top $seltext1] $sel1 set beta 1 set badlipid [atomselect top "name P and beta > 0"] $badlipid get resid set sel [atomselect top "beta<1 "] $sel writepdb prot_memb_hole.pdb
Remove water inside membrane
This VMD script solvates the system and removes unnecessary water inside the membrane
package require solvate solvate 5kxi_popc.psf 5kxi_popc.pdb -o 5kxi_popc_water_TEMP -b 1.5 -minmax {{-38 -38 -39} {39 39 50}} set all [atomselect top all] $all set beta 0 set seltext "segid WT1 to WT99 and same residue as abs(z) <25" set sel [atomselect top $seltext] $sel set beta 1 set badwater [atomselect top "name OH2 and beta > 0"] set seglist [$badwater get segid] set reslist [$badwater get resid] mol delete all package require psfgen resetpsf readpsf 5kxi_popc_water_TEMP.psf coordpdb 5kxi_popc_water_TEMP.pdb foreach segid $seglist resid $reslist { delatom $segid $resid } writepdb 5kxi_popcw.pdb writepsf 5kxi_popcw.psf file delete kcsa_popc_water_TEMP.psf file delete kcsa_popc_water_TEMP.pdb
ParmEd
ParmEd is handful for converting common file formats such as psf/pdb to top/gro (Gromacs) or top/inpcrd (Amber).
import parmed as pmd gmx = pmd.load_file('topol.top', xyz='system.gro') gmx.save('system.top', format='amber') gmx.save('system.inpcrd', format='rst7') gmx.save('ionized.psf')
Please note that the Gromacs output comes WITHOUT forcefields.
RNA/DNA
Rosetta: rna_helix.py stepwise
ModeRNA: [1]
Notes
Re-number residues
foreach ii {A B C D} { mol new $ii.pdb set sel [atomselect top all] set oldNumbering [$sel get residue] set newNumbering {} foreach resid $oldNumbering { lappend newNumbering [expr {$resid + 1}] } $sel set resid $newNumbering $sel writepdb $ii.pdb }
Make rotations of coordinates
set sel [atomselect top "segname P1 and resid 1 to 13"] set selcenter [atomselect top "segname P1 and resid 14 and name N"] set centerX [list [$selcenter get x] [$selcenter get y] [$selcenter get z]] set matrixX [trans center $centerX axis y -10] $sel move $matrixX
set sel [atomselect top "protein"] set selcenter [atomselect top "protein"] set centerX [measure center $selcenter] for {set ii 0} {$ii < 360} {incr ii 30} { set matrixX [trans center $centerX axis x -10] $sel move $matrixX }
Modeller script used by Chimera
# --- UCSF Chimera Copyright --- # Copyright (c) 2000 Regents of the University of California. # All rights reserved. This software provided pursuant to a # license agreement containing restrictions on its disclosure, # duplication and use. This notice must be embedded in or # attached to all copies, including partial copies, of the # software or any revisions or derivations thereof. # --- UCSF Chimera Copyright --- # This script is generated by the Modeller Dialog in Chimera, # incorporating the settings from the user interface. User can revise # this script and submit the modified one into the Advanced Option panel. # Import the Modeller module from modeller import * from modeller.automodel import * # ---------------------- namelist.dat -------------------------------- # A "namelist.dat" file contains the file names, which was output from # the Modeller dialog in Chimera based on user's selection. # The first line it the name of the target sequence, the remaining # lines are name of the template structures namelist = open( 'namelist.dat', 'r' ).read().split('\n') tarSeq = namelist[0] template = tuple( [ x.strip() for x in namelist[1:] if x != ] ) # ---------------------- namelist.dat -------------------------------- # This instructs Modeller to display all log output. log.verbose() # create a new Modeller environment env = environ() # Directory of atom/PDB/structure files. It is a relative path, inside # a temp folder generated by Chimera. User can modify it and add their # absolute path containing the structure files. env.io.atom_files_directory = ['.', './template_struc'] # Read in HETATM records from template PDBs env.io.hetatm = True # Read in water molecules from template PDBs env.io.water = True # create a subclass of automodel or loopmodel, MyModel. # user can further modify the MyModel class, to override certain routine. class MyModel(loopmodel): def select_loop_atoms(self): from modeller import selection return selection( self.residue_range('374', '380'), self.residue_range('931', '939'), self.residue_range('1078', '1088')) def select_atoms(self): from modeller import selection return selection( self.residue_range('374', '380'), self.residue_range('931', '939'), self.residue_range('1078', '1088')) def customised_function(self): pass #code overrides the special_restraints method #def special_restraints(self, aln): #code overrides the special_patches method. # e.g. to include the addtional disulfides. #def special_patches(self, aln): a = MyModel(env, sequence = tarSeq, # alignment file with template codes and target sequence alnfile = 'alignment.ali', # name of initial PDB template knowns = template[0]) # one fixed model to base loops on a.starting_model = 1 a.ending_model = 1 # 1 loop models loopRefinement = True a.loop.starting_model = 1 a.loop.ending_model = 1 a.loop.assess_methods=(assess.DOPE, assess.GA341, assess.normalized_dope) # Assesses the quality of models using # the DOPE (Discrete Optimized Protein Energy) method (Shen & Sali 2006) # and the GA341 method (Melo et al 2002, John & Sali 2003) a.assess_methods = (assess.GA341, assess.normalized_dope) # ------------------------- build all models -------------------------- a.make() # ---------- Accesing output data after modeling is complete ---------- # Get a list of all successfully built models from a.outputs if loopRefinement: ok_models = filter(lambda x: x['failure'] is None, a.loop.outputs) else: ok_models = filter(lambda x: x['failure'] is None, a.outputs) # Rank the models by index number #key = 'num' #ok_models.sort(lambda a,b: cmp(a[key], b[key])) def numSort(a, b, key="num"): return cmp(a[key], b[key]) ok_models.sort(numSort) # Output the list of ok_models to file ok_models.dat fMoutput = open('ok_models.dat', 'w') fMoutput.write('File name of aligned model\t GA341\t zDOPE \n') for m in ok_models: results = '%s\t' % m['name'] results += '%.5f\t' % m['GA341 score'][0] results += '%.5f\n' % m['Normalized DOPE score'] fMoutput.write( results ) fMoutput.close()