Pyridoxal Phosphatase Methods
Evolution
Homology Search
A Basic Local Alignment Search Tool (BLAST) search was carried out using a fixed non-redundant database.
Multiple Sequence Alignment
A selected range of sequences from the BLAST search was aligned using ClustalX. Sequences were selected by removing redundant proteins from similar organisms and by filtering on the expectation value (E-value) of each match.
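The selection step described above can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the E-value cut-off of 1e-5 and the rule of keeping one hit per organism are assumptions made for the example, not the criteria actually applied in this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str  # hypothetical identifier of the matched sequence
    organism: str   # source organism reported for the hit
    evalue: float   # BLAST expectation value of the match


def select_hits(hits, max_evalue=1e-5):
    """Keep hits at or below max_evalue, retaining the best hit per organism."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # match too weak to include in the alignment
        current = best.get(hit.organism)
        if current is None or hit.evalue < current.evalue:
            best[hit.organism] = hit
    # Strongest matches (lowest E-value) first
    return sorted(best.values(), key=lambda h: h.evalue)
```

The selected sequences would then be written to a FASTA file and passed to ClustalX.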
Phylogenetic Tree
The PHYLIP collection of programs was then used to generate a phylogenetic tree. A distance matrix was calculated using Protdist, and Neighbor was then used to build the tree. Confidence values for the tree were calculated by bootstrapping the alignment: the same programs were run on multiple resampled data sets, and the resulting trees were compiled with the Consense program.
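To make the bootstrapping idea concrete, the sketch below resamples alignment columns with replacement and computes a distance matrix for a replicate. It is a simplified stand-in, not a reimplementation of PHYLIP: the distance used here is a plain p-distance (fraction of differing positions), whereas Protdist applies an amino-acid substitution model, and the tree-building and consensus steps are not reproduced. All function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def p_distance(seq_a, seq_b):
    """Fraction of differing positions, ignoring columns that contain a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name_a, name_b)."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}
```

Each replicate produced by `bootstrap_alignment` would be given its own distance matrix and tree; the proportion of replicate trees that contain a given grouping is the confidence value for that grouping.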
Structure
Protein Data Bank (PDB)
The structure of 2cfsA (PDB entry 2cfs, chain A) was obtained from the PDB. A search of the database using the chain-specific identifier "2cfsA" did not yield any results, whereas a search using the PDB ID "2cfs" was successful. The crystal structure of Pyridoxal Phosphatase was obtained from the "Images and Visualization" section on the right side of the website.
http://www.rcsb.org/pdb/explore/explore.do?structureId=2cfs
The PDB file of Pyridoxal Phosphatase was saved for further use.
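The failed search for "2cfsA" comes down to the identifier carrying a chain letter after the four-character PDB ID. A minimal sketch of separating the two is given below. The helper names are hypothetical, and the download URL pattern is an assumption about the RCSB file server rather than something taken from this study.

```python
def split_chain_id(identifier):
    """Split an identifier such as '2cfsA' into (entry_id, chain)."""
    entry_id, chain = identifier[:4], identifier[4:]
    # Classic PDB IDs are four alphanumeric characters starting with a digit
    if len(entry_id) != 4 or not entry_id[0].isdigit() or not entry_id.isalnum():
        raise ValueError(f"not a 4-character PDB entry ID: {identifier!r}")
    return entry_id.lower(), chain or None


def pdb_file_url(entry_id):
    """Build a download URL for the entry (assumed RCSB URL pattern)."""
    return f"https://files.rcsb.org/download/{entry_id.upper()}.pdb"
```

For example, `split_chain_id("2cfsA")` gives the entry ID `2cfs`, which is the form the PDB search accepts, and the chain `A`.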
DALI
DALI (the distance-matrix alignment method) breaks down the input structures into hexapeptide fragments and calculates a distance matrix by evaluating the contact patterns between successive fragments. When two proteins' distance matrices share the same or similar features in approximately the same positions, they can be said to have similar folds with similar-length loops connecting their secondary structure elements.
A search was carried out using the PDB file of Pyridoxal Phosphatase obtained in the previous step. The results were returned via e-mail. Because of its similarity to 2cfsA, 2oycA, which was at the top of the list of hits generated by the DALI server, was structurally superimposed on 2cfsA using PyMOL.
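The distance-matrix representation can be illustrated with a short sketch. It covers only the first step (building the intramolecular distance matrix and cutting out fragment-by-fragment blocks), not DALI's comparison or scoring. The use of C-alpha coordinates and of overlapping six-residue windows are assumptions made for the example, and the function names are hypothetical.

```python
import math


def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def hexapeptide_windows(n_residues, size=6):
    """(start, end) index pairs of overlapping fragments of `size` residues."""
    return [(i, i + size) for i in range(n_residues - size + 1)]


def submatrix(matrix, rows, cols):
    """Contact pattern between two fragments: one block of the distance matrix."""
    return [row[cols[0]:cols[1]] for row in matrix[rows[0]:rows[1]]]
```

Two structures with a similar fold would be expected to show similar blocks at corresponding positions in their matrices.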
PyMOL
PyMOL is a Python-enhanced molecular graphics program that facilitates the visualization of proteins and nucleic acids in a number of representations. Apart from visualization, PyMOL allows protein structures to be altered and modified, and molecular distances to be calculated. PyMOL can be used in two ways: (1) through the Graphical User Interface (GUI), and/or (2) by entering commands manually at the PyMOL command line. Because of its convenience and user-friendliness, the GUI approach was adopted for most of this study.
http://www.lifesci.sussex.ac.uk/research/bioinformatics/Y2_bioinformatics/p_visual.php
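Although the GUI was used for most of this study, the superposition of 2oycA on 2cfsA could equally be carried out from the PyMOL command line. The sketch below is a small Python helper (hypothetical, not part of PyMOL) that writes out the commands one might type. The choice of `align` rather than another superposition command, and of the cartoon representation, are assumptions.

```python
def superposition_script(target, mobile, chain="A"):
    """Return PyMOL command-line input that superimposes `mobile` on `target`."""
    return "\n".join([
        f"fetch {target}",   # download and load the reference structure
        f"fetch {mobile}",   # download and load the structure to be moved
        "hide everything",
        "show cartoon",
        # align takes the mobile selection first, then the target selection
        f"align {mobile} and chain {chain}, {target} and chain {chain}",
    ])


if __name__ == "__main__":
    print(superposition_script("2cfs", "2oyc"))
```

The printed lines can be pasted into the PyMOL command line or saved to a script file.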
PDBsum
From the EBI website (http://www.ebi.ac.uk/), the PDBsum structural database was accessed to obtain the secondary structures of both 2cfsA and 2oycA, together with related information (e.g. topology diagrams and cleft analysis). PDBsum provides pictorial representations of the key structural information stored for each entry.
Cleft Analysis via PyMOL
Based on the information obtained in the previous step, PyMOL was used to provide a three-dimensional view of the potential active (catalytic) sites in both 2cfsA and 2oycA.
PROFUNC
As with PDBsum, PROFUNC was accessed via the EBI website. The following information was obtained:
- Related Protein Sequences in the PDB (SAS)
- Matches to existing PDB Structures
- Secondary Structure Matching (SSM), which searches for structures with the same or a similar overall fold as the target
- Nest Analysis
- Summary of Protein Function
As with DALI, all results were returned via e-mail because of the large number of jobs processed by the server.
Identification of potential active sites
Based on the residues reported by the Nest Analysis (PROFUNC), PyMOL was then used to locate the potential active site(s) from the positions of those residues in the structure.
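Locating reported residues in PyMOL amounts to building a selection from their residue numbers. A minimal sketch of a helper that assembles such a selection string is shown below. The helper is hypothetical, and any residue numbers passed to it in examples are placeholders, not results from the Nest Analysis.

```python
def residue_selection(obj, chain, residue_numbers):
    """Build a PyMOL selection string for the given residues of one chain."""
    # PyMOL joins several residue numbers with '+', e.g. "resi 10+12"
    resi = "+".join(str(n) for n in sorted(set(residue_numbers)))
    return f"{obj} and chain {chain} and resi {resi}"
```

The resulting string can be used with a command such as `show sticks, <selection>` to highlight the residues on the structure.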
Function
Function by homology
BLAST and FASTA database searches were used to compare the functions of homologous proteins. Pfam was then used to determine the family of the protein of interest.
Function by structure
InterPro and UniProt were used to identify sequence motifs from several databases, e.g. PROSITE, PRINTS, Pfam-A, TIGRFAMs, PROFILES and ProDom. The motifs and domains of 2cfsA were then analysed against other proteins with similar structure.
Function by gene location
PROFUNC was used.
Protein-protein interactions
The STRING database was used to analyse the protein-protein interactions of 2cfsA.
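STRING can also be queried programmatically rather than through the website. The sketch below only builds a request URL and does not contact the server. The URL layout follows the general pattern of the STRING REST interface as an assumption, and the identifier and species values in the usage note are illustrative placeholders rather than the query used in this study.

```python
from urllib.parse import urlencode


def string_network_url(identifier, species, fmt="tsv"):
    """Build a STRING network query URL (assumed REST URL pattern)."""
    query = urlencode({"identifiers": identifier, "species": species})
    return f"https://string-db.org/api/{fmt}/network?{query}"
```

For example, `string_network_url("PDXP", 9606)` would request the interaction network for a protein named PDXP in the organism with NCBI taxonomy ID 9606. Both values are assumptions for illustration.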