Pyridoxal Phosphatase Discussion: Difference between revisions

From MDWiki
 
(81 intermediate revisions by 3 users not shown)
Evolution
==Evolution==
----


===Conserved Residues===
It was found that the related proteins had very few conserved regions. The only region that had some conservation was near the end of the proteins, from 810 to 890 nucleotides in the multiple sequence alignment (MSA).


[[Image:Pyridoxal_phosphatase_conserved_image.jpg|left|thumb|800px|Spatial alignment of conserved residues from the MSA.]]  The picture on the left shows the spatial alignment of the conserved residues from the multiple sequence alignment. The red highlighted residues are those that are conserved throughout the alignment while the yellow residues are those which have conservative substitutions in some of the organisms. The green residues denote those that had semi-conservative substitutions.


Structure
It could be deduced that the conserved residues form the active site of the protein as they are in proximity to one another. This is reinforced by the fact that the semi-conserved substitutions are also close to the conserved residues and appear to be linked. The conserved residues are also adjacent to a magnesium ion, which is a cofactor, as described in the structure portion of the paper. The conservative substitutions on the outskirts of the active site could possibly exist to maintain the shape and conformation of the active site and thus ensure the activity of the site remains.
----


'''PDB'''<BR>
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
 
===Phylogeny===
The phylogenetic tree generated for organisms that have proteins related to 2cfsA (pyridoxal phosphatase) is a cladogram that can be split into four distinct kingdoms: Prokaryota, Fungi, Plantae and Animalia.
 
[[Image:Pyridoxal_phosphatase_Radial_Cladogram.jpg|center|thumb|800px]]
 
It is interesting to note that, in the cladogram, the organisms most closely related to the human pyridoxal phosphatase are the rodents ''Rattus norvegicus'' (rat) and ''Mus musculus'' (mouse), the short-tailed opossum (''Monodelphis domestica'', a marsupial) and the cow (''Bos taurus''), rather than the chimpanzee (''Pan troglodytes'') and the rhesus macaque (''Macaca mulatta''), both of which lie much further off, close to the dog (''Canis familiaris'').
 
Fewer prokaryotes than animals or fungi have related proteins, and the majority of those prokaryotes are archaea rather than bacteria. This could possibly indicate that archaea use vitamin B6 while bacteria do not, hence the need for pyridoxal phosphatase-related proteins.
 
The organisms with related proteins in the kingdom Plantae are few but are less spread out than the prokaryotes, which suggests that there is less diversity among the pyridoxal phosphatase-related proteins in plants. Curiously, there appear to be three different proteins related to pyridoxal phosphatase in ''Chlamydomonas reinhardtii'', an alga, as all three are on separate branches of the cladogram. A similar pattern occurs for the two organisms from the genus ''Ostreococcus'', which, despite belonging to the same genus, are located on different branches.
 
The Kingdom Fungi organisms are diverse but are closely related, as evidenced by their proximity to each other and the short branch lengths.
 
While bootstrapping the data during the construction of the phylogenetic tree, several branches were found in the tree that were not in the consensus tree.
 
[[Image:Pyridoxal phosphatase Rectangular Tree discussion1.jpg|center|thumb|500px]]
 
The branches with low bootstrap values were generally those from the kingdom Prokaryota, along with several branches in the kingdom Animalia.
 
The weakest confidence values were found at the root of the cladogram with differentiation becoming more certain at the tips of the branches.
<br>
 
==Structure==
 
===PDB===
Based on the information obtained from '''PDB''', 2cfsA was identified to have the following features: <BR>
* Isolated from ''Homo Sapiens'', and is expressed in ''Escherichia Coli''. <BR>
* Isolated from ''Homo sapiens'' and expressed in ''Escherichia coli''. <BR>
* Structurally similar to the Pyridoxal Phosphate Phosphatase protein. <BR>
* Consists of a single type of chain (A), and (2) Magnesium components. <BR>
* Consists of a single type of chain (A)<BR>
* 2 Magnesium ions <BR>
* Resolution of 2.4 angstroms. At this resolution, relatively few side chains are expected to be in the wrong rotamer. Structures of similar resolution were also noted to: (1) have many small errors that can be detected, (2) have the correct fold, (3) contain few errors in the surface loops and (4) show visible water molecules and small ligands.<BR>


'''DALI''' <BR>
===DALI===
A search on the protein in the '''Dali''' database yielded 176 hits, of which the top 11 were identified to be of potential significance on account of the information provided in the summary block of the results. The summary block provides the following information: <BR>
* ''Z score'', or the statistical significance of the similarity between the hit protein and the protein-of-interest.<BR>
* ''Z score'', or the statistical significance of the similarity between the hit protein and the protein-of-interest. The program optimises a weighted sum of similarities of intramolecular distances.<BR>
* ''Root Mean Square Distance (RMSD)'', which indicates the degree of divergence between the hit protein and the protein-of-interest. <BR>
* ''Root Mean Square Distance (RMSD)'', which indicates the degree of divergence between the hit protein and the protein-of-interest. The lower the value, the more similar it is to the protein-of-interest. <BR>
* ''lali'', the total number of shared residues between the hit protein and the protein-of-interest. <BR>
* ''lali'', the number of structurally equivalent residues. <BR>
* ''nres'', or the total number of residues in the hit protein. <BR>
* ''nres'', or the total number of amino acids in the hit protein. <BR>
* ''%id''. As the term implies, %id refers to the percentage of sequence identity over structurally equivalent positions. <BR>


Of the 11 hit proteins, 2oycA was predicted to be the most structurally similar to 2cfsA. This was by virtue of the following properties: <BR>
* Among the list of hit proteins generated, it had the highest Z value of 47.6 <BR>
* rmsd value of 0.4, the lowest among the hit proteins. The lower the rmsd value, the more similar it is to the protein-of-interest. <BR>
* RMSD value of 0.4, the lowest among the hit proteins. <BR>
* lali value of 288. 2cfsA has a total of 298 amino acid residues, which means that all but 10 of them are structurally equivalent to residues in 2oycA. <BR>
* nres value of 292. This did not bear much significance on the decision-making process. However, a conclusion drawn was that it was similar to 2cfsA in terms of length. <BR>
* %id score of 99%, which simply means that based on the information currently stored in the DALI database, 2oycA and 2cfsA were highly similar.<BR>


[[Image:Dali.jpg|left|thumb|550px|The top 11 hits generated by the DALI database. These 11 proteins were also deemed to be of significance to this study]][[Image:Dali cutoff.jpg|left|thumb|550px|The red, boxed section illustrates the cut-off point. Protein number 12 onwards were rejected based on their lali values.]]
[[Image:Dali.jpg|left|thumb|550px|The top 11 hits generated by the DALI database. These 11 proteins were also deemed to be of significance to this study]][[Image:Dali cutoff.jpg|left|thumb|550px|The red, boxed section illustrates the cut-off point. Protein number 12 onwards were rejected as their ''lali'' values were less than half of 2cfsA's (298)]]
<BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>


Given that 2cfsA has 298 amino acid residues, the twelfth hit onwards (i.e. 2hszA) were rejected as more than half of their amino acid residues did not indicate similarity to 2cfsA. <BR>
Given that 2cfsA has 298 amino acid residues, the twelfth hit (2hszA) onwards was rejected because the ''lali'' values of these hits were less than half of that number (i.e. poor structural equivalence). <BR>


To further prove that 2cfsA and 2oycA were structurally similar, their three dimensional structures were superimposed using PyMOL. As expected, the 2oycA bore a close resemblance to 2cfsA, but the differing regions have yet to be identified. <BR>
To further confirm that 2cfsA and 2oycA were structurally similar, their three-dimensional structures were superimposed using PyMOL. As expected, 2oycA bore a close resemblance to 2cfsA. <BR>


'''PDBsum''' <BR>
Based on the results obtained in the first (DALI) run, a second run was conducted towards the end of the study. This was primarily due to the concern that the first run identified 2oycA as the query protein instead of 2cfsA. While this would not have (negative) implications on the already-generated results, it was done as a measure of assurance for the team. It was observed that there was a huge deviation between the results obtained in the first and second runs, namely:<BR>
The secondary structures of 2cfsA and 2oycA were noted to be highly similar, the only difference being that 2oycA did not have any disulphide bonds, nor indications of any active site(s), as opposed to 2cfsA.<BR>


The topology diagrams, however, indicated complete homology between the said proteins.<BR>
* 2cfsA was not among the list of hit proteins yielded in the first run, whereas based on the results obtained in the second run, it was identified as the query protein. This information was significant, as before the second run, it could only be hypothesized that 2cfsA belonged to the same protein family as 2oycA. With the results yielded in the second run, it can be confirmed that 2cfsA and 2oycA were indeed from the same protein family. In fact, the second run yielded a handful of proteins which were more closely related to 2cfsA than 2oycA.


A cleft analysis for both proteins was conducted, and based on the three-dimensional modelling using PyMOL, it was concluded that both proteins may have similar active sites. This is significant as structural information is usually crucial for the functional prediction of the protein-of-interest. Since the information provided by the secondary structures of both proteins are highly similar, there is a great possibility that 2cfsA and 2oycA are functionally similar. <BR>
The vast difference between the results obtained in the two runs highlighted the disadvantages of DALI. While it is one of the more reliable protein fold comparison programs available, continuous changes in the field have resulted in inconsistencies (Novotny et al., 2004). This is well illustrated by the results obtained in this study.


'''PROFUNC''' <BR>
Other structural studies indicated the use of the Combinatorial Extension (CE) database. In this study, the CE database was not utilized due to software incompatibility. Whenever a query PDB ID was run against the database, an error message would appear. This could be because CE only accepts structures uploaded from a Mac or a Unix workstation (Novotny et al., 2004), which were not utilized during this study.
Based on the results generated by DALI, it was concluded that 2cfsA and 2oycA were structurally similar and this could have been a crucial point in the functional determination of both  proteins. A search on Profunc, however, seemed to suggest that 2cftA is a much closer match to 2cfsA than 2oycA. In fact, 2cftA can be said to be identical, in more ways than one, to 2cfsA. <BR>
 
===PDBsum, Pfam===
PDBsum is a database which pictorially illustrates information on each macromolecule deposited in the PDB. Some of the features provided by PDBsum include:
* Images of the query structure
* Annotated plots of each protein chain's secondary structure
* PROMOTIF-generated structural analyses, illustrated in the form of topology diagrams
* Schematics of protein interactions (i.e. with ligand, DNA), represented in the form of LIGPLOT diagrams.
''(Laskowski et al., 2001)''<BR>
 
 
The protein chains are represented in the form of "wiring diagrams", a schematic of the protein chain's secondary structure motifs, primary sequence, structural domains and active sites (Laskowski et al., 2001). The secondary structures of 2cfsA and 2oycA were noted to be highly similar, the difference being that 2oycA did not have any disulphide bonds, nor indications of any active site(s), as opposed to 2cfsA.<BR>
 
The topology diagrams of both proteins (2cfsA and 2oycA) were the next bits of information obtained. These topology diagrams are PROMOTIF-generated schematics of the secondary structure motifs, and do not provide information as detailed as that of the "wiring diagrams" described in the previous paragraph (Laskowski et al., 2001).
In this study, the topology diagrams indicated complete homology between the said proteins.<BR>
 
A cleft analysis for both proteins was conducted, and visualization was carried out (via PyMOL) with the aim of observing similarities between the potential catalytic sites of 2cfsA and 2oycA. Clefts in protein surfaces are of relative significance due to their relevance to binding sites. It has been hypothesized that the active site usually lies in the largest protein clefts/cavities (Laskowski et al. 1996). With the similarities observed in both proteins, it was deduced that 2cfsA and 2oycA's catalytic sites were located in the same region.
 
The results obtained are significant, as structural information is usually crucial for predicting the function of the protein-of-interest. Since the secondary structures of both proteins are highly similar, there is a strong possibility that 2cfsA and 2oycA are functionally similar. <BR>
 
A link to the Pfam site pertaining to 2cfsA identified it as a member of the haloacid dehalogenase-like hydrolase family (http://pfam.sanger.ac.uk/family?acc=PF00702). Members of this protein family (which includes phosphatases) belong to the haloacid dehalogenase (HAD) superfamily. Almo et al., 2007 mention that members of the HAD superfamily catalyze phosphoryl transfer reactions in small molecules. The superfamily encompasses a large number of magnesium-dependent phosphohydrolases, which co-ordinate a catalytically active magnesium ion and in which the first aspartic acid serves as the nucleophile and phosphoryl acceptor. Further evidence of the importance of the magnesium ion lies in the fact that a near-total loss of activity is observed when it is substituted with calcium. This also ties in with the work done using the ProFunc database, which predicts that 2cfsA is involved in catalytic processes. An assumption, therefore, would be that the magnesium ions are involved in the catalytic functions of 2cfsA.
 
===PROFUNC===
The ProFunc database was utilized due to its ability to predict the likely functions of a protein with a known 3D structure. It makes use of both existing and new methods to analyse a query protein's sequence and structure, with the aim of identifying functional motifs which could indicate relationships to proteins whose functions are known (Laskowski et al., 2005).
 
Based on the results generated by DALI, it was concluded that 2cfsA and 2oycA were structurally similar and this could have been a crucial point in the functional determination of both  proteins. A search on ProFunc, however, seemed to suggest that 2cftA was a much closer match to 2cfsA than 2oycA. In fact, based on the local alignment scoring system, 2cftA is identical to 2cfsA in terms of sequence and structure, according to the information stored in the PDB database. However, this does not rule out the potential significance of 2oycA, as it was the next closest protein hit after 2cftA.<BR>


It is important to note that 2cfsA was found here to have 293 amino acid residues, rather than the 298 listed by NCBI (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=134104092). One possible reason for this discrepancy is that database entries are revised over time, so different resources may not agree at any given moment. <BR>


Based on the information provided by PDB, 2cfsA and 2cftA shared identical sequences and structures. However, this does not rule out 2oycA, as it was still the next closest protein hit to 2cfsA and 2cftA. <BR>
In terms of secondary structure matching (SSM), however, 2cftA was not identical to 2cfsA, as reflected by their differing Q scores (1.000 for 2cfsA and 0.981 for 2cftA). The Q score takes into account the number of aligned residues, the RMSD and the sizes of the proteins; a high Q score implies high similarity between the hit protein and the protein-of-interest. More interesting, however, was the observation that 2oycA was a distant sixth on the list, four tiers below 2cftA. Such was the homology between the hit proteins and 2cfsA that, even at sixth, the deviation between 2cfsA and 2oycA was minimal. <BR>
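The way the Q score combines these three quantities can be sketched as follows. This is a minimal illustration, assuming the commonly cited SSM definition Q = N_align² / ((1 + (RMSD/R0)²) · N1 · N2) with R0 = 3 Å; the input values in the usage note are illustrative and are not the values from this study.

```python
def q_score(n_align, rmsd, n_res_1, n_res_2, r0=3.0):
    """SSM-style Q score (assumed formula, see lead-in).

    n_align  -- number of aligned residues
    rmsd     -- RMSD of the aligned residues, in angstroms
    n_res_1  -- number of residues in the first structure
    n_res_2  -- number of residues in the second structure
    r0       -- empirical scaling constant in angstroms (assumed to be 3.0)
    """
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res_1 * n_res_2)


# A structure compared with itself: every residue aligned, RMSD of zero.
self_match = q_score(293, 0.0, 293, 293)

# An illustrative near-identical hit (made-up numbers, not from the study).
near_match = q_score(288, 0.4, 293, 292)
```

A self-comparison gives the maximum score of 1, and any loss of aligned residues or increase in RMSD pulls the score below 1, which is consistent with the ranking described above.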
 
To determine the potential active sites of 2cfsA, the nest analysis method was utilized. Based on an article by Pal et al., 2002, the principle of this method revolves around the possibility that anion (negatively charged) and cation (positively charged) binding sites in proteins are made up of three amino acids, of which two exhibit "enantiomeric" main chain conformations. This simply means that the main chain torsion angles of the two adjacent amino acids are inverted about the centre of the Ramachandran plot. This results in the formation of "nests", which are defined as concave depressions that ultimately serve as binding sites. The nest analysis of 2oycA gave results similar to those for 2cfsA, so there is a good chance that 2cfsA and 2oycA have similar catalytic sites. Since 2oycA is the PDB identifier for chronophin, and a published article covers chronophin, that article was used to identify potential structure-function relationships.
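The "inverted about the centre of the Ramachandran plot" criterion can be sketched in a few lines. This is only an illustration of the geometric idea, not the procedure ProFunc uses: the function names, the 40-degree tolerance and the torsion angles in the example are all assumptions made for the sketch.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def is_enantiomeric_pair(res1, res2, tolerance=40.0):
    """True if (phi, psi) of res2 is roughly (-phi, -psi) of res1.

    The tolerance is a hypothetical value chosen for illustration.
    """
    phi1, psi1 = res1
    phi2, psi2 = res2
    return (angle_diff(phi2, -phi1) <= tolerance
            and angle_diff(psi2, -psi1) <= tolerance)


def find_candidate_nests(torsions, tolerance=40.0):
    """Indices i where residues i and i+1 have enantiomeric conformations."""
    return [i for i in range(len(torsions) - 1)
            if is_enantiomeric_pair(torsions[i], torsions[i + 1], tolerance)]


# Made-up (phi, psi) angles for four consecutive residues.
example_torsions = [(-60, -45), (-90, 0), (85, 5), (-120, 130)]
candidates = find_candidate_nests(example_torsions)
```

In this toy input only the second and third residues mirror each other through the origin of the plot, so only that pair is reported as a candidate.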
 
===Visualizing the potential catalytic site of 2cfsA===
Finally, a LIGPLOT of the interactions involving the PLP ligand in 2cftA was obtained. The LIGPLOT gave a good indication of the location of 2cftA's active site, highlighting the hydrogen bonds and non-bonded interactions between the ligand (in 2cfsA's case, the magnesium ions) and the protein residues with which it interacts (Laskowski et al., 2001). It was also noticed that the Mg 1296(A) ion of 2cfsA was located in exactly the same position as 2cftA's calcium ion. This is significant, as substitution of the magnesium ion with the catalytically inert calcium ion results in a loss of activity (Almo et al., 2007). Based on this, it was deduced that 2cfsA's active site is in the region surrounding Mg 1296(A). The LIGPLOT of the interactions involving the Mg 1297(A) ion of 2cfsA was also obtained, and this information, along with that provided by the LIGPLOT involving Mg 1296(A), was used to generate a three-dimensional view of the catalytic site of 2cfsA in PyMOL. The conserved regions identified earlier from the multiple sequence alignment (see Evolution) were not too far off from the residues that constitute the catalytic site according to Almo et al., 2007. PyMOL-generated visualizations of the possible catalytic sites using both sets of information illustrated this.
 
==Function==
 
===Sequence and Structure===
 
Search results from the Pfam, FASTA and BLAST databases all indicate that 2cfs_A belongs to the HAD-like hydrolase superfamily.
None of the search results indicates that 2cfs_A has other functions. Likewise, for structure comparison using InterPro and ProFunc, the top search results yield only members of the HAD-like hydrolase superfamily. Using sequence homology and structural similarity to predict function therefore proved uninformative in this case: the searches could only establish that 2cfs_A functions as a HAD-like hydrolase and gave no information beyond that.
 
===Functional Expression in Tissue===


In terms of secondary structure matching (SSM), however, 2cftA was NOT identical to 2cfsA, even though it was homologous enough to be ranked as the next highest protein hit. 2oycA was a distant sixth on the list, four tiers below 2cftA. Such was the homology of all the hit proteins, however, that even at sixth, the deviation between 2cfsA and 2oycA was minimal. <BR>
The function of pyridoxal phosphatase has been well established, as its substrate, pyridoxal 5'-phosphate (PLP), is the active form of vitamin B6, a very important vitamin involved in much of cellular metabolism. However, the level of functional expression is markedly variable from tissue to tissue. In adults, PLP phosphatase was most highly expressed in all regions of the central nervous system except the spinal cord. High levels were also found in the liver and testis. In the fetus, expression levels of the PLP phosphatase transcript showed a rather even distribution in all organs except the brain. It is interesting to note that levels of pyridoxal phosphatase in the brain are substantially higher in both adults and the fetus, suggesting that pyridoxal phosphatase may have a specific functional role there.


To determine the potential active sites of 2cfsA, the nest analysis method was utilized. Based on an article by Pal et. al, 2002, the principle of this method revolves around the possibility that anion (negatively-charged) and cation (positively-charged) binding sites in proteins are made up of three amino acids, of which two exhibit "enantiomeric" main chain conformations. This simply means that the main chain torsion angles of the two adjacent amino acids are inverted about the centre of the Ramachandran plot. This results in the formation of "nests", which are defined as concave depressions which ultimately serve as binding sites.
===Protein-Protein Interaction===


A search on 2oycA was then done on PROFUNC, the following were confirmed: <BR>
Using STRING, functional partners for the protein of interest can be predicted, based on the proximity of gene locations, interactions discovered in experiments, text mining and so on. This is very useful, as each prediction is backed by evidence.
* 2oycA is structurally similar to 2cfsA
* High probability of being functionally related due to common active sites


Function
=====back to [[Pyridoxal Phosphatase]] main page=====
----

Latest revision as of 01:23, 10 June 2008

Evolution

Conserved Residues

It was found that the related proteins had very few conserved regions. The only region that had some conservation was near the end of the proteins, from 810 to 890 nucleotides in the multiple sequence alignment (MSA).

Spatial alignment of conserved residues from the MSA.

The picture on the left shows the spatial alignment of the conserved residues from the multiple sequence alignment. The red highlighted residues are those that are conserved throughout the alignment while the yellow residues are those which have conservative substitutions in some of the organisms. The green residues denote those that had semi-conservative substitutions.
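The three colour classes above correspond to the kind of per-column labelling that alignment programs produce. The sketch below shows one way such a labelling could be computed, assuming the ClustalW "strong" and "weak" residue groups; the source does not state which alignment program or grouping was used, and the toy sequences are invented for illustration.

```python
# ClustalW-style residue groups (assumed scoring scheme, see lead-in).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def classify_column(column):
    """Label one alignment column by its degree of conservation."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return "variable"            # gaps are treated as unconserved here
    if len(residues) == 1:
        return "conserved"           # red in the figure
    if any(residues <= set(group) for group in STRONG):
        return "conservative"        # yellow in the figure
    if any(residues <= set(group) for group in WEAK):
        return "semi-conservative"   # green in the figure
    return "variable"


def classify_alignment(sequences):
    """Classify every column of a list of equal-length aligned sequences."""
    return [classify_column(col) for col in zip(*sequences)]


# Toy alignment (hypothetical sequences, not taken from the MSA).
toy = ["DLDGVL", "DIDGTL", "DLEGAM"]
labels = classify_alignment(toy)
```

For the toy input, the first and fourth columns are fully conserved, the fifth (V/T/A) falls only within a weak group, and the remaining columns fall within strong groups.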

It could be deduced that the conserved residues form the active site of the protein as they are in proximity to one another. This is reinforced by the fact that the semi-conserved substitutions are also close to the conserved residues and appear to be linked. The conserved residues are also adjacent to a magnesium ion, which is a cofactor, as described in the structure portion of the paper. The conservative substitutions on the outskirts of the active site could possibly exist to maintain the shape and conformation of the active site and thus ensure the activity of the site remains.
















Phylogeny

The phylogenetic tree generated for organisms that have proteins related to 2cfsA (pyridoxal phosphatase) is a cladogram that can be split into four distinct kingdoms: Prokaryota, Fungi, Plantae and Animalia.

[Figure: radial cladogram of pyridoxal phosphatase-related proteins]

It is interesting to note that, in the cladogram, the organisms most closely related to the human pyridoxal phosphatase are the rodents Rattus norvegicus (rat) and Mus musculus (mouse), the short-tailed opossum (Monodelphis domestica, a marsupial) and the cow (Bos taurus), rather than the chimpanzee (Pan troglodytes) and the rhesus macaque (Macaca mulatta), both of which lie much further off, close to the dog (Canis familiaris).

Fewer prokaryotes than animals or fungi have related proteins, and the majority of those prokaryotes are archaea rather than bacteria. This could possibly indicate that archaea use vitamin B6 while bacteria do not, hence the need for pyridoxal phosphatase-related proteins.

The organisms with related proteins in the kingdom Plantae are few but are less spread out than the prokaryotes, which suggests that there is less diversity among the pyridoxal phosphatase-related proteins in plants. Curiously, there appear to be three different proteins related to pyridoxal phosphatase in Chlamydomonas reinhardtii, an alga, as all three are on separate branches of the cladogram. A similar pattern occurs for the two organisms from the genus Ostreococcus, which, despite belonging to the same genus, are located on different branches.

The Kingdom Fungi organisms are diverse but are closely related, as evidenced by their proximity to each other and the short branch lengths.

While bootstrapping the data during the construction of the phylogenetic tree, several branches were found in the tree that were not in the consensus tree.

[Figure: rectangular phylogenetic tree of pyridoxal phosphatase-related proteins]

The branches with low bootstrap values were generally those from the kingdom Prokaryota, along with several branches in the kingdom Animalia.

The weakest confidence values were found at the root of the cladogram with differentiation becoming more certain at the tips of the branches.
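The relationship between bootstrap values and the consensus tree can be sketched as follows. This is a simplified illustration, assuming each bootstrap replicate tree is reduced to the set of clades it contains and that the consensus is built by majority rule; the taxon labels and replicate trees are invented and do not come from this study.

```python
def clade_support(clade, replicate_trees):
    """Percentage of bootstrap replicates in which the clade appears."""
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


def majority_rule_clades(replicate_trees, threshold=50.0):
    """Clades supported by more than `threshold` percent of replicates."""
    all_clades = set().union(*replicate_trees)
    return {c for c in all_clades
            if clade_support(c, replicate_trees) > threshold}


# Four hypothetical replicate trees, each given as a set of clades.
everyone = frozenset({"human", "mouse", "rat"})
replicates = [
    {frozenset({"human", "mouse"}), everyone},
    {frozenset({"mouse", "rat"}), everyone},
    {frozenset({"mouse", "rat"}), everyone},
    {frozenset({"human", "rat"}), everyone},
]

mouse_rat_support = clade_support({"mouse", "rat"}, replicates)
consensus = majority_rule_clades(replicates)
```

In this toy case the mouse/rat grouping appears in only half of the replicates, so it receives a low support value and is absent from the majority-rule consensus, which is the kind of branch that shows up in a single tree but not in the consensus tree.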

Structure

PDB

Based on the information obtained from PDB, 2cfsA was identified to have the following features:

  • Isolated from Homo sapiens and expressed in Escherichia coli.
  • Structurally similar to the Pyridoxal Phosphate Phosphatase protein.
  • Consists of a single type of chain (A)
  • 2 Magnesium ions
  • Resolution of 2.4 angstroms. At this resolution, relatively few side chains are expected to be in the wrong rotamer. Structures of similar resolution were also noted to: (1) have many small errors that can be detected, (2) have the correct fold, (3) contain few errors in the surface loops and (4) show visible water molecules and small ligands.

DALI

A search on the protein in the Dali database yielded 176 hits, of which the top 11 were identified to be of potential significance on account of the information provided in the summary block of the results. The summary block provides the following information:

  • Z score, or the statistical significance of the similarity between the hit protein and the protein-of-interest. The program optimises a weighted sum of similarities of intramolecular distances.
  • Root Mean Square Distance (RMSD), which indicates the degree of divergence between the hit protein and the protein-of-interest. The lower the value, the more similar it is to the protein-of-interest.
  • lali, the number of structurally equivalent residues.
  • nres, or the total number of amino acids in the hit protein.
  • %id. As the term implies, %id refers to the percentage of sequence identity over structurally equivalent positions.
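To make the RMSD entry in the summary block concrete, the sketch below computes the root-mean-square distance between paired atoms. It assumes the two structures have already been superimposed and that the coordinate lists are paired in the same order; DALI determines the residue pairing and superposition itself, which this sketch does not attempt. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in angstroms between two equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Three made-up pairs of equivalent atoms.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0), (2.0, 0.0, 0.0)]
value = rmsd(structure_a, structure_b)
```

Identical coordinate sets give an RMSD of zero, and the value grows as equivalent atoms drift apart, which is why a lower RMSD indicates a closer structural match.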

Of the 11 hit proteins, 2oycA was predicted to be the most structurally similar to 2cfsA. This was by virtue of the following properties:

  • Among the list of hit proteins generated, it had the highest Z value of 47.6
  • RMSD value of 0.4, the lowest among the hit proteins.
  • lali value of 288. 2cfsA has a total of 298 amino acid residues, which means that all but 10 of them are structurally equivalent to residues in 2oycA.
  • nres value of 292. This did not bear much significance on the decision-making process. However, a conclusion drawn was that it was similar to 2cfsA in terms of length.
  • %id score of 99%, which simply means that based on the information currently stored in the DALI database, 2oycA and 2cfsA were highly similar.
The top 11 hits generated by the DALI database. These 11 proteins were also deemed to be of significance to this study
The red, boxed section illustrates the cut-off point. Protein number 12 onwards were rejected as their lali values were less than half of 2cfsA's (298)













Given that 2cfsA has 298 amino acid residues, the twelfth hit (2hszA) onwards was rejected because the lali values of these hits were less than half of that number (i.e. poor structural equivalence).
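The cut-off rule can be written down in a few lines. The sketch below assumes a hit is kept when its lali value is at least half of the query length (298 / 2 = 149). The 2oycA values are those quoted above; the second hit is entirely made up to show a rejection and does not correspond to any entry in the DALI output.

```python
QUERY_LENGTH = 298  # residues in 2cfsA, as stated in the text


def coverage(lali, query_length=QUERY_LENGTH):
    """Fraction of the query's residues that are structurally equivalent."""
    return lali / query_length


def passes_cutoff(lali, query_length=QUERY_LENGTH):
    """Keep a hit only if at least half of the query is covered."""
    return lali >= query_length / 2


hits = [
    # Values for 2oycA as quoted in the text.
    {"chain": "2oycA", "z": 47.6, "rmsd": 0.4, "lali": 288, "nres": 292, "pct_id": 99},
    # Hypothetical low-coverage hit, invented for illustration only.
    {"chain": "hypothetical_hit", "z": 10.0, "rmsd": 3.0, "lali": 140, "nres": 210, "pct_id": 15},
]

kept = [hit["chain"] for hit in hits if passes_cutoff(hit["lali"])]
```

With these inputs, 2oycA covers roughly 97% of the query and is kept, while the invented hit falls below the 149-residue threshold and is dropped.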

To further confirm that 2cfsA and 2oycA were structurally similar, their three-dimensional structures were superimposed using PyMOL. As expected, 2oycA bore a close resemblance to 2cfsA.

Based on the results obtained in the first (DALI) run, a second run was conducted towards the end of the study. This was primarily due to the concern that the first run identified 2oycA as the query protein instead of 2cfsA. While this would not have (negative) implications on the already-generated results, it was done as a measure of assurance for the team. It was observed that there was a huge deviation between the results obtained in the first and second runs, namely:

  • 2cfsA was not among the list of hit proteins yielded in the first run, whereas based on the results obtained in the second run, it was identified as the query protein. This information was significant, as before the second run, it could only be hypothesized that 2cfsA belonged to the same protein family as 2oycA. With the results yielded in the second run, it can be confirmed that 2cfsA and 2oycA were indeed from the same protein family. In fact, the second run yielded a handful of proteins which were more closely related to 2cfsA than 2oycA.

The vast difference between the results of the two runs highlighted a disadvantage of DALI. While it is one of the more reliable protein fold comparison programs available, continuous changes in the field have resulted in inconsistencies (Novotny et al., 2004). This was well illustrated by the results obtained in this study.

Other structural studies have used the Combinatorial Extension (CE) database. In this study, the CE database was not utilized due to software incompatibility: whenever a query PDB ID was run against the database, an error message appeared. This could be because CE only accepts structures uploaded from a Mac or a Unix workstation (Novotny et al., 2004), neither of which was used during this study.

PDBsum, Pfam

PDBsum is a database which pictorially illustrates information on each macromolecule deposited in the PDB. Some of the features provided by PDBsum include:

  • Images of the query structure
  • Annotated plots of each protein chain's secondary structure
  • PROMOTIF-generated structural analyses, illustrated in the form of topology diagrams
  • Schematics of protein interactions (i.e. with ligand, DNA), represented in the form of LIGPLOT diagrams.

(Laskowski et al., 2001)


The protein chains are represented in the form of "wiring diagrams", a schematic of the protein chain's secondary structure motifs, primary sequence, structural domains and active sites (Laskowski et al., 2001). The secondary structures of 2cfsA and 2oycA were noted to be highly similar, the difference being that 2oycA did not have any disulphide bonds, nor indications of any active site(s), as opposed to 2cfsA.

The topology diagrams of both proteins (2cfsA and 2oycA) were obtained next. These topology diagrams are PROMOTIF-generated schematics of the secondary structure motifs, and are less detailed than the "wiring diagrams" described in the previous paragraph (Laskowski et al., 2001). In this study, the topology diagrams of the two proteins were indistinguishable.

A cleft analysis was conducted for both proteins, and the results were visualized in PyMOL with the aim of observing similarities between the potential catalytic sites of 2cfsA and 2oycA. Clefts in protein surfaces are significant because of their relevance to binding sites: it has been hypothesized that the active site usually lies in the largest protein clefts or cavities (Laskowski et al., 1996). Given the similarities observed in both proteins, it was deduced that the catalytic sites of 2cfsA and 2oycA are located in the same region.

The results obtained are significant, as structural information is usually crucial for predicting the function of the protein of interest. Since the secondary structures of both proteins are highly similar, there is a strong possibility that 2cfsA and 2oycA are functionally similar.

A link to the Pfam site pertaining to 2cfsA identified it as a member of the haloacid dehalogenase-like hydrolase family (http://pfam.sanger.ac.uk/family?acc=PF00702). Members of this protein family (which includes phosphatases) belong to the Haloacid Dehalogenase (HAD) superfamily. Almo et al. (2007) mention that members of the HAD superfamily catalyze phosphoryl transfer reactions on small molecules. The superfamily encompasses a large number of magnesium-dependent phosphohydrolases, in which the catalytically active magnesium ion is co-ordinated in the active site and the first aspartic acid serves as the nucleophile and phosphoryl acceptor. Further evidence for the importance of the magnesium ion lies in the fact that a near-total loss of activity is observed when it is substituted with calcium. This ties in with the work done using the ProFunc database, which predicts that 2cfsA is involved in catalytic processes. An assumption, therefore, would be that the magnesium ions are involved in the catalytic functions of 2cfsA.

PROFUNC

The ProFunc database was utilized due to its ability to predict the likely functions of a protein with a known 3D structure. It makes use of both existing and new methods to analyse a query protein's sequence and structure, with the aim of identifying functional motifs which could indicate relationships to proteins whose functions are known (Laskowski et al., 2005).

Based on the results generated by DALI, it was concluded that 2cfsA and 2oycA were structurally similar and this could have been a crucial point in the functional determination of both proteins. A search on ProFunc, however, seemed to suggest that 2cftA was a much closer match to 2cfsA than 2oycA. In fact, based on the local alignment scoring system, 2cftA is identical to 2cfsA in terms of sequence and structure, according to the information stored in the PDB database. However, this does not rule out the potential significance of 2oycA, as it was the next closest protein hit after 2cftA.

It is important to note that ProFunc reported 2cfsA as having 293 amino acid residues, whereas the NCBI record (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=134104092) lists 298. One possible reason for this discrepancy is that database records are revised over time, so the two sources may not reflect the same version of the entry.

In terms of secondary structure matching (SSM), however, 2cftA was NOT identical to 2cfsA, as reflected by their differing Q scores (1.000 for 2cfsA itself and 0.981 for 2cftA). The Q score takes into account the number of aligned residues, the rmsd and the sizes of the two proteins; a high Q score implies high similarity between the hit protein and the protein of interest. More interesting, however, was the observation that 2oycA was a distant sixth on the list, four places below 2cftA. Such was the similarity between the hit proteins and 2cfsA, however, that even at sixth place the deviation between 2cfsA and 2oycA was minimal.
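
As a rough illustration of how such a score combines the three quantities, the sketch below uses the form commonly given for the SSM Q score, Q = N_align^2 / ((1 + (rmsd / R0)^2) * N1 * N2). The formula and the scaling constant R0 = 3 Å are assumptions here, since the text above does not state them, and the example numbers are hypothetical rather than taken from the ProFunc output.

```python
def q_score(n_aligned, rmsd, n_query, n_hit, r0=3.0):
    """Q score combining alignment length, rmsd and protein sizes.

    n_aligned: number of aligned residues
    rmsd:      root-mean-square deviation of the aligned residues (angstroms)
    n_query:   number of residues in the query protein
    n_hit:     number of residues in the hit protein
    r0:        scaling constant for the rmsd (assumed to be 3 angstroms)
    """
    if n_query <= 0 or n_hit <= 0:
        raise ValueError("protein sizes must be positive")
    return n_aligned ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_query * n_hit)


# A protein compared with itself: every residue aligned, rmsd of zero.
print(q_score(n_aligned=293, rmsd=0.0, n_query=293, n_hit=293))

# Hypothetical near-identical hit: slightly fewer aligned residues, small rmsd.
print(q_score(n_aligned=291, rmsd=0.3, n_query=293, n_hit=293))
```

Under this form the score reaches 1 only when all residues of both proteins are aligned with zero rmsd, and it falls as the rmsd grows or as the aligned fraction shrinks.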

To determine the potential active sites of 2cfsA, the nest analysis method was utilized. According to Pal et al. (2002), the principle of this method is that anion (negatively charged) and cation (positively charged) binding sites in proteins are often made up of three amino acids, of which two exhibit "enantiomeric" main chain conformations. This simply means that the main chain torsion angles of the two adjacent amino acids are inverted about the centre of the Ramachandran plot. This results in the formation of "nests", which are defined as concave depressions that ultimately serve as binding sites. The nest analysis of 2oycA gave results similar to those of 2cfsA, so there is a good chance that 2cfsA and 2oycA have similar catalytic sites. Since 2oycA is the PDB identifier for chronophin, the published article on chronophin was used to identify potential structure-function relationships.
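
The geometric idea behind a nest can be sketched as follows. This is a simplified illustration, not the procedure ProFunc uses: it flags adjacent residues whose (phi, psi) angles are approximately inverted through the centre of the Ramachandran plot. The tolerance of 40 degrees and the example torsion angles are assumptions chosen for illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def is_enantiomeric_pair(torsions_1, torsions_2, tol=40.0):
    """True if (phi2, psi2) is close to (-phi1, -psi1), within tol degrees."""
    phi1, psi1 = torsions_1
    phi2, psi2 = torsions_2
    return angle_diff(phi2, -phi1) <= tol and angle_diff(psi2, -psi1) <= tol


def find_candidate_nests(torsions, tol=40.0):
    """Return each index i where residues i and i+1 form an enantiomeric pair."""
    return [
        i
        for i in range(len(torsions) - 1)
        if is_enantiomeric_pair(torsions[i], torsions[i + 1], tol)
    ]


# Hypothetical (phi, psi) values in degrees for four consecutive residues.
example_torsions = [(-60, -45), (-90, 0), (85, 5), (-120, 130)]
print(find_candidate_nests(example_torsions))
```

In the example, only the second and third residues have mirrored torsion angles, so only that pair is reported as a candidate nest.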

Visualizing the potential catalytic site of 2cfsA

Finally, a LIGPLOT of the interactions involving the PLP ligand in 2cftA was obtained. The LIGPLOT gave a good indication of the location of 2cftA's active site, highlighting the hydrogen bonds and non-bonded interactions between the ligand (in 2cfsA's case, the magnesium ions) and the protein residues it interacts with (Laskowski et al., 2001). It was also noticed that the Mg 1296(A) ion of 2cfsA was located in exactly the same position as 2cftA's calcium ion. This is significant, as substitution of the magnesium ion with the catalytically inert calcium ion results in a loss of activity (Almo et al., 2007). Based on this, it was deduced that 2cfsA's active site is in the region surrounding Mg 1296(A). The LIGPLOT of interactions involving the Mg 1297(A) ion of 2cfsA was also obtained, and this information, along with that from the LIGPLOT involving Mg 1296(A), was used to generate a three-dimensional view of the catalytic site of 2cfsA via PyMOL. The conserved regions obtained earlier from the multiple sequence alignment, in the evolutionary part of the study, were not far off from the residues that constitute the catalytic site according to Almo et al. (2007). PyMOL-generated visualizations of the possible catalytic sites using both sets of information illustrated this.
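
The notion of "the region surrounding" a metal ion can be made concrete with a simple distance filter. The sketch below is an illustration only: the 4 Å cut-off, the residue labels and the coordinates are all hypothetical, and in practice the coordinates would be read from the PDB file of 2cfsA.

```python
import math


def residues_near_ion(ion_xyz, atoms, cutoff=4.0):
    """Return sorted labels of residues with any atom within cutoff angstroms of the ion.

    atoms is a list of (residue_label, (x, y, z)) tuples, one per atom.
    """
    near = set()
    for label, xyz in atoms:
        if math.dist(ion_xyz, xyz) <= cutoff:
            near.add(label)
    return sorted(near)


# Hypothetical ion position and atom coordinates (angstroms), for illustration only.
ion_position = (0.0, 0.0, 0.0)
example_atoms = [
    ("resA", (2.1, 0.0, 0.0)),
    ("resA", (5.0, 5.0, 5.0)),
    ("resB", (0.0, 2.0, 1.0)),
    ("resC", (6.0, 0.0, 0.0)),
]

print(residues_near_ion(ion_position, example_atoms))
```

A residue is reported once even if several of its atoms fall within the cut-off, which matches how a residue-level view of a binding site is usually read.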

Function

Sequence and Structure

Search results from the Pfam, FASTA and BLAST databases all indicate that 2cfs_A belongs to the HAD-like hydrolase superfamily. None of the search results indicates that 2cfs_A has other functions. Likewise, for structure comparison using InterPro and ProFunc, the top search results yield only members of the HAD-like hydrolase superfamily. Using sequence homology and structural similarity to predict function was therefore of limited value in this case, as the search results could only establish that 2cfs_A functions as a HAD-like hydrolase and gave no information beyond that.

Functional Expression in Tissue

The function of pyridoxal phosphatase has been well established, as its substrate, pyridoxal 5'-phosphate (PLP, the active form of vitamin B6), is a cofactor involved in many cellular metabolic reactions. However, the level of functional expression varies markedly from tissue to tissue. In adults, PLP phosphatase was most highly expressed in all regions of the central nervous system except the spinal cord. High levels were also found in the liver and testis. In the fetus, PLP phosphatase transcript levels were fairly evenly distributed across all organs except the brain. Notably, levels of pyridoxal phosphatase in the brain are substantially higher in both adults and fetuses, suggesting that pyridoxal phosphatase may have a specific functional role there.

Protein-Protein Interaction

Using STRING, functional partners of the protein of interest can be predicted based on the proximity of gene locations, interactions discovered in experiments, text mining and other sources. This is very useful, as each prediction is backed by evidence.
