Paper: Difference between revisions
(43 intermediate revisions by 2 users not shown) | |||
Line 6: | Line 6: | ||
<font size = " | <font size = "5">'''Abstract'''</font> | ||
Line 18: | Line 18: | ||
<font size = " | <font size = "5">'''Introduction'''</font> | ||
Line 70: | Line 70: | ||
<font size = " | <font size = "5">'''Results'''</font> | ||
Line 80: | Line 80: | ||
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;" | {|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;" | ||
| | | 1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii | ||
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm | 61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm | ||
Line 224: | Line 224: | ||
29: 3033-A 1mh9-A 9.2 3.2 146 194 15 0 0 15 S HYDROLASE deoxyribonucleotidase (mitochondrial 5'(3')- | 29: 3033-A 1mh9-A 9.2 3.2 146 194 15 0 0 15 S HYDROLASE deoxyribonucleotidase (mitochondrial 5'(3')- | ||
'''Figure 7. '''The DALI search results, which were returned by e-mail. The first hit (2gfh) is the query protein, a HAD family protein, with a Z-score
of 41.1, a root-mean-square deviation (RMSD) of 0.0 and a %IDE of 100. The 2nd, 9th, 16th, 19th and 28th hits
show significant similarity between the query protein and hydrolase phosphatases: their Z-scores are greater than 1, their RMSD values remain low and their %IDE values are
greater than 20. Z
Line 244: | Line 244: | ||
Na is a metal; Cl is a halogen (a non-metal).
<font size = "4">'''Protein Structure'''</font> | |||
[[Image:Document7_04.png|framed|none]] | |||
'''Figure 8. '''Secondary structure of 2gfh protein with residue interaction and the catalytic residues marked out in red boxes. ''([http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl]'') | |||
[[Image:Document7_03.png|framed|none]] | |||
'''Figure 9.''' '''(A) '''Main, bottom and right views of the 2gfh protein; the spheres represent the element/chemical components. '''(B) '''2gfh protein viewed using KiNG.''' (C) ''' Topology diagram of 2gfh showing the beta strands and alpha helices. ''([http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl]'')
The 2gfh protein is an L-polypeptide of 260 residues. Its secondary structure (Figure 8) comprises 56% helix
(13 helices; 146 residues) and 11% beta strand (8 strands; 31 residues).
<font size = "4">'''Protein Folding'''</font> | |||
'''Table 1. ''' Matching folds detected by SSM and Dali, with scores values between the Neu5Ac-9-P phosphatase and other proteins.([http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/profunc/GetResults.pl?source=profunc&user_id=bb32&code=143144 ''http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/profunc/GetResults.pl?source=profunc&user_id=bb32&code=143144]'') | |||
{|border="2" cellspacing="0" cellpadding="4" width="100%" | |||
|align = "center"|Hit | |||
|align = "center"|Z-score | |||
|align = "center"|No. <br> SSE | |||
|align = "center"|RMSD <br> (Å) | |||
|align = "center"|Sequence <br> Id | |||
|align = "center"|PDB <br> entry | |||
|align = "center"|Name | |||
|- | |||
|align = "center"|1 | |||
|align = "center"|16.6 | |||
|align = "center"|16 | |||
|align = "center"|0.00 | |||
|align = "center"|100.0% | |||
|align = "center"|2gfhA | |||
|align = "center"|Crystal structure of protein c20orf147 homolog (17391249) from ''Mus musculus'' at 1.90 a resolution | |||
|- | |||
|align = "center"|2 | |||
|align = "center"|9.4 | |||
|align = "center"|16 | |||
|align = "center"|2.34 | |||
|align = "center"|23.0% | |||
|align = "center"|1x42A | |||
|align = "center"|Crystal structure of a haloacid dehalogenase family protein (ph0459) from ''Pyrococcus horikoshii'' ot3 | |||
|- | |||
|align = "center"|3 | |||
|align = "center"|9.2 | |||
|align = "center"|10 | |||
|align = "center"|1.63 | |||
|align = "center"|26.0% | |||
|align = "center"|1swwA | |||
|align = "center"|Crystal structure of the phosphonoacetaldehyde hydrolase d12a mutant complexed with magnesium and substrate phosphonoacetaldehyde | |||
|- | |||
|align = "center"|4 | |||
|align = "center"|9.3 | |||
|align = "center"|10 | |||
|align = "center"|1.66 | |||
|align = "center"|26.0% | |||
|align = "center"|1swvA | |||
|align = "center"|Crystal structure of the d12a mutant of phosphonoacetaldehyde hydrolase complexed with magnesium | |||
|- | |||
|align = "center"|5 | |||
|align = "center"|7.1 | |||
|align = "center"|11 | |||
|align = "center"|1.75 | |||
|align = "center"|24.4% | |||
|align = "center"|1fezA | |||
|align = "center"|The crystal structure of ''Bacillus cereus'' phosphonoacetaldehyde hydrolase complexed with tungstate, a product analog | |||
|- | |||
|align = "center"|6 | |||
|align = "center"|6.3 | |||
|align = "center"|12 | |||
|align = "center"|2.34 | |||
|align = "center"|20.5% | |||
|align = "center"|2p11A | |||
|align = "center"|Crystal structure of hypothetical protein (yp_553970.1) from ''Burkholderia xenovorans'' lb400 at 2.20 a resolution | |||
|- | |||
|align = "center"|8 | |||
|align = "center"|7.5 | |||
|align = "center"|11 | |||
|align = "center"|1.96 | |||
|align = "center"|26.8% | |||
|align = "center"|1rqlA | |||
|align = "center"|Crystal structure of phosponoacetaldehyde hydrolase complexed with magnesium and the inhibitor vinyl sulfonate | |||
|- | |||
|align = "center"|9 | |||
|align = "center"|7.3 | |||
|align = "center"|11 | |||
|align = "center"|1.96 | |||
|align = "center"|26.8% | |||
|align = "center"|1rqnA | |||
|align = "center"|Phosphonoacetaldehyde hydrolase complexed with magnesium | |||
|- | |||
|align = "center"|10 | |||
|align = "center"|6.7 | |||
|align = "center"|13 | |||
|align = "center"|2.44 | |||
|align = "center"|22.1% | |||
|align = "center"|2b0cA | |||
|align = "center"|The crystal structure of the putative phosphatase from ''Escherichia coli'' | |||
|} | |||
The high scores between Neu5Ac-9-P phosphatase and the other proteins (Table 1) indicate that the folds of these proteins match.
The Z-score measures the statistical significance of a match in terms of standard Gaussian statistics. It is based on the quality of the match
between the query and target structures and assumes that a Gaussian distribution of quality scores would be obtained from a large enough database
of protein structures. The higher the Z-score, the higher the statistical significance of the match. No. SSE is the number of matched secondary
structure elements (for example, helices and strands) between the two structures.
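As a rough illustration of the Gaussian idea described above, the sketch below expresses a match score as a number of standard deviations above the mean of a background distribution. It is not the scoring code of DALI or SSM; the function name <code>z_score</code> and the background values are assumptions made up for the example.

```python
from statistics import mean, pstdev


def z_score(score, background):
    """Return how many standard deviations `score` lies above the background mean."""
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation of the background
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


# Hypothetical quality scores for unrelated structures (illustrative only).
background_scores = [10.0, 12.0, 14.0, 16.0, 18.0]

# A query/target match scoring well above the background gets a high Z-score.
print(round(z_score(20.0, background_scores), 2))
```

A score equal to the background mean gives a Z-score of 0, so larger positive values correspond to matches that are less likely to be chance similarities.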
<font size = "4">'''Sequence Similarity'''</font> | |||
{|cellspacing="0" cellpadding = "10" style="border-style:solid; border-color:black; border-width:1px;" | |||
| | |||
Hydrolase: domain 1 of 1, from 18 to 224: score 96.2, E = 1e-25 | |||
*->ikavvFDkDGTLtdgkeppiaeaiveaaaelgl.........lplee | |||
++av+FD+D+TL+d+ + + ++ + e+ ++l + + +++ ++ + | |||
query 18 VRAVFFDLDNTLIDT-AGASRRGMLEVIKLLQSkyhykeeaeIICDK 63 | |||
vekllgrgl.g.erilleggltaell...................d.evl | |||
v l +++ ++ ++ t ++ + +++++ ++++ ++ ++ | |||
query 64 VQVKLSKECfHpYSTCITDVRTSHWEeaiqetkggadnrklaeecYfLWK 113 | |||
glial.dklypgarealkaLkrrGikvailTggdr.naeallealgla.l | |||
++ ++ l +++++ l +L++ +++ +lT+gdr++++++ ea+++ ++ | |||
query 114 STRLQhMILADDVKAMLTELRKE-VRLLLLTNGDRqTQREKIEACACQsY 162 | |||
fdviidsdevggvgpivvgKPkpeifllalerlgvkpeevgpevlmVGDg | |||
fd+i++++e + KP+p if + ++ lgv+p ++ +mVGD+ | |||
query 163 FDAIVIGGEQK------EEKPAPSIFYHCCDLLGVQPGDC----VMVGDT 202 | |||
vnDapalaa.AGv.gvamgngg<-* | |||
+ +++ + +AG+++++++n + | |||
query 203 LETDIQGGLnAGLkATVWINKS 224 | |||
|} | |||
'''Figure 10. '''The alignments of the top-scoring domains of 2gfh protein (query) using Pfam 21.0 (Janelia Farm). ([http://pfam.janelia.org/ http://pfam.janelia.org]) | |||
A Pfam search (Figure 10) matched the query sequence, in this case Neu5Ac-9-P phosphatase, to the Hydrolase family. The E-value of 1e-25 indicates
that the match is highly significant and very unlikely to have arisen by chance.
<font size = "4">'''Surface Properties'''</font> | |||
[[Image:Document7_02.png|framed|none]] | |||
'''Figure''' '''11. '''Molecular surface of 2gfh colored by electrostatic potential, shown using PyMOL.
Using the PDB entry 2gfh, a model of the molecular surface colored by electrostatic potential was constructed in PyMOL. As shown in
Figure 11, the red regions are negatively charged and the blue regions are positively charged. The color scale ranges from -63.539 to
63.539.
[[Image:Document7_05.png]] | |||
'''Figure 12. (A) '''Molecular structure of 2gfh showing the possible binding sites; the different colors represent classes of amino
acids. '''(B)''' Results from ProFunc show that 2gfh binds 2 ligands: phosphate ion (PO<sub>4</sub>) and ethylene glycol (EDO).
ProFunc helps to identify the likely biochemical function of a protein from its 3-dimensional (3D) structure. It uses fold matching, residue
conservation, surface cleft analysis, and functional 3D templates to identify both the protein<nowiki>’</nowiki>s likely active site and
possible homologues in the PDB. The search provided information on the possible binding sites and identified the potential ligands
PO<sub>4</sub> and EDO. EDO (Figure 14) is most likely present as an additive from protein crystallization rather than as a
physiological ligand; it is also widely used as automotive antifreeze. The PO<sub>4</sub> ligand (Figure 13) is the more important finding, as
it most likely marks the active site. As Neu5Ac-9-P phosphatase is a hydrolase, the PO<sub>4</sub> could well be involved in the mechanism
and function of the protein.
[[Image:Document7_07.png]] | |||
'''Figure 13. (A) '''Molecular structure of 2gfh with the ligand PO<sub>4</sub>.''' (B) '''Molecular and chemical structure of PO<sub>4</sub>.''' (C) '''Ligand interaction involving PO<sub>4</sub>.
[[Image:Document7_11.png]] | |||
'''Figure 14. (A) '''Molecular structure of 2gfh with the ligand EDO.''' (B) '''Molecular and chemical structure of EDO.''' (C) '''Ligand interaction involving EDO.
[[Image:Document15_07.png]] | |||
'''Figure 15. '''Molecular structure of Neu5Ac-9-P phosphatase visualized using RasMol, showing the conserved region of asparagine, threonine and leucine, with the EDO molecule in grey and PO<sub>4</sub> in yellow.
'''Table 2. '''Nests detected in N-acetylneuraminic acid phosphatase. Nest number 4 shows significant scores, implying possibly conserved residues.
{|border="2" cellspacing="0" cellpadding="4" width="100%" | |||
|align = "center"|No | |||
|align = "center"|Score | |||
|align = "center"|Number<br>of residues | |||
|align = "center"|Cleft | |||
|align = "center"|Average accessibility | |||
|align = "center"|Average conservation | |||
|align = "center"|Residues | |||
|- | |||
|align = "center"|1 | |||
|align = "center"|3.770 | |||
|align = "center"|3 | |||
|align = "center"|3 | |||
|align = "center"|2 | |||
|align = "center"|0.437 | |||
|align = "center"|Ser212(A), Gly213(A), Arg214(A) | |||
|- | |||
|align = "center"|2 | |||
|align = "center"|3.579 | |||
|align = "center"|3 | |||
|align = "center"|3 | |||
|align = "center"|<nowiki>-</nowiki> | |||
|align = "center"|0.913 | |||
|align = "center"|Ala201(A), Gly202(A), Leu203(A) | |||
|- | |||
|align = "center"|3 | |||
|align = "center"|3.483 | |||
|align = "center"|3 | |||
|align = "center"|3 | |||
|align = "center"|<nowiki>-</nowiki> | |||
|align = "center"|0.816 | |||
|align = "center"|Leu177(A), Gly178(A), Val179(A) | |||
|- | |||
|align = "center"|4 | |||
|align = "center"|3.000 | |||
|align = "center"|3 | |||
|align = "center"|3 | |||
|align = "center"|2 | |||
|align = "center"|1.000 | |||
|align = "center"|Asn15(A), Thr16(A), Leu17(A) | |||
|- | |||
|align = "center"|5 | |||
|align = "center"|0.646 | |||
|align = "center"|4 | |||
|align = "center"|4 | |||
|align = "center"|<nowiki>-</nowiki> | |||
|align = "center"|0.646 | |||
|align = "center"|Cys145(A), Ala146(A), Cys147(A), Gln148(A) | |||
|} | |||
ProFunc also provided information on the conserved residues in Neu5Ac-9-P phosphatase through nest analysis. Nests are structural
motifs that are often found in functionally important regions of protein structures, and each nest is given a score. A score above 2.0
implies that the nest is functionally significant. The results were tabulated (Table 2), showing the residues
making up each nest. A residue conservation score was assigned to each nest residue. This score ranges from 0.0, meaning that the
residue is not conserved at all, to 1.0, meaning that it is perfectly conserved. It is determined from a multiple sequence alignment of the
protein<nowiki>’</nowiki>s sequence against BLAST hits from the UniProt sequence database. The results (Figure 15) show a highly conserved
region of asparagine, threonine and leucine (nest 4 in Table 2), whose residue conservation score was 1.0.
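To make the 0.0 to 1.0 conservation scale concrete, the following is a minimal sketch of a per-column conservation score computed from a multiple sequence alignment. It uses a simple identity-based measure (the fraction of sequences carrying the most common residue), which is an assumption for illustration and not ProFunc's actual scoring method; the toy alignment is invented.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column from 0.0 (not conserved) to 1.0 (invariant).

    The score is the fraction of sequences that carry the most common
    residue in the column; gap characters ('-') count as mismatches.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Hypothetical four-sequence alignment (not taken from the BLAST hits).
toy_alignment = ["DNTL", "DNTL", "DSTL", "D-TV"]
print(column_conservation(toy_alignment))
```

Columns 1 and 3 of the toy alignment are invariant and score 1.0, in the same way that the Asn-Thr-Leu nest residues score 1.0 in Table 2.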
<font size = "4">'''Functional analysis'''</font> | |||
The MSA (Figure 16) of the query sequence and the other 35 sequences shows several conserved motifs. The 1<sup>st</sup> conserved motif
consists of an almost invariant aspartic acid (D) region, with only the 33<sup>rd</sup> protein (gi: <nowiki>|</nowiki>45552117<nowiki>|</nowiki>)
showing a gap. The 2<sup>nd</sup> motif shows conserved and invariant residues of leucine (L), threonine (T), asparagine (N) and glycine (G). The
3<sup>rd</sup> motif contains lysine (K), proline (P), valine (V), glycine (G), aspartic acid (D) and
isoleucine (I), 2 of which are invariant. This agrees with the study by Maliekal ''et al'' and strongly suggests that the query protein is a phosphatase.
[[Image:Document9_01.png]] | |||
'''Figure 16. '''MSA of the query protein Neu5Ac phosphatase with 35 other proteins. Only the 60<sup>th</sup> – 70<sup>th</sup> and the
210<sup>th</sup> – 300<sup>th</sup> alignment positions are shown, to illustrate the conserved and invariant regions. The 3 boxed sequences
are either conserved or invariant regions.
<font size = "5">'''Discussion'''</font> | |||
<font size = "4">'''Multiple Sequence Alignment'''</font> | |||
In the MSA obtained, the organisms with the large gap insertions were mainly ''Bacillus'' species, with the exception
of ''Symbiobacterium thermophilum''. ''Symbiobacterium'' is an uncultivable thermophile isolated from compost. Its survival is based mainly on
microbial commensalism <sup>5</sup>. This bacterium can only grow ''in vitro'' if it is co-cultured with ''Bacillus'' species
<sup>5</sup>. This could explain its genetic association with ''Bacillus'', as observed in the sequence
alignment. Interestingly, however, ''Bacillus'' is classified as Gram-positive, while ''Symbiobacterium'' is a Gram-negative bacterium. As
observed in the sequence alignment, protein sequences from other Gram-negative bacteria (''Vibrio'' species) do not contain the large gap
insertion at the 91<sup>st</sup> to 114<sup>th</sup> amino acid positions; ''Symbiobacterium'' is the only exception. Hence, more genetic (and
even functional) analysis might be necessary to determine the relationship between the hydrolase proteins of the Gram-positive ''Bacillus'' and the
Gram-negative ''Symbiobacterium''.
<font size = "4">'''Phylogenetic Tree'''</font> | |||
In the Rectangular Cladogram view of the tree, two main domains were observed: prokaryotes and eukaryotes. This split is also
the root and first branching point of the phylogenetic tree.
The invertebrates (of Phylum Arthropoda) form the first branching point for the eukaryotes in this tree.
From there, further branching leads to the vertebrates (of Phylum Chordata), which then branch into the Superclasses Osteichthyes (bony
fish) and Tetrapoda (four-limbed vertebrates).
In the prokaryotic domain, the main branching occurs between Gram-positive (''Bacillus'' spp.) and Gram-negative (''Vibrio'' spp.) bacteria.
Hence, it can be generally deduced that the Neu5Ac (hydrolase) protein is not specific to any one evolutionary lineage, as it is present in almost
all main Phyla and Classes of organisms from both the prokaryotic and eukaryotic domains. Its functional significance would therefore be a
general one.
<font size = "4">'''Bootstrapping'''</font> | |||
Tree bootstrapping is necessary to test the reliability of the branching patterns and distances on the phylogenetic tree. This was
done by generating 100 "pseudoreplicates" of the multiple sequence alignment. The distance matrices were recalculated from these
replicate alignments to generate a bootstrap tree, whose branching patterns and distances can be compared with those of the original
phylogenetic tree. The bootstrap value (in percent) on each branch signifies the confidence in that branch. A bootstrap value of 95% equates to
full branching confidence; 75% equates to 95% branching confidence; 60% equates to much lower branching confidence; and 50%
gives no branching confidence.
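The pseudoreplicate step can be sketched as follows: each replicate is built by sampling alignment columns with replacement until it has the same length as the original alignment. This is a simplified illustration of the general bootstrap idea, not the Seqboot implementation; the function names, the seed and the toy alignment are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to build one pseudoreplicate."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    # The same column picks are applied to every sequence, so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


def make_pseudoreplicates(alignment, n_replicates=100, seed=1):
    """Return `n_replicates` resampled copies of the alignment."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical three-sequence alignment (illustrative only).
toy_alignment = ["DNTL", "DSTV", "ENTL"]
replicates = make_pseudoreplicates(toy_alignment)
print(len(replicates), replicates[0])
```

Each replicate would then be passed through the same distance-matrix and tree-building steps as the original alignment.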
<font size = "4">'''Functional Analysis'''</font> | |||
[[Image:Document17_01.png]] | |||
[[Image:Document17_03.png]] | |||
'''Figure 17. (A)''' List of all matched protein name terms for 2gfh.''' (B) '''List of all matched Gene Ontology terms for 2gfh. The score in | |||
red is a measure of how strongly the term is predicted from the hits obtained by the different methods. The scores in blue show each | |||
method<nowiki>’</nowiki>s contribution to the total score (with the number of relevant sequences/structures shown in brackets in grey). | |||
(http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/) | |||
The predicted function, based on evolution and structure, indicates that 2gfh is a hydrolase. ProFunc searches (Figure 17) on 2gfh also
show that it possesses hydrolase activity. The highest-scoring Gene Ontology terms (Figure 17) indicate that it is involved in metabolism and possesses
phosphoglycolate phosphatase activity. A hydrolase is an enzyme that catalyzes a hydrolysis reaction (Figure 18): the addition of the
hydrogen and hydroxyl ions of water to a molecule, with its consequent splitting into two or more simpler molecules. Hydrolase is the systematic
name for any enzyme of EC class 3.
[[Image:Document18_01.png]] | |||
'''Figure 18. '''Hydrolases catalyze the hydrolysis of the chemical bond between A and B, yielding 2 simpler molecules.
Neu5Ac phosphatase belongs to the HAD family. HAD is a vast superfamily of largely uncharacterized enzymes, with a few members shown to possess
phosphatase, phosphoglucomutase, phosphonatase, and dehalogenase activities <sup>6</sup>. HAD-like hydrolases represent the largest family of
predicted small molecule phosphatases encoded in the genomes of bacteria, archaea, and eukaryotes, with 6,805 proteins in databases <sup>7</sup>.
HADs share little overall sequence similarity (15–30% identity), but they can be identified by the presence of three short
conserved sequence motifs <sup>7</sup>. Most of the characterized HADs have phosphatase activity (CO–P bond hydrolysis); others catalyze dehalogenase
(C–halogen bond hydrolysis), phosphonatase (C–P bond hydrolysis), and phosphoglucomutase (CO–P bond hydrolysis and intramolecular
phosphoryl transfer) reactions <sup>6</sup>.
In the study conducted by Maliekal ''et al'' (Figure 19), rat and human Neu5Ac-9-P
phosphatase were aligned with homologous sequences from other species.
[[Image:Document19_08.png]] | |||
'''Figure 19. ''' Alignment of rat and human Neu5Ac-9-P phosphatase with homologous sequences. The following sequences are aligned: ''Rattus | |||
norvegicus'' (Rnor, gi-34859431), ''Homo sapiens'' (Hsap, gi-23308749), ''Xenopus laevis'' (Xlae, gi-46250196), ''Danio rerio'' (Drer, gi- | |||
63101958), and ''Drosophila melanogaster'' (Dmel, gi-28381565). Only the first 280 residues of the latter sequence are shown. Completely | |||
conserved residues are shown in boldface type. Asterisks indicate the extremely conserved residues in phosphatases of the HAD family <sup>8</sup>. | |||
The MSA by Maliekal ''et al'' shows that the Neu5Ac-9-Pase orthologs share the three motifs found in phosphatases of the HAD family,
namely a 1<sup>st</sup> motif comprising two extremely conserved aspartates (D), a 2<sup>nd</sup> motif comprising a conserved serine (S) or
threonine (T), and a 3<sup>rd</sup> motif comprising a conserved lysine (K) and two conserved aspartates (D) <sup>8</sup>. The first aspartate
in the first motif forms a phosphoaspartate during the catalytic cycle <sup>9, 10</sup>. These findings suggested that the HDHD4 protein
was a phosphatase. In our MSA (Figure
16), several conserved motifs closely resemble those reported by Maliekal ''et al''. These findings therefore suggest that
Neu5Ac-9-P phosphatase is a phosphatase.
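As a small illustration of the first motif, the sketch below scans a sequence for the amino-terminal DXDX(T/V) pattern named in the titles of references 9 and 10. The regular expression and the helper name <code>find_motif</code> are assumptions for this example, and this is not the procedure used to produce Figure 16. The fragment is copied from the start of the query line in the Pfam alignment (Figure 10), which begins at residue 18.

```python
import re

# DXDX(T/V): aspartate, any residue, aspartate, any residue, then Thr or Val.
HAD_MOTIF_1 = re.compile(r"D.D.[TV]")

# Start of the query sequence as shown in Figure 10 (residues 18 onwards).
fragment = "VRAVFFDLDNTLIDT"
FRAGMENT_START = 18


def find_motif(sequence, start=1):
    """Return (position, matched residues) for each non-overlapping motif match."""
    return [(m.start() + start, m.group()) for m in HAD_MOTIF_1.finditer(sequence)]


print(find_motif(fragment, FRAGMENT_START))
```

Positions follow the numbering of the Pfam alignment in Figure 10, which may differ from the residue numbering used in the PDB entry.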
Phosphatases of the HAD family depend on the presence of Mg<sup>2<nowiki>+</nowiki></sup>; Ca<sup>2<nowiki>+</nowiki></sup> inhibits
their activity by replacing Mg<sup>2<nowiki>+</nowiki></sup> and preventing the nucleophilic attack by the aspartate that covalently binds the
phosphate group <sup>8</sup>. Phosphatases that form a phosphoenzyme during the catalytic cycle are inhibited by vanadate <sup>11</sup>.
Vanadate (VO<sub>4</sub><sup>3−</sup>), formed when V<sub>2</sub>O<sub>5</sub> is dissolved in water at alkaline pH, appears to inhibit enzymes
that process phosphate.
The presence of a protein sharing at least about 50% sequence identity with rat or human Neu5Ac-9-P phosphatase in the genomes of mammals,
chicken, ''Xenopus'', and fishes indicates that sialic acid synthesis proceeds via the 9-phosphate intermediate in these species <sup>8</sup>. This
is consistent with the finding that vertebrate genomes contain a gene encoding the bifunctional enzyme
UDP-''N''-acetylglucosamine 2-epimerase/''N''-acetylmannosamine kinase <sup>8</sup>.
In bacteria, the ''E. coli'' genome encodes five membrane-bound and 23 soluble HAD-like hydrolases, representing about 40% of the ''E. coli''
proteins with known or predicted small molecule phosphatase activity <sup>12</sup>. The metabolites hydrolyzed by HADs are intermediates of
various metabolic pathways and reactions (glycolysis, pentose phosphate pathway, gluconeogenesis, and intermediary sugar and nucleotide
metabolism).
''E. coli'' HADs hydrolyze a wide range of phosphorylated metabolites, including carbohydrates, nucleotides, organic acids, and coenzymes.
Studies have shown that the most common substrates are intermediates of metabolic pathways such as glycolysis and the pentose phosphate pathway (Figure 20). These substrates
were fructose-1-phosphate, glucose-6-phosphate, mannose-6-phosphate, 2-deoxyglucose-6-phosphate, fructose-6-phosphate, ribose-5-phosphate, and
erythrose-4-phosphate <sup>13</sup>.
[[Image:Document20_01.png]] | |||
[[Image:Document20_02.png]] | |||
'''Figure 20. '''Schematic diagrams of the glycolysis and pentose phosphate metabolic pathways. The green arrows show the substrates that are hydrolyzed by HADs. '''(A)''' Glycolysis pathway with substrates that are hydrolyzed by HADs: glucose-6-phosphate, fructose-6-phosphate and dihydroxyacetone phosphate. '''(B)''' Pentose phosphate pathway with substrates that are hydrolyzed by HADs: glucose-6-phosphate, fructose-6-phosphate, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, gluconate-6-phosphate and erythrose-4-phosphate.
(http://www.steve.gb.com/science/core_metabolism.html) | |||
<font size = "5">'''Methods and Materials'''</font>
<font size = "4">'''Query Sequence'''</font> | |||
The sequence of N-acetylneuraminic acid phosphatase from the house mouse (''Mus musculus'') was obtained from the GenBank protein database under accession number 2GFH_A.
<font size = "4">'''Sequence Homology'''</font> | |||
The query sequence was matched to related proteins (by amino acid sequence similarity) using BlastP. The search was run against a fixed database stored
on a DVD, rather than against the live BlastP database on the World Wide Web.
<font size = "4">'''Multiple Sequence Alignment'''</font> | |||
All the related proteins (from the BlastP search) were aligned using ClustalX. As before, the ClustalX programme used
was obtained from the DVD rather than from the website.
<font size = "4">'''Phylogenetic Tree'''</font> | |||
The Phylip package was used to build a phylogenetic tree showing the relationships between the proteins from the individual
organisms. The programmes used were again obtained from the DVD. Protdist (within Phylip) was used to calculate the distance matrix, with
PAM-Dayhoff selected as the calculation method. Neighbor (also within Phylip) was then used to build the phylogenetic tree from the
distance matrix obtained. The "Input order of species" option was set to "Random" when generating the tree, and a random odd
number was supplied as the seed. The Treeview programme was used to view the final tree.
<font size = "4">'''Bootstrapping'''</font> | |||
Seqboot (within Phylip) was used to generate 100 replicate samples of the sequence alignment.
The outfile (.aln) was then used to calculate the bootstrap distance matrices with Protdist. The parameter settings for this calculation were
the same as for the initial distance matrix calculation (PAM-Dayhoff method), with the addition of the multiple data sets option, set to 100
replicates.
This outfile (.dis) was run through Neighbor. The parameter settings were again the same as for the earlier phylogenetic
tree, with the multiple data sets option again set to 100 replicates.
The treefile (.ph) was run through Consense (within Phylip) to obtain the final bootstrapped phylogenetic tree. Bootstrap branch values were
also obtained to determine the reliability of the tree branches.
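The bootstrap branch values can be understood as the percentage of replicate trees that contain a given grouping. The sketch below illustrates that counting step under a strong simplification: each replicate tree is reduced to the set of clades it contains. It is not how Consense is implemented, and the function name and the replicate trees are hypothetical.

```python
def clade_support(clade, replicate_trees):
    """Return the percentage of replicate trees that contain `clade`."""
    if not replicate_trees:
        raise ValueError("need at least one replicate tree")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical replicate trees, each reduced to the set of clades it contains.
replicate_trees = [
    {frozenset({"Bacillus", "Symbiobacterium"}), frozenset({"Mus", "Homo"})},
    {frozenset({"Bacillus", "Symbiobacterium"}), frozenset({"Mus", "Danio"})},
    {frozenset({"Bacillus", "Vibrio"}), frozenset({"Mus", "Homo"})},
    {frozenset({"Bacillus", "Symbiobacterium"}), frozenset({"Mus", "Homo"})},
]

print(clade_support({"Mus", "Homo"}, replicate_trees))
```

With 100 replicates, as used here, the count of trees containing a grouping is directly its bootstrap percentage.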
<font size = "4">'''Replacing organism identifiers on phylogenetic tree'''</font> | |||
The online Kenegdo server was used to convert the organism identifiers within the tree to their species names.
<font size = "4">'''Protein Folding'''</font> | |||
First, a DALI search was done to compare the 3D structure with those in the Protein Data Bank (PDB). It revealed that Neu5Ac-9-P phosphatase is a
haloacid dehalogenase-like hydrolase. The PDB was then searched for structures of biological macromolecules and their
relationships to sequence, function, and disease. CE, a database and tool for 3D protein structure comparison and alignment, was used
to compare the alignments between the query protein and its neighbours.
<font size = "4">'''Sequence Similarity'''</font> | |||
InterProScan, a tool for annotating newly determined sequences such as predicted proteins from genome sequencing projects, was then used to analyze the query sequence. To
analyze the protein further, Pfam, a large collection of multiple sequence alignments and hidden Markov models, was used to
find Pfam family matches for the protein, in this case N-acetylneuraminic acid phosphatase.
The ProFunc server was used to help identify the likely biochemical function of the protein from its three-dimensional structure. It
uses a series of methods, including fold matching, residue conservation, surface cleft analysis, and functional 3D templates, to identify both
the protein<nowiki>’</nowiki>s likely active site and possible homologues in the PDB.
<font size = "4">'''Surface Properties'''</font> | |||
RasMol, a molecular graphics programme, was used to visualise proteins, nucleic acids and small molecules, while PyMOL, a
molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality
molecular graphics images and animations, was used to render the molecular surface.
<font size = "5">'''References'''</font> | |||
1. Lawrence, S. M., Huddleston, K. A., Pitts, L. R., Nguyen, N., Lee, Y. C., Vann, W. F., Coleman, T. A. & Betenbaugh, M. J. (2000). Cloning and expression of the human N-acetylneuraminic acid phosphate synthase gene with 2-keto-3-deoxy-D-glycero-D-galactonononic acid biosynthetic ability. J. Biol Chem 275, 17869–17877. | |||
2. Schauer, R. (2000). Achievements and challenges of sialic acid research. Glycoconj. J 17, 485-499. | |||
3. Varki, A. (1997). Sialic acids as ligands in recognition phenomena. FASEB J. 11, 248-255. | |||
4. Angata, T. & Varki, A. (2002). Chemical diversity in the sialic acids and related alpha-keto acids:an evolutionary perspective. Chem Rev 102, 439-469. | |||
5. Institute, E. B. (2007). http://www.ebi.ac.uk/2can/genomes/bacteria/Symbiobacterium_thermophilum.html European Bioinformatics Institute. | |||
6. Calderone, V., Forleo, C., Benvenuti, M., Thaller, M. C., Rossolini, G. M. & Mangani, S. (2004). The First Structure of a Bacterial Class B Acid Phosphatase Reveals Further Structural Heterogeneity Among Phosphatases of the Haloacid Dehalogenase Fold. J. Mol. Biol 335, 761–773. | |||
7. Koonin, E. V. & Tatusov, R. L. (1994). A genomic perspective on protein families. J. Mol. Biol. 244, 125-132. | |||
8. Maliekal, P., Vertommen, D., Delpierre, G. & Schaftingen, E. V. (2006). Identification of the sequence encoding N-acetylneuraminate-9-phosphate phosphatase. Glycobiology 16, 165–172. | |||
9. Collet, J.-F., Stroobant, V. & Van Schaftingen, E. (1999). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107–14112. | |||
10. Collet, J.-F., Stroobant, V., Pirard, M., Delpierre, G. & Van Schaftingen, E. (1998). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107-14112. | |||
11. Macara, I. G. (1980). Vanadium, an element in search of a role. Trends Biochem Sci 5, 92-94. | |||
12. Keseler, I. M., Collado-Vides, J., Gama-Castro, S., Ingraham, J., Paley, S., Paulsen, I. T., Peralta-Gil, M. & Karp, P. D. (2005). EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33, D334–D337. | |||
13. Kuznetsova, E., Proudfoot, M., Gonzalez, C. F., Brown, G., Omelchenko, M. V., Borozan, I., Carmel, L., Wolf, Y. I., Mori, H. & Yakunin, A. F. (2006). Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family. J. Biol Chem 281, 36149–36161. |
Latest revision as of 03:28, 12 June 2007
Evolution, Structure and Function of N-acetylneuraminic Acid Phosphatase
Jason Cheong Wen Leong (s41235935), Yau Heen wai (s41286272), Lim Junxian (s41313011)
Abstract
N-acetylneuraminic acid phosphatase is a novel protein investigated by our group. With its structure and sequence known, its function was
assumed to be that of a member of the enormous family of haloacid dehalogenase-like hydrolases. It represents the family of predicted small
molecule phosphatases, related by sequence, cleavage sites and reactions, found in the genomes of bacteria, archaea, and eukaryotes. Many have
evolved to be used for specific biological functions within individual organisms.
Introduction
The novel protein investigated by our group is N-acetylneuraminic acid (Neu5Ac) phosphatase. It was first released in the Protein Data Bank
(PDB) on 18th April 2006 under the identifier 2gfh. Mus musculus (mouse) was the source of the gene and Escherichia coli was the
host used to express the novel protein. In Homo sapiens (man), it is known as N-acetylneuraminate 9-phosphate (Neu5Ac-9-P)
phosphatase or haloacid dehalogenase (HAD)-like hydrolase domain containing protein 4. Other aliases of the novel protein include C20orf147, NANP
and HDHD4. The gene encoding the protein is located on chromosome 20, at 20p11.1.
Neu5Ac-9-P phosphatase belongs to a large family of haloacid dehalogenase (HAD)-like hydrolases. The enzymes found within this classification
possess varied types of cleavage activities. Although many of its members are related by sequence, cleavage sites and reactions, many have evolved
to be used for specific biological functions within individual organisms.
These small molecule phosphatase enzymes have been found to exist in the various domains of life: Bacteria, Archaea, and Eucarya. The number
of genes found within each organism varies from bacteria to eukaryotes. Bacterial Neu5Ac synthase and mammalian Neu5Ac-9-P synthase are
homologous proteins, sharing about 35% sequence identity1. Neu5Ac-9-P phosphatase dephosphorylates Neu5Ac-9-P to form Neu5Ac, the
main form of sialic acid.
Figure 1. Dephosphorylation of Neu5Ac-9-P is a reversible reaction with an end product of Neu5Ac (sialic acid) and a free phosphate.
Sialic acids are nine-carbon sugars with a carboxylate group that are found as components of many glycoproteins, glycolipids, and
polysaccharides in animals, viruses, and bacteria. The main form of sialic acid, Neu5Ac, is often present as the terminal sugar of N-
glycans on glycoproteins and glycolipids and plays an important role in protein–protein and cell–cell recognition 2; 3.
Figure 2. Chemical structure of sialic acid.(http://en.wikipedia.org/wiki/Sialic_acid)
Sialic acids are widely distributed in animal tissues and in bacteria, especially in glycoproteins and gangliosides. The amino group
bears either an acetyl or a glycolyl group. The sialic acids form a large family of more than 50 members, including acetylated, sulfated,
methylated, and lactylated derivatives 4.
Results
Query Sequence
The amino acid query sequence of the 2gfh protein (Figure 3) from Mus musculus was obtained from GenBank.
1 mgsdkihhhh hhmglsrvra vffdldntli dtagasrrgm levikllqsk yhykeeaeii
61 cdkvqvklsk ecfhpystci tdvrtshwee aiqetkggad nrklaeecyf lwkstrlqhm
121 iladdvkaml telrkevrll lltngdrqtq rekieacacq syfdaivigg eqkeekpaps
181 ifyhccdllg vqpgdcvmvg dtletdiqgg lnaglkatvw inksgrvplt sspmphymvs
241 svlelpallq sidckvsmsv
Figure 3. The 260 amino acid sequence of 2gfh protein.
Sequence Homology
From the BlastP similarity search, a total of 500 proteins were yielded. Only a total of 38 proteins was used for comparison, as these had
shown higher homology to the query sequence, in contrast with the remainder of the search results. These proteins were chosen according to
their bit scores and E-values. Two more outlier partial sequences contributing to poor overall alignment (huge deletion gaps) were subsequently
removed. The remaining 36 sequences were used for the generation of the phylogenetic tree (and bootstrapped tree as well).
Multiple Sequence Alignment
The following multiple sequence alignment (MSA) was obtained (Figure 4). From the alignments, gi|10888xy and
gi|10888yz are representative of gi|108881764 and gi|108881765 respectively. Both these
hypothetical proteins belong to the mosquito Aedes aegypti.
The identifier numbers for these two proteins were initially changed to an alpha-numeric one, due to the inability of Phylip to generate a tree
from the original identifiers. This was due to the fact that the programme only took the first five numeric digits (10888), thereby resulting
in a programme error prompt which listed both proteins as duplicates (from the identifier numbers). Both these identifiers were subsequently
renamed for the final phylogenetic tree.
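The renaming step described above can be sketched in code. This is a minimal illustration in Python, not the procedure the authors actually followed: the function name make_unique_labels and the label width of 7 characters are assumptions chosen for the example. It shortens each identifier and adds a counter whenever two shortened labels would collide.

```python
def make_unique_labels(identifiers, width=10):
    """Map each identifier to a unique label of at most `width` characters.

    `width` is a hypothetical limit; set it to whatever the tree program
    actually keeps of each sequence name.
    """
    labels = {}
    used = set()
    for ident in identifiers:
        # Keep only letters and digits so separators such as '|' are dropped.
        clean = "".join(ch for ch in ident if ch.isalnum())
        candidate = clean[:width]
        counter = 0
        # On a collision, replace the tail of the label with a running number.
        while candidate in used:
            counter += 1
            suffix = str(counter)
            candidate = clean[: width - len(suffix)] + suffix
        used.add(candidate)
        labels[ident] = candidate
    return labels


ids = ["gi|108881764", "gi|108881765"]
labels = make_unique_labels(ids, width=7)
for original, short in labels.items():
    print(original, "->", short)
```

Keeping the mapping as a dictionary makes it straightforward to restore the original identifiers on the final tree.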
Figure 4. MSA of query (top-most sequence – No.1) and related sequences.
From the MSA, it can be observed that there are generally slight domain conservations throughout the protein sequences. Small insertion and
deletion gaps were noticeable along the alignment as well. A particularly large insertion gap was observed between amino acids 91 to 114.
The organisms with the large insertion gaps were as identified below:
Bacillus licheniformis
Bacillus subtilis
Bacillus halodurans
Bacillus clausii
Symbiobacterium thermophilum
A highly conserved section of amino acids, (LV)–(LVA)–(LIV)–(LIV)-T-N-G, ending in the invariant residues T-N-G, was observed in all the
sequences from position 211 to 217 in the alignment. Downstream of this conserved section are 5 more invariant positions (1 or 2 amino acids
in length). These short conserved regions could be functionally or structurally important and may therefore have been preserved during evolution.
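The conserved pattern can also be written as a regular expression and searched for in a sequence. The sketch below is only an illustration: it uses residues 121 to 180 of the query sequence from Figure 3 (converted to upper case), and the names FRAGMENT, FRAGMENT_START and find_motif are assumptions made for this example. Note that the positions it reports are residue numbers in the query sequence, not the column numbers 211 to 217 of the multiple sequence alignment.

```python
import re

# Residues 121-180 of the query sequence shown in Figure 3, in upper case.
FRAGMENT_START = 121
FRAGMENT = "ILADDVKAMLTELRKEVRLLLLTNGDRQTQREKIEACACQSYFDAIVIGGEQKEEKPAPS"

# (LV)-(LVA)-(LIV)-(LIV)-T-N-G written as a regular expression.
MOTIF = re.compile(r"[LV][LVA][LIV][LIV]TNG")


def find_motif(fragment, start):
    """Return (1-based residue number, matched residues) for each occurrence."""
    return [(start + m.start(), m.group()) for m in MOTIF.finditer(fragment)]


hits = find_motif(FRAGMENT, FRAGMENT_START)
for position, residues in hits:
    print(position, residues)
```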
Phylogenetic Tree
The tree was plotted to obtain the phylogenetic lineage (Figure 5).
Figure 5. (A) Phylogenetic tree showing organisms with related protein sequence homology in Radial Tree view. (B) Rectangular
Cladogram view with related protein sequence homology.
From the Rectangular Cladogram view, it could be observed that there are four distinct separate groups involving fishes, mammals (where the
query protein is also mapped), bacteria and insects.
Bootstrapping
Bootstrapping values obtained were analysed. Branches with values below 75% are indicated by an asterisk (*),
as shown in Figure 6.
Figure 6. Branch bootstrap values in Rectangular Cladogram view. Branches with bootstrap values <75% are indicated with
asterisks (*).
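The marking rule used in Figure 6 is simple enough to express directly. The following sketch assumes bootstrap values are available as percentages; the function name support_label and the example values are hypothetical and are not taken from the tree itself.

```python
def support_label(value, threshold=75.0):
    """Format a bootstrap percentage, adding '*' when it is below the threshold."""
    mark = "*" if value < threshold else ""
    return f"{value:g}{mark}"


# Made-up bootstrap percentages, used only to show the labelling.
labels = [support_label(v) for v in (100, 75, 74.5, 52)]
print(labels)
```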
DALI Searching
SUMMARY: PDB/chain identifiers and structural alignment statistics
NR. STRID1 STRID2 Z RMSD LALI LSEQ2 %IDE REVERS PERMUT NFRAG TOPO PROTEIN
1: 3033-A 2gfh-A 41.1 0.0 246 246 100 0 0 1 S HYDROLASE haloacid dehalogenase-like hydrolase domain
2: 3033-A 1fez-A 18.1 3.5 178 256 22 0 0 13 S HYDROLASE phosphonoacetaldehyde hydrolase (bacillus c
3: 3033-A 2hsz-A 17.9 3.3 168 222 23 0 0 13 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION novel predicted
4: 3033-A 1qq5-A 17.3 3.1 198 245 19 0 0 12 S HYDROLASE l-2-haloacid dehalogenase (xanthobacter aut
5: 3033-A 1o03-A 17.0 5.0 188 221 20 0 0 11 S ISOMERASE beta-phosphoglucomutase (lactococcus lactis
6: 3033-A 2b0c-A 16.4 2.6 184 199 20 0 0 13 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION putative phospha
7: 3033-A 2fdr-A 15.8 4.4 190 214 19 0 0 15 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION conserved hypoth
8: 3033-A 2p11-A 15.7 2.9 194 211 16 0 0 20 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
9: 3033-A 1te2-A 15.7 3.6 170 211 19 0 0 15 S HYDROLASE putative phosphatase (escherichia coli o157
10: 3033-A 1yns-A 15.3 4.0 169 254 11 0 0 13 S HYDROLASE e-1 enzyme (enolase-phosphatase e1) (homo s
11: 3033-A 1qyi-A 15.0 3.5 198 375 19 0 0 17 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
12: 3033-A 2i6x-A 14.9 3.1 176 199 19 0 0 18 S HYDROLASE hydrolase, haloacid dehalogenase-like family
13: 3033-A 1u7p-A 14.3 2.9 144 164 18 0 0 14 S HYDROLASE magnesium-dependent phosphatase-1 (mdp-1) (
14: 3033-A 1ymq-A 14.1 2.3 130 260 16 0 0 14 S TRANSFERASE sugar-phosphate phosphatase bt4131 (bacte
15: 3033-A 1j8d-A 13.1 2.5 141 180 11 0 0 12 S HYDROLASE deoxy-d-mannose-octulosonate 8-phosphate ph
16: 3033-A 2ho4-A 12.9 2.4 131 246 19 0 0 14 S HYDROLASE haloacid dehalogenase-like hydrolase domain
17: 3033-A 1pw5-A 12.7 2.3 136 246 21 0 0 12 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION nagd protein, pu
18: 3033-A 1nf2-A 12.7 2.6 127 267 13 0 0 11 S STRUCTURAL GENOMICS/UNKNOWN FUNCTION phosphatase (the
19: 3033-A 1rlm-A 12.4 2.8 131 269 13 0 0 14 S HYDROLASE phosphatase Mutant (escherichia coli) bacte
20: 3033-A 1f5s-A 12.1 3.5 159 210 14 0 0 15 S HYDROLASE phosphoserine phosphatase (psp) (methanoco
21: 3033-A 1cr6-B 12.0 3.8 177 541 18 0 0 18 S HYDROLASE epoxide hydrolase (mus musculus) mouse expr
22: 3033-A 1rku-A 11.9 3.6 172 206 11 0 0 18 S TRANSFERASE homoserine kinase (pseudomonas aeruginosa
23: 3033-A 2b30-A 11.8 2.7 134 284 16 0 0 12 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION pvivax hypotheti
24: 3033-A 1kyt-A 10.5 2.5 122 216 13 0 0 15 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
25: 3033-A 2o2x-A 10.3 3.6 139 204 17 0 0 14 S STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro
26: 3033-A 1u02-A 10.1 2.7 128 222 16 0 0 12 S STRUCTURAL GENOMICS trehalose-6-phosphate phosphatase
27: 3033-A 2fea-A 10.0 3.5 167 219 7 0 0 21 S HYDROLASE 2-hydroxy-3-keto-5-methylthiopentenyl-1- pho
28: 3033-A 2hx1-A 9.6 3.2 130 275 24 0 0 19 S HYDROLASE predicted sugar phosphatases of the had supe
29: 3033-A 1mh9-A 9.2 3.2 146 194 15 0 0 15 S HYDROLASE deoxyribonucleotidase (mitochondrial 5'(3')-
Figure 7. The DALI search results that were returned by e-mail. The first hit (2gfh) is the query protein itself, with a Z value
of 41.1, a root mean square deviation (RMSD) of 0.0 and %IDE of 100; it is annotated as a HAD-like hydrolase. The 2nd, 9th, 16th, 19th and 28th
hits show significant similarity of the query protein to hydrolase phosphatases, as their Z values are well above 1, their RMSD values remain
low and their %IDE values range from 13 to 24.
From the DALI search (Figure 7), Neu5Ac phosphatase is a haloacid dehalogenase-like hydrolase. This family is structurally different from the
alpha/beta hydrolase family. It includes L-2-haloacid dehalogenases, epoxide hydrolases and phosphatases. Members of this family consist of two
structural domains. One is an inserted four-helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96
of (S)-2-haloacid dehalogenase I. The remainder of the fold is composed of the core alpha/beta domain. The 2gfh protein is classified as a
hydrolase found in mouse. The chemical components in the structure are phosphate ion, sodium ion, 1,2-ethanediol and chloride ion. PO4 and EDO
are ligands, while Na and Cl are bound ions (of these, only sodium is a metal).
Protein Structure
Figure 8. Secondary structure of 2gfh protein with residue interaction and the catalytic residues marked out in red boxes. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl)
Figure 9. (A) Main, bottom and right view of 2gfh protein; the spheres represent the element/chemical components. (B) 2gfh protein viewed using KiNG. (C) Topology diagram of 2gfh showing the beta strands and alpha helices. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/CheckCode.pl)
The structure of the 2gfh protein was determined to be a polypeptide(L) with 260 residues. The secondary structure (Figure 8) comprises 56% helix
(13 helices; 146 residues) and 11% beta sheet (8 strands; 31 residues).
Protein Folding
Table 1. Matching folds detected by SSM and Dali, with score values between the Neu5Ac-9-P phosphatase and other proteins. (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/profunc/GetResults.pl?source=profunc&user_id=bb32&code=143144)
Hit | Z-score | No. SSE | RMSD (Å) | Sequence Id | PDB entry | Name |
1 | 16.6 | 16 | 0.00 | 100.0% | 2gfhA | Crystal structure of protein c20orf147 homolog (17391249) from Mus musculus at 1.90 a resolution |
2 | 9.4 | 16 | 2.34 | 23.0% | 1x42A | Crystal structure of a haloacid dehalogenase family protein (ph0459) from Pyrococcus horikoshii ot3 |
3 | 9.2 | 10 | 1.63 | 26.0% | 1swwA | Crystal structure of the phosphonoacetaldehyde hydrolase d12a mutant complexed with magnesium and substrate phosphonoacetaldehyde |
4 | 9.3 | 10 | 1.66 | 26.0% | 1swvA | Crystal structure of the d12a mutant of phosphonoacetaldehyde hydrolase complexed with magnesium |
5 | 7.1 | 11 | 1.75 | 24.4% | 1fezA | The crystal structure of Bacillus cereus phosphonoacetaldehyde hydrolase complexed with tungstate, a product analog |
6 | 6.3 | 12 | 2.34 | 20.5% | 2p11A | Crystal structure of hypothetical protein (yp_553970.1) from Burkholderia xenovorans lb400 at 2.20 a resolution |
8 | 7.5 | 11 | 1.96 | 26.8% | 1rqlA | Crystal structure of phosponoacetaldehyde hydrolase complexed with magnesium and the inhibitor vinyl sulfonate |
9 | 7.3 | 11 | 1.96 | 26.8% | 1rqnA | Phosphonoacetaldehyde hydrolase complexed with magnesium |
10 | 6.7 | 13 | 2.44 | 22.1% | 2b0cA | The crystal structure of the putative phosphatase from Escherichia coli |
The high scores between Neu5Ac phosphatase and the other proteins (Table 1) indicate that the folds of the different proteins match.
The Z-score measures the statistical significance of a match in terms of standard Gaussian statistics. It is based on the quality of the match
between the query and target structures and assumes that a Gaussian distribution of quality scores would be obtained from a large enough database
of protein structures. The higher the Z-score, the higher the statistical significance of the match. No. SSE is the number of matched secondary
structure elements, for example helices and strands, between the two structures.
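The idea behind a Z-score can be illustrated with a generic calculation: the match score is expressed as the number of standard deviations it lies above the mean of a background distribution of scores. The sketch below is a textbook illustration only. It is not the scoring scheme used by SSM or Dali, and the background values and the query score are invented for the example.

```python
from statistics import mean, pstdev


def z_score(score, background):
    """Number of standard deviations by which `score` exceeds the background mean."""
    mu = mean(background)
    sigma = pstdev(background)
    if sigma == 0:
        raise ValueError("background scores have zero spread")
    return (score - mu) / sigma


# Invented background of match-quality scores and an invented query score.
background = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(13.0, background)
print(z)
```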
Sequence Similarity
Hydrolase: domain 1 of 1, from 18 to 224: score 96.2, E = 1e-25
*->ikavvFDkDGTLtdgkeppiaeaiveaaaelgl.........lplee
++av+FD+D+TL+d+ + + ++ + e+ ++l + + +++ ++ +
query 18 VRAVFFDLDNTLIDT-AGASRRGMLEVIKLLQSkyhykeeaeIICDK 63
vekllgrgl.g.erilleggltaell...................d.evl
v l +++ ++ ++ t ++ + +++++ ++++ ++ ++
query 64 VQVKLSKECfHpYSTCITDVRTSHWEeaiqetkggadnrklaeecYfLWK 113
glial.dklypgarealkaLkrrGikvailTggdr.naeallealgla.l
++ ++ l +++++ l +L++ +++ +lT+gdr++++++ ea+++ ++
query 114 STRLQhMILADDVKAMLTELRKE-VRLLLLTNGDRqTQREKIEACACQsY 162
fdviidsdevggvgpivvgKPkpeifllalerlgvkpeevgpevlmVGDg
fd+i++++e + KP+p if + ++ lgv+p ++ +mVGD+
query 163 FDAIVIGGEQK------EEKPAPSIFYHCCDLLGVQPGDC----VMVGDT 202
vnDapalaa.AGv.gvamgngg<-*
+ +++ + +AG+++++++n +
query 203 LETDIQGGLnAGLkATVWINKS 224
Figure 10. The alignments of the top-scoring domains of 2gfh protein (query) using Pfam 21.0 (Janelia Farm). (http://pfam.janelia.org)
A search using Pfam (Figure 10) matched the query sequence, in this case Neu5Ac-9-P phosphatase, with the hydrolase family. The E value of
1e-25 is highly significant, indicating that the match to a hydrolase is very unlikely to have arisen by chance.
Surface Properties
Figure 11. Molecular surface of 2gfh colored by electrostatic potential shown using Pymol.
Using the PDB entry 2gfh, a model was constructed in PyMOL showing the electrostatic potential of the molecular surface. As shown in
Figure 11, the red regions are negatively charged while the blue regions are positively charged. The colour scale ranges from -63.539 to
63.539.
Figure 12. (A) Molecular structure of 2gfh showing the possible binding sites, with the different colors representing classes of amino
acids. (B) Results from Profunc show that 2gfh contains 2 ligands: phosphate ion (PO4) and ethylene glycol (EDO).
Profunc helps to identify the likely biochemical function of a protein from its 3 dimensional (3D) structure. It uses fold matching, residue
conservation, surface cleft analysis, and functional 3D templates, to identify both the protein’s likely active site and
possible homologues in the PDB. The search provided information on the possible binding sites and important identification of potential ligands
like PO4 and EDO. EDO (Figure 14) is most likely an additive carried over from the crystallization of the protein rather than a natural
ligand; it is a chemical compound that is also widely used as automotive antifreeze. The finding of the PO4 ligand (Figure 13) was important as
it most likely marks the active site. As Neu5Ac-9-P phosphatase is a hydrolase, the PO4 could well be involved in the mechanism
and function of the protein.
Figure 13. (A) Molecular structure of 2gfh with the ligand PO4. (B) Molecular and chemical structure of PO4. (C) Ligand interaction involving PO4.
Figure 14. (A) Molecular structure of 2gfh with the ligand EDO. (B) Molecular and chemical structure of EDO. (C) Ligand interaction involving EDO.
Figure 15. Molecular structure of Neu5Ac-9-P phosphatase displayed using RasMol, showing the conserved region of asparagine, threonine and leucine with the EDO molecule in grey and PO4 in yellow.
Table 2. Nest number 4 shows significant scores, implying possible conserved residues in N-acetylneuraminic acid phosphatase.
No | Score | Number of residues | Cleft | Average accessibility | Average conservation | Residues |
1 | 3.770 | 3 | 3 | 2 | 0.437 | Ser212(A), Gly213(A), Arg214(A) |
2 | 3.579 | 3 | 3 | - | 0.913 | Ala201(A), Gly202(A), Leu203(A) |
3 | 3.483 | 3 | 3 | - | 0.816 | Leu177(A), Gly178(A), Val179(A) |
4 | 3.000 | 3 | 3 | 2 | 1.000 | Asn15(A), Thr16(A), Leu17(A) |
5 | 0.646 | 4 | 4 | - | 0.646 | Cys145(A), Ala146(A), Cys147(A), Gln148(A) |
Profunc also provided information on the conserved residues in Neu5Ac-9-P phosphatase through nest analysis. Nests are structural
motifs that are often found in functionally important regions of protein structures, and each nest is given a score. A score above 2.0
implies that the nest is a functionally significant one. The results were tabulated (Table 2), showing each nest’s start and end residues
and the residues making up the nest. A residue conservation value is given for each nest residue. It ranges from 0.0 to 1.0, signifying that the
residue is not at all conserved or perfectly conserved respectively. It is determined from a multiple sequence alignment of the
protein’s sequence against BLAST hits from the UniProt sequence database. The results (Figure 15) show a highly conserved
region of asparagine, threonine and leucine, for which the residue conservation score was 1.0.
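The selection rule described above (score above 2.0, then the highest conservation) can be applied to the values of Table 2 in a few lines. This is only a sketch: the tuple layout and the function name significant_nests are assumptions made for the example, and only the score, average conservation and residue columns of Table 2 are used.

```python
# (nest number, score, average conservation, residues), copied from Table 2.
NESTS = [
    (1, 3.770, 0.437, ("Ser212", "Gly213", "Arg214")),
    (2, 3.579, 0.913, ("Ala201", "Gly202", "Leu203")),
    (3, 3.483, 0.816, ("Leu177", "Gly178", "Val179")),
    (4, 3.000, 1.000, ("Asn15", "Thr16", "Leu17")),
    (5, 0.646, 0.646, ("Cys145", "Ala146", "Cys147", "Gln148")),
]


def significant_nests(nests, min_score=2.0):
    """Keep nests scoring above `min_score`, most conserved first."""
    kept = [nest for nest in nests if nest[1] > min_score]
    return sorted(kept, key=lambda nest: nest[2], reverse=True)


ranked = significant_nests(NESTS)
for number, score, conservation, residues in ranked:
    print(number, score, conservation, "-".join(residues))
```

Under these assumptions nest 4 (Asn15, Thr16, Leu17) comes first, in line with the caption of Table 2.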
Functional analysis
The MSA (Figure 16) for the query sequence and the other 35 sequences shows several conserved motifs. The 1st conserved motif
consists of an almost invariant region of aspartic acid (D), with only the 33rd protein (gi: |45552117|)
showing a gap. The 2nd motif shows conserved and invariant residues of leucine (L), threonine (T), asparagine (N) and glycine (G). The
3rd motif shows 2 invariant amino acid residues among lysine (K), proline (P), valine (V), glycine (G), aspartic acid (D) and
isoleucine (I). This agrees with the study done by Maliekal et al and strongly suggests that the query protein is a phosphatase.
Figure 16. MSA of the query protein Neu5Ac phosphatase with 35 other proteins. Only the 60th–70th and the
210th–300th alignment positions are shown to illustrate the conserved and invariant regions. The 3 boxed sequences
are either conserved or invariant regions.
Discussion
Multiple Sequence Alignment
From the MSA obtained, the organisms with the large gap insertions were found to be mainly Bacillus species, with the exception
of Symbiobacterium thermophilum. Symbiobacterium is an uncultivable thermophile isolated from compost. Its survival is based mainly on
microbial commensalism 5. This bacterium can only grow in vitro if it is co-cultured with Bacillus species
bacteria 5. This could therefore explain its genetic association with Bacillus, as observed from the sequence
alignment. Interestingly, however, Bacillus is classified as Gram-positive, while Symbiobacterium is a Gram-negative bacterium. As
observed from the sequence alignment, protein sequences from other Gram-negative bacteria (Vibrio species) do not contain the large gap
insertion at the 91st to 114th amino acid positions. Hence, more genetic (and
even functional) analysis might be necessary to determine the relationship between the hydrolase proteins of the Gram-positive Bacillus and the
Gram-negative Symbiobacterium.
Phylogenetic Tree
From the Rectangular Cladogram view of the tree, it was observed that there were two main Domains — Procaryotes and Eucaryotes. This would also
be the root and first branching point of the phylogenetic tree.
The invertebrates (of Phylum Arthropoda) would be the first branching point for the eucaryotes in this tree.
From there, further branching occurs into the vertebrates (of Phylum Chordata). This would then be further branched into Osteichthyes (bony
fish) and Tetrapoda (four-limbed vertebrates) Superclasses.
For the prokaryotic domain, mainly branching occurs between Gram-positive (Bacillus spp.) and Gram-negative (Vibrio spp.) bacteria.
Hence, it can be generally deduced that the Neu5Ac phosphatase (hydrolase) protein is not specific to any one evolutionary lineage, as it is
observed to be present in almost all main Phyla and Classes of organisms from the two main Procaryotic and Eucaryotic Domains. Its functional
significance would therefore be a general one.
Bootstrapping
Tree bootstrapping is necessary to test the reliability of the branching patterns and distances formed on the phylogenetic tree. This was
done by making "pseudoreplicates" of the multiple sequence alignment, up to 100 sets. The distance matrices were recalculated using these
pseudoreplicate alignments to generate a bootstrap tree, which can be used to compare the branching patterns and distances with the original
phylogenetic tree. The bootstrap values (in percentage) obtained on each branch signify branching confidence. Bootstrap values of 95% equate to
full branching confidence; a 75% value equates to 95% branching confidence; a 60% value equates to much lowered branching confidence; while a 50%
value would render no branching confidence.
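How a pseudoreplicate is made can be shown with a short sketch: alignment columns are drawn at random, with replacement, until a new alignment of the same length is obtained. The code below illustrates this general resampling idea and is not the Seqboot implementation; the toy alignment, the sequence names and the random seed are all invented for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by resampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment is long; repeats are allowed.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment (invented); every sequence has the same length.
toy = {"seqA": "LLTNGD", "seqB": "VLTNGE", "seqC": "LITNG-"}
rng = random.Random(1)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```

Because whole columns are resampled, the residues of different sequences stay aligned with each other in every pseudoreplicate.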
Functional Analysis
Figure 17. (A) List of all matched protein name terms for 2gfh. (B) List of all matched Gene Ontology terms for 2gfh. The score in
red is a measure of how strongly the term is predicted from the hits obtained by the different methods. The scores in blue show each
method’s contribution to the total score (with the number of relevant sequences/structures shown in brackets in grey).
(http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/)
The predicted function, based on evolution and structure, indicates that 2gfh is a hydrolase. Profunc searches (Figure 17) on 2gfh also
show that it possesses hydrolase activity. The highest-scoring Gene Ontology terms (Figure 17) state that it is involved in metabolism and possesses
phosphoglycolate phosphatase activity. A hydrolase is an enzyme which catalyzes a hydrolysis reaction (Figure 18), which is the addition of the
hydrogen and hydroxyl ions of water to a molecule with its consequent splitting into two or more simpler molecules. Hydrolase is the systematic
name for any enzyme of EC class 3.
Figure 18. A hydrolase catalyzes the hydrolysis of the chemical bond between A and B, resulting in 2 simpler molecules.
Neu5Ac phosphatase belongs to the HAD family. HAD is a vast superfamily of largely uncharacterized enzymes, with a few members shown to possess
phosphatase, phosphoglucomutase, phosphonatase, and dehalogenase activities 6. HAD-like hydrolases represent the largest family of
predicted small molecule phosphatases encoded in the genomes of bacteria, archaea, and eukaryotes, with 6,805 proteins in databases 7.
HADs share little overall sequence similarity (15–30% identity), but they can be identified by the presence of three short
conserved sequence motifs 7. Most of the characterized HADs have phosphatase activity (CO–P bond hydrolysis), while others catalyze dehalogenase
(C–halogen bond hydrolysis), phosphonatase (C–P bond hydrolysis), and phosphoglucomutase (CO–P bond hydrolysis and intramolecular
phosphoryl transfer) reactions 6.
In the study conducted by Maliekal et al (Figure 19), they compared the alignment of the first 280 amino acids of rat and human Neu5Ac-9-P
phosphatase with 2 other homologous sequences.
Figure 19. Alignment of rat and human Neu5Ac-9-P phosphatase with homologous sequences. The following sequences are aligned: Rattus
norvegicus (Rnor, gi-34859431), Homo sapiens (Hsap, gi-23308749), Xenopus laevis (Xlae, gi-46250196), Danio rerio (Drer, gi-
63101958), and Drosophila melanogaster (Dmel, gi-28381565). Only the first 280 residues of the latter sequence are shown. Completely
conserved residues are shown in boldface type. Asterisks indicate the extremely conserved residues in phosphatases of the HAD family 8.
The MSA done by Maliekal et al shows that the Neu5Ac-9-Pase orthologs share the three motifs found in phosphatases of the HAD family,
namely a 1st motif comprising two extremely conserved aspartates (D), a 2nd motif comprising a conserved serine (S) or
threonine (T), and a 3rd motif comprising a conserved lysine (K) and two conserved aspartates (D) 8. The first aspartate
in the first motif forms a phosphoaspartate during the catalytic cycle 9; 10. These findings suggested therefore that the HDHD4 protein
was a phosphatase. Our MSA (Figure 16) shows several conserved motifs that closely resemble those in the study done by Maliekal et al.
These findings therefore suggest that the 2gfh protein is a phosphatase.
Phosphatases of the HAD family are dependent on the presence of Mg²⁺, and Ca²⁺ inhibits
their activity by replacing Mg²⁺ and preventing the nucleophilic attack by the aspartate that covalently binds the
phosphate group 8. Phosphatases that form a phosphoenzyme during the catalytic cycle are inhibited by vanadate 11.
Vanadate (VO₄³⁻), formed when V₂O₅ is dissolved in water at alkaline pH, appears to inhibit enzymes
that process phosphate.
The presence of a protein sharing at least about 50% sequence identity with rat or human Neu5Ac-9-P phosphatase in the genomes of mammals,
chicken, Xenopus, and fishes indicates that sialic acid synthesis proceeds via the 9-phosphate intermediate in these species 8. This
is consistent with the finding that the genomes of vertebrates contain a gene encoding the bifunctional enzyme
UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase 8.
In bacteria, the E. coli genome encodes five membrane-bound and 23 soluble HAD-like hydrolases, representing about 40% of the E. coli
proteins with known or predicted small molecule phosphatase activity 12. The metabolites hydrolyzed by HADs are intermediates of
various metabolic pathways and reactions (glycolysis, pentose phosphate pathway, gluconeogenesis, and intermediary sugar and nucleotide
metabolism).
E. coli HADs hydrolyze a wide range of phosphorylated metabolites, including carbohydrates, nucleotides, organic acids, and coenzymes.
Studies have shown that the most common substrates are found in metabolism, in pathways such as glycolysis and the pentose phosphate pathway
(Figure 20). These substrates were fructose-1-phosphate, glucose-6-phosphate, mannose-6-phosphate, 2-deoxyglucose-6-phosphate,
fructose-6-phosphate, ribose-5-phosphate, and erythrose-4-phosphate 13.
Figure 20. The schematic diagrams of glycolysis and pentose phosphate metabolic pathways. The green arrows show the substrates that are hydrolyzed by HADs. (A) Glycolysis pathway with substrates that are hydrolyzed by HADs: glucose 6-phosphate, fructose 6-phosphate and dihydroxyacetone phosphate. (B) Pentose phosphate pathway with substrates that are hydrolyzed by HADs: glucose-6-phosphate, fructose-6-phosphate, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, gluconate 6-phosphate and erythrose-4-phosphate.
(http://www.steve.gb.com/science/core_metabolism.html)
Methods and Materials
Query Sequence
Sequences of N-acetylneuraminic acid phosphatase from House Mouse (Mus musculus) were obtained from Genbank protein database, with Accession number of 2GFH_A.
Sequence Homology
The query sequence was matched to related (amino acid sequence similarity) proteins from Blast. This was done using a fixed database stored
within a DVD, instead of obtaining the query search from the actual BlastP database on the World Wide Web.
Multiple Sequence Alignment
Alignment was performed on all the related proteins (from the BlastP search), using ClustalX. Similarly, the ClustalX programme used for this
was obtained from the DVD, instead of the website.
Phylogenetic Tree
The Phylip package was used to obtain a phylogenetic tree to determine the relationship of the proteins from individual
organisms. The various programmes used were again obtained from the DVD. Protdist (within Phylip) was used to calculate the distance matrix. The
calculation method selected was PAM-Dayhoff. Neighbor (also found within Phylip) was next used to form the phylogenetic tree, using the
distance matrix obtained. The "Input order of species" option was set to "Random" when generating the tree, with a random odd
number also given as the seed. The Treeview programme was used to view the final tree.
Bootstrapping
Seqboot (within Phylip) was used to replicate 100 samples of the sequence alignments.
The outfile (.aln) was then used in calculating the bootstrap distance matrices, using Protdist. The parameter settings for this calculation were
similar to those of the initial distance matrix calculation, using the PAM-Dayhoff method. An added parameter was the inclusion of multiple data
sets, of 100 replicates.
This outfile (.dis) was run through Neighbor. The parameter settings were again similar to the previous generation of the earlier phylogenetic
tree. An added parameter, as was with the bootstrap distance matrix calculations, was the inclusion of multiple data sets of 100 replicates.
The treefile (.ph) was run through Consense (within Phylip) to obtain the final bootstrapped phylogenetic tree. Bootstrap branch values were
also obtained to determine the reliability of the tree branches.
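The bootstrap branch values reported by a consensus step can be understood as the percentage of replicate trees in which a given group of organisms appears as a clade. The sketch below illustrates that counting only and is not the Consense algorithm: each replicate tree is represented simply as a set of clades, and the organism names and replicate contents are invented for the example.

```python
from collections import Counter


def clade_support(replicate_clades):
    """Percentage of replicate trees that contain each clade."""
    counts = Counter()
    for clades in replicate_clades:
        # Count each clade at most once per replicate tree.
        counts.update(set(clades))
    total = len(replicate_clades)
    return {clade: 100.0 * n / total for clade, n in counts.items()}


# Invented example: four replicate trees, each given as a set of clades.
mammals = frozenset({"mouse", "rat", "human"})
rodents = frozenset({"mouse", "rat"})
replicates = [{mammals, rodents}] * 3 + [{mammals}]
support = clade_support(replicates)
print(support[mammals], support[rodents])
```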
Replacing organism identifiers on phylogenetic tree
The Kenegdo server, an online World Wide Web programme, was used to convert organism identifiers within the tree to their species names.
Protein Folding
First, a DALI search was done to compare the 3D structure with those in the Protein Data Bank. It revealed that Neu5Ac-9-P phosphatase is a
haloacid dehalogenase-like hydrolase. The PDB was then searched for the structures of biological macromolecules and their
relationships to sequence, function, and disease. CE, which is a database and tool for 3-D protein structure comparison and alignment, was used
to compare the alignments between the query protein and its neighbours.
Sequence Similarity
InterProScan was then used to analyze the newly determined sequences for annotation of predicted proteins from genome sequencing projects. In
order to further analyze the protein, Pfam, which is a large collection of multiple sequence alignments and hidden Markov models, was used to
find Pfam family matches for the protein, in this case N-acetylneuraminic acid phosphatase.
The aim of using the ProFunc server is to help identify the likely biochemical function of a protein from its three-dimensional structure. It
uses a series of methods, including fold matching, residue conservation, surface cleft analysis, and functional 3D templates, to identify both
the protein’s likely active site and possible homologues in the PDB.
Surface Properties
RasMol, which is a molecular graphics program, was used for the visualisation of proteins, nucleic acids and small molecules, while PyMOL, a
molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality
molecular graphics images and animations, was used to assist in the research.
References
1. Lawrence, S. M., Huddleston, K. A., Pitts, L. R., Nguyen, N., Lee, Y. C., Vann, W. F., Coleman, T. A. & Betenbaugh, M. J. (2000). Cloning and expression of the human N-acetylneuraminic acid phosphate synthase gene with 2-keto-3-deoxy-D-glycero-D-galactonononic acid biosynthetic ability. J. Biol Chem 275, 17869–17877.
2. Schauer, R. (2000). Achievements and challenges of sialic acid research. Glycoconj. J 17, 485-499.
3. Varki, A. (1997). Sialic acids as ligands in recognition phenomena. FASEB J. 11, 248-255.
4. Angata, T. & Varki, A. (2002). Chemical diversity in the sialic acids and related alpha-keto acids: an evolutionary perspective. Chem Rev 102, 439-469.
5. European Bioinformatics Institute (2007). http://www.ebi.ac.uk/2can/genomes/bacteria/Symbiobacterium_thermophilum.html
6. Calderone, V., Forleo, C., Benvenuti, M., Thaller, M. C., Rossolini, G. M. & Mangani, S. (2004). The First Structure of a Bacterial Class B Acid Phosphatase Reveals Further Structural Heterogeneity Among Phosphatases of the Haloacid Dehalogenase Fold. J. Mol. Biol 335, 761–773.
7. Koonin, E. V. & Tatusov, R. L. (1994). A genomic perspective on protein families. J. Mol. Biol. 244, 125-132.
8. Maliekal, P., Vertommen, D., Delpierre, G. & Van Schaftingen, E. (2006). Identification of the sequence encoding N-acetylneuraminate-9-phosphate phosphatase. Glycobiology 16, 165–172.
9. Collet, J.-F., Stroobant, V. & Van Schaftingen, E. (1999). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107–14112.
10. Collet, J.-F., Stroobant, V., Pirard, M., Delpierre, G. & Van Schaftingen, E. (1998). A new class of phosphotransferases phosphorylated on an aspartate residue in an amino-terminal DXDX (T/V) motif. J. Biol Chem 273, 14107-14112.
11. Macara, I. G. (1980). Vanadium, an element in search of a role. Trends Biochem Sci 5, 92-94.
12. Keseler, I. M., Collado-Vides, J., Gama-Castro, S., Ingraham, J., Paley, S., Paulsen, I. T., Peralta-Gil, M. & Karp, P. D. (2005). EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33, D334–D337.
13. Kuznetsova, E., Proudfoot, M., Gonzalez, C. F., Brown, G., Omelchenko, M. V., Borozan, I., Carmel, L., Wolf, Y. I., Mori, H. & Yakunin, A. F. (2006). Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family. J. Biol Chem 281, 36149–36161.