Arylformamidase Sequence & Homology
Our query sequence, "Arylformamidase", is a putative thioesterase isolated from a Silicibacter sp. These organisms are best known for their ability to degrade sulfur compounds in the marine environment. The query sequence (Target 13, pdb:2pbl) is 262 residues in length.
Method
Using the query sequence Arylformamidase, a BLASTP search against bacterial protein sequences was performed using a non-redundant database. The top-scoring matches down to an E-value of 3e-054, 35 sequences in total, were selected. Homologous eukaryotic sequences were found using NCBI HomoloGene. These were appended to the list and a multiple sequence alignment was performed using CLUSTAL X.
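The E-value cut-off described above can be sketched as a small filter. This is only an illustration: it assumes the BLASTP hits were exported in the BLAST+ tabular layout (twelve tab-separated columns, with the subject ID in column 2 and the E-value in column 11), which the original work may not have used, and the function name `filter_hits` is hypothetical.

```python
def filter_hits(tabular_text, max_evalue=3e-54):
    """Keep (subject_id, evalue) pairs whose E-value is at or below the cut-off.

    Assumes BLAST+ tabular output: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((subject_id, evalue))
    return hits
```

The default threshold mirrors the 3e-054 value quoted in the text; the number of hits retained depends on the database searched.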
The multiple sequence alignment was bootstrapped 1000 times and a phylogenetic tree was created using the neighbour-joining algorithm. The program FigTree was used to create the visual representation of this tree (Figure 1).
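For readers unfamiliar with bootstrapping an alignment, the resampling step can be sketched as follows. This is not CLUSTAL X's implementation; it is a minimal illustration that assumes the alignment is held as a dictionary of equal-length gapped strings, and the function names are hypothetical. Each replicate is built by drawing alignment columns at random with replacement, and a tree would then be built from every replicate.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-alignment made by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in columns)
        for name in names
    }


def bootstrap_replicates(alignment, n=1000, seed=0):
    """Return n bootstrap replicates of the alignment (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]
```

The bootstrap value shown at a node is then the number (or fraction) of replicate trees in which the same grouping appears.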
A similar BLASTP search was performed using the human homologue of our query sequence. The 126 top-scoring matches were selected for a multiple sequence alignment; this was the minimum number of sequences that would also include the query sequence. The sequences were aligned and bootstrapped, and a tree was created as above. The tree revealed some questionable groupings, joining humans with pufferfish for instance, which, whilst evolutionarily interesting, pose more questions than answers.
The top-scoring sequences from the BLASTP search using the human homologue were appended to the original top-scoring sequences from the BLASTP search on the bacterial query sequence.
As above, a multiple sequence alignment was generated using CLUSTAL X, the data were bootstrapped 1000 times, and a phylogenetic tree was generated using the neighbour-joining algorithm (Figure 2).
Results
Figure 1 shows that the query sequence "Arylformamidase" grouped with bacterial sequences, shown coloured in Blue. The bootstrap values reveal low confidence in many of the nodes lower down on the phylogenetic tree, which may explain why certain closely related species are grouped into separate clades. However, despite low bootstrap scores, the grouping does reliably separate prokaryotes from eukaryotes, and the eukaryotes themselves are clearly divided into yeasts and moulds (shown in Green), plants (Dark Green), invertebrates (Orange) and vertebrates (shown in Red).
Figure 1.
Unrooted phylogenetic tree of the highest-scoring results from a BLASTP search of bacterial sequences using a non-redundant database, together with homologous eukaryotic sequences sourced from NCBI HomoloGene. Branch lengths are related to phylogenetic distance and node numbers refer to bootstrap values. On this tree "Arylformamidase" refers to the Silicibacter species from which our sequence originated. The colour coding distinguishes prokaryotic organisms (shown in Blue) from eukaryotic yeasts and moulds (shown in Green), plants (Dark Green), invertebrates (Orange) and vertebrates (shown in Red).
To further elucidate the phylogeny of 2pbl, its human homologue, Arylformamidase, was queried in a BLAST search. The top-scoring bacterial homologues, present in Figure 1, were combined with the top-scoring eukaryotic homologues. The human homologue, Arylformamidase, has a sequence similarity of 26.28% to 2pbl. Despite this low score, multiple sequence alignment revealed that key regions were highly conserved between bacterial and eukaryotic homologues.
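The text does not state how the 26.28% figure was calculated. As an illustration only, one common convention scores the percentage of identical residues over the aligned columns in which neither sequence has a gap; similarity scores that also credit conservative substitutions would give a different number. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        # Ignore any column containing a gap in either sequence.
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so the denominator should always be reported alongside the score.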
Figure 2 is largely consistent with traditional taxonomic groupings of organisms. Specifically, it reveals greater statistical confidence in the separation of prokaryotes (Blue and Green) and eukaryotes (invertebrates are shown in Orange; vertebrates are in Red).
Figure 2.
Unrooted phylogenetic tree of highest scoring results from a BLASTP search of bacterial sequences and highest scoring results of a BLASTP search on a homologous human sequence. Branch lengths are related to phylogenetic distance and node numbers refer to Bootstrap values. On this tree "Arylformamidase" refers to the Silicibacter species from which our sequence originated. The colour coding distinguishes prokaryotes (Blue and Green) and eukaryotes (invertebrates are shown in Orange; vertebrates are in Red).
In general, members of the same genus have been grouped together on these phylogenetic trees, with some notable exceptions. For instance, Silicibacter, the genus from which we derived our protein, occurs on disparate branches of the tree.
Discussion
The multiple sequence alignment revealed several conserved regions across all species, indicating a high level of conservation from Bacteria through Eukaryota. Most significantly, the catalytic triad of 137S, 215E/D and 242H, together with many associated residues which occur in the same structural area of the protein, is conserved across all species of prokaryotes and eukaryotes examined. These included vertebrates, invertebrates, yeasts, moulds and single-celled eukaryotes. This may indicate conservation of a functional group of residues within the protein. The catalytic triad is thought to be involved in thioesterase/carboxylesterase activity, though the function of the protein may vary between species.
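Checking triad conservation in an alignment requires mapping residue numbers in one sequence onto alignment columns, since gaps shift the positions. The sketch below assumes that the numbers 137, 215 and 242 refer to ungapped positions in the query sequence, and that the alignment is held as a dictionary of equal-length gapped strings; both the data layout and the function names are hypothetical.

```python
def column_for_residue(aligned_ref, residue_number):
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds reference sequence length")


def residues_at(alignment, ref_name, positions):
    """For each reference position, return the residue every sequence has in that column."""
    ref = alignment[ref_name]
    result = {}
    for pos in positions:
        col = column_for_residue(ref, pos)
        result[pos] = {name: seq[col] for name, seq in alignment.items()}
    return result
```

With a real alignment, calling `residues_at(alignment, "Arylformamidase", [137, 215, 242])` would list the residue each species carries at the three triad positions, making S, E/D and H conservation easy to tabulate.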
Given that the phylogeny of our protein is largely consistent with traditional taxonomic groupings of organisms, and that we can find no evidence of horizontal gene transfer, the delineations between prokaryotic and eukaryotic species allow us to infer that the dominant mode of inheritance is clonal, from bacteria to Plantae and Animalia.
References
NCBI HomoloGene: Arylformamidase
MicrobeWiki: Silicibacter pomeroyi