Results - 2gqnA: Difference between revisions
==Multiple Sequence Alignment== | |||
Most of the 500 BLAST hits are significant matches with extremely low E-values. Twenty-five of the hits were reported with an E-value of zero and were set aside; note that an E-value of zero marks a match too strong for the value to be represented (typically a near-identical sequence), not an insignificant one. Some sequences with nearly identical annotations were also dropped to simplify the alignment.
Because the homologues of the human sequence already span eukaryotes as well as many other organisms such as plants and microorganisms, the homologues of the bacterial sequence were not considered separately at this stage. I took 55 matches with extremely low E-values from the homologues of the human sequence, and from these constructed the multiple sequence alignment and a bootstrap tree.
The sequence CDLCDRIIIGDREWAAHIKSKSH shown in '''Figure 1 (D)''', which is thought to be a zinc finger (discussed further below), is found only in the human sequence and not in the bacterial one. It lies towards the C-terminus and is probably truncated in the bacterial sequence, which is why this region is not conserved in the multiple sequence alignment.
[[Image:RR1.png|framed|'''(A)'''|left]] | |||
[[Image:RR2.png|framed|'''(B)'''|left]] | |||
[[Image:RR3.png|framed|'''(C)'''|left|none]] | |||
[[Image:ZZZ.png|framed|'''(D)'''<P><B>Figure 1</B> : <BR>Multiple sequence alignment of 260 homologous sequences and 2qgnA (tRNA isopentenyltransferase 1; the 69th sequence, highlighted in the right-hand column), constructed with ClustalX. Gaps are represented as '-'. Orange, red, blue and green indicate the residue codes "A", "C", "T" and "G" respectively (Kohli and Bachhawat 2003). Conserved regions are shown in the black boxes. <B>(A)</B> Region from 750 bp to 870 bp. <B>(B)</B> Region from 880 bp to 1000 bp. <B>(C)</B> Region from 1160 bp to 1270 bp. <B>(D)</B> Region at the C-terminus.|none]]
==Tree== | |||
[[Image:TTT.PNG|framed|'''Figure 2'''<BR>A rooted bootstrap phylogenetic tree built from 100 bootstrap trials and viewed in <I>FigTree</I>. Asterisks indicate branches with low bootstrap values. The tree shows that the homologous sequences come from a wide range of organisms. <I>tRNA isopentenyl transferase 1</I> branches with the bacteria and, surprisingly, <I>Plasmodium</I> falls on the same branch as the bacteria.|none]]
Although a couple of branches are marked with asterisks, the phylogenetic tree shows that our protein sequence (tRNA isopentenyl transferase 1) is found across many types of species and is consistent with traditional taxonomic groupings (Figure 2). A notable exception is Plasmodium, an obligate eukaryotic parasite. Close homologues are detected across different domains of life (fungi, green plants, worms, unicellular organisms, bacteria and even some higher eukaryotes), indicating that the source of our gene may have been outside the Bacteria clade. The homologous sequences cover many bacterial phyla: Planctomycetes, Proteobacteria, Actinobacteria, Chloroflexi, Cyanobacteria, Aquificae and Firmicutes. The higher eukaryotes include human, mouse, cow, fly, platypus, frog, fish, honeybee and bird.
[[Image:plasm.png|framed|'''Figure 3'''<BR>A magnified view of the <I>Plasmodium</I> region of <B>Figure 2</B> with taxon names displayed. The values on the nodes are bootstrap values from a phylogenetic tree built with 100 bootstrap trials. Nodes with no value shown have a bootstrap value greater than 70.|none]]
In Figure 3, Plasmodium berghei and Plasmodium yoelii branch within the bacterial species. One possible reason is that lateral gene transfer has occurred in Plasmodium, so that its sequence groups with the bacteria instead of on the eukaryote branch. This is a notable outcome of this research; more advanced genome analysis will be required to determine the possible function of this protein.
'''Treeview and multiview''' | |||
[[Image:treeview and multiview.jpg]] | |||
=='''Structure of tRNA isopentenyltransferase'''==
'''Protein Sequence in FASTA format'''
>gi|152149497|pdb|2QGN|A Chain A, Crystal Structure Of Trna Isopentenylpyrophosphate Transferase (Bh2366) From Bacillus Halodurans, Northeast Structural Genomics Consortium Target Bhr41.
XKEKLVAIVGPTAVGKTKTSVXLAKRLNGEVISGDSXQVYRGXDIGTAKITAEEXDGVPHHLIDIKDPSE
SFSVADFQDLATPLITEIHERGRLPFLVGGTGLYVNAVIHQFNLGDIRADEDYRHELEAFVNSYGVQALH
DKLSKIDPKAAAAIHPNNYRRVIRALEIIKLTGKTVTEQARHEEETPSPYNLVXIGLTXERDVLYDRINR
RVDQXVEEGLIDEAKKLYDRGIRDCQSVQAIGYKEXYDYLDGNVTLEEAIDTLKRNSRRYAKRQLTWFRN
KANVTWFDXTDVDFDKKIXEIHNFIAGKLEEKSKLEHHHHHH
==Protein Structure==
[[Image:2qgnA3.png|framed|'''Figure 4'''<BR>Structure of tRNA isopentenyl transferase 1 showing helix, sheet and loop. Image constructed with PyMOL.|none]]<BR>
==Secondary Structure==
Analysis of the secondary structure acquired from the Protein Data Bank gave the results displayed below:
[[Image:Secondary.jpg|framed|none]]
==Surface Structure of 2qgn==
[[Image:surface view.jpg|framed|'''Figure 5''' Surface view of the protein with the ligand found in the cavity of the protein.|none]]
==Electrostatic Surface Potential==
[[Image:esp.jpg|framed|'''Figure 5.1''' Electrostatic surface potential of 2qgn. Positively charged regions are shown in blue and negatively charged regions in red.|none]]
==Surface Topography==
[[Image:pocket.jpg|framed|'''Figure 6''' Putative pocket in tRNA isopentenyl transferase. A total of 19 pockets were found in 2qgnA; the pocket with the largest area and volume is displayed in green.|none]]
==Domains== | |||
CATH analysis shows that 2qgnA is composed of two main domains.
Domain 1 comprises residues 2-200 and 283-314. Domain 2 comprises residues 201-282.
[[Image:domain pic3.jpg|framed|'''Figure 7''' <BR>The two main domains of tRNA isopentenyl transferase (2qgnA). Blue regions denote the first domain and red regions the second domain.|none]]<BR>
[[Image:domain--1.jpg|framed|left|'''Figure 8''' Ribbon structure of domain 2, shown by the red regions in Figure 7.]][[Image:domain--2.jpg|framed|left|'''Figure 9''' Ribbon structure of domain 1, shown by the blue regions in Figure 7.]]<BR>
<BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR> | |||
==Ligand Binding Sites and Surface Clefts==
[[Image:surface cleft.png|framed|none]]
==Protein-ligand interaction == | |||
'''Hydrophilic binding sites'''
[[Image:hydrophillic.jpg|framed|none]] | |||
'''Bridged-H-bond binding sites''' | |||
[[Image:H-bond.jpg|framed|none]] | |||
'''Hydrophobic binding sites''' | |||
[[Image:hydrophobic.jpg|framed|none]] | |||
==Conserved residues for tRNA isopentenyl transferase from Clustal alignment== | |||
Multiple sequence alignment with ClustalX identified the regions conserved between 2qgn and the related species.
[[Image:conserved regions.jpg|framed|'''Figure 5''' Conserved regions among the various species are shown in red, with their respective residues labelled. The yellow sphere shows the location of the ligand. Image constructed with PyMOL.|none]]<BR>
== Structural Alignment== | |||
'''Dali Output''' | |||
The PDB entry code for 2qgn was submitted to the DALI server to search for structurally similar neighbours. The results of the DALI search are displayed below:
[[Image:Dali output3.jpg|framed|none]]
The DALI output reports the following:
''Z score'': the statistical significance of the similarity between the protein of interest and its neighbouring proteins. The program optimises a weighted sum of similarities of intramolecular distances.
''Root Mean Square Deviation (RMSD)'': the root-mean-square deviation of C-alpha atoms in the least-squares superimposition of the structurally equivalent C-alpha atoms. As indicated by DALI, RMSD is not optimised and is reported for information only.
''lali'': the number of structurally equivalent residues.
''nres'': the total number of amino acids in the hit protein.
''%id'': the percentage of identical amino acids over the structurally equivalent residues.
A total of 527 hits were found in the DALI search; only the first 20 hits, which may be of significance, are shown in the figure.
'''Profunc''' | |||
''Related protein sequences'' | |||
[[Image:profunc2.jpg|framed|none]] | |||
''Proteins with similar fold retrieved from SSM (Secondary Structure Matching)''
[[Image:ssm2.jpg|framed|none]] | |||
From Profunc, the related proteins and the proteins with a similar fold to the query protein were compared with the results from DALI. 2qgnA is the query protein, highlighted in black in all tables. 3crm, 3crr and 3crq were found by both DALI and Profunc (highlighted in red). In addition, 2ze5, 2ze6, 2ze7 and 2ze8, as well as 3adk and 2qor, were also found in the DALI output (highlighted in blue).
Based on the DALI and Profunc results, the PDB file of each structurally similar protein was obtained from the PDB. Each was superimposed on 2qgn using PyMOL to compare structural similarity. The results are shown below:
[[Image:2qgn421.jpg|framed|left|'''Figure 10''' 3crq superimposed against 2qgn via PyMOL. 2qgn indicated in green.]][[Image:2qgn2.png|framed|left|'''Figure 11''' 3crm superimposed against 2qgn via PyMOL. 2qgn indicated in green.]]<BR> <BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR> | |||
[[Image:2qgn.png|framed|left|'''Figure 12''' 2ze7 superimposed against 2qgn via PyMOL. 2qgn indicated in green.]][[Image:2qor.jpg|framed|left|'''Figure 13''' 2qor superimposed against 2qgn via PyMOL. 2qgn indicated in green]]<BR> | |||
<BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR> | |||
As the figures above indicate, each of these structures is structurally similar to 2qgn, suggesting that they could have similar functional properties. Note, however, that 2qor is only partially similar to the 2qgn structure.
As the Z-score decreases down the DALI output, the structural similarity decreases as well. For this reason, functional analysis of 2qgn was carried out only for DALI hits with lali values higher than 200.
=='''Localisation Expression of tRNA isopentenyltransferase'''== | |||
Generally, this enzyme is expressed in all tissue types, since functional proteins must be synthesized in each of these tissues. Specifically, it is highly expressed in adipose tissue and in oocytes. Relatively high amounts of the enzyme are also expressed in the prostate, adrenal gland, B-cells and trachea. The higher concentrations of tRNA-IPT in these tissues may reflect higher levels of protein synthesis.
'''Molecular Function''' | |||
[[Image:PROKNOW2- molecular function.png]] | |||
'''Biological Process''' | |||
[[Image:PROKNOW-biological process.png]]
Both the molecular function and the biological process annotations were obtained from ProKnow.
=='''Annotations for tRNA isopentenyltransferase'''==
The annotations below show the cellular component, biological process and molecular function of the protein in plants. tRNA-IPT was first found in plants, where it is a very important hormone-related enzyme that affects plant growth and development.
[[Image:annotation1.JPG]] |
Latest revision as of 09:53, 10 June 2008
Multiple Sequence Alignment
Most of the 500 BLAST hits are significant matches with extremely low E-values. Twenty-five of the hits were reported with an E-value of zero and were set aside; note that an E-value of zero marks a match too strong for the value to be represented (typically a near-identical sequence), not an insignificant one. Some sequences with nearly identical annotations were also dropped to simplify the alignment.
Because the homologues of the human sequence already span eukaryotes as well as many other organisms such as plants and microorganisms, the homologues of the bacterial sequence were not considered separately at this stage. I took 55 matches with extremely low E-values from the homologues of the human sequence, and from these constructed the multiple sequence alignment and a bootstrap tree.
The sequence CDLCDRIIIGDREWAAHIKSKSH shown in Figure 1 (D), which is thought to be a zinc finger (discussed further below), is found only in the human sequence and not in the bacterial one. It lies towards the C-terminus and is probably truncated in the bacterial sequence, which is why this region is not conserved in the multiple sequence alignment.
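The hit selection described above (keep matches with very low E-values, drop entries whose annotation is nearly identical) could be sketched as follows. This is only an illustration: the `Hit` record, the accession names, the example E-values and the `1e-10` cutoff are all assumptions made for the sketch, not the values or tools used in this analysis.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str      # hypothetical identifier, not a real database entry
    description: str    # annotation line of the hit
    evalue: float       # BLAST E-value (smaller means more significant)


def select_hits(hits, max_evalue=1e-10):
    """Keep hits at or below the E-value cutoff, one per distinct annotation."""
    seen = set()
    kept = []
    # Visit the most significant hits first so they win over duplicates.
    for hit in sorted(hits, key=lambda h: h.evalue):
        key = hit.description.strip().lower()
        if hit.evalue <= max_evalue and key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


# Made-up example records, for illustration only.
example_hits = [
    Hit("hypo_1", "tRNA isopentenyltransferase", 0.0),
    Hit("hypo_2", "tRNA isopentenyltransferase", 3e-80),
    Hit("hypo_3", "tRNA delta(2)-isopentenylpyrophosphate transferase", 2e-45),
    Hit("hypo_4", "hypothetical protein", 0.5),
]
selected = select_hits(example_hits)
print([h.accession for h in selected])
```

In this sketch a hit with an E-value of zero sorts first and is retained, since zero is the most significant value BLAST can report; excluding such hits would need an explicit extra condition.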
Tree
Although a couple of branches are marked with asterisks, the phylogenetic tree shows that our protein sequence (tRNA isopentenyl transferase 1) is found across many types of species and is consistent with traditional taxonomic groupings (Figure 2). A notable exception is Plasmodium, an obligate eukaryotic parasite. Close homologues are detected across different domains of life (fungi, green plants, worms, unicellular organisms, bacteria and even some higher eukaryotes), indicating that the source of our gene may have been outside the Bacteria clade. The homologous sequences cover many bacterial phyla: Planctomycetes, Proteobacteria, Actinobacteria, Chloroflexi, Cyanobacteria, Aquificae and Firmicutes. The higher eukaryotes include human, mouse, cow, fly, platypus, frog, fish, honeybee and bird.
In Figure 3, Plasmodium berghei and Plasmodium yoelii branch within the bacterial species. One possible reason is that lateral gene transfer has occurred in Plasmodium, so that its sequence groups with the bacteria instead of on the eukaryote branch. This is a notable outcome of this research; more advanced genome analysis will be required to determine the possible function of this protein.
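To make the idea of "low bootstrap value" concrete, the sketch below flags nodes whose support is at or below a threshold, given how many of the 100 bootstrap trials recovered each node. The threshold of 70 follows the Figure 3 caption (nodes without a printed value have a bootstrap value greater than 70); treating that same number as the cutoff for an asterisk is an assumption, and the node labels and counts are invented.

```python
def low_support_nodes(bootstrap_counts, trials=100, threshold=70):
    """Return (node, support %) pairs whose support is at or below the threshold."""
    flagged = []
    for node, count in bootstrap_counts.items():
        support = 100.0 * count / trials  # percentage of trials recovering the node
        if support <= threshold:
            flagged.append((node, support))
    # Weakest nodes first.
    return sorted(flagged, key=lambda item: item[1])


# Invented counts for illustration; not taken from the tree in Figure 2.
example_counts = {"node_a": 98, "node_b": 41, "node_c": 70, "node_d": 12}
flagged = low_support_nodes(example_counts)
print(flagged)
```

A grouping such as Plasmodium within the bacteria is more convincing when the nodes that place it there are not in the flagged list.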
Treeview and multiview
Structure of tRNA isopentenyltransferase
Protein Sequence in FASTA format
>gi|152149497|pdb|2QGN|A Chain A, Crystal Structure Of Trna Isopentenylpyrophosphate Transferase (Bh2366) From Bacillus Halodurans, Northeast Structural Genomics Consortium Target Bhr41. XKEKLVAIVGPTAVGKTKTSVXLAKRLNGEVISGDSXQVYRGXDIGTAKITAEEXDGVPHHLIDIKDPSE SFSVADFQDLATPLITEIHERGRLPFLVGGTGLYVNAVIHQFNLGDIRADEDYRHELEAFVNSYGVQALH DKLSKIDPKAAAAIHPNNYRRVIRALEIIKLTGKTVTEQARHEEETPSPYNLVXIGLTXERDVLYDRINR RVDQXVEEGLIDEAKKLYDRGIRDCQSVQAIGYKEXYDYLDGNVTLEEAIDTLKRNSRRYAKRQLTWFRN KANVTWFDXTDVDFDKKIXEIHNFIAGKLEEKSKLEHHHHHH
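A FASTA record like the one above can be read with a few lines of code. The sketch below is a minimal parser plus a helper that lists the positions of `X`, the symbol FASTA uses for a residue not assigned to a standard amino acid; the function names are illustrative, and the example uses only the header identifier and the first sequence line of the record.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def unknown_positions(sequence, symbol="X"):
    """Return 1-based positions at which the given symbol occurs."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == symbol]


# Header shortened and only the first sequence line used, for brevity.
example = """>gi|152149497|pdb|2QGN|A
XKEKLVAIVGPTAVGKTKTSVXLAKRLNGEVISGDSXQVYRGXDIGTAKITAEEXDGVPHHLIDIKDPSE
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), unknown_positions(seq))
```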
Protein Structure
Secondary Structure
Analysis of the secondary structure acquired from the Protein Data Bank gave the results displayed below:
Surface Structure of 2qgn
Electrostatic Surface Potential
Surface Topography
Domains
CATH analysis shows that 2qgnA is composed of two main domains.
Domain 1 comprises residues 2-200 and 283-314. Domain 2 comprises residues 201-282.
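The domain boundaries above can be written down as residue ranges, which makes it easy to look up the domain of a given residue or to build a selection for colouring. This is a small sketch: the residue ranges come from the text, while the function names are illustrative and the `resi` selection string assumes PyMOL's usual range syntax.

```python
# Residue ranges as reported by the CATH analysis in the text.
DOMAINS = {
    "Domain 1": [(2, 200), (283, 314)],
    "Domain 2": [(201, 282)],
}


def domain_of(residue_number):
    """Return the domain name containing the residue, or None if unassigned."""
    for name, segments in DOMAINS.items():
        if any(start <= residue_number <= end for start, end in segments):
            return name
    return None


def domain_size(name):
    """Return the number of residues covered by a domain's segments."""
    return sum(end - start + 1 for start, end in DOMAINS[name])


def selection_string(name):
    """Build a residue selection such as 'resi 2-200+283-314' (assumed PyMOL syntax)."""
    return "resi " + "+".join(f"{start}-{end}" for start, end in DOMAINS[name])


for name in DOMAINS:
    print(name, domain_size(name), selection_string(name))
```

Note that Domain 1 is discontinuous in sequence: the chain leaves it at residue 200, forms Domain 2, and returns to Domain 1 at residue 283.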
Ligand Binding Sites and Surface Clefts
Protein-ligand interaction
Hydrophilic binding sites
Bridged-H-bond binding sites
Hydrophobic binding sites
Conserved residues for tRNA isopentenyl transferase from Clustal alignment
Multiple sequence alignment with ClustalX identified the regions conserved between 2qgn and the related species.
Structural Alignment
Dali Output
The PDB entry code for 2qgn was submitted to the DALI server to search for structurally similar neighbours. The results of the DALI search are displayed below:
The DALI output reports the following:
Z score: the statistical significance of the similarity between the protein of interest and its neighbouring proteins. The program optimises a weighted sum of similarities of intramolecular distances.
Root Mean Square Deviation (RMSD): the root-mean-square deviation of C-alpha atoms in the least-squares superimposition of the structurally equivalent C-alpha atoms. As indicated by DALI, RMSD is not optimised and is reported for information only.
lali: the number of structurally equivalent residues.
nres: the total number of amino acids in the hit protein.
%id: the percentage of identical amino acids over the structurally equivalent residues.
A total of 527 hits were found in the DALI search; only the first 20 hits, which may be of significance, are shown in the figure.
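Two of the quantities defined above, RMSD and %id, are simple to compute once the structurally equivalent residues are known. The sketch below assumes the two coordinate lists are already superimposed and paired residue by residue, so it does not perform the least-squares fit itself; the coordinates and sequences in the example are toy values, not data from 2qgn.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superimposed C-alpha coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the structurally equivalent positions."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b)
    return 100.0 * matches / len(aligned_a)


# Toy inputs for illustration only.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(toy_a, toy_b))
print(percent_identity("ACDE", "ACDF"))
```

In these terms, lali is the length of the paired lists, which is why RMSD and %id are only comparable between hits when lali is taken into account.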
Profunc
Related protein sequences
Proteins with similar fold retrieved from SSM (Secondary Structure Matching)
From Profunc, the related proteins and the proteins with a similar fold to the query protein were compared with the results from DALI. 2qgnA is the query protein, highlighted in black in all tables. 3crm, 3crr and 3crq were found by both DALI and Profunc (highlighted in red). In addition, 2ze5, 2ze6, 2ze7 and 2ze8, as well as 3adk and 2qor, were also found in the DALI output (highlighted in blue).
Based on the DALI and Profunc results, the PDB file of each structurally similar protein was obtained from the PDB. Each was superimposed on 2qgn using PyMOL to compare structural similarity. The results are shown below:
As the figures above indicate, each of these structures is structurally similar to 2qgn, suggesting that they could have similar functional properties. Note, however, that 2qor is only partially similar to the 2qgn structure.
As the Z-score decreases down the DALI output, the structural similarity decreases as well. For this reason, functional analysis of 2qgn was carried out only for DALI hits with lali values higher than 200.
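The selection rule above (keep only hits with lali higher than 200, most similar first) might look like this in code. The `DaliHit` record and every value in the example list are placeholders invented for the sketch; they are not rows from the DALI output for 2qgn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaliHit:
    chain: str      # placeholder label, not a real PDB chain
    z: float        # DALI Z score
    rmsd: float     # RMSD of equivalent C-alpha atoms
    lali: int       # number of structurally equivalent residues
    nres: int       # number of residues in the hit
    pct_id: float   # % identity over equivalent residues


def hits_for_functional_analysis(hits, min_lali=200):
    """Keep hits with lali above the cutoff, ordered by decreasing Z score."""
    kept = [h for h in hits if h.lali > min_lali]
    return sorted(kept, key=lambda h: h.z, reverse=True)


# Invented numbers for illustration only.
example_hits = [
    DaliHit("hit_A", 30.1, 2.4, 280, 310, 35.0),
    DaliHit("hit_B", 12.7, 3.1, 150, 190, 14.0),
    DaliHit("hit_C", 33.5, 2.0, 295, 305, 41.0),
    DaliHit("hit_D", 9.8, 3.6, 200, 240, 11.0),
]
shortlist = hits_for_functional_analysis(example_hits)
print([h.chain for h in shortlist])
```

The cutoff is strict, so a hit with exactly 200 equivalent residues is left out, matching "higher than 200" in the text.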
Localisation Expression of tRNA isopentenyltransferase
Generally, this enzyme is expressed in all tissue types, since functional proteins must be synthesized in each of these tissues. Specifically, it is highly expressed in adipose tissue and in oocytes. Relatively high amounts of the enzyme are also expressed in the prostate, adrenal gland, B-cells and trachea. The higher concentrations of tRNA-IPT in these tissues may reflect higher levels of protein synthesis.
Molecular Function
Biological Process
Both the molecular function and the biological process annotations were obtained from ProKnow.
Annotations for tRNA isopentenyltransferase
The annotations below show the cellular component, biological process and molecular function of the protein in plants. tRNA-IPT was first found in plants, where it is a very important hormone-related enzyme that affects plant growth and development.