Results - 2gqnA: Difference between revisions

Latest revision as of 09:53, 10 June 2008

Multiple Sequence Alignment

Most of the BLAST search results are significant matches with extremely low E-values. Twenty-five of the 500 matches were judged not significant and are ignored; they are recorded here as having an E-value of zero, but an E-value of zero normally marks the strongest matches, so this figure is probably a reporting error. Some similar sequences with nearly identical annotations are also dropped to ease the alignment.

Because the homologues of the human sequence already include eukaryotes as well as many other organisms such as plants and microorganisms, the bacterial sequence does not need to be considered at this stage. I took 55 matches with extremely low E-values from the homologues of the human sequence, and a multiple sequence alignment and a bootstrap tree were constructed.
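The selection of hits by E-value can be sketched as below. This is a minimal illustration, not the procedure actually used here: it assumes the hits were exported in BLAST's 12-column tabular format (E-value in the 11th column), and the cutoff of 1e-10, the function name `filter_hits` and the three example rows are hypothetical. Note that a hit with an E-value of 0.0 passes any such cutoff, because lower E-values mean stronger matches.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-10):
    """Return (subject_id, evalue) pairs with evalue <= max_evalue, best first.

    Assumes BLAST 12-column tabular output: column 2 is the subject id
    and column 11 is the E-value. The cutoff is an illustrative choice.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue <= max_evalue:
            hits.append((subject_id, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical rows, made up purely to exercise the function.
example = "\n".join([
    "query\thitA\t98.0\t300\t6\t0\t1\t300\t1\t300\t0.0\t600",
    "query\thitB\t45.0\t280\t150\t4\t5\t284\t3\t280\t3e-40\t160",
    "query\thitC\t25.0\t90\t60\t3\t10\t99\t20\t108\t0.5\t30",
])
print(filter_hits(example))
```

With these made-up rows, hitA and hitB are kept and hitC (E-value 0.5) is dropped.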

The sequence CDLCDRIIIGDREWAAHIKSKSH shown in Figure 1 (D) is deemed to be a zinc finger (discussed further below); it is found only in the human sequence and not in the bacterial one. It also lies towards the C-terminus and is probably truncated in the bacterial sequence, which is why this region is not conserved in the multiple sequence alignment.

(A)

(B)

(C)

(D)

Figure 1 :
Multiple sequence alignment of 260 homologous sequences and 2qgnA (tRNA isopentenyltransferase 1, the 69th sequence, highlighted in the right-hand column), constructed with ClustalX. Gaps are represented as '-'. Orange, red, blue and green indicate the residue codes "A", "C", "T" and "G" respectively (Kohli and Bachhawat 2003). Conserved regions are boxed in black. (A) Region from 750 bp to 870 bp. (B) Region from 880 bp to 1000 bp. (C) Region from 1160 bp to 1270 bp. (D) Region at the C-terminus.

Tree

Figure 2
A rooted bootstrap phylogenetic tree built with 100 bootstrap trials and viewed in FigTree. Asterisks mark branches with low bootstrap values. The tree shows that the homologous sequences come from a wide range of organisms. tRNA isopentenyl transferase 1 branches with the bacteria and, surprisingly, Plasmodium falls on the same branch as the bacteria.

Although a couple of branches are marked with asterisks, the phylogenetic tree shows that our protein sequence (tRNA isopentenyl transferase 1) is found across many types of species and is consistent with traditional taxonomic groupings (Figure 2). The notable exception is Plasmodium, an obligate eukaryotic parasite. Close homologues are detected in different domains of life (fungi, green plants, worms, unicellular organisms, bacteria and even some higher eukaryotes), indicating that the source of our genes may have been outside the Bacteria clade. The homologous sequences cover many bacterial phyla: Planctomycetes, Proteobacteria, Actinobacteria, Chloroflexi, Cyanobacteria, Aquificae and Firmicutes. The higher eukaryotes include human, mouse, cow, fly, platypus, frog, fish, honeybee and bird.

Figure 3
This is a magnified view of the Plasmodium region of Figure 2, with taxon names displayed. The values on the nodes are bootstrap values from a phylogenetic tree built with 100 bootstrap trials. Nodes with no value shown have a bootstrap value greater than 70.

In Figure 3, Plasmodium berghei and Plasmodium yoelii branch within the bacterial species. One possible reason is that lateral gene transfer has occurred in Plasmodium, so that its sequence groups with the bacteria instead of on the eukaryote branch. This is a remarkable outcome of this research; more advanced genome analysis will be required to determine the possible function of this protein.

Treeview and multiview

Structure of tRNA isopentenyltransferase

Protein Sequence in FASTA format

>gi|152149497|pdb|2QGN|A Chain A, Crystal Structure Of Trna Isopentenylpyrophosphate Transferase (Bh2366) From Bacillus Halodurans, Northeast Structural Genomics Consortium Target Bhr41.
XKEKLVAIVGPTAVGKTKTSVXLAKRLNGEVISGDSXQVYRGXDIGTAKITAEEXDGVPHHLIDIKDPSE
SFSVADFQDLATPLITEIHERGRLPFLVGGTGLYVNAVIHQFNLGDIRADEDYRHELEAFVNSYGVQALH
DKLSKIDPKAAAAIHPNNYRRVIRALEIIKLTGKTVTEQARHEEETPSPYNLVXIGLTXERDVLYDRINR
RVDQXVEEGLIDEAKKLYDRGIRDCQSVQAIGYKEXYDYLDGNVTLEEAIDTLKRNSRRYAKRQLTWFRN
KANVTWFDXTDVDFDKKIXEIHNFIAGKLEEKSKLEHHHHHH

Protein Structure

Figure 4
Structure of tRNA isopentenyl transferase 1 showing helix, sheet and loop. Image constructed from PyMOL.

Secondary Structure

The secondary structure analysis obtained from the Protein Data Bank is displayed below:

Surface Structure of 2qgn

Figure 5 Surface view of the protein, with the ligand located in the cavity of the protein.

Electrostatic Surface Potential

Figure 5.1 Electrostatic surface potential of 2qgn. Positively charged regions are shown in blue, negatively charged regions in red.

Surface Topography

Figure 6 Putative pocket in tRNA isopentenyl transferase. A total of 19 pockets were found in 2qgnA; the pocket with the largest area and volume is displayed in green.

Domains

CATH analysis of 2qgn shows that 2qgnA is composed of two main domains.

Domain 1 is discontinuous and covers residues 2-200 and 283-314. Domain 2 covers residues 201-282.
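Because domain 1 is made of two separate stretches of the chain, a residue's domain cannot be read off from a single cut-off position. The sketch below encodes the boundaries stated above as inclusive segments and looks up the domain for a residue number; the names `DOMAINS` and `domain_of` are illustrative only and are not part of CATH.

```python
# Domain boundaries as stated in the text (inclusive residue ranges).
DOMAINS = {
    1: [(2, 200), (283, 314)],  # discontinuous domain
    2: [(201, 282)],
}


def domain_of(residue):
    """Return the domain number containing `residue`, or None if unassigned."""
    for domain, segments in DOMAINS.items():
        if any(start <= residue <= end for start, end in segments):
            return domain
    return None


for residue in (150, 250, 300, 320):
    print(residue, domain_of(residue))
```

Residues outside every listed segment, such as positions beyond 314, are reported as unassigned rather than forced into a domain.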

Figure 7
The two main domains of tRNA isopentenyl transferase (2qgnA). Blue regions denote the first domain and red regions denote the second domain.

Figure 8 Ribbon structure of domain 2, shown as the red regions in Figure 7.

Figure 9 Ribbon structure of domain 1, shown as the blue regions in Figure 7.

Ligand Binding Sites and Surface Clefts

Protein-ligand interaction

Hydrophilic binding sites

File:Hydrophillic.jpg

Bridged-H-bond binding sites

File:H-bond.jpg

Hydrophobic binding sites

File:Hydrophobic.jpg

Conserved residues for tRNA isopentenyl transferase from Clustal alignment

The multiple sequence alignment from ClustalX was used to identify regions conserved between 2qgn and the related species.

Figure 5 Regions conserved among the various species are shown in red, with their respective residues labelled. The yellow sphere shows the location of the ligand. Image constructed with PyMOL.

Structural Alignment

Dali Output

The PDB entry code for 2qgn was submitted to the DALI server to search for structurally similar neighbours. The results of the DALI search are displayed below:

The DALI output reports the following:

Z score, the statistical significance of the similarity between the protein of interest and its structural neighbours. The program optimises a weighted sum of similarities of intramolecular distances.

Root mean square deviation (RMSD), the root-mean-square deviation of C-alpha atoms in the least-squares superimposition of the structurally equivalent C-alpha atoms. As indicated in DALI, the RMSD is not optimised and is only reported for information.

lali, the number of structurally equivalent residues.

nres, the total number of amino acids in the hit protein.

%id - percentage of identical amino acids over structurally equivalent residues.
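To make the RMSD and %id definitions concrete, the sketch below computes both for a set of structurally equivalent residues. It is an illustration of the definitions only, not DALI's implementation: it assumes the two C-alpha coordinate lists have already been superimposed and paired residue by residue, and the function names and example values are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def percent_identity(seq_a, seq_b):
    """Percentage of identical amino acids over paired equivalent residues."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Toy values chosen only to exercise the functions.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
print(percent_identity("ACDE", "ACDF"))
```

In both functions the length of the paired lists plays the role of lali, the number of structurally equivalent residues.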

A total of 527 hits were found by the DALI search, but only the first 20 hits, which may be of significance, are shown in the figure.

Profunc

Related protein sequences

Proteins with similar fold retrieved from SSM (Secondary Structure Matching)

The related proteins and the proteins with a similar fold reported by Profunc were compared with the results from DALI. 2qgnA, the query protein, is highlighted in black in all tables. 3crm, 3crr and 3crq were found in both DALI and Profunc (highlighted in red). In addition, 2ze5, 2ze6, 2ze7 and 2ze8, as well as 3adk and 2qor, were also found in the DALI output (highlighted in blue).

Based on the outcome of DALI and Profunc, the PDB file of each structurally similar protein was obtained from the PDB. Each was superimposed on 2qgn using PyMOL to compare the structural similarity. The results are shown below:

Figure 10 3crq superimposed against 2qgn via PyMOL. 2qgn indicated in green.

Figure 11 3crm superimposed against 2qgn via PyMOL. 2qgn indicated in green.

Figure 12 2ze7 superimposed against 2qgn via PyMOL. 2qgn indicated in green.

Figure 13 2qor superimposed against 2qgn via PyMOL. 2qgn indicated in green.

As the figures above indicate, each structure is structurally similar to 2qgn, suggesting that they could have functionally similar properties. Note, however, that 2qor is only partially similar to the 2qgn structure.

As the Z-score decreases down the DALI output, the structural similarity decreases as well. For this reason, functional analysis of 2qgn was only carried out for DALI hits with lali values higher than 200.
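The lali cut-off described above can be sketched as a simple filter over the DALI columns (Z, RMSD, lali, nres, %id). This is a minimal illustration under stated assumptions: the `DaliHit` record, the `select_hits` function and every value in `example_hits` are hypothetical placeholders, not figures taken from the actual DALI output.

```python
from typing import NamedTuple


class DaliHit(NamedTuple):
    chain: str     # hit identifier (placeholder names below)
    z: float       # Z score
    rmsd: float    # RMSD over equivalent C-alpha atoms
    lali: int      # number of structurally equivalent residues
    nres: int      # total residues in the hit protein
    pct_id: float  # % identity over equivalent residues


def select_hits(hits, min_lali=200):
    """Keep hits with lali strictly above min_lali, highest Z score first."""
    kept = [hit for hit in hits if hit.lali > min_lali]
    return sorted(kept, key=lambda hit: hit.z, reverse=True)


# Placeholder records; the numbers are invented for illustration only.
example_hits = [
    DaliHit("hitA", 30.0, 2.5, 280, 310, 35.0),
    DaliHit("hitB", 12.0, 3.4, 150, 200, 15.0),
    DaliHit("hitC", 33.0, 2.1, 290, 320, 40.0),
]
for hit in select_hits(example_hits):
    print(hit.chain, hit.z, hit.lali)
```

Filtering on lali and then ordering by Z score keeps the hits that align over most of the chain while still ranking them by DALI's significance measure.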

Localisation Expression of tRNA isopentenyltransferase

Generally, this enzyme is expressed in all tissue types, since it is important that functional proteins are synthesized in each of these tissues. Specifically, it is highly expressed in adipose tissue and in oocytes. Relatively high amounts of the enzyme are also expressed in the prostate, adrenal gland, B-cells and trachea. The higher concentrations of tRNA-IPT in these tissues may reflect higher levels of protein synthesis.

Molecular Function

Biological Process

Both the molecular function and the biological process annotations were obtained from ProKnow.

Annotations for tRNA isopentenyltransferase

The annotations below show the cellular component, biological process and molecular function of the protein in plants. tRNA-IPT was first found in plants, where it is a very important hormone enzyme that affects plant growth and development.

@@ Line 1: / Line 1: @@
+==Multiple Sequence Alignment==
+Majority of the blast search results have significant match (extremely low E value), except 25 out of the 500 matches have E-value of zero which means 25 of them are not significant and will be ignored. Some of the similar sequences with nearly identical annotation will be drop out to ease alignment.
+Due to the fact that the human sequence contains eukaryotes as well as many other organisms like plants and microorganisms so the bacteria sequence will not be necessary to be considered at this stage. I have taken 55 matches from the human sequence homolog with extremely low E value. The multiple sequence alignment and a bootstrap tree was constructed
+The sequence CDLCDRIIIGDREWAAHIKSKSH shown in '''Figure 1 (D)''' is deemed to be zinc finger (further discussion will be detailed below) are only found in human sequence and not in bacteria. Moreover, it is found towards the c-terminus and probably truncated in the bacteria sequence. This is the reason why the particular region is not conserved in the multiple sequence alignment.
+[[Image:RR1.png|framed|'''(A)'''|left]]
+[[Image:RR2.png|framed|'''(B)'''|left]]
+[[Image:RR3.png|framed|'''(C)'''|left|none]]
+[[Image:ZZZ.png|framed|'''(D)'''<P><B>Figure 1</B> : <BR>Multiple sequence alignment with 260 homologous sequences and 2qgnA (tRNA isopentenyltransferase 1, 69th sequence which is highlighted on the right hand column) was constructed by ClustalX. Gaps are represented as ‘-‘.  Orange, red, blue and green indicate residue code of “A”, “C”, “T” and “G” respectively (Kohli and Bachhawat 2003). Conserved regions are shown in the black box. <B>(A)</B> region from 750bp to 870 bps <B>(B)</B> region from 880bp to 1000bp <B>(C)</B> region from 1160bp to 1270bp <B>(D)</B>region on the c-terminus.|none]]
+==Tree==
+[[Image:TTT.PNG|framed|'''Figure 2'''<BR>A rooted bootstrap phylogenetic tree with 100 bootstrap trials viewed in <I>FigTree</I>. The asterik indicates branches with low bootstrap value. This tree shows homologous sequences are from wide range of different organisms. <I>tRNA isopentenyl transferase 1 </I> is branched with the bacteria and surprisingly plasmodium is also from the same branch as bacteria. |none]]
+Although there are a couple of branches with asterisks, the phylogenetic tree reflects that our protein sequence (tRNA isopentenyl transferase 1) are found across many types of species and consistent with tradition taxonomic groupings (shown in Figure 2). However, notable exception with plasmodium which is obligate eukaryotic parasites. The close homologues are detected in different life domains (fungi, green plant, worms, unicellular organisms, bacteria and even in some higher eukaryote), indicating that the source of our genes may have been outside the Bacteria clade. The homologous sequences contains many different phylum of bacteria, they are Planctomycetes, Proteobacteria, Actinobacteria, Chloroflexi, Proteobacteria, cyanobacteria, Aquificae Bacteria  and Firmicutes Bacteria. The higher eukaryote organisms include human, mouse, cow, fly, Platypus , frog, fish, honeybee and bird.
+[[Image:plasm.png|framed|'''Figure 3'''<BR>This is a magnified version of <B>Figure 2</B> on <I>Plasmodium</I> with taxa name displayed. The values on the nodes indicate bootstrap values drawn from a phylogenetic tree with 100 bootstrap value. Node with no value presented has bootstrap value greater than 70.|none]]
+In Figure 3 Plasmodium berghei and Plasmodium yoelii are branched within the bacteria species, one possible reason may be lateral gene transfer has occurred for plasmodium so there is a mix up for it being consider as bacteria instead of in the eukaryote branch. This is a remarkable outcome in this research, advance genome analysis will be required for to determine the possible function for this protein.
+'''Treeview and multiview'''
+[[Image:treeview and multiview.jpg]]
 =='''Structure of tRNA isopentenyltransferase'''==
 '''Protein Sequence in FASTA format'''
 >gi|152149497|pdb|2QGN|A Chain A, Crystal Structure Of Trna Isopentenylpyrophosphate Transferase (Bh2366) From Bacillus Halodurans, Northeast Structural Genomics Consortium Target Bhr41.
 XKEKLVAIVGPTAVGKTKTSVXLAKRLNGEVISGDSXQVYRGXDIGTAKITAEEXDGVPHHLIDIKDPSE
@@ Line 9: / Line 44: @@
 ==Protein Structure==
-[[Image:2qgnA.png|framed|'''Figure 1'''<BR>Structure of TRNA isopentenyl transferase 1 showing helix, sheet and loop. Image constructed from PyMOL.|none]]<BR>
+[[Image:2qgnA3.png|framed|'''Figure 4'''<BR>Structure of tRNA isopentenyl transferase 1 showing helix, sheet and loop. Image constructed from PyMOL.|none]]<BR>
-==Structural Analysis==
+==Secondary Structure==
 Analysis of the secondary structure acquired from Protein Data Bank showed results as displayed below :
 [[Image:Secondary.jpg|framed|none]]
-==Surface Properties of 2qgn==
+==Surface Structure of 2qgn==
-[[Image:surface prop.png|framed|'''Figure 2'''<BR>Surface property displayed by 2qgn. Red colour indicates negatively charged regions, while blue colour indicates positively charged regions.|none]]<BR>
+[[Image:surface view.jpg|framed|'''Figure 5''' Surface view of protein with ligand found in the cavitiy of the protein.|none]]
+==Electrostatic Surface Potential==
+[[Image:esp.jpg|framed|'''Figure 5.1''' Electrostatic surface potential of 2qgn. Positve regions indicated by blue, negatively charged in red.|none]]
+==Surface Topography==
+[[Image:pocket.jpg|framed|'''Figure 6''' Putative pocket in tRNA isopentenyl transferase. A total of 19 pockets were found in 2qgnA, displayed in green is the pocket with the largest area and volume.|none]]
+==Domains==
+qgnA is composed of two main domains. CATH analysis of 2qgn resulted in the finding of two main domains composing 2qgnA.
+Domain 1 ranges from residue 2-200 and residue 283-314. Domain 2 encompasses residues stretching from 201-282.
+[[Image:domain pic3.jpg|framed|'''Figure 7''' <BR>Two main domains exhibited by tRNA isopentenyl transferase(2qgnA). Blue regions denote first domain while Red regions underlies second domain.|none]]<BR>
+[[Image:domain--1.jpg|framed|left|'''Figure 8''' Ribbon structure of domain 2 signified by red regions in Figure 4.]][[Image:domain--2.jpg|framed|left|'''Figure 9''' Ribbon structure of domain 1 denoted by blue regions in Figure 4.]]<BR>
+<BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>
 ==Ligand Binding Sites and Surface Clefts==
@@ Line 24: / Line 75: @@
 [[Image:surface cleft.png|framed|none]]
+==Protein-ligand interaction ==
+'''Hydrophillic binding sites'''
+[[Image:hydrophillic.jpg|framed|none]]
+'''Bridged-H-bond binding sites'''
+[[Image:H-bond.jpg|framed|none]]
+'''Hydrophobic binding sites'''
+[[Image:hydrophobic.jpg|framed|none]]
+==Conserved residues for tRNA isopentenyl transferase from Clustal alignment==
+Multiple sequence alignment from ClustalX allowed conserved regions in 2qgn and related species to be found.
+[[Image:conserved regions.jpg|framed|'''Figure 5''' Conserved regions among various species were shown in red, with their respective residues labelled. Yellow sphere shows the location of the ligand. Image was constructed from PyMOL. |none]]<BR>
+== Structural Alignment==
+'''Dali Output'''
+PDB entry code for 2qgn was loaded onto DALI server to search for structurally similar neighbours. Displayed below are the results from DALI search :-
-'''Localisation Expression of tRNA isopentenyltransferase'''
+[[Image:Dali output3.jpg|framed|none]]
+DALI output describes the following :
+''Z score'' , the statistical significance of the similarity between protein-of-interest and other neighbourhood protines. The program optimises a weighted sum of similarities of intramolecular distances.
+''Root Mean Square Distance (RMSD)'', root-mean-square deviation of C-alpha atoms in the least-squares superimposition of the structurally equivalent C-alpha atoms. As in indicated in DALI, rmsd is not optimised and is only reported for information.
+''lali'', the number of structurally equivalent residues.
+''nres'', or the total number of amino acids in the hit protein.
+''%id'' - percentage of identical amino acids over structurally equivalent residues.
+A total of 527 hits were found from DALI search, nonetheless only the first 20 hits that may be of significance were shown on the figure.
+'''Profunc'''
+''Related protein sequences''
+[[Image:profunc2.jpg|framed|none]]
+''Proteins with similar fold retrived from SSM (Secondary Structure Matching)''
+[[Image:ssm2.jpg|framed|none]]
+From Profunc, similarities of related proteins and proteins with similar fold to query protein were compared with results from DALI. 2qgnA is the query protein highlighted in black in all tables. 2crm, 2crr and 2crq were both found in DALI and Profunc(highlighted in red). On the other hand, 2ze5,2ze6,2ze7 and 2ze8, as well as 3adk and 2qor were also found in DALI output(highlighted in blue).
+Based on the outcome of DALI and Profunc, PDB files of each structurally similar protein was obtained from PDB. These were each superimposed against 2qgn using the PyMOL software, to compare the structural similiarity. Results are as below :
+[[Image:2qgn421.jpg|framed|left|'''Figure 10''' 3crq superimposed against 2qgn via PyMOL. 2qgn indicated in green.]][[Image:2qgn2.png|framed|left|'''Figure 11''' 3crm superimposed against 2qgn via PyMOL. 2qgn indicated in green.]]<BR> <BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>
+[[Image:2qgn.png|framed|left|'''Figure 12''' 2ze7 superimposed against 2qgn via PyMOL. 2qgn indicated in green.]][[Image:2qor.jpg|framed|left|'''Figure 13''' 2qor superimposed against 2qgn via PyMOL. 2qgn indicated in green]]<BR>
+<BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR><BR>
+As indicated by the figures above, each structures were structurally similar to 2qgn, suggesting that they could have functionally similar properties. Nonetheless,notice that 2-qor is only partially similar to 2qgn structure.
+As the Z-score decreases for the DALI output, the structural similarity decreases as well. For this reason, functional analysis of 2qgn was only done for DALI outputs with lali scores higher than 200.
+=='''Localisation Expression of tRNA isopentenyltransferase'''==
 Generally, this enzyme is expressed in all tissue types since it is important that functional protein are synthesized in each of these tissues. Specifically, it is highly expressed in adipose tissues as well as oocytes. Relatively high amounts of this enzyme is expressed in prostate, adrenal gland, B-cells and trachea. The reason why tRNA-IPT are at higher concentrations in these tissues may reflect  higher levels of protein synthesis.
@@ Line 33: / Line 145: @@
-=='''Domain and Structural Analysis'''==
-There are two main domains regarding 2qgnA.
-[[Image:domain--1.jpg]]
-[[Image:domain--2.jpg]]
+'''Molecular Function'''
-'''Structural Elements and Functional Binding Sites of tRNA isopentenyltransferase'''
+[[Image:PROKNOW2- molecular function.png]]
+'''Biological Process'''
--Functional Sites Found By Pattern Search
- Table 1
--Functional Sites Found by Sequence Conservation In Structurally Related Proteins
+[[Image:PROKNOW-biological process.png]]
--Functional Sites Found by Structure Conservation In Structurally Related Proteins
+Both the molecular function and biological proceses are obtained from ProKnow.
--Multiple Sequence Alignment
+=='''Annotations for tRNA isopentenyltransferase'''==
+The annotations below shows the cellular, biological processes and functional function of the protein in plants. tRNA-IPT was first found in plants and it is a very important hormaone enzyme that affects plant growth and development.
- Tree
+[[Image:annotation1.JPG]]