2gnx Results
Evolutionary analysis
Structural analysis
An analysis of the secondary structure predicted from the amino acid sequence (Figure 1) shows how the secondary-structure elements are arranged across the different regions of the protein.
Table 1: Dali analysis of the 2GNX protein
NR | STRID1 | STRID2 | Z | RMSD | LALI | LSEQ2 | %IDE | REVERS | PERMUT | NFRAG | TOPO | PROTEIN |
1 | 3023-A | 2gnx-A | 42.9 | 0.0 | 280 | 280 | 100 | 0 | 0 | 1 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro |
2 | 3023-A | 2cmr-A | 5.7 | 3.5 | 114 | 192 | 11 | 0 | 0 | 11 | S | IMMUNOGLOBULIN COMPLEX d5 (fab heavy chain) d5 (fab li |
3 | 3023-A | 1j3w-A | 5.7 | 3.2 | 99 | 134 | 12 | 0 | 0 | 9 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION giding protein-m |
4 | 3023-A | 1jmr-A | 5.5 | 3.0 | 94 | 246 | 9 | 0 | 0 | 12 | S | |
5 | 3023-A | 1f5m-B | 5.5 | 5.0 | 107 | 177 | 9 | 0 | 0 | 13 | S | SIGNALING PROTEIN gaf (saccharomyces cerevisiae) yeas |
6 | 3023-A | 1vcs-A | 5.0 | 4.7 | 82 | 102 | 9 | 0 | 0 | 8 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION vesicle transpor |
7 | 3023-A | 1kt0-A | 4.9 | 2.8 | 81 | 357 | 6 | 0 | 0 | 7 | S | ISOMERASE 51 kda fk506-binding protein (fkbp51) Mutant |
8 | 3023-A | 1e2a-A | 4.9 | 4.5 | 80 | 102 | 9 | 0 | 0 | 6 | S | TRANSFERASE enzyme iia (enzyme iii, lactose-specific i |
9 | 3023-A | 2d2s-A | 4.8 | 3.1 | 75 | 217 | 11 | 0 | 0 | 5 | S | ENDOCYTOSIS/EXOCYTOSIS exocyst complex component exo84 |
10 | 3023-A | 2oew-A | 4.7 | 2.8 | 119 | 358 | 8 | 0 | 0 | 12 | S | PROTEIN TRANSPORT programmed cell death 6-interacting |
11 | 3023-A | 1h3q-A | 4.7 | 4.2 | 92 | 140 | 4 | 0 | 0 | 11 | S | TRANSPORT sedlin (sedl) (mus musculus) mouse S.B.Jan |
12 | 3023-A | 2oev-A | 4.5 | 36.5 | 151 | 697 | 7 | 0 | 0 | 14 | S | PROTEIN TRANSPORT programmed cell death 6-interacting |
13 | 3023-A | 2cwy-A | 4.5 | 2.4 | 82 | 92 | 20 | 0 | 0 | 7 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro |
14 | 3023-A | 2c5i-T | 4.5 | 2.8 | 75 | 93 | 11 | 0 | 0 | 5 | S | PROTEIN TRANSPORT/COMPLEX t-snare affecting a late gol |
15 | 3023-A | 3nul | 4.4 | 3.4 | 93 | 130 | 5 | 0 | 0 | 11 | S | ACTIN-BINDING PROTEIN profilin i (arabidopsis thalian |
The Dali search with the full 2GNX chain (Table 1) was inconclusive: apart from the self-match (Z = 42.9), no hit scored above Z = 5.7, so there were no significant structural matches to this hypothetical protein.
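The reading of Table 1 above can be made explicit in a few lines of Python. This sketch drops the self-match and classifies the best remaining hit with the Z-score rule of thumb from the Dali documentation (Z > 20: definitely homologous; 8 to 20: probably homologous; 2 to 8: grey area; below 2: not significant). Those bands are guidelines rather than hard cut-offs, only the first five rows of Table 1 are copied in, and the function names are hypothetical.

```python
# First five rows of Table 1 as (STRID2, Z, RMSD, LALI, %IDE).
TABLE1_HITS = [
    ("2gnx-A", 42.9, 0.0, 280, 100),
    ("2cmr-A", 5.7, 3.5, 114, 11),
    ("1j3w-A", 5.7, 3.2, 99, 12),
    ("1jmr-A", 5.5, 3.0, 94, 9),
    ("1f5m-B", 5.5, 5.0, 107, 9),
]


def dali_band(z: float) -> str:
    """Map a Dali Z-score onto rule-of-thumb bands (guidelines, not hard cut-offs)."""
    if z > 20:
        return "definitely homologous"
    if z >= 8:
        return "probably homologous"
    if z >= 2:
        return "grey area"
    return "not significant"


def best_non_self_hit(hits):
    """Drop the self-match (RMSD 0.0 and 100 % identity) and return the highest-Z hit left.
    On ties, max() keeps the first row, i.e. Dali's own ordering."""
    others = [h for h in hits if not (h[2] == 0.0 and h[4] == 100)]
    return max(others, key=lambda h: h[1])


best = best_non_self_hit(TABLE1_HITS)
print(best[0], best[1], dali_band(best[1]))
```

For these rows the best non-self hit is 2cmr-A at Z = 5.7, which falls in the grey area, in line with calling the search inconclusive.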
Table 2: Dali analysis of N-terminal domain
NR | STRID1 | STRID2 | Z | RMSD | LALI | LSEQ2 | %IDE | REVERS | PERMUT | NFRAG | TOPO | PROTEIN |
1 | 3256-A | 2gnx-A | 23.2 | 0.0 | 173 | 280 | 100 | 0 | 0 | 1 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro |
2 | 3256-A | 1e2a-A | 7.5 | 4.5 | 80 | 102 | 9 | 0 | 0 | 6 | S | TRANSFERASE enzyme iia (enzyme iii, lactose-specific i |
3 | 3256-A | 1kt0-A | 7.4 | 2.8 | 81 | 357 | 6 | 0 | 0 | 7 | S | ISOMERASE 51 kda fk506-binding protein (fkbp51) Mutant |
4 | 3256-A | 2d2s-A | 7.3 | 3.1 | 75 | 217 | 11 | 0 | 0 | 5 | S | ENDOCYTOSIS/EXOCYTOSIS exocyst complex component exo84 |
5 | 3256-A | 1vcs-A | 7.3 | 4.7 | 78 | 102 | 9 | 0 | 0 | 7 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION vesicle transpor |
6 | 3256-A | 2cmr-A | 6.9 | 3.2 | 104 | 192 | 11 | 0 | 0 | 9 | S | IMMUNOGLOBULIN COMPLEX d5 (fab heavy chain) d5 (fab li |
7 | 3256-A | 2c5i-T | 6.9 | 2.8 | 75 | 93 | 11 | 0 | 0 | 5 | S | PROTEIN TRANSPORT/COMPLEX t-snare affecting a late gol |
8 | 3256-A | 2h7o-A | 6.8 | 3.0 | 81 | 270 | 5 | 0 | 0 | 7 | S | SIGNALING PROTEIN protein kinase ypka fragment (protei |
9 | 3256-A | 2h7v-C | 6.6 | 4.2 | 76 | 269 | 13 | 0 | 0 | 5 | S | SIGNALING PROTEIN migration-inducing protein 5 (ras-re |
10 | 3256-A | 2dnx-A | 6.5 | 4.9 | 80 | 130 | 6 | 0 | 0 | 6 | S | TRANSPORT PROTEIN syntaxin-12 fragment (homo sapiens) |
11 | 3256-A | 1hg5-A | 6.5 | 3.2 | 85 | 263 | 9 | 0 | 0 | 6 | S | ENDOCYTOSIS clathrin assembly protein short form frag |
12 | 3256-A | 1a17 | 6.4 | 2.5 | 71 | 159 | 3 | 0 | 0 | 5 | S | HYDROLASE serine/THREONINE PROTEIN PHOSPHATASE 5 fragme |
13 | 3256-A | 2if4-A | 6.3 | 2.5 | 82 | 258 | 7 | 0 | 0 | 7 | S | SIGNALING PROTEIN atfkbp42 fragment (twd1 (twisted dwa |
14 | 3256-A | 1owa-A | 6.2 | 3.3 | 76 | 156 | 12 | 0 | 0 | 6 | S | CYTOKINE spectrin alpha chain, erythrocyte fragment (e |
15 | 3256-A | 2oew-A | 6.1 | 2.8 | 119 | 358 | 8 | 0 | 0 | 12 | S | PROTEIN TRANSPORT programmed cell death 6-interacting |
A separate Dali search with only the N-terminal domain (Table 2) also produced no significant structural matches; the best non-self hit, 1e2a-A, scored Z = 7.5.
A CE alignment between 2CMR (IMMUNOGLOBULIN COMPLEX d5) and 2GNX was performed. It showed that the C-terminus of 2GNX matched chain A of 2CMR, a transmembrane glycoprotein, with RMSD = 3.8 Å and Z-score = 3.7. The 3D superposition showed that both proteins have a five-helix structure and that the two fit well. However, the function of this five-helix structure was not clear.
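The RMSD quoted above is the root-mean-square distance between aligned Cα atoms after the two structures have been optimally superposed. The Python sketch below computes that quantity with the Kabsch algorithm (NumPy SVD). It illustrates the measure only and is not CE's own implementation; `kabsch_rmsd` is a hypothetical helper name and the coordinates are toy points, not atoms from 2GNX or 2CMR.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired coordinate sets P and Q (N x 3) after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no mirror images).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: Q is P rotated 90 degrees about z and then shifted, so the RMSD is ~0.
P = np.array([[0, 0, 0], [1.5, 0, 0], [1.5, 1.5, 0], [0, 1.5, 1.5]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([10.0, -3.0, 2.0])
print(kabsch_rmsd(P, Q))
```

Because only rotations and translations are allowed, a mirror image of a non-planar point set keeps a non-zero RMSD, which is the behaviour wanted when comparing protein folds.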
Table 3: Dali analysis of C-terminal domain
NR | STRID1 | STRID2 | Z | RMSD | LALI | LSEQ2 | %IDE | REVERS | PERMUT | NFRAG | TOPO | PROTEIN |
1 | 3257-A | 2gnx-A | 24.3 | 0.0 | 118 | 280 | 100 | 0 | 0 | 1 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro |
2 | 3257-A | 1jmr-A | 7.6 | 3.0 | 94 | 246 | 9 | 0 | 0 | 12 | S | |
3 | 3257-A | 1j3w-A | 7.5 | 2.9 | 91 | 134 | 13 | 0 | 0 | 7 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION giding protein-m |
4 | 3257-A | 1f5m-B | 6.8 | 2.9 | 95 | 177 | 9 | 0 | 0 | 10 | S | SIGNALING PROTEIN gaf (saccharomyces cerevisiae) yeas |
5 | 3257-A | 1h3q-A | 6.6 | 4.2 | 92 | 140 | 4 | 0 | 0 | 11 | S | TRANSPORT sedlin (sedl) (mus musculus) mouse S.B.Jan |
6 | 3257-A | 3nul | 6.3 | 3.4 | 93 | 130 | 5 | 0 | 0 | 11 | S | ACTIN-BINDING PROTEIN profilin i (arabidopsis thalian |
7 | 3257-A | 1mc0-A | 5.8 | 4.1 | 99 | 341 | 8 | 0 | 0 | 11 | S | HYDROLASE 3',5'-cyclic nucleotide phosphodiesterase 2a |
8 | 3257-A | 2h28-A | 5.4 | 2.8 | 75 | 106 | 8 | 0 | 0 | 10 | S | STRUCTURAL GENOMICS, UNKNOWN FUNCTION hypothetical pro |
9 | 3257-A | 2p7j-A | 5.0 | 2.9 | 79 | 262 | 13 | 0 | 0 | 11 | S | TRANSCRIPTION putative sensory box/GGDEF FAMILY PROTEIN |
10 | 3257-A | 2dmw-A | 5.0 | 3.3 | 85 | 131 | 7 | 0 | 0 | 11 | S | MEMBRANE PROTEIN synaptobrevin-like 1 variant fragment |
11 | 3257-A | 2avx-A | 4.8 | 3.6 | 93 | 171 | 5 | 0 | 0 | 10 | S | TRANSCRIPTION regulatory protein sdia Mutant (escheri |
12 | 3257-A | 2j3t-C | 4.7 | 5.2 | 83 | 141 | 7 | 0 | 0 | 8 | S | PROTEIN TRANSPORT trafficking protein particle complex |
13 | 3257-A | 2hj9-C | 4.7 | 3.3 | 76 | 210 | 5 | 0 | 0 | 9 | S | SIGNALING PROTEIN autoinducer 2-binding periplasmic pr |
14 | 3257-A | 2hje-A | 4.6 | 3.0 | 75 | 210 | 5 | 0 | 0 | 9 | S | SIGNALING PROTEIN autoinducer 2 sensor kinase/PHOSPHATA |
15 | 3257-A | 2uv0-E | 4.5 | 3.5 | 93 | 159 | 9 | 0 | 0 | 12 | S | TRANSCRIPTION transcriptional activator protein lasr |
In contrast, a Dali search with the C-terminal domain alone (Table 3) produced one significant structural match: the GAF signalling protein 1f5m-B, the fourth hit in Table 3 (Z = 6.8, RMSD = 2.9 Å).
The Dotlet analysis (Figure 2) showed no internal homologous repeats in the C-terminus of 2GNX.
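Dotlet compares a sequence with itself by sliding a window along both copies and scoring each pair of windows; internal repeats show up as diagonals away from the main one. The Python sketch below imitates that idea with plain residue identity instead of a substitution matrix. The window of 10 and the 70 % identity threshold are illustrative assumptions, the function name is hypothetical, and the sequences are toys, not 2GNX.

```python
def self_dotplot_hits(seq, window=10, min_identity=0.7):
    """List (i, j) window start pairs with j > i whose windows share at least
    `min_identity` identical residues. Hits off the main diagonal suggest repeats."""
    n = len(seq)
    hits = []
    for i in range(n - window + 1):
        for j in range(i + 1, n - window + 1):  # j > i skips the trivial main diagonal
            same = sum(seq[i + k] == seq[j + k] for k in range(window))
            if same / window >= min_identity:
                hits.append((i, j))
    return hits


repeat = "MKTAYIAKQR" + "GGSW" + "MKTAYIAKQR"  # toy sequence with a 10-residue repeat
no_repeat = "ACDEFGHIKLMNPQRSTVWY"             # toy sequence, every residue distinct
print(self_dotplot_hits(repeat))
print(self_dotplot_hits(no_repeat))
```

An empty result for a real sequence, as Dotlet reported for the 2GNX C-terminus, means no window pair passed the chosen threshold; lowering the threshold or using a substitution matrix can reveal weaker, diverged repeats.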
USR1:A 185/392 QVAKNLFTH---LDDVSVLLQEIITEARNLSNAEICSVFLLDQ-----------------
USR2:A 181/283 TASEXKALTAKANPDLFGKISSFIRKY------DAANVSLIFDNRGSESFQGHGYHHPHS

USR1:A 225/432 ----------NELVAKVFDGGVVDDESYEIRIPADQGIAGHVATTG----------QILN
USR2:A 235/#44 YREAPKGVDQYPAVVSLP----------SDRPVXHWPNVIXIXTDRASDLNSLEKVVHFY

USR1:A 265/472 IPDAYAHPLFYRGVDDSTGFRTRNILCFPIKNENQEVIGVAELVNKINGPWFSKFDEDLA
USR2:A 285/387 DDKV-------------------QSTYFLTRPEP-HFTIVVIFESK---------KSERD

USR1:A 325/532 TAFSIYCGISIAHSLL
USR2:A 316/418 SHFISFLNELSLALKN
Figure 4: CE predicted structural alignment. USR1 = 1MC0 (PDB code), regulatory segment of mouse 3',5'-cyclic nucleotide phosphodiesterase 2A, containing the GAF A and GAF B domains. USR2 = 2GNX.
In this alignment, the conserved ligand-binding-site residues of 1MC0 were not matched by the residues aligned to them in 2GNX.
Zoraghi et al. (2003) described a fingerprint of the ligand-binding site in 1MC0 consisting of the following pattern:
SX(13-18)FDX(18-22)IAX(21)[Y/N]X(2)VDX(2)TX(3)TX(19)[E/Q]
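To test a sequence against this fingerprint programmatically, the notation can be translated into a regular expression. The Python sketch below assumes PROSITE-like conventions (X = any residue, X(m-n) = m to n residues, [Y/N] = Y or N). The function names are hypothetical and the test sequence is synthetic, not the 2GNX C-terminus. An exact regular-expression match is stricter than a "rough fit", so a real sequence may fail it even when most of the anchor residues align.

```python
import re


def fingerprint_to_regex(pattern: str) -> str:
    """Translate X(n) / X(m-n) / [A/B] notation into a Python regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "X":
            if i + 1 < len(pattern) and pattern[i + 1] == "(":
                close = pattern.index(")", i)
                spec = pattern[i + 2:close]          # e.g. "13-18" or "21"
                if "-" in spec:
                    lo, hi = spec.split("-")
                    out.append(f".{{{lo},{hi}}}")    # X(13-18) -> .{13,18}
                else:
                    out.append(f".{{{spec}}}")       # X(21) -> .{21}
                i = close + 1
            else:
                out.append(".")                      # a bare X is any single residue
                i += 1
        elif ch == "[":
            close = pattern.index("]", i)
            out.append("[" + pattern[i + 1:close].replace("/", "") + "]")  # [Y/N] -> [YN]
            i = close + 1
        else:
            out.append(ch)                           # a fixed residue
            i += 1
    return "".join(out)


def find_fingerprint(seq: str, pattern: str):
    """Return (start, end) for every non-overlapping match of the fingerprint in `seq`."""
    regex = re.compile(fingerprint_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(seq)]


GAF_FINGERPRINT = "SX(13-18)FDX(18-22)IAX(21)[Y/N]X(2)VDX(2)TX(3)TX(19)[E/Q]"
# Synthetic sequence built to satisfy the pattern, with G as filler.
toy = ("S" + "G" * 13 + "FD" + "G" * 18 + "IA" + "G" * 21 + "Y" + "GG" + "VD" + "GG"
       + "T" + "GGG" + "T" + "G" * 19 + "E")
print(fingerprint_to_regex(GAF_FINGERPRINT))
print(find_fingerprint(toy, GAF_FINGERPRINT))
```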
2GNX C-terminal sequence aligned against the published pattern
The alignment above (Figure ) indicated that the published pattern fits the 2GNX sequence only roughly. The 3D structure (figure ) showed that some of the aligned residues (yellow) were probably not within the ligand-binding pocket, whereas others (red) remained potential ligand-binding residues.
Functional Analysis
STRING and CDART returned no results for the submitted protein data.
Table 4: BlastP Results
BlastP returned hits, but they were limited to hypothetical and predicted proteins that added no further information.
Database | Accession | Description | Score (bits) | E-value |
ref | XP_001163972.1 | PREDICTED: similar to FLJ32549 protein [Pan | 850 | 0.0 |
ref | XP_001116860.1 | PREDICTED: hypothetical protein isoform 1 [M | 848 | 0.0 |
ref | NP_689653.3 | hypothetical protein LOC144577 [Homo sapiens... | 847 | 0.0 |
gb | AAH36246.1 | FLJ32549 protein [Homo sapiens] | 846 | 0.0 |
ref | XP_001116875.1 | PREDICTED: hypothetical protein isoform 3 [M | 843 | 0.0 |
ref | XP_531657.2 | PREDICTED: hypothetical protein XP_531657 [Cani | 827 | 0.0 |
ref | XP_615557.3 | PREDICTED: hypothetical protein [Bos taurus] | 823 | 0.0 |
gb | EDL24424.1 | cDNA sequence BC048403, isoform CRA_a [Mus muscul | 803 | 0.0 |
ref | NP_766610.2 | hypothetical protein LOC270802 [Mus musculus... | 803 | 0.0 |
ref | XP_576234.2 | PREDICTED: hypothetical protein [Rattus norv... | 802 | 0.0 |
ref | XP_001364942.1 | PREDICTED: hypothetical protein [Monodelphis | 797 | 0.0 |
ref | XP_416063.1 | PREDICTED: hypothetical protein [Gallus gallus] | 796 | 0.0 |
dbj | BAC39804.1 | unnamed protein product [Mus musculus] | 760 | 0.0 |
ref | XP_001116868.1 | PREDICTED: hypothetical protein isoform 2 [M | 743 | 0.0 |
ref | NP_001085035.1 | hypothetical protein LOC432102 [Xenopus l... | 697 | 0.0 |
ref | NP_001025261.1 | hypothetical protein LOC555715 [Danio rer... | 665 | 0.0 |
ref | NP_001076454.1 | hypothetical protein LOC100005809 [Danio ... | 661 | 0.0 |
ref | XP_001331282.1 | PREDICTED: hypothetical protein [Danio rerio | 598 | 2e-169 |
emb | CAG12393.1 | unnamed protein product [Tetraodon nigroviridis] | 593 | 8e-168 |
pdb | 2GNX | A Chain A, X-Ray Structure Of A Hypothetical Protein... | 554 | 3e-156 |
dbj | BAE41440.1 | unnamed protein product [Mus musculus] | 508 | 2e-142 |
ref | NP_001038719.1 | hypothetical protein LOC692281 [Danio rer... | 357 | 1e-96 |
ref | XP_624797.1 | PREDICTED: hypothetical protein [Apis mellifera | 235 | 3e-60 |
ref | XP_974676.1 | PREDICTED: hypothetical protein [Tribolium cast | 232 | 5e-59 |
ref | XP_001193974.1 | PREDICTED: hypothetical protein [Strongyloce | 208 | 5e-52 |
ref | XP_797380.2 | PREDICTED: hypothetical protein, partial [St... | 207 | 2e-51 |
dbj | BAE37112.1 | unnamed protein product [Mus musculus] >dbj B... | 134 | 2e-29 |
gb | EDL24425.1 | cDNA sequence BC048403, isoform CRA_b [Mus muscul | 132 | 6e-29 |
ref | XP_642387.1 | hypothetical protein DDBDRAFT_0205477 [Dicty... | 87.8 | 1e-15 |
emb | CAJ08583.1 | hypothetical protein, conserved [Leishmania majo | 36.6 | 3.5 |
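The Score (bits) and E-value columns of Table 4 are linked. Under the Karlin-Altschul statistics used by BLAST, E ≈ m·n·2^(-S) for bit score S and search space m·n, so within one search a hit scoring k bits lower should have an E-value about 2^k times larger. The Python sketch below checks this on a few rows of Table 4; small deviations are expected because the displayed scores and E-values are rounded, and the function name is hypothetical.

```python
import math

# (bit score, reported E-value) pairs taken from Table 4; each entry is
# (reference hit, hit whose E-value is predicted).
PAIRS = [
    ((598, 2e-169), (593, 8e-168)),
    ((593, 8e-168), (554, 3e-156)),
    ((87.8, 1e-15), (36.6, 3.5)),
]


def extrapolate_evalue(ref_bits: float, ref_evalue: float, bits: float) -> float:
    """Predict a hit's E-value from another hit in the same search.

    Assumes E = m * n * 2**(-S) with the same search space m * n for both hits,
    so every bit of score lost multiplies the E-value by 2.
    """
    return ref_evalue * 2.0 ** (ref_bits - bits)


for (b1, e1), (b2, e2) in PAIRS:
    pred = extrapolate_evalue(b1, e1, b2)
    print(f"{b2} bits: predicted E = {pred:.2g}, reported E = {e2:g}, "
          f"log10(pred/reported) = {math.log10(pred / e2):+.2f}")
```

The same relation explains why the last row (36.6 bits, E = 3.5) is not meaningful: several hits of that quality are expected by chance alone.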
Table 5: Subcellular location predicted by each method
The LOCATE analysis predicted that the protein is soluble and non-secreted, but the individual localisation predictions were diverse:
Method | Location | Score |
CELLO | Mitochondrion | 1.34 |
CELLO | Extracellular region | 1.08 |
pTarget | Endoplasmic reticulum | 93.90 |
Proteome Analyst | No prediction | 0.00 |
WoLFPSORT | Cytoplasm | 13.00 |
WoLFPSORT | Nucleus | 12.00 |
WoLFPSORT | Golgi apparatus | 3.00 |
MultiLoc | Peroxisome | 0.49 |
MultiLoc | Mitochondrion | 0.23 |
MultiLoc | Extracellular region | 0.09 |
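Each predictor in Table 5 reports scores on its own scale (compare the pTarget score of 93.90 with the MultiLoc scores below 1), so values can only be compared within a method. A small Python sketch that keeps each method's top-scoring location makes the disagreement explicit; the helper name is hypothetical.

```python
# Rows copied from Table 5 as (method, location, score).
PREDICTIONS = [
    ("CELLO", "Mitochondrion", 1.34),
    ("CELLO", "Extracellular region", 1.08),
    ("pTarget", "Endoplasmic reticulum", 93.90),
    ("Proteome Analyst", "No prediction", 0.00),
    ("WoLFPSORT", "Cytoplasm", 13.00),
    ("WoLFPSORT", "Nucleus", 12.00),
    ("WoLFPSORT", "Golgi apparatus", 3.00),
    ("MultiLoc", "Peroxisome", 0.49),
    ("MultiLoc", "Mitochondrion", 0.23),
    ("MultiLoc", "Extracellular region", 0.09),
]


def top_call_per_method(rows):
    """Return {method: top-scoring location}, comparing scores only within a method
    and skipping rows with a zero score (no prediction)."""
    best = {}
    for method, location, score in rows:
        if score <= 0:
            continue
        if method not in best or score > best[method][1]:
            best[method] = (location, score)
    return {method: location for method, (location, _) in best.items()}


calls = top_call_per_method(PREDICTIONS)
print(calls)
print("distinct top-ranked locations:", len(set(calls.values())))
```

The four methods that made a prediction each rank a different compartment first.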
Figure 6: BC048403 SymAtlas Expression Profile
Pfam, ProFunc, ProKnow and InterPro all returned no results for the protein 2gnxA. However, SymAtlas provided an interesting lead. Its expression profile for BC048403 is shown in Figure 6, but the most significant finding was the number of olfactory receptors with correlated expression profiles.
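Genes with "correlated expression profiles" are typically found by correlating expression values across tissues and keeping the strongest neighbours. The Python sketch below uses Pearson correlation with an arbitrary cut-off of 0.8. The metric, threshold, gene names and values are all assumptions for illustration, not SymAtlas's documented method or data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Undefined (ZeroDivisionError) if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_neighbours(query, profiles, min_r=0.8):
    """Return (gene, r) pairs with r >= min_r, strongest correlation first."""
    scored = [(gene, pearson(query, prof)) for gene, prof in profiles.items()]
    return sorted((g for g in scored if g[1] >= min_r), key=lambda g: g[1], reverse=True)


# Hypothetical expression values across five tissues (not SymAtlas or GEO data).
query = [1, 5, 2, 8, 3]
profiles = {
    "gene_A": [2, 10, 4, 16, 6],
    "gene_B": [1, 4, 2, 9, 3],
    "gene_C": [8, 2, 5, 1, 6],
}
for gene, r in correlated_neighbours(query, profiles):
    print(f"{gene}: r = {r:.3f}")
```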
Table 6: Co-occurring Motifs Corresponding to BC048403
Olfactory receptors also appeared when BC048403 was submitted to cisRED to retrieve its cis-regulatory motif patterns. All fourteen motif patterns (modules) corresponding to BC048403 are also found in many different olfactory receptors. The motifs were predicted by cisRED with p-values < 0.005.
In total, the fourteen motifs were shared with 120 different olfactory receptors. Table 6 lists the olfactory receptors with three or more co-occurring motifs; its header row lists the fourteen modules. The receptors sharing the most modules with BC048403 are highlighted in orange (nine co-occurring modules) and green (seven co-occurring modules).
Figure 7: Number of Motifs Corresponding to each Olfactory Receptor
Figure 7 plots the number of co-occurring motifs for each of the 120 corresponding olfactory receptors.
These motifs were searched for in cisRED's databases for other species, but could not be found because cisRED provides no cross-species search tool. Unfortunately, microarray expression data for the olfactory receptors with the most co-occurring motifs were unavailable.
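The counting behind Table 6 and Figure 7 can be sketched in Python: for each receptor, count how many of the query gene's cisRED modules it carries, keep receptors with three or more, and tabulate how many receptors share each count. The module IDs, receptor names and data structure below are hypothetical placeholders; the real module lists come from cisRED.

```python
from collections import Counter


def motifs_per_receptor(module_hits):
    """module_hits maps each module ID to the set of receptors carrying it.
    Returns {receptor: number of the query's modules it shares}."""
    counts = Counter()
    for receptors in module_hits.values():
        counts.update(receptors)  # +1 for every receptor carrying this module
    return dict(counts)


# Hypothetical toy data: four modules and the made-up receptors in which each occurs.
toy = {
    "module_1": {"Olfr_a", "Olfr_b", "Olfr_c"},
    "module_2": {"Olfr_a", "Olfr_b"},
    "module_3": {"Olfr_a", "Olfr_c"},
    "module_4": {"Olfr_a"},
}
counts = motifs_per_receptor(toy)
table_rows = {r: n for r, n in counts.items() if n >= 3}  # Table 6 keeps receptors with >= 3
histogram = Counter(counts.values())                      # Figure 7: receptors per motif count
print(table_rows)
print(dict(histogram))
```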
Figure 8: Microarray Expression Profiles Similar to FLJ32549
The microarray data in Figure 8 were found by browsing the profile neighbours of the human ortholog, FLJ32549, in GEO Profiles.
Other interesting motifs found in BC048403 corresponded to the cadherin family.