Functional classification of divergent protein sequences and molecular evolution of multi-domain proteins
| Main author: | |
|---|---|
| Format: | Dissertation |
| Language: | English |
| Published: | ProQuest Dissertations & Theses, 01.01.2011 |
| Subjects: | |
| ISBN: | 1124584323, 9781124584324 |
| Online access: | Get full text |
| Abstract | Transmembrane proteins and multi-domain proteins together make up more than 80% of the total proteins in any eukaryotic proteome. Accurately classifying such proteins into functional classes is therefore an important task. Furthermore, understanding the molecular evolution of multi-domain proteins is important because it shows how various domains fuse to form more complex proteins and acquire new functions, possibly affecting evolution at the organismal level. In this thesis, I first investigated the performance of several protein classifiers using one of the most divergent transmembrane protein families, the G-protein-coupled receptor (GPCR) superfamily, as an example. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection, even from short, fragmented sequences. While a support vector machine using local pairwise-alignment scores showed very well-balanced performance, profile hidden Markov models were generally highly specific and well suited for classifying well-established protein family members. We suggested that different types of protein classifiers should be applied to obtain optimal mining power. Combinations of multiple protein classification methods, including some of those above, were then applied to identify especially divergent plant GPCRs (or seven-transmembrane receptors) in the Arabidopsis thaliana genome. We identified 394 proteins as candidates and provided a prioritized list of 54 proteins for further investigation. For multi-domain protein families, the distribution of urea amidolyase, urea carboxylase, and sterol-sensing domain (SSD) proteins across kingdoms was investigated. Molecular evolutionary analysis showed that the urea amidolyase genes, currently found only in fungi among eukaryotes, are the result of a horizontal gene transfer event from proteobacteria. Urea carboxylase genes, currently found in fungi and a limited number of other organisms, were also likely derived from another ancestral gene in bacteria. Finally, we showed that the eukaryotic SSD-containing proteins may have a bacterial origin and that these ancestral sequences evolved into four different SSD-containing proteins that acquired specific functions. Two groups of SSD-containing proteins appear to have been formed by domain acquisition before the divergence of the fungal and metazoan lineages. |
|---|---|
| Author | Strope, Pooja K |
| SubjectTerms | Bioinformatics; Molecular biology |
| URI | https://www.proquest.com/docview/864546641 |
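
The abstract mentions alignment-free classifiers built on simple amino acid compositions. As a rough illustration of that kind of feature, and not the thesis's actual implementation, the sketch below turns a protein sequence into a 20-element vector of residue fractions that could be passed to a classifier such as a support vector machine. The function name `amino_acid_composition`, the choice to ignore non-standard residues, and the example fragment are all assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration only (not taken from the thesis).
    Residues outside the 20 standard letters (e.g. X, B, Z) are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognizable residues: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    fragment = "MKTLLVAAXG"  # made-up fragment, for illustration only
    vector = amino_acid_composition(fragment)
    print(dict(zip(AMINO_ACIDS, vector)))
```

Because the vector has the same length regardless of sequence length and needs no alignment, a feature of this kind can be computed even for short fragments, which is consistent with the behavior the abstract reports for composition-based classifiers.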

