Functional classification of divergent protein sequences and molecular evolution of multi-domain proteins

Bibliographic details
Main author: Strope, Pooja K
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 01.01.2011
ISBN: 1124584323, 9781124584324
Abstract
Transmembrane proteins and multi-domain proteins together make up more than 80% of the total proteins in any eukaryotic proteome. Accurately classifying such proteins into functional classes is therefore an important task. Furthermore, understanding the molecular evolution of multi-domain proteins is important because it shows how various domains fuse to form more complex proteins and acquire new functions, possibly affecting evolution at the organismal level. In this thesis, I first investigated the performance of several protein classifiers using one of the most divergent transmembrane protein families, the G-protein-coupled receptor (GPCR) superfamily, as an example. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection, even from short fragmented sequences. While a support vector machine using local pairwise-alignment scores showed very well-balanced performance, profile hidden Markov models were generally highly specific and well suited for classifying well-established protein family members. We suggested that different types of protein classifiers should be applied to gain the optimal mining power. Combinations of multiple protein classification methods, including some of those above, were applied to identify especially divergent plant GPCRs (or seven-transmembrane receptors) in the Arabidopsis thaliana genome. We identified 394 proteins as candidates and provided a prioritized list of 54 proteins for further investigation.
For multi-domain protein families, the distribution of urea amidolyase, urea carboxylase, and sterol-sensing domain (SSD) proteins across kingdoms was investigated. Molecular evolutionary analysis showed that the urea amidolyase genes, currently found only in fungi among eukaryotes, are the result of a horizontal gene transfer event from proteobacteria. Urea carboxylase genes, currently found in fungi and a limited number of other organisms, were also likely derived from another ancestral gene in bacteria. Finally, we showed the possibility of a bacterial origin of the eukaryotic SSD-containing proteins and that these ancestral sequences evolved into four different SSD-containing proteins acquiring specific functions. Two groups of SSD-containing proteins seemed to have been formed by domain acquisition before the divergence of the fungal and metazoan lineages.
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Subjects: Bioinformatics; Molecular biology
URI: https://www.proquest.com/docview/864546641