Mixture modeling of next generation sequencing data and its applications to genotyping and estimating genotype frequencies
Estimating the probability that an individual has a base pair nucleotide different from the reference nucleotide is important in next generation sequencing (NGS) research. I present a method for modeling the frequency of single nucleotide polymorphism variants in the exome capture sequencing data of...
| Main author: | Lihm, Jayon |
|---|---|
| Format: | Dissertation |
| Language: | English |
| Published: | ProQuest Dissertations & Theses, 2013-01-01 |
| Subjects: | Bioinformatics, Statistics |
| ISBN: | 9781303807114, 1303807114 |
| Online access: | Get full text |
| Abstract | Estimating the probability that an individual has a base pair nucleotide different from the reference nucleotide is important in next generation sequencing (NGS) research. I present a method for modeling the frequency of single nucleotide polymorphism variants in the exome capture sequencing data of an individual. A mixture distribution was used to model the proportion of alternative alleles at a specified base pair position, assuming a biallelic single nucleotide polymorphism model. I measured the proportion of alternative alleles for positions in chromosome 1 exome sequencing data from two trios taken from the Pilot 3 data in the 1000 Genomes Project. The measurements were based on the counts of reference and alternative alleles calculated with the SAMtools software. The mixture model studied here had two point distributions and five continuous distributions. I applied the expectation-maximization algorithm to obtain the maximum likelihood estimates of the mixture model parameters for each individual. The fitted mixture model described the properties of the distribution of the alternative allele proportions well. The estimates of the mixing proportions were used to estimate the genotype frequencies in the data. Each individual had different estimates of the model parameters, but the estimated genotype fractions of the six individuals were similar. The estimated fractions of the members of each trio were similar to each other. I next combined clustering and mixture modeling to genotype the exomic base pair positions of an individual using next generation sequencing data. The alternative allele proportion at a position was used to compute the Bayesian posterior probability of a single nucleotide polymorphism at that position. I developed a software package named "SNVclust" to generate alternative allele proportions and genotypes of an individual. This software was used to make a call set of single nucleotide polymorphism positions and genotypes for each of the three members of a trio from the 1000 Genomes Project. The results from this software were compared with the single nucleotide polymorphisms released by the 1000 Genomes Project and with the results from two other programs. Finally, I found that SNVclust should be applied to whole exome sequencing data only when the minimum average coverage is greater than 43. |
|---|---|
| Author | Lihm, Jayon |
| Author_xml | – sequence: 1 givenname: Jayon surname: Lihm fullname: Lihm, Jayon |
| ContentType | Dissertation |
| Copyright | Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works. |
| DBID | 053 0BH 0KZ ABQRF AFLLJ AFOKG ARAPS BGLVJ CBPLH EU9 G20 HCIFZ M8- PHGZM PHGZT PKEHL PQEST PQGLB PQQKQ PQUKI |
| DatabaseName | Dissertations & Theses Europe Full Text: Science & Technology ProQuest Dissertations and Theses Professional Dissertations & Theses @ SUNY Stony Brook Technology Collection - hybrid linking SciTech Premium Collection - hybrid linking Advanced Technologies & Aerospace Collection - hybrid linking ProQuest SciTech Premium Collection Technology Collection Advanced Technologies & Aerospace Collection ProQuest Technology Collection ProQuest Dissertations & Theses Global: The Sciences and Engineering Collection ProQuest Dissertations & Theses A&I ProQuest Dissertations & Theses Global SciTech Premium Collection ProQuest Dissertations and Theses A&I: The Sciences and Engineering Collection ProQuest Central Premium ProQuest One Academic (New) ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic (retired) ProQuest One Academic UKI Edition |
| DatabaseTitle | Advanced Technologies & Aerospace Collection Technology Collection ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition SciTech Premium Collection ProQuest Dissertations & Theses Global: The Sciences and Engineering Collection ProQuest Dissertations and Theses Professional ProQuest Dissertations and Theses A&I: The Sciences and Engineering Collection ProQuest Dissertations & Theses Global Dissertations & Theses Europe Full Text: Science & Technology ProQuest One Applied & Life Sciences Dissertations & Theses @ SUNY Stony Brook ProQuest One Academic UKI Edition ProQuest Central (New) ProQuest One Academic ProQuest Dissertations & Theses A&I ProQuest One Academic (New) |
| DatabaseTitleList | Advanced Technologies & Aerospace Collection |
| Database_xml | – sequence: 1 dbid: G20 name: ProQuest Dissertations & Theses Global url: https://www.proquest.com/pqdtglobal1 sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Statistics |
| ExternalDocumentID | 3289904181 |
| Genre | Dissertation/Thesis |
| GroupedDBID | 053 0BH 0KZ 8R4 8R5 ARAPS BGLVJ CBPLH EU9 G20 HCIFZ M8- PHGZM PHGZT PKEHL PQEST PQGLB PQQKQ PQUKI Q2X |
| ID | FETCH-LOGICAL-k475-6fa0100e3dc89d4a61499c1d731afc6098ea44d9a9fa6eb0961b89e6df7d320a3 |
| IEDL.DBID | G20 |
| ISBN | 9781303807114 1303807114 |
| IngestDate | Mon Jun 30 07:15:48 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-k475-6fa0100e3dc89d4a61499c1d731afc6098ea44d9a9fa6eb0961b89e6df7d320a3 |
| Notes | SourceType-Dissertations & Theses-1 ObjectType-Dissertation/Thesis-1 content type line 12 |
| PQID | 1520021785 |
| PQPubID | 18750 |
| ParticipantIDs | proquest_journals_1520021785 |
| PublicationCentury | 2000 |
| PublicationDate | 20130101 |
| PublicationDateYYYYMMDD | 2013-01-01 |
| PublicationDate_xml | – month: 01 year: 2013 text: 20130101 day: 01 |
| PublicationDecade | 2010 |
| PublicationYear | 2013 |
| Publisher | ProQuest Dissertations & Theses |
| Publisher_xml | – name: ProQuest Dissertations & Theses |
| SSID | ssib000933042 |
| Score | 1.6129793 |
| SourceID | proquest |
| SourceType | Aggregation Database |
| SubjectTerms | Bioinformatics Statistics |
| Title | Mixture modeling of next generation sequencing data and its applications to genotyping and estimating genotype frequencies |
| URI | https://www.proquest.com/docview/1520021785 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwpV09T8MwED1BYagY-BYfBXlgtWhiJ7YnBqAwQMXQoVvl2A6qkBJoAgJ-PT7XlEpILIzWxVFkOe-e7873AM5S5UzuwZE6ozLqN4Wm3o3gNa6sKDLHEjZvknQnhkM5HquHGHBrYlnlNyYGoLa1wRj5eZKFcgIhs4vnF4qqUZhdjRIaq7CGzAZ7598s05_FaR2RWnpvmvDY5mkx_oXBwbEMNv_7SVuwcbWUUd-GFVftQBcp5LwD8y583k_fMU1AguiN91SkLknlMZk8hpbTOI_Eimq0Ys0o0ZUl07Yhy_lt0tY4pW4_8I5VeARbdCDl9cNocaScxXe5Zg9Gg-vR5S2Nigv0iYuM5qX2x7O-Y9ZIZbn2rlspk1jBEl2avK-k05xbpVWpc1egWkwhlcttKSxL-5rtQ6eqK3cAxNNOVri0EKmTnBdCexxQzHq2Y7nhOj2E3veaTuJf00x-FvTob_MxdNMgS4GhkB502tmrO4F18-bXdnYaNsEXD0y90w |
| linkProvider | ProQuest |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMw1V07T8MwED5BQaJi4C0eBTzAGNHEzsMDYqCUVn2IoUO3yokdVCEl0IRH-U_8R3xuUiohsXVgtC62FN_pvvM9AS4criJPK0dLRdy1tFAIS8MIlnG5YegqatNZk6Su3-8HwyF_WIGvshYG0ypLnWgUtUwj9JFf2a5JJ_AD9-b5xcKpURhdLUdozMSio6bv-smWXbcbmr-XjtO8G9y2rGKqgPXEfNfyYqGfIHVFZRRwyYSGJ84jW_rUFnHk1XmgBGOSCx4LT4U4ESUMuPJk7Evq1AXVx67CGqO-jRmE94vW1tw5gMAQaPC2WdFVar7-pfINjjW3_tkNbMNmYyFfYAdWVLILVTSQZ_2l9-CzN_7AIAgxI300DpM0JolGHPJoGmrjPlLkiyMVM2KJSCQZ5xlZjN6TPMUtaT7FCjLzCTYgQYNeLwuKIvGkOEtl-zBYxn8fQCVJE3UIRBvVNFRO6DsqYCz0hdZynEpty0kWMeEcQa1k4ajQCdnoh3_Hf5PPYaM16HVH3Xa_cwJVxwzgQKdPDSr55FWdwnr0pu95cmbkj8Boydz-BspJGng |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adissertation&rft.genre=dissertation&rft.title=Mixture+modeling+of+next+generation+sequencing+data+and+its+applications+to+genotyping+and+estimating+genotype+frequencies&rft.DBID=053%3B0BH%3B0KZ%3BABQRF%3BAFLLJ%3BAFOKG%3BARAPS%3BBGLVJ%3BCBPLH%3BEU9%3BG20%3BHCIFZ%3BM8-%3BPHGZM%3BPHGZT%3BPKEHL%3BPQEST%3BPQGLB%3BPQQKQ%3BPQUKI&rft.PQPubID=18750&rft.au=Lihm%2C+Jayon&rft.date=2013-01-01&rft.pub=ProQuest+Dissertations+%26+Theses&rft.isbn=9781303807114&rft.externalDBID=HAS_PDF_LINK&rft.externalDocID=3289904181 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781303807114/lc.gif&client=summon&freeimage=true |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781303807114/mc.gif&client=summon&freeimage=true |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781303807114/sc.gif&client=summon&freeimage=true |
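The abstract describes fitting a mixture of two point distributions and five continuous distributions to per-position alternative allele proportions by expectation-maximization, using the mixing proportions as genotype frequency estimates, and using a Bayesian posterior to call genotypes. The Python sketch below illustrates that idea under simplifying assumptions that do not come from the thesis: the point masses are placed at proportions of exactly 0 and 1, three Beta components with fixed, hypothetical shapes (`BETA_COMPONENTS`) stand in for the five continuous components, only the mixing weights are estimated, and the component-to-genotype mapping (`GENOTYPE_OF`) is an assumption. It is not the SNVclust implementation.

```python
import math

# Hypothetical continuous components: fixed Beta(a, b) shapes chosen for
# illustration only. The thesis uses five continuous components whose form is
# not given in this record; here only the mixing weights are fitted.
BETA_COMPONENTS = {
    "low": (1.0, 30.0),   # proportions just above 0 (reference plus read errors)
    "het": (15.0, 15.0),  # proportions near 0.5 (heterozygous positions)
    "high": (30.0, 1.0),  # proportions just below 1 (alternative plus read errors)
}
# Hypothetical mapping from components to biallelic genotypes.
# "zero" and "one" are the point masses at proportions exactly 0 and 1.
GENOTYPE_OF = {"zero": "0/0", "low": "0/0", "het": "0/1", "high": "1/1", "one": "1/1"}


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def fit_weights(props, n_iter=200):
    """Maximum likelihood mixing weights via EM.

    Exact 0s and 1s can only come from the point masses, so those weights
    have closed-form MLEs; EM is run only over the interior proportions.
    """
    n = len(props)
    interior = [x for x in props if 0.0 < x < 1.0]
    m = len(interior)
    weights = {"zero": props.count(0.0) / n, "one": props.count(1.0) / n}
    names = list(BETA_COMPONENTS)
    if m == 0:
        weights.update(dict.fromkeys(names, 0.0))
        return weights
    dens = [{k: beta_pdf(x, *BETA_COMPONENTS[k]) for k in names} for x in interior]
    pi = dict.fromkeys(names, 1.0 / len(names))
    for _ in range(n_iter):
        totals = dict.fromkeys(names, 0.0)
        for d in dens:
            mix = sum(pi[k] * d[k] for k in names)
            for k in names:  # E-step: responsibility of component k for this position
                totals[k] += pi[k] * d[k] / mix
        pi = {k: totals[k] / m for k in names}  # M-step: average responsibility
    for k in names:  # rescale to the share of interior positions
        weights[k] = pi[k] * m / n
    return weights


def genotype_frequencies(weights):
    """Sum the mixing weights of the components assigned to each genotype."""
    freqs = {"0/0": 0.0, "0/1": 0.0, "1/1": 0.0}
    for k, w in weights.items():
        freqs[GENOTYPE_OF[k]] += w
    return freqs


def genotype_posterior(x, weights):
    """Posterior genotype probabilities for one position's alternative allele proportion."""
    if x == 0.0:
        scores = {"zero": 1.0}
    elif x == 1.0:
        scores = {"one": 1.0}
    else:
        scores = {k: weights[k] * beta_pdf(x, *BETA_COMPONENTS[k]) for k in BETA_COMPONENTS}
    total = sum(scores.values())
    post = {"0/0": 0.0, "0/1": 0.0, "1/1": 0.0}
    for k, s in scores.items():
        post[GENOTYPE_OF[k]] += s / total
    return post


# Usage on made-up toy proportions (not data from the thesis).
props = [0.0] * 50 + [1.0] * 10 + [0.02] * 20 + [0.5] * 15 + [0.97] * 5
w = fit_weights(props)
freqs = genotype_frequencies(w)  # roughly 0.70 / 0.15 / 0.15 on this toy data
snp_prob = 1.0 - genotype_posterior(0.5, w)["0/0"]
```

Because exact 0s and 1s can only arise from the point masses, their weights are simply the observed fractions, and EM is needed only for the interior proportions. A fuller treatment would also re-estimate the continuous components' shape parameters in the M-step instead of holding them fixed.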

