Mixture modeling of next generation sequencing data and its applications to genotyping and estimating genotype frequencies

Main author: Lihm, Jayon
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 01.01.2013
ISBN: 9781303807114, 1303807114
Abstract: Estimating the probability that an individual has a nucleotide at a given base pair position that differs from the reference nucleotide is important in next generation sequencing (NGS) research. I present a method for modeling the frequency of single nucleotide polymorphism variants in the exome capture sequencing data of an individual. A mixture distribution was used to model the proportion of alternative alleles at a specified base pair position, assuming a biallelic single nucleotide polymorphism model. I measured the proportion of alternative alleles for positions in chromosome 1 exome sequencing data from two trios taken from the Pilot 3 data of the 1000 Genomes Project. The measurements were based on the counts of reference and alternative alleles calculated by the SAMtools software. The mixture model studied here had two point distributions and five continuous distributions. I applied the expectation-maximization algorithm to obtain the maximum likelihood estimates of the mixture model parameters for each individual. The fitted mixture model described the distribution of the alternative allele proportions well. The estimates of the mixing proportions were used to estimate the genotype frequencies in the data. The estimates of the model parameters differed between individuals, but the estimated genotype fractions of the six individuals were similar, as were the estimated fractions of the members within each trio.

I next combined two approaches, clustering and mixture modeling, to genotype the exomic base pair positions of an individual using next generation sequencing data. The alternative allele proportion at a position was used to compute the Bayesian posterior probability of a single nucleotide polymorphism at that position. I developed a software package named "SNVclust" to generate the alternative allele proportions and genotypes of an individual. This software was used to make a call set of single nucleotide polymorphism positions and genotypes for each of the three members of a trio from the 1000 Genomes Project. The results from this software were compared with the single nucleotide polymorphisms released by the 1000 Genomes Project and with results from two other programs. I found that SNVclust requires a minimum average coverage greater than 43 for whole exome sequencing data.
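
The link between mixing proportions and genotype frequencies described in the abstract can be illustrated with a much smaller model. The sketch below is not the dissertation's model (which has two point distributions and five continuous distributions) and is not SNVclust. It swaps in a three-component binomial mixture, one component per biallelic genotype, and runs the expectation-maximization algorithm on the mixing proportions only. The values in GENOTYPE_P, the function names, and the input layout are all assumptions made for illustration.

```python
from math import comb

# Assumed (hypothetical) probabilities of reading the alternative allele for
# the three biallelic genotypes: homozygous reference, heterozygous,
# homozygous alternative. These are illustrative, not values from the thesis.
GENOTYPE_P = (0.01, 0.5, 0.99)


def binom_pmf(k, n, p):
    """Probability of k alternative reads out of n under Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posteriors(alt, depth, weights):
    """Bayesian posterior probability of each genotype at one position."""
    joint = [w * binom_pmf(alt, depth, p) for w, p in zip(weights, GENOTYPE_P)]
    total = sum(joint)
    return [j / total for j in joint]


def em_mixing_proportions(counts, n_iter=200):
    """Estimate mixing proportions by EM, keeping component parameters fixed.

    counts is a list of (alt_reads, depth) pairs, one per position.
    """
    weights = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(n_iter):
        # E-step: posterior genotype responsibilities for every position.
        resp = [posteriors(alt, depth, weights) for alt, depth in counts]
        # M-step: each new weight is the mean responsibility of its component.
        weights = [sum(r[g] for r in resp) / len(resp) for g in range(3)]
    return weights


# Toy input: six positions with no alternative reads, three with half
# alternative reads, and one with only alternative reads, all at depth 40.
toy_counts = [(0, 40)] * 6 + [(20, 40)] * 3 + [(40, 40)]
estimated = em_mixing_proportions(toy_counts)
```

On this toy input the estimated mixing proportions settle very close to 0.6, 0.3 and 0.1, which are the genotype fractions built into the data. Calling posteriors(20, 40, estimated) then assigns nearly all of the probability to the heterozygous genotype. Real data would also need the component parameters to be estimated, which this sketch deliberately leaves out.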
Subject terms: Bioinformatics; Statistics
URL: https://www.proquest.com/docview/1520021785