Supercomputing enabling exhaustive statistical analysis of genome wide association study data: Preliminary results

Detailed bibliography
Published in: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Volume 2012, pp. 1258-1261
Main authors: Reumann, M., Makalic, E., Goudey, B. W., Inouye, M., Bickerstaffe, A., Bui, M., Park, D. J., Kapuscinski, M. K., Schmidt, D. F., Zhou, Z., Qian, G., Zobel, J., Wagner, J., Hopper, J. L.
Format: Conference paper, Journal Article
Language: English
Published: United States, IEEE, 01.01.2012
ISBN: 1424441196, 9781424441198
ISSN: 1094-687X, 1557-170X, 2694-0604
Abstract: Most published genome-wide association studies (GWAS) do not examine SNP interactions because computing p-values for the interaction terms is computationally expensive. Our aim is to use supercomputing resources to apply complex statistical techniques to the world's accumulating GWAS, epidemiology, survival and pathology data, in order to uncover more information about genetic and environmental risk, biology and aetiology. As a proof of principle, we performed the Bayesian Posterior Probability test on a pseudo data set with 500,000 single nucleotide polymorphisms (SNPs) and 100 samples. We carried out strong scaling simulations on 2 to 4,096 processing cores, doubling the partition size at each step. On two processing cores the run time is 317 h, i.e. almost two weeks, compared to less than 10 minutes on 4,096 processing cores. The speedup factor is 2,020, which is very close to the theoretical value of 2,048. This work demonstrates the feasibility of performing exhaustive higher-order analysis of GWAS data using independence testing for contingency tables. We are now in a position to employ supercomputers with hundreds of thousands of threads for higher-order analysis of GWAS data using complex statistics.
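The scaling figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it uses the numbers stated in the abstract, all variable names are hypothetical, and the pair count assumes that the lowest "higher order" analysis is an exhaustive test of every pair of SNPs, which the abstract does not state explicitly.

```python
from math import comb

# Figures quoted in the abstract.
BASE_CORES = 2             # smallest partition in the strong scaling study
MAX_CORES = 4096           # largest partition
BASE_RUNTIME_H = 317.0     # run time on 2 cores, in hours
REPORTED_SPEEDUP = 2020.0  # speedup reported for 4,096 cores
N_SNPS = 500_000           # SNPs in the pseudo data set

# Ideal strong-scaling speedup relative to the 2-core baseline.
ideal_speedup = MAX_CORES / BASE_CORES

# Parallel efficiency: achieved speedup divided by ideal speedup.
efficiency = REPORTED_SPEEDUP / ideal_speedup

# Run time on 4,096 cores implied by the reported speedup, in minutes.
implied_runtime_min = BASE_RUNTIME_H * 60.0 / REPORTED_SPEEDUP

# Partition sizes from 2 to 4,096 cores in factor-2 increments.
partition_sizes = [2 ** k for k in range(1, 13)]

# ASSUMPTION: an exhaustive pairwise scan, i.e. one test per unordered SNP pair.
n_pairs = comb(N_SNPS, 2)

print(f"ideal speedup:       {ideal_speedup:.0f}")
print(f"parallel efficiency: {efficiency:.1%}")
print(f"implied run time:    {implied_runtime_min:.1f} min")
print(f"partition sizes:     {partition_sizes}")
print(f"SNP pairs (assumed): {n_pairs:,}")
```

Under these assumptions the reported speedup corresponds to a parallel efficiency of roughly 98.6 percent, and the implied run time on 4,096 cores is consistent with the abstract's "less than 10 minutes".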
Author_xml – sequence: 1
  givenname: M.
  surname: Reumann
  fullname: Reumann, M.
  email: mreumann@ieee.org
  organization: IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia
– sequence: 2
  givenname: E.
  surname: Makalic
  fullname: Makalic, E.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 3
  givenname: B. W.
  surname: Goudey
  fullname: Goudey, B. W.
  organization: IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia
– sequence: 4
  givenname: M.
  surname: Inouye
  fullname: Inouye, M.
  organization: Dept. of Pathology, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 5
  givenname: A.
  surname: Bickerstaffe
  fullname: Bickerstaffe, A.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 6
  givenname: M.
  surname: Bui
  fullname: Bui, M.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 7
  givenname: D. J.
  surname: Park
  fullname: Park, D. J.
– sequence: 8
  givenname: M. K.
  surname: Kapuscinski
  fullname: Kapuscinski, M. K.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 9
  givenname: D. F.
  surname: Schmidt
  fullname: Schmidt, D. F.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 10
  givenname: Z.
  surname: Zhou
  fullname: Zhou, Z.
  organization: IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia
– sequence: 11
  givenname: G.
  surname: Qian
  fullname: Qian, G.
  organization: Dept. of Math. & Stat., Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 12
  givenname: J.
  surname: Zobel
  fullname: Zobel, J.
  organization: Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
– sequence: 13
  givenname: J.
  surname: Wagner
  fullname: Wagner, J.
  organization: IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia
– sequence: 14
  givenname: J. L.
  surname: Hopper
  fullname: Hopper, J. L.
  organization: Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia
DOI 10.1109/EMBC.2012.6346166
Discipline Engineering
EISBN 9781457717871
1457717875
EISSN 2694-0604
EndPage 1261
ExternalDocumentID 23366127
6346166
Genre orig-research
Research Support, Non-U.S. Gov't
Journal Article
ISICitedReferencesCount 3
IsPeerReviewed true
IsScholarly true
Language English
PMID 23366127
PageCount 4
PublicationDate 2012-01-01
PublicationDateYYYYMMDD 2012-01-01
PublicationDate_xml – month: 01
  year: 2012
  text: 2012-01-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society
PublicationTitleAbbrev EMBC
PublicationTitleAlternate Conf Proc IEEE Eng Med Biol Soc
PublicationYear 2012
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1258
SubjectTerms Bayes Theorem
Bayesian methods
Bioinformatics
Computational Biology - methods
Computer Simulation
Genome-Wide Association Study - methods
Genomics
Humans
Monte Carlo Method
Neoplasms - genetics
Phenotype
Polymorphism, Single Nucleotide
Runtime
Title Supercomputing enabling exhaustive statistical analysis of genome wide association study data: Preliminary results
URI https://ieeexplore.ieee.org/document/6346166
https://www.ncbi.nlm.nih.gov/pubmed/23366127
https://www.proquest.com/docview/1283728993
Volume 2012