Supercomputing enabling exhaustive statistical analysis of genome wide association study data: Preliminary results
Most published GWAS do not examine SNP interactions due to the high computational complexity of computing p-values for the interaction terms. Our aim is to utilize supercomputing resources to apply complex statistical techniques to the world's accumulating GWAS, epidemiology, survival and patho...
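
To give a sense of why interaction terms are costly, the sketch below counts how many SNP subsets an exhaustive scan would have to test. The figure of 500,000 SNPs is taken from the abstract of this record; the helper name `count_interaction_tests` is hypothetical, and treating each unordered SNP subset as one test is an assumption, not a description of the authors' implementation.

```python
from math import comb


def count_interaction_tests(n_snps: int, order: int) -> int:
    """Number of distinct unordered SNP subsets of size `order` (hypothetical helper)."""
    return comb(n_snps, order)


N_SNPS = 500_000  # SNP count quoted in the abstract of this record

# Order 1 is a conventional single-SNP scan; orders 2 and 3 are interaction scans.
for order in (1, 2, 3):
    print(f"order {order}: {count_interaction_tests(N_SNPS, order):,} tests")
```

Under these assumptions a pairwise scan already needs roughly 1.25 × 10^11 tests, which is the scale that motivates the use of supercomputing resources.
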
| Published in: | 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 2012, pp. 1258-1261 |
|---|---|
| Main authors: | Reumann, M.; Makalic, E.; Goudey, B. W.; Inouye, M.; Bickerstaffe, A.; Bui, M.; Park, D. J.; Kapuscinski, M. K.; Schmidt, D. F.; Zhou, Z.; Qian, G.; Zobel, J.; Wagner, J.; Hopper, J. L. |
| Format: | Conference paper; Journal Article |
| Language: | English |
| Published: | United States: IEEE, 2012-01-01 |
| ISBN: | 1424441196, 9781424441198 |
| ISSN: | 1094-687X, 1557-170X, 2694-0604 |
| Abstract | Most published genome-wide association studies (GWAS) do not examine SNP interactions because computing p-values for the interaction terms is computationally expensive. Our aim is to use supercomputing resources to apply complex statistical techniques to the world's accumulating GWAS, epidemiology, survival and pathology data to uncover more information about genetic and environmental risk, biology and aetiology. As a proof of principle, we performed the Bayesian Posterior Probability test on a pseudo data set with 500,000 single nucleotide polymorphisms (SNPs) and 100 samples. We carried out strong scaling simulations on 2 to 4,096 processing cores, doubling the partition size at each step. On two processing cores the run time is 317 h, i.e. almost two weeks, compared with less than 10 minutes on 4,096 processing cores. The speedup factor of 2,020 is very close to the theoretical value of 2,048. This work demonstrates the feasibility of performing exhaustive higher order analysis of GWAS using independence testing for contingency tables. We are now in a position to employ supercomputers with hundreds of thousands of threads for higher order analysis of GWAS data using complex statistics. |
|---|---|
| Author | Schmidt, D. F.; Makalic, E.; Wagner, J.; Goudey, B. W.; Bickerstaffe, A.; Bui, M.; Qian, G.; Kapuscinski, M. K.; Zobel, J.; Zhou, Z.; Hopper, J. L.; Reumann, M.; Inouye, M.; Park, D. J. |
| Author_xml | 1. Reumann, M. (mreumann@ieee.org; IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia); 2. Makalic, E. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia); 3. Goudey, B. W. (IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia); 4. Inouye, M. (Dept. of Pathology, Univ. of Melbourne, Melbourne, VIC, Australia); 5. Bickerstaffe, A. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia); 6. Bui, M. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia); 7. Park, D. J.; 8. Kapuscinski, M. K. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia); 9. Schmidt, D. F. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia); 10. Zhou, Z. (IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia); 11. Qian, G. (Dept. of Math. & Stat., Univ. of Melbourne, Melbourne, VIC, Australia); 12. Zobel, J. (Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia); 13. Wagner, J. (IBM Res. Collaboratory for Life Sci.-Melbourne, Carlton, VIC, Australia); 14. Hopper, J. L. (Melbourne Sch. of Population Health, Univ. of Melbourne, Melbourne, VIC, Australia) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/23366127 (view this record in MEDLINE/PubMed) |
| ContentType | Conference Proceeding; Journal Article |
| DBID | 6IE 6IH CBEJK RIE RIO CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1109/EMBC.2012.6346166 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE; IEEE Proceedings Order Plans (POP) 1998-present; Medline; MEDLINE; MEDLINE (Ovid); MEDLINE; MEDLINE; PubMed; MEDLINE - Academic |
| DatabaseTitle | MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE; MEDLINE - Academic |
| Database_xml | 1. PubMed (dbid: NPM; url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; source type: Index Database); 2. IEEE (dbid: RIE; url: https://ieeexplore.ieee.org/; source type: Publisher); 3. MEDLINE - Academic (dbid: 7X8; url: https://search.proquest.com/medline; source type: Aggregation Database) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 9781457717871 1457717875 |
| EISSN | 2694-0604 |
| EndPage | 1261 |
| ExternalDocumentID | 23366127 6346166 |
| Genre | orig-research; Research Support, Non-U.S. Gov't; Journal Article |
| ISBN | 1424441196 9781424441198 |
| ISICitedReferencesCount | 3 |
| ISSN | 1094-687X 1557-170X 2694-0604 |
| IngestDate | Thu Oct 02 20:28:34 EDT 2025 Thu Jan 02 22:16:24 EST 2025 Wed Aug 27 02:44:20 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| PMID | 23366127 |
| PQID | 1283728993 |
| PQPubID | 23479 |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_6346166 pubmed_primary_23366127 proquest_miscellaneous_1283728993 |
| PublicationCentury | 2000 |
| PublicationDate | 2012-01-01 |
| PublicationDateYYYYMMDD | 2012-01-01 |
| PublicationDate_xml | – month: 01 year: 2012 text: 2012-01-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society |
| PublicationTitleAbbrev | EMBC |
| PublicationTitleAlternate | Conf Proc IEEE Eng Med Biol Soc |
| PublicationYear | 2012 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | proquest pubmed ieee |
| SourceType | Aggregation Database Index Database Publisher |
| StartPage | 1258 |
| SubjectTerms | Bayes Theorem Bayesian methods Bioinformatics Computational Biology - methods Computer Simulation Genome-Wide Association Study - methods Genomics Humans Monte Carlo Method Neoplasms - genetics Phenotype Polymorphism, Single Nucleotide Runtime |
| Title | Supercomputing enabling exhaustive statistical analysis of genome wide association study data: Preliminary results |
| URI | https://ieeexplore.ieee.org/document/6346166 https://www.ncbi.nlm.nih.gov/pubmed/23366127 https://www.proquest.com/docview/1283728993 |
| Volume | 2012 |
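
The strong-scaling figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the speedup is assumed to be measured relative to the two-core run (which is what the quoted theoretical value of 2,048 = 4,096 / 2 suggests), and the 4,096-core run time is derived from the reported speedup rather than taken from the paper.

```python
def partition_sizes(smallest: int, largest: int) -> list[int]:
    """Core counts from `smallest` to `largest`, doubling at each step."""
    sizes = []
    cores = smallest
    while cores <= largest:
        sizes.append(cores)
        cores *= 2
    return sizes


def ideal_speedup(base_cores: int, cores: int) -> float:
    """Speedup expected under perfect strong scaling relative to `base_cores`."""
    return cores / base_cores


# Figures quoted in the abstract of this record.
BASE_CORES, MAX_CORES = 2, 4096
BASE_RUNTIME_H = 317.0
REPORTED_SPEEDUP = 2020.0

ideal = ideal_speedup(BASE_CORES, MAX_CORES)
efficiency = REPORTED_SPEEDUP / ideal                        # fraction of ideal scaling achieved
implied_minutes = BASE_RUNTIME_H * 60.0 / REPORTED_SPEEDUP   # derived, not reported directly

print("partition sizes:", partition_sizes(BASE_CORES, MAX_CORES))
print(f"ideal speedup: {ideal:.0f}")
print(f"parallel efficiency: {efficiency:.1%}")
print(f"implied run time on {MAX_CORES} cores: {implied_minutes:.1f} min")
```

Under these assumptions the reported speedup corresponds to a parallel efficiency of about 98.6%, and the implied run time is consistent with the abstract's statement of less than 10 minutes.
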

