Fast numerical optimization for genome sequencing data in population biobanks
| Published in: | Bioinformatics (Oxford, England), Vol. 37, No. 22, pp. 4148-4155 |
|---|---|
| Main authors: | Li, Ruilin; Chang, Christopher; Tanigawa, Yosuke; Narasimhan, Balasubramanian; Hastie, Trevor; Tibshirani, Robert; Rivas, Manuel A |
| Format: | Journal Article |
| Language: | English |
| Published: | England; Oxford University Press; 18 November 2021; Oxford Publishing Limited (England) |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811 |
| Online access: | Full text |
| Abstract | Abstract
Motivation
Large-scale and high-dimensional genome sequencing data poses computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data.
Results
We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0,1,2,NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared to a double-precision floating-point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity in the predictor matrix to further reduce the memory requirement and computation time. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of the compressed sparse block format so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet, and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso problems for linear, logistic and Cox regression on sparse genetic matrices that contain 1 000 000 variants and almost 100 000 individuals within 10 min and using less than 32 GB of memory.
Availability and implementation
https://github.com/rivas-lab/snpnet/tree/compact. |
|---|---|
| Author | Hastie, Trevor Rivas, Manuel A Li, Ruilin Chang, Christopher Tibshirani, Robert Tanigawa, Yosuke Narasimhan, Balasubramanian |
| Author_xml | – sequence: 1 givenname: Ruilin orcidid: 0000-0002-5152-7086 surname: Li fullname: Li, Ruilin email: ruilinli@stanford.edu – sequence: 2 givenname: Christopher surname: Chang fullname: Chang, Christopher – sequence: 3 givenname: Yosuke orcidid: 0000-0001-9759-157X surname: Tanigawa fullname: Tanigawa, Yosuke – sequence: 4 givenname: Balasubramanian surname: Narasimhan fullname: Narasimhan, Balasubramanian – sequence: 5 givenname: Trevor surname: Hastie fullname: Hastie, Trevor – sequence: 6 givenname: Robert surname: Tibshirani fullname: Tibshirani, Robert – sequence: 7 givenname: Manuel A orcidid: 0000-0003-1457-9925 surname: Rivas fullname: Rivas, Manuel A email: mrivas@stanford.edu |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34146108$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| Copyright | The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com |
| DOI | 10.1093/bioinformatics/btab452 |
| Discipline | Biology |
| EISSN | 1367-4811 |
| EndPage | 4155 |
| ExternalDocumentID | PMC9206591 34146108 10_1093_bioinformatics_btab452 10.1093/bioinformatics/btab452 |
| Genre | Research Support, U.S. Gov't, Non-P.H.S Research Support, Non-U.S. Gov't Journal Article Research Support, N.I.H., Extramural |
| GrantInformation_xml | NHGRI NIH HHS: R01 HG010140; NHGRI NIH HHS: U01 HG009080; NIBIB NIH HHS: R01 EB001988; funder not named: R01HG010140; 19 DMS1208164; DMS-1407548; 5R01 EB001988-16; 5R01 EB 001988-21; 24983; 5U01 HG009080 |
| ISICitedReferencesCount | 13 |
| ISSN | 1367-4803 1367-4811 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 22 |
| Language | English |
| License | This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model). |
| ORCID | 0000-0002-5152-7086 0000-0001-9759-157X 0000-0003-1457-9925 |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/9206591 |
| PMID | 34146108 |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-11-18 |
| PublicationDateYYYYMMDD | 2021-11-18 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics (Oxford, England) |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2021 |
| Publisher | Oxford University Press Oxford Publishing Limited (England) |
| StartPage | 4148 |
| SubjectTerms | Algorithms; Biological Specimen Banks; Chromosome Mapping; Cognitive ability; Computer applications; Floating point arithmetic; Gene sequencing; Genetic diversity; Genetic variance; Genome; Genomes; Humans; Least-Squares Analysis; Optimization; Original Papers; Representations; Software; Solvers; Sparse matrices; Sparsity; Whole genome sequencing |
| Title | Fast numerical optimization for genome sequencing data in population biobanks |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34146108 https://www.proquest.com/docview/3128014927 https://www.proquest.com/docview/2543441793 https://pubmed.ncbi.nlm.nih.gov/PMC9206591 |
| Volume | 37 |
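
The abstract states that each genotype entry takes a value in {0,1,2,NA} and can therefore be stored in two bits, which gives a 32-fold saving over a 64-bit double. The following is a minimal sketch of that idea only, not the snpnet implementation: the code value used for NA, the bit order within a byte, and the function names `pack_genotypes` and `unpack_genotypes` are assumptions made for illustration.

```python
import struct

# Assumption for illustration: NA is stored as code 3. The abstract does not
# say which 2-bit pattern the authors assign to missing values.
NA_CODE = 3


def pack_genotypes(codes):
    """Pack a sequence of 2-bit codes (0, 1, 2 or NA_CODE) four to a byte."""
    packed = bytearray((len(codes) + 3) // 4)  # round up to whole bytes
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"code {code} does not fit in two bits")
        # Entry i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n 2-bit codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(n)]


def double_bytes(n):
    """Bytes needed to hold n entries as 64-bit floating point values."""
    return n * struct.calcsize("d")


if __name__ == "__main__":
    genotypes = [0, 1, 2, NA_CODE, 2, 1, 0]
    packed = pack_genotypes(genotypes)
    print(unpack_genotypes(packed, len(genotypes)) == genotypes)
    n = 1000
    print(double_bytes(n) // len(pack_genotypes([0] * n)))
```

For 1000 entries the packed form needs 250 bytes against 8000 bytes as doubles, which is the factor of 32 quoted in the abstract. A production solver would operate on the packed bytes directly rather than unpacking them, which this sketch does not attempt.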