Fast numerical optimization for genome sequencing data in population biobanks

Bibliographic details
Published in: Bioinformatics (Oxford, England), Vol. 37, No. 22, pp. 4148-4155
Main authors: Li, Ruilin; Chang, Christopher; Tanigawa, Yosuke; Narasimhan, Balasubramanian; Hastie, Trevor; Tibshirani, Robert; Rivas, Manuel A
Format: Journal Article
Language: English
Published: England: Oxford University Press, 18 November 2021
Oxford Publishing Limited (England)
ISSN: 1367-4803, 1367-4811
Online access: Full text
Abstract
Motivation: Large-scale and high-dimensional genome sequencing data poses computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data.
Results: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by values in the set {0,1,2,NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared to a double-precision floating-point representation. Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We exploit the sparsity of the predictor matrix to further reduce the memory requirement and the computation time. Our sparse genetic matrix implementation uses both the compact two-bit representation and a simplified version of the compressed sparse block format, so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet and will also be included as part of the snpnet R package. Our implementation is able to solve Lasso and group Lasso linear, logistic and Cox regression problems on sparse genetic matrices that contain 1 000 000 variants and almost 100 000 individuals within 10 min while using less than 32 GB of memory.
Availability and implementation: https://github.com/rivas-lab/snpnet/tree/compact
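The two-bit representation mentioned in the abstract can be illustrated with a minimal sketch. The code assignment (0b11 for NA), the ordering of entries within a byte, and all function names below are assumptions made for illustration; the abstract does not specify the bit layout, and this is not the snpnet API.

```python
# Illustrative sketch only: pack genotypes from {0, 1, 2, NA} into two bits each.
# The code table and the within-byte ordering are assumptions, not the paper's layout.

NA = None  # stand-in for a missing genotype

# Hypothetical mapping from genotype value to a two-bit code.
_CODE = {0: 0b00, 1: 0b01, 2: 0b10, NA: 0b11}
_DECODE = {code: value for value, code in _CODE.items()}


def pack_genotypes(values):
    """Pack a sequence of genotypes into bytes, four entries per byte."""
    packed = bytearray((len(values) + 3) // 4)  # round up to whole bytes
    for i, genotype in enumerate(values):
        # Entry i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= _CODE[genotype] << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotypes from a packed byte string."""
    return [
        _DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(n)
    ]


def memory_reduction_factor(bits_per_double=64, bits_per_entry=2):
    """Ratio of storage per entry: double precision versus two-bit codes."""
    return bits_per_double // bits_per_entry


genotypes = [0, 1, 2, NA, 2, 0, 1]
packed = pack_genotypes(genotypes)
restored = unpack_genotypes(packed, len(genotypes))
```

Here seven genotypes fit in two bytes, where seven double-precision values would take 56 bytes. The factor of 32 quoted in the abstract is the per-entry ratio of 64 bits to 2 bits; padding in the last byte makes the ratio slightly smaller for short columns.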
Author Hastie, Trevor
Rivas, Manuel A
Li, Ruilin
Chang, Christopher
Tibshirani, Robert
Tanigawa, Yosuke
Narasimhan, Balasubramanian
Author_xml – sequence: 1
  givenname: Ruilin
  orcidid: 0000-0002-5152-7086
  surname: Li
  fullname: Li, Ruilin
  email: ruilinli@stanford.edu
– sequence: 2
  givenname: Christopher
  surname: Chang
  fullname: Chang, Christopher
– sequence: 3
  givenname: Yosuke
  orcidid: 0000-0001-9759-157X
  surname: Tanigawa
  fullname: Tanigawa, Yosuke
– sequence: 4
  givenname: Balasubramanian
  surname: Narasimhan
  fullname: Narasimhan, Balasubramanian
– sequence: 5
  givenname: Trevor
  surname: Hastie
  fullname: Hastie, Trevor
– sequence: 6
  givenname: Robert
  surname: Tibshirani
  fullname: Tibshirani, Robert
– sequence: 7
  givenname: Manuel A
  orcidid: 0000-0003-1457-9925
  surname: Rivas
  fullname: Rivas, Manuel A
  email: mrivas@stanford.edu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34146108$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1007_s11030_024_10947_0
crossref_primary_10_1016_j_ajhg_2024_09_008
crossref_primary_10_1038_s41467_024_48654_x
crossref_primary_10_1093_bioinformatics_btaf067
crossref_primary_10_1371_journal_pgen_1010105
crossref_primary_10_1016_j_ajhg_2022_08_003
crossref_primary_10_1016_j_ajhg_2023_09_013
crossref_primary_10_1016_j_jrras_2025_101685
crossref_primary_10_1016_j_jpdc_2024_104989
crossref_primary_10_1093_nar_gkad373
crossref_primary_10_3389_fgene_2023_1213907
Cites_doi 10.1093/bioinformatics/btaa1029
10.1111/j.1467-9868.2005.00532.x
10.1371/journal.pgen.1009141
10.1016/j.ajhg.2019.07.001
10.1371/journal.pmed.1001779
10.1038/s41467-019-09718-5
10.18637/jss.v039.i05
10.1038/ng.3190
10.1038/s41588-020-00757-z
10.1002/cpa.20042
10.1038/s41467-019-12653-0
10.1186/s13742-015-0047-8
10.1111/j.2517-6161.1972.tb00899.x
10.1038/s41467-018-03910-9
10.18637/jss.v033.i01
10.1111/j.2517-6161.1996.tb02080.x
10.1137/080716542
10.1111/j.1467-9868.2005.00503.x
ContentType Journal Article
Copyright The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2021
DOI 10.1093/bioinformatics/btab452
Discipline Biology
EISSN 1367-4811
EndPage 4155
ExternalDocumentID PMC9206591
34146108
10_1093_bioinformatics_btab452
10.1093/bioinformatics/btab452
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: R01 HG010140
– fundername: NHGRI NIH HHS
  grantid: U01 HG009080
– fundername: NIBIB NIH HHS
  grantid: R01 EB001988
– fundername: ;
– fundername: ;
  grantid: R01HG010140
– fundername: ;
  grantid: 19 DMS1208164; DMS-1407548
– fundername: ;
  grantid: 5R01 EB001988-16
– fundername: ;
  grantid: 5R01 EB 001988-21
– fundername: ;
  grantid: 24983
– fundername: ;
  grantid: 5U01 HG009080
ISICitedReferencesCount 13
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000733835900019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1367-4803
1367-4811
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 22
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
ORCID 0000-0002-5152-7086
0000-0001-9759-157X
0000-0003-1457-9925
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/9206591
PMID 34146108
PQID 3128014927
PQPubID 36124
PageCount 8
PublicationCentury 2000
PublicationDate 2021-11-18
PublicationDateYYYYMMDD 2021-11-18
PublicationDate_xml – month: 11
  year: 2021
  text: 2021-11-18
  day: 18
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2021
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
StartPage 4148
SubjectTerms Algorithms
Biological Specimen Banks
Chromosome Mapping
Cognitive ability
Computer applications
Floating point arithmetic
Gene sequencing
Genetic diversity
Genetic variance
Genome
Genomes
Humans
Least-Squares Analysis
Optimization
Original Papers
Representations
Software
Solvers
Sparse matrices
Sparsity
Whole genome sequencing
Title Fast numerical optimization for genome sequencing data in population biobanks
URI https://www.ncbi.nlm.nih.gov/pubmed/34146108
https://www.proquest.com/docview/3128014927
https://www.proquest.com/docview/2543441793
https://pubmed.ncbi.nlm.nih.gov/PMC9206591
Volume 37