Secure discovery of genetic relatives across large-scale and distributed genomic data sets

Bibliographic details
Published in: Genome research, Volume 34, Issue 9, p. 1312
Main authors: Hong, Matthew M, Froelicher, David, Magner, Ricky, Popic, Victoria, Berger, Bonnie, Cho, Hyunghoon
Format: Journal Article
Language: English
Publication details: United States, 1 September 2024
ISSN: 1549-5469
Abstract Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging owing to the burden of estimating kinship between all the pairs of individuals across data sets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing (LSH) approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us data sets. On a data set of 200,000 individuals split between two parties, SF-Relate detects 97% of third-degree or closer relatives within 15 h of runtime. Our work enables secure identification of relatives across large-scale genomic data sets.
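The bucketing idea described in the abstract can be illustrated with a toy, non-private sketch. This is not SF-Relate's hash function or protocol. It assumes each individual is a string of 0/1 alleles, uses exact-match fixed windows as a crude stand-in for a hash that captures shared IBD segments, and omits encryption entirely. All names (`bucket_keys`, `build_buckets`, `candidate_pairs`, `window`) and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def bucket_keys(haplotype, window):
    """Yield one (window_index, segment) key per non-overlapping window.

    Toy stand-in for an LSH function: two samples that carry an identical
    segment in the same window produce the same key.
    """
    for start in range(0, len(haplotype) - window + 1, window):
        yield (start // window, haplotype[start:start + window])


def build_buckets(samples, window):
    """Map each bucket key to the set of sample IDs that hash into it."""
    buckets = defaultdict(set)
    for sample_id, haplotype in samples.items():
        for key in bucket_keys(haplotype, window):
            buckets[key].add(sample_id)
    return buckets


def candidate_pairs(party_a, party_b, window):
    """Return cross-party pairs that share at least one bucket."""
    buckets_a = build_buckets(party_a, window)
    buckets_b = build_buckets(party_b, window)
    pairs = set()
    # Only buckets present on both sides can produce a pair to test.
    for key in buckets_a.keys() & buckets_b.keys():
        pairs.update(product(buckets_a[key], buckets_b[key]))
    return pairs


# Made-up data: a1 and b1 share the segment "1101" in window 1.
party_a = {"a1": "000011010000", "a2": "111100001111"}
party_b = {"b1": "101011011010", "b2": "010101100101"}

pairs = candidate_pairs(party_a, party_b, window=4)
all_pairs = len(party_a) * len(party_b)
print(sorted(pairs), f"{len(pairs)} of {all_pairs} pairs to test")
```

In this sketch only the pair sharing a segment is kept for the (here omitted) kinship test, which is the source of the reduction in comparisons; in the setting the abstract describes, the comparison within matching buckets would be carried out under multiparty homomorphic encryption rather than in the clear.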
Author Froelicher, David
Cho, Hyunghoon
Magner, Ricky
Popic, Victoria
Hong, Matthew M
Berger, Bonnie
Author_xml – sequence: 1
  givenname: Matthew M
  orcidid: 0009-0003-0969-0140
  surname: Hong
  fullname: Hong, Matthew M
  organization: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
– sequence: 2
  givenname: David
  surname: Froelicher
  fullname: Froelicher, David
  organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
– sequence: 3
  givenname: Ricky
  surname: Magner
  fullname: Magner, Ricky
  organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
– sequence: 4
  givenname: Victoria
  surname: Popic
  fullname: Popic, Victoria
  email: vpopic@broadinstitute.org
  organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
– sequence: 5
  givenname: Bonnie
  surname: Berger
  fullname: Berger, Bonnie
  email: bab@mit.edu
  organization: Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
– sequence: 6
  givenname: Hyunghoon
  orcidid: 0000-0002-2713-0150
  surname: Cho
  fullname: Cho, Hyunghoon
  email: hoon.cho@yale.edu
  organization: Department of Biomedical Informatics and Data Science, Yale University, New Haven, Connecticut 06510, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39111815 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1038_s41588_025_02110_8
crossref_primary_10_1101_gr_280036_124
crossref_primary_10_1038_s41588_025_02109_1
ContentType Journal Article
Copyright 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press.
Copyright_xml – notice: 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press.
DOI 10.1101/gr.279057.124
DatabaseName MEDLINE
MEDLINE (Ovid)
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Anatomy & Physiology
Chemistry
Biology
EISSN 1549-5469
ExternalDocumentID 39111815
Genre Journal Article
GrantInformation_xml – fundername: NIH HHS
  grantid: OT2 OD026555
– fundername: NIH HHS
  grantid: OT2 OD026557
– fundername: NIH HHS
  grantid: DP5 OD029574
– fundername: NIH HHS
  grantid: OT2 OD025315
– fundername: NIH HHS
  grantid: OT2 OD026551
– fundername: NIH HHS
  grantid: OT2 OD025276
– fundername: NHGRI NIH HHS
  grantid: R01 HG010959
– fundername: NIH HHS
  grantid: OT2 OD026553
– fundername: NIH HHS
  grantid: OT2 OD026549
– fundername: NIH HHS
  grantid: OT2 OD025337
ISICitedReferencesCount 4
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001345945200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1549-5469
IngestDate Fri Sep 05 07:42:58 EDT 2025
Mon Jul 21 06:02:20 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press.
LinkModel DirectLink
ORCID 0009-0003-0969-0140
0000-0002-2713-0150
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC11529841
PMID 39111815
PQID 3090632476
PQPubID 23479
ParticipantIDs proquest_miscellaneous_3090632476
pubmed_primary_39111815
PublicationCentury 2000
PublicationDate 2024-09-01
PublicationDateYYYYMMDD 2024-09-01
PublicationDate_xml – month: 09
  year: 2024
  text: 2024-09-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Genome research
PublicationTitleAlternate Genome Res
PublicationYear 2024
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 1312
SubjectTerms Algorithms
Computer Security
Genomics - methods
Humans
Pedigree
Title Secure discovery of genetic relatives across large-scale and distributed genomic data sets
URI https://www.ncbi.nlm.nih.gov/pubmed/39111815
https://www.proquest.com/docview/3090632476
Volume 34
WOSCitedRecordID wos001345945200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D