Secure discovery of genetic relatives across large-scale and distributed genomic data sets
Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challen...
| Published in: | Genome research, Volume 34, issue 9, p. 1312 |
|---|---|
| Main authors: | Hong, Matthew M; Froelicher, David; Magner, Ricky; Popic, Victoria; Berger, Bonnie; Cho, Hyunghoon |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 2024-09-01 |
| Subjects: | Algorithms; Computer Security; Genomics - methods; Humans; Pedigree |
| ISSN: | 1549-5469 |
| Abstract | Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging owing to the burden of estimating kinship between all the pairs of individuals across data sets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing (LSH) approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us data sets. On a data set of 200,000 individuals split between two parties, SF-Relate detects 97% of third-degree or closer relatives within 15 h of runtime. Our work enables secure identification of relatives across large-scale genomic data sets. |
|---|---|
| AbstractList | Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging owing to the burden of estimating kinship between all the pairs of individuals across data sets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing (LSH) approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us data sets. On a data set of 200,000 individuals split between two parties, SF-Relate detects 97% of third-degree or closer relatives within 15 h of runtime. Our work enables secure identification of relatives across large-scale genomic data sets. |
| Author | Froelicher, David Cho, Hyunghoon Magner, Ricky Popic, Victoria Hong, Matthew M Berger, Bonnie |
| Author_xml | – sequence: 1 givenname: Matthew M orcidid: 0009-0003-0969-0140 surname: Hong fullname: Hong, Matthew M organization: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA – sequence: 2 givenname: David surname: Froelicher fullname: Froelicher, David organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA – sequence: 3 givenname: Ricky surname: Magner fullname: Magner, Ricky organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA – sequence: 4 givenname: Victoria surname: Popic fullname: Popic, Victoria email: vpopic@broadinstitute.org, bab@mit.edu, hoon.cho@yale.edu organization: Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA – sequence: 5 givenname: Bonnie surname: Berger fullname: Berger, Bonnie email: vpopic@broadinstitute.org, bab@mit.edu, hoon.cho@yale.edu organization: Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA – sequence: 6 givenname: Hyunghoon orcidid: 0000-0002-2713-0150 surname: Cho fullname: Cho, Hyunghoon email: vpopic@broadinstitute.org, bab@mit.edu, hoon.cho@yale.edu organization: Department of Biomedical Informatics and Data Science, Yale University, New Haven, Connecticut 06510, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39111815$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1038_s41588_025_02110_8 crossref_primary_10_1101_gr_280036_124 crossref_primary_10_1038_s41588_025_02109_1 |
| ContentType | Journal Article |
| Copyright | 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press. |
| Copyright_xml | – notice: 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press. |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1101/gr.279057.124 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Anatomy & Physiology Chemistry Biology |
| EISSN | 1549-5469 |
| ExternalDocumentID | 39111815 |
| Genre | Journal Article |
| GrantInformation_xml | – fundername: NIH HHS grantid: OT2 OD026555 – fundername: NIH HHS grantid: OT2 OD026557 – fundername: NIH HHS grantid: DP5 OD029574 – fundername: NIH HHS grantid: OT2 OD025315 – fundername: NIH HHS grantid: OT2 OD026551 – fundername: NIH HHS grantid: OT2 OD025276 – fundername: NHGRI NIH HHS grantid: R01 HG010959 – fundername: NIH HHS grantid: OT2 OD026553 – fundername: NIH HHS grantid: OT2 OD026549 – fundername: NIH HHS grantid: OT2 OD025337 |
| GroupedDBID | --- 18M 29H 2WC 39C 4.4 53G 5GY 5RE 5VS AAZTW ABDIX ABDNZ ACGFO ACLKE ADBBV ADNWM AEILP AENEX AHPUY ALMA_UNASSIGNED_HOLDINGS BAWUL BTFSW CGR CS3 CUY CVF DIK DU5 E3Z EBS ECM EIF F5P FRP GX1 H13 HYE IH2 K-O KQ8 MV1 NPM R.V RCX RHI RNS RPM RXW SJN TAE TR2 W8F WOQ YKV 7X8 |
| ID | FETCH-LOGICAL-c416t-8200c6b3593e7896f3d621e544057f45ab14606d6857c1642a6b13473d3f199a2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 4 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001345945200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1549-5469 |
| IngestDate | Fri Sep 05 07:42:58 EDT 2025 Mon Jul 21 06:02:20 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | 2024 Hong et al.; Published by Cold Spring Harbor Laboratory Press. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c416t-8200c6b3593e7896f3d621e544057f45ab14606d6857c1642a6b13473d3f199a2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0009-0003-0969-0140 0000-0002-2713-0150 |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC11529841 |
| PMID | 39111815 |
| PQID | 3090632476 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_3090632476 pubmed_primary_39111815 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-09-01 |
| PublicationDateYYYYMMDD | 2024-09-01 |
| PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Genome research |
| PublicationTitleAlternate | Genome Res |
| PublicationYear | 2024 |
| SSID | ssj0003488 |
| Score | 2.4815369 |
| Snippet | Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 1312 |
| SubjectTerms | Algorithms Computer Security Genomics - methods Humans Pedigree |
| Title | Secure discovery of genetic relatives across large-scale and distributed genomic data sets |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39111815 https://www.proquest.com/docview/3090632476 |
| Volume | 34 |
| WOSCitedRecordID | wos001345945200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
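
The bucketing idea described in the abstract (group likely relatives into buckets, then compare only individuals whose buckets match across parties) can be illustrated with a small plaintext sketch. This is not the SF-Relate implementation: the window-based key, the function names (`window_keys`, `build_buckets`, `candidate_pairs`), and the toy haplotypes are assumptions made for illustration, and the real protocol compares candidates under multiparty homomorphic encryption rather than in the clear.

```python
from collections import defaultdict
from itertools import product


def window_keys(haplotype, window):
    """Return one (start, alleles) key per non-overlapping window.

    Assumption: an exact match on a whole window stands in for a shared
    IBD segment; the published hash function is more elaborate.
    """
    return {
        (start, tuple(haplotype[start:start + window]))
        for start in range(0, len(haplotype) - window + 1, window)
    }


def build_buckets(cohort, window):
    """Map each window key to the set of local sample IDs that produce it."""
    buckets = defaultdict(set)
    for sample_id, haplotype in cohort.items():
        for key in window_keys(haplotype, window):
            buckets[key].add(sample_id)
    return buckets


def candidate_pairs(buckets_a, buckets_b):
    """Pair up samples only when both parties hold the same bucket key."""
    pairs = set()
    for key in buckets_a.keys() & buckets_b.keys():
        pairs.update(product(buckets_a[key], buckets_b[key]))
    return pairs


# Hypothetical toy data: a1 and b1 share the middle window (1, 0, 0, 1).
party_a = {
    "a1": [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0],
    "a2": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
}
party_b = {
    "b1": [1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
    "b2": [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
}

pairs = candidate_pairs(build_buckets(party_a, 4), build_buckets(party_b, 4))
print(pairs)
```

In this toy case only one of the four cross-party pairs survives bucketing, so only that pair would go on to the (encrypted) relatedness computation. The window length trades recall against the number of candidate pairs: shorter windows match more often by chance, longer ones can miss relatives whose shared segments do not cover a full window.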