Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression
Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression...
| Published in: | Journal of proteome research Vol. 23; no. 9; p. 4043 |
|---|---|
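| Main authors: | Ma, Simeng; Li, Ruiling; Gong, Qian; Lv, Honggang; Deng, Zipeng; Wang, Beibei; Yao, Lihua; Kang, Lijun; Xiang, Dan; Yang, Jun; Liu, Zhongchun |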
| Format: | Journal Article |
| Language: | English |
| Publication details: | United States, 2024-09-06 |
| Subjects: | |
| ISSN: | 1535-3907 |
| Abstract | Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression in UK Biobank data sets (depression n = 4,479, healthy control n = 19,821). CatBoost was employed for model construction, with Shapley Additive Explanations (SHAP) being utilized to interpret the resulting model. Model performance was corroborated through 5-fold cross-validation, and its diagnostic efficacy was evaluated based on the area under the receiver operating characteristic (AUC) curve. A total of 45 depression-related proteins were screened based on the top 20 important features output by the CatBoost model in six data sets. Of the nine diagnostic models for depression, the performance of the traditional risk factor model was improved after the addition of proteomic data, with the best model having an average AUC of 0.764 in the test sets. KEGG pathway analysis of 45 screened proteins showed that the most significant pathway involved was the cytokine-cytokine receptor interaction. It is feasible to explore diagnostic biomarkers of depression using data-driven machine learning methods and large-scale data sets, although the results require validation. |
|---|---|
| AbstractList | Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression in UK Biobank data sets (depression n = 4,479, healthy control n = 19,821). CatBoost was employed for model construction, with Shapley Additive Explanations (SHAP) being utilized to interpret the resulting model. Model performance was corroborated through 5-fold cross-validation, and its diagnostic efficacy was evaluated based on the area under the receiver operating characteristic (AUC) curve. A total of 45 depression-related proteins were screened based on the top 20 important features output by the CatBoost model in six data sets. Of the nine diagnostic models for depression, the performance of the traditional risk factor model was improved after the addition of proteomic data, with the best model having an average AUC of 0.764 in the test sets. KEGG pathway analysis of 45 screened proteins showed that the most significant pathway involved was the cytokine-cytokine receptor interaction. It is feasible to explore diagnostic biomarkers of depression using data-driven machine learning methods and large-scale data sets, although the results require validation. |
| Author | Kang, Lijun; Ma, Simeng; Liu, Zhongchun; Li, Ruiling; Gong, Qian; Deng, Zipeng; Yang, Jun; Yao, Lihua; Xiang, Dan; Lv, Honggang; Wang, Beibei |
| Author_xml | – sequence: 1 givenname: Simeng orcidid: 0000-0002-1775-5202 surname: Ma fullname: Ma, Simeng organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 2 givenname: Ruiling surname: Li fullname: Li, Ruiling organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 3 givenname: Qian surname: Gong fullname: Gong, Qian organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 4 givenname: Honggang surname: Lv fullname: Lv, Honggang organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 5 givenname: Zipeng surname: Deng fullname: Deng, Zipeng organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 6 givenname: Beibei surname: Wang fullname: Wang, Beibei organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 7 givenname: Lihua surname: Yao fullname: Yao, Lihua organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 8 givenname: Lijun surname: Kang fullname: Kang, Lijun organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 9 givenname: Dan surname: Xiang fullname: Xiang, Dan organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China – sequence: 10 givenname: Jun surname: Yang fullname: Yang, Jun organization: School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China – sequence: 11 givenname: Zhongchun surname: Liu fullname: Liu, Zhongchun organization: Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan 430071, China |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39150755$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1021/acs.jproteome.4c00389 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Chemistry |
| EISSN | 1535-3907 |
| ExternalDocumentID | 39150755 |
| Genre | Research Support, Non-U.S. Gov't Journal Article |
| GroupedDBID | --- 4.4 53G 55A 5GY 5VS 7~N AABXI AAHBH ABBLG ABJNI ABLBI ABMVS ABQRX ABUCX ACGFS ACS ADHLV AEESW AENEX AFEFF AHGAQ ALMA_UNASSIGNED_HOLDINGS AQSVZ BAANH CGR CS3 CUPRZ CUY CVF DU5 EBS ECM ED~ EIF F5P GGK GNL IH9 IHE JG~ NPM P2P RNS ROL UI2 VF5 VG9 W1F 7X8 |
| ID | FETCH-LOGICAL-a332t-b7bcdb3dac9f6c06ce82d5353c53917b76e15c26f64ff56bc8d702eaf514acfa2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001293278600001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1535-3907 |
| IngestDate | Wed Oct 01 14:51:43 EDT 2025 Tue Aug 19 01:31:02 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Keywords | Biomarkers; Depression; CatBoost; Proteomic |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a332t-b7bcdb3dac9f6c06ce82d5353c53917b76e15c26f64ff56bc8d702eaf514acfa2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0002-1775-5202 |
| PMID | 39150755 |
| PQID | 3093595324 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_3093595324 pubmed_primary_39150755 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-09-06 |
| PublicationDateYYYYMMDD | 2024-09-06 |
| PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-06 day: 06 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Journal of proteome research |
| PublicationTitleAlternate | J Proteome Res |
| PublicationYear | 2024 |
| SSID | ssj0015703 |
| Score | 2.455399 |
| Snippet | Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 4043 |
| SubjectTerms | Algorithms; Area Under Curve; Biomarkers - blood; Blood Proteins - analysis; Blood Proteins - metabolism; Depression - blood; Depression - diagnosis; Female; Humans; Machine Learning; Male; Proteome - analysis; Proteome - metabolism; Proteomics - methods; ROC Curve |
| Title | Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39150755 https://www.proquest.com/docview/3093595324 |
| Volume | 23 |
| WOSCitedRecordID | wos001293278600001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Using+Data-Driven+Algorithms+with+Large-Scale+Plasma+Proteomic+Data+to+Discover+Novel+Biomarkers+for+Diagnosing+Depression&rft.jtitle=Journal+of+proteome+research&rft.au=Ma%2C+Simeng&rft.au=Li%2C+Ruiling&rft.au=Gong%2C+Qian&rft.au=Lv%2C+Honggang&rft.date=2024-09-06&rft.issn=1535-3907&rft.eissn=1535-3907&rft.volume=23&rft.issue=9&rft.spage=4043&rft_id=info:doi/10.1021%2Facs.jproteome.4c00389&rft.externalDBID=NO_FULL_TEXT |
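
The abstract describes two steps that are easy to illustrate in isolation: scoring a classifier by AUC averaged over cross-validation folds, and pooling the top 20 features from several data sets into one candidate list. The sketch below is a minimal, dependency-free illustration of those two ideas only. It does not use CatBoost or SHAP, the function names and toy values are hypothetical, and it assumes the 45 proteins were obtained as the union of the per-data-set top-20 lists, which the abstract suggests but does not state explicitly.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average AUC over held-out folds; each fold is (labels, scores)."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


def top_k_features(importances, k=20):
    """Names of the k features with the largest importance values."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def pooled_top_features(importances_per_dataset, k=20):
    """Union of the top-k features across data sets (assumed pooling rule)."""
    pooled = set()
    for importances in importances_per_dataset:
        pooled.update(top_k_features(importances, k))
    return sorted(pooled)


# Toy usage with made-up numbers (not data from the study).
toy_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
]
toy_importances = [
    {"PROT_A": 0.5, "PROT_B": 0.3, "PROT_C": 0.1},
    {"PROT_B": 0.6, "PROT_D": 0.2, "PROT_A": 0.1},
]
average_auc = mean_fold_auc(toy_folds)
candidates = pooled_top_features(toy_importances, k=2)
```

Because features that rank highly in more than one data set are counted once, pooling six top-20 lists can yield far fewer than 120 candidates, which is consistent with the 45 proteins reported in the abstract.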