Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression

Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression...

Detailed bibliography
Published in: Journal of proteome research, Volume 23, Issue 9, p. 4043
Main authors: Ma, Simeng; Li, Ruiling; Gong, Qian; Lv, Honggang; Deng, Zipeng; Wang, Beibei; Yao, Lihua; Kang, Lijun; Xiang, Dan; Yang, Jun; Liu, Zhongchun
Format: Journal Article
Language: English
Published: United States, 2024-09-06
ISSN: 1535-3907
Abstract Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression in UK Biobank data sets (depression n = 4,479, healthy control n = 19,821). The models were built with CatBoost and interpreted with Shapley Additive Explanations (SHAP). Model performance was assessed by 5-fold cross-validation, and diagnostic efficacy was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 45 depression-related proteins were selected from the top 20 most important features output by the CatBoost model across six data sets. Among the nine diagnostic models for depression, adding proteomic data improved the performance of the traditional risk factor model; the best model had an average AUC of 0.764 in the test sets. KEGG pathway analysis of the 45 selected proteins showed that the most significant pathway involved was cytokine-cytokine receptor interaction. Exploring diagnostic biomarkers of depression with data-driven machine learning methods and large-scale data sets is feasible, although the results require validation.
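The evaluation scheme named in the abstract (5-fold cross-validation scored by AUC) can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the abstract does not say whether folds were stratified, so stratification is an assumption here, and the `fit_mean_difference` scorer is a deliberately simple stand-in for CatBoost. All function names are hypothetical.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC via the rank-sum identity: the probability that a random positive
    case scores higher than a random negative case (ties count as 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied scores share their average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin so every
    fold keeps roughly the overall case/control ratio (an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict(X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(aucs), aucs


def fit_mean_difference(X_train, y_train):
    """Stand-in model (NOT CatBoost): score a sample by projecting it onto the
    difference between the two class means."""
    n_features = len(X_train[0])

    def class_mean(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [mean(r[j] for r in rows) for j in range(n_features)]

    mu0, mu1 = class_mean(0), class_mean(1)
    w = [b - a for a, b in zip(mu0, mu1)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))
```

Any callable with the same shape as `fit_mean_difference` (it takes training rows and labels and returns a scoring function) can be passed as `fit`, so a gradient-boosting model could be wrapped the same way. Averaging per-fold AUCs mirrors the "average AUC in the test sets" wording of the abstract, though the paper's exact averaging procedure is not stated there.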
Author Kang, Lijun
Ma, Simeng
Liu, Zhongchun
Li, Ruiling
Gong, Qian
Deng, Zipeng
Yang, Jun
Yao, Lihua
Xiang, Dan
Lv, Honggang
Wang, Beibei
Author_xml – sequence: 1
  givenname: Simeng
  orcidid: 0000-0002-1775-5202
  surname: Ma
  fullname: Ma, Simeng
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 2
  givenname: Ruiling
  surname: Li
  fullname: Li, Ruiling
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 3
  givenname: Qian
  surname: Gong
  fullname: Gong, Qian
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 4
  givenname: Honggang
  surname: Lv
  fullname: Lv, Honggang
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 5
  givenname: Zipeng
  surname: Deng
  fullname: Deng, Zipeng
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 6
  givenname: Beibei
  surname: Wang
  fullname: Wang, Beibei
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 7
  givenname: Lihua
  surname: Yao
  fullname: Yao, Lihua
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 8
  givenname: Lijun
  surname: Kang
  fullname: Kang, Lijun
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 9
  givenname: Dan
  surname: Xiang
  fullname: Xiang, Dan
  organization: Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
– sequence: 10
  givenname: Jun
  surname: Yang
  fullname: Yang, Jun
  organization: School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
– sequence: 11
  givenname: Zhongchun
  surname: Liu
  fullname: Liu, Zhongchun
  organization: Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan 430071, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39150755$$D View this record in MEDLINE/PubMed
ContentType Journal Article
DBID CGR
CUY
CVF
ECM
EIF
NPM
7X8
DOI 10.1021/acs.jproteome.4c00389
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Chemistry
EISSN 1535-3907
ExternalDocumentID 39150755
Genre Research Support, Non-U.S. Gov't
Journal Article
IEDL.DBID 7X8
ISICitedReferencesCount 2
ISSN 1535-3907
IngestDate Wed Oct 01 14:51:43 EDT 2025
Tue Aug 19 01:31:02 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords Biomarkers
Depression
CatBoost
Proteomic
Language English
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0000-0002-1775-5202
PMID 39150755
PQID 3093595324
PQPubID 23479
ParticipantIDs proquest_miscellaneous_3093595324
pubmed_primary_39150755
PublicationCentury 2000
PublicationDate 2024-09-06
PublicationDateYYYYMMDD 2024-09-06
PublicationDate_xml – month: 09
  year: 2024
  text: 2024-09-06
  day: 06
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of proteome research
PublicationTitleAlternate J Proteome Res
PublicationYear 2024
SSID ssj0015703
Score 2.455399
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 4043
SubjectTerms Algorithms
Area Under Curve
Biomarkers - blood
Blood Proteins - analysis
Blood Proteins - metabolism
Depression - blood
Depression - diagnosis
Female
Humans
Machine Learning
Male
Proteome - analysis
Proteome - metabolism
Proteomics - methods
ROC Curve
Title Using Data-Driven Algorithms with Large-Scale Plasma Proteomic Data to Discover Novel Biomarkers for Diagnosing Depression
URI https://www.ncbi.nlm.nih.gov/pubmed/39150755
https://www.proquest.com/docview/3093595324
Volume 23