Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

Bibliographic details
Published in:	Journal of clinical epidemiology, Volume 133; pp. 140-151
Main authors:	Thomas, James; McDonald, Steve; Noel-Storr, Anna; Shemilt, Ian; Elliott, Julian; Mavergames, Chris; Marshall, Iain J.
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Inc, 1 May 2021
Subjects:	Algorithms; Automation; Bibliographic data bases; Calibration; Classification; Classifiers; Clinical trials; Cochrane Library; Confidence intervals; Cost control; Crowdsourcing; Databases, Bibliographic - standards; Databases, Bibliographic - statistics & numerical data; Datasets; Efficiency; Epidemiology; Evaluation; Humans; Information retrieval; Information Storage and Retrieval - methods; Information Storage and Retrieval - standards; Information Storage and Retrieval - statistics & numerical data; Internal Medicine; Learning algorithms; Machine Learning; Methods/methodology; Original; Production methods; Randomized controlled trials; Randomized Controlled Trials as Topic - classification; Randomized Controlled Trials as Topic - methods; Randomized Controlled Trials as Topic - standards; Randomized Controlled Trials as Topic - statistics & numerical data; Recall; Searching; Study classifiers; Systematic review; Systematic reviews; Systematic Reviews as Topic - methods; Systematic Reviews as Topic - standards; Validity; Workflow; Workload; Workload - statistics & numerical data; Workloads
ISSN:	0895-4356, 1878-5921

Description
Summary:	This study developed, calibrated, and evaluated a machine learning classifier designed to reduce study identification workload in Cochrane for producing systematic reviews. A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the “Cochrane RCT Classifier”), with the algorithm trained using a data set of title–abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification. The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98–0.99) and precision of 0.08 (95% confidence interval 0.06–0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published. The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
• Systematic review processes need to become more efficient.
• Machine learning is sufficiently mature for real-world use.
• A machine learning classifier was built using data from Cochrane Crowd.
• It was calibrated to achieve very high recall.
• It is now live and in use in Cochrane review production systems.
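
The summary describes setting a classification threshold on a labeled calibration set so that recall reaches a target (99% in the study). The sketch below illustrates that general idea only. It is not the authors' procedure and does not reproduce their bootstrap validation. The function names `threshold_for_recall` and `recall_precision`, and the rule that a record is retrieved when its score is at or above the threshold, are assumptions made for illustration.

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.99):
    """Return the highest threshold at which recall meets target_recall.

    Hypothetical helper: a record counts as retrieved when score >= threshold,
    and labels use 1 for RCT and 0 for non-RCT.
    """
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("calibration set contains no positive records")
    # Number of positives that must be retrieved to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(positive_scores)))
    # The score of the last positive we need to keep becomes the threshold.
    return positive_scores[needed - 1]


def recall_precision(scores, labels, threshold):
    """Compute recall and precision for a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy calibration data (invented for this example, not from the study).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0]
toy_threshold = threshold_for_recall(toy_scores, toy_labels, target_recall=0.75)
toy_recall, toy_precision = recall_precision(toy_scores, toy_labels, toy_threshold)

# Evaluation recall recomputed from the counts reported in the summary.
retrieved, total_rcts = 43_783, 44_007
evaluation_recall_pct = round(100 * retrieved / total_rcts, 1)
missed = total_rcts - retrieved
```

Lowering the threshold raises recall at the cost of precision, which is consistent with the high recall and low precision the summary reports for the calibration set. The last three lines only recompute the evaluation figures from the reported counts.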
DOI:	10.1016/j.jclinepi.2020.11.003