A machine learning-based treatment prediction model using whole genome variants of hepatitis C virus

In recent years, the development of diagnostics using artificial intelligence (AI) has been remarkable. AI algorithms can go beyond human reasoning and build diagnostic models from a number of complex combinations. Using next-generation sequencing technology, we identified hepatitis C virus (HCV) va...

Full description

Saved in:
Bibliographic Details
Published in:PloS one Vol. 15; no. 11; p. e0242028
Main Authors: Haga, Hiroaki, Sato, Hidenori, Koseki, Ayumi, Saito, Takafumi, Okumoto, Kazuo, Hoshikawa, Kyoko, Katsumi, Tomohiro, Mizuno, Kei, Nishina, Taketo, Ueno, Yoshiyuki
Format: Journal Article
Language:English
Published: United States Public Library of Science 05.11.2020
Public Library of Science (PLoS)
Subjects:
ISSN:1932-6203, 1932-6203
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In recent years, the development of diagnostics using artificial intelligence (AI) has been remarkable. AI algorithms can go beyond human reasoning and build diagnostic models from a number of complex combinations. Using next-generation sequencing technology, we identified hepatitis C virus (HCV) variants resistant to directing-acting antivirals (DAA) by whole genome sequencing of full-length HCV genomes, and applied these variants to various machine-learning algorithms to evaluate a preliminary predictive model. HCV genomic RNA was extracted from serum from 173 patients (109 with subsequent sustained virological response [SVR] and 64 without) before DAA treatment. HCV genomes from the 109 SVR and 64 non-SVR patients were randomly divided into a training data set (57 SVR and 29 non-SVR) and a validation-data set (52 SVR and 35 non-SVR). The training data set was subject to nine machine-learning algorithms selected to identify the optimized combination of functional variants in relation to SVR status following DAA therapy. Subsequently, the prediction model was tested by the validation-data set. The most accurate learning method was the support vector machine (SVM) algorithm (validation accuracy, 0.95; kappa statistic, 0.90; F-value, 0.94). The second-most accurate learning algorithm was Multi-layer perceptron. Unfortunately, Decision Tree, and Naive Bayes algorithms could not be fitted with our data set due to low accuracy (< 0.8). Conclusively, with an accuracy rate of 95.4% in the generalization performance evaluation, SVM was identified as the best algorithm. Analytical methods based on genomic analysis and the construction of a predictive model by machine-learning may be applicable to the selection of the optimal treatment for other viral infections and cancer.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
Competing Interests: NO authors have competing interests
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0242028