Managing uncertainty in imputing missing symptom value for healthcare of rural India


Bibliographic details
Published in: Health information science and systems, Vol. 7, No. 1, pp. 5-15
Main authors: Das, Sayan; Sil, Jaya
Format: Journal Article
Language: English
Published: Cham, Springer International Publishing, 18.02.2019
BioMed Central Ltd
Springer Nature B.V
ISSN: 2047-2501
Online access: Full text
Description
Abstract: Purpose In India, 67% of the total population lives in remote areas, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages, and basic health data such as blood pressure, pulse rate, height and weight, BMI, and oxygen saturation level (SpO2) are collected. The acquired data are often imprecise due to measurement error and contain missing values. The paper proposes a comprehensive framework to impute missing symptom values by managing the uncertainty present in the data set. Methods The data sets are fuzzified to manage uncertainty, and the fuzzy c-means clustering algorithm is applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using multiple fuzzy-based regression models. Relations between different symptoms are framed with the help of experts and the medical literature. Blood pressure is handled with a novel approach because its characteristics differ from those of the other symptoms. Patient records obtained from the kiosks are not adequate, so relevant data are simulated by the Monte Carlo method to avoid over-fitting while imputing missing symptom values. The generated data sets are verified using the Kullback–Leibler (K–L) distance and distance correlation (dCor) techniques, showing that the simulated data sets are well correlated with the real data set. Results Using these data sets, the proposed model is built and new patients are provisionally diagnosed using the Softmax cost function. Multiple disease class labels are determined with about 98% accuracy, verified against the ground truth provided by the experts. Conclusions It is worth mentioning that the system is intended for primary healthcare; in emergency cases, patients are referred to the experts.
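Illustrative note: the verification step named in the abstract (comparing Monte Carlo simulated data with real data via K–L distance) can be sketched roughly as follows. This is a minimal sketch, not the authors' procedure: the record does not say how the data were simulated or how the K–L distance was estimated, so the normal model, the histogram estimator, the bin count, and all numeric values below are assumptions chosen only for illustration.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=20):
    """Histogram-based estimate of KL(P || Q) from two 1-D samples.

    Assumption: a simple shared-bin histogram estimator; the paper's
    actual estimator is not described in this record.
    """
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p_counts, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_counts, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    eps = 1e-9  # smoothing so empty bins do not produce log(0) or division by zero
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)

# Hypothetical stand-in for real pulse-rate readings (synthetic placeholder values).
real = rng.normal(loc=75.0, scale=8.0, size=200)

# Monte Carlo simulation, assuming a normal model fitted to the real sample.
simulated = rng.normal(loc=real.mean(), scale=real.std(ddof=1), size=2000)

# A deliberately mismatched sample, for contrast.
shifted = simulated + 30.0

kl_close = kl_divergence(real, simulated)
kl_far = kl_divergence(real, shifted)
print(f"K-L (real vs simulated): {kl_close:.4f}")
print(f"K-L (real vs shifted):   {kl_far:.4f}")
```

A small K–L value for the simulated sample, relative to the mismatched one, is the kind of evidence the abstract refers to when it says the generated data sets were verified against the real data.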
DOI: 10.1007/s13755-019-0066-4