Managing uncertainty in imputing missing symptom value for healthcare of rural India
Purpose In India, 67% of the total population live in remote area, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages and basic health data like blood pressure, pulse rate, height–weight, BMI, Oxygen saturation level (...
Gespeichert in:
| Veröffentlicht in: | Health information science and systems Jg. 7; H. 1; S. 5 - 15 |
|---|---|
| Hauptverfasser: | , |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
Cham
Springer International Publishing
18.02.2019
BioMed Central Ltd Springer Nature B.V |
| Schlagworte: | |
| ISSN: | 2047-2501, 2047-2501 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | Purpose
In India, 67% of the total population live in remote area, where providing primary healthcare is a real challenge due to the scarcity of doctors. Health kiosks are deployed in remote villages and basic health data like blood pressure, pulse rate, height–weight, BMI, Oxygen saturation level (SpO
2
) etc. are collected. The acquired data is often imprecise due to measurement error and contains missing value. The paper proposes a comprehensive framework to impute missing symptom values by managing uncertainty present in the data set.
Methods
The data sets are fuzzified to manage uncertainty and fuzzy c-means clustering algorithm has been applied to group the symptom feature vectors into different disease classes. The missing symptom values corresponding to each disease are imputed using multiple fuzzy based regression model. Relations between different symptoms are framed with the help of experts and medical literature. Blood pressure symptom has been dealt with using a novel approach due to its characteristics and different from other symptoms. Patients’ records obtained from the kiosks are not adequate, so relevant data are simulated by the Monte Carlo method to avoid over-fitting problem while imputing missing values of the symptoms. The generated datasets are verified using Kulberk–Leiber (K–L) distance and distance correlation (
dCor
) techniques, showing that the simulated data sets are well correlated with the real data set.
Results
Using the data sets, the proposed model is built and new patients are provisionally diagnosed using Softmax cost function. Multiple class labels as diseases are determined by achieving about 98% accuracy and verified with the ground truth provided by the experts.
Conclusions
It is worth to mention that the system is for primary healthcare and in emergency cases, patients are referred to the experts. |
|---|---|
| Bibliographie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 2047-2501 2047-2501 |
| DOI: | 10.1007/s13755-019-0066-4 |