Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data.

Detailed Bibliography
Title: Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data.
Author: Golovko VV; Canadian Nuclear Laboratories, 286 Plant Road, Chalk River, ON K0J 1J0, Canada.
Source: Biomolecules [Biomolecules] 2025 May 12; Vol. 15 (5). Date of Electronic Publication: 2025 May 12.
Publication Type: Journal Article
Language: English
Journal Information: Publisher: MDPI Country of Publication: Switzerland NLM ID: 101596414 Publication Model: Electronic Cited Medium: Internet ISSN: 2218-273X (Electronic) Linking ISSN: 2218273X NLM ISO Abbreviation: Biomolecules Subsets: MEDLINE
Imprint Name(s): Original Publication: Basel, Switzerland : MDPI, 2011-
MeSH Terms: Confidence Intervals*; Algorithms; Datasets as Topic
Abstract: Estimating confidence intervals in small or noisy datasets is a recurring challenge in biomolecular research, particularly when data contain outliers or exhibit high variability. This study introduces a robust statistical method that combines a hybrid bootstrap procedure with Steiner's most frequent value (MFV) approach to estimate confidence intervals without removing outliers or altering the original dataset. The MFV technique identifies the most representative value while minimizing information loss, making it well suited for datasets with limited sample sizes or non-Gaussian distributions. To demonstrate the method's robustness, we intentionally selected a dataset from outside the biomolecular domain: the fast-neutron activation cross-section of the 109Ag(n, 2n)108mAg reaction from nuclear physics. This dataset presents large uncertainties, inconsistencies, and known evaluation difficulties. Confidence intervals for the cross-section were determined using the MFV-hybrid parametric bootstrapping (MFV-HPB) framework. In this approach, the original data points were repeatedly resampled, new values were simulated based on their uncertainties, and the MFV was then calculated. Despite the dataset's complexity, the method yielded a stable MFV estimate of 709 mb (millibarns) with a 68.27% confidence interval of [691, 744] mb, illustrating its ability to provide interpretable results in challenging scenarios. Although the example is from nuclear science, the same statistical issues commonly arise in biomolecular fields such as enzyme kinetics, molecular assays, and diagnostic biomarker studies. The MFV-HPB framework provides a reliable and generalizable approach for extracting central estimates and confidence intervals where data are difficult to collect, replicate, or interpret. Its resilience to outliers, independence from distributional assumptions, and compatibility with small-sample scenarios make it particularly valuable in molecular medicine, bioengineering, and biophysics.
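The two ingredients named in the abstract can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the author's code; the function names `steiner_mfv` and `mfv_hpb` are hypothetical. `steiner_mfv` follows one common formulation of Steiner's iteration. The squared dihesion is updated as ε² = 3·Σ[(xᵢ−M)²/(ε²+(xᵢ−M)²)²] / Σ[1/(ε²+(xᵢ−M)²)²], and the MFV M is updated as a weighted mean with weights ε²/(ε²+(xᵢ−M)²). The starting values (sample mean for M, (√3/2)·range for ε) and the stopping rule are assumptions. `mfv_hpb` mirrors the hybrid procedure the abstract describes: it resamples the points with replacement, redraws each selected point from a normal distribution with its stated uncertainty (Gaussian errors are an assumption), and computes the MFV of each replicate. The 68.27% interval is taken as the 15.865th to 84.135th percentiles of the bootstrap MFVs, and the point estimate is the MFV of the original values. Both are choices made for this sketch and may differ from the paper.

```python
import numpy as np


def steiner_mfv(x, tol=1e-9, max_iter=1000):
    """Most frequent value (MFV) of a 1-D sample via Steiner's iteration.

    Sketch only: starting values and stopping rule are assumptions.
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        raise ValueError("steiner_mfv needs at least one value")
    spread = float(x.max() - x.min())
    if spread == 0.0:
        return float(x[0])  # all values identical: the MFV is that value
    m = float(x.mean())        # assumed starting MFV
    eps2 = 0.75 * spread ** 2  # assumed starting dihesion: (sqrt(3)/2 * range)**2
    for _ in range(max_iter):
        d2 = (x - m) ** 2
        # Dihesion update: eps^2 = 3 * sum(d2 / (eps^2 + d2)^2) / sum(1 / (eps^2 + d2)^2)
        inv = 1.0 / (eps2 + d2) ** 2
        eps2_new = 3.0 * np.sum(d2 * inv) / np.sum(inv)
        if not np.isfinite(eps2_new) or eps2_new <= 0.0:
            break  # degenerate case, e.g. exact ties collapsing the dihesion
        # MFV update: weighted mean in which distant points get small weights
        w = eps2_new / (eps2_new + d2)
        m_new = float(np.sum(w * x) / np.sum(w))
        eps2 = eps2_new
        if abs(m_new - m) <= tol * spread:
            m = m_new
            break
        m = m_new
    return m


def mfv_hpb(values, sigmas, n_boot=2000, level=0.6827, seed=None):
    """Hybrid parametric bootstrap of the MFV (sketch).

    Returns (MFV of the original values, (lower, upper) percentile interval).
    """
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if values.ndim != 1 or values.shape != sigmas.shape:
        raise ValueError("values and sigmas must be 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    n = values.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # nonparametric step: resample points
        sim = rng.normal(values[idx], sigmas[idx])  # parametric step: Gaussian redraw (assumed)
        boot[b] = steiner_mfv(sim)
    tail = 50.0 * (1.0 - level)  # 15.865 when level = 0.6827
    lo, hi = np.percentile(boot, [tail, 100.0 - tail])
    return steiner_mfv(values), (float(lo), float(hi))
```

As a usage example, take the hypothetical values `[700, 712, 695, 720, 705, 980]` with uncertainties `[15, 20, 10, 25, 12, 40]`. These are illustrative numbers, not the 109Ag data. Calling `mfv_hpb` on them gives a point estimate that stays close to the cluster of values around 700. The arithmetic mean, 752, is pulled upward by the single high value. Because the outlier is down-weighted rather than removed, the dataset itself is left unaltered.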
Contributed Indexing: Keywords: fast-neutron activation cross-section of the 109Ag(n, 2n)108mAg reaction; half-life of the 108mAg; hybrid parametric bootstrapping; most frequent value; robust statistical method
Entry Date(s): Date Created: 20250528 Date Completed: 20250528 Latest Revision: 20250604
Update Code: 20250604
PubMed Central ID: PMC12109080
DOI: 10.3390/biom15050704
PMID: 40427597
Database: MEDLINE