Confirmation Bias in Gaussian Mixture Models

Bibliographic details
Published in: IEEE Transactions on Information Theory, Vol. 71, No. 11, pp. 8871-8898
Main authors: Balanov, Amnon; Bendory, Tamir; Huleihel, Wasim
Format: Journal Article
Language: English
Publication details: IEEE, 1 November 2025
ISSN: 0018-9448, 1557-9654
Description
Summary: Confirmation bias, the tendency to interpret information in a way that aligns with one's preconceptions, can profoundly impact scientific research, leading to conclusions that reflect the researcher's hypotheses even when the observational data do not support them. This issue is especially critical in scientific fields involving highly noisy observations, such as cryo-electron microscopy. This study investigates confirmation bias in Gaussian mixture models. We consider the following experiment: A team of scientists assumes they are analyzing data drawn from a Gaussian mixture model with known signals (hypotheses) as centroids. However, in reality, the observations consist entirely of noise without any informative structure. The researchers use a single iteration of the K-means or expectation-maximization algorithms, two popular algorithms for estimating the centroids. Despite the observations being pure noise, we show that these algorithms yield biased estimates that resemble the initial hypotheses, contradicting the unbiased expectation that averaging these noise observations would converge to zero. Namely, the algorithms generate estimates that mirror the postulated model, although the hypotheses (the presumed centroids of the Gaussian mixture) are not evident in the observations. Specifically, among other results, we prove a positive correlation between the estimates produced by the algorithms and the corresponding hypotheses. We also derive explicit closed-form expressions of the estimates for a finite and infinite number of hypotheses. Furthermore, we provide theoretical and empirical results for multi-iteration K-means and expectation-maximization, showing that the bias is persistent even after hundreds of iterations of these algorithms.
This study underscores the risks of confirmation bias in low signal-to-noise environments, provides insights into potential pitfalls in scientific methodologies, and highlights the importance of prudent data interpretation.
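The experiment described in the summary can be illustrated with a small numerical sketch. It is not the authors' code: the dimension, the number of hypotheses, the sample size, the unit-variance Gaussian noise, and the normalization of the hypotheses to unit length are all assumptions made here for illustration, and the function name `one_step_kmeans_on_noise` is hypothetical. The sketch performs a single hard-assignment K-means step on pure noise, using the hypotheses as initial centroids, and then measures the cosine similarity between each estimate and its hypothesis.

```python
import numpy as np


def one_step_kmeans_on_noise(hypotheses, n_obs, rng):
    """One hard-assignment K-means step on pure N(0, I) noise.

    `hypotheses` (shape k x d) serve as the initial centroids.
    The observations contain no signal at all (illustrative assumption).
    """
    k, d = hypotheses.shape
    noise = rng.standard_normal((n_obs, d))

    # Squared Euclidean distance from every observation to every hypothesis.
    sq_dist = ((noise[:, None, :] - hypotheses[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)

    # Centroid update: average the noise observations assigned to each hypothesis.
    estimates = np.zeros_like(hypotheses)
    for j in range(k):
        members = noise[labels == j]
        if len(members) > 0:
            estimates[j] = members.mean(axis=0)
    return estimates


rng = np.random.default_rng(0)

# Hypothetical templates: 4 random directions in 16 dimensions, scaled to unit norm.
hypotheses = rng.standard_normal((4, 16))
hypotheses /= np.linalg.norm(hypotheses, axis=1, keepdims=True)

estimates = one_step_kmeans_on_noise(hypotheses, n_obs=20000, rng=rng)

# Cosine similarity between each estimate and the hypothesis that produced it.
correlations = [
    float(np.dot(e, h) / (np.linalg.norm(e) * np.linalg.norm(h)))
    for e, h in zip(estimates, hypotheses)
]
print(correlations)
```

Under these assumptions the per-cluster averages do not shrink toward zero. Each one leans toward the hypothesis used to select its members, which is the qualitative effect the summary describes. Averaging all observations without any assignment step would instead give a vector close to zero.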
DOI: 10.1109/TIT.2025.3603619