HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation

Bibliographic details
Published in:	BMC Bioinformatics, vol. 26, no. 1, p. 47 - 16
Main authors:	Schlumbohm, Simon; Neumann, Julia E.; Neumann, Philipp
Format:	Journal Article
Language:	English
Published:	London: BioMed Central Ltd (BMC), 11 February 2025
Subjects:	Algorithms; Analysis; Batch effects; Big data; Bioinformatics; Biomedical and Life Sciences; Computational Biology - methods; Computational Biology/Bioinformatics; Computational efficiency; Computer Appl. in Life Sciences; Dataset integration; Electronic data processing; Genomics; Life Sciences; Methods; Microarrays; Proteomics; RNA sequencing; Software; Germany
ISSN:	1471-2105
Online access:	Full text

Description
Abstract:	Background: Data adjustment is an essential tool for increasing statistical power during analysis, for example for complex multi-experiment data from (single-cell) RNA, proteomics and other omics experiments. Despite its benefits, data integration introduces internal biases, so-called batch effects. Because such methods inherently produce missing values, and data integration introduces further ones, established algorithms such as ComBat and limma cannot perform batch effect adjustment on these datasets. The HarmonizR framework, a tool for missing-value-tolerant data adjustment, was recently presented for these cases. Results: In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to substantially reduce runtime while still supporting parallel architectures. Additionally, a “unique removal” strategy has been integrated into HarmonizR to retain even more features for adjustment. In this work, we show (1) substantially improved runtime for both small and large real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, with a feature rescue of up to 103.9% for our tested datasets. Conclusion: The proposed improvements address the shortcomings of the previously published HarmonizR version. Because HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization; this update addresses that. The improved feature rescue further enhances the algorithm's ability to perform batch effect reduction quickly and robustly.
DOI:	10.1186/s12859-025-06073-9