HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation

Detailed bibliography
Published in:	BMC bioinformatics Volume 26; Issue 1; p. 47 - 16
Main authors:	Schlumbohm, Simon; Neumann, Julia E.; Neumann, Philipp
Format:	Journal Article
Language:	English
Published:	London BioMed Central 11.02.2025 BioMed Central Ltd BMC
Subjects:	Algorithms; Analysis; Batch effects; Big data; Bioinformatics; Biomedical and Life Sciences; Computational Biology - methods; Computational Biology/Bioinformatics; Computational efficiency; Computer Appl. in Life Sciences; Dataset integration; Electronic data processing; Genomics; Life Sciences; Methods; Microarrays; Proteomics; RNA sequencing; Software; Germany
ISSN:	1471-2105

Description
Summary:	Background: Data adjustment is an essential tool for increasing statistical power during analysis, for example in the case of complex multi-experiment data from (single-cell) RNA, proteomics and other omics data. Despite its benefits, data integration introduces internal biases, so-called batch effects. Because data from such methods inherently contain missing values, and data integration introduces more, established algorithms such as ComBat and limma are unable to perform batch effect adjustment. The HarmonizR framework, a tool for missing-value-tolerant data adjustment, was recently presented for these cases. Results: In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to substantially reduce runtime while still supporting parallel architectures. Additionally, a “unique removal” strategy has been integrated into HarmonizR to maintain even more features for adjustment in datasets. In this work, we show (1) substantially improved runtime for both small and large real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, with a feature rescue of up to 103.9% for our tested datasets. Conclusion: The proposed improvements tackle the shortcomings of the previously published HarmonizR version. Since HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization, which this update addresses. The improved feature rescue further enhances the algorithm's ability to quickly and robustly perform batch effect reduction.
DOI:	10.1186/s12859-025-06073-9