Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat

Background Gene expression profiling (GEP) via microarray analysis is a widely used tool for assessing risk and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:BMC bioinformatics Ročník 16; číslo 1; s. 63
Hlavní autoři: Stein, Caleb K, Qu, Pingping, Epstein, Joshua, Buros, Amy, Rosenthal, Adam, Crowley, John, Morgan, Gareth, Barlogie, Bart
Médium: Journal Article
Jazyk:angličtina
Vydáno: London BioMed Central 25.02.2015
BioMed Central Ltd
Témata:
ISSN:1471-2105, 1471-2105
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Background Gene expression profiling (GEP) via microarray analysis is a widely used tool for assessing risk and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects are often unavoidable in long-term studies and meta-analysis. In order to reduce the impact of batch effects on microarray data, Johnson, Rabinovic, and Li developed ComBat for use when combining batches of gene expression microarray data. We propose a modification to ComBat that centers data to the location and scale of a pre-determined, ‘gold-standard’ batch. This modified ComBat (M-Combat) is designed specifically in the context of meta-analysis and batch effect adjustment for use with predictive models that are validated and fixed on historical data from a ‘gold-standard’ batch. Results We combined data from MIRT across two batches (‘Old’ and ‘New’ Kit sample preparation) as well as external data sets from the HOVON-65/GMMG-HD4 and MRC-IX trials into a combined set, first without transformation and then with both ComBat and M-ComBat transformations. Fixed and validated gene risk signatures developed at MIRT on the Old Kit standard (GEP5, GEP70, and GEP80 risk scores) were compared across these combined data sets. Both ComBat and M-ComBat eliminated all of the differences among probes caused by systematic batch effects (over 98 % of all untransformed probes were significantly different by ANOVA with 0.01 q-value threshold reduced to zero significant probes with ComBat and M-ComBat). The agreement in mean and distribution of risk scores, as well as the proportion of high-risk subjects identified, coincided with the ‘gold-standard’ batch more with M-ComBat than with ComBat. The performance of risk scores improved overall using either ComBat or M-Combat; however, using M-ComBat and the original, optimal risk cutoffs allowed for greater ability in our study to identify smaller cohorts of high-risk subjects. Conclusion M-ComBat is a practical modification to an accepted method that offers greater power to control the location and scale of batch-effect adjusted data. M-ComBat allows for historical models to function as intended on future samples despite known, often unavoidable systematic changes to gene expression data.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1471-2105
1471-2105
DOI:10.1186/s12859-015-0478-3