Statistical issues in the analysis of Illumina data.
| Title: | Statistical issues in the analysis of Illumina data. |
|---|---|
| Authors: | Dunning MJ (Department of Oncology, University of Cambridge, CRUK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK; md392@cam.ac.uk); Barbosa-Morais NL; Lynch AG; Tavaré S; Ritchie ME |
| Source: | BMC bioinformatics [BMC Bioinformatics] 2008 Feb 06; Vol. 9, pp. 85. Date of Electronic Publication: 2008 Feb 06. |
| Publication Type: | Journal Article; Research Support, Non-U.S. Gov't |
| Language: | English |
| Journal Info: | Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 14712105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: [London] : BioMed Central, 2000- |
| MeSH Terms: | Algorithms*; Artifacts*; Data Interpretation, Statistical*; Databases, Genetic*; Information Storage and Retrieval/*methods; Oligonucleotide Array Sequence Analysis/*methods; Database Management Systems; Reproducibility of Results; Sensitivity and Specificity |
| Abstract: | Background: Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an Illumina spike-in experiment and offer guidelines for those wishing to analyse expression data or develop new methodologies for this technology. Results: We find that the local background estimated by Illumina is consistently low, and subtracting this background is beneficial for detecting differential expression (DE). Illumina's summary method performs well at removing outliers, producing estimates which are less biased and are less variable than other robust summary methods. However, quality assessment on summarised data may miss spatial artefacts present in the raw data. Also, we find that the background normalisation method used in Illumina's proprietary software (BeadStudio) can cause problems with a standard DE analysis. We demonstrate that variances calculated from the raw data can be used as inverse weights in the DE analysis to improve power. Finally, variability in both expression levels and DE statistics can be attributed to differences in probe composition. These differences are not accounted for by current analysis methods and require further investigation. Conclusion: Analysing Illumina expression data using BeadStudio is reasonable because of the conservative estimates of summary values produced by the software. Improvements can however be made by not using background normalisation. Access to the raw data allows for a more detailed quality assessment and flexible analyses. In the case of a gene expression study, data can be analysed on an appropriate scale using established tools. Similar improvements can be expected for other Illumina assays. |
| Entry Date(s): | Date Created: 20080208; Date Completed: 20080506; Latest Revision: 20211020 |
| Update Code: | 20250114 |
| PubMed Central ID: | PMC2291044 |
| DOI: | 10.1186/1471-2105-9-85 |
| PMID: | 18254947 |
| Database: | MEDLINE |