‘Big data’, Hadoop and cloud computing in genomics

Bibliographic details
Published in:	Journal of Biomedical Informatics, Vol. 46, No. 5, pp. 774-781
Main authors:	O’Driscoll, Aisling; Daugelaite, Jurate; Sleator, Roy D.
Format:	Journal Article
Language:	English
Published:	United States, Elsevier Inc, 1 October 2013
Subjects:	Big data; Bioinformatics; Cloud computing; Genomics; Hadoop; Human Genome Project; Humans; Internet; Software
ISSN:	1532-0464, 1532-0480

Description
Summary:
• Ever-improving next-generation sequencing technologies have led to an unprecedented proliferation of sequence data.
• Biology is now one of the fastest-growing fields of big data science.
• Cloud computing and big data technologies can be used to deal with biology’s big data sets.
• The Apache Hadoop project, which provides distributed and parallelised data processing, is presented.
• Challenges associated with cloud computing and big data technologies in biology are discussed.
Since the completion of the Human Genome Project at the turn of the century, there has been an unprecedented proliferation of genomic sequence data. As a consequence, the medical discoveries of the future will largely depend on our ability to process and analyse large genomic data sets, which continue to expand as the cost of sequencing decreases. Herein, we provide an overview of cloud computing and big data technologies, and discuss how such expertise can be used to deal with biology’s big data sets. In particular, we discuss big data technologies such as the Apache Hadoop project, which provides distributed and parallelised processing and analysis of petabyte (PB) scale data sets, and give an overview of the current usage of Hadoop within the bioinformatics community.
DOI:	10.1016/j.jbi.2013.07.001