On-Demand Indexing for Referential Compression of DNA Sequences

Bibliographic Details
Published in: PLoS One, Vol. 10, no. 7, p. e0132460
Main Authors: Alves, Fernando, Cogo, Vinicius, Wandelt, Sebastian, Leser, Ulf, Bessani, Alysson
Format: Journal Article
Language: English
Published: Public Library of Science (PLoS), United States, 6 July 2015
Subjects:
ISSN: 1932-6203
Abstract The decreasing cost of genome sequencing is creating a demand for scalable tools and techniques to store and process the large amounts of data it generates. Referential compression is one such technique: it exploits the similarity between the DNA of organisms of the same or an evolutionarily close species to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the sequence to be compressed and a well-known reference sequence. In this paper, we propose a method for improving the performance of referential compression by removing the most costly phase of the process, the complete indexing of the reference. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.
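The general idea described in the abstract, storing only the differences between a sequence and a reference while avoiding a complete up-front index of the reference, can be illustrated with a minimal Python sketch. It is not the ODI algorithm from the paper: the greedy k-mer seed-and-extend matching, the block-by-block lazy index (`LazyKmerIndex`), the entry format, and the parameters `k`, `block`, and `lookahead` are all hypothetical choices made here for illustration. It also assumes the two genomes are roughly collinear, so matches for a target position are expected near the same position in the reference.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical entry format: a match (reference offset, length) copied from the
# reference, or a single literal symbol that could not be matched.
Entry = Union[Tuple[int, int], str]


class LazyKmerIndex:
    """Hypothetical k-mer index over the reference, built block by block only
    when a lookup needs a region that has not been indexed yet."""

    def __init__(self, reference: str, k: int, block: int) -> None:
        self.ref = reference
        self.k = k
        self.block = block
        self.table: Dict[str, List[int]] = {}
        self.indexed_upto = 0  # k-mers starting before this offset are indexed

    def ensure(self, upto: int) -> None:
        """Index whole blocks until every k-mer starting below `upto` is covered."""
        last = len(self.ref) - self.k + 1  # one past the last valid k-mer start
        while self.indexed_upto < min(upto, last):
            end = min(self.indexed_upto + self.block, last)
            for p in range(self.indexed_upto, end):
                self.table.setdefault(self.ref[p:p + self.k], []).append(p)
            self.indexed_upto = end

    def lookup(self, kmer: str) -> List[int]:
        return self.table.get(kmer, [])


def compress(target: str, reference: str, k: int = 12, block: int = 4096,
             lookahead: int = 65536) -> List[Entry]:
    """Greedy referential compression with a lazily built index (a sketch)."""
    index = LazyKmerIndex(reference, k, block)
    out: List[Entry] = []
    i = 0
    while i < len(target):
        # Assumption: the genomes are roughly collinear, so the reference is only
        # indexed up to `lookahead` symbols past the current target position.
        index.ensure(i + lookahead)
        best_pos, best_len = -1, 0
        for p in index.lookup(target[i:i + k]):
            n = 0  # extend the seed match as far as both sequences agree
            while (i + n < len(target) and p + n < len(reference)
                   and target[i + n] == reference[p + n]):
                n += 1
            if n > best_len:
                best_pos, best_len = p, n
        if best_len > 0:
            out.append((best_pos, best_len))
            i += best_len
        else:
            out.append(target[i])  # no seed found: store the symbol literally
            i += 1
    return out


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the target by copying matches from the reference."""
    parts: List[str] = []
    for e in entries:
        if isinstance(e, tuple):
            pos, length = e
            parts.append(reference[pos:pos + length])
        else:
            parts.append(e)
    return "".join(parts)
```

For example, `compress("AAAACCCCTGGGTTTT", "AAAACCCCGGGGTTTT", k=4, block=8, lookahead=16)` returns `[(0, 8), 'T', (9, 7)]`: two copies from the reference around one literal for the substituted base, and `decompress` restores the original string. Indexing lazily trades completeness for speed: in this sketch, a match lying more than `lookahead` symbols ahead in the reference is not found until the scan gets closer, so the parameter controls how much of the indexing cost is deferred.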
Audience Academic
Author Cogo, Vinicius
Wandelt, Sebastian
Leser, Ulf
Bessani, Alysson
Alves, Fernando
AuthorAffiliation 1 LaSIGE, University of Lisbon, Lisbon, Portugal
Centro de Investigación y de Estudios Avanzados del IPN, MEXICO
2 WBI, Humboldt-Universität zu Berlin, Berlin, Germany
BackLink https://www.ncbi.nlm.nih.gov/pubmed/26146838$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1186_s12859_022_04825_5
crossref_primary_10_1186_s12859_018_2230_2
crossref_primary_10_1002_cpe_6339
crossref_primary_10_1109_TC_2020_2994774
crossref_primary_10_1371_journal_pone_0232942
Cites_doi 10.1186/1748-7188-7-30
10.1186/1748-7188-8-25
10.1109/TCBB.2013.122
10.1126/science.1197891
10.1145/792548.611988
10.1016/0306-4573(94)90014-0
10.1093/bioinformatics/btn582
10.1038/ng.877
10.1093/bioinformatics/btr505
10.1186/1471-2164-9-517
10.1371/journal.pone.0109384
10.1093/hmg/ddq416
10.1093/bioinformatics/btp319
ContentType Journal Article
Copyright COPYRIGHT 2015 Public Library of Science
2015 Alves et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2015 Alves et al.
DOI 10.1371/journal.pone.0132460
DatabaseName CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
PubMed
Gale In Context: Opposing Viewpoints
Gale In Context: Science
ProQuest Central (Corporate)
Animal Behavior Abstracts
Bacteriology Abstracts (Microbiology B)
Biotechnology Research Abstracts
Nursing & Allied Health Database
Ecology Abstracts
Entomology Abstracts (Full archive)
Immunology Abstracts
Meteorological & Geoastrophysical Abstracts
Nucleic Acids Abstracts
Virology and AIDS Abstracts
Agricultural Science Collection
Health & Medical Collection
ProQuest Central (purchase pre-March 2016)
Medical Database (Alumni Edition)
ProQuest Pharma Collection
Public Health Database
Technology Research Database
ProQuest SciTech Collection
ProQuest Technology Collection
ProQuest Natural Science Collection
Hospital Premium Collection
Hospital Premium Collection (Alumni Edition)
ProQuest Central (Alumni) (purchase pre-March 2016)
Materials Science & Engineering Collection
ProQuest Central (Alumni)
ProQuest One Sustainability (subscription)
ProQuest Central UK/Ireland
Advanced Technologies & Computer Science Collection
Agricultural & Environmental Science Collection
ProQuest Central Essentials
Biological Science Database
ProQuest Central
Technology collection
Natural Science Collection
Environmental Sciences and Pollution Management
ProQuest One Community College
ProQuest Materials Science Collection
Engineering Research Database
Health Research Premium Collection
Health Research Premium Collection (Alumni)
ProQuest Central Student
AIDS and Cancer Research Abstracts
ProQuest SciTech Premium Collection
ProQuest Health & Medical Complete (Alumni)
Materials Science Database
Nursing & Allied Health Database (Alumni Edition)
Meteorological & Geoastrophysical Abstracts - Academic
ProQuest Engineering Collection
ProQuest Biological Science Collection
Agricultural Science Database
Health & Medical Collection (Alumni Edition)
PML(ProQuest Medical Library)
Algology Mycology and Protozoology Abstracts (Microbiology C)
Engineering Database
Nursing & Allied Health Premium
Advanced Technologies & Aerospace Database
ProQuest Advanced Technologies & Aerospace Collection
Biotechnology and BioEngineering Abstracts
Environmental Science Database
Materials Science Collection
ProQuest Central Premium
ProQuest One Academic
Publicly Available Content Database
ProQuest Health & Medical Research Collection
ProQuest One Academic Middle East (New)
ProQuest One Health & Nursing
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Applied & Life Sciences
ProQuest One Academic (retired)
ProQuest One Academic UKI Edition
ProQuest Central China
Engineering collection
Environmental Science Collection
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Agricultural Science Database
Publicly Available Content Database
ProQuest Central Student
ProQuest Advanced Technologies & Aerospace Collection
ProQuest Central Essentials
Nucleic Acids Abstracts
SciTech Premium Collection
ProQuest Central China
Environmental Sciences and Pollution Management
ProQuest One Applied & Life Sciences
ProQuest One Sustainability
Health Research Premium Collection
Meteorological & Geoastrophysical Abstracts
Natural Science Collection
Health & Medical Research Collection
Biological Science Collection
ProQuest Central (New)
ProQuest Medical Library (Alumni)
Engineering Collection
Advanced Technologies & Aerospace Collection
Engineering Database
Virology and AIDS Abstracts
ProQuest Biological Science Collection
ProQuest One Academic Eastern Edition
Agricultural Science Collection
ProQuest Hospital Collection
ProQuest Technology Collection
Health Research Premium Collection (Alumni)
Biological Science Database
Ecology Abstracts
ProQuest Hospital Collection (Alumni)
Biotechnology and BioEngineering Abstracts
Environmental Science Collection
Entomology Abstracts
Nursing & Allied Health Premium
ProQuest Health & Medical Complete
ProQuest One Academic UKI Edition
Environmental Science Database
ProQuest Nursing & Allied Health Source (Alumni)
Engineering Research Database
ProQuest One Academic
Meteorological & Geoastrophysical Abstracts - Academic
ProQuest One Academic (New)
Technology Collection
Technology Research Database
ProQuest One Academic Middle East (New)
Materials Science Collection
ProQuest Health & Medical Complete (Alumni)
ProQuest Central (Alumni Edition)
ProQuest One Community College
ProQuest One Health & Nursing
ProQuest Natural Science Collection
ProQuest Pharma Collection
ProQuest Central
ProQuest Health & Medical Research Collection
Genetics Abstracts
ProQuest Engineering Collection
Biotechnology Research Abstracts
Health and Medicine Complete (Alumni Edition)
ProQuest Central Korea
Bacteriology Abstracts (Microbiology B)
Algology Mycology and Protozoology Abstracts (Microbiology C)
Agricultural & Environmental Science Collection
AIDS and Cancer Research Abstracts
Materials Science Database
ProQuest Materials Science Collection
ProQuest Public Health
ProQuest Nursing & Allied Health Source
ProQuest SciTech Collection
Advanced Technologies & Aerospace Database
ProQuest Medical Library
Animal Behavior Abstracts
Materials Science & Engineering Collection
Immunology Abstracts
ProQuest Central (Alumni)
MEDLINE - Academic
DatabaseTitleList MEDLINE
Agricultural Science Database
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: DOA
  name: Directory of Open Access Journals (DOAJ)
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 2
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 3
  dbid: PIMPY
  name: Publicly Available Content Database
  url: http://search.proquest.com/publiccontent
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Sciences (General)
Computer Science
DocumentTitleAlternate On-Demand Indexing for Referential Compression of DNA Sequences
EISSN 1932-6203
EndPage e0132460
ExternalDocumentID 1694517190
oai_doaj_org_article_9cd23135cb004464bce93e481224381b
PMC4493149
3736055111
A420727485
26146838
10_1371_journal_pone_0132460
Genre Research Support, Non-U.S. Gov't
Journal Article
ISICitedReferencesCount 7
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000358157600275&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1932-6203
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Creative Commons Attribution License
Notes Conceived and designed the experiments: FA VC AB. Performed the experiments: FA VC. Analyzed the data: FA VC SW UL AB. Contributed reagents/materials/analysis tools: FA SW. Wrote the paper: FA VC SW UL AB.
Current address: LaSIGE, DI, FC/UL, Room C6.3.35, Campo Grande, 1749-016, Lisbon, Portugal
Competing Interests: The authors have declared that no competing interests exist.
OpenAccessLink https://www.proquest.com/docview/1694517190?pq-origsite=%requestingapplication%
PMID 26146838
PQID 1694517190
PQPubID 1436336
ParticipantIDs plos_journals_1694517190
doaj_primary_oai_doaj_org_article_9cd23135cb004464bce93e481224381b
pubmedcentral_primary_oai_pubmedcentral_nih_gov_4493149
proquest_miscellaneous_1694963442
proquest_journals_1694517190
gale_infotracmisc_A420727485
gale_infotracacademiconefile_A420727485
gale_incontextgauss_ISR_A420727485
gale_incontextgauss_IOV_A420727485
gale_healthsolutions_A420727485
pubmed_primary_26146838
crossref_citationtrail_10_1371_journal_pone_0132460
crossref_primary_10_1371_journal_pone_0132460
PublicationCentury 2000
PublicationDate 2015-07-06
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: San Francisco
– name: San Francisco, CA USA
PublicationTitle PloS one
PublicationTitleAlternate PLoS One
PublicationYear 2015
Publisher Public Library of Science
Public Library of Science (PLoS)
References S Deorowicz (ref17) 2011; 27
SD Kahn (ref3) 2011; 331
ref23
EE Schadt (ref1) 2010; 19
ref20
S Deorowicz (ref7) 2013; 8
S Kurtz (ref16) 2008; 9
S Wandelt (ref22) 2012; 7
ref2
A Danek (ref11) 2014; 9
S Wandelt (ref15) 2013; 10
S Kuruppu (ref13) 2011
S Wandelt (ref9) 2013
ref18
MR Wick (ref19) 2003; 35
S Gottipati (ref21) 2011; 43
MC Brandon (ref10) 2009; 25
ref6
ref5
S Grumbach (ref8) 1994; 30
J Zhang (ref4) 2011
S Christley (ref12) 2009; 25
S Deorowicz (ref14) 2011; 27
24252160 - Algorithms Mol Biol. 2013 Nov 18;8(1):25
20858600 - Hum Mol Genet. 2010 Oct 15;19(R2):R227-40
25289699 - PLoS One. 2014;9(10):e109384
24524158 - IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct;10(5):1275-88
18996942 - Bioinformatics. 2009 Jan 15;25(2):274-5
21775991 - Nat Genet. 2011 Aug;43(8):741-3
21930502 - Database (Oxford). 2011;2011:bar026
21311016 - Science. 2011 Feb 11;331(6018):728-9
21896510 - Bioinformatics. 2011 Nov 1;27(21):2979-86
18976482 - BMC Genomics. 2008;9:517
19447783 - Bioinformatics. 2009 Jul 15;25(14):1731-8
23146997 - Algorithms Mol Biol. 2012 Nov 12;7(1):30
References_xml – year: 2013
  ident: ref9
  article-title: Trends in Genome Compression
  publication-title: Current Bioinformatics
– volume: 7
  start-page: 30
  year: 2012
  ident: ref22
  article-title: Adaptive efficient compression of genomes
  publication-title: Algorithms for Molecular Biology
  doi: 10.1186/1748-7188-7-30
– ident: ref2
– start-page: 91
  year: 2011
  ident: ref13
  article-title: Proceedings of the Thirty-Fourth Australasian Computer Science Conference—Volume 113. ACSC’11
– ident: ref5
– ident: ref6
– volume: 8
  start-page: 25
  issue: 1
  year: 2013
  ident: ref7
  article-title: Data compression for sequencing data
  publication-title: Algorithms for Molecular Biology
  doi: 10.1186/1748-7188-8-25
– volume: 10
  start-page: 1275
  issue: 5
  year: 2013
  ident: ref15
  article-title: FRESCO: Referential Compression of Highly Similar Sequences
  publication-title: Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  doi: 10.1109/TCBB.2013.122
– ident: ref20
– year: 2011
  ident: ref4
  article-title: International Cancer Genome Consortium Data Portal—a one-stop shop for cancer genomics data
  publication-title: Database
– volume: 331
  start-page: 728
  issue: 6018
  year: 2011
  ident: ref3
  article-title: On the Future of Genomic Data
  publication-title: Science
  doi: 10.1126/science.1197891
– ident: ref23
– volume: 35
  start-page: 283
  issue: 1
  year: 2003
  ident: ref19
  article-title: An Object-oriented Refactoring of Huffman Encoding Using the Java Collections Framework
  publication-title: SIGCSE Bull
  doi: 10.1145/792548.611988
– volume: 30
  start-page: 875
  issue: 6
  year: 1994
  ident: ref8
  article-title: A new challenge for compression algorithms: genetic sequences
  publication-title: Inf Process Manage
  doi: 10.1016/0306-4573(94)90014-0
– volume: 25
  start-page: 274
  issue: 2
  year: 2009
  ident: ref12
  article-title: Human genomes as email attachments
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btn582
– volume: 43
  start-page: 741
  issue: 8
  year: 2011
  ident: ref21
  article-title: Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing
  publication-title: Nature genetics
  doi: 10.1038/ng.877
– volume: 27
  start-page: 2979
  issue: 21
  year: 2011
  ident: ref14
  article-title: Robust relative compression of genomes with random access
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btr505
– volume: 9
  start-page: 517
  issue: 1
  year: 2008
  ident: ref16
  article-title: A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes
  publication-title: BMC Genomics
  doi: 10.1186/1471-2164-9-517
– volume: 9
  start-page: e109384
  issue: 10
  year: 2014
  ident: ref11
  article-title: Indexes of large genome collections on a PC
  publication-title: PloS one
  doi: 10.1371/journal.pone.0109384
– ident: ref18
– volume: 19
  start-page: R227
  issue: R2
  year: 2010
  ident: ref1
  article-title: A window into third-generation sequencing
  publication-title: Human molecular genetics
  doi: 10.1093/hmg/ddq416
– volume: 25
  start-page: 1731
  issue: 14
  year: 2009
  ident: ref10
  article-title: Data structures and compression algorithms for genomic sequence data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btp319
– volume: 27
  start-page: 2979
  issue: 21
  year: 2011
  ident: ref17
  article-title: Robust relative compression of genomes with random access
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btr505
SSID ssj0053866
Score 2.196047
SourceID plos
doaj
pubmedcentral
proquest
gale
pubmed
crossref
SourceType Open Website
Open Access Repository
Aggregation Database
Index Database
Enrichment Source
StartPage e0132460
SubjectTerms Algorithms
Bioinformatics
Biological evolution
Chromosomes
Compression
Computer science
Data compression
Data Compression - methods
Demand
Deoxyribonucleic acid
DNA
DNA sequencing
Gene sequencing
Genomes
Genomics
Genomics - methods
Humans
Indexing
Indexing (Content analysis)
Molecular biology
Nucleotide sequence
Ratios
Sequence Analysis, DNA - methods
Software
Storage
Title On-Demand Indexing for Referential Compression of DNA Sequences
URI https://www.ncbi.nlm.nih.gov/pubmed/26146838
https://www.proquest.com/docview/1694517190
https://www.proquest.com/docview/1694963442
https://pubmed.ncbi.nlm.nih.gov/PMC4493149
https://doaj.org/article/9cd23135cb004464bce93e481224381b
http://dx.doi.org/10.1371/journal.pone.0132460
Volume 10
WOSCitedRecordID wos000358157600275&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: Directory of Open Access Journals (DOAJ)
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: DOA
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M~E
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
– providerCode: PRVPQU
  databaseName: Agricultural Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M0K
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/agriculturejournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Biological Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M7P
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/biologicalscijournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Engineering Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M7S
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Environmental Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: PATMY
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/environmentalscience
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Health & Medical Collection
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 7X7
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/healthcomplete
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Materials Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: KB.
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/materialsscijournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Nursing & Allied Health Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 7RV
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/nahs
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest advanced technologies & aerospace journals
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: P5Z
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/hightechjournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Central
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: BENPR
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://www.proquest.com/central
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Public Health Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 8C1
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/publichealth
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Publicly Available Content Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: PIMPY
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/publiccontent
  providerName: ProQuest
– providerCode: PRVATS
  databaseName: Public Library of Science (PLoS) Journals Open Access
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: FPL
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: http://www.plos.org/publications/
  providerName: Public Library of Science
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=On-Demand+Indexing+for+Referential+Compression+of+DNA+Sequences&rft.jtitle=PloS+one&rft.au=Alves%2C+Fernando&rft.au=Cogo%2C+Vinicius&rft.au=Wandelt%2C+Sebastian&rft.au=Leser%2C+Ulf&rft.date=2015-07-06&rft.issn=1932-6203&rft.eissn=1932-6203&rft.volume=10&rft.issue=7&rft.spage=e0132460&rft_id=info:doi/10.1371%2Fjournal.pone.0132460&rft.externalDBID=n%2Fa&rft.externalDocID=10_1371_journal_pone_0132460