GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array

Bibliographic details
Published in: PloS one, Vol. 9, No. 8, p. e103833
Main authors: Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 6 August 2014
ISSN: 1932-6203
Online access: Full text
Abstract In metagenomic analyses, DNA sequences are translated into protein-coding sequences and then assigned to protein families, because this approach offers higher sensitivity. However, the huge amounts of sequence data make even standard homology searches with BLASTX computationally costly. We designed a new homology search algorithm that finds seed sequences using the suffix arrays of both the query and the database, and implemented it as GHOSTX. GHOSTX was approximately 131–165 times faster than BLASTX at a similar level of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Sequencing technology continues to improve, and sequencers are producing ever-larger quantities of data, which makes computational analysis with contemporary tools increasingly difficult. We offer this tool as a potential solution to this problem.
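The abstract states only that GHOSTX finds seeds from the suffix arrays of both the query and the database. As a rough illustration of why two suffix arrays help, the Python sketch below finds exact-match seeds of a fixed length `k` by walking both sorted suffix arrays in merge order, so all shared k-mers are found in a single pass rather than by looking up each query position separately. This is not GHOSTX's implementation: the fixed seed length, exact matching, naive suffix-array construction, and the helper names `suffix_array` and `shared_kmer_seeds` are assumptions made for this example, and the tool's actual seed search and extension steps are not described in this record.

```python
from typing import List, Tuple


def suffix_array(text: str) -> List[int]:
    """Naive suffix array: suffix start positions in lexicographic order.

    Sorting full slices is quadratic in memory; fine only for short examples.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def shared_kmer_seeds(query: str, db: str, k: int) -> List[Tuple[int, int]]:
    """Return sorted (query_pos, db_pos) pairs whose length-k substrings match.

    Hypothetical helper: exact, fixed-length seeds found by a merge-style
    walk over the suffix arrays of the query and the database.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop suffixes shorter than k; filtering keeps the sorted order.
    qsa = [i for i in suffix_array(query) if i + k <= len(query)]
    dsa = [j for j in suffix_array(db) if j + k <= len(db)]

    seeds: List[Tuple[int, int]] = []
    a = b = 0
    while a < len(qsa) and b < len(dsa):
        qk = query[qsa[a]:qsa[a] + k]
        dk = db[dsa[b]:dsa[b] + k]
        if qk < dk:
            a += 1  # advance the array holding the smaller k-prefix
        elif qk > dk:
            b += 1
        else:
            # Equal k-prefixes are contiguous in both arrays: find each run.
            a_end = a
            while a_end < len(qsa) and query[qsa[a_end]:qsa[a_end] + k] == qk:
                a_end += 1
            b_end = b
            while b_end < len(dsa) and db[dsa[b_end]:dsa[b_end] + k] == qk:
                b_end += 1
            # Every query occurrence pairs with every database occurrence.
            seeds.extend((qi, dj) for qi in qsa[a:a_end] for dj in dsa[b:b_end])
            a, b = a_end, b_end
    return sorted(seeds)


if __name__ == "__main__":
    print(shared_kmer_seeds("MKVLA", "AMKVLG", 3))
```

Run as a script, this prints `[(0, 1), (1, 2)]`: the seeds for the shared 3-mers `MKV` and `KVL`. The merge walk is valid because, among suffixes at least `k` characters long, lexicographic order of whole suffixes implies non-decreasing order of their `k`-character prefixes, so equal seeds form contiguous runs in both arrays.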
Audience Academic
AuthorAffiliation American University in Cairo, Egypt
Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan
BackLink https://www.ncbi.nlm.nih.gov/pubmed/25099887 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright COPYRIGHT 2014 Public Library of Science
2014 Suzuki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2014 Suzuki et al.
DOI 10.1371/journal.pone.0103833
Discipline Sciences (General)
Engineering
DocumentTitleAlternate GHOSTX: An Improved Sequence Homology Search Algorithm
EISSN 1932-6203
Genre Research Support, Non-U.S. Gov't
Journal Article
GeographicLocations Japan
ISICitedReferencesCount 63
ISSN 1932-6203
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Creative Commons Attribution License
Competing Interests: The authors have declared that no competing interests exist.
Conceived and designed the experiments: SS MK TI YA. Performed the experiments: SS. Analyzed the data: SS. Contributed reagents/materials/analysis tools: SS MK TI. Contributed to the writing of the manuscript: SS MK TI YA.
OpenAccessLink http://dx.doi.org/10.1371/journal.pone.0103833
PMID 25099887
PublicationCentury 2000
PublicationDate 2014-08-06
PublicationDateYYYYMMDD 2014-08-06
PublicationDecade 2010
PublicationPlace San Francisco, United States
PublicationTitle PloS one
PublicationTitleAlternate PLoS One
PublicationYear 2014
Publisher Public Library of Science
Public Library of Science (PLoS)
StartPage e103833
SubjectTerms Algorithms
Alternating current
Analysis
Arrays
Bioinformatics
Biology and Life Sciences
Computer applications
Cost analysis
Data bases
Data processing
Databases, Genetic
Deoxyribonucleic acid
DNA
Downloading
Engineering
Gene sequencing
Genomes
Homology
Information science
Nucleotide sequence
Protein families
Proteins
Queries
Research and Analysis Methods
Search algorithms
Seeds
Sensitivity
Sensitivity analysis
Sequence Alignment - methods
Sequence Homology
Software
URI https://www.ncbi.nlm.nih.gov/pubmed/25099887
https://www.proquest.com/docview/1551699784
https://www.proquest.com/docview/1552370554
https://pubmed.ncbi.nlm.nih.gov/PMC4123905
https://doaj.org/article/a35c7edd90f94932a0d72043a039a1e4
http://dx.doi.org/10.1371/journal.pone.0103833
Volume 9
WOSCitedRecordID wos000339995100035&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: DOA
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M~E
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
– providerCode: PRVPQU
  databaseName: Agriculture Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M0K
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/agriculturejournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Biological Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M7P
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/biologicalscijournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Engineering Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: M7S
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Environmental Science Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: PATMY
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/environmentalscience
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Health & Medical Collection
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 7X7
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/healthcomplete
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Materials Science Database (ProQuest)
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: KB.
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/materialsscijournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Nursing & Allied Health Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 7RV
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/nahs
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest advanced technologies & aerospace journals
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: P5Z
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/hightechjournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Central (ProQuest)
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: BENPR
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://www.proquest.com/central
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Public Health Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: 8C1
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/publichealth
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Publicly Available Content Database
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: PIMPY
  dateStart: 20061201
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/publiccontent
  providerName: ProQuest
– providerCode: PRVATS
  databaseName: Public Library of Science (PLoS) Journals Open Access
  customDbUrl:
  eissn: 1932-6203
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0053866
  issn: 1932-6203
  databaseCode: FPL
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: http://www.plos.org/publications/
  providerName: Public Library of Science
linkProvider Public Library of Science
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GHOSTX%3A+An+Improved+Sequence+Homology+Search+Algorithm+Using+a+Query+Suffix+Array+and+a+Database+Suffix+Array&rft.jtitle=PloS+one&rft.au=Suzuki%2C+Shuji&rft.au=Kakuta%2C+Masanori&rft.au=Ishida%2C+Takashi&rft.au=Akiyama%2C+Yutaka&rft.date=2014-08-06&rft.pub=Public+Library+of+Science&rft.issn=1932-6203&rft.eissn=1932-6203&rft.volume=9&rft.issue=8&rft_id=info:doi/10.1371%2Fjournal.pone.0103833&rft.externalDBID=ISR&rft.externalDocID=A418707385