Weighted minimizer sampling improves long read mapping

Bibliographic details
Published in: Bioinformatics (Oxford, England) Volume 36; Issue Supplement_1; pp. i111 - i118
Main authors: Jain, Chirag; Rhie, Arang; Zhang, Haowen; Chu, Claudia; Walenz, Brian P; Koren, Sergey; Phillippy, Adam M
Format: Journal Article
Language: English
Published: England, Oxford University Press, 1 July 2020
ISSN: 1367-4803, 1367-4811
Abstract
Motivation: In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g. Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions.
Results: We introduce a novel weighted-minimizer sampling algorithm. A unique feature of the proposed algorithm is that it performs minimizer sampling while considering a weight for each k-mer; i.e. the higher the weight of a k-mer, the more likely it is to be selected. By down-weighting frequently occurring k-mers, we are able to meet both objectives: (i) avoid excessive false-positive matches and (ii) maintain the minimizer match guarantee. We tested our algorithm, Winnowmap, using both simulated and real long-read data and compared it to a state-of-the-art long read mapper, Minimap2. Our results demonstrate a reduction in the mapping error-rate from 0.14% to 0.06% in the recently finished human X chromosome (154.3 Mbp), and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats and achieves these results with sparser sampling, leading to better index compression and competitive runtimes.
Availability and implementation: Winnowmap is built on top of the Minimap2 codebase and is available at https://github.com/marbl/winnowmap.
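Illustrative sketch: The weighted sampling idea summarized in the abstract can be illustrated with a small Python example. This is not the Winnowmap implementation. The function names (`kmer_hash`, `weighted_minimizers`), the BLAKE2b-based hash, and the ordering key `-log(u) / weight` are assumptions chosen for illustration; the key is a common trick for weighted random ordering in which a larger weight tends to produce a smaller key. Within every window of `w` consecutive k-mers the k-mer with the smallest key is kept, so down-weighted (frequent) k-mers are rarely selected while every window still contributes one minimizer.

```python
import hashlib
import math


def kmer_hash(kmer: str) -> float:
    """Map a k-mer to a reproducible pseudo-random value in (0, 1]."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / (2**64 + 1)


def weighted_minimizers(seq, k, w, weights=None, default_weight=1.0):
    """Select one k-mer per window of w consecutive k-mers.

    weights maps a k-mer to a positive weight (hypothetical input, e.g.
    a small value for k-mers that occur very often in the reference).
    Returns a position-sorted list of (position, k-mer) pairs.
    """
    weights = weights or {}
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumed ordering key: a higher weight gives a smaller key,
    # so the k-mer is more likely to be the window minimum.
    keys = [-math.log(kmer_hash(x)) / weights.get(x, default_weight)
            for x in kmers]
    selected = set()
    for start in range(len(kmers) - w + 1):
        # Ties between identical k-mers are broken by leftmost position.
        best = min(range(start, start + w), key=lambda i: (keys[i], i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]


if __name__ == "__main__":
    # Down-weight a homopolymer k-mer as a stand-in for a frequent repeat.
    picks = weighted_minimizers("ACGTAAAAAGTC", k=4, w=4,
                                weights={"AAAA": 1e-12})
    print(picks)
```

Because the key of a k-mer depends only on the k-mer itself, two sequences that share a substring of length at least `w + k - 1` share a complete window and therefore select the same k-mer from it, which is the match guarantee the abstract refers to.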
AuthorAffiliation b1 National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
b2 College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
Author_xml – sequence: 1
  givenname: Chirag
  surname: Jain
  fullname: Jain, Chirag
  email: chirag.jain@nih.gov
  organization: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
– sequence: 2
  givenname: Arang
  surname: Rhie
  fullname: Rhie, Arang
  organization: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
– sequence: 3
  givenname: Haowen
  surname: Zhang
  fullname: Zhang, Haowen
  organization: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
– sequence: 4
  givenname: Claudia
  surname: Chu
  fullname: Chu, Claudia
  organization: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
– sequence: 5
  givenname: Brian P
  surname: Walenz
  fullname: Walenz, Brian P
  organization: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
– sequence: 6
  givenname: Sergey
  surname: Koren
  fullname: Koren, Sergey
  organization: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
– sequence: 7
  givenname: Adam M
  surname: Phillippy
  fullname: Phillippy, Adam M
  organization: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32657365 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Published by Oxford University Press 2020.
DOI 10.1093/bioinformatics/btaa435
Discipline Biology
DocumentTitleAlternate ISMB 2020 Proceedings
EISSN 1367-4811
EndPage i118
ExternalDocumentID PMC7355284
32657365
10_1093_bioinformatics_btaa435
10.1093/bioinformatics/btaa435
Genre Journal Article
Research Support, N.I.H., Intramural
ISICitedReferencesCount 124
ISSN 1367-4803
1367-4811
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Supplement_1
Language English
License This work is written by US Government employees and is in the public domain in the US.
Published by Oxford University Press 2020.
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/7355284
PMID 32657365
PublicationCentury 2000
PublicationDate 2020-07-01
PublicationDateYYYYMMDD 2020-07-01
PublicationDate_xml – month: 07
  year: 2020
  text: 2020-07-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2020
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
StartPage i111
SubjectTerms Algorithms
Comparative and Functional Genomics
Data Compression
Genomics
High-Throughput Nucleotide Sequencing
Humans
Sequence Analysis, DNA
Software
Title Weighted minimizer sampling improves long read mapping
URI https://www.ncbi.nlm.nih.gov/pubmed/32657365
https://www.proquest.com/docview/2424991489
https://pubmed.ncbi.nlm.nih.gov/PMC7355284
Volume 36