Protein-to-genome alignment with miniprot

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 39, no. 1
Main Author: Li, Heng
Format: Journal Article
Language: English
Published: England Oxford University Press 01.01.2023
Oxford Publishing Limited (England)
ISSN: 1367-4811, 1367-4803
Abstract
Motivation: Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over 10 years ago and did not incorporate the latest advances in alignment algorithms. They are inefficient and could not keep up with the rapid production of new genomes and quickly growing protein databases.
Results: Here, we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketch and vectorized dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data.
Availability and implementation: https://github.com/lh3/miniprot
Author Li, Heng
Author_xml – sequence: 1
  givenname: Heng
  orcidid: 0000-0003-4874-2874
  surname: Li
  fullname: Li, Heng
  email: hli@ds.dfci.harvard.edu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/36648328$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright The Author(s) 2023. Published by Oxford University Press.
DOI 10.1093/bioinformatics/btad014
Discipline Biology
EISSN 1367-4811
ExternalDocumentID PMC9869432
36648328
10_1093_bioinformatics_btad014
10.1093/bioinformatics/btad014
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: R01HG010040
– fundername: ;
  grantid: 237653
ISICitedReferencesCount 200
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
ORCID 0000-0003-4874-2874
OpenAccessLink https://dx.doi.org/10.1093/bioinformatics/btad014
PMID 36648328
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2023
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
SubjectTerms Algorithms
Alignment
Dynamic programming
Gene mapping
Gene sequencing
Genome
Genomes
Original Paper
Peptide mapping
Proteins
Sequence Alignment
Sequence Analysis, DNA - methods
Software
Title Protein-to-genome alignment with miniprot
URI https://www.ncbi.nlm.nih.gov/pubmed/36648328
https://www.proquest.com/docview/3133522790
https://www.proquest.com/docview/2766432977
https://pubmed.ncbi.nlm.nih.gov/PMC9869432
Volume 39