Protein-to-genome alignment with miniprot
| Published in: | Bioinformatics (Oxford, England) Vol. 39; no. 1 |
|---|---|
| Main Author: | Li, Heng |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press; Oxford Publishing Limited (England), 01.01.2023 |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811 |
| Online Access: | https://dx.doi.org/10.1093/bioinformatics/btad014 |
| Abstract | Motivation
Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over 10 years ago and did not incorporate the latest advances in alignment algorithms. They are inefficient and could not keep up with the rapid production of new genomes and quickly growing protein databases.
Results
Here, we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketch and vectorized dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data.
Availability and implementation
https://github.com/lh3/miniprot. |
|---|---|
| Author | Li, Heng |
| Author_xml | – sequence: 1 givenname: Heng orcidid: 0000-0003-4874-2874 surname: Li fullname: Li, Heng email: hli@ds.dfci.harvard.edu |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36648328 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1038_s41597_025_04893_1 crossref_primary_10_1038_s41467_025_59565_w crossref_primary_10_1038_s41597_024_03849_1 crossref_primary_10_1038_s41597_025_05271_7 crossref_primary_10_1038_s41597_025_05402_0 crossref_primary_10_1111_eva_70106 crossref_primary_10_12688_f1000research_166849_1 crossref_primary_10_1093_nargab_lqae072 crossref_primary_10_1093_gigascience_giae032 crossref_primary_10_1073_pnas_2503368122 crossref_primary_10_1093_gbe_evaf052 crossref_primary_10_1093_bioadv_vbad162 crossref_primary_10_1038_s41597_025_04421_1 crossref_primary_10_1038_s41597_025_05479_7 crossref_primary_10_1038_s41541_025_01231_9 crossref_primary_10_1111_tpj_70319 crossref_primary_10_48130_gcomm_0025_0016 crossref_primary_10_1099_mgen_0_001396 crossref_primary_10_1002_ece3_70874 crossref_primary_10_1186_s12864_024_10521_w crossref_primary_10_1186_s12859_023_05449_z crossref_primary_10_1093_g3journal_jkae195 crossref_primary_10_1038_s41467_023_43556_w crossref_primary_10_1038_s41597_024_04301_0 crossref_primary_10_1093_g3journal_jkaf162 crossref_primary_10_1073_pnas_2409943121 crossref_primary_10_1002_ps_7789 crossref_primary_10_1038_s41597_024_04284_y crossref_primary_10_1093_bioadv_vbaf079 crossref_primary_10_1007_s10126_023_10248_x crossref_primary_10_1038_s41597_024_03322_z crossref_primary_10_1093_molbev_msaf030 crossref_primary_10_1093_molbev_msaf151 crossref_primary_10_1038_s41597_024_03514_7 crossref_primary_10_1038_s41597_024_03070_0 crossref_primary_10_1038_s42003_025_08629_0 crossref_primary_10_1038_s41576_024_00718_w crossref_primary_10_1111_tpj_17158 crossref_primary_10_1038_s41597_024_03846_4 crossref_primary_10_1111_1751_7915_70201 crossref_primary_10_1126_science_ado1663 crossref_primary_10_1186_s13059_023_03071_z crossref_primary_10_48130_gcomm_0025_0006 crossref_primary_10_1038_s41597_025_04418_w crossref_primary_10_1038_s41598_024_70018_0 crossref_primary_10_1093_bioinformatics_btae456 crossref_primary_10_1002_jsfa_70145 
crossref_primary_10_1371_journal_pgen_1011512 crossref_primary_10_1016_j_micpath_2025_107281 crossref_primary_10_1038_s41467_024_52384_5 crossref_primary_10_1016_j_ympev_2023_107968 crossref_primary_10_1093_gigascience_giaf105 crossref_primary_10_1093_molbev_msae250 crossref_primary_10_3389_fnins_2024_1357873 crossref_primary_10_1038_s41586_023_05936_6 crossref_primary_10_1038_s41597_024_03906_9 crossref_primary_10_1038_s41597_025_05415_9 crossref_primary_10_1111_mec_17627 crossref_primary_10_1038_s41597_025_05057_x crossref_primary_10_1038_s41597_025_04947_4 crossref_primary_10_1186_s12863_024_01261_7 crossref_primary_10_1126_science_adp7978 crossref_primary_10_12688_openreseurope_17365_1 crossref_primary_10_12688_openreseurope_17365_2 crossref_primary_10_1002_ece3_70734 crossref_primary_10_1002_ece3_71134 crossref_primary_10_1038_s41597_025_05116_3 crossref_primary_10_1186_s13059_024_03359_8 crossref_primary_10_3389_ffgc_2023_1240804 crossref_primary_10_1038_s41559_025_02642_6 crossref_primary_10_1016_j_visres_2024_108447 crossref_primary_10_1038_s41597_024_04333_6 crossref_primary_10_1038_s41597_025_05373_2 crossref_primary_10_1038_s41597_025_04837_9 crossref_primary_10_1093_g3journal_jkaf061 crossref_primary_10_1093_gigascience_giae124 crossref_primary_10_1111_zsc_12687 crossref_primary_10_1038_s41586_024_07070_3 crossref_primary_10_1007_s10592_023_01575_6 crossref_primary_10_1093_isd_ixaf027 crossref_primary_10_1093_jhered_esaf034 crossref_primary_10_1093_jhered_esaf036 crossref_primary_10_1101_gr_279569_124 crossref_primary_10_1038_s41564_025_02084_7 crossref_primary_10_3389_fpls_2024_1437132 crossref_primary_10_1038_s41597_024_04222_y crossref_primary_10_1093_bioinformatics_btae517 crossref_primary_10_1093_nar_gkad834 crossref_primary_10_1128_spectrum_02988_23 crossref_primary_10_1038_s41597_024_03260_w crossref_primary_10_1038_s42003_024_06550_6 crossref_primary_10_1101_gr_279364_124 crossref_primary_10_3390_plants14010124 
crossref_primary_10_1038_s41467_025_62544_w crossref_primary_10_1038_s41597_025_04631_7 crossref_primary_10_1093_gigascience_giae118 crossref_primary_10_1038_s41597_025_04737_y crossref_primary_10_1093_bioinformatics_btaf219 crossref_primary_10_3390_ijms241310755 crossref_primary_10_1186_s12864_023_09678_7 crossref_primary_10_1038_s41586_025_09270_x crossref_primary_10_12688_f1000research_156485_1 crossref_primary_10_1038_s41597_025_05747_6 crossref_primary_10_3389_fpls_2024_1352253 crossref_primary_10_1038_s41597_024_02988_9 crossref_primary_10_1093_jhered_esae042 crossref_primary_10_1093_nargab_lqaf110 crossref_primary_10_1021_acs_biochem_5c00186 crossref_primary_10_1111_mec_70103 crossref_primary_10_59717_j_xinn_life_2025_100144 crossref_primary_10_1038_s41588_025_02113_5 crossref_primary_10_1073_pnas_2501111122 crossref_primary_10_1038_s41597_024_02965_2 crossref_primary_10_1093_g3journal_jkaf127 crossref_primary_10_1093_g3journal_jkaf005 crossref_primary_10_1093_genetics_iyad016 crossref_primary_10_1038_s41597_024_03905_w crossref_primary_10_1093_nar_gkad814 crossref_primary_10_1038_s41597_024_03157_8 crossref_primary_10_1093_molbev_msae169 crossref_primary_10_1093_nar_gkae987 crossref_primary_10_1038_s41597_025_04764_9 crossref_primary_10_1038_s41597_024_04350_5 crossref_primary_10_1038_s41467_025_60222_5 crossref_primary_10_1111_1755_0998_13823 crossref_primary_10_1101_gr_278566_123 crossref_primary_10_1093_g3journal_jkaf030 crossref_primary_10_1186_s12864_024_10829_7 crossref_primary_10_1186_s13015_025_00275_9 crossref_primary_10_1093_g3journal_jkae223 crossref_primary_10_1093_nar_gkaf045 crossref_primary_10_1126_sciadv_adq3938 crossref_primary_10_1111_jipb_13748 crossref_primary_10_1186_s12915_025_02328_2 crossref_primary_10_1038_s41467_025_61387_9 crossref_primary_10_1038_s41597_025_05573_w crossref_primary_10_1101_gr_280377_124 crossref_primary_10_1111_mec_17147 crossref_primary_10_1038_s41597_025_05114_5 crossref_primary_10_1093_dnares_dsaf010 
crossref_primary_10_1186_s12864_025_11507_y crossref_primary_10_3389_fgene_2025_1502681 crossref_primary_10_1038_s41586_025_08619_6 crossref_primary_10_1093_g3journal_jkad126 crossref_primary_10_1186_s12864_025_11332_3 crossref_primary_10_1038_s41597_023_02509_0 crossref_primary_10_1093_molbev_msae164 crossref_primary_10_1038_s41597_025_04661_1 crossref_primary_10_1038_s41589_024_01735_w |
| Cites_doi | 10.1038/s41592-020-01056-5 10.1186/s12864-020-6707-9 10.1101/gr.1865504 10.1038/s41587-019-0217-9 10.1093/bioinformatics/btl582 10.1101/gr.263566.120 10.1016/j.infsof.2005.09.005 10.1093/molbev/msab199 10.1101/gr.6743907 10.1073/pnas.1720115115 10.1093/bioinformatics/btaa1016 10.1038/nrg.2016.46 10.1186/gb-2008-9-1-r7 10.1371/journal.pcbi.1002195 10.1089/cmb.1997.4.339 10.1093/nar/gkl556 10.1186/1471-2105-6-31 10.1093/database/baw093 10.1186/1471-2105-8-349 10.1016/S0092-8240(86)90010-8 10.1073/pnas.89.22.10915 10.1101/gr.233460.117 10.1038/s41587-022-01261-x 10.1093/bioinformatics/btw152 10.1093/bioinformatics/bti310 10.1038/nbt.3988 10.1186/1471-2105-12-491 10.1371/journal.pgen.1000148 10.1093/bioinformatics/bts635 10.1006/jmbi.2000.3641 10.1186/s13059-019-1910-1 10.1093/bioinformatics/bty191 10.1093/bioinformatics/btr342 10.1007/978-1-4939-9173-0_9 10.1186/1745-6150-3-20 10.1093/bioinformatics/btn460 10.1093/nar/gkh180 10.1093/nargab/lqaa108 10.1093/nar/gks708 10.1093/nargab/lqaa026 10.1038/s41586-021-03451-0 10.1186/s13059-021-02443-7 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2023. Published by Oxford University Press. |
| Copyright_xml | – notice: The Author(s) 2023. Published by Oxford University Press. 2023 – notice: The Author(s) 2023. Published by Oxford University Press. |
| DBID | TOX AAYXX CITATION CGR CUY CVF ECM EIF NPM 7QF 7QO 7QQ 7SC 7SE 7SP 7SR 7TA 7TB 7TM 7TO 7U5 8BQ 8FD F28 FR3 H8D H8G H94 JG9 JQ2 K9. KR7 L7M L~C L~D P64 7X8 5PM |
| DOI | 10.1093/bioinformatics/btad014 |
| DatabaseName | Oxford Journals Open Access Collection CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Aluminium Industry Abstracts Biotechnology Research Abstracts Ceramic Abstracts Computer and Information Systems Abstracts Corrosion Abstracts Electronics & Communications Abstracts Engineered Materials Abstracts Materials Business File Mechanical & Transportation Engineering Abstracts Nucleic Acids Abstracts Oncogenes and Growth Factors Abstracts Solid State and Superconductivity Abstracts METADEX Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database Aerospace Database Copper Technical Reference Library AIDS and Cancer Research Abstracts Materials Research Database ProQuest Computer Science Collection ProQuest Health & Medical Complete (Alumni) Civil Engineering Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Biotechnology and BioEngineering Abstracts MEDLINE - Academic PubMed Central (Full Participant titles) |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Materials Research Database Oncogenes and Growth Factors Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Nucleic Acids Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts ProQuest Health & Medical Complete (Alumni) Materials Business File Aerospace Database Copper Technical Reference Library Engineered Materials Abstracts Biotechnology Research Abstracts AIDS and Cancer Research Abstracts Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Civil Engineering Abstracts Aluminium Industry Abstracts Electronics & Communications Abstracts Ceramic Abstracts METADEX Biotechnology and BioEngineering Abstracts Computer and Information Systems Abstracts Professional Solid State and Superconductivity Abstracts Engineering Research Database Corrosion Abstracts MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic Materials Research Database MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: TOX name: Oxford Journals Open Access Collection url: https://academic.oup.com/journals/ sourceTypes: Publisher – sequence: 3 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1367-4811 |
| ExternalDocumentID | PMC9869432 36648328 10_1093_bioinformatics_btad014 10.1093/bioinformatics/btad014 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | – fundername: NHGRI NIH HHS grantid: R01HG010040 – fundername: NHGRI NIH HHS grantid: R01 HG010040 – fundername: ; grantid: 237653 – fundername: ; grantid: R01HG010040 |
| GroupedDBID | --- -E4 -~X .-4 .2P .DC .GJ .I3 0R~ 1TH 23N 2WC 4.4 48X 53G 5GY 5WA 70D AAIJN AAIMJ AAJKP AAJQQ AAKPC AAMDB AAMVS AAOGV AAPQZ AAPXW AAUQX AAVAP AAVLN ABEFU ABEJV ABEUO ABGNP ABIXL ABNGD ABNKS ABPQP ABPTD ABQLI ABQTQ ABWST ABXVV ABZBJ ACGFS ACIWK ACPRK ACUFI ACUKT ACUXJ ACYTK ADBBV ADEYI ADEZT ADFTL ADGKP ADGZP ADHKW ADHZD ADMLS ADOCK ADPDF ADRDM ADRTK ADVEK ADYVW ADZTZ ADZXQ AECKG AEGPL AEJOX AEKKA AEKSI AELWJ AEMDU AENEX AENZO AEPUE AETBJ AEWNT AFFNX AFFZL AFGWE AFIYH AFOFC AFRAH AGINJ AGKEF AGQXC AGSYK AHMBA AHXPO AI. AIJHB AJEEA AJEUX AKHUL AKWXX ALMA_UNASSIGNED_HOLDINGS ALTZX ALUQC AMNDL APIBT APWMN AQDSO ARIXL ASPBG ATTQO AVWKF AXUDD AYOIW AZFZN AZVOD BAWUL BAYMD BHONS BQDIO BQUQU BSWAC BTQHN C1A C45 CAG CDBKE COF CS3 CZ4 DAKXR DIK DILTD DU5 D~K EBD EBS EE~ EJD ELUNK EMOBN F5P F9B FEDTE FHSFR FLIZI FLUFQ FOEOM FQBLK GAUVT GJXCC GROUPED_DOAJ GX1 H13 H5~ HAR HVGLF HW0 HZ~ IOX J21 JXSIZ KAQDR KOP KQ8 KSI KSN M-Z M49 MK~ ML0 N9A NGC NLBLG NMDNZ NOMLY NTWIH NU- NVLIB O0~ O9- OAWHX ODMLO OJQWA OK1 OVD OVEED O~Y P2P PAFKI PB- PEELM PQQKQ Q1. Q5Y R44 RD5 RIG RNI RNS ROL RPM RUSNO RW1 RXO RZF RZO SV3 TEORI TJP TLC TOX TR2 VH1 W8F WOQ X7H YAYTL YKOAZ YXANX ZGI ZKX ~91 ~KM AAYXX CITATION ROX ADRIX AFXEN BCRHZ CGR CUY CVF ECM EIF NPM 7QF 7QO 7QQ 7SC 7SE 7SP 7SR 7TA 7TB 7TM 7TO 7U5 8BQ 8FD F28 FR3 H8D H8G H94 JG9 JQ2 K9. KR7 L7M L~C L~D P64 7X8 5PM |
| ID | FETCH-LOGICAL-c484t-3d2dded8dff9c93c4d7d996d007a9f2b9f13af70dca0bba40ff48989e9f0ffd3 |
| IEDL.DBID | TOX |
| ISICitedReferencesCount | 200 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000940926100075&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1367-4811 1367-4803 |
| IngestDate | Thu Aug 21 18:38:42 EDT 2025 Fri Jul 11 15:20:56 EDT 2025 Mon Oct 06 17:46:57 EDT 2025 Wed Feb 19 02:08:31 EST 2025 Sat Nov 29 03:49:26 EST 2025 Tue Nov 18 21:26:55 EST 2025 Wed Apr 02 07:03:59 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c484t-3d2dded8dff9c93c4d7d996d007a9f2b9f13af70dca0bba40ff48989e9f0ffd3 |
| ORCID | 0000-0003-4874-2874 |
| OpenAccessLink | https://dx.doi.org/10.1093/bioinformatics/btad014 |
| PMID | 36648328 |
| PQID | 3133522790 |
| PQPubID | 36124 |
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_9869432 proquest_miscellaneous_2766432977 proquest_journals_3133522790 pubmed_primary_36648328 crossref_citationtrail_10_1093_bioinformatics_btad014 crossref_primary_10_1093_bioinformatics_btad014 oup_primary_10_1093_bioinformatics_btad014 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-01-01 |
| PublicationDateYYYYMMDD | 2023-01-01 |
| PublicationDate_xml | – month: 01 year: 2023 text: 2023-01-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationPlace_xml | – name: England – name: Oxford |
| PublicationTitle | Bioinformatics (Oxford, England) |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2023 |
| Publisher | Oxford University Press; Oxford Publishing Limited (England) |
| Publisher_xml | – name: Oxford University Press – name: Oxford Publishing Limited (England) |
| SubjectTerms | Algorithms; Alignment; Dynamic programming; Gene mapping; Gene sequencing; Genome; Genomes; Original Paper; Peptide mapping; Proteins; Sequence Alignment; Sequence Analysis, DNA - methods; Software |
| Title | Protein-to-genome alignment with miniprot |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/36648328 https://www.proquest.com/docview/3133522790 https://www.proquest.com/docview/2766432977 https://pubmed.ncbi.nlm.nih.gov/PMC9869432 |
| Volume | 39 |