Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper


Bibliographic details
Published in: Bioinformatics (Oxford, England), Volume 40, Issue 5
Main authors: Crauwels, Charlotte; Heidig, Sophie-Luise; Díaz, Adrián; Vranken, Wim F
Format: Journal Article
Language: English
Published: England, Oxford University Press, 2 May 2024
Oxford Publishing Limited (England)
ISSN: 1367-4803, 1367-4811
Description
Abstract: Summary: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed multiple sequence alignments (MSAs) of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements. Availability and implementation: The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.
Note: Charlotte Crauwels and Sophie-Luise Heidig contributed equally.
DOI: 10.1093/bioinformatics/btae276