Signed rearrangement distances considering repeated genes, intergenic regions, and indels

Bibliographic Details
Published in:Journal of combinatorial optimization Vol. 46; no. 2; p. 16
Main Authors: Siqueira, Gabriel; Alexandrino, Alexsandro Oliveira; Dias, Zanoni
Format: Journal Article
Language:English
Published: New York Springer US 01.09.2023
Springer Nature B.V
ISSN:1382-6905, 1573-2886
Description
Summary:Genome rearrangement distance problems make it possible to estimate the evolutionary distance between genomes. These problems aim to compute the minimum number of mutations, called rearrangement events, necessary to transform one genome into another. Two commonly studied rearrangements are the reversal, which inverts a sequence of genes, and the transposition, which exchanges two consecutive sequences of genes. Seminal works on this topic focused on the sequence of genes and assumed that each gene occurs exactly once in each genome. More realistic models assume that a gene may have multiple copies or may appear in only one of the genomes. Other models also take into account the nucleotides between consecutive pairs of genes, which are called intergenic regions. This work combines all these generalizations by defining the signed intergenic reversal distance (SIRD), the signed intergenic reversal and transposition distance (SIRTD), the signed intergenic reversal and indels distance (SIRID), and the signed intergenic reversal, transposition, and indels distance (SIRTID) problems. We show a relation between these problems and the signed minimum common intergenic string partition (SMCISP) problem. From this relation, we derive Θ(k)-approximation algorithms for the SIRD and SIRTD problems, where k is the maximum number of copies of a gene in the genomes. These algorithms also work as heuristics for the SIRID and SIRTID problems. Additionally, we present some parametrized algorithms for SMCISP that ensure constant approximation factors for the distance problems. Our experimental tests on simulated genomes show an improvement in the rearrangement distances with the use of the partition algorithms.
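Illustration:The two rearrangement events named in the summary can be sketched on a signed gene sequence. This is a minimal sketch and not the authors' implementation. It assumes a genome is a plain list of signed integers, and it ignores intergenic regions, repeated genes, and indels. The function names `reversal` and `transposition` and the index conventions are hypothetical choices made for this example.

```python
def reversal(genome, i, j):
    """Invert genome[i..j] (inclusive, 0-based) and flip the sign of each gene."""
    segment = [-gene for gene in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def transposition(genome, i, j, k):
    """Exchange the two consecutive blocks genome[i:j] and genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


genome = [1, -2, 3, 4, 5]

# Reversal of positions 1..3: the block (-2, 3, 4) becomes (-4, -3, 2).
after_reversal = reversal(genome, 1, 3)

# Transposition: the blocks (1, -2) and (3, 4) swap places.
after_transposition = transposition(genome, 0, 2, 4)
```

In this sketch `after_reversal` is `[1, -4, -3, 2, 5]` and `after_transposition` is `[3, 4, 1, -2, 5]`. A rearrangement distance counts the fewest such events needed to turn one genome into another.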
DOI:10.1007/s10878-023-01083-w