A Modified Median String Algorithm for Gene Regulatory Motif Classification

Consensus string is a significant feature of a deoxyribonucleic acid (DNA) sequence. The median string is one of the most popular exact algorithms to find DNA consensus. A DNA sequence is represented using the alphabet Σ= {a, c, g, t}. The algorithm generates a set of all the 4l possible motifs or l...

Full description

Saved in:
Bibliographic Details
Published in:Symmetry (Basel) Vol. 12; no. 8; p. 1363
Main Authors: Kaysar, Mohammad Shibli, Khan, Mohammad Ibrahim
Format: Journal Article
Language:English
Published: Basel MDPI AG 01.08.2020
Subjects:
ISSN:2073-8994, 2073-8994
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Consensus string is a significant feature of a deoxyribonucleic acid (DNA) sequence. The median string is one of the most popular exact algorithms to find DNA consensus. A DNA sequence is represented using the alphabet Σ= {a, c, g, t}. The algorithm generates a set of all the 4l possible motifs or l-mers from the alphabet to search a motif of length l. Out of all possible l-mers, it finds the consensus. This algorithm guarantees to return the consensus but this is NP-complete and runtime increases with the increase in l-mer size. Using transitional probability from the Markov chain, the proposed algorithm symmetrically generates four subsets of l-mers. Each of the subsets contains a few l-mers starting with a particular letter. We used these reduced sets of l-mers instead of using 4ll-mers. The experimental result shows that the proposed algorithm produces a much lower number of l-mers and takes less time to execute. In the case of l-mer of length 7, the proposed system is 48 times faster than the median string algorithm. For l-mer of size 7, the proposed algorithm produces only 2.5% l-mer in comparison with the median string algorithm. While compared with the recently proposed voting algorithm, our proposed algorithm is found to be 4.4 times faster for a longer l-mer size like 9.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2073-8994
2073-8994
DOI:10.3390/sym12081363