A Modified Median String Algorithm for Gene Regulatory Motif Classification
| Published in: | Symmetry (Basel) Vol. 12; No. 8; p. 1363 |
|---|---|
| Main authors: | Kaysar, Mohammad Shibli; Khan, Mohammad Ibrahim |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 1 August 2020 |
| Subjects: | |
| ISSN: | 2073-8994 |
| Online access: | Full text |
| Abstract | The consensus string is a significant feature of a deoxyribonucleic acid (DNA) sequence. The median string algorithm is one of the most popular exact algorithms for finding a DNA consensus. A DNA sequence is represented using the alphabet Σ = {a, c, g, t}. To search for a motif of length l, the algorithm generates the set of all 4^l possible motifs (l-mers) over this alphabet and selects the consensus from among them. The algorithm is guaranteed to return the consensus, but the problem is NP-complete and the runtime grows with the l-mer size. Using transition probabilities from a Markov chain, the proposed algorithm symmetrically generates four subsets of l-mers, each containing a few l-mers that start with a particular letter. These reduced sets are used in place of the full set of 4^l l-mers. Experimental results show that the proposed algorithm generates far fewer l-mers and takes less time to execute. For l-mers of length 7, the proposed algorithm is 48 times faster than the median string algorithm and generates only 2.5% as many l-mers. Compared with the recently proposed voting algorithm, the proposed algorithm is 4.4 times faster for a longer l-mer size such as 9. |
|---|---|
| Author | Khan, Mohammad Ibrahim; Kaysar, Mohammad Shibli |
| Author_xml | – sequence: 1 givenname: Mohammad Shibli surname: Kaysar fullname: Kaysar, Mohammad Shibli – sequence: 2 givenname: Mohammad Ibrahim surname: Khan fullname: Khan, Mohammad Ibrahim |
| CitedBy_id | crossref_primary_10_1109_ACCESS_2021_3137767 crossref_primary_10_1007_s00500_021_06178_2 |
| ContentType | Journal Article |
| Copyright | 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.3390/sym12081363 |
| Discipline | Sciences (General) |
| EISSN | 2073-8994 |
| ISICitedReferencesCount | 3 |
| ISSN | 2073-8994 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| Language | English |
| OpenAccessLink | https://www.proquest.com/docview/2435091087?pq-origsite=%requestingapplication% |
| PublicationCentury | 2000 |
| PublicationDate | 2020-08-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Basel |
| PublicationTitle | Symmetry (Basel) |
| PublicationYear | 2020 |
| Publisher | MDPI AG |
| StartPage | 1363 |
| SubjectTerms | Algorithms; Alphabets; Deoxyribonucleic acid; DNA; Experiments; Markov analysis; Markov chains; Probability; Statistical analysis; Strings |
| Title | A Modified Median String Algorithm for Gene Regulatory Motif Classification |
| URI | https://www.proquest.com/docview/2435091087 |
| Volume | 12 |
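
The abstract contrasts exhaustive enumeration of all 4^l l-mers with a reduced candidate set built from Markov-chain transition probabilities. The Python sketch below illustrates that contrast under stated assumptions. `median_string` is the standard brute-force search: it minimizes the total Hamming distance to the best-matching window of each sequence. `markov_candidates` is a hypothetical pruning rule, not the authors' procedure, since the record does not describe how their subsets are built. It assumes a first-order chain estimated from the input and keeps only the `keep` most probable next letters at each step. All function and parameter names are illustrative.

```python
from collections import Counter
from itertools import product

ALPHABET = "acgt"

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def total_distance(lmer, seqs):
    """Sum, over all sequences, of the smallest Hamming distance
    between lmer and any window of the same length."""
    l = len(lmer)
    return sum(
        min(hamming(lmer, s[i:i + l]) for i in range(len(s) - l + 1))
        for s in seqs
    )

def median_string(seqs, l, candidates=None):
    """Return the candidate l-mer with minimum total distance.
    With candidates=None, all 4**l l-mers are enumerated (brute force)."""
    if candidates is None:
        candidates = ("".join(p) for p in product(ALPHABET, repeat=l))
    return min(candidates, key=lambda m: total_distance(m, seqs))

def transition_probs(seqs):
    """First-order transition probabilities estimated from the sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET)
        for b in ALPHABET:
            # Fall back to a uniform row if letter a has no successors.
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.25
    return probs

def markov_candidates(seqs, l, keep=2):
    """ASSUMPTION (hypothetical rule): for each start letter, extend
    l-mers one letter at a time, keeping only the `keep` most probable
    successors. Yields 4 * keep**(l - 1) candidates in four subsets."""
    probs = transition_probs(seqs)
    out = []
    for start in ALPHABET:
        frontier = [start]
        for _ in range(l - 1):
            extended = []
            for m in frontier:
                best = sorted(ALPHABET, key=lambda b: -probs[(m[-1], b)])[:keep]
                extended.extend(m + b for b in best)
            frontier = extended
        out.extend(frontier)
    return out

seqs = ["ttacgtaa", "ccacgtgg", "gaacgtcc"]
exact = median_string(seqs, 4)                  # searches all 256 4-mers
reduced = markov_candidates(seqs, 4, keep=2)    # 32 candidates
approx = median_string(seqs, 4, candidates=reduced)
```

A pruning rule of this kind trades the exactness guarantee for speed: the reduced search can miss the true median string if it falls outside the candidate set. For scale, at l = 7 the full enumeration covers 4^7 = 16,384 l-mers, so the 2.5% figure quoted in the abstract corresponds to roughly 410 candidates.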