Randomized algorithms for motif detection

Detailed bibliography
Published in: Journal of bioinformatics and computational biology, Volume 3; Issue 5; p. 1039
Main authors: Wang, Lusheng; Dong, Liang
Format: Journal Article
Language: English
Publication details: Singapore 01.10.2005
ISSN:0219-7200
Abstract Motif detection for DNA sequences has many important applications in biological studies, e.g. locating binding sites and regulatory signals, designing genetic probes, etc. In this paper, we propose a randomized algorithm, design an improved EM algorithm, and combine them to form a software tool. (1) We design a randomized algorithm for the consensus pattern problem. We can show that, with high probability, our randomized algorithm finds a pattern in polynomial time with cost error at most ε × l for each string, where l is the length of the motif and ε can be any positive number given by the user. (2) We design an improved EM algorithm that outperforms the original EM algorithm. (3) We develop a software tool, MotifDetector, that uses our randomized algorithm to find good seeds and uses the improved EM algorithm to do local search. We compare MotifDetector with Buhler and Tompa's PROJECTION, which is considered to be the best known software for motif detection. Simulations show that MotifDetector is slower than PROJECTION when the pattern length is relatively small, and outperforms PROJECTION when the pattern length becomes large. MotifDetector is available for free at http://www.cs.cityu.edu.hk/~lwang/software/motif/index.html, subject to copyright restrictions.
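As an illustration of the consensus pattern problem named in the abstract, the sketch below scores a candidate pattern by summing, over all input strings, the Hamming distance to its closest window, and then searches for a low-cost seed by random sampling. This is a generic sampling heuristic written for this record, not the authors' randomized algorithm or the MotifDetector code; the function names and the parameters `r` and `trials` are hypothetical, and every input string is assumed to be at least `l` characters long.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_occurrence(pattern, seq):
    """Return (distance, start) of the window of seq closest to pattern."""
    l = len(pattern)
    return min((hamming(pattern, seq[i:i + l]), i)
               for i in range(len(seq) - l + 1))


def consensus_cost(pattern, seqs):
    """Sum over all sequences of the distance to the closest window."""
    return sum(best_occurrence(pattern, s)[0] for s in seqs)


def column_majority(windows):
    """Column-wise most frequent letter of equal-length windows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


def random_seed_search(seqs, l, r=3, trials=200, rng=None):
    """Assumed heuristic: sample r random windows of length l, take their
    column-wise majority as a candidate, and keep the cheapest candidate."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(trials):
        windows = []
        for _ in range(r):
            s = rng.choice(seqs)
            i = rng.randrange(len(s) - l + 1)
            windows.append(s[i:i + l])
        candidate = column_majority(windows)
        cost = consensus_cost(candidate, seqs)
        if best is None or cost < best[0]:
            best = (cost, candidate)
    return best
```

Calling `random_seed_search(seqs, l)` returns a `(cost, pattern)` pair. In a pipeline like the one the abstract describes, such a seed would then be handed to a local search step such as EM, which is not sketched here.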
Author Dong, Liang
Wang, Lusheng
Author_xml – sequence: 1
  givenname: Lusheng
  surname: Wang
  fullname: Wang, Lusheng
  email: lwang@cs.cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, P. R. China. lwang@cs.cityu.edu.hk
– sequence: 2
  givenname: Liang
  surname: Dong
  fullname: Dong, Liang
BackLink https://www.ncbi.nlm.nih.gov/pubmed/16278946$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1016_j_jcss_2011_01_003
crossref_primary_10_3390_a6040636
crossref_primary_10_3390_computation9120146
crossref_primary_10_1137_080720401
crossref_primary_10_1109_TCBB_2011_21
crossref_primary_10_1007_s00453_014_9952_y
crossref_primary_10_1016_j_dib_2020_105216
crossref_primary_10_1137_080739069
crossref_primary_10_1145_1921659_1921672
ContentType Journal Article
DOI 10.1142/s0219720005001508
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Biology
ExternalDocumentID 16278946
Genre Research Support, Non-U.S. Gov't
Journal Article
ISSN 0219-7200
IngestDate Fri Jul 11 09:43:52 EDT 2025
Sat Sep 28 08:48:21 EDT 2024
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
LinkModel DirectLink
PMID 16278946
PQID 68783524
PQPubID 23479
ParticipantIDs proquest_miscellaneous_68783524
pubmed_primary_16278946
PublicationCentury 2000
PublicationDate 2005-Oct
20051001
PublicationDateYYYYMMDD 2005-10-01
PublicationDate_xml – month: 10
  year: 2005
  text: 2005-Oct
PublicationDecade 2000
PublicationPlace Singapore
PublicationPlace_xml – name: Singapore
PublicationTitle Journal of bioinformatics and computational biology
PublicationTitleAlternate J Bioinform Comput Biol
PublicationYear 2005
SSID ssj0024518
Score 1.795209
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 1039
SubjectTerms Algorithms
Conserved Sequence
Data Interpretation, Statistical
DNA - chemistry
DNA - genetics
Likelihood Functions
Models, Genetic
Models, Statistical
Sequence Alignment - methods
Sequence Analysis, DNA - methods
Sequence Homology, Nucleic Acid
Title Randomized algorithms for motif detection
URI https://www.ncbi.nlm.nih.gov/pubmed/16278946
https://www.proquest.com/docview/68783524
Volume 3