The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Detailed bibliography
Published in:	BMC Genomics Volume 10; Issue 1; p. 463
Main authors:	Lichtenberg, Jens, Yilmaz, Alper, Welch, Joshua D, Kurz, Kyle, Liang, Xiaoyu, Drews, Frank, Ecker, Klaus, Lee, Stephen S, Geisler, Matt, Grotewold, Erich, Welch, Lonnie R
Format:	Journal Article
Language:	English
Published:	London: BioMed Central, 8 October 2009; BioMed Central Ltd; Springer Nature B.V; BMC
Subjects:	3' Untranslated Regions; 5' Untranslated Regions; Algorithms; Animal Genetics and Genomics; Arabidopsis - genetics; Arabidopsis thaliana; Bacteria; Binding sites; Biology; Biomedical and Life Sciences; Biotechnology; Computational Biology - methods; Computer engineering; Computer science; DNA, Plant - genetics; Gene expression; Gene Expression Regulation, Plant; Genetic aspects; Genome, Plant; Genomes; Genomics; Hypothesis testing; Introns; Life Sciences; Markov Chains; Messenger RNA; Methods; Microarrays; Microbial Genetics and Genomics; Models, Statistical; Physiological aspects; Plant Genetics and Genomics; Promoter Regions, Genetic; Promoters (Genetics); Proteins; Proteomics; R&D; Regulation; Research & development; Research Article; Sequence Analysis, DNA; United States; Core Promoter; Proximal Promoter; Transcription Start Site; Word Pair; Arabidopsis Genome
ISSN:	1471-2164

Description
Summary:	Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
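
The enumerative idea in the summary can be illustrated with a minimal sketch: over the four-letter DNA alphabet there are 4^8 = 65,536 possible 8-letter words, so every word can be counted directly and its frequency in a segment compared with its genome-wide frequency. The function names `count_words` and `enrichment`, the pseudocount, and the plain frequency ratio are assumptions made for this illustration; the record does not describe the authors' actual scoring, and the simple ratio below stands in for whatever statistical model the paper uses.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # 4 letters, so 4**8 = 65,536 possible 8-letter words


def count_words(sequences, k=8):
    """Count every overlapping k-letter word in a list of DNA strings."""
    counts = Counter()
    valid = set(ALPHABET)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows that contain ambiguous bases such as N.
            if set(word) <= valid:
                counts[word] += 1
    return counts


def enrichment(segment_counts, genome_counts, k=8, pseudocount=1.0):
    """Return segment frequency / genome-wide frequency for every k-letter word.

    The pseudocount is an assumption of this sketch: it keeps words that
    never occur (candidate 'unwords') from causing a division by zero.
    """
    n_words = len(ALPHABET) ** k
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    ratios = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        seg_freq = (segment_counts.get(word, 0) + pseudocount) / seg_total
        gen_freq = (genome_counts.get(word, 0) + pseudocount) / gen_total
        ratios[word] = seg_freq / gen_freq
    return ratios
```

With toy inputs and `k=2`, `enrichment(count_words(["AAAA"], k=2), count_words(["ACGTACGTACGT"], k=2), k=2)` gives a ratio above 1 for `AA` (over-represented in the segment) and below 1 for `AC`. Words with ratios far above 1 would be candidates for the enriched set, and words with zero observed counts would be candidates for 'unwords', subject to a proper significance test.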
DOI:	10.1186/1471-2164-10-463