Tetrahymena thermophila Functional ncRNA Analysis
This page is a supplement to "Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote" by Eisen JA, Coyne RS, Wu M, Wu D, Thiagarajan M, et al. (2006), PLoS Biol 4(9): e286. We provide here detailed information on the search for non-coding RNA elements in the T. thermophila macronuclear genome.
Most ncRNA annotations (Table S6) were generated using covariance model (CM) scans [174]. Transfer RNA annotations were those provided by the CM-based tRNAscan-SE program [175] run with default parameters. Results of these scans were reported in the paper. Most other scans were based on CMs defined by the Rfam database [176], [177] (release 7.0, March 2005; 503 families). With a few exceptions, we used rigorous filters [178] built from the Rfam models to identify exactly those sequences that match the Rfam models with scores at or above Rfam's family-specific "gathering" cutoff. Exceptions were:
The tables below give details of these scans. All columns except "Hits" are drawn directly from Rfam. "Global/Local" indicates which Rfam families are defined by global versus local CMs; as mentioned above, global families were scanned rigorously, while local families were scanned using ML heuristic filters. "Threshold" is the Rfam gathering threshold; in most cases this is only slightly lower than the Rfam "Trusted" threshold, and all but 33 of the 1249 hits reported in the tables below are above the more stringent Trusted threshold. "Window" is the Rfam window length parameter. In all cases "Hits" are described by summary and detail files, which are ".csv" and ".cmzasha" files, respectively. The former is an Excel "comma separated values" file; the latter is a plain text file. The contents of each are described in section 5 of the Ravenna reference manual. Note that the tRNAs reported below are matches to the Rfam model (RF00005), rather than the tRNAscan-SE results reported in the paper; the latter are considered more reliable, and we include the former for completeness. Note also that our annotation approach may be prone to reporting ncRNA pseudogenes and that its accuracy may be affected by the high AT content of the genome.
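The relationship between the gathering and Trusted thresholds can be illustrated with a minimal sketch. It is not the pipeline used for these scans: the `Hit` record, the `classify_hits` function, the family name, and all threshold and score values are hypothetical. Treating a score exactly equal to the Trusted threshold as passing is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    family: str   # family identifier (hypothetical placeholder below)
    score: float  # CM bit score of the hit


def classify_hits(hits, gathering, trusted):
    """Count hits below gathering, between the two cutoffs, and at/above trusted.

    `gathering` and `trusted` map each family to its cutoff (assumed inputs).
    """
    counts = {"rejected": 0, "gathering_only": 0, "trusted": 0}
    for h in hits:
        if h.score < gathering[h.family]:
            counts["rejected"] += 1          # would not be reported at all
        elif h.score < trusted[h.family]:
            counts["gathering_only"] += 1    # reported, but below Trusted
        else:
            counts["trusted"] += 1           # passes the stricter cutoff
    return counts


# Illustrative values only; these are not real Rfam thresholds or scores.
example_hits = [Hit("FAM_X", 10.0), Hit("FAM_X", 25.0), Hit("FAM_X", 31.0)]
example_counts = classify_hits(
    example_hits, gathering={"FAM_X": 20.0}, trusted={"FAM_X": 30.0}
)
```

In this sketch, the hits counted under "gathering_only" play the role of the 33 hits mentioned above that pass the gathering threshold but not the Trusted threshold.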
Regarding the effect of high AT content, the graph at right plots GC content vs CM score for selected Rfam families found in T. thermophila. Note that most have GC content well above the genomic average (the horizontal black line). Sequences with both low score and low GC content are less credible.
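The credibility heuristic described above can be sketched as follows. This is an illustration only: the function names, the choice to ignore ambiguous bases, and the cutoff parameters are assumptions, and no particular genomic average or score cutoff is implied by this page.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return gc / total if total else 0.0


def is_less_credible(seq, score, genome_gc, score_cutoff):
    """Flag a hit that has both a low score and low GC content.

    `genome_gc` (genomic average GC fraction) and `score_cutoff` are
    caller-supplied assumptions, not values taken from this page.
    """
    return score < score_cutoff and gc_content(seq) <= genome_gc
```

A hit is flagged only when both conditions hold, mirroring the statement that sequences with both low score and low GC content are less credible; a high-scoring hit with low GC content is not flagged by this sketch.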
Hits against the Rfam T_box (RF00230), group I self-splicing introns (RF00028), and ctRNA_pND324 (RF00238, involved in bacterial plasmid copy control) all appear implausible on inspection, are unexpected by phylogenetic criteria, and also have low GC content. Hits against Rfam small nucleolar RNAs (RF00086, RF00133, RF00309) also appear to be false positives, judging, for example, by their low GC content.
Although T. thermophila contains components of the iron response system, the 6 hits to RF00037, the Iron Response Element, do not appear to be near related genes.
The SECIS family, RF00031, is poorly conserved and difficult to model. The reported SECIS hits appear to include many false positives. Several of them, however, are 50-70 bases downstream of UGA codons of predicted selenoproteins, e.g., hit 8254823/100276-100214, near 71.m00138 "selT/selW/selH selenoprotein domain containing protein."
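The positional check described above (a hit lying 50-70 bases downstream of a UGA codon) can be sketched in a strand-aware way. This is a hedged illustration: the function names and all coordinates are hypothetical, and measuring the distance as the number of bases between the stop codon and the hit is an assumption about how the offset is counted.

```python
def downstream_distance(stop_end, hit_start, strand):
    """Number of bases between the stop codon and the hit, read 5'->3'.

    `stop_end` is the genomic coordinate of the last base of the UGA codon
    in transcript orientation; `hit_start` is the first base of the hit in
    the same orientation. On the minus strand, coordinates decrease along
    the transcript, so a hit written as 'high-low' starts at the high value.
    """
    if strand == "+":
        return hit_start - stop_end - 1
    if strand == "-":
        return stop_end - hit_start - 1
    raise ValueError("strand must be '+' or '-'")


def in_secis_window(stop_end, hit_start, strand, lo=50, hi=70):
    """True if the hit begins `lo` to `hi` bases downstream of the codon."""
    return lo <= downstream_distance(stop_end, hit_start, strand) <= hi


# Hypothetical coordinates, chosen only to exercise both strands.
plus_ok = in_secis_window(stop_end=1000, hit_start=1061, strand="+")
minus_ok = in_secis_window(stop_end=5000, hit_start=4939, strand="-")
```

A hit upstream of the codon yields a negative distance and therefore falls outside the window on either strand.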
References
Families with hits, generally trustworthy:
Families with hits, but many false positives:
Families with hits, but all are probable false positives:
Families with no hits:
Page maintained by W.L. Ruzzo, ruzzo at cs.washington.edu.