Transcriptional Strand Bias (TSB)
TSB Categorization
RNA polymerase uses the template strand to transcribe DNA into RNA. The strand upon which the gene is located is referred to as the coding strand. All regions outside of the coding sequence of a gene are referred to as non-transcribed regions. Single point substitutions are oriented based on their pyrimidine base and the strand of the reference genome. When a gene is found on the reference strand, an A:T>T:A substitution in the footprint of the gene is classified as transcribed T>A (example indicated by circle), while a C:G>G:C substitution in the footprint of the gene is classified as untranscribed C>G (example indicated by star). Mutations outside of the footprints of genes are classified as non-transcribed (example indicated by square). Classification of single base substitutions is shown for both SBS-24 and SBS-384.
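The orientation rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's implementation: the function name `classify_sbs` and the `gene_strand` argument (`"+"` when the gene's coding strand is the reference strand, `"-"` when it is the opposite strand, `None` outside any gene footprint) are hypothetical, and bidirectional regions are not handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def classify_sbs(ref, alt, gene_strand):
    """Return (category, pyrimidine-oriented mutation) for one substitution.

    ref and alt are bases on the reference strand. gene_strand is a
    hypothetical argument: "+" (coding strand is the reference strand),
    "-" (coding strand is the opposite strand), or None (no gene).
    """
    ref_is_pyrimidine = ref in PYRIMIDINES
    if ref_is_pyrimidine:
        mutation = f"{ref}>{alt}"
    else:
        # Report the substitution from the strand that carries the pyrimidine.
        mutation = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"

    if gene_strand is None:
        return "N", mutation

    # The pyrimidine sits on the reference strand only if ref is a pyrimidine.
    pyrimidine_on_reference = ref_is_pyrimidine
    coding_is_reference = gene_strand == "+"
    # Pyrimidine on the coding strand -> untranscribed; otherwise transcribed.
    category = "U" if pyrimidine_on_reference == coding_is_reference else "T"
    return category, mutation


print(classify_sbs("A", "T", "+"))   # ('T', 'T>A')
print(classify_sbs("C", "G", "+"))   # ('U', 'C>G')
print(classify_sbs("G", "A", None))  # ('N', 'C>T')
```

The first two calls reproduce the circle and star examples from the paragraph above, assuming A:T>T:A and C:G>G:C are written with the reference-strand base first.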
Transcriptional Strand Bias Categories
These are the 4 transcriptional strand bias categories.
* T: Transcribed
The variant is on the transcribed strand.
* U: Untranscribed
The variant is on the untranscribed strand.
* B: Bidirectional
The variant lies in a region where both strands are transcribed, so it is transcribed regardless of orientation.
* N: Nontranscribed
The variant is outside the footprint of any gene and is not transcribed.
There is one additional transcriptional strand bias category, Q: Questionable. This category is used to classify any mutations that are a mix of purines and pyrimidines and thus can't be classified into one of the above 4 categories.
The TSB files and classification consider only the first 4 categories.
Output Folder
This output folder contains the results of the transcriptional strand bias test. The test compares the number of transcribed and untranscribed mutations for each mutational context and outputs, for each comparison, the enrichment value, a p-value, and a p-value corrected for multiple-hypothesis testing. The significant results from the tests are returned in a separate file, significantResults_strandBiasTest.txt. This file will be empty if there are no significant enrichment values.
The output files contain the following information:
* the mutation type
* the enrichment value (transcribed/untranscribed)
* the p-value
* the false discovery rate (FDR) q-value
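A minimal sketch of how these three statistics could be computed for one comparison, assuming a two-sided exact binomial test against an equal split between strands and a Benjamini-Hochberg correction. Neither choice is stated in this documentation, so both are assumptions, as are the function names and the counts in the usage lines.

```python
from math import comb, inf


def enrichment(transcribed, untranscribed):
    # Ratio of transcribed to untranscribed counts. Returning inf for a
    # zero denominator is a choice made for this sketch only.
    return transcribed / untranscribed if untranscribed else inf


def binomial_two_sided(transcribed, untranscribed):
    # Exact two-sided binomial test assuming both strands are equally likely.
    n = transcribed + untranscribed
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    observed = pmf[transcribed]
    # Sum the probabilities of all outcomes at least as unlikely as observed.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def benjamini_hochberg(pvalues):
    # Step-up FDR adjustment; q-values are returned in the input order.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical (transcribed, untranscribed) counts, for illustration only.
counts = {"C>A": (15, 7), "T>C": (6, 4)}
pvalues = [binomial_two_sided(t, u) for t, u in counts.values()]
for (mutation, (t, u)), p, q in zip(counts.items(), pvalues,
                                    benjamini_hochberg(pvalues)):
    print(mutation, round(enrichment(t, u), 4), round(p, 4), round(q, 4))
```

In practice the correction would be applied across all comparisons in a run, so q-values computed on a subset of rows will generally differ from those in the output files.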
Overview
| File | Contents |
|---|---|
| strandBiasTes_24.txt | Statistics for the pyrimidine single nucleotide variants (6) x TSB categories (4) = 24 |
| strandBiasTes_384.txt | Statistics for the possible starting nucleotides (4) x strandBiasTes_24.txt (24) x possible ending nucleotides (4) = 384 |
| strandBiasTes_6144.txt | Statistics for the possible starting nucleotides (4) x strandBiasTes_384.txt (384) x possible ending nucleotides (4) = 6144 |
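The counts in the table can be checked by enumerating the mutation types directly. The sketch below is illustrative only: the helper name `mutation_types` is hypothetical, and the label format simply follows the `A[C>A]A` style used in the example tables on this page.

```python
from itertools import product

BASES = "ACGT"
# The 6 pyrimidine substitutions: C>A, C>G, C>T, T>A, T>C, T>G.
SUBSTITUTIONS = [f"{ref}>{alt}" for ref in "CT" for alt in BASES if alt != ref]
# The 4 categories considered by the TSB files.
TSB_CATEGORIES = ["T", "U", "B", "N"]


def mutation_types(flank):
    """List substitution labels with `flank` bases of context on each side."""
    if flank == 0:
        return list(SUBSTITUTIONS)
    flanks = ["".join(p) for p in product(BASES, repeat=flank)]
    return [f"{left}[{sub}]{right}"
            for left in flanks for sub in SUBSTITUTIONS for right in flanks]


for flank in (0, 1, 2):
    types = mutation_types(flank)
    # Each mutation type is counted once per TSB category.
    print(types[0], len(types) * len(TSB_CATEGORIES))
# C>A 24
# A[C>A]A 384
# AA[C>A]AA 6144
```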
TSB-24
The strandBiasTes_24.txt file summarizes the information discussed above (the mutation type, the enrichment value, p-value, and FDR q-value) of each of the 6 pyrimidine single nucleotide variants, C > {A, G, or T} and T > {A, G, or C} detected in each input sample.
6 x 4 = 24 total combinations
Output of strandBiasTes_24.txt for a single analyzed sample is shown in the table below.
| Sample | MutationType | Enrichment [Trans/UnTrans] | p.value | FDR_q.value |
|---|---|---|---|---|
| PD10010a | C>A | 2.1429 | 0.1338 | 0.8028 |
| PD10010a | C>G | 2.0 | 0.0407 | 1.0 |
| PD10010a | C>T | 1.0 | 1.0 | 1.0 |
| PD10010a | T>A | 0.6667 | 1.0 | 1.0 |
| PD10010a | T>C | 1.5 | 0.7539 | 1.0 |
| PD10010a | T>G | 0 | 0.5 | 1.0 |
In this example table, the second row (C>G) has a significant p-value (< 0.05), so this result would also be returned in the significantResults_strandBiasTest.txt file.
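The same selection can be reproduced on the example rows with a short script. This is a sketch under stated assumptions: it assumes the file is tab-delimited with the column headers shown in the table, and the function name `significant_rows` and the `alpha` parameter are hypothetical.

```python
import csv
import io

# The example rows from the table above, written as tab-delimited text
# (the delimiter is an assumption about the file format).
EXAMPLE = (
    "Sample\tMutationType\tEnrichment[Trans/UnTrans]\tp.value\tFDR_q.value\n"
    "PD10010a\tC>A\t2.1429\t0.1338\t0.8028\n"
    "PD10010a\tC>G\t2.0\t0.0407\t1.0\n"
    "PD10010a\tC>T\t1.0\t1.0\t1.0\n"
    "PD10010a\tT>A\t0.6667\t1.0\t1.0\n"
    "PD10010a\tT>C\t1.5\t0.7539\t1.0\n"
    "PD10010a\tT>G\t0\t0.5\t1.0\n"
)


def significant_rows(handle, alpha=0.05):
    """Return the rows whose p.value is below alpha."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["p.value"]) < alpha]


hits = significant_rows(io.StringIO(EXAMPLE))
print([(row["Sample"], row["MutationType"]) for row in hits])
# [('PD10010a', 'C>G')]
```

To run it on a real output file, an open file handle can be passed in place of the `io.StringIO` object.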
The above image is a screenshot of the generated file. Here, line 4 corresponds to a T>A mutation with an enrichment value of 6.0, a p-value of 0.9007479747784868, and a false discovery rate (FDR) q-value of 1.0 in the MELA_0004 sample.
TSB-384
The strandBiasTes_384.txt file summarizes the information discussed above (the mutation type, the enrichment value, p-value, and FDR q-value) for the following pyrimidine single nucleotide variants, N[{C > A, G, or T} or {T > A, G, or C}]N.
4 starting nucleotides x 24 combinations x 4 ending nucleotides = 384 total combinations
| Sample | MutationType | Enrichment[Trans/UnTrans] | p.value | FDR_q.value |
|---|---|---|---|---|
| PD10010a | A[C>A]A | 0 | 1.0 | 1.0 |
The above image is a screenshot of the generated file. Here, line 6 corresponds to an ACC to AGC mutation with an enrichment value of 6.0, a p-value of 0.15158963203430173, and a false discovery rate (FDR) q-value of 1.0 in the MELA_0004 sample.
TSB-6144
The strandBiasTes_6144.txt file summarizes the information discussed above (the mutation type, the enrichment value, p-value, and FDR q-value) for the following pyrimidine single nucleotide variants, NN[{C > A, G, or T} or {T > A, G, or C}]NN.
16 (4x4) possible starting dinucleotides x 24 combinations x 16 (4x4) possible ending dinucleotides = 6144 total combinations.
| Sample | MutationType | Enrichment[Trans/UnTrans] | p.value | FDR_q.value |
|---|---|---|---|---|
| PD10010a | AA[C>A]AA | 0 | 1.0 | 1.0 |
The above image is a screenshot of the generated file. Here, line 8 corresponds to an AACCG to AAACG mutation with an enrichment value of 6.0, a p-value of 0.125, and a false discovery rate (FDR) q-value of 1.0 in the MELA_0004 sample.