Using SigProfilerTopography - Input

Refer to this section for details about generating topography analyses for nucleosome occupancy, histone occupancy, CTCF transcription factor occupancy, replication timing, replication strand asymmetry, transcription strand asymmetry, genic versus intergenic regions and strand-coordinated mutagenesis. These analyses can be generated with the runAnalyses function. This section covers the available functions and provides a detailed list of valid parameter values.

Functions

There are five functions supported by SigProfilerTopography.

install_nucleosome
install_atac_seq
install_repli_seq
install_example_data
runAnalyses

install_nucleosome

The function install_nucleosome imports the nucleosome library file that is necessary for nucleosome occupancy analyses.

To install the nucleosome data, first import the package within your python script or from within an interactive python3 session:

$ python3
>>> from SigProfilerTopography import Topography as topography

Next, choose the genome that you would like to import. By default, install_nucleosome imports nucleosome data of the K562 cell line for the GRCh37 and GRCh38 genome assemblies.

topography.install_nucleosome(genome)

Parameter	Variable Type	Optional/Required	Parameter Description
genome	String	Required	The genome assembly of nucleosome data to be imported. Accepted values include: {"GRCh37", "GRCh38", "mm10"}.
biosample	String	Optional	The biosample of nucleosome data to be imported. Accepted values include: {"K562", "GM12878"}.
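
As a minimal sketch, the accepted values listed in the table above can be checked before a download is started. The helper name `check_nucleosome_args` is hypothetical and is not part of SigProfilerTopography; the accepted sets are simply copied from the table.

```python
# Accepted values copied from the parameter table above.
NUCLEOSOME_GENOMES = {"GRCh37", "GRCh38", "mm10"}
NUCLEOSOME_BIOSAMPLES = {"K562", "GM12878"}


def check_nucleosome_args(genome, biosample=None):
    """Hypothetical helper: validate arguments intended for install_nucleosome."""
    if genome not in NUCLEOSOME_GENOMES:
        raise ValueError(f"Unsupported genome: {genome!r}")
    if biosample is not None and biosample not in NUCLEOSOME_BIOSAMPLES:
        raise ValueError(f"Unsupported biosample: {biosample!r}")
    return genome, biosample


if __name__ == "__main__":
    print(check_nucleosome_args("GRCh37"))
    print(check_nucleosome_args("GRCh38", "GM12878"))
```

Once the values pass the check, the genome can be handed to `topography.install_nucleosome(genome)` as shown above.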

install_atac_seq

The function install_atac_seq imports the open chromatin library file that is necessary for epigenomics analyses.

To install the open chromatin data, first import the package within your python script or from within an interactive python3 session:

$ python3
>>> from SigProfilerTopography import Topography as topography

Next, choose the genome that you would like to import. By default, install_atac_seq imports open chromatin data of breast epithelium tissue for GRCh37 and left lung tissue for GRCh38.

topography.install_atac_seq(genome)

Parameter	Variable Type	Optional/Required	Parameter Description
genome	String	Required	The genome assembly of open chromatin data to be imported. Accepted values include: {"GRCh37", "GRCh38", "mm10"}.

install_repli_seq

The function install_repli_seq imports the replication timing library file that is necessary for replication timing analyses.

To install the replication data, first import the package within your python script or from within an interactive python3 session:

$ python3
>>> from SigProfilerTopography import Topography as topography

Next, choose the genome that you would like to import. By default, install_repli_seq imports replication time data of MCF7 and IMR90 for GRCh37 and GRCh38, respectively.

topography.install_repli_seq(genome, biosample=None)

Parameter	Variable Type	Optional/Required	Parameter Description
genome	String	Required	The genome assembly of replication time data to be imported. Accepted values include:
biosample	String	Optional	The biosample of replication time data to be imported. Accepted values include: {"MCF7", "HEPG2", "HELAS3", "SKNSH", "K562", "IMR90", "NHEK", "BJ", "HUVEC", "BG02ES", "GM12878", "GM06990", "GM12801", "GM12812", "GM12813", "HEK293", "HCT116", "A549", "CAKI2", "G401", "T47D", "SKNMC", "NCIH460" }.

install_example_data

The function install_example_data imports the example data that is provided by SigProfilerTopography. This data can be used to run the example program and ensure that the environment is set up.

$ python3
>>> from SigProfilerTopography import Topography as topography

Next, import the example data:

topography.install_example_data()

This call downloads 21BRCA.zip into the current working directory. Once 21BRCA.zip has been downloaded, unzip the file. The unzipped 21BRCA folder contains two folders: 21BRCA_vcfs and 21BRCA_probabilities. The folder 21BRCA_vcfs contains 21 VCF files (one per breast cancer sample), and 21BRCA_probabilities contains probability matrix files for single base substitutions and doublet base substitutions.
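
After unzipping, a quick look at the folder layout can confirm that the example data is in place. The following is a rough sketch only: the helper `summarize_example_data` is hypothetical, and it assumes that the VCF files carry a `.vcf` extension, which the text above does not state.

```python
import tempfile
from pathlib import Path


def summarize_example_data(root):
    """Count VCF files and list probability files in an unzipped 21BRCA folder."""
    root = Path(root)
    vcf_dir = root / "21BRCA_vcfs"
    prob_dir = root / "21BRCA_probabilities"
    # Assumption: VCF files end in ".vcf".
    vcfs = sorted(p.name for p in vcf_dir.glob("*.vcf"))
    probabilities = sorted(p.name for p in prob_dir.iterdir() if p.is_file())
    return {"num_vcfs": len(vcfs), "probability_files": probabilities}


if __name__ == "__main__":
    # Build a tiny stand-in folder so the sketch runs without the real download.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "21BRCA"
        (root / "21BRCA_vcfs").mkdir(parents=True)
        (root / "21BRCA_probabilities").mkdir()
        (root / "21BRCA_vcfs" / "sample1.vcf").write_text("")
        (root / "21BRCA_probabilities" / "probabilities.txt").write_text("")
        print(summarize_example_data(root))
```

On the real download, `num_vcfs` would be expected to match the 21 samples mentioned above.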

runAnalyses

$ python3
>>> from SigProfilerTopography import Topography as topography

Now, you are able to run topography analyses for your samples. Here is an example of a call to runAnalyses that generates all of the different analyses.

topography.runAnalyses(genome, 
            inputDir, 
            outputDir, 
            jobname, 
            numofSimulations, 
            epigenomics=True, 
            nucleosome=True, 
            replication_time=True, 
            strand_bias=True, 
            processivity=True)

SigProfilerTopography's runAnalyses function makes it possible to produce different analyses in the same run. Depending on which parameters are provided, the function will generate a combination of analyses from: nucleosome occupancy, epigenomics occupancy, replication timing, replication strand asymmetry, transcription strand asymmetry, genic versus intergenic regions and strand-coordinated mutagenesis (processivity).
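
To illustrate how the Boolean flags of this section combine, here is a small sketch that lists the analyses implied by a given set of flags. The function `selected_analyses` is hypothetical and only mirrors the parameter descriptions in this section (for example, `strand_bias` requests both strand asymmetry analyses). It does not model what the tool does when no flag is set.

```python
def selected_analyses(epigenomics=False, nucleosome=False, replication_time=False,
                      strand_bias=False, replication_strand_bias=False,
                      transcription_strand_bias=False, processivity=False):
    """Illustrative only: list the analyses implied by the runAnalyses flags."""
    analyses = []
    if epigenomics:
        analyses.append("epigenomics occupancy")
    if nucleosome:
        analyses.append("nucleosome occupancy")
    if replication_time:
        analyses.append("replication timing")
    # strand_bias requests both strand asymmetry analyses at once.
    if strand_bias or replication_strand_bias:
        analyses.append("replication strand asymmetry")
    # Transcription strand asymmetry includes genic versus intergenic regions.
    if strand_bias or transcription_strand_bias:
        analyses.append("transcription strand asymmetry")
    if processivity:
        analyses.append("strand-coordinated mutagenesis")
    return analyses


if __name__ == "__main__":
    print(selected_analyses(nucleosome=True, strand_bias=True))
```

A helper like this can be handy for logging which analyses a job configuration is expected to produce before a long run is started.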

The full list of parameters is detailed in the following table.

Parameter Name	Variable Type	Optional/Required	Parameter Description
genome	String	Required	The reference genome used for the topography analyses. Accepted values include: {"GRCh37", "GRCh38", "mm10"}.
inputDir	String	Required	The path to the directory containing the input files. SigProfilerTopography accepts all input files that SigProfilerMatrixGenerator can process.
outputDir	String	Required	The path of the directory where the output will be saved. If this directory doesn't exist, a new one will be created.
jobname	String	Required	The name of the directory containing all of the outputs under `outputDir/jobname`. If this directory doesn't exist, a new one will be created.
numofSimulations	Integer	Required	The number of simulations to be created.
epigenomics	Boolean	Optional	Generate epigenomics analysis when True. By default, this is set to False.
nucleosome	Boolean	Optional	Generate nucleosome occupancy analysis when True. By default, this is set to False.
replication_time	Boolean	Optional	Generate replication timing analysis when True. By default, this is set to False.
strand_bias	Boolean	Optional	Generate replication and transcription strand asymmetry analysis when True. By default, this is set to False.
replication_strand_bias	Boolean	Optional	Generate replication strand asymmetry analysis when True. By default, this is set to False.
transcription_strand_bias	Boolean	Optional	Generate transcription strand asymmetry analysis (including genic versus intergenic regions) when True. By default, this is set to False.
processivity	Boolean	Optional	Generate strand-coordinated mutagenesis when True. By default, this is set to False.
epigenomics_files	List of Strings	Optional	Python list of paths for each epigenomics library file utilized in the epigenomics analysis. By default, epigenomics files of open chromatin, CTCF and histone modifications obtained from "breast_epithelium" and "lung" tissue are utilized for GRCh37 and GRCh38, respectively.
epigenomics_dna_elements	List of Strings	Optional	Python list of unique DNA element names for the epigenomics files utilized in the epigenomics analysis. Each DNA element name must be contained in at least one epigenomics library filename. E.g., the DNA element is 'CTCF' for the epigenomics file 'ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed'. By default, the DNA elements ['H3K27me3', 'H3K36me3', 'H3K9me3', 'H3K27ac', 'H3K4me1', 'H3K4me3', 'CTCF', 'ATAC'] are utilized for GRCh37 and GRCh38. If `epigenomics_files` is provided by the user, then `epigenomics_dna_elements` is mandatory.
epigenomics_biosamples	List of Strings	Optional	Python list of unique biosample names for the epigenomics files utilized in the epigenomics analyses. Each biosample name must be contained in at least one epigenomics library filename. E.g., biosample is 'breast_epithelium' for the epigenomics file of 'ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed'. By default, "breast_epithelium" and "lung" biosamples are utilized for GRCh37 and GRCh38, respectively. Biosamples are shown in the epigenomics heatmaps if `plot_detailed_epigemomics_heatmaps` is set to True.
nucleosome_biosample	String	Optional	Biosample that will be used for nucleosome occupancy analysis. Analysis can be done by using either K562 or GM12878 cell line from ENCODE. By default, the K562 cell line is used for GRCh37 and GRCh38.
nucleosome_file	String	Optional	The path to the nucleosome occupancy library file that will be used for the analysis. By default, nucleosome occupancy file (MNase-seq) of K562 cell line is used for GRCh37 and GRCh38.
replication_time_biosample	String	Optional	Biosample that will be used to carry out replication timing and replication strand asymmetry analyses. By default, MCF7 and IMR90 cell lines are utilized for GRCh37 and GRCh38, respectively. For the complete list of available replication time biosamples, refer to the Replication Time Biosamples table below.
replication_time_signal_file	String	Optional	The path to the replication time signal file. By default, replication time signal file (wig file) of MCF7 and IMR90 cell lines are utilized for GRCh37 and GRCh38, respectively.
replication_time_valley_file	String	Optional	The path to the replication time valley file. By default, replication time valley file (bed file) of MCF7 and IMR90 cell lines are utilized for GRCh37 and GRCh38, respectively.
replication_time_peak_file	String	Optional	The path to the replication time peak file. By default, replication time peak file (bed file) of MCF7 and IMR90 cell lines are utilized for GRCh37 and GRCh38, respectively.
samples_of_interest	List of Strings	Optional	Conduct topography analyses for these samples of interest only. By default, it is set to None and topography analyses are carried out for all samples.
discreet_mode	Boolean	Optional	Each mutation contributes to the topography analyses either with 1 or 0 when True; otherwise, each mutation contributes with its probability when False. By default, this is set to True.
average_probability	Float	Optional	The minimum average probability required for the mutations assigned to an SBS, DBS, or ID signature. By default, it is set to 0.90. The `average_probability` applies when `discreet_mode` is True. Signature-specific cutoffs are set such that the mutations satisfying mutation_signature_probability >= cutoff have an average probability of at least `average_probability` (0.90 by default).
num_of_sbs_required	Integer	Optional	The minimum required number of mutations for an SBS signature. The `num_of_sbs_required` applies when `discreet_mode` is True or when `discreet_mode` is False and `show_all_signatures` is False. By default, it is set to 2000.
num_of_dbs_required	Integer	Optional	The minimum required number of mutations for a DBS signature. The `num_of_dbs_required` applies when `discreet_mode` is True or when `discreet_mode` is False and `show_all_signatures` is False. By default, it is set to 200.
num_of_id_required	Integer	Optional	The minimum required number of mutations for an ID signature. The `num_of_id_required` applies when `discreet_mode` is True or when `discreet_mode` is False and `show_all_signatures` is False. By default, it is set to 1000.
exceptional_signatures	Dictionary	Optional	The dictionary of exceptional signatures. The `exceptional_signatures` applies when `discreet_mode` is True. E.g., `exceptional_signatures` = {"SBS32" : 0.63} is a Python dictionary where key is a mutational signature and value is an average probability. Exceptional signatures are included in the topography analyses if they satisfy `num_of_sbs_required`, `num_of_dbs_required`, and `num_of_id_required` constraints with `average_probability` >= given average probability.
default_cutoff	Float	Optional	The `default_cutoff` applies for all signatures when `discreet_mode` is False. Mutations satisfying mutation_signature_probability >= `default_cutoff` are considered in the topography analyses with their probability. By default, it is set to 0.5.
show_all_signatures	Boolean	Optional	The `show_all_signatures` applies when `discreet_mode` is False. All signatures are considered in the topography analyses when True, otherwise signatures satisfying `num_of_sbs_required`, `num_of_dbs_required`, and `num_of_id_required` are considered in the topography analyses when False. By default, it is set to True.
plot_figures	Boolean	Optional	Generate plots displaying the results of all topography analyses when True. By default, this is set to True.
plot_epigenomics	Boolean	Optional	Generate epigenomics heatmaps and occupancy plots when True. By default, this is set to False.
plot_nucleosome	Boolean	Optional	Generate nucleosome occupancy plots when True. By default, this is set to False.
plot_replication_time	Boolean	Optional	Generate replication timing plots when True. By default, this is set to False.
plot_strand_bias	Boolean	Optional	Generate replication strand asymmetry, transcription strand asymmetry, genic versus intergenic regions plots when True. By default, this is set to False.
plot_replication_strand_bias	Boolean	Optional	Generate replication strand asymmetry plots when True. By default, this is set to False.
plot_transcription_strand_bias	Boolean	Optional	Generate transcription strand asymmetry and genic versus intergenic regions plots when True. By default, this is set to False.
plot_processivity	Boolean	Optional	Generate strand-coordinated mutagenesis plots when True. By default, this is set to False.
step1_matgen_real_data	Boolean	Optional	Run SigProfilerMatrixGenerator to generate matrices for the real mutations when True. By default, this is set to True.
step2_gen_sim_data	Boolean	Optional	Run SigProfilerSimulator to generate simulated mutations when True. By default, this is set to True.
step3_matgen_sim_data	Boolean	Optional	Run SigProfilerMatrixGenerator to generate matrices for the simulated mutations when True. By default, this is set to True.
step4_merge_prob_data	Boolean	Optional	Merge real and simulated mutations with the probabilities files when True. By default, this is set to True.
step5_gen_tables	Boolean	Optional	Generate tables for providing information on mutational signatures, cutoffs, number of mutations and average probability when True. By default, this is set to True.
sbs_probabilities	String	Optional	The path to the SBS probabilities matrix file. The probabilities matrix includes the probabilities of each mutation type in each sample. The first column lists all the samples, the second column lists all the mutation types, and the following columns list the calculated probability value for the respective SBS signatures where the sum of each row is 1. The probabilities file can be in SBS_6, SBS_24, SBS_96, SBS_192, SBS_288, SBS_384, SBS_1536, or SBS_6144 context, as produced by a mutational signature extractor.
dbs_probabilities	String	Optional	The path to the DBS probabilities matrix file. The probabilities matrix includes the probabilities of each mutation type in each sample. The first column lists all the samples, the second column lists all the mutation types, and the following columns list the calculated probability value for the respective DBS signatures where the sum of each row is 1. The probabilities file should be in DBS-78 context, as produced by a mutational signature extractor.
id_probabilities	String	Optional	The path to the ID probabilities matrix file. The probabilities matrix includes the probabilities of each mutation type in each sample. The first column lists all the samples, the second column lists all the mutation types, and the following columns list the calculated probability value for the respective ID signatures where the sum of each row is 1. The probabilities file should be in ID-83 context, as produced by a mutational signature extractor.
sbs_signatures	String	Optional	The path to the signatures matrix file. The signatures matrix contains the distribution of mutation types in the SBS mutational signatures. The first column lists all of the mutation types, e.g., there are 96 possible mutations that are considered for the SBS-96 context. The following columns are the SBS signatures. The sum of each column is 1, and each value in a column indicates the proportion of a mutational context in the signature.
dbs_signatures	String	Optional	The path to the signatures matrix file. The signatures matrix contains the distribution of mutation types in the DBS mutational signatures. The first column lists all of the mutation types, e.g., there are 78 possible mutations that are considered for the DBS-78 context. The following columns are the DBS signatures. The sum of each column is 1, and each value in a column indicates the proportion of a mutational context in the signature.
id_signatures	String	Optional	The path to the signatures matrix file. The signatures matrix contains the distribution of mutation types in the ID mutational signatures. The first column lists all of the mutation types, e.g., there are 83 possible mutations that are considered for the ID-83 context. The following columns are the ID signatures. The sum of each column is 1, and each value in a column indicates the proportion of a mutational context in the signature.
sbs_activities	String	Optional	The path to the activities matrix file for the selected SBS signatures. The first column lists all of the samples, and the following columns list the calculated activity value (number of mutations) for the respective SBS signatures.
dbs_activities	String	Optional	The path to the activities matrix file for the selected DBS signatures. The first column lists all of the samples, and the following columns list the calculated activity value (number of mutations) for the respective DBS signatures.
id_activities	String	Optional	The path to the activities matrix file for the selected ID signatures. The first column lists all of the samples, and the following columns list the calculated activity value (number of mutations) for the respective ID signatures.
verbose	Boolean	Optional	Set to True for detailed debugging messages. By default, this is set to False.
parallel_mode	Boolean	Optional	Set to True for running SigProfilerTopography using multiprocessing. By default, this is set to True.
plusorMinus_epigenomics	Integer	Optional	The number of bases considered before and after mutation start for epigenomics occupancy analysis.
plusorMinus_nucleosome	Integer	Optional	The number of bases considered before and after mutation start for nucleosome occupancy analysis.
epigenomics_heatmap_significance_level	Float	Optional	Corrected p-values <= `epigenomics_heatmap_significance_level` are considered statistically significant. By default, this is set to 0.05.
fold_change_window_size	Integer	Optional	In epigenomics analysis, fold change of real versus simulated mutations is calculated for the window size centered at the mutation start. E.g., for window size of 100 bases, ± 50 bases are considered before and after mutation start. By default, this is set to 100.
num_of_avg_overlap_required	Integer	Optional	The minimum required average number of overlaps between the mutations and the regions outlined in the epigenomics files. By default, set to 100.
plot_detailed_epigemomics_heatmaps	Boolean	Optional	Plot detailed epigenomics heatmaps when True. By default, set to False.
remove_dna_elements_with_all_nans_in_epigemomics_heatmaps	Boolean	Optional	Remove the DNA elements from the epigenomics heatmap if no result exists. By default, set to True.
odds_ratio_cutoff	Float	Optional	Strand asymmetries with odds ratio >= `odds_ratio_cutoff` are shown in the strand asymmetry circle plots. By default, set to 1.1.
percentage_of_real_mutations_cutoff	Float	Optional	Strand asymmetries of the SBS signatures with percentage of the mutations >= `percentage_of_real_mutations_cutoff` are shown in the plots. By default, set to 5.
ylim_multiplier	Float	Optional	Multiply the y-axis view limits with `ylim_multiplier` in strand asymmetry bar plots. By default, set to 1.25.
processivity_inter_mutational_distance	Integer	Optional	Consecutive mutations with distance <= `processivity_inter_mutational_distance` are considered for the strand-coordinated mutagenesis. By default, set to 10000.
processivity_significance_level	Float	Optional	Corrected p-values <= `processivity_significance_level` are considered statistically significant for strand coordinated mutagenesis. By default, this is set to 0.05.
delete_chrbased_files	Boolean	Optional	To reduce the disk space usage of the tool, SigProfilerTopography deletes the chrbased files under `outputDir/jobname/data/chrbased`. By default, set to True.
exome	Boolean	Optional	SigProfilerSimulator simulates on the exome of the reference genome. By default, set to None.
updating	Boolean	Optional	SigProfilerSimulator updates the chromosome with each mutation. By default, set to False.
bed_file	String	Optional	SigProfilerSimulator simulates on custom regions of the genome. Requires the full path to the BED file. By default, set to None.
overlap	Boolean	Optional	SigProfilerSimulator allows overlapping of mutations along the chromosome. By default, set to False.
gender	String	Optional	SigProfilerSimulator simulates male or female genomes. By default, set to 'female'.
seed_file	String	Optional	The path to a file of user-defined seeds for SigProfilerSimulator. One seed is required per processor. By default, this is set to None and a built-in file is used.
noisePoisson	Boolean	Optional	SigProfilerSimulator adds Poisson noise to the simulations. By default, set to False.
noiseUniform	Integer	Optional	SigProfilerSimulator adds uniform noise within a +/- allowance (e.g., noiseUniform=5 allows +/-2.5% of mutations for each mutation type). By default, this is set to 0.
cushion	Integer	Optional	SigProfilerSimulator allows a cushion when simulating on the exome or a targeted panel. By default, this is set to 100 base pairs.
region	String	Optional	For SigProfilerSimulator. Path to a targeted region panel for simulating on a user-defined region. Default is whole-genome simulations.
vcf	Boolean	Optional	SigProfilerSimulator outputs simulated samples as vcf files with one file per iteration per sample when True. SigProfilerSimulator outputs all samples from an iteration into a single maf file when False. By default, this is set to False.
mask	String	Optional	For SigProfilerSimulator. Path to the probability mask file. The mask file is tab-separated and requires the following header fields: Chromosome, Start, End, Probability. Note: the mask parameter does not support exome data where bed_file flag is set to true. By default, this is set to None.
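
The table above states that each name in `epigenomics_dna_elements` and `epigenomics_biosamples` must be contained in at least one epigenomics library filename. A minimal sketch of such a check follows. The helper `unmatched_names` is hypothetical, and plain case-sensitive substring matching on the filename is an assumption about how the names are compared.

```python
from pathlib import Path


def unmatched_names(epigenomics_files, names):
    """Return the names that do not occur in any epigenomics filename."""
    filenames = [Path(f).name for f in epigenomics_files]
    # Assumption: a name matches if it is a case-sensitive substring of a filename.
    return [n for n in names if not any(n in fname for fname in filenames)]


if __name__ == "__main__":
    files = ["path/to/ENCFF782GCQ_breast_epithelium_Normal_CTCF-human.bed"]
    print(unmatched_names(files, ["CTCF", "H3K27ac"]))
    print(unmatched_names(files, ["breast_epithelium"]))
```

An empty result means every name is covered by at least one file, so the lists can be passed to runAnalyses together with `epigenomics_files`.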

** If replication_time_signal_file alone, or replication_time_signal_file together with replication_time_valley_file and replication_time_peak_file, is provided, then the parameter replication_time_biosample is not used.
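
The precedence described in the note above can be summarized in a few lines. This is only a sketch of the stated rule, not the tool's internal logic, and the function `replication_time_source` is a hypothetical name.

```python
def replication_time_source(replication_time_biosample=None,
                            replication_time_signal_file=None,
                            replication_time_valley_file=None,
                            replication_time_peak_file=None):
    """Illustrative only: report which replication timing input takes effect."""
    if replication_time_signal_file is not None:
        # User-supplied files take precedence; the biosample is not used.
        user_files = [f for f in (replication_time_signal_file,
                                  replication_time_valley_file,
                                  replication_time_peak_file) if f is not None]
        return ("files", user_files)
    if replication_time_biosample is not None:
        return ("biosample", replication_time_biosample)
    # Neither given: the default biosample for the genome applies.
    return ("default", None)


if __name__ == "__main__":
    print(replication_time_source(replication_time_biosample="HELAS3"))
    print(replication_time_source("HELAS3", "path/to/signal_file"))
```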

Replication Time Biosamples

Included in the table below is a list of available parameter values for replication_time_biosample.

Biosample	Organism	Tissue	Cell Type	Diseases
MCF7	human	breast	mammary	Cancer
HEPG2	human	liver	liver cells (hepatocytes)	Cancer
HELAS3	human	cervix	epithelial-like cervical cells	Cancer
SKNSH	human	brain	neuronal-like cells	Cancer
K562	human	bone marrow	lymphoblast cells	Cancer
IMR90	human	lung	fibroblast	Normal
NHEK	human	skin	keratinocyte	Normal
BJ	human	skin	fibroblast	Normal
HUVEC	human	umbilical vein	endothelial cell	Normal
BG02ES	human	early developmental stage of an embryo, not from a differentiated tissue	embryonic stem cell	None reported
GM12878	human	blood	B-Lymphocyte	Normal
GM06990	human	blood	B-Lymphocyte	Unknown
GM12801	human	blood	B-Lymphocyte	Unknown
GM12812	human	blood	B-Lymphocyte	Unknown
GM12813	human	blood	B-Lymphocyte	Unknown
HEK293	human	kidney	embryonic kidney cells	Normal
HCT116	human	colon	colorectal carcinoma cell	Cancer
A549	human	lung	epithelial cell	Cancer
CAKI2	human	kidney	papillary renal cell carcinoma cell	Cancer
G401	human	kidney	epithelial kidney cells	Cancer
T47D	human	breast; mammary gland	epithelial cell	Cancer
SKNMC	human	brain	peripheral primitive neuroectodermal	Cancer (Askin tumor)
NCIH460	human	lung	lung carcinoma cell	Cancer
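
When choosing a value for replication_time_biosample, it can help to filter the table by tissue or disease status. The sketch below copies only a few rows from the table above for brevity; the dictionary and the helper `biosamples_by` are hypothetical and not part of SigProfilerTopography.

```python
# A few rows copied from the table above: biosample -> (tissue, disease status).
REPLICATION_TIME_BIOSAMPLES = {
    "MCF7": ("breast", "Cancer"),
    "HELAS3": ("cervix", "Cancer"),
    "IMR90": ("lung", "Normal"),
    "NHEK": ("skin", "Normal"),
    "GM12878": ("blood", "Normal"),
    "A549": ("lung", "Cancer"),
}


def biosamples_by(tissue=None, disease=None):
    """Return the sorted biosample names matching the given tissue and/or disease."""
    return sorted(
        name
        for name, (row_tissue, row_disease) in REPLICATION_TIME_BIOSAMPLES.items()
        if (tissue is None or row_tissue == tissue)
        and (disease is None or row_disease == disease)
    )


if __name__ == "__main__":
    print(biosamples_by(tissue="lung"))
    print(biosamples_by(disease="Normal"))
```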

REPLICATION TIMING and REPLICATION STRAND ASYMMETRY

By default, SigProfilerTopography carries out replication timing and replication strand asymmetry analyses using Repli-seq data of the MCF7 and IMR90 cell lines for GRCh37 and GRCh38, respectively.
To run SigProfilerTopography with Repli-seq data of another cell line, e.g., HELAS3, you may first install the replication timing data for the genome of interest, e.g., GRCh37, as follows:
```
$ python
>>> from SigProfilerTopography import Topography as topography
>>> topography.install_repli_seq('GRCh37', 'HELAS3')
```

Then you have to include replication_time_biosample='HELAS3' in the runAnalyses call as follows:

>>> from SigProfilerTopography import Topography as topography

>>> genome = "GRCh37"
>>> inputDir = "path/to/21BRCA_vcfs"
>>> outputDir = "path/to/results"
>>> jobname = "21BRCA_SPT_with_probability_matrices"
>>> numofSimulations = 5
>>> sbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"
>>> dbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_DBS78_Decomposed_Mutation_Probabilities.txt"

>>> if __name__ == "__main__":
...     topography.runAnalyses(genome,
...                     inputDir,
...                     outputDir,
...                     jobname,
...                     numofSimulations,
...                     sbs_probabilities=sbs_probability_file,
...                     dbs_probabilities=dbs_probability_file,
...                     replication_time_biosample='HELAS3',
...                     epigenomics=True,
...                     nucleosome=True,
...                     replication_time=True,
...                     strand_bias=True,
...                     processivity=True)

If you do not install the replication timing files before the run, SigProfilerTopography downloads them at runtime from ftp://alexandrovlab-ftp.ucsd.edu/ into .../SigProfilerTopography/lib/replication/ for the replication_time_biosample of interest, which requires ~20-100 MB of storage.

If you have your own replication timing file, you can set replication_time_signal_file to its path and run the replication timing and replication strand asymmetry analyses with it.

We require a tab-separated file with four columns for replication_time_signal_file. No header line is required. The columns should contain the following information:

1. Chromosome (e.g., chr1)
2. Start position (e.g., 10000)
3. End position (e.g., 15000)
4. Signal value (e.g., 1.0343)
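
Before starting a long run, a custom signal file can be checked against the four-column layout described above. The following is a minimal sketch under stated assumptions: `validate_signal_file` is a hypothetical helper, the file is assumed to contain no header line, and the requirement that start is smaller than end is an assumption not stated in this document.

```python
import csv
import tempfile
from pathlib import Path


def validate_signal_file(path):
    """Return the number of data rows; raise ValueError on a malformed row."""
    rows = 0
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row:  # skip blank lines
                continue
            if len(row) != 4:
                raise ValueError(f"line {line_no}: expected 4 columns, found {len(row)}")
            _chrom, start, end, signal = row
            # int() and float() raise ValueError for non-numeric fields.
            if int(start) >= int(end):  # assumption: start < end
                raise ValueError(f"line {line_no}: start must be smaller than end")
            float(signal)
            rows += 1
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        signal_path = Path(tmp) / "my_replication_timing.tsv"
        signal_path.write_text("chr1\t10000\t15000\t1.0343\nchr1\t15000\t20000\t0.8712\n")
        print(validate_signal_file(signal_path))
```

A file that passes this check has the expected shape, although the sketch cannot tell whether the signal values themselves are meaningful.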

Then you have to set replication_time_signal_file in the runAnalyses call as follows:

>>> from SigProfilerTopography import Topography as topography

>>> genome = "GRCh37"
>>> inputDir = "path/to/21BRCA_vcfs"
>>> outputDir = "path/to/results"
>>> jobname = "21BRCA_SPT_with_probability_matrices"
>>> numofSimulations = 5
>>> sbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"
>>> dbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_DBS78_Decomposed_Mutation_Probabilities.txt"

>>> if __name__ == "__main__":
...     topography.runAnalyses(genome,
...                     inputDir,
...                     outputDir,
...                     jobname,
...                     numofSimulations,
...                     sbs_probabilities=sbs_probability_file,
...                     dbs_probabilities=dbs_probability_file,
...                     replication_time_signal_file="path/to/replication_timing_file",
...                     epigenomics=True,
...                     nucleosome=True,
...                     replication_time=True,
...                     strand_bias=True,
...                     processivity=True)

SigProfilerTopography annotates each mutation with its replication strand.

The replication strand can be one of the following:

A: Lagging
E: Leading
U: Unknown
B: Bidirectional (Both lagging and leading can happen for long indels).

You can find the annotated mutations under outputDir/jobname/data/chrbased if you set delete_chrbased_files=False, as follows.

>>> from SigProfilerTopography import Topography as topography

>>> genome = "GRCh37"
>>> inputDir = "path/to/21BRCA_vcfs"
>>> outputDir = "path/to/results"
>>> jobname = "21BRCA_SPT_with_probability_matrices"
>>> numofSimulations = 5
>>> sbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_SBS96_Decomposed_Mutation_Probabilities.txt"
>>> dbs_probability_file = "path/to/21BRCA_probabilities/COSMIC_DBS78_Decomposed_Mutation_Probabilities.txt"

>>> if __name__ == "__main__":
...     topography.runAnalyses(genome,
...                     inputDir,
...                     outputDir,
...                     jobname,
...                     numofSimulations,
...                     sbs_probabilities=sbs_probability_file,
...                     dbs_probabilities=dbs_probability_file,
...                     replication_time_biosample="T47D",
...                     epigenomics=True,
...                     nucleosome=True,
...                     replication_time=True,
...                     strand_bias=True,
...                     processivity=True,
...                     delete_chrbased_files=False)
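
Once the annotated mutations are kept on disk, the one-letter replication strand codes listed above can be translated and tallied. The sketch below works on a plain list of codes because the layout of the chrbased files is not described here; `tally_replication_strands` is a hypothetical helper, and reading the codes out of the files is left to the reader.

```python
from collections import Counter

# One-letter codes as listed above.
REPLICATION_STRAND_LABELS = {
    "A": "Lagging",
    "E": "Leading",
    "U": "Unknown",
    "B": "Bidirectional",
}


def tally_replication_strands(codes):
    """Count mutations per replication strand label; unknown codes raise KeyError."""
    return dict(Counter(REPLICATION_STRAND_LABELS[code] for code in codes))


if __name__ == "__main__":
    print(tally_replication_strands(["A", "A", "E", "U", "B", "E", "A"]))
```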