Using SigProfilerMatrixGenerator
SigProfilerMatrixGenerator works in conjunction with other SigProfiler tools but can also be run alone on input datasets. This section goes over the function arguments, input files, and all the output folders and files in detail.
From within a Python session, you can now generate the matrices as follows:
$ python3
>>> from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
>>> matrices = matGen.SigProfilerMatrixGeneratorFunc(project, genome, vcfFiles, exome=False, bed_file=None, chrom_based=False, plot=False, tsb_stat=False, seqInfo=False)
From within an R session, you can now generate the matrices as follows:
$ R
> library("reticulate")
> use_python("path_to_your_python3")
> py_config()
> library("SigProfilerMatrixGeneratorR")
> matrices <- SigProfilerMatrixGeneratorR("BRCA", "GRCh37", "/Users/ebergstr/Desktop/BRCA/", plot=T, exome=F, bed_file=NULL, chrom_based=F, tsb_stat=F, seqInfo=F, cushion=100)
Function Arguments
These are the acceptable parameters that can be passed into the function call.
Required:
- project: Project name for this instance of matrix generation.
Type: string
Example: "alexandrov_lab_test_1"
- genome: Reference genome to use for the matrix generation.
Type: string
Example: "GRCh37"
- vcfFiles: Full path of the saved input files in the desired output folder.
Type: string
Example: "/Users/test/Desktop/alexandrov_lab_test_1"
Optional:
- exome: Downsamples mutational matrices to the exome regions of the genome.
Type: boolean
Default: False
Example: exome=True
- bed_file: Downsamples mutational matrices to custom regions of the genome. Requires the full path to the BED file.
Type: string
Default: None
Example: bed_file="/Users/test/Desktop/bed_files/sample_1.bed"
- chrom_based: Outputs chromosome-based matrices.
Type: boolean
Default: False
Example: chrom_based=True
- plot: Integrates with SigProfilerPlotting to output all available visualizations for each matrix.
Type: boolean
Default: False
Example: plot=True
- tsb_stat: Outputs the results of a transcriptional strand bias test for the respective matrices.
Type: boolean
Default: False
Example: tsb_stat=True
- seqInfo: Outputs the original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation.
Type: boolean
Default: False
Example: seqInfo=True
- cushion: Adds a cushion of X bp to the exome/bed_file ranges used for downsampling the mutations.
Type: integer
Default: 100
Example: cushion=250
All string arguments must be surrounded by quotation marks (e.g. "test"), and all boolean arguments must be True or False.
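The type rules above can be checked before a long run is started. The following is a minimal sketch, not part of SigProfilerMatrixGenerator: `check_arguments` is a hypothetical helper, and the expected types and defaults are simply copied from the argument list in this section.

```python
# Hypothetical helper: NOT part of SigProfilerMatrixGenerator.
# Expected types and defaults are copied from the argument list above.
REQUIRED_ARGS = ("project", "genome", "vcfFiles")
OPTIONAL_ARGS = {
    "exome": (bool, False),
    "bed_file": (str, None),
    "chrom_based": (bool, False),
    "plot": (bool, False),
    "tsb_stat": (bool, False),
    "seqInfo": (bool, False),
    "cushion": (int, 100),
}


def check_arguments(project, genome, vcfFiles, **optional):
    """Return all arguments with defaults filled in; raise TypeError on a bad type."""
    args = {"project": project, "genome": genome, "vcfFiles": vcfFiles}
    for name in REQUIRED_ARGS:
        if not isinstance(args[name], str):
            raise TypeError(f"{name} must be a string")
    for name, (expected, default) in OPTIONAL_ARGS.items():
        value = optional.pop(name, default)
        if value is None and default is None:
            # bed_file is allowed to stay at its default of None
            args[name] = None
            continue
        # bool is a subclass of int in Python, so reject True/False for cushion
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{name} must be an integer")
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        args[name] = value
    if optional:
        raise TypeError(f"unknown argument(s): {sorted(optional)}")
    return args
```

For example, `check_arguments("BRCA", "GRCh37", "/Users/test/Desktop/BRCA", exome=True)` returns a dictionary with every default filled in, while passing `exome="True"` (a string rather than a boolean) raises a `TypeError`.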
Input File
This tool currently supports the following formats:
* MAF
Mutation Annotation Format [example.maf]
* VCF
Variant Call Format [example.vcf]
If files are in .vcf format, each sample must be saved as a separate file.
* ICGC
* Simple text file [example.txt]
The user must provide variant data adhering to one of these four formats.
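Before running the tool, it can help to see which formats an input folder actually contains. The sketch below counts files by the example extensions listed above. `summarize_input_files` is a hypothetical helper, and the mapping from extension to format is an assumption based only on those example file names; it does not reproduce how the tool itself detects formats. ICGC is left unmapped because this section gives no example file name for it.

```python
from collections import Counter
from pathlib import Path

# Assumption: extensions are taken from the example file names above.
# ICGC has no example file name in this section, so it is not mapped.
KNOWN_SUFFIXES = {".maf": "MAF", ".vcf": "VCF", ".txt": "Simple text file"}


def summarize_input_files(folder):
    """Count the files in `folder` by format label; other suffixes count as 'unknown'."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if path.is_file():
            label = KNOWN_SUFFIXES.get(path.suffix.lower(), "unknown")
            counts[label] += 1
    return dict(counts)
```

Since each sample must be saved as a separate file when the input is in .vcf format, the "VCF" count from this helper should match the number of samples.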
Folder Structure
The final output is divided into three folders:
* Input: Contains copies of the user-provided input files.
* Logs: Contains the error and log files for the submitted job. All errors are saved in the sigProfilerMatrixGenerator_[project]_[genome].err file and all progress checkpoints are saved in the sigProfilerMatrixGenerator_[project]_[genome].out file within the specified output folder.
* Output: Contains the DBS, SBS, INDEL, TSB, plots, and vcf_files folders. All matrices are saved in the appropriate folders.
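The log file names follow a fixed pattern, so their locations can be built from the project and genome arguments. This is a minimal sketch: `log_file_paths` is a hypothetical helper, and the `logs_dir` default of "Logs" is an assumption taken from the folder name as written above; the folder name on disk may differ (for example in case), so adjust it to match your output.

```python
from pathlib import Path


def log_file_paths(output_folder, project, genome, logs_dir="Logs"):
    """Return the expected (.err, .out) log file paths for one run.

    `logs_dir` is an assumption; pass the folder name used on your system.
    """
    stem = f"sigProfilerMatrixGenerator_{project}_{genome}"
    logs = Path(output_folder) / logs_dir
    return logs / f"{stem}.err", logs / f"{stem}.out"
```

Calling `log_file_paths("/Users/test/Desktop/alexandrov_lab_test_1", "alexandrov_lab_test_1", "GRCh37")` gives the two paths to inspect when a run fails or stalls.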
File Extensions
All output files will have a file extension indicating which arguments were passed in as True. By default, the files will have the .all file extension. The rest of the file extensions are explained below.
* .exome
The exome argument was passed in as True; the file contains all the mutations mapped to the exome.
* .region
The bed_file argument was passed in as a string; the file contains all the mutations mapped to the input bed_file regions.
* .chrx, where x denotes the chromosome (e.g. chr1, chrA)
The chrom_based argument was passed in as True; each file contains the mutations mapped to that chromosome.
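When post-processing the output, the extension alone tells you how a matrix was generated. The sketch below maps a file name to the meaning described above. `describe_extension` is a hypothetical helper, and the file name used in the usage note is made up for illustration.

```python
def describe_extension(file_name):
    """Describe an output file based on the extension rules listed above."""
    if "." not in file_name:
        return "unrecognised extension"
    suffix = file_name.rsplit(".", 1)[-1]
    if suffix == "all":
        return "all mutations (default)"
    if suffix == "exome":
        return "mutations mapped to the exome"
    if suffix == "region":
        return "mutations mapped to the bed_file regions"
    # .chrx: anything after the "chr" prefix names the chromosome
    if suffix.startswith("chr") and len(suffix) > 3:
        return f"mutations on chromosome {suffix[3:]}"
    return "unrecognised extension"
```

For instance, `describe_extension("example_matrix.chr1")` returns "mutations on chromosome 1". How the tool names files when several of these arguments are combined is not covered by this sketch.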