Using SigProfilerAssignment
This section describes SigProfilerAssignment's main function for mutational signature assignment and the parameters it accepts.
Function
The main function in SigProfilerAssignment for refitting known mutational signatures is cosmic_fit.
Input files
Two main input files are needed to use this function:
- Somatic mutations: three different formats are allowed (the selected format should be specified in the input_type parameter):
    - Mutation calling files (VCFs, MAFs, or simple text files, as described here). One file per sample is required. Example input VCF files. Use input_type = "vcf".
    - Segmentation files. Used for copy number analysis. Only one multi-sample file is allowed. An example segmentation file. Use input_type = "seg:TYPE" (check the segmentation files that are supported by SigProfilerMatrixGenerator).
    - Mutational matrices. From different mutational classifications and generated by SigProfilerMatrixGenerator. An example mutational matrix file. Use input_type = "matrix" (default option).
- Set of known mutational signatures: COSMIC v3.5 mutational signatures are used by default as the input reference signatures. Custom signature databases can also be used, and should be provided to the cosmic_fit function using the signature_database parameter.
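As a rough illustration of the custom database layout (a tab-delimited matrix with mutation types as rows and signature IDs as columns), the sketch below writes a toy file using only the Python standard library. The helper name `write_signature_matrix`, the `MutationType` header label, the signature names, the file name, and all numeric values are placeholders invented for this example and are not part of SigProfilerAssignment; a real SBS96 matrix would have 96 rows.

```python
import csv
import os
import tempfile


def write_signature_matrix(path, mutation_types, signatures):
    """Write a tab-delimited matrix: one row per mutation type, one column per signature ID."""
    signature_ids = list(signatures)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row: a label for the mutation-type column (assumed name), then the signature IDs.
        writer.writerow(["MutationType"] + signature_ids)
        for i, mutation_type in enumerate(mutation_types):
            writer.writerow([mutation_type] + [signatures[sig][i] for sig in signature_ids])


# Toy data: three mutation types only, with placeholder values (each column sums to 1).
mutation_types = ["A[C>A]A", "A[C>A]C", "A[C>A]G"]
signatures = {
    "CustomSigA": [0.5, 0.25, 0.25],
    "CustomSigB": [0.125, 0.125, 0.75],
}

database_path = os.path.join(tempfile.mkdtemp(), "custom_signatures.txt")
write_signature_matrix(database_path, mutation_types, signatures)

# Read the file back to confirm the layout.
with open(database_path) as handle:
    rows = [line.rstrip("\n").split("\t") for line in handle]
```

A file written this way could then be supplied through the signature_database parameter in place of the default COSMIC reference set.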
Required parameters
To run the cosmic_fit function, first import the package in your Python script or in an interactive Python session:
$ python
>>> from SigProfilerAssignment import Analyzer as Analyze
You can now assign known signatures to your samples. The required parameters for the cosmic_fit function are:
Analyze.cosmic_fit(samples, output, input_type = input_type)
You can also run the cosmic_fit function from the command line:
$ SigProfilerAssignment cosmic_fit samples output --input_type input_type
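The command-line form can also be assembled from a script, for example when launching several runs in a row. The sketch below only builds the argument list and checks the input_type string. The helper names `is_valid_input_type` and `build_cosmic_fit_command` and the paths in the usage lines are hypothetical. The accepted values follow the three input formats described in the Input files section, and any non-empty TYPE after `seg:` is accepted here because the supported callers are not listed on this page.

```python
def is_valid_input_type(input_type):
    """Accept "vcf", "matrix", or "seg:TYPE" with a non-empty TYPE."""
    if input_type in ("vcf", "matrix"):
        return True
    prefix, sep, caller = input_type.partition(":")
    return prefix == "seg" and sep == ":" and caller != ""


def build_cosmic_fit_command(samples, output, input_type="matrix"):
    """Return an argv list mirroring:
    SigProfilerAssignment cosmic_fit samples output --input_type input_type
    """
    if not is_valid_input_type(input_type):
        raise ValueError(f"unsupported input_type: {input_type!r}")
    return ["SigProfilerAssignment", "cosmic_fit", samples, output,
            "--input_type", input_type]


# Hypothetical paths, for illustration only.
command = build_cosmic_fit_command("vcf_folder/", "results/", input_type="vcf")
print(" ".join(command))
```

On a machine where SigProfilerAssignment is installed, the resulting list could be passed to `subprocess.run(command, check=True)`.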
Full list of parameters
The full list of parameters is included in the following table:
| Parameter | Variable Type | Parameter Description |
|---|---|---|
| samples | String | Path to the input somatic mutations file (if using segmentation file/mutational matrix) or input folder (mutation calling file/s) |
| output | String | Path to the output folder |
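| input_type | String | Three accepted input types: "vcf" (mutation calling file/s), "seg:TYPE" (segmentation file), and "matrix" (mutational matrix). The default value is "matrix" |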
| context_type | String | Required if input_type is "vcf". Specifies which mutational context of the input data is used for assignment. Valid options include "96", "288", "1536", "DINUC", and "ID". The default value is "96" |
| cosmic_version | Float | Defines the version of the COSMIC reference signatures. Takes a positive float among 1, 2, 3, 3.1, 3.2, 3.3, 3.4 and 3.5. The default value is 3.5 |
| exome | Boolean | Defines if the exome renormalized COSMIC signatures will be used. The default value is False |
| genome_build | String | The reference genome build, used to select the appropriate version of the COSMIC reference signatures, as well as to process the mutation calling file/s. Supported genomes include "GRCh37", "GRCh38", "mm9", "mm10", "rn6" and "rn7". The default value is "GRCh37". If the selected genome is not in the supported list, the default genome will be used |
| signature_database | String | Path to the input set of known mutational signatures (only needed when the COSMIC reference signatures are not used): a tab-delimited file containing the signature matrix, where the rows are mutation types and the columns are signature IDs |
| exclude_signature_subgroups | List | Removes the signatures corresponding to specific subtypes to improve refitting (only available when using the default COSMIC reference signatures). The usage is explained below. The default value is None, which uses all COSMIC signatures |
| export_probabilities | Boolean | Defines if the probability matrix per mutational context for all samples is created. The default value is True |
| export_probabilities_per_mutation | Boolean | Defines if the probability matrices per mutation for all samples are created. Only available when input_type is "vcf". The default value is False |
| make_plots | Boolean | Toggle on and off for making and saving plots. The default value is True |
| sample_reconstruction_plots | String | Selects the output format for sample reconstruction plots. Valid inputs are 'pdf', 'png', 'both' and None. The default value is None |
| verbose | Boolean | Prints detailed statements. The default value is False |
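To make the defaults in the table easier to scan, the sketch below collects them in a plain dictionary and merges user overrides with a few sanity checks. `DEFAULTS` and `resolve_parameters` are illustrative names rather than part of the package, and the checks cover only what this page states explicitly. signature_database is left out because no default is given for it, and context_type is not validated because the table lists its options as non-exhaustive.

```python
# Defaults transcribed from the parameter table.
DEFAULTS = {
    "input_type": "matrix",
    "context_type": "96",
    "cosmic_version": 3.5,
    "exome": False,
    "genome_build": "GRCh37",
    "exclude_signature_subgroups": None,
    "export_probabilities": True,
    "export_probabilities_per_mutation": False,
    "make_plots": True,
    "sample_reconstruction_plots": None,
    "verbose": False,
}

COSMIC_VERSIONS = (1, 2, 3, 3.1, 3.2, 3.3, 3.4, 3.5)
GENOME_BUILDS = ("GRCh37", "GRCh38", "mm9", "mm10", "rn6", "rn7")
PLOT_FORMATS = ("pdf", "png", "both", None)


def resolve_parameters(**overrides):
    """Merge overrides into the documented defaults and apply basic checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["cosmic_version"] not in COSMIC_VERSIONS:
        raise ValueError(f"unsupported cosmic_version: {params['cosmic_version']}")
    if params["sample_reconstruction_plots"] not in PLOT_FORMATS:
        raise ValueError("sample_reconstruction_plots must be 'pdf', 'png', 'both' or None")
    # The table states that an unsupported genome falls back to the default.
    if params["genome_build"] not in GENOME_BUILDS:
        params["genome_build"] = DEFAULTS["genome_build"]
    # Per-mutation probabilities are only available for "vcf" input.
    if params["export_probabilities_per_mutation"] and params["input_type"] != "vcf":
        raise ValueError("export_probabilities_per_mutation requires input_type='vcf'")
    return params


params = resolve_parameters(cosmic_version=3.4, genome_build="hg19")
print(params["cosmic_version"], params["genome_build"])
```

The returned dictionary could be unpacked into a call such as `Analyze.cosmic_fit(samples, output, **params)`, assuming the keyword names match the table above.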
Signature subgroups
When using COSMIC reference signatures, some subgroups of signatures can be removed to improve the refitting analysis. To use this feature, the exclude_signature_subgroups parameter should be added, following the syntax below:
exclude_signature_subgroups = ['MMR_deficiency_signatures',
                               'POL_deficiency_signatures',
                               'HR_deficiency_signatures',
                               'BER_deficiency_signatures',
                               'Chemotherapy_signatures',
                               'Immunosuppressants_signatures',
                               'Treatment_signatures',
                               'APOBEC_signatures',
                               'Tobacco_signatures',
                               'UV_signatures',
                               'AA_signatures',
                               'Colibactin_signatures',
                               'Artifact_signatures',
                               'Lymphoid_signatures']
The full list of signature subgroups is included in the following table:
| Signature subgroup | SBS signatures excluded | DBS signatures excluded | ID signatures excluded |
|---|---|---|---|
| MMR_deficiency_signatures | 6, 14, 15, 20, 21, 26, 44 | 7, 10 | 7 |
| POL_deficiency_signatures | 10a, 10b, 10c, 10d, 28 | 3 | - |
| HR_deficiency_signatures | 3 | - | 6 |
| BER_deficiency_signatures | 30, 36 | - | - |
| Chemotherapy_signatures | 11, 25, 31, 35, 86, 87, 90 | 5 | - |
| Immunosuppressants_signatures | 32 | - | - |
| Treatment_signatures | 11, 25, 31, 32, 35, 86, 87, 90 | 5 | - |
| APOBEC_signatures | 2, 13 | - | - |
| Tobacco_signatures | 4, 29, 92, 100, 109 | 2 | 3 |
| UV_signatures | 7a, 7b, 7c, 7d, 38 | 1 | 13 |
| AA_signatures | 22 | - | - |
| Colibactin_signatures | 88 | - | 18 |
| Artifact_signatures | 27, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 95 | - | - |
| Lymphoid_signatures | 9, 84, 85 | - | - |
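The table can also serve as a lookup when checking which reference signatures a given exclusion list removes. The sketch below transcribes only the SBS column and computes the union for a list of subgroups. The `SBS_SUBGROUPS` dictionary and the `excluded_sbs` helper are illustrative and not part of the SigProfilerAssignment API, and the `SBS` prefix is added on the assumption that COSMIC-style signature names are wanted.

```python
# SBS column of the subgroup table, transcribed as strings.
SBS_SUBGROUPS = {
    "MMR_deficiency_signatures": ["6", "14", "15", "20", "21", "26", "44"],
    "POL_deficiency_signatures": ["10a", "10b", "10c", "10d", "28"],
    "HR_deficiency_signatures": ["3"],
    "BER_deficiency_signatures": ["30", "36"],
    "Chemotherapy_signatures": ["11", "25", "31", "35", "86", "87", "90"],
    "Immunosuppressants_signatures": ["32"],
    "Treatment_signatures": ["11", "25", "31", "32", "35", "86", "87", "90"],
    "APOBEC_signatures": ["2", "13"],
    "Tobacco_signatures": ["4", "29", "92", "100", "109"],
    "UV_signatures": ["7a", "7b", "7c", "7d", "38"],
    "AA_signatures": ["22"],
    "Colibactin_signatures": ["88"],
    # 27, 43, the contiguous run 45 to 60, and 95.
    "Artifact_signatures": ["27", "43"] + [str(n) for n in range(45, 61)] + ["95"],
    "Lymphoid_signatures": ["9", "84", "85"],
}


def excluded_sbs(subgroups):
    """Return the set of SBS signature names removed by the given subgroups."""
    names = set()
    for subgroup in subgroups:
        names.update("SBS" + suffix for suffix in SBS_SUBGROUPS[subgroup])
    return names


removed = excluded_sbs(["APOBEC_signatures", "UV_signatures"])
print(sorted(removed))
```

In the SBS column, Treatment_signatures is the union of Chemotherapy_signatures and Immunosuppressants_signatures, so listing it alone removes the same SBS signatures as listing both.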