
SigProfilerExtractor Output

This section goes over the different files that are generated when running SigProfilerExtractor. The example files throughout this section were generated from the Quick Start Example.


Output

The output directory, results, contains a subdirectory for each mutational context passed in as a parameter, a JOB_METADATA.txt file, and a Seeds.txt file. Below is a preliminary view of the files that will be generated in results.

[Image: OutputHierarchy]

JOB_METADATA.txt

This file contains all the metadata about the system and runtime of the job.

The main sections of the file include the following:

- System Info
- Python and Package Versions
- Execution Parameters
- Analysis Progress
- Job Status

Below is an example of what the metadata file contains.

[Image: example JOB_METADATA.txt file]

Seeds.txt

Seeds.txt is a tab-separated text file with two columns containing the replicate IDs and their preset seeds. The path to this file can be passed through the "seeds" parameter in order to reproduce a run.

Below is an example of what the Seeds.txt file contains.

[Image: example Seeds.txt file]
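
As a rough illustration of the two-column, tab-separated layout described above, the sketch below parses such a file into a dictionary using only the Python standard library. The helper name `read_seeds`, the header labels, and the replicate IDs and seed values in the sample text are all made up for illustration; a real Seeds.txt may label its columns differently.

```python
import csv
import io


def read_seeds(handle):
    """Parse a two-column, tab-separated seeds table into {replicate_id: seed}.

    Assumes (hypothetically) that the first row is a header and that every
    seed is an integer.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    seeds = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        replicate_id, seed = row[0], row[1]
        seeds[replicate_id] = int(seed)
    return seeds


# Made-up sample content; not taken from a real run.
sample_text = "Replicate\tSeed\nrep_0\t12345\nrep_1\t67890\n"
seeds = read_seeds(io.StringIO(sample_text))
print(seeds)
```

For a real run, the same function could be applied to a file handle opened on the Seeds.txt file in the results directory, for example to confirm that two runs used identical seeds.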

Mutational Context Subdirectory

For this section, the subdirectory SBS96 and its contents will be used as an example. Different mutational contexts will share the same file structure.

Each mutational context subdirectory (e.g., SBS96, ID83, DBS78) contains the following files:

- All_Solutions subdirectory
- Suggested_Solution subdirectory
- All_solutions_stat.csv
- SBS96_selection_plot.pdf
- Samples.txt
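
The layout listed above can be checked programmatically after a run. The following is a minimal sketch, not part of SigProfilerExtractor: the helper `missing_entries` is hypothetical, and it assumes that the selection plot for other contexts follows the same `<context>_selection_plot.pdf` naming pattern as the SBS96 example.

```python
from pathlib import Path
import tempfile


def missing_entries(context_dir, context="SBS96"):
    """Return the expected entries that are absent from a context subdirectory.

    Entry names are taken from the list above; the selection plot name is
    assumed to be prefixed with the context name.
    """
    context_dir = Path(context_dir)
    expected_dirs = ["All_Solutions", "Suggested_Solution"]
    expected_files = [
        "All_solutions_stat.csv",
        f"{context}_selection_plot.pdf",
        "Samples.txt",
    ]
    missing = [d for d in expected_dirs if not (context_dir / d).is_dir()]
    missing += [f for f in expected_files if not (context_dir / f).is_file()]
    return missing


# Demonstration on a throwaway directory that is only partially populated.
with tempfile.TemporaryDirectory() as tmp:
    sbs96 = Path(tmp) / "SBS96"
    (sbs96 / "All_Solutions").mkdir(parents=True)
    (sbs96 / "Samples.txt").write_text("")
    demo_missing = missing_entries(sbs96)
    print(demo_missing)
```

An empty list from `missing_entries` would suggest that the subdirectory has the structure described in this section.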

All_Solutions subdirectory

The All_Solutions subdirectory contains the results of the extraction at each rank (number of signatures) within the input range. To learn more about the output found in All_Solutions, refer to the section Output - Suggested Solution.

Suggested_Solution subdirectory

The Suggested_Solution subdirectory contains the optimal solution. To learn more about the output found in Suggested_Solution, refer to the section Output - Suggested Solution.

All_solutions_stat.csv

This file records the relative reconstruction error (the squared distance between the original data and its “estimate”) and the process stability. It contains columns with the following values for each signature identified from the input samples:

- Stability (calculated average silhouette coefficient)
- Minimum Stability
- Considerable Solution
- P-value
- Matrix Frobenius %
- Mean sample L1% (calculated as the sum of the absolute values of the vector)
- Maximum sample L1%
- Mean sample L2% (calculated as the square root of the sum of the squared vector values)
- Maximum sample L2%
- Mean sample KL (Kullback-Leibler divergence)
- Maximum sample KL
- Mean Cosine Distance
- Max Cosine Distance
- Mean Correlation
- Minimum Correlation

[Image: All_solutions_stat.csv]
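
To make the per-sample error measures above more concrete, here is a small sketch that computes them for one sample and its reconstruction. It is an illustration under stated assumptions, not SigProfilerExtractor's own code: it assumes the L1% and L2% values are the norm of the residual expressed as a percentage of the norm of the original sample, and that the KL divergence is taken between the two vectors after normalizing each to sum to 1. The function names and the counts are made up.

```python
import math


def l1_norm(v):
    """Sum of the absolute values of the vector."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of the squared vector values."""
    return math.sqrt(sum(x * x for x in v))


def relative_error_percent(original, estimate, norm):
    """Norm of the residual as a percentage of the norm of the original."""
    residual = [o - e for o, e in zip(original, estimate)]
    return 100.0 * norm(residual) / norm(original)


def cosine_distance(a, b):
    """One minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (l2_norm(a) * l2_norm(b))


def kl_divergence(p_counts, q_counts):
    """KL(P || Q) after normalizing both vectors to sum to 1.

    Terms with p == 0 contribute 0; q is assumed positive wherever p is.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    total = 0.0
    for p, q in zip(p_counts, q_counts):
        if p > 0:
            total += (p / p_total) * math.log((p / p_total) / (q / q_total))
    return total


# Made-up counts for one sample over four mutation channels.
original = [10.0, 20.0, 30.0, 40.0]
estimate = [12.0, 18.0, 33.0, 37.0]

l1_pct = relative_error_percent(original, estimate, l1_norm)
l2_pct = relative_error_percent(original, estimate, l2_norm)
cos_dist = cosine_distance(original, estimate)
kl = kl_divergence(original, estimate)
print(l1_pct, l2_pct, cos_dist, kl)
```

Under these assumptions, the "mean" and "maximum" columns would be the average and the largest of such per-sample values across all samples.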

SBS96_selection_plot.pdf

In this example, the file is named SBS96_selection_plot.pdf. It plots the mean sample cosine distance and the average stability for each candidate number of signatures. The vertical gray bar indicates the optimal number of signatures selected by SigProfilerExtractor.

[Image: SBS96_selection_plot]

Samples.txt

This file contains the matrix of mutation counts for each sample in the given mutational context. For example, in the file below, each row corresponds to an SBS96 mutation type (e.g., A[C>A]A) and each column corresponds to a sample (e.g., PD10010a). Each value is therefore the number of mutations of that type found in that sample.

[Image: Samples.txt]
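
The row-and-column layout described above can be read with the Python standard library alone. The sketch below is illustrative only: the helpers `read_count_matrix` and `total_per_sample` are hypothetical, the header label `MutationType`, the second sample name, and all counts are made up, and it assumes the file is tab-separated with integer counts.

```python
import csv
import io


def read_count_matrix(handle):
    """Read a tab-separated matrix: rows are mutation types, columns are samples.

    Returns (sample_names, {mutation_type: [count per sample]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]  # first header cell labels the row-name column
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [int(value) for value in row[1:]]
    return sample_names, counts


def total_per_sample(sample_names, counts):
    """Sum every row to get the total number of mutations in each sample."""
    totals = dict.fromkeys(sample_names, 0)
    for row_counts in counts.values():
        for name, value in zip(sample_names, row_counts):
            totals[name] += value
    return totals


# Made-up excerpt: the counts and the second sample name are hypothetical.
sample_text = (
    "MutationType\tPD10010a\tSampleB\n"
    "A[C>A]A\t58\t74\n"
    "A[C>A]C\t36\t66\n"
)
names, counts = read_count_matrix(io.StringIO(sample_text))
totals = total_per_sample(names, counts)
print(totals)
```

Summing down each column in this way gives the total mutation burden per sample for the chosen context.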