Quick Start Example
This section provides an example to help users quickly get started with the SigProfilerExtractor tool. The example uses somatic mutational data from 21 breast cancer genomes and shows how to start from either a text (matrix) file or a set of VCF files as input.
Prerequisites
This tutorial requires that you have completed all steps in the installation guide:
- Python version >= 3.4.0
- SigProfilerExtractor
- SigProfilerMatrixGenerator reference genome download:
  - GRCh37
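Before running the examples, you can confirm part of this list from Python. The sketch below is a hypothetical helper (`check_prerequisites` is not part of SigProfilerExtractor). It assumes the tools import under the package names `SigProfilerExtractor` and `SigProfilerMatrixGenerator`. It does not check whether the GRCh37 reference genome has been downloaded, because that step is handled through SigProfilerMatrixGenerator as described in the installation guide.

```python
import importlib.util
import sys

# Minimum Python version stated in the prerequisites.
MIN_PYTHON = (3, 4)

# Assumed top-level package names of the two required tools.
REQUIRED_PACKAGES = ("SigProfilerExtractor", "SigProfilerMatrixGenerator")


def check_prerequisites(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a dict mapping each check to True (satisfied) or False."""
    results = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for check, ok in check_prerequisites().items():
        print("%-30s %s" % (check, "OK" if ok else "MISSING"))
```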
Downloading Input Example Data
This example uses somatic mutational data from 21 breast cancer genomes. Download the example dataset 21BRCA.zip from the following location, or use the command line:
ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerExtractor/Example_data/
To download from the command line, enter the following command in bash on OS X or other Unix systems:
$ wget ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerExtractor/Example_data/21BRCA.zip
Once 21BRCA.zip has been downloaded, unzip it. The unzipped 21BRCA folder contains the file 21BRCA.txt and the subfolder 21BRCA_vcf. 21BRCA.txt is a mutational matrix defined using the SBS-96 classification, and 21BRCA_vcf contains 21 VCF files (one per breast cancer sample).
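On systems without wget (for example Windows), the archive can also be fetched and extracted with Python's standard library. This is a minimal sketch: `download` and `unzip` are hypothetical helper names, and it assumes the FTP server above accepts anonymous downloads through `urllib.request`.

```python
import urllib.request
import zipfile

# Location of the example archive given above.
EXAMPLE_URL = (
    "ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/"
    "SigProfilerExtractor/Example_data/21BRCA.zip"
)


def download(url, dest_path):
    """Save the resource at url (http, https or ftp) to dest_path."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def unzip(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir; return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, `unzip(download(EXAMPLE_URL, "21BRCA.zip"))` saves the archive in the current directory and extracts its contents next to it.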
Running SigProfilerExtractor (VCF)
You will be using the 21 VCF files located in the subfolder 21BRCA_vcf as input for this example.
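In VCF mode the input is a path to a folder rather than to a single file, so it can be worth confirming that the folder really holds the expected VCF files before starting a long run. `list_vcf_files` below is a hypothetical helper, not part of SigProfilerExtractor, and it only checks file names, not file contents.

```python
import os


def list_vcf_files(folder):
    """Return the sorted names of regular files in folder ending in .vcf."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.endswith(".vcf") and os.path.isfile(os.path.join(folder, name))
    )
```

Once the placeholder is replaced with the real location, `len(list_vcf_files("path/to/21BRCA_vcf"))` should be 21 for this dataset.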
First, start a Python interactive shell and import the SigProfilerExtractor library.
$ python3
>>> from SigProfilerExtractor import sigpro as sig
Next, extract the signatures by running the following command. Note: Update "path/to/21BRCA_vcf" with the actual path to the 21BRCA_vcf folder.
>>> sig.sigProfilerExtractor("vcf", "results", "path/to/21BRCA_vcf", reference_genome="GRCh37", minimum_signatures=1, maximum_signatures=10, nmf_replicates=100)
After the program has finished running, its output is written to a directory named results, created in the directory from which the Python session was started. To learn more about the output produced by SigProfilerExtractor, refer to Using the Tool - Output.
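The interactive session can also be saved as a script and run with python3. The sketch below passes the same arguments as the VCF example; the names `extractor_arguments`, `main` and `VCF_FOLDER` are illustrative assumptions. The `if __name__ == "__main__":` guard is what Python's multiprocessing documentation recommends for scripts that may start worker processes, which matters on platforms that use the spawn start method.

```python
import os

# Placeholder: replace with the actual path to the 21BRCA_vcf folder.
VCF_FOLDER = "path/to/21BRCA_vcf"


def extractor_arguments(vcf_folder, output="results"):
    """Positional and keyword arguments mirroring the interactive VCF call."""
    args = ("vcf", output, vcf_folder)
    kwargs = {
        "reference_genome": "GRCh37",
        "minimum_signatures": 1,
        "maximum_signatures": 10,
        "nmf_replicates": 100,
    }
    return args, kwargs


def main():
    if not os.path.isdir(VCF_FOLDER):
        print("VCF folder not found:", VCF_FOLDER)
        return
    # Imported here so the helper above is usable without the package.
    from SigProfilerExtractor import sigpro as sig

    args, kwargs = extractor_arguments(VCF_FOLDER)
    sig.sigProfilerExtractor(*args, **kwargs)


# Stops worker processes started with "spawn" (the default on Windows and
# macOS) from re-running the extraction when they import this file.
if __name__ == "__main__":
    main()
```

For the matrix example, the call differs in the input type ("matrix"), the path to 21BRCA.txt, and the extra cpu=-1 argument.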
Running SigProfilerExtractor (Matrix)
You will be using the mutational matrix defined using SBS-96 classification named 21BRCA.txt as input for this example.
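The SBS-96 classification groups single base substitutions into 6 substitution types (C>A, C>G, C>T, T>A, T>C, T>G, written with the pyrimidine of the mutated base pair as reference) combined with the 16 possible pairs of 5' and 3' neighbouring bases, giving 6 × 16 = 96 channels. The sketch below enumerates those channels and checks a tab-separated matrix against them. It assumes a layout of one header row followed by one row per channel, labelled like A[C>A]A in the first column with one count column per sample; both helper names are hypothetical, so compare this assumed layout with the actual 21BRCA.txt before relying on the result.

```python
import csv

# The six substitution classes, with the pyrimidine (C or T) as reference.
SUBSTITUTIONS = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
BASES = "ACGT"


def sbs96_contexts():
    """Return the 96 channels as 5'base[REF>ALT]3'base, e.g. A[C>A]A."""
    return [
        five + "[" + sub + "]" + three
        for sub in SUBSTITUTIONS
        for five in BASES
        for three in BASES
    ]


def check_sbs96_matrix(path):
    """Return (sample_names, missing_channels) for a tab-separated matrix.

    Assumed layout: a header row, then one row per channel with the
    channel label in the first column and one count column per sample.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    found = {row[0] for row in body}
    missing = sorted(set(sbs96_contexts()) - found)
    return header[1:], missing
```

For a complete matrix in this layout, `check_sbs96_matrix("path/to/21BRCA.txt")` would return the sample names and an empty list of missing channels.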
First, start a Python interactive shell and import the SigProfilerExtractor library.
$ python3
>>> from SigProfilerExtractor import sigpro as sig
Next, extract the signatures by running the following command. Note: Update "path/to/21BRCA.txt" with the actual path to 21BRCA.txt.
>>> sig.sigProfilerExtractor("matrix", "results", "path/to/21BRCA.txt", reference_genome="GRCh37", minimum_signatures=1, maximum_signatures=10, nmf_replicates=100, cpu=-1)
After the program has finished running, its output is written to a directory named results, created in the directory from which the Python session was started. To learn more about the output produced by SigProfilerExtractor, refer to Using the Tool - Output.
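Because results is created relative to the directory where Python was started, it can help to print its absolute location and take a quick look at what was written. `list_outputs` below is a hypothetical helper that lists entries up to two levels deep; it makes no assumptions about the folder names SigProfilerExtractor produces, which are documented in Using the Tool - Output.

```python
import os


def list_outputs(results_dir="results", max_depth=2):
    """Return sorted paths, relative to results_dir, up to max_depth levels deep."""
    found = []
    for dirpath, dirnames, filenames in os.walk(results_dir):
        rel = os.path.relpath(dirpath, results_dir)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # do not descend any further
            continue
        for name in dirnames + filenames:
            found.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(found)


if __name__ == "__main__":
    print("Output directory:", os.path.abspath("results"))
    for path in list_outputs("results"):
        print(path)
```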
Additional Information
In the examples above, any parameters that are not specified take their default values. All of the function arguments and their types are explained in detail in the Using the Tool - Input section. To learn more about the files that were produced, refer to Using the Tool - Output.
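Since unspecified parameters fall back to their defaults, one way to see those defaults for your installed version is to read the function signature with Python's inspect module. This sketch assumes sigProfilerExtractor is an ordinary Python function whose signature inspect can read; `default_parameters` is a hypothetical helper name.

```python
import inspect


def default_parameters(func):
    """Map each parameter of func that has a default value to that default."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
```

After `from SigProfilerExtractor import sigpro as sig`, calling `default_parameters(sig.sigProfilerExtractor)` returns a dictionary of parameter names and default values; the Using the Tool - Input section remains the reference for what each one means.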
Video Tutorials
Learn more about SigProfilerExtractor by following along with our video tutorials. In these tutorials we install SigProfilerExtractor, run the quick start example program, and review the output from the example program.