Quick Start Example
This section walks through a minimal example for getting started with SigProfilerClusters. It uses a breast cancer sample (BRCA) or a melanoma sample (MELA) and demonstrates the complete workflow: generating simulations with SigProfilerSimulator, then detecting clustered mutations with SigProfilerClusters.
Prerequisites
Before starting, complete all steps in the installation guide. In particular, you should have:
- Installed SigProfilerClusters
- Installed SigProfilerSimulator
- Downloaded the GRCh37 reference genome using SigProfilerMatrixGenerator
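As a quick sanity check before starting, the sketch below tests whether the three packages can be located by the current Python interpreter. It relies only on the standard library; the helper name `find_missing_packages` is hypothetical and not part of any SigProfiler tool. It does not verify that the GRCh37 reference genome has been downloaded.

```python
import importlib.util

# Top-level package names, as they appear in this tutorial's import statements.
REQUIRED_PACKAGES = (
    "SigProfilerMatrixGenerator",
    "SigProfilerSimulator",
    "SigProfilerClusters",
)


def find_missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If any package is reported as missing, revisit the installation guide before continuing.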
Downloading example input data
Download one of the example VCF files below and place it in a new project directory (e.g., path/to/data/):
- BRCA_example_subs.vcf — breast cancer substitutions
- BRCA_example_indels.vcf — breast cancer indels
- MELA_example.vcf — melanoma substitutions
These samples and their expected outputs are also available under the examples/ directory within the GitHub repository.
Note
The simulations/ folder is not included in the GitHub repository, to keep the repository small.
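Before running the tools, it can help to confirm that the downloaded VCF file is actually in the project directory. The following is a minimal standard-library sketch; `list_vcf_files` is a hypothetical helper, and `path/to/data` stands in for your own project directory.

```python
from pathlib import Path


def list_vcf_files(project_dir):
    """Return the sorted names of .vcf files directly inside project_dir."""
    root = Path(project_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Project directory not found: {root}")
    return sorted(p.name for p in root.glob("*.vcf"))


# Example usage (replace the path with your own project directory):
# print(list_vcf_files("path/to/data"))
```

An empty list means no VCF file was found at the top level of the directory.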
Running SigProfilerClusters
First, start an interactive Python shell and import SigProfilerMatrixGenerator, SigProfilerSimulator, and SigProfilerClusters.
$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> from SigProfilerClusters import SigProfilerClusters as hp
Step 1: Run SigProfilerSimulator
Generate a background model by running at least 100 simulations of inter-mutational distances for your data. Note: replace "path/to/data" with the actual path to the directory containing the VCF file.
>>> sigSim.SigProfilerSimulator("BRCA", "path/to/data", "GRCh37", contexts=["288"], chrom_based=True, simulations=100)
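To make the term concrete: an inter-mutational distance (IMD) is the number of base pairs between neighboring mutations on the same chromosome. The sketch below computes IMDs for a handful of made-up positions. It is an illustration only, not the implementation used by SigProfilerSimulator or SigProfilerClusters, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def inter_mutational_distances(mutations):
    """Map each chromosome to the gaps (in bp) between consecutive mutations.

    `mutations` is an iterable of (chromosome, position) pairs.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    distances = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Difference between each position and the next one on the same chromosome.
        distances[chrom] = [b - a for a, b in zip(positions, positions[1:])]
    return distances


# Made-up positions: two mutations on chromosome 1 sit only 10 bp apart.
example = [("1", 1000), ("1", 1010), ("1", 250000), ("2", 500), ("2", 90000)]
print(inter_mutational_distances(example))
```

Comparing a sample's observed distances with those from the simulated data is what lets unusually close mutations stand out from the background.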
Step 2: Run SigProfilerClusters
Cluster mutations based on the simulated background distribution. Note: replace "path/to/data" with the actual path to your project directory.
>>> hp.analysis("BRCA", "GRCh37", "96", ["288"], "path/to/data",
...             analysis="all", sortSims=True, subClassify=True,
...             correction=True, calculateIMD=True, max_cpu=4,
...             variant_caller="standard")
After SigProfilerClusters has finished running, the output will be organized under path/to/data/output/. Partitioned mutations are placed under output/clustered/ and output/nonClustered/, and visualizations are found under output/plots/. To learn more about all output files, please refer to the Using the Tool - Output section.
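A quick way to confirm that the run produced the folders described above is to check for them directly. This is a minimal standard-library sketch: `missing_output_dirs` is a hypothetical helper, and the folder names are taken from the paragraph above rather than from an exhaustive listing of the tool's output.

```python
from pathlib import Path

# Subfolders of output/ named in this tutorial.
EXPECTED_SUBDIRS = ("clustered", "nonClustered", "plots")


def missing_output_dirs(project_dir):
    """Return the expected output subfolders that do not exist yet."""
    output_root = Path(project_dir) / "output"
    return [name for name in EXPECTED_SUBDIRS if not (output_root / name).is_dir()]


# Example usage (replace the path with your own project directory):
# print(missing_output_dirs("path/to/data"))
```

An empty list indicates that all three folders are present.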
Additional Information
In the above example, unspecified parameters use their default values. All function arguments and their types are described in detail in the Using the Tool - Input section.