
Quick Start Example


This section provides a minimal example to get started with SigProfilerClusters. The example uses either a breast cancer sample (BRCA) or a melanoma sample (MELA) and walks through the complete workflow: generating simulations with SigProfilerSimulator, then detecting clustered mutations with SigProfilerClusters.


Prerequisites

This tutorial assumes that you have completed all steps in the installation guide.

Downloading input example data

Download one of the example VCF files below and place it in a new project directory (e.g., path/to/data/):

These samples and their expected outputs are also available under the examples/ directory within the GitHub repository.

Note

The simulations/ folder is not included in the GitHub repository, in order to keep the repository small.
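If you prefer to set up the project directory from a script, the following is a minimal sketch using only the Python standard library. The helper name `stage_vcf` and the paths in the usage comment are hypothetical and are not part of any SigProfiler package.

```python
import shutil
from pathlib import Path


def stage_vcf(vcf_file, project_dir):
    """Copy one VCF file into project_dir, creating the directory if needed."""
    source = Path(vcf_file)
    target_dir = Path(project_dir)
    if source.suffix.lower() != ".vcf":
        raise ValueError(f"expected a .vcf file, got {source.name!r}")
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir))


# Usage (hypothetical paths, adjust to your own download location):
# stage_vcf("Downloads/example.vcf", "path/to/data")
```

Keeping a single VCF file in an otherwise empty directory makes it easier to tell the tool's generated folders apart from your input.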

Running SigProfilerClusters

First, start a Python interactive shell and import SigProfilerMatrixGenerator, SigProfilerSimulator, and SigProfilerClusters.

$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> from SigProfilerClusters import SigProfilerClusters as hp
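If one of the imports above fails, it can help to check which packages are visible to the current interpreter. The snippet below is a small sketch that uses only the standard library; the helper name `missing_packages` is hypothetical, and the package names are taken from the import statements above.

```python
from importlib.util import find_spec

# Top-level package names taken from the import statements above.
REQUIRED_PACKAGES = (
    "SigProfilerMatrixGenerator",
    "SigProfilerSimulator",
    "SigProfilerClusters",
)


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All three packages are importable.")
```

This only confirms that the packages can be found; it does not verify that any reference genome data has been installed.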

Step 1: Run SigProfilerSimulator

Generate a background model by running at least 100 simulations of inter-mutational distances for your data. Note: Update "path/to/data" with the actual path to the directory containing the VCF file.

>>> sigSim.SigProfilerSimulator("BRCA", "path/to/data", "GRCh37", contexts=["288"], chrom_based=True, simulations=100)
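For intuition about the quantity being simulated, an inter-mutational distance (IMD) is the gap in base pairs between neighbouring mutations on the same chromosome. The sketch below computes these gaps for a toy list of mutations. It is an illustration only: the function name `inter_mutational_distances` and the toy data are hypothetical, and this is not how SigProfilerSimulator or SigProfilerClusters compute distances internally.

```python
from collections import defaultdict


def inter_mutational_distances(mutations):
    """Map each chromosome to the gaps between consecutive mutation positions.

    `mutations` is an iterable of (chromosome, position) pairs in any order.
    """
    positions = defaultdict(list)
    for chrom, pos in mutations:
        positions[chrom].append(pos)

    distances = {}
    for chrom, pos_list in positions.items():
        pos_list.sort()
        # Gap between each mutation and the next one on the same chromosome.
        distances[chrom] = [b - a for a, b in zip(pos_list, pos_list[1:])]
    return distances


# Toy data (made up for illustration).
toy = [("1", 100), ("1", 150), ("1", 90_000), ("2", 500), ("2", 700)]
print(inter_mutational_distances(toy))
# prints {'1': [50, 89850], '2': [200]}
```

In the toy data, the 50 bp gap on chromosome 1 is far smaller than the other gaps; the simulations exist to show which gaps are unusually small compared with what chance would produce.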

Step 2: Run SigProfilerClusters

Cluster mutations based on the simulated background distribution. Note: Update "path/to/data" with the actual path to your project directory.

>>> hp.analysis("BRCA", "GRCh37", "96", ["288"], "path/to/data",
...             analysis="all", sortSims=True, subClassify=True,
...             correction=True, calculateIMD=True, max_cpu=4,
...             variant_caller="standard")
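The idea of clustering against a simulated background can be illustrated with a deliberately simplified sketch: take a cutoff from the low end of the simulated distances, then flag observed mutations that sit closer to a neighbour than that cutoff. This is not the statistical procedure SigProfilerClusters applies, which is more involved. The names `imd_cutoff` and `flag_clustered`, the `fraction` parameter, and all numbers are assumptions made for illustration.

```python
def imd_cutoff(simulated_imds, fraction=0.01):
    """Return the IMD at or below which roughly `fraction` of simulated gaps fall."""
    if not simulated_imds:
        raise ValueError("need at least one simulated distance")
    ordered = sorted(simulated_imds)
    index = max(0, int(len(ordered) * fraction) - 1)
    return ordered[index]


def flag_clustered(positions, cutoff):
    """Return sorted positions lying within `cutoff` bp of a neighbouring mutation."""
    ordered = sorted(positions)
    clustered = set()
    for left, right in zip(ordered, ordered[1:]):
        if right - left <= cutoff:
            clustered.update((left, right))
    return sorted(clustered)


# Made-up background: 100 simulated gaps of 1,000 to 100,000 bp.
simulated = list(range(1000, 101000, 1000))
cutoff = imd_cutoff(simulated)  # smallest 1% of simulated gaps

# Made-up observed positions on one chromosome.
observed = [100, 600, 50_000, 50_900, 200_000]
print(cutoff, flag_clustered(observed, cutoff))
# prints 1000 [100, 600, 50000, 50900]
```

The mutation at 200,000 is left unflagged because its nearest neighbour is much further away than the cutoff.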

After SigProfilerClusters has finished running, the output will be organized under path/to/data/output/. Partitioned mutations are placed under output/clustered/ and output/nonClustered/, and visualizations are found under output/plots/. To learn more about all output files, please refer to the Using the Tool - Output section.
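To confirm quickly that a run produced files in the locations described above, you could count the files in each folder. This is a small sketch using only the standard library. The helper name `summarize_output` is hypothetical, and the folder names are taken from the paragraph above; adjust them if your layout differs.

```python
from pathlib import Path


def summarize_output(project_dir, subdirs=("clustered", "nonClustered", "plots")):
    """Count files under each expected subdirectory of <project_dir>/output."""
    output_dir = Path(project_dir) / "output"
    counts = {}
    for name in subdirs:
        folder = output_dir / name
        if folder.is_dir():
            # Count files at any depth below the folder.
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[name] = None  # folder not present (run incomplete or layout differs)
    return counts


# Usage (replace with your actual project directory):
# print(summarize_output("path/to/data"))
```

A value of `None` for a folder suggests that the run did not finish or that the output was written somewhere else.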

Additional Information

In the above example, unspecified parameters use their default values. All function arguments and their types are described in detail in the Using the Tool - Input section.