Using the Tool - SV Input

This section describes how to generate Structural Variant (SV) matrices using SigProfilerMatrixGenerator.

Input Format

The first six columns (chrom1, start1, end1, chrom2, start2, end2) are required. In addition, either the svclass column or both the strand1 and strand2 columns must be present.

Example with SV Class (TSV or CSV file)

chrom1 start1 end1 chrom2 start2 end2 svclass
19 21268384 21268385 19 21327858 21327859 deletion

Example with Strand Information (TSV or CSV file)

chrom1 start1 end1 chrom2 start2 end2 strand1 strand2
19 21268384 21268385 19 21327858 21327859 + +
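
The column rule above can be checked before running the tool. The following is a minimal sketch, not part of SigProfilerMatrixGenerator: the helper name check_sv_header is hypothetical, and it assumes the six required columns appear in the order shown in the examples.

```python
# Hypothetical helper for illustration; not part of SigProfilerMatrixGenerator.
REQUIRED_COLUMNS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2"]


def check_sv_header(columns):
    """Return a list of problems found in a header row (empty if none)."""
    problems = []
    # Rule 1: the first six columns must be the required ones
    # (the order is an assumption taken from the examples).
    if list(columns[:6]) != REQUIRED_COLUMNS:
        problems.append("first six columns must be: " + " ".join(REQUIRED_COLUMNS))
    # Rule 2: either svclass, or both strand columns, must be present.
    has_class = "svclass" in columns
    has_strands = "strand1" in columns and "strand2" in columns
    if not (has_class or has_strands):
        problems.append("need svclass, or both strand1 and strand2")
    return problems


header = "chrom1 start1 end1 chrom2 start2 end2 svclass".split()
print(check_sv_header(header))  # prints [] because this header satisfies both rules
```

A header that carries only strand1 (without strand2) and no svclass would yield a single problem message.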

SV Classes

  • deletion
  • translocation
  • tandem-duplication
  • inversion
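
To catch misspelled class names early, the svclass column can be compared against the four values listed above. This sketch uses only the Python standard library; the function name unknown_sv_classes is hypothetical, the input is assumed to be tab-separated, and rejecting any value outside the list is an assumption rather than documented tool behavior.

```python
import csv
import io

# Class names copied from the list above; treating other values as invalid is an assumption.
KNOWN_SV_CLASSES = {"deletion", "translocation", "tandem-duplication", "inversion"}


def unknown_sv_classes(handle, delimiter="\t"):
    """Return (data_row_number, value) pairs whose svclass is not in the list above."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        (row_number, row["svclass"])
        for row_number, row in enumerate(reader, start=1)
        if row["svclass"] not in KNOWN_SV_CLASSES
    ]


# In-memory example: the second data row uses a class name that is not listed.
sample = io.StringIO(
    "chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tsvclass\n"
    "19\t21268384\t21268385\t19\t21327858\t21327859\tdeletion\n"
    "19\t21268384\t21268385\t19\t21327858\t21327859\tduplication\n"
)
print(unknown_sv_classes(sample))  # prints [(2, 'duplication')]
```

For CSV input, pass delimiter="," instead.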

Python Usage

# Navigate to SVMatrixGenerator directory and start python3 interpreter
from SigProfilerMatrixGenerator.scripts import SVMatrixGenerator as sv

input_dir = "./SigProfilerMatrixGenerator/references/SV/example_input/560-Breast"
output_dir = "./SigProfilerMatrixGenerator/references/SV/"
project = "560-Breast"

sv.generateSVMatrix(input_dir, project, output_dir)

Command Line Usage

python3 ./SigProfilerMatrixGenerator/scripts/SVMatrixGenerator.py \
    ./SigProfilerMatrixGenerator/references/SV/example_input/560-Breast \
    560-Breast \
    ./SigProfilerMatrixGenerator/references/SV/example_output/

Positional arguments, in order: input_dir, project, output_dir


R Usage

conda create --name spmg_r -y
conda activate spmg_r
conda install python=3.10 r-base r-devtools r-reticulate -c conda-forge -y
pip install SigProfilerMatrixGenerator
echo 'devtools::install_github("SigProfilerSuite/SigProfilerMatrixGeneratorR")' | Rscript -

R Example

R
> library("reticulate")
> use_python("path_to_your_python3")
> py_config()
> library("SigProfilerMatrixGeneratorR")
> sv <- SVMatrixGenerator("[your_repo]/test_data/SV", "test_SV", "[your_repo]/test_data/")

Function Arguments

Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| input_dir | string | Path to directory containing SV bedpe files (one per sample) | "./references/SV/example_input/560-Breast/" |
| project | string | Project name for this instance | "560-Breast" |
| output_dir | string | Path to directory for output files | "./references/SV/example_output/" |

Note: If the output directory doesn't exist, a new one will be created.
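
Before calling the generator, it can help to confirm that input_dir actually contains one file per sample. The sketch below is illustrative only: prepare_sv_run is a hypothetical helper, the .bedpe extension and the use of the file stem as the sample name are assumptions, and creating the output directory in advance is optional because the tool creates it when missing.

```python
import tempfile
from pathlib import Path


def prepare_sv_run(input_dir, output_dir):
    """Check the input directory and return the sample names found in it."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise NotADirectoryError(f"input_dir does not exist: {input_path}")
    # One file per sample; the sample name is assumed to be the file stem.
    sample_files = sorted(p for p in input_path.iterdir() if p.is_file())
    if not sample_files:
        raise FileNotFoundError(f"no sample files found in {input_path}")
    # Optional: creating the output directory here surfaces permission errors early.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return [p.stem for p in sample_files]


# Demonstration in a temporary directory with one empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    demo_input = Path(tmp) / "input"
    demo_input.mkdir()
    (demo_input / "sampleA.bedpe").write_text("")
    samples = prepare_sv_run(demo_input, Path(tmp) / "output")
    output_created = (Path(tmp) / "output").is_dir()

print(samples, output_created)  # prints ['sampleA'] True
```

The returned names can then be compared against the expected sample list before passing input_dir, project, and output_dir to the generator.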


SV Classification Schema (SV32)

The SV32 classification schema bins structural variants by:

  1. Size categories:
       • 0-10kb
       • 10kb-100kb
       • 100kb-1Mb
       • 1Mb-10Mb
       • >10Mb

  2. Clustering status:
       • Clustered
       • Non-clustered

  3. SV type:
       • Deletion
       • Tandem duplication
       • Inversion
       • Translocation (size-independent)

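Deletions, tandem duplications, and inversions are each split across the five size bins (3 × 5 = 15 categories), and translocations add one size-independent category, for 16 in total. Each category is counted separately for clustered and non-clustered events, giving 16 × 2 = 32 channels in the classification matrix.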
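
The channel count can be reproduced by enumerating the combinations. This is a sketch under stated assumptions: the label strings are illustrative and may not match the channel names the tool writes, the last size bin is taken to be open-ended, and treating each upper bin edge as exclusive is an assumption.

```python
from itertools import product

# Illustrative labels; the tool's own channel names may differ.
SIZE_BINS = ["0-10kb", "10kb-100kb", "100kb-1Mb", "1Mb-10Mb", ">10Mb"]
SIZE_EDGES = [10_000, 100_000, 1_000_000, 10_000_000]  # upper edges in bp
CLUSTER_STATES = ["clustered", "non-clustered"]
SIZED_TYPES = ["deletion", "tandem-duplication", "inversion"]


def size_bin(length_bp):
    """Map an SV length in bp to a size bin (upper edges exclusive: an assumption)."""
    for edge, label in zip(SIZE_EDGES, SIZE_BINS):
        if length_bp < edge:
            return label
    return SIZE_BINS[-1]


def sv32_channels():
    """Enumerate illustrative labels for the 32 channels."""
    channels = []
    for state in CLUSTER_STATES:
        # Three size-dependent types times five size bins.
        channels += [
            f"{state}:{sv_type}:{size}"
            for sv_type, size in product(SIZED_TYPES, SIZE_BINS)
        ]
        # Translocations are size-independent: one channel per clustering state.
        channels.append(f"{state}:translocation")
    return channels


print(len(sv32_channels()))           # prints 32
print(size_bin(21327858 - 21268384))  # prints 10kb-100kb (a 59,474 bp span)
```

The second call uses the coordinates from the deletion example in the input format section, taking start2 minus start1 as the length.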