The main goal of the developing a ranking scheme in translational cancer research, is to aid the biological interpretation of lists of annotated cancer variants at the single patient resolution. Briefly, taking into account different annotation resources, such as variant effect prediction, cancer evidence, expression and systems biology properties, an integrated scoring value is assigned to each variant as a hollistic score, ranking variants from a single patient mutations list. In addition, the first version of the scoring process is based on single nucleotide changes that occur in the protein-coding space; that is, somatic mutations that can lead to several possible changes to a protein. Overall, the main rationale is despite the fact that a significant number of variant annotation tools in cancer research are available for a robust exploitation of putative somatic variants, no direct score or assessment is available for a simple ranking or prioritization of the interrogated variants, based on the plethora of distinct evidence.
The ranking score is based on the output of the OpenCRAVAT annotation platform (https://doi.org/10.1200/cci.19.00132) and can serve as an additional plug-in, aiding in the interpretation of the annotated variant calling results. The open-source OpenCRAVAT toolkit possesses significant and important features, by making it possible to integrate multiple sources of evidence for variant annotation and exploitation.
In addition, for the construction and the relative score cut-offs, various guidelines and publications were considered for estimating the oncogenicity of somatic mutations, based on big consortia, such as:
-
The user has to initially run OpenCRAVAT web server (https://run.opencravat.org/) or install locally (https://open-cravat.readthedocs.io/en/latest/quickstart.html). The input can be a vcf file, or a txt with necessary columns (https://open-cravat.readthedocs.io/en/latest/File-Formats.html)
-
The following annotators should run for SNVs: gnomAD, ClinVar, CIViC, FATHMM XF Coding, VEST4, SpliceAI, COSMIC, CScape Coding, Cancer Gene Census, Cancer Gene Landscape, Cancer Hotspots, SiPhy and Phast Cons (14 annotators if having hg19 as the reference genome, to also include hg19 coordinates).
-
Similarly, for InDels: gnomAD, ClinVar, COSMIC, Cancer Gene Census, Cancer Gene Landscape, LoFtool and MutPred-Indel.
-
Next, an RData file has to be created either from the download section of the web server, or locally using the installed version of OpenCRAVAT
oc report example_input.sqlite -t rdata
For more details see here
- Before running the scoring functions, for installation initially the SVRACAS github repository needs to be downloaded/cloned. Then, in R the following should be run:
source("Scoring.SNVs.R")
source("Scoring.InDels.R")
- Finally, after creating the necessary RData file including the variants from one patient/sample, the main functions in R to run are:
ranked_snvs = scoring.func.snvs.core(rdata_dir, exp.genes=NULL,
ref.genome=c("hg19","hg38"),
sample.name.output,
w1=0.7, w2=0.8, w3=0.9, w4=1)
ranked_indels = scoring.func.indels.core(rdata_dir, exp.genes=NULL,
ref.genome=c("hg19","hg38"),
sample.name.output,
w1=0.7, w2=0.8, w3=0.9, w4=1)
install.packages(c("DT","tidyverse","jsonlite"))
Here we present a simple example using the mutations from a randomly selected colorectal cancer patient sample ("crc4") from published Reiter et al., 2018 study (https://doi.org/10.1126/science.aat7171), mainly utilizing the Kim et al., 2015 publication (https://clincancerres.aacrjournals.org/content/21/19/4461#:~:text=10.1158/1078-0432.CCR-14-2413). Then, the web version of OpenCRAVAT was used to perform integrative variant annotation using the 15 aforementioned annotators, and the relative RData file was created. Below, a snapshot of the created html file with the top 10 hits are depicted (based on the initial snv version:
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] htmlwidgets_1.5.3 DT_0.18 jsonlite_1.7.2 forcats_0.5.1 stringr_1.4.0
[6] dplyr_1.0.6 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.2
[11] ggplot2_3.3.4 tidyverse_1.3.1
For any questions, suggestions or issues please directly use my email or the github issue page
Stefan Wiemann
Kym Pagel
Rick Kim
Olga Papadodima