
sciclone's Introduction

An R package for inferring the subclonal architecture of tumors

Installation instructions:

As of mid-2022, the NORMT3 package, which is a dependency of sciclone/bmm, has been removed from CRAN. It can be installed manually by doing something like:

$ wget https://cran.r-project.org/src/contrib/Archive/NORMT3/NORMT3_1.0.4.tar.gz
$ R CMD INSTALL NORMT3_1.0.4.tar.gz

Then proceed with the instructions below:

Both the 'sciClone' package and its 'bmm' dependency can be installed as follows:

#install IRanges from Bioconductor via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("IRanges")
#install devtools if you don't have it already
install.packages("devtools")
library(devtools)
install_github("genome/bmm")
install_github("genome/sciClone")

If you prefer to build the package by hand, follow these steps:

  • Make sure that you have the dependencies from the CRAN and Bioconductor repositories: IRanges, rgl, RColorBrewer, ggplot2, grid, plotrix, methods, NORMT3, MKmisc, TeachingDemos

  • install the bmm package from https://github.com/genome/bmm

  • Download and build from source:

      git clone git@github.com:genome/sciclone.git
      R CMD build sciclone
      R CMD INSTALL sciClone_1.1.0.tar.gz
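Before building by hand, it may help to check which of the dependencies listed above are already installed. The base-R sketch below assumes only that the package names are exactly as listed (plus bmm from the step above); `deps` and `missing_deps` are illustrative names, not part of sciClone.

```r
# Packages named in the build-by-hand steps; grid and methods ship with
# base R, and bmm comes from https://github.com/genome/bmm.
deps <- c("IRanges", "rgl", "RColorBrewer", "ggplot2", "grid", "plotrix",
          "methods", "NORMT3", "MKmisc", "TeachingDemos", "bmm")

# requireNamespace() returns FALSE (instead of failing) when a package is absent
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Anything reported as missing can then be installed from CRAN, Bioconductor or GitHub as described above.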
    

Usage

library(sciClone)

#read in vaf data from three related tumors
#format is 5 column, tab delimited: 
#chr, pos, ref_reads, var_reads, vaf

v1 = read.table("data/vafs.tumor1.dat", header=TRUE)
v2 = read.table("data/vafs.tumor2.dat", header=TRUE)
v3 = read.table("data/vafs.tumor3.dat", header=TRUE)

#read in regions to exclude (commonly LOH)
#format is 3-col bed
regions = read.table("data/exclude.loh")

#read in segmented copy number data
#4 columns - chr, start, stop, segment_mean   
cn1 = read.table("data/copy_number_tum1")
cn2 = read.table("data/copy_number_tum2")
cn3 = read.table("data/copy_number_tum3")

#set sample names
names = c("Sample1","Sample2","Sample3")


#Examples:
#------------------------------------
#1d clustering on just one sample
sc = sciClone(vafs=v1,
              copyNumberCalls=cn1,
              sampleNames=names[1],
              regionsToExclude=regions)
#create output
writeClusterTable(sc, "results/clusters1")
sc.plot1d(sc,"results/clusters1.1d.pdf")

#------------------------------------
#2d clustering using two samples:
sc = sciClone(vafs=list(v1,v2),
              copyNumberCalls=list(cn1,cn2),
              sampleNames=names[1:2],
              regionsToExclude=regions)
#create output
writeClusterTable(sc, "results/clusters2")
sc.plot1d(sc,"results/clusters2.1d.pdf")
sc.plot2d(sc,"results/clusters2.2d.pdf")


#------------------------------------
#3d clustering using three samples:
sc = sciClone(vafs=list(v1,v2,v3),
              copyNumberCalls=list(cn1,cn2,cn3),
              sampleNames=names[1:3],
              regionsToExclude=regions)
#create output
writeClusterTable(sc, "results/clusters3")
sc.plot1d(sc,"results/clusters3.1d.pdf")
sc.plot2d(sc,"results/clusters3.2d.pdf")
sc.plot3d(sc, sc@sampleNames, size=700, outputFile="results/clusters3.3d.gif")

#This pattern generalizes up to N samples, except for plotting, which caps out at 3d for obvious reasons.
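The examples above read pre-built VAF tables from disk. If you start from per-site read counts instead, a table in the documented 5-column layout (chr, pos, ref_reads, var_reads, vaf) can be assembled as below. This is a minimal sketch: `make_vaf_table` is a hypothetical helper, and expressing vaf as a percentage (0-100) is an assumption based on the percent-scale VAFs in the example inputs shown in the issues further down this page; check ?sciClone for the authoritative format.

```r
# Hypothetical helper: build a chr, pos, ref_reads, var_reads, vaf table.
# vaf is written as a percentage (0-100), which is ASSUMED to be what
# sciClone expects.
make_vaf_table <- function(chr, pos, ref_reads, var_reads) {
  depth <- ref_reads + var_reads
  data.frame(
    chr = as.character(chr),
    pos = as.integer(pos),
    ref_reads = as.integer(ref_reads),
    var_reads = as.integer(var_reads),
    # sites with zero depth get NA rather than a division by zero
    vaf = ifelse(depth > 0, 100 * var_reads / depth, NA_real_),
    stringsAsFactors = FALSE
  )
}

v_example <- make_vaf_table(chr = c("1", "2"), pos = c(12345, 67890),
                            ref_reads = c(50, 30), var_reads = c(50, 10))
v_example$vaf  # 50 25
```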

Visualization

single-tumor plot

[figure: 1d plot]

2d comparison plot

[figure: 2d plot]

3d comparison plot

[figure: 3d plot]

Notes

  • Producing animated GIF output of 3d plots requires ImageMagick to be installed on the host system.

  • Input formats are described in more detail in the R documentation (see ?sciClone).

  • Many questions regarding sciClone usage have been asked and answered on Biostar: https://www.biostars.org/t/sciclone/

Accessory Scripts and Data

The sciClone-meta repo contains all data and scripts used to create the figures in the manuscript. It also contains a small suite of tests that demonstrate the capabilities of sciClone and verify that it is installed correctly.

Reference

Manuscript published at PLoS Computational Biology (doi:10.1371/journal.pcbi.1003665)

SciClone: Inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution

Christopher A. Miller¹*, Brian S. White²*, Nathan D. Dees¹, John S. Welch²,³, Malachi Griffith¹, Obi Griffith¹, Ravi Vij²,³, Michael H. Tomasson²,³, Timothy A. Graubert²,³, Matthew J. Walter²,³, William Schierding¹, Timothy J. Ley¹,²,³, John F. DiPersio²,³, Elaine R. Mardis¹,³,⁴, Richard K. Wilson¹,³,⁴, and Li Ding¹,²,³,⁴

¹ The Genome Institute

² Department of Medicine

³ Siteman Cancer Center

⁴ Department of Genetics, Washington University, St. Louis, MO 63110, USA

* These authors contributed equally to this work

sciclone's People

Contributors

brummett, bswhite, chrisamiller, jingquanlim, malachig, ndees, nnutter, sakoht


sciclone's Issues

How to integrate CNV results into clonal evolution analysis with sciClone

Hi, chrisamiller.
I read your paper about sciClone carefully, but I am confused about how to add CNVs into clones. Your paper gives only a brief description: THetA was used to detect clonal and subclonal copy-number events in a multiple myeloma sample, then converted to pseudo-VAFs and co-clustered with SNV data using SciClone.
First, how do you convert CNV results from THetA to pseudo-VAFs?
Second, if the CNVs are dominant, are the pseudo-VAFs 0.5?
Third, if SNVs are located in CNV regions, why are they excluded when we run sciClone?

Looking forward to your reply. Thanks,
Shanlan

Plotting code for Fig3b, 3c

Hi, I'm trying to plot the posterior density as shown in Figures 3b and 3c. However, I cannot figure out what the posterior for each cluster should be given (alpha, beta, mu, nu). I was wondering if you could point me to an explicit formulation for the posterior, either in your paper or the Ma paper. Even better would be the code that you used to make the figure.

Thanks!

Running sciClone on cell prevalence (CP) data

Hi,

As I understand it, sciClone uses estimates of variant allele frequency (VAF) in copy-neutral regions to estimate subclonality (thus, in a sense, not using CNV information; please correct me if I am wrong).

The exome sequencing data I have has relatively few mutations and a lot of copy-number variation across the genome. When I remove the non-neutral copy-number regions (copy number != 2), I am left with very few mutations. Given that I have 5 or more samples for each case, the resulting VAF matrix becomes very sparse, leading to poor results.

I was wondering if I can complement the VAF with copy number and ploidy to compute cancer cell fraction (http://www.nature.com/leu/journal/v28/n1/fig_tab/leu2013248f1.html#figure-title) or, alternatively, compute cell prevalence values using PyClone or ASCAT, and feed those into sciClone. Will sciClone clustering work as it does when using VAFs? If yes, that's excellent; if not, could you recommend an alternative tool for subclonal reconstruction?

Thanks,
Ikram
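For reference, the cancer cell fraction (CCF) mentioned above is usually obtained by inverting the expected-VAF relation for a mutation present on m copies in a tumor of purity p with total tumor copy number CN_t: VAF = p × m × CCF / (p × CN_t + 2 × (1 - p)). The sketch below (hypothetical `ccf_from_vaf`, with VAF as a fraction) is this standard correction, not something sciClone computes; whether sciClone's beta-mixture clustering behaves well on CCF values instead of VAFs is not established here.

```r
# Standard purity/copy-number correction for an SNV (NOT part of sciClone):
#   expected VAF = purity * m * CCF / (purity * cn_tumor + 2 * (1 - purity))
# solved for CCF. vaf is a fraction (0-1); multiplicity m = mutated copies.
ccf_from_vaf <- function(vaf, purity, cn_tumor = 2, multiplicity = 1) {
  vaf * (purity * cn_tumor + 2 * (1 - purity)) / (purity * multiplicity)
}

ccf_from_vaf(0.25, purity = 1)    # 0.5: half of a pure diploid tumor
ccf_from_vaf(0.25, purity = 0.5)  # 1: clonal heterozygous SNV at 50% purity
ccf_from_vaf(0.30, purity = 0.6, cn_tumor = 3)
```

Note that with cn_tumor = 2 and multiplicity = 1, a VAF above purity / 2 yields a CCF above 1, which usually points to a copy-number gain, LOH or a multiplicity above 1 rather than to a valid cell fraction.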

Error in .normargSEW0(start, "start") error

I am using sciClone on my own data, but I got the following error message:

Error in .normargSEW0(start, "start") :
'start' must be a numeric vector (or NULL)

My VAF file and copy-number file look like this:
[screenshots of the two input files]
Why does this error occur?
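The message comes from IRanges and says the start coordinates are not numeric. One plausible cause, assumed here rather than confirmed from the screenshots, is that a coordinate column was read as character or factor, for example because a header line was read as data (reading with read.table(..., header = TRUE) avoids that). A minimal sketch with a hypothetical helper, `coerce_numeric_cols`:

```r
# Hypothetical helper: force coordinate/value columns to numeric and drop
# rows that cannot be parsed (e.g. a header line that was read as data).
coerce_numeric_cols <- function(df, cols) {
  for (col in cols) {
    # as.character() first so factors are converted by label, not level code
    df[[col]] <- suppressWarnings(as.numeric(as.character(df[[col]])))
  }
  df[stats::complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

# Toy copy-number table whose header ended up as the first data row
cn <- data.frame(chr = c("chr", "1", "1"),
                 start = c("start", "1000", "5000"),
                 stop = c("stop", "4999", "9000"),
                 segment_mean = c("segment_mean", "0.1", "-0.3"),
                 stringsAsFactors = FALSE)
cn <- coerce_numeric_cols(cn, c("start", "stop", "segment_mean"))
str(cn)
```

Running sapply(df, class) on both inputs before calling sciClone() is a quick way to see whether this applies.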

Too few clusters

Hi,

I am trying to use sciClone to infer the heterogeneity of tumours. I am using TCGA data for this (10,000+ samples); however, I only seem to get 1, very rarely 2, clusters for almost all samples. I assume this is incorrect, as in the original sciClone paper the analysis was also done on TCGA data and most examples in the paper actually show 4+ clusters (figure 4a). Also, the inferHeterogeneity function from the R package maftools, which uses a similar method, does give many clusters for most samples. Do you have any idea what could cause this? My code is fairly simple:

sciClone(vafs, copyNumberCalls = calls, sampleNames = samp, verbose=FALSE)

example data:

vafs[1:10,]
chromosome position t_ref t_alt vaf n_alt_count n_depth
1 10 123810032 4 38 0.8837209 0 74
2 11 124489539 127 89 0.4101382 0 188
3 11 47380512 6 6 0.5000000 0 15
4 11 89868837 34 36 0.5142857 0 81
5 12 107371855 75 53 0.4140625 0 138
6 12 108012011 61 34 0.3578947 0 92
7 12 7980269 43 32 0.4210526 0 93
8 12 8082458 206 28 0.1191489 0 283
9 14 100363606 48 49 0.5051546 0 83
10 14 33290999 75 48 0.3902439 0 121
calls[1:10,]
Chromosome Start End Segment_Mean
629305 1 3218610 16137774 2.0084743
629306 1 16138059 16138095 0.6436284
629307 1 16142960 180271108 2.0139111
629308 1 180275654 180493841 2.8774682
629309 1 180493944 180494500 5.6768869
629310 1 180495670 180908017 2.9071389
629311 1 180912006 247813706 2.0080567
629312 2 484222 236771407 1.9928042
629313 2 236772143 237408806 2.8661220
629314 2 237410658 242476062 1.9841210

I do this for each sample in a loop instead of for all samples in the same data frame, as my computer could not handle the whole dataset in one go. The mean number of variants per sample is 210. I have tried changing the copyNumberMargins parameter or leaving out copy numbers altogether, but this does not change much.

The exact files that I used are the broad.mit.edu_PANCAN_Genome_Wide_SNP_6_whitelisted.seg for copy numbers and mc3.v0.2.8.PUBLIC.maf.gz for mutations from https://gdc.cancer.gov/about-data/publications/pancanatlas.

Problem including copynumber data into Sciclone clustering for WES

Hi,
Can anyone help?
I have been using sciClone for a few weeks, and now I want to include a copy-number file in the clustering.
Copy-number calls were made by VarScan2 and segmentation by DNAcopy, following the VarScan documentation:
dkoboldt.github.io/varscan/copy-number-calling.html#copy-number-caller

However, the segment_mean values in my copy-number input (generated by DNAcopy) are quite different from those in sciclone-meta, ranging from ~15 to ~35 instead of 0 to ~3.

Here's the head of my copynumber.dat:

1 13377 13677 42.025
1 13777 721826 22.5625
1 721926 1190560 15.7044
1 1190660 1197650 23.7857
1 1197686 1580647 16.5196
1 1581343 1582332 30.28
1 1585595 1588914 21.825

Should I do some processing before loading it into sciClone?

Thanks,

Marco

Question about 3d plot

Hi Chris,

Attached is a snap of my 3d plot:

There is a big cluster in the bottom left corner but nothing in a large area in the upper right. Many of the circles are cut off, but in the demo 3d plot in the README, none of the circles are cut off.

What is wrong with my plot?

Furthermore, the labeling isn’t completely straightforward, with the arrows placed at the ends of the axes. I wonder if this could be tidied up a bit and some kind of legend provided.

Thank you!!😊

Topological tumour samples

Hi, I am trying to run 4 tumour samples from different cancer regions. I would like to know which parameters do you recommend to use.

  • The data was cleaned from possible CNAs
  • The analysis can use all variants in the input files (minimum depth = 0)

I used this code, but cluster 1 has only 3 variants. How can I remove this cluster? Should I change the method?
sc = sciClone(vafs=list(dfVAF1,dfVAF2,dfVAF3,dfVAF4),cnCallsAreLog2=FALSE,copyNumberCalls=NULL, regionsToExclude=NULL,useSexChrs=TRUE,doClusteringAlongMargins=FALSE,minimumDepth=0,sampleNames=samples,clusterMethod="bmm",doClustering=TRUE, annotation=NULL, maximumClusters=10, verbose=TRUE)

Thanks for your help,
Michelle.

how to solve the bias of clusters

hi, Chris.
I have converted some CNVs to pseudo-VAFs and combined the SSMs with the pseudo-VAFs for clonal analysis with sciClone. I always get 3 clones with different values of maximumClusters (see the figure below). But I think some clusters should be split into 2 clusters. For example, cluster 2 includes some low-VAF SSMs and some high-VAF SSMs; the high-VAF SSMs should be a specific dominant clone in R7, and the low-VAF SSMs should belong to it. So, should I separate them into two clusters myself? Looking forward to your reply. Thanks.

P.S. The data are from 30X WGS, with minimumDepth=10. When I increased minimumDepth to 20, I got the same results.

[figure: clustering results]

can't do clustering - no copy number 2 regions to operate on in sample 15

Hi. @chrisamiller
I have run sciClone on my own data. Here is the data for sample 15.

> vaf_list[[15]]    # Yes, only one variant.
   Chromosome Start_Position bam_ref_count bam_alt_count        VAF
1:          1       46258772           208            11 0.05022831


> cnv_list[[15]][1:10]
    V2      V3      V4          V6
 1:  1  818812 2844810  0.15639900
 2:  1 2872543 3638424  0.32069612
 3:  1 3642918 3662783  0.08472943
 4:  1 3664010 3738752  0.43624381
 5:  1 3741219 5730074 -0.15601310
 6:  1 5739571 6152783  0.09711385
 7:  1 6162015 6276497 -0.20277001
 8:  1 6283765 6481110 -0.28512172
 9:  1 6483947 9651286  0.05055026
10:  1 9655351 9846645 -0.08805264

Below is the sciClone code I used. I set cnCallsAreLog2 to TRUE since my seg.mean column contains both positive and negative values.

sc = sciClone(vafs = vaf_list, copyNumberCalls = cnv_list, sampleNames = Clin$Tumor_Sample_Barcode,
              minimumDepth = 20, cnCallsAreLog2 = T, doClustering = T, clusterMethod = 'bmm',
              verbose = T, doClusteringAlongMargins = T, copyNumberMargins = 0.25)

And it reported a warning and stopped.

[1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
can't do clustering - no copy number 2 regions to operate on in sample 15 

It seems that sciClone worked well for the first 14 samples, so there is probably no format issue with the input data. I have looked at issue #23 and tried copyNumberMargins values of 0.3 and even 1, but the problem persists.
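For context, log2 segment means like those above are usually converted to absolute copy number by assuming a diploid baseline, cn = 2 × 2^log2ratio, so a log2 ratio of 0 maps to copy number 2. Presumably this is what cnCallsAreLog2 = TRUE does, but that is an assumption here. The sketch below applies the conversion to the first three segments shown above and flags those within ±0.25 of 2, again assuming copyNumberMargins is applied symmetrically around 2; `log2_to_cn` is a hypothetical helper.

```r
# ASSUMED conversion: absolute copy number = 2 * 2^(log2 ratio)
log2_to_cn <- function(log2ratio) 2 * 2^log2ratio

# First three segments from cnv_list[[15]] above
segs <- data.frame(chr = c(1, 1, 1),
                   start = c(818812, 2872543, 3741219),
                   stop = c(2844810, 3638424, 5730074),
                   segment_mean = c(0.15639900, 0.32069612, -0.15601310))
segs$cn <- log2_to_cn(segs$segment_mean)

# Segments within +/- margin of copy number 2 (assumed margin semantics)
margin <- 0.25
segs$near_neutral <- abs(segs$cn - 2) <= margin
segs
```

Here the first and third segments (about 2.23 and 1.80) fall inside the margin and the second (about 2.50) does not. Since sample 15 has a single variant, whether that one site lands in such a segment presumably decides whether any copy-number-2 sites remain.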

VAF almost zero in plots

I am not sure what may be going wrong, but the plots don't really display the VAFs of the variants. Is anyone else experiencing this problem? What might cause it?

Thanks for any hints

> head(muts_rel)
  chr     pos ref_reads var_reads        vaf
1   1  894890         6         0 1.00000000
2   1  992846         7         0 1.00000000
3   1 1068848         5         0 1.00000000
4   1 1933035         5         2 0.71428571
5   1 2520872         6        85 0.06593407
6   1 2585425         6        19 0.24000000

> sc = sciClone(vafs=muts_rel,
+               copyNumberCalls=cnasegs_rel, 
+               sampleNames=p.rel, 
+               cnCallsAreLog2 = TRUE, 
+               clusterMethod = 'binomial.bmm',
+               minimumDepth = 50, maximumClusters = 3)
> sc.plot1d(sc,"~/Desktop/clusters.rel.1d.pdf")

[plot: clusters.rel.1d.pdf]
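In the excerpt above, every vaf lies between 0 and 1 and equals ref_reads / (ref_reads + var_reads) rather than var_reads / (ref_reads + var_reads); for example, 6 / (6 + 85) ≈ 0.0659. If sciClone expects VAF as a percentage (an assumption, based on the percent-scale VAFs in other examples on this page), fraction-scale values would plot close to zero. The hypothetical `check_vaf_table` below tests both conditions on the first, fifth and sixth rows shown:

```r
# Hypothetical sanity check for a chr, pos, ref_reads, var_reads, vaf table.
# Reports whether vaf looks like a fraction (0-1) instead of the ASSUMED
# percent scale, and whether it matches var/depth or (swapped) ref/depth.
check_vaf_table <- function(df, tol = 0.01) {
  depth <- df$ref_reads + df$var_reads
  ok <- depth > 0
  frac_var <- df$var_reads[ok] / depth[ok]
  frac_ref <- df$ref_reads[ok] / depth[ok]
  vaf <- df$vaf[ok]
  looks_fraction <- all(vaf <= 1)
  vaf_frac <- if (looks_fraction) vaf else vaf / 100
  list(
    looks_like_fraction = looks_fraction,
    matches_var_over_depth = all(abs(vaf_frac - frac_var) < tol),
    matches_ref_over_depth = all(abs(vaf_frac - frac_ref) < tol)
  )
}

muts <- data.frame(chr = c(1, 1, 1),
                   pos = c(894890, 2520872, 2585425),
                   ref_reads = c(6, 6, 6),
                   var_reads = c(0, 85, 19),
                   vaf = c(1.00000000, 0.06593407, 0.24000000))
check_vaf_table(muts)
```

For these rows the check reports a fraction-scale vaf that matches ref_reads / depth, which suggests looking at both the scale and the column order of the input.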

option 'binomial.bmm' not working

Hi!
I'm trying to compare different tools and methods for estimating clonality in tumors with TCGA data, to see which samples have a clustering consensus between PyClone, sciClone and probably EXPANDS too. I want to compare not only methods but also models. The beta model ('bmm') worked fine, but then I tried the 'binomial.bmm' model like this:

sc <- sciClone(vafs=vafs, copyNumberCalls=copyNumberCalls, regionsToExclude=NULL,
               sampleNames='sample_1', minimumDepth=10, clusterMethod="binomial.bmm",
               clusterParams='empty',
               cnCallsAreLog2=FALSE, useSexChrs=FALSE, doClustering=TRUE,
               verbose=TRUE, 
               copyNumberMargins=0.4, maximumClusters=20,
               annotation=NULL, doClusteringAlongMargins=TRUE,
               plotIntermediateResults=0)

And I got the following error:

[1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
56 sites (of 83 original sites) are copy number neutral and have adequate depth in all samples
27 sites (of 83 original sites) were removed because of copy-number alterations
0 sites (of 83 original sites) were removed because of inadequate depth
27 sites (of 83 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
kmeans initialization:
V1
0.0519283013246183
0.142023395832538
0.336833578224629
0.419642857142857
0.733333333333333
0.251004016064257
0.296137404232378
0.15625
0.0676238184942296
0.286014546319517
0.212617983109786
0.0441841565047945
0.0937402190923318
0.13343949044586
0.127155172413793
0.380713489409142
0.266546587505349
0.2320029563932
0.166202620930319
0.325301204819277
Using threshold:  0.7 
lb decreased from -375.904937 to -550.464025!
lb decreased from -550.464025 to -1305714.532528!
lb decreased from -1305714.532528 to -1958467.476601!
lb decreased from -1958467.476601 to -4554357.319464!
lb decreased from -4554357.319464 to -5222001.150376!
lb decreased from -5222001.150376 to -5222019.306761!
Error in binomial.bmm.filter.clusters(vafs.merged, vafs, vars, total.trials,  : 
  Not implemented because std dev not implemented for binomial!

Here is an example of the data frames I'm using:

> vafs

   Chromosome  Position Ref_Count Alt_Count  Var_freq
1           1  37948064       273        15  5.208333
2           1  48771547        78         6  7.142857
3           1  55464936       121        43 26.219512
4           1  91818104        26         4 13.333333
5           1 100327202        57         4  6.557377
6           1 109439595       106         5  4.504505
7           1 145537822       136        21 13.375796
8           1 156255757       497        12  2.357564
9           1 157738319       240       158 39.698492
10          1 183520990        69         6  8.000000
11          1 210010489        39        10 20.408163
12          1 248004586       130        94 41.964286
13          2  25048948        43        26 37.681159
14          2  42990448        27         5 15.625000
15          2  99013095        64         7  9.859155
16          2 220047043       203       105 34.090909
17          2 220164096       168         7  4.000000
18          2 227985759       101        15 12.931034
19          2 238443205       124        42 25.301205
20          3  95374425       158        35 18.134715
> copyNumberCalls

   Chromosome     Start       End        cn
1           1   3218610  29121711 2.0734022
2           1  29123985  29125119 0.4408012
3           1  29131442  45626175 2.0797354
4           1  45634781  45652241 1.0721450
5           1  45665580  50697381 2.1361308
6           1  50699438  50701121 5.0925915
7           1  50702054  71793221 2.1779940
8           1  71793222  71796991 0.8512177
9           1  71797759  80464059 2.2256088
10          1  80467161  80469130 0.8779435
11          1  80470816  91418026 2.2192928
12          1  91419132  91427460 1.2402234
13          1  91429035  96476493 2.1854040
14          1  96476513  96476570 0.8682004
15          1  96478125 115974088 2.1914716
16          1 115985027 115986104 0.8019588
17          1 115990063 120527361 2.1860100
18          1 149879545 183765232 2.6667822
19          1 183765806 183765914 1.1654604
20          1 183771039 219399794 2.7109528

Thanks!

PS: To my knowledge, PyClone does not use a minimum depth for clustering. Do you recommend setting the minimumDepth parameter to 0 to get more comparable results?

Can I still use sciclone?

Hello

I have unmatched tumour samples from responders and non-responders to chemotherapy, where responders and non-responders are different patients.

I want to know why some patients respond and some don't.

as my samples are not matched, can I still use your software?

Thanks

Backfilling ref and alt depth for variants only detected in one sample of multi-sample clustering

For example, if a variant is only detected in the relapse sample but not in the primary tumor sample, the code will filter this variant out because there is "no" adequate depth in all samples (primary and relapse), as indicated in sciClone.R lines 120-123.

So should I backfill the ref and alt count for this site in the primary sample using a pileup method, so that every sample specific variant will have a ref and alt count and thus could be used in the clustering?
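Following the description above (sites lacking adequate depth in any sample are filtered), a first step is to find which sites are missing counts in some sample. The sketch below aligns two hypothetical per-sample count tables on (chr, pos) with base R merge(); the rows it reports are the ones that would need ref/alt counts backfilled, for example from a pileup of that sample's BAM. Whether backfilled counts then pass minimumDepth depends on the coverage at those sites.

```r
# Hypothetical per-sample read counts
primary <- data.frame(chr = c("1", "2"), pos = c(100, 200),
                      ref_reads = c(40, 35), var_reads = c(20, 15),
                      stringsAsFactors = FALSE)
relapse <- data.frame(chr = c("1", "3"), pos = c(100, 300),
                      ref_reads = c(30, 50), var_reads = c(30, 25),
                      stringsAsFactors = FALSE)

# Full outer join on site; counts are NA where a sample has no call
merged <- merge(primary, relapse, by = c("chr", "pos"), all = TRUE,
                suffixes = c(".primary", ".relapse"))

# Sites lacking counts in at least one sample
needs_backfill <- merged[!stats::complete.cases(merged), c("chr", "pos")]
needs_backfill
```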

tumor purity

Hi,

Thanks for this tool. I was wondering how sciClone takes tumor purity into account when inferring clonal events. I saw that you removed it in commit e60f6b7.

Now, I have tumor purity calculated from other tools such as ESTIMATE, ABSOLUTE and Sequenza.
How should I use it in the context of sciclone?

Thanks!
Tommy

sc.plot2dWithMargins Error

Hi, the function sc.plot2dWithMargins worked before, but now it always reports the error below:
Error: $ operator is invalid for atomic vectors
I hope you can reply. Many thanks.
Mo

Question about the result of sciclone

Hi, I have some questions about the results when using sciclone for analysis and would like to ask for advice:
(1) What is the meaning of cluster.means, cluster.upper, and cluster.lower, do they represent the mean, maximum, and minimum VAF values in the cluster?
(2) I got my VCF file from Mutect2 and read it in as follows:

> head(vcf)
    V1     V2                           V9                                                        V10
1 chr1  69511 GT:AD:AF:DP:F1R2:F2R1:FAD:SB          0/1:0,480:0.997:480:0,175:0,145:0,327:0,0,275,205
2 chr1 930314 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:288,246:0.448:534:121,97:86,71:212,172:139,149,117,129
3 chr1 941119 GT:AD:AF:DP:F1R2:F2R1:FAD:SB             0/1:0,226:0.994:226:0,97:0,52:0,156:0,0,137,89
4 chr1 942382 GT:AD:AF:DP:F1R2:F2R1:FAD:SB             0/1:115,10:0.056:125:28,3:25,0:96,5:38,77,10,0
5 chr1 942383 GT:AD:AF:DP:F1R2:F2R1:FAD:SB               0/1:120,3:0.03:123:26,0:25,0:100,3:44,76,3,0
6 chr1 942391 GT:AD:AF:DP:F1R2:F2R1:FAD:SB            0/1:132,11:0.063:143:32,3:28,1:106,8:42,90,10,1

Could I use the AF value as VAF value? And I found that a large number of AF values are greater than 0.5, and the CCF values calculated from them may be greater than 1. Are these normal?
(3) Can I use CCF values as input for VAF values?

Thank you very much!
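If raw read counts are preferred over Mutect2's AF field, ref/alt counts can be taken from the AD entry of the sample column. As the second row above shows, the AD-based VAF (246 / 534 ≈ 46.1%) differs slightly from the reported AF of 0.448. The sketch below is a base-R parser under stated assumptions: `parse_ad` is a hypothetical helper, only the first ALT allele is used, and VAF is written as a percentage on the assumption that sciClone expects that scale.

```r
# Hypothetical helper: extract ref/alt counts from the AD field using the
# FORMAT keys (V9) and sample values (V10); vaf in percent, first ALT only.
parse_ad <- function(format_str, sample_str) {
  keys <- strsplit(format_str, ":", fixed = TRUE)
  vals <- strsplit(sample_str, ":", fixed = TRUE)
  ad <- mapply(function(k, v) v[match("AD", k)], keys, vals)
  counts <- strsplit(ad, ",", fixed = TRUE)
  ref <- as.integer(vapply(counts, `[`, character(1), 1))
  alt <- as.integer(vapply(counts, `[`, character(1), 2))
  data.frame(ref_reads = ref, var_reads = alt,
             vaf = 100 * alt / (ref + alt))
}

# First two rows of the VCF excerpt above
vcf <- data.frame(
  V1 = c("chr1", "chr1"), V2 = c(69511L, 930314L),
  V9 = "GT:AD:AF:DP:F1R2:F2R1:FAD:SB",
  V10 = c("0/1:0,480:0.997:480:0,175:0,145:0,327:0,0,275,205",
          "0/1:288,246:0.448:534:121,97:86,71:212,172:139,149,117,129"),
  stringsAsFactors = FALSE)

res <- data.frame(chr = vcf$V1, pos = vcf$V2, parse_ad(vcf$V9, vcf$V10),
                  stringsAsFactors = FALSE)
res
```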

result inconsistency during clustering step

Hi,

I have been testing SciClone as a solution to make clonality predictions for one of the datasets I am working with.
I noticed that if I run SciClone multiple times on the same sample data I get a prediction consistency of 90% but then there is this 10% that produces a completely different prediction.
My test case has 212 variants, of which only 22 are selected for clustering. Running the code below, I get 2 clusters most of the time, but in that 10% of runs I get 4 clusters.

I'm not sure whether this is a bug, something expected, or a problem caused by another library, but I thought I would report it in case you want to investigate. I can send my test files if needed.

Thanks

sc = sciClone(vafs=v1,
              copyNumberCalls=cn1,
              sampleNames=names[1],
              cnCallsAreLog2 = TRUE,
              copyNumberMargins = 0.50,
              minimumDepth = 100,
              maximumClusters = 10
              )

My session information:

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] sciClone_1.1.0 TeachingDemos_2.10 MKmisc_0.991
[4] plotrix_3.6-1 RColorBrewer_1.1-2 rgl_0.95.1441
[7] bmm_0.3.1 NORMT3_1.0-3 ggplot2_2.1.0
[10] IRanges_2.4.8 S4Vectors_0.8.11 BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 plyr_1.8.3 gtable_0.2.0 scales_0.4.0
[5] robustbase_0.92-5 DEoptimR_1.0-4 munsell_0.4.3 colorspace_1.2-6
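Run-to-run differences like this usually come from random initialization. The clustering log in the 'binomial.bmm' issue on this page prints "kmeans initialization", which suggests the mixture model is seeded with k-means, and k-means starts from random centres. Assuming all of sciClone's randomness goes through R's RNG, calling set.seed() (for example set.seed(1)) immediately before sciClone() should make a run repeatable. The sketch below only demonstrates the principle with base R kmeans() on simulated data (22 values, mirroring the 22 clustered sites), not sciClone itself; `run_once` is a hypothetical name.

```r
# Simulated 1-d VAFs (percent) in two groups
set.seed(42)
sim_vafs <- c(rnorm(15, mean = 45, sd = 3), rnorm(7, mean = 15, sd = 3))

# k-means picks random starting centres, so results depend on the RNG state
run_once <- function(seed) {
  set.seed(seed)
  kmeans(sim_vafs, centers = 2)$cluster
}

identical(run_once(1), run_once(1))  # TRUE: same seed, same assignment
```

A fixed seed makes a result reproducible but does not say which of the two solutions is better; comparing several seeds is one way to see how stable the clustering is.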

can't do clustering

Hi!

I'm trying to run sciClone on a WGS sample (7476 SNVs). For the CNA calling of this sample I do not have log2 segment mean values but rather copy numbers of 0, 1, 2 or 3 from Control-FREEC 9.1. Therefore, I only have segments with aberrant ploidy, not diploid segments. However, according to the documentation, sciClone assumes that mutations that are not within those altered segments are diploid and assigns them a copy number of 2.

segments file as copyNumberCalls dataframe is :

  chromosome    start      end cn
1          9 21900000 22000000  1
2         12 66450000 66500000  0
3         20 47350000 47500000  3

When I tried to run SciClone like this:


sc <- sciClone(vafs=vafs, copyNumberCalls=copyNumberCalls, regionsToExclude=NULL,
                 sampleNames=id_pat, minimumDepth=0, clusterMethod='bmm',
                 clusterParams="no.apply.overlapping.std.dev.condition",
                 cnCallsAreLog2=FALSE, useSexChrs=FALSE, doClustering=TRUE,
                 verbose=TRUE, 
                 copyNumberMargins=NULL, maximumClusters=15,
                 annotation=NULL, doClusteringAlongMargins=TRUE,
                 plotIntermediateResults=0)

I got the following error:

[1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
can't do clustering - no copy number 2 regions to operate on in sample 1 

Then I decided to download the chromosome lengths like it was suggested here (https://www.biostars.org/p/16396/ ):

$ curl -s ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz | gunzip -c 

And complete the copyNumberCalls data frame with the missing segments like this:

   chromosome    start       end cn
4           1        0 249250621  2
5           2        0 243199373  2
6           3        0 198022430  2
7           4        0 191154276  2
8           5        0 180915260  2
9           6        0 171115067  2
10          7        0 159138663  2
11          8        0 146364022  2
1           9 21900000  22000000  1
12          9        0  21899999  2
13          9 22000001 141213431  2
14         10        0 135534747  2
15         11        0 135006516  2
2          12 66450000  66500000  0
16         12        0  66449999  2
17         12 66500001 133851895  2
18         13        0 115169878  2
19         14        0 107349540  2
20         15        0 102531392  2
21         16        0  90354753  2
22         17        0  81195210  2
23         18        0  78077248  2
24         19        0  59128983  2
3          20 47350000  47500000  3
25         20        0  47349999  2
26         20 47500001  63025520  2
27         21        0  48129895  2
28         22        0  51304566  2

Then I tried running sciClone again and still got the same error.

I cannot provide the SNV mutation calling file because of confidentiality issues. But it has low coverage and as I mentioned before there are a lot of mutations (7476 SNV).

I'm aware I am running SciClone with min depth of 0 which is definitely not recommended but I also tried to run it with at least 10 of min depth and still got the same error.

I also checked whether this is a data type issue, but I do not think that is the case:

sapply(copyNumberCalls, class)
 chromosome       start         end          cn 
"character"   "integer"   "integer"   "numeric" 
sapply(vafs, class)
 chromosome    position   ref_reads   var_reads         vaf 
"character"   "integer"   "integer"   "integer"   "numeric"

I would be grateful if you could give me some clue about this error.
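The warning above says variants outside every provided segment are treated as copy number 2, yet the run still found no copy-number-2 regions. One hedged way to narrow this down is to compute, outside sciClone, which copy number each variant would be assigned. The sketch below is not sciClone's internal logic: `variant_cn` and `norm_chr` are hypothetical helpers, segment containment is taken as start <= pos <= end, chromosome names are compared after dropping a leading "chr", and the ±0.25 margin mirrors the copyNumberMargins value used elsewhere on this page (the call above passes copyNumberMargins=NULL). The segments are the three aberrant ones shown above; the variants are made up for illustration.

```r
# Drop a leading "chr" so "chr9" and "9" compare equal
norm_chr <- function(x) sub("^chr", "", as.character(x))

# Copy number of the segment containing each variant; 2 if none overlaps.
# Uses column positions: vafs = chr, pos, ...; segs = chr, start, end, cn.
variant_cn <- function(vafs, segs) {
  vchr <- norm_chr(vafs[[1]])
  vpos <- vafs[[2]]
  schr <- norm_chr(segs[[1]])
  vapply(seq_along(vpos), function(i) {
    hit <- which(schr == vchr[i] & segs[[2]] <= vpos[i] & segs[[3]] >= vpos[i])
    if (length(hit) == 0) 2 else as.numeric(segs[[4]][hit[1]])
  }, numeric(1))
}

segs <- data.frame(chromosome = c("9", "12", "20"),
                   start = c(21900000, 66450000, 47350000),
                   end = c(22000000, 66500000, 47500000),
                   cn = c(1, 0, 3), stringsAsFactors = FALSE)
# Hypothetical variants, for illustration only
vafs <- data.frame(chromosome = c("chr9", "chr12", "chr1"),
                   position = c(21950000, 1000, 5000),
                   ref_reads = c(40, 50, 60), var_reads = c(20, 25, 30),
                   vaf = c(33.3, 33.3, 33.3), stringsAsFactors = FALSE)

cn_per_variant <- variant_cn(vafs, segs)
margin <- 0.25
sum(abs(cn_per_variant - 2) <= margin)  # 2 variants near copy number 2
```

Running the same check on the real inputs shows whether any variants should be usable as copy-number neutral before sciClone's own filtering.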

Install hang and manual build trouble

My R on my Macbook hung while installing sciClone, after displaying the message "** preparing package for lazy loading".

I then decided to build it myself manually, and followed the instructions, but the command:

R CMD build sciClone

gave the error message: cannot change to directory ‘sciClone’

I eventually realized the problem was that I cloned the sciClone repository by downloading the zip file, and this created directory "sciclone-master" rather than sciclone, and I had already changed into that directory, and was able to build with:

R CMD build .

So, to avoid confusion, you might change the README file to spell it out:

R CMD build

e.g.

R CMD build sciclone
or perhaps:
R CMD build sciclone-master
or if you are in the directory already:
R CMD build .

Finally, after I was able to run the 'build' successfully, I tried installing with:

R CMD INSTALL sciClone_1.1.0.tar.gz

and once again it hung while preparing package for lazy loading:

* installing to library ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library’
* installing *source* package ‘sciClone’ ...
** R
** preparing package for lazy loading

"usage" example refers to files no longer present in sciclone-meta repo

It's a bit confusing to discover that the per-tumor VAF files referred to in Usage have been merged into one file in sciclone-meta/tests/data. So even if one starts out in sciclone-meta/tests, the code in Usage will not run. It might be a good idea to update this since it documents the known working input format for point mutations and SVs (which is of interest to me as I'd like to replace the latter with GRanges-based ploidy calls).

can't do clustering

Hi!

I'm trying to use sciClone and I got this issue :

can't do clustering - no copy number 2 regions to operate on in sample 1 

From here genome/sciclone-meta#1 it might be that I have a version issue.
Just to make sure, do I have it? Or is it just a data issue?

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] devtools_1.12.0     sciClone_1.1.0      TeachingDemos_2.10  MKmisc_0.991        plotrix_3.6-2      
 [6] RColorBrewer_1.1-2  rgl_0.95.1201       bmm_0.3.1           NORMT3_1.0-3        ggplot2_2.1.0      
[11] IRanges_2.0.1       S4Vectors_0.4.0     BiocGenerics_0.12.1 biomaRt_2.22.0      plyr_1.8.4         

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.2 Biobase_2.26.0       bitops_1.0-6         colorspace_1.2-6    
 [5] DBI_0.4-1            DEoptimR_1.0-6       digest_0.6.9         GenomeInfoDb_1.2.5  
 [9] gtable_0.2.0         memoise_1.0.0        munsell_0.4.3        Rcpp_0.11.4         
[13] RCurl_1.95-4.8       robustbase_0.92-6    RSQLite_1.0.0        scales_0.4.0        
[17] tools_3.1.2          withr_1.0.2          XML_3.98-1.4    

I used sciClone's default parameters, except that my CNA values are log2. I pass a vector of sample names and two lists of data frames (one with VAFs and one with CNA calls, one data frame per sample), each with the columns specified in the documentation.

sc <- sciClone(vafs=vafs, copyNumberCalls=copyNumberCalls, regionsToExclude=NULL,
         sampleNames=sampleNames, minimumDepth=100, clusterMethod="bmm",
         clusterParams=NULL,
         cnCallsAreLog2=TRUE, useSexChrs=TRUE, doClustering=TRUE,
         verbose=TRUE, 
         copyNumberMargins=0.25, maximumClusters=10,
         annotation=NULL, doClusteringAlongMargins=TRUE,
         plotIntermediateResults=0)

Sorry if this is a stupid question or if I'm missing something trivial.
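The error says no copy-number-2 regions were found, so it can help to check what your segments look like after conversion. With cnCallsAreLog2=TRUE, a standard conversion from a log2 ratio to absolute copy number is CN = 2 * 2^log2. The sketch below applies that conversion and flags segments within a margin of 2, assuming copyNumberMargins (0.25 in the call above) acts as a +/- window around 2; that interpretation and the helper names are assumptions for illustration, not sciClone's internals, and the segment values are made up.

```r
# Hypothetical helpers (not part of sciClone) for sanity-checking copy-number input.

# Convert log2 copy-number ratios to absolute copy number, assuming a diploid baseline:
# log2 = 0 -> CN 2, log2 = 1 -> CN 4, log2 = -1 -> CN 1.
log2_to_abs_cn <- function(log2_ratio) {
  2 * 2^log2_ratio
}

# Flag segments whose absolute copy number lies within `margin` of 2.
# Assumption: this mirrors how copyNumberMargins is applied; check the sciClone docs.
is_cn_neutral <- function(abs_cn, margin = 0.25) {
  abs(abs_cn - 2) <= margin
}

# Illustrative segments: chr, start, stop, segment_mean (log2)
cn <- data.frame(chr = c(1, 1, 2),
                 start = c(14597, 53237847, 1000),
                 stop = c(53236566, 53238074, 500000),
                 segment_mean = c(0.05, -1, 0.6))

cn$abs_cn <- log2_to_abs_cn(cn$segment_mean)
cn$neutral <- is_cn_neutral(cn$abs_cn)
print(cn)
cat(sum(cn$neutral), "of", nrow(cn), "segments look copy-number neutral\n")
```

If no segment comes out as neutral with your real data, the input scale (log2 vs. absolute) is a likely place to look.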

Incorporating CNA events

Hi, I'm curious whether your suggested strategy for incorporating CNA variants into a clonality analysis has changed since the SciClone paper in 2014. I'm looking to assign clone numbers to mutations from tumor samples using sciClone, but I don't want to exclude variants in CNA regions.

Is it reasonable to simply "adjust" a VAF directly by the copy number of that variant,
i.e. VAFnew = VAF/(CN/2)?
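A worked version of a copy-number adjustment, as a hedged sketch rather than sciClone's method: for a pure tumour (no normal contamination) and a mutation carried on one of CN copies, the expected VAF is 1/CN, versus 1/2 at a diploid site, so the diploid-equivalent value is VAF * CN / 2 (more generally VAF * CN / (2m) for a mutation on m copies). Note that this multiplies by CN/2 rather than dividing by it. The function name is hypothetical, and purity and the multiplicity m are assumptions you would have to supply or estimate.

```r
# Hypothetical helper (not part of sciClone): rescale an observed VAF to the
# value expected at a diploid, copy-number-neutral site.
# Assumes a pure tumour sample and a mutation present on `multiplicity`
# of the `cn` copies, so the expected VAF is multiplicity / cn.
diploid_equivalent_vaf <- function(vaf, cn, multiplicity = 1) {
  vaf * cn / (2 * multiplicity)
}

diploid_equivalent_vaf(20, 3)    # 20% at CN 3 -> 30
diploid_equivalent_vaf(100, 1)   # single remaining copy (LOH deletion) -> 50
diploid_equivalent_vaf(40, 2)    # unchanged at CN 2 -> 40
```

With normal contamination or uncertain multiplicity this simple rescaling no longer holds, which is one reason CNA regions are often excluded instead.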

Application on targeted NGS data

Hi, I am trying to apply sciClone to targeted NGS panel data, but it seems difficult and the results are unreliable. Do you have any insights or suggestions? Thank you!!

How to get five columns of vaf data for sciclone

Hi,
I am new to bioinformatics and don't have much background yet. I have matched BAM and VCF files. How can I use them to produce the five-column VAF data that sciClone requires? And what about the copy-number input: can the copy-number and LOH files be omitted when running sciClone? Looking forward to your reply, thank you very much!
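Once per-site reference and alternate read counts have been extracted (for example from the AD field that many variant callers write into the VCF), building the five-column table from the Usage section (chr, pos, ref_reads, var_reads, vaf) is straightforward. A minimal sketch, assuming a data frame of counts with hypothetical column names; it computes vaf as a percentage, since the sciClone output shown elsewhere on this page reports vaf on a 0-100 scale.

```r
# Hypothetical per-site read counts, e.g. parsed beforehand from a VCF's AD field
# (these column names are assumptions, not a sciClone requirement).
counts <- data.frame(CHROM = c("1", "1", "2"),
                     POS = c(22987700L, 47548113L, 74691711L),
                     REF_COUNT = c(27L, 19L, 25L),
                     ALT_COUNT = c(7L, 3L, 20L),
                     stringsAsFactors = FALSE)

# Build the five-column layout described in the Usage section:
# chr, pos, ref_reads, var_reads, vaf (vaf as a percentage, 0-100).
make_sciclone_vafs <- function(d) {
  depth <- d$REF_COUNT + d$ALT_COUNT
  data.frame(chr = d$CHROM,
             pos = d$POS,
             ref_reads = d$REF_COUNT,
             var_reads = d$ALT_COUNT,
             vaf = ifelse(depth > 0, 100 * d$ALT_COUNT / depth, 0),
             stringsAsFactors = FALSE)
}

v1 <- make_sciclone_vafs(counts)
print(v1)
# To save it in the tab-delimited form read by read.table(..., header=T):
# write.table(v1, "vafs.tumor1.dat", sep = "\t", quote = FALSE, row.names = FALSE)
```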

2D plot doesn't show all clusters

Hi Chris,

I found that the 2D plot doesn't show all clusters. In the image below, only clusters 1, 4 and 6 appear.

[Image: 2D cluster plot]

My code is:

# sample_01 vs sample_03
sc1_3 = sciClone(vafs=list(v1,v3),
              copyNumberCalls=list(cn1,cn3),
              sampleNames=names[c(1,3)],
              cnCallsAreLog2=TRUE,
              minimumDepth=5)
#              regionsToExclude=reg1)


writeClusterTable(sc1_3, "../results/D-01-01_D-01-03_SciClone.2d")
sc.plot2d(sc1_3,"../results/D-01-01_D-01-03_SciClone.2d.pdf")

The first lines of my input:

> head(v1)
  V1        V2  V3 V4   V5
1  1  16956761  33  5 13.2
2  1  17215684 201  8  4.7
3  1  36202125  71 14 14.7
4  1 120594740  48  6 13.3
5  1 143282519  31  4 10.3
6  1 145296448 103  5  5.7

> head(v3)
  V1       V2  V3 V4   V5
1  1  9776429  38  4 10.3
2  1 16908892  21  3 12.5
3  1 16908894  21  3 12.5
4  1 16910063 195 15 10.0
5  1 16914393  33  6 13.3
6  1 16956761  26  7 21.2

> head(cn1)
  V1        V2        V3       V4
1  1     14597  53236566 2.262630
2  1  53237847  53238074 0.535404
3  1 121351809 121482587 0.810228
4  1 143278556 143744052 0.885952
5  1 144598400 148247541 2.157410
6  1 152079714 153507465 1.991010

> head(cn2)
  V1        V2        V3       V4
1  1     14597  53236566 2.308250
2  1  53237847  53238163 0.762231
3  1 121351809 143537963 0.956697
4 10     93083  38736684 2.321730
5 10  39078787  42817808 0.814677
6 10  47708884  49245684 1.852790

Is there any way to show all clusters?

Thanks,
Yifei

What is inadequate depth?

I am learning to run sciClone with a sample pair from the same patient. I ran the CNA analysis with CopywriteR and use the log2 segment values as input (as specified by cnCallsAreLog2=TRUE).

First, I observed that the algorithm discards the majority of the variants due to inadequate depth. How is inadequate depth defined - what are the margins? Are there any recommendations on how I can better prepare the input data?

Second, even more variants are discarded in a joint analysis. Is this because adequate depth is required in both samples?

Thanks in advance for any explanations! Below are the details of my input.

Best,
Michael

initial diagnosis sample

104 sites (of 850 original sites) are copy number neutral and have adequate depth in all samples
12 sites (of 850 original sites) were removed because of copy-number alterations
745 sites (of 850 original sites) were removed because of inadequate depth
746 sites (of 850 original sites) were removed because of copy-number alterations or inadequate depth

relapse

158 sites (of 972 original sites) are copy number neutral and have adequate depth in all samples
13 sites (of 972 original sites) were removed because of copy-number alterations
809 sites (of 972 original sites) were removed because of inadequate depth
814 sites (of 972 original sites) were removed because of copy-number alterations or inadequate depth

joint run

27 sites (of 1658 original sites) are copy number neutral and have adequate depth in all samples
1500 sites (of 1658 original sites) were removed because of copy-number alterations
1631 sites (of 1658 original sites) were removed because of inadequate depth
1631 sites (of 1658 original sites) were removed because of copy-number alterations or inadequate depth

Here is a histogram of the segment mean (log2) values obtained from CopywriteR:

[Image: histogram of segmeans]
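The log lines say sites are kept only if they "have adequate depth in all samples", which would explain why the joint run keeps far fewer sites: the union of sites grows, while the set passing in every sample shrinks. A minimal sketch of such a filter, assuming "adequate" means ref_reads + var_reads >= minimumDepth in each sample and that a site absent from one sample counts as zero depth there (both assumptions; sciClone's exact rule may differ). The helper names and toy data are hypothetical.

```r
# Toy samples in the five-column layout (vaf left as NA; only depth matters here).
s1 <- data.frame(chr = c(1, 1, 2), pos = c(100, 200, 300),
                 ref_reads = c(80, 10, 60), var_reads = c(30, 2, 40), vaf = NA)
s2 <- data.frame(chr = c(1, 2, 3), pos = c(100, 300, 400),
                 ref_reads = c(70, 5, 90), var_reads = c(25, 1, 10), vaf = NA)

# "chr:pos" key; format() avoids scientific notation for large positions.
site_key <- function(s) paste(s$chr, format(s$pos, scientific = FALSE, trim = TRUE), sep = ":")

adequate_in_all <- function(samples, minimum_depth) {
  keys <- unique(unlist(lapply(samples, site_key)))   # union of all sites
  ok <- rep(TRUE, length(keys))
  for (s in samples) {
    depth <- setNames(s$ref_reads + s$var_reads, site_key(s))
    d <- depth[keys]        # NA where the site is absent from this sample
    d[is.na(d)] <- 0        # assumption: absent site = zero depth
    ok <- ok & (d >= minimum_depth)
  }
  setNames(ok, keys)
}

keep <- adequate_in_all(list(s1, s2), minimum_depth = 50)
print(keep)
cat(sum(keep), "of", length(keep), "sites have adequate depth in all samples\n")
```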

sc.plot1d fails when Gaussian cluster method used.

I'm finding that sc.plot1d fails when Gaussian clustering was used in the fit.

Here is an example. It has happened with 4 of my multi-sample subjects, so I think it should be easily reproducible with test data.

scifit.gauss <- sciClone(vaf=somlist, copyNumberCalls=cnalist, cnCallsAreLog2=TRUE,
                         regionsToExclude=lohlist, clusterMethod="gaussian.bmm",
                         sampleNames=samnames, minimumDepth=40)
[1] "checking input data..."
...
[1] "finished clustering full-dimensional data..."
[1] "found 2 clusters using gaussian.bmm in full dimensional data"

sc.plot1d(scifit.gauss,"sciresults/clusters.gauss.1d.pdf")
Error in clust$individual.fits.y[[i]][d, ] :
incorrect number of dimensions
In addition: Warning message:
In max(clust$fit.y[d, ]) : no non-missing arguments to max; returning -Inf

Comparing the gaussian results with the other results, the problem seems to be the dimensions of scifit.gauss@clust$individual.fits.y: for the other cluster methods each element has multiple rows, but for gaussian it is only a vector.

length(scifit.gauss@clust$individual.fits.y)
[1] 2
dim(scifit.gauss@clust$individual.fits.y[[1]])
NULL
dim(scifit.gauss@clust$individual.fits.y[[2]])
NULL
length(scifit.gauss@clust$individual.fits.y[[2]])
[1] 1
dim(scifit.binom@clust$individual.fits.y[[2]])
[1] 2 81
dim(scifit.beta@clust$individual.fits.y[[2]])
[1] 2 999
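The error is what R raises when matrix-style indexing x[d, ] is applied to a plain vector, whose dim() is NULL. A minimal base-R illustration, plus a hypothetical workaround that promotes a vector to a one-row matrix before indexing. Whether a single row is the right shape for sciClone's plotting code is an assumption, not something confirmed here.

```r
# A fit stored as a plain vector (dim() is NULL), like the gaussian fits,
# versus a 2 x n matrix, like the binomial/beta fits.
fit_vec <- c(0.1, 0.4, 0.2)
fit_mat <- matrix(1:6, nrow = 2)

print(dim(fit_vec))  # NULL
print(dim(fit_mat))  # 2 3

# Matrix-style indexing on a plain vector reproduces the error message
err <- tryCatch(fit_vec[1, ], error = function(e) conditionMessage(e))
print(err)

# Hypothetical workaround: promote a vector to a one-row matrix before indexing
as_row_matrix <- function(x) if (is.null(dim(x))) matrix(x, nrow = 1) else x
print(as_row_matrix(fit_vec)[1, ])
print(as_row_matrix(fit_mat)[1, ])   # matrices pass through unchanged
```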

Question about sciclone output

Hi

I found a sciClone output file on the internet (attached):

sciclone_out.txt

> head(sciclone_out[1:2,])
  chr        st Time_1.ref Time_1.var Time_1.vaf Time_1.cn Time_1.cleancn
1   1 160771583       8788       6182   41.29593         2              2
3   1 220276043       4557       3464   43.18664         2              2
  Time_1.depth Time_2.ref Time_2.var Time_2.vaf Time_2.cn Time_2.cleancn
1        14970       8606          9 0.10446895         2              2
3         8021      22588         19 0.08404476         2              2
  Time_2.depth Time_3.ref Time_3.var Time_3.vaf Time_3.cn Time_3.cleancn
1         8615       6938       3137   31.13648         2              2
3        22607       2406       1108   31.53102         2              2
  Time_3.depth Time_4.ref Time_4.var Time_4.vaf Time_4.cn Time_4.cleancn
1        10075      10541         10 0.09477775         2              2
3         3514      13074          9 0.06879156         2              2
  Time_4.depth Time_5.ref Time_5.var Time_5.vaf Time_5.cn Time_5.cleancn
1        10551       6684        700   9.479957         2              2
3        13083       3161        422  11.777840         2              2
  Time_5.depth adequateDepth cluster cluster.prob.1 cluster.prob.2 cluster.prob.3
1         7384             1       1              1          1e-09          1e-09
3         3583             1       1              1          1e-09          1e-09
  cluster.prob.4 gene_name   Time_1     Time_2   Time_3     Time_4    Time_5
1          1e-09       LY9 41.29593 0.10446895 31.13648 0.09477775  9.479957
3          1e-09     IARS2 43.18664 0.08404476 31.53102 0.06879156 11.777840
> 

From this data, how do I know whether a variant (gene) is clonal or subclonal?

I have two groups of patients, and my goal is to compare the number of clonal and subclonal variants between them, and possibly to relate the subclonal fraction to clinical data.

The VAFs have been clustered, but I cannot figure out how to tell whether a gene carrying a variant is clonal or subclonal.

Can you help me?

Thanks
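One common heuristic, stated here as an assumption and not as sciClone's own definition, treats the cluster with the highest mean VAF (in copy-number-neutral regions, typically near half the tumour purity) as the clonal, founding cluster and every other cluster as subclonal. A minimal single-sample sketch using the cluster and <sample>.vaf columns from the output above; the values are made up. With several time points the choice needs more care, since clusters can rise or fall between samples.

```r
# Toy cluster table with the same column layout as the output above
# (values are made up for illustration).
ct <- data.frame(cluster = c(1, 1, 2, 2, 3),
                 Time_1.vaf = c(42, 44, 21, 19, 8))

# Hypothetical helper: label the highest-mean-VAF cluster "clonal", others "subclonal".
label_clonality <- function(tbl, vaf_col) {
  cl_means <- tapply(tbl[[vaf_col]], tbl$cluster, mean, na.rm = TRUE)
  clonal_cluster <- names(cl_means)[which.max(cl_means)]
  tbl$clonality <- ifelse(as.character(tbl$cluster) == clonal_cluster,
                          "clonal", "subclonal")
  tbl
}

ct <- label_clonality(ct, "Time_1.vaf")
print(ct)
print(table(ct$clonality))
```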

CCF value in sciclone

Hi,
I am running sciClone on just one sample. I want to get the CCF (cancer cell fraction) for each mutation, but cannot find it in the output file generated by writeClusterTable. Could you tell me where to find the CCF values?

Thank you very much !

Yang
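The writeClusterTable output shown elsewhere on this page has no CCF column. A rough per-mutation estimate is possible under strong assumptions: the site is diploid and copy-number neutral, the mutation is heterozygous (on one of two copies), and tumour purity is known. Then VAF = purity * CCF / 2, so CCF is approximately 2 * VAF / purity, capped at 1. This is a generic approximation, not a sciClone feature, and the purity value is something you supply.

```r
# Rough cancer cell fraction (CCF) from VAF. Hypothetical helper, assuming:
#  - the site is diploid and copy-number neutral in tumour and normal,
#  - the mutation is heterozygous (present on one of two copies),
#  - tumour purity is known (a value you provide, between 0 and 1).
# vaf_pct is on the 0-100 scale used in sciClone's output.
estimate_ccf <- function(vaf_pct, purity) {
  stopifnot(purity > 0, purity <= 1)
  ccf <- 2 * (vaf_pct / 100) / purity
  pmin(ccf, 1)   # sampling noise can push raw estimates above 1
}

print(estimate_ccf(c(20, 35, 48), purity = 0.8))
```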

Cannot install NORMT3

It appears that NORMT3 has several build dependencies, such as gfortran and a C compiler. Even after installing them, NORMT3 fails to install with the following error:

$ R CMD install NORMT3_1.0.4.tar.gz
* installing to library ‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library’
* installing *source* package ‘NORMT3’ ...
** package ‘NORMT3’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c IPerfcvec.c -o IPerfcvec.o
clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c IPwofzvec.c -o IPwofzvec.o
clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c NORMT3_init.c -o NORMT3_init.o
gfortran -mmacosx-version-min=10.13 -fno-optimize-sibling-calls  -fPIC  -Wall -g -O2  -c toms680-1.f -o toms680-1.o
toms680-1.f:164:72:

  164 |             QLAMBDA = QLAMBDA/H2
      |                                                                        ^
Warning: ‘h2’ may be used uninitialized in this function [-Wmaybe-uninitialized]
toms680-1.f:164:72: Warning: ‘qlambda’ may be used uninitialized in this function [-Wmaybe-uninitialized]
clang -mmacosx-version-min=10.13 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o NORMT3.so IPerfcvec.o IPwofzvec.o NORMT3_init.o toms680-1.o -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0'
ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
ld: library not found for -lgfortran
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [NORMT3.so] Error 1
ERROR: compilation failed for package ‘NORMT3’
* removing ‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library/NORMT3’

Any help would be highly appreciated!

MacOS Monterey
R version 4.1.1 (2021-08-10) -- "Kick Things"

Sample data files

Are the sample data files from the usage example available somewhere?

#read in vaf data from three related tumors
#format is 5 column, tab delimited: 
#chr, pos, ref_reads, var_reads, vaf

v1 = read.table("data/vafs.tumor1.dat",header=T);
v2 = read.table("data/vafs.tumor2.dat",header=T);
v3 = read.table("data/vafs.tumor3.dat",header=T);

I did not see them in https://github.com/genome/sciclone-meta

Input purity data

Hi,

I'm working with spatial tumour samples, and I would like to know how to input the purity data and tell sciClone to use it.

Thanks,
Michelle.

Not all mutations are clustered

Hi Miller.
I am using SciClone on my own data. I run ten samples at a time, each containing ~100 mutations, but the result contains only two clusters and some mutations are dropped, i.e. many of them do not belong to any cluster...
How can this result be explained? Do you have any ideas?

Possible bias related to tumour purity?

Hello,

I would like to know whether purity affects SciClone's ability to identify subclones in solid tumors. In other words, does SciClone tend to report more subclones in high-purity samples than in low-purity samples, assuming the true number of subclones is the same?

Thank you

regions to exclude file format

Hi, what are the last two columns of the regions-to-exclude ("loh") file supposed to contain? The examples are not very clear about their contents.
Thanks.
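The Usage section describes the exclusion file as a 3-column BED: chromosome, start and stop of each region to exclude (commonly LOH segments). A hypothetical sketch of reading such a file and flagging variants that fall inside a region, to show what the columns mean; it treats start and stop as inclusive for simplicity, whereas strict BED starts are 0-based, and how sciClone itself interprets the coordinates is not confirmed here.

```r
# Regions to exclude: chr, start, stop (3-column BED layout, no header).
bed_text <- "1\t1000\t5000\n2\t20000\t30000\n"
regions <- read.table(text = bed_text, col.names = c("chr", "start", "stop"))

vafs <- data.frame(chr = c(1, 1, 2, 3), pos = c(1500, 6000, 25000, 1500))

# Hypothetical helper: TRUE if the position lies inside any region on the same chromosome.
in_any_region <- function(chr, pos, regions) {
  any(regions$chr == chr & regions$start <= pos & pos <= regions$stop)
}

vafs$excluded <- mapply(in_any_region, vafs$chr, vafs$pos,
                        MoreArgs = list(regions = regions))
print(vafs)
```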

Not mapping relapse/primary specific variants

Hello,
I am trying to make a 2D plot of primary versus relapse samples, but it does not plot variants that are specific to one sample, i.e. not shared between the samples. I checked the cluster file and it records variants with no VAF as NA instead of 0. Is there a way to change this? On a normal data frame is.na would work, but that doesn't seem to be the case here. Could you please help? Below is the command:

sc <- sciClone(vafs=list(tum,rel), sampleNames=samples, maximumClusters = 5, minimumDepth=20, verbose=1, doClusteringAlongMargins=TRUE, clusterMethod = "binomial.bmm")

sc.plot2d(sc, "Patient1.pdf")
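For your own plots, sample-specific variants recorded as NA can be recoded to 0 in a copy of the table. A minimal sketch, assuming an ordinary data frame with <sample>.vaf columns like the cluster-table excerpts elsewhere on this page; whether sc.plot2d would then draw these points is not confirmed. If is.na() seems not to work, check sapply(df, class) for list columns, which behave differently from atomic vectors.

```r
# Toy two-sample table; NA marks a variant not observed in that sample.
df <- data.frame(chr = c(1, 1, 2),
                 st = c(100, 200, 300),
                 tum.vaf = c(25, NA, 40),
                 rel.vaf = c(NA, 30, 38))

# Recode NA to 0 only in the VAF columns, leaving other columns untouched.
vaf_cols <- grep("\\.vaf$", names(df), value = TRUE)
df[vaf_cols] <- lapply(df[vaf_cols], function(x) ifelse(is.na(x), 0, x))
print(df)
```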

Description for [email protected]

Hi (again) @chrisamiller,

If I may, I have a (basic) question about sciClone outputs, specifically about the [email protected] data frame.

  chr        st 545_D.ref 545_D.var 545_D.vaf 545_D.cn 545_D.cleancn 545_D.depth
1   1  22987700        27         7 0.2058824        2             2        3400
2   1  22987701        28         7 0.2000000        2             2        3500
3   1  47548113        19         3 0.1363636        2             2        2200

What do the *cn columns (cn and cleancn) contain, please? I've read the code but I'm not able to identify them.

Thanks again for your help.

Error in xtfrm.data.frame(x)

When I run the sciClone function:
sc = sciClone(vafs=vaf_list, sampleNames=sample, maximumClusters = 2)
the following error occurs:
Error in xtfrm.data.frame(x) : cannot xtfrm data frames

I'm sure my commands worked with the previous version, a year ago.
My versions now: R 4.3.0, sciClone 1.1.0
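Recent R releases turned calling xtfrm() on a data frame (which happens when order() or sort() receives a one-column data frame instead of a vector) into an error, where earlier versions only warned; that could explain why the same commands worked a year ago. One plausible trigger, an assumption not confirmed against sciClone's source, is passing tibbles (e.g. from readr) instead of base data frames, because tbl[, 1] on a tibble returns a one-column tibble rather than a vector. A hypothetical pre-processing step converts every input to a plain data.frame before calling sciClone.

```r
# Hypothetical helper: coerce each VAF input to a base data.frame, so that
# single-column subsetting (x[, 1]) returns an atomic vector.
as_base_frames <- function(lst) {
  lapply(lst, as.data.frame)
}

v <- data.frame(chr = c(2, 1), pos = c(500L, 100L),
                ref_reads = c(10L, 12L), var_reads = c(5L, 4L), vaf = c(33.3, 25))
vaf_list <- as_base_frames(list(v))

# With a base data.frame, column subsetting yields vectors, so order()
# never needs xtfrm() on a data frame.
x <- vaf_list[[1]]
print(x[order(x[, 1], x[, 2]), ])
# sc = sciClone(vafs=vaf_list, sampleNames=sample, maximumClusters = 2)
```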

Required packages not documented

When installing sciclone, I got error messages about the following missing packages, which should be documented as requirements:

MKmisc, TeachingDemos

can't do clustering - no copy number 2 regions to operate on in sample 1

Hi @chrisamiller,

I'm having some issues with the clustering step. I've read some posts on the internet, without success. The error is strange: if I (incorrectly) supply VAFs as fractions (0-1) rather than percentages (0-100), the script runs correctly, but if I convert the VAFs to percentages I get this error.

Here is my data (VAFs as fractions):

> head(v1)
  chr       pos ref_reads var_reads       vaf
1   1  22987700        27         7 0.2058824
2   1  22987701        28         7 0.2000000
3   1  47548113        19         3 0.1363636
4   1  47610400        27         5 0.1562500
5   1 248458201         9        10 0.5263158
6   2  74691711        25        20 0.4444444
>str(v1)
'data.frame':	37 obs. of  5 variables:
 $ chr      : Factor w/ 24 levels "1","10","11",..: 1 1 1 1 1 12 12 12 3 19 ...
 $ pos      : int  22987700 22987701 47548113 47610400 248458201 74691711 220416423 191940981 119996467 44220929 ...
 $ ref_reads: int  27 28 19 27 9 25 20 12 9 8 ...
 $ var_reads: int  7 7 3 5 10 20 5 6 5 7 ...
 $ vaf      : num  0.206 0.2 0.136 0.156 0.526 ...

And the same data with VAFs as percentages:

>head(v1)
  chr       pos ref_reads var_reads      vaf
1   1  22987700        27         7 20.58824
2   1  22987701        28         7 20.00000
3   1  47548113        19         3 13.63636
4   1  47610400        27         5 15.62500
5   1 248458201         9        10 52.63158
6   2  74691711        25        20 44.44444

The copy-number table looks like this (I'm using absolute copy number, not the segment mean):

   Chrom    Start       End Copy.Number
12    13 19747993 115089308        1.22
13    16 48580048  90142240        1.14
14    22 16266919  51220606        0.88
15     6 31753336  43968347        0.92

Any idea what I'm missing, please? Thanks in advance

How to distinguish between clones and subclones

Hi, thank you so much for developing such great software! I know that sciClone can divide mutations into multiple clusters (subclones), but it's not clear to me how to distinguish between clones and subclones. Looking forward to your help. Thanks.

suggestions/fix for SNVs with 0 alt reads for 1 sample, but sufficient in other sample

For multiple tumor samples, one research question is to track new variants or clones that arise at later visits. Currently, if an SNV has no alt reads in the first sample but sufficient alt reads in a later sample, sciClone fails with the error "a<=0". Could the code make an exception for such variants when they have alt reads in a later sample? For now, a possible workaround is to add 'pseudo-reads' to both the ref and alt counts wherever there are zero alt reads, so that the variant "looks" present, but at a very low percentage.
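The pseudo-read workaround described above, as a minimal sketch. The pseudocount of one read added to both counts is an arbitrary assumption, the helper name is hypothetical, and only the affected rows have their VAF recomputed (as a percentage). The change slightly biases those sites, so results involving them should be read with caution.

```r
# Hypothetical helper: add `pseudo` reads to ref and alt counts where alt == 0,
# then recompute the VAF (percentage) for those rows only.
add_pseudo_reads <- function(d, pseudo = 1) {
  zero_alt <- d$var_reads == 0
  d$ref_reads[zero_alt] <- d$ref_reads[zero_alt] + pseudo
  d$var_reads[zero_alt] <- d$var_reads[zero_alt] + pseudo
  d$vaf[zero_alt] <- 100 * d$var_reads[zero_alt] /
    (d$ref_reads[zero_alt] + d$var_reads[zero_alt])
  d
}

first_visit <- data.frame(chr = c(1, 2), pos = c(1000, 2000),
                          ref_reads = c(40, 60), var_reads = c(10, 0), vaf = c(20, 0))
res <- add_pseudo_reads(first_visit)
print(res)
```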

Install doesn't work in Mac OS X R version 3.1.1

sciClone installation fails on Mac OS X 10.9.5 with R version 3.1.1.
The error message implicates rgl; however, rgl works when I test that package separately.

install_github("genome/sciClone")
Downloading github repo genome/sciClone@master
Installing sciClone
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL
'/private/var/folders/q8/gtgpy2n515d2lf1mj77pv4t00000gp/T/Rtmp3ZShVl/devtools2dce448a6a3e/genome-sciclone-6aba18d'
--library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library' --install-tests

installing source package ‘sciClone’ ...
R
preparing package for lazy loading
Error : .onLoad failed in loadNamespace() for 'rgl', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgl/libs/rgl.so':
dlopen(/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgl/libs/rgl.so, 6): Library not loaded: /opt/X11/lib/libGLU.1.dylib
Referenced from: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgl/libs/rgl.so
Reason: image not found
Error : package ‘rgl’ could not be loaded
ERROR: lazy loading failed for package ‘sciClone’
removing ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library/sciClone’
Error: Command failed (1)

writeClusterTable not working

Hi!
I managed to run SciClone on my data, but when I try to write the results to a file with writeClusterTable, I get the following error:

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  : 
  unimplemented type 'list' in 'EncodeElement'

Here is a copy of the data:

> vafs
   Chromosome  Position Ref_Count Alt_Count  Var_freq
1           1  41503205        55         3  5.172414
2           1 203277995        41        22 34.920635
3           2  60684822        27        13 32.500000
4           2 179584492         8        13 61.904762
5           3  38622694        13        17 56.666667
6           3  49400049        31        24 43.636364
7           5 179728342        31        14 31.111111
8           6  26250543        33        32 49.230769
9           6  32185850        18         8 30.769231
10          6 117674228        61         4  6.153846
11          7  26678937        19         4 17.391304
12          7  74146970       325       203 38.446970
13          8  24774812        47        29 38.157895
14          8 125107171        21        20 48.780488
15          9  21331505        19         7 26.923077
16          9  91606794        37         4  9.756098
17         10 121580376        50         3  5.660377
18         11    533877        11        16 59.259259
19         11   2443403        13         9 40.909091
20         11 111164668        26        31 54.385965
21         11 111179016        13         9 40.909091
22         13  26789705        23        13 36.111111
23         14 100792558        14         4 22.222222
24         15  65622917        14         3 17.647059
25         15  74967474        10         5 33.333333
26         15  83686290        87        62 41.610738
27         16    784523         4         4 50.000000
28         16  87723875        31        25 44.642857
29         17  38345579        25         4 13.793103
30         17  43190522        32         3  8.571429
31         17  71257869        26         4 13.333333
32         19    991253        11         9 45.000000
33         19  52716021        27        34 55.737705
> copyNumberCalls
  Chromosome     Start       End         cn
1           1   3218610  11395771   2.057226
2           1  11396834  11398386   0.816769
3           1  11400409 103783851   2.059937
4           1 103784405 103789772  0.7621252
5           1 103806285 247813706   2.055515
6           2    484222 242476062   2.053379
7           3   2212571 197538677   2.057511
8           4   1053934  99712077   2.058509
9           4  99712121  99744855   2.777525
10          4  99745304 178037786   2.054233
11          4 178044840 178075230   1.160463
12          4 178075857 188763651   2.079447
13          5    914233 180360469   2.056655
14          6   1014281 170903919   2.060936
15          7    705284   7296520    2.08161
16          7   7299636   7317605   1.254054
17          7   7324800 158385118   2.060365
18          8    617667 128442469   2.062937
19          8 128447549 128452766  0.8280559
20          8 128453765 145232496   2.055658
21          9    789932 140938752   2.063939
22         10    415240   1560338   1.131393
23         10   1560743   1560898  0.1053017
24         10   1561211 120568325   1.158213
25         10 120568512 120570510  0.1443357
26         10 120570830 135225087   1.135006
27         11    456120 134142530   2.054518
28         12    889902 133161346   2.057226
29         13  19450806 114987458   2.060651
30         14  20501368 105988038   2.062651
31         15  23687685 101884307   2.067948
32         16    653459  89379936    2.07743
33         17    987221  21303007    1.14298
34         17  21305392  38466347   2.887058
35         17  38466377  38506146   2.368051
36         17  38527770  80917016   2.894873
37         18    329586  77109240   2.068808
38         19    284018  37753239    2.05281
39         19  37753941  37755731  0.9014379
40         19  37794850  40261323   2.033549
41         19  40263913  40264738  0.9745445
42         19  40265709  58878226   2.037359
43         20    455764  62219837    2.09159
44         21  15347621  47678774   2.059366
45         22  17423930  49331012   2.066372

Here is how I run sciClone:

sc <- sciClone(vafs=vafs, copyNumberCalls=copyNumberCalls, regionsToExclude=NULL,
               sampleNames='Patient1', minimumDepth=20, clusterMethod="bmm",
               clusterParams='empty',
               cnCallsAreLog2=FALSE, useSexChrs=FALSE, doClustering=TRUE,
               verbose=TRUE, 
               copyNumberMargins=0.25, maximumClusters=10,
               annotation=NULL, doClusteringAlongMargins=TRUE,
               plotIntermediateResults=0)

I also tried this but it gave me the same error:

results<-sc@"vafs.merged"

write.table(results, file = "cluster_results", append = FALSE, quote = FALSE, sep = "\t",
            eol = "\n", na = "NA", dec = ".", row.names = FALSE,
            col.names = FALSE, qmethod = c("escape", "double"),
            fileEncoding = "")

I would be grateful if you could help me figure out how to obtain the results file. By the way, writeClusterSummaryTable and sc.plot1d both work.

Thank you in advance!
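The error comes from write.table(), which cannot write list columns ("unimplemented type 'list' in 'EncodeElement'"), so at least one column of the table being written is a list. A minimal base-R sketch for locating and flattening such columns before writing; collapsing list elements with commas is an arbitrary choice, the helper name is hypothetical, and the toy table is made up.

```r
# Toy results table with one list column, mimicking the failure.
results <- data.frame(chr = c(1, 1), st = c(41503205, 203277995))
results$cluster <- list(1, c(1, 2))

print(sapply(results, is.list))   # shows which columns are lists

# Hypothetical helper: collapse each list element into a single string.
flatten_list_columns <- function(df, sep = ",") {
  df[] <- lapply(df, function(col) {
    if (is.list(col)) vapply(col, function(x) paste(x, collapse = sep), character(1))
    else col
  })
  df
}

flat <- flatten_list_columns(results)
out <- tempfile(fileext = ".tsv")
write.table(flat, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
cat(readLines(out), sep = "\n")
```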
