single-cell-genetics / cardelino

Clone identification from single-cell data


single-cell scrna-seq somatic-mutations clonal-clustering gibbs-sampling

cardelino's Introduction

cardelino: clone identification from single-cell data

cardelino contains a Bayesian method to infer clonal structure for a population of cells using single-cell RNA-seq data (and possibly other data modalities).

In its main mode cardelino requires:

An imperfect clonal tree structure inferred using, for example, Canopy from bulk or single-cell DNA sequencing data (e.g. bulk whole-exome sequencing data).
Single-cell RNA sequencing data from which cell-specific somatic variants are called using, for example, cellsnp-lite.

Installation

Release version

You can install the release version of cardelino from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cardelino")

Development version

The development version of cardelino can be installed using the remotes package thus:

# install.packages("remotes")
remotes::install_github("single-cell-genetics/cardelino")

Getting started

The best place to start is the vignettes. From inside an R session, load cardelino and then browse the vignettes:

library(cardelino)
browseVignettes("cardelino")
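For orientation before opening the vignettes, the following is a minimal sketch of the kind of inputs the main mode works with. It assumes that `A` holds alternative-allele read counts and `D` total read depths (variants in rows, cells in columns), and that `Config` is a binary variant-by-clone matrix taken from the clonal tree. The toy values and all object names are made up for illustration. The `clone_id()` call mirrors the one quoted in the bug report further down this page and only runs if cardelino is installed; the vignettes remain the authoritative reference.

```r
# Toy inputs; every value here is invented for illustration only.
set.seed(1)
n_var <- 6
n_cell <- 8

# Assumed layout: variants in rows, cells in columns.
dn <- list(paste0("var", seq_len(n_var)), paste0("cell", seq_len(n_cell)))
D <- matrix(rpois(n_var * n_cell, lambda = 3), nrow = n_var, dimnames = dn)
A <- matrix(rbinom(n_var * n_cell, size = D, prob = 0.3),
            nrow = n_var, dimnames = dn)

# Assumed clone configuration: 1 means the clone carries the variant.
Config <- matrix(c(rep(0, 6),
                   c(1, 1, 1, 0, 0, 0),
                   rep(1, 6)),
                 nrow = n_var,
                 dimnames = list(dn[[1]], paste0("clone", 1:3)))

# Only attempt the assignment when the package is available.
if (requireNamespace("cardelino", quietly = TRUE)) {
  assignments <- cardelino::clone_id(A, D, Config = Config,
                                     inference = "sampling")
  str(assignments, max.level = 1)
}
```

The sketch keeps `A` and `D` as plain matrices with identical dimensions and matching row names between the count matrices and `Config`, since those are the properties the inputs need to share.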

Notes on donor deconvolution

The donor demultiplexing function, named Vireo, was previously supported in this R package but has now been re-implemented in Python, which is more memory efficient and easier to run from the command line. We therefore highly recommend that you switch to the Python version: https://vireoSNP.readthedocs.io.

The vireo function is no longer supported as of version 0.5.0. If you want to use the R functions, please use version 0.4.2 or lower. You can also find them in a separate branch of this repository (the with_vireo branch), or use the donor_id.R file directly. However, using the Python implementation of Vireo is strongly recommended.

Code of Conduct

Please note that the cardelino project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Citation

If you find cardelino helpful, please consider citing:

McCarthy, D.J., Rostom, R., Huang, Y. et al. (2020) Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes. Nature Methods

About the name

cardelino is almost an anagram of "clone ID R" and is almost the same as the Italian for "goldfinch", a common and attractive European bird, pictured below and used in cardelino's hex sticker. In the Western art canon, the goldfinch is considered a "saviour" bird and appears in notable paintings from the Italian Renaissance and the Dutch Golden Age. Perhaps this package may prove a saviour for certain single-cell datasets!

Acknowledgement: The cardelino image was produced by Darren Bellerby. It was obtained from Flickr and is reproduced here under a CC-BY-2.0 licence.

cardelino's People

Forkers

zorrodong mengchengyao shians noahpieta shulp2211 junseonghwan xiaomeili1 jeffreypullin zktuong qindan2008 yilinzhao1615 nicola-calonaci

cardelino's Issues

installation issues

Hi!
I am really interested in trying this tool, but I have not been able to install it.

I have tried via devtools and get the following error:
Error: Failed to install 'cardelino' from GitHub:
(converted from warning) dependencies ‘XML’, ‘conquer’, ‘pbkrtest’ are not available

If I try to install the dependencies, I get the following error (the same for all three):
Warning in install.packages :
package ‘conquer’ is not available (for R version 3.5.0)

I tried the Singularity image (within a conda environment, since I am not allowed to install anything on the server outside a conda env), but I keep getting the following error:
singularity exec rsc.img R
INFO: Convert SIF file to sandbox...
FATAL: while extracting rsc.img: root filesystem extraction failed: could not extract squashfs data, unsquashfs not found

I am not familiar with Singularity and have tried quite a lot of things from googling, but cannot get past this error.

Would you please help me with the installation?
Thank you.

Remove commented out code

Add GitHub Actions

Is cardelino applicable to 10X or DropSeq data?

I'm very interested in cardelino.
I wonder if cardelino is applicable to 10X/Drop-seq data, which suffer more from data sparsity and dropout events.
Since 10X is inferior to Smart-seq2 in sequencing depth, could its higher throughput compensate for this?

Thank you.

Output prob matrix from cell_assign_*() lacks column names when using relax_Config option

Fix NEWS.md

Currently causing a warning in BiocCheck

check failed with warnings from multiple thresholds with foreach and doMC

Build #73 fails due to warnings, where it says

Undefined global functions or variables: %dopar%

The %dopar% is used in multiple thresholds with foreach and doMC in function vireo_flock in donor_id.R. Any idea on how to avoid the warning?

Yuanhua
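One commonly used way to avoid this kind of check warning is to make the operator visible explicitly, either with a roxygen2 tag (`@importFrom foreach %dopar%`) or by binding it locally from the foreach namespace. The sketch below shows the local binding. `square_thresholds()` is a hypothetical stand-in, not the actual `vireo_flock` code, and the example assumes foreach is installed.

```r
# Hypothetical stand-in for a function that loops over thresholds in parallel.
square_thresholds <- function(thresholds) {
  # Bind the operator locally so static checks see a defined symbol.
  `%dopar%` <- foreach::`%dopar%`
  # Declare the loop variable too, which static checks may otherwise flag.
  th <- NULL
  foreach::foreach(th = thresholds, .combine = c) %dopar% {
    th^2
  }
}

if (requireNamespace("foreach", quietly = TRUE)) {
  # Without a registered backend (e.g. doMC), foreach falls back to
  # sequential execution and emits a warning, which is suppressed here.
  res <- suppressWarnings(square_thresholds(c(0.1, 0.5, 0.9)))
  print(res)
}
```

With either approach, foreach would also need to be listed under Imports in DESCRIPTION for the package check to pass.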

Remove date from DESCRIPTION

Bug report: clone_id and cell_assign_Gibbs

Hi,

I am running into an error when using clone_id.

assignments <- cardelino::clone_id(input_data$A,
                                   input_data$D,
                                   Config = config.tree,
                                   inference = "sampling")

Input A and D have the same dimension.

The error message:
Error in { :
task 1 failed - "new columns would leave holes after existing columns"

After debugging, I think it might be caused by these lines in the cell_assign_Gibbs (0.4.2 release) / clone_id_Gibbs (main branch) function:

A[which(D == 0)] <- NA
D[which(D == 0)] <- NA

Using which here will cause undefined columns in A and D, thus causing the error message.

Could you please confirm and fix this?

Thanks,

Yilin
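One plausible reading of this report, assuming `A` and `D` were passed as data frames rather than matrices: `which()` returns linear positions, and assigning into a data frame with a single numeric index treats those positions as column numbers, which can trigger this error. The toy example below reproduces that behaviour in base R and shows two possible workarounds. The data are made up, and whether this matches the reporter's input is an assumption.

```r
# Toy depth and allele-count tables stored as data frames (an assumption).
D <- data.frame(cell1 = c(3, 1, 2), cell2 = c(4, 0, 5))
A <- data.frame(cell1 = c(1, 0, 2), cell2 = c(2, 0, 1))

# which() yields linear position 5, but a data frame reads it as column 5.
err <- tryCatch({
  A_bad <- A
  A_bad[which(D == 0)] <- NA
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Workaround 1: index with the logical matrix directly.
A_ok <- A
A_ok[D == 0] <- NA

# Workaround 2: convert to matrices first, where linear indexing is valid.
A_m <- as.matrix(A)
D_m <- as.matrix(D)
A_m[which(D_m == 0)] <- NA
```

Converting the inputs with `as.matrix()` before calling `clone_id()` would be the least invasive option on the caller's side, since it needs no change to the package code.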

scRNA-based variants?

It seems that this would be a powerful tool for scRNA-seq data.
One main concern is that acquiring a reliable variant set from scRNA-seq data is challenging because of the high false-positive rate and the limited sensitivity. Do you have a reliable or recommended pipeline for calling variants from scRNA-seq data? It would be good to have a vignette about preparing variant data, I guess.