morris-lab / celltagr

This repository contains the CellTag data analysis R package to support clone calling and lineage reconstruction.

R 100.00%
human-cell-atlas

celltagr's People

Contributors

babiddy, ebutka, jindalk, kaethekong, sam-morris

celltagr's Issues

CellTagPostCollapsing returns collapsed.count with 1 row when reading in multiple BAM files

Hello, I've run the pipeline on one to three BAM files, and whenever I run it on more than one BAM file, CellTagPostCollapsing returns a collapsed.count slot with 1 row. I can run the pipeline on each BAM file individually without issue, but including multiple files triggers the problem. I think the aggregate function is missing in the step that combines the starcode results, because this warning appears during the CellTagPostCollapsing run:

Aggregate function missing, defaulting to 'length'

I'm not a data.table expert, and I've not tried to tear apart the source yet, but will add on if I figure out anything.

Best way to subset CellTag object

Hello,

I would like to subset my CellTag object to contain only cells that pass certain RNA QC thresholds. When I try to subset the CellTag object to contain only a list of cells, I repeatedly receive the error object of type 'S4' is not subsettable.

I would like to subset using the whitelisted.data slot. If it is not possible to subset the entire CellTag object, do you recommend a way to subset the whitelisted.data matrix that would remain compatible with downstream clone-calling analyses?

Thank you!
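
The "object of type 'S4' is not subsettable" error arises because the bracket operator is not defined for the S4 object itself, whereas a matrix stored in one of its slots can still be indexed by row name. A minimal sketch of that idea, using a toy S4 class as a stand-in: ToyCellTag, keep.cells and the barcodes are hypothetical, cells are assumed to be rows, and whether the real CellTag object stays consistent across its other slots after such a change is not confirmed here.

```r
library(methods)

# Toy stand-in for a CellTag object (hypothetical class, not the package's own):
# a single slot holding a cell x CellTag matrix.
setClass("ToyCellTag", representation(whitelisted.data = "matrix"))

mtx <- matrix(c(1, 0, 1,
                0, 1, 1,
                1, 1, 0,
                0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("AAAC-1", "AAAG-1", "AAAT-1", "AACC-1"),
                              c("tagA", "tagB", "tagC")))
obj <- new("ToyCellTag", whitelisted.data = mtx)

# Cells passing RNA QC (made-up list; one barcode is absent from the matrix).
keep.cells <- c("AAAC-1", "AAAT-1", "GGGG-1")

# Index the slot, not the object, and keep only barcodes that are present.
present <- intersect(rownames(obj@whitelisted.data), keep.cells)
obj@whitelisted.data <- obj@whitelisted.data[present, , drop = FALSE]

dim(obj@whitelisted.data)
```

Using intersect() avoids a "subscript out of bounds" error when the QC list contains barcodes that never received a CellTag.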

CellTagR support for Drop-seq BAM files

Dear morris-lab,

Thank you for a great tool. The CellTagR workflow appears to be directed towards 10X-generated BAM files. When I follow the workflow for CellTag extraction and quantification with Drop-seq BAM files, I am unable to extract any CellTags. The CellTags are present, however, since extraction from the R2 fastq file appears successful. The Drop-seq BAM files were generated using the Drop-seq_Alignment_Cookbook. I assume this is because the bam.process function cannot parse the BAM tags for cell barcode (XC) and UMI (XM) contained in the Drop-seq BAM file? Will there be support for Drop-seq BAM files in addition to 10X BAM files?

Also, in preparing Drop-seq BAM files for CellTag extraction, if one does not perform alignment using a custom reference (by concatenating the 3' UTR sequence of the CellTag transcript and the GFP coding sequence of the CellTag transcript to the reference genome), will CellTags be lost post-alignment? Is it recommended to keep unmapped reads within the aligned BAM file when not using a custom reference?

Thank you for the support. Much appreciated.

Overlaying cell source (i.e. which bam) over graph networks

First, thank you both for this tool & for all of your previous help to read in multiple bam files!

Now, I am trying to figure out how to color the resulting single force-directed graph network according to whether each cell or clone belongs to sample-1 or sample-2. Any ideas?

Thank you again!
Rob
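
One possible starting point, sketched under the assumption that cell names in a multi-BAM object carry a "Sample-N_" prefix (as in the str() output quoted in another issue on this page): derive the sample from the cell name and map it to a color. The cell names and the palette below are made up, and how the resulting colors are handed to the network plotting function is not covered here.

```r
# Hypothetical cell names with a "Sample-N_" prefix before the 10x barcode.
cell.names <- c("Sample-1_AAACCCAAGCGTACAG-1",
                "Sample-2_AAACCCATCTAAGCCA-1",
                "Sample-1_AAACGAAAGGGAGGAC-1")

# Everything before the first underscore is taken as the sample of origin.
sample.of.cell <- sub("_.*$", "", cell.names)

# One color per sample (arbitrary choice).
palette.by.sample <- c("Sample-1" = "steelblue", "Sample-2" = "firebrick")
node.color <- unname(palette.by.sample[sample.of.cell])

data.frame(cell = cell.names, sample = sample.of.cell, color = node.color)
```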

statistical analysis to detect fate bias in clones

Hi Morris Lab!

I've run the CellTagR pipeline on my test dataset. I'd now like to do some statistical analyses and power calculations to look for lineage fate bias in my clones and the required cell number to robustly detect biases.

In your Nature paper there is a description of comparing trajectories in the paragraph "Trajectory discovery via randomized testing." Would you be able to share the R script used to run this analysis?

Did you perform power analysis to determine the minimum number of cells needed to compare clonal fates? I have a case where I have relatively small clone sizes (2-8 cells), so I am trying to determine how I can ensure enough statistical power to identify clonal bias. With smaller clone sizes, I am sure I will need higher numbers of total cells, but I need to figure out how much higher.

Thanks for any advice!

Sincerely,
Lauren

SingleCellDataBinatization error?

Hi CellTag team,

I've run through the CellTagR tutorial with my data and created a bam.test.obj with the v1 and v3 celltag libraries (I did not transduce the v2 library). I've been using the starcode collapsing of celltags. After running the v3 library, I noticed that I got this output from SingleCellDataBinatization:

[1] "Binarized" Average: NA Frequency: NA

for the v1 library it looked like this:
[1] "Binarized" Average: 14.61329 Frequency: 19.40946

I was able to continue, run the Jaccard analysis, and call clones. When comparing my object to your provided V1V2V3 object, I noticed that the binary.mtx slot looks different, specifically the @p slot, where I think the values should be 0s and 1s for the matrix. I have numbers > 1.

str(v1v2v3 object from tutorial)
..@ binary.mtx :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:21330] 1444 1181 3221 1274 1145 9 36 225 295 368 ...
.. .. ..@ p : int [1:19196] 0 0 0 0 0 1 1 1 1 1 ...
.. .. ..@ Dim : int [1:2] 3812 19195
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:3812] "AAACCTGAGTATGACA-1" "AAACCTGCAGCCTATA-1" "AAACCTGGTAAGTAGT-1" "AAACCTGTCACAACGT-1" ...
.. .. .. ..$ : chr [1:19195] "v1.AAAAAAGA" "v1.AAAAAAGC" "v1.AAAAAATA" "v1.AAAAACTC" ...
.. .. ..@ x : num [1:21330] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..@ factors : list()

str(my_object)
..@ binary.mtx :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:2991588] 3040 242 1165 3051 4110 4333 4736 4814 4971 7886 ...
.. .. ..@ p : int [1:32203] 0 1 1 21 22 22 23 24 79 80 ...
.. .. ..@ Dim : int [1:2] 18652 32202
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:18652] "AAACCCAAGACTACCT-1" "AAACCCAAGACTTGTC-1" "AAACCCAAGCGGGTTA-1" "AAACCCAAGGTTAGTA-1" ...
.. .. .. ..$ : chr [1:32202] "v1.v1.AAAAAAAA" "v1.v1.AAAAAAAC" "v1.v1.AAAAAAGA" "v1.v1.AAAAAAGC" ...
.. .. ..@ x : num [1:2991588] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..@ factors : list()

I'm not sure what to make of this: should I be concerned, or does it mean something went wrong in my collapsing? I'm currently re-running without any starcode collapsing to see if I get the same result.

Thanks for any advice!

Sincerely,
Lauren
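
For context on the @p values: in a dgCMatrix from the Matrix package, @p holds cumulative column pointers (the number of stored entries before each column), @i holds zero-based row indices, and @x holds the stored values. Values above 1 in @p are therefore expected whenever columns contain entries, and whether the matrix is binary is determined by @x. A small sketch with made-up data:

```r
library(Matrix)

# 3 cells x 4 tags with stored entries at (1,1), (3,1), (2,3) and (3,4).
m <- sparseMatrix(i = c(1, 3, 2, 3),
                  j = c(1, 1, 3, 4),
                  x = c(1, 1, 1, 1),
                  dims = c(3, 4))

m@p              # cumulative entry counts per column, not matrix values
diff(m@p)        # number of stored entries in each column
all(m@x == 1)    # the actual check for a binary matrix
```

This does not address the "Average: NA Frequency: NA" output, which would need to be investigated separately.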

Starcode issue with CellTag object of 12 samples and binarization error

I have 12 different samples generated from the same starting pool of cells (same CellTag V1 labeling). I created the CellTag object by reading the 12 bam files in the correct order and proceeded to the starcode step.

Issue 1:
12 collapsing files with suffix Sample-1, Sample-2, ...., Sample-12 were generated and I starcoded them one-by-one and put the resulting files in one folder:
collapsing_result_Sample-1.txt
collapsing_result_Sample-2.txt
collapsing_result_Sample-3.txt
collapsing_result_Sample-4.txt
collapsing_result_Sample-5.txt
collapsing_result_Sample-6.txt
collapsing_result_Sample-7.txt
collapsing_result_Sample-8.txt
collapsing_result_Sample-9.txt
collapsing_result_Sample-10.txt
collapsing_result_Sample-11.txt
collapsing_result_Sample-12.txt

However, when doing
My.obj <- CellTagDataPostCollapsing(celltag.obj = My.obj, collapsed.rslt.file = list.files(collapsed.rslt.dir, full.names = T))
The collapsing result files were processed in the wrong order as:
collapsing_result_Sample-1.txt
collapsing_result_Sample-10.txt
collapsing_result_Sample-11.txt
collapsing_result_Sample-12.txt
collapsing_result_Sample-2.txt
collapsing_result_Sample-3.txt
collapsing_result_Sample-4.txt
collapsing_result_Sample-5.txt
collapsing_result_Sample-6.txt
collapsing_result_Sample-7.txt
collapsing_result_Sample-8.txt
collapsing_result_Sample-9.txt

which gave me a different sample order in the collapsed.count matrix: Sample-1 is followed by Sample-10 rather than Sample-2.
Screenshot 2024-03-27 at 12 22 00 PM

I tried renaming the collapsing result files so that the code would process them in numeric order, but after collapsing, the collapsed.count matrix still has the same wrong order (as shown in the picture above).
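
For reference, list.files() returns names in alphabetical order, which is why Sample-10 sorts before Sample-2. A minimal base R sketch for reordering the file vector by the numeric sample index before passing it on; the regular expression assumes the collapsing_result_Sample-N.txt naming shown above. Whether CellTagDataPostCollapsing then preserves that order in collapsed.count is not confirmed by this thread, given that renaming alone did not help.

```r
# File names in the alphabetical order that list.files() would return.
files <- paste0("collapsing_result_Sample-", c(1, 10, 11, 12, 2:9), ".txt")

# Extract the numeric sample index from each file name.
sample.idx <- as.integer(sub("^.*Sample-([0-9]+)\\.txt$", "\\1", basename(files)))

# Reorder the file vector numerically: Sample-1, Sample-2, ..., Sample-12.
files.ordered <- files[order(sample.idx)]
files.ordered
```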

Issue 2:

I tried to skip starcode and proceed directly to binarization
My.obj <- SingleCellDataBinarization(My.obj, 2)
which gives me this error:

Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In int2i(as.integer(i), n) : NAs introduced by coercion to integer range

This error also showed up when I did binarization with the collapsed.count matrix though the order is wrong.

Please indicate:

  1. Does it matter that the collapsed.count matrix order is not strictly from 1 to 12? If it matters, how can I correct this?
  2. How to fix the binarization error?

Many thanks!
Feng

Running multiple BAM does not seem to capture all cellTags present in the data

I have a cell tag experiment with five BAM files. I've attempted to run the pipeline to completion several times, and only find clones in the first file. I noticed that this persists when I change the order of the files as well. When I run each of the BAM files independently, I get completely different number of cell tags except for the first and last bam file. This appears to happen extremely early in the pipeline, before collapsing. If I run wc -l on the files produced by the combined approach:

101731 _Sample-1.txt
46 _Sample-2.txt
117 _Sample-3.txt
40 _Sample-4.txt
71502 _Sample-5.txt

If I do the same, but for the files produced by running CellTagExtraction, CellTagMatrixCount on each file individually:

101731 scaf1120_collapsing.txt (_Sample-1)
49098 scaf1121_collapsing.txt (_Sample-2)
83778 scaf1122_collapsing.txt (_Sample-3)
36874 scaf1152_collapsing.txt (_Sample-4)
71502 scaf1153_collapsing.txt (_Sample-5)

As I mentioned above, the first and last files seem to have the same number of rows, but the middle three are way off and extremely low. I think this was part of the confusion I had from #6

Is this my data, or do you see similar issues when running more than two BAM files using the current documentation?

For reference, here is my celltag code for running multiple bam files.

library(CellTagR)

ct35_celltag_0 <- CellTagObject("allruns", fastq.bam.directory = "ct35-all/")
ct35_celltag_1 <- CellTagExtraction(celltag.obj = ct35_celltag_0,
                                     celltag.version = "v1")

barcodes_files <- list("/data/capaldobj/CS026448_Beshiri_cell_tagging_2019_december_18/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/SCAF1120_35-1A-CTL/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
                    "/data/capaldobj/CS026448_Beshiri_cell_tagging_2019_december_18/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/SCAF1152_CT35-1-CTL/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
                    "/data/capaldobj/CS026448_Beshiri_cell_tagging_2019_december_18/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/SCAF1121_35-2A-CTL/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
                    "/data/capaldobj/CS026448_Beshiri_cell_tagging_2019_december_18/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/SCAF1122_35-3A-CTL/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
                    "/data/capaldobj/CS026448_Beshiri_cell_tagging_2019_december_18/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/SCAF1153_CT35-2-CTL/outs/filtered_feature_bc_matrix/barcodes.tsv.gz")


Barcode.Aggregate(barcodes_files, "./barcodes_all.tsv")

celltag_whitelist <- "/data/capaldobj/Cell_tagging_prelim_2019_December_09/v1_whitelist.csv"

ct35_celltag_2 <- CellTagMatrixCount(ct35_celltag_1, barcodes.file = "./barcodes_all.tsv")

ct35_celltag_3 <- CellTagDataForCollapsing(ct35_celltag_2, "./collapsing.txt")

system("~/starcode/starcode -s --print-clusters _Sample-1.txt > collapsing_result_Sample-1.txt")
system("~/starcode/starcode -s --print-clusters _Sample-2.txt > collapsing_result_Sample-2.txt")
system("~/starcode/starcode -s --print-clusters _Sample-3.txt > collapsing_result_Sample-3.txt")
system("~/starcode/starcode -s --print-clusters _Sample-4.txt > collapsing_result_Sample-4.txt")
system("~/starcode/starcode -s --print-clusters _Sample-5.txt > collapsing_result_Sample-5.txt")

ct35_celltag_4 <- CellTagDataPostCollapsing(ct35_celltag_3, collapsed.rslt.file = list.files(".", pattern = "collapsing_result_Sample-[1-5].txt$", full.names = T))

ct35_celltag_5 <- SingleCellDataBinatization(ct35_celltag_4, 2)

ct35_celltag_6 <- SingleCellDataWhitelist(ct35_celltag_5, celltag_whitelist)

ct35_celltag_7 <- MetricBasedFiltering(ct35_celltag_6, 20, comparison = "less")
ct35_celltag_8 <- MetricBasedFiltering(ct35_celltag_7, 2, comparison = "greater")

ct35_celltag_9 <- JaccardAnalysis(ct35_celltag_8)
ct35_celltag_9 <- CloneCalling(ct35_celltag_9, 0.7)

Subscript out of bounds when convertCellTagMatrix2LinkList data

Hello, I have tried to trace my cells using only the v1 library, but the error "Error in *tmp*[[jj]] : Subscript out of bounds" appears whenever I convert the matrix. I guess the convertCellTagMatrix2LinkList function expects the v2 and v3 libraries, but I don’t have them in my data. Can you tell me how to run it with a single library?

bam.test.obj <- CellTagMatrixCount(celltag.obj = bam.test.obj, barcodes.file = "./barcodes_all.tsv")
Warning message:
In `[.data.table`(alltagCounts, , `:=`((tagsRemove), NULL)) :
length(LHS)==0; no columns to delete or assign RHS to.
The warning above was reported at this step. I think the number of rows may exceed a limit, so how can I delete some rows to stay within it? Thanks.

Error in running ./starcode code line

Hi,

I am getting this error in the code line below:

./starcode -s --print-clusters ~/Desktop/collapsing.txt> ~Desktop/collapsing_result.txt

Error: unexpected '/' in "./starcode -s --print-clusters ~/"

on MacOSX. Do you have any idea how I can solve it?
Thank you.
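
The message "Error: unexpected '/'" is an R parse error, which suggests the command was typed into the R console rather than a shell. starcode is a command-line program, so the line belongs in a terminal, or it can be launched from R with system2(). A sketch under stated assumptions: the starcode install location below is hypothetical, and the output path in the original command reads ~Desktop, which was presumably meant to be ~/Desktop.

```r
# Hypothetical install location of the starcode binary; adjust as needed.
starcode.bin <- path.expand("~/starcode/starcode")
in.file  <- path.expand("~/Desktop/collapsing.txt")
out.file <- path.expand("~/Desktop/collapsing_result.txt")

# Same flags as in the original command line; the input path is shell-quoted.
args <- c("-s", "--print-clusters", shQuote(in.file))

if (file.exists(starcode.bin) && file.exists(in.file)) {
  # stdout = out.file plays the role of "> file" in the shell command.
  status <- system2(starcode.bin, args = args, stdout = out.file)
} else {
  message("starcode or the input file was not found; nothing was run.")
}
```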

Question about percentile cutoffs in CellTagWhitelistFiltering function

First of all, thank you for this valuable package! We are beginning to use this tech in our lab quite frequently.

I wanted to re-plot the barcode rank / cutoff plot output during the CellTagWhitelistFiltering step, and noticed a discrepancy when I re-rendered the 90th percentile count cutoff. In CellTagWhitelistGeneration.R, the count cutoff is determined as follows:

count.cutoff <- quantile(count.sorted.table$Count, probs = percentile)
count.true.cut <- floor(count.cutoff/10)

I'm wondering what the rationale is behind dividing the count cutoff by 10? Likely there is something I'm missing.
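
To make the effect of the two quoted lines concrete, here is a small worked example with made-up counts. It does not explain the rationale for the division; it only shows that the effective threshold ends up at one tenth of the percentile value, rounded down.

```r
# Made-up counts standing in for count.sorted.table$Count.
count.sorted.table <- data.frame(Count = 1:100)
percentile <- 0.9

# The two lines quoted from CellTagWhitelistGeneration.R.
count.cutoff <- quantile(count.sorted.table$Count, probs = percentile)
count.true.cut <- floor(count.cutoff / 10)

unname(count.cutoff)    # 90.1, the 90th percentile of 1..100
unname(count.true.cut)  # 9, the value actually used as the cutoff
```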

On an unrelated note, is there any interest in switching to ggplot-based graphics? If so, I would be happy to submit a PR.

Error in JaccardAnalysis: negative length vectors are not allowed

Hi Wenjun,

I have come across another error. This time in the Jaccard function for v2. Are you able to help me investigate this? The error is:

Error in asMethod(object) : negative length vectors are not allowed
Calls: JaccardAnalysis -> as -> asMethod

It occurs in the line of the JaccardAnalysis() function that coerces the result with as(Jac, "dgCMatrix").

As with the previous bug this step ran fine with v1.

Looking online, one possibility is that the merged data.frame has exceeded the maximum length allowed for a vector in R, 2^31 - 1.

If I run str(Jac) then this is the result:

Formal class 'dsTMatrix' [package "Matrix"] with 7 slots
 ..@ i : int [1:1835604345] 0 0 1 0 1 2 0 1 2 3 ...
 ..@ j : int [1:1835604345] 0 1 1 2 2 2 3 3 3 3 ...
 ..@ Dim : int [1:2] 60590 60590
 ..@ Dimnames:List of 2
 .. ..$ : chr [1:60590] "Sample-10_AAACCCAAGCGTACAG-1" "Sample-10_AAACCCATCTAAGCCA-1" "Sample-10_AAACGAAAGGGAGGAC-1" "Sample-10_AAACGAAGTTAACCTG-1" ...
 .. ..$ : chr [1:60590] "Sample-10_AAACCCAAGCGTACAG-1" "Sample-10_AAACCCATCTAAGCCA-1" "Sample-10_AAACGAAAGGGAGGAC-1" "Sample-10_AAACGAAGTTAACCTG-1" ...
 ..@ x : num [1:1835604345] 1 0 1 0 0 1 0 0 0 1 ...
 ..@ uplo : chr "U"
 ..@ factors : list()

The binary data in Jac@x has 1,835,604,345 entries, which is close to the 2^31 - 1 limit (2,147,483,647). This means that it could plausibly exceed the limit when added to the bam.test.obj that includes the v1 data, but I can't work out if this is the issue. If it is, then has it happened because of erroneous duplication of the data? Or do I need to restart the analysis with data from a subset of my samples?

Thank you,

James
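
A quick arithmetic check of the numbers above. The entry count in Jac@x equals n(n+1)/2 for n = 60590 cells, which is the upper triangle including the diagonal with every pair stored. One reading consistent with the error, offered as an assumption rather than something verified against the Matrix internals, is that coercing the symmetric matrix to a general dgCMatrix needs room for both triangles, which would exceed 2^31 - 1.

```r
n.cells <- 60590                              # from Dim in the str() output

# Computed in double precision to avoid integer overflow.
upper.tri <- n.cells * (n.cells + 1) / 2      # stored entries of the symmetric matrix
full      <- n.cells^2                        # entries if both triangles are stored
int.max   <- .Machine$integer.max             # 2^31 - 1

upper.tri             # 1835604345, matches the length of Jac@x
upper.tri < int.max   # TRUE: the symmetric form still fits
full > int.max        # TRUE: the full form would not
```

If this reading is right, the size follows from the number of cells alone rather than from duplicated data.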

allow .fq file extension

Hello. Firstly thanks for a great tool :)

One very brief feature request that might catch others out as it did for us - we always use .fq rather than .fastq as an extension on our sequencing data and it took us a while to realise that this was why we couldn't read in the file to create the whitelist.
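
Until such an option exists, one workaround is to give the file a .fastq name before reading it. A minimal sketch in a temporary directory, assuming (per the observation above) that the reader only recognizes the .fastq extension; the file name reads.fq and its contents are made up.

```r
# Self-contained demo: create a small .fq file in a temporary directory.
work.dir <- file.path(tempdir(), "fq_demo")
dir.create(work.dir, showWarnings = FALSE)
fq.file <- file.path(work.dir, "reads.fq")
writeLines(c("@read1", "ACGT", "+", "IIII"), fq.file)

# Same path with the extension swapped from .fq to .fastq.
fastq.file <- sub("\\.fq$", ".fastq", fq.file)

# Copy under the new name; file.symlink() would avoid duplicating large files.
ok <- file.copy(fq.file, fastq.file, overwrite = TRUE)
file.exists(fastq.file)
```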

Barcode extraction and clone counting

I’m finding the CellTag system very easy to use (thank you!), but have run into hopefully a few small but solvable problems.

Pertinent information – I am using the CellTag V2 library and performed bulk sequencing on my cells to confirm that the barcode complexity of my cells matches that of the starting library. Having done this, I’m not concerned that my issues below are related to reduced barcode number after transduction. I performed single cell RNAseq on three samples – a day 0, day 7 and day 30 timepoint and am currently working on extracting/analyzing the CellTags. My cells had been transplanted in mice, so in order to reisolate them, I sorted for the GFP-expressing cells (thus also still retaining our CellTags).

Issue 1 - When I execute the barcode extraction for my Day 30 sample, I lose ~75% of the cells sequenced. This is in contrast to my Day 0 and Day 7 samples, where the presence of CellTags does filter out some cells, but nowhere near the majority. I'm confused why there would be so much loss in this specific sample when the number of cells sequenced are similar between Day 0 and Day 30, as are the file sizes (rough proxy for the reads) when I've executed the mapping steps just prior to extraction. I'm also not as worried about experimental loss of barcode because these cells were sorted off of the GFP transgene where the CellTag exists, so they should in theory all have at least one CellTag. Any thoughts on why the extraction might be faulty? Or how I might be able to check/troubleshoot this?

Issue 2 - I seem to have some sequence being counted in almost all of the cells. Any idea where this is coming from and how to remove it? I'm guessing this is an error on my end (mapping error? whitelist error?) since it would be incredibly unlikely that one single barcode exists in all of the cells. Have you ever seen something like this before?

Screen Shot 2022-12-02 at 9 36 31 AM

Issue 3 - The number of clonal families that I find in my samples is quite low, but I also notice that these families can't exist as single cells (always have 2 or more cells). I think it's possible that there do exist single clones within my population of cells and at present I'm filtering these out. Is there any way to edit the code to allow for this possibility?

Thank you so much for your time and help! Happy to clarify any of the above. Any and all suggestions are incredibly welcome!

Thanks,
Sarah

error in Network construction

Thanks for your terrific software.

I have an experiment where I used V1 and V2 at two different time points to infect cultured cells. My celltag object was built running V1 and then V2 analyses sequentially through clone calling and contains both versions in the single object it appears. Jaccard built with fast = TRUE.

I get an error when I call convertCellTagMatrix2LinkList: "Error in *tmp*[[jj]] : subscript out of bounds". I don't get the error with the bam.test.obj file.

I think it may be because the "celltag.aggr.final" slot in my object is empty. I don't know what command fills this slot. Any ideas?

I don't know if it matters, but I used the old workflow whitelists, as generating whitelists from the fastq took forever on my server and never completed. The distribution of clones seems reasonable.

I appreciate in advance any assistance.

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.4 (Ootpa)

Matrix products: default
BLAS: /sw/pkgs/arc/stacks/gcc/10.3.0/R/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /sw/pkgs/arc/stacks/gcc/10.3.0/R/4.2.0/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 tools stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] Rsamtools_2.12.0 Biostrings_2.64.1 XVector_0.36.0
[4] GenomicRanges_1.48.0 GenomeInfoDb_1.32.4 IRanges_2.30.1
[7] S4Vectors_0.34.0 BiocGenerics_0.42.0 CellTagR_0.0.0.9000
[10] proxyC_0.3.3 networkD3_0.4 foreach_1.5.2
[13] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[16] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
[19] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2
[22] tidyverse_2.0.0 Matrix_1.5-4.1 reshape_0.8.9
[25] plyr_1.8.8 data.table_1.14.8 igraph_1.4.3
[28] corrplot_0.92 proxy_0.4-27 gridExtra_2.3

loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.6.2 usethis_2.1.6
[4] devtools_2.4.5 profvis_0.3.8 utf8_1.2.3
[7] R6_2.5.1 colorspace_2.1-0 urlchecker_1.0.1
[10] withr_2.5.0 tidyselect_1.2.0 prettyunits_1.1.1
[13] processx_3.8.1 compiler_4.2.0 cli_3.4.1
[16] scales_1.2.1 callr_3.7.3 digest_0.6.31
[19] pkgconfig_2.0.3 htmltools_0.5.5 sessioninfo_1.2.2
[22] fastmap_1.1.1 htmlwidgets_1.6.2 rlang_1.1.1
[25] rstudioapi_0.14 shiny_1.7.4 generics_0.1.3
[28] BiocParallel_1.30.4 RCurl_1.98-1.12 magrittr_2.0.3
[31] GenomeInfoDbData_1.2.8 Rcpp_1.0.10 munsell_0.5.0
[34] fansi_1.0.4 lifecycle_1.0.3 stringi_1.7.12
[37] zlibbioc_1.42.0 pkgbuild_1.4.0 grid_4.2.0
[40] parallel_4.2.0 promises_1.2.0.1 crayon_1.5.2
[43] miniUI_0.1.1.1 lattice_0.20-45 hms_1.1.3
[46] ps_1.7.5 pillar_1.9.0 codetools_0.2-18
[49] pkgload_1.3.2 glue_1.6.2 remotes_2.4.2
[52] BiocManager_1.30.20 RcppParallel_5.1.7 vctrs_0.6.2
[55] tzdb_0.4.0 httpuv_1.6.11 gtable_0.3.3
[58] cachem_1.0.8 mime_0.12 xtable_1.8-4
[61] later_1.3.1 iterators_1.0.14 memoise_2.0.1
[64] timechange_0.2.0 ellipsis_0.3.2

whitelist error

Hi,

I am trying to use CellTagR to call clones on our three samples. I started the analysis on all 3 samples in one go from BAM files. At one of the steps I need the v1 whitelist file. Can you please let me know how I can get that for all 3 samples?

Thanks

how long will it take to generate the sparse count matrix usually

Hello,
I am trying the CellTagR package with my own data. My object is 665 MB in size, and I am stuck at this line: "bam.test.obj <- CellTagMatrixCount(celltag.obj = bam.test.obj, barcodes.file = "./barcodes.tsv")"
There are no warnings or errors.
How long does it usually take to generate the matrix? I am worried that I may have messed something up.
Thank you

Error in CellTagMatrixCount when generating the count matrix for v2

Hi, I am getting this error:

Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
invalid character indexing
Calls: CellTagMatrixCount ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_cols -> intI

This is in addition to the expected warning message at this stage.

As far as I can tell it appears to be related to the sparseMatrix function or perhaps SetCellTagCurrentVersionWorkingMatrix, although these steps work when isolated from CellTagMatrixCount.

The function works if I start from a new bam.test.obj rather than the one generated from v1.

Can you help?

Thanks!!

Error in JaccardAnalysis

Hello!

I have been coming across this error when running the JaccardAnalysis() step on the new CellTag workflow.

Error in as(Jac, "dsTMatrix") :
no method or default for coercing “simil” to “dsTMatrix”

I initially thought this was a size issue, although I was able to reproduce the same error with a very small subset of my data. Notably, I am using the "slow" version of the function. I tried a few troubleshooting steps from Issue #15, including migrating my object to a new object with no success. I re-installed CellTagR when I first got the error, and that also did not help.

Please let me know how to best address this error.

Thank you!

similarities of the cell tag barcodes within clones

Hi,

Thanks for generating the powerful celltag tools! we are now analyzing our single-cell RNA data!

We only used V1 celltags so we expect to have only clonal analysis.

In the final results, we found that clonally related cells did not have exactly the same barcodes. For example, the three cells within a clone may have 2, 3, and 4 barcodes respectively, yet share the same 2 barcodes. Is this normal?

We went through the main steps in https://github.com/morris-lab/CellTagR, (note we did not do the following: filter transgene reads or CellTag Error Correction).

P.S. I think one important step might be the following one, but I cannot change the correlation.cutoff to 1. When I change the value from 0.7 to 0.99999999, the clones get smaller, but the issue remains:

#Call clones
bam.test.obj <- CloneCalling(celltag.obj = bam.test.obj, correlation.cutoff=0.7)

I am now writing a short script to look for cells with exactly the same barcodes.

Best,

Li
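
A minimal sketch of such a script, assuming a binary cell x CellTag matrix with cells as rows. The matrix, cell names and tag names below are made up, and this exact-match grouping is a separate, stricter criterion than the Jaccard-based CloneCalling step.

```r
# Made-up binary matrix: rows are cells, columns are CellTags.
binary.mtx <- matrix(c(1, 1, 0, 0,
                       1, 1, 0, 0,
                       1, 1, 1, 0,
                       0, 0, 0, 1,
                       0, 0, 0, 1),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(paste0("cell", 1:5), paste0("tag", 1:4)))

# Collapse each row into a string so identical tag sets give identical keys.
signature <- apply(binary.mtx, 1, paste, collapse = "")

# Group cell names by signature and keep groups with at least two cells.
groups <- split(rownames(binary.mtx), signature)
exact.clones <- Filter(function(cells) length(cells) >= 2, groups)
exact.clones
```

Here cell3 shares two tags with cell1 and cell2 but carries an extra one, so it is left out of their group.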

Paired-end reads & multiple fastq files

Hello!
I was wondering how I should handle fastq files from paired-end sequencing for the whitelisting of cell tags.
Should I consider only the cell tags found in one of the pair of fastq files or all the cell tags? I'm a little confused.
Also I'd appreciate if you could suggest what I should do for multiple fastq files for one sample. Can they be integrated together to a single celltag object?

Thank you!

Merging two CellTag objects?

Hi, I've just realized that CellRanger Aggregate does not output a single bam file of aggregated samples.

I have two different 10X runs (different samples) that should have overlapping Celltag combinations as each is from a common original Celltag transduction.

The goal would be to build a single network across all three samples. Would you have any recommendations as the best way to do this?

I could rerun cellranger count on both samples together for a single bam file, but I think I would then lose the source identity for any cells with a clashing 10X barcode.

Thank you!
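
On the concern about clashing 10X barcodes: one way to keep source identity is to keep the runs separate and prefix each cell barcode with a sample label before combining. The sketch below only illustrates the naming idea with made-up barcodes; tag.cells is a hypothetical helper, and the "Sample-N_" form mirrors cell names seen in multi-BAM output elsewhere on this page.

```r
# Made-up barcodes from two runs; the first barcode occurs in both.
run1.barcodes <- c("AAACCTGAGTATGACA-1", "AAACCTGCAGCCTATA-1")
run2.barcodes <- c("AAACCTGAGTATGACA-1", "AAACCTGGTAAGTAGT-1")

# Hypothetical helper: prefix each barcode with its sample of origin.
tag.cells <- function(barcodes, sample.id) paste(sample.id, barcodes, sep = "_")

all.cells <- c(tag.cells(run1.barcodes, "Sample-1"),
               tag.cells(run2.barcodes, "Sample-2"))

anyDuplicated(c(run1.barcodes, run2.barcodes)) > 0   # TRUE: raw barcodes clash
anyDuplicated(all.cells) == 0                        # TRUE: prefixed names are unique
```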

SRA data problem

Hi,
Recently I ran into an issue when downloading the source data for further analysis of the tracing data. For example, SRR7347028 seems to contain only one file rather than the usual 3: "fastq-dump --split-3 SRR7347028.sra" produced only SRR7347028.fastq.gz, and "fastq-dump --split-files SRR7347028.sra" produced only SRR7347028_1.fastq.gz.
Screenshot 2023-04-27 at 20 55 08

And this sample was described as follows:

Screenshot 2023-04-27 at 20 51 06

I also found the specific notice about 3'-end sequencing in your protocol:
Screenshot 2023-04-27 at 20 53 41

Could you please give me some hints about this issue?
