costalab / scmega

scMEGA: Single-cell Multiomic Enhancer-based Gene regulAtory network inference

Home Page: https://costalab.github.io/scMEGA

License: Other

R 96.71% C++ 3.11% Shell 0.18%
grn single-cell multiomics

scmega's People

Contributors

chengmingbo, jsnagai, lzj1769

scmega's Issues

get cell type-specific peak

Hi!
I now have a file that specifies the cell types. How can I get the cell-type-specific peak file?

thanks!

Install error

Hi,

We are very interested in scMEGA, but we ran into problems during installation, as follows:

`devtools::install_github("CostaLab/scMEGA")
Downloading GitHub repo CostaLab/scMEGA@HEAD
Skipping 6 packages not available: SummarizedExperiment, S4Vectors, GenomicRanges, IRanges, ComplexHeatmap, destiny
✔ checking for file '/home/data/ssy342/Rtmp/RtmpBelAZf/remotes1f338c5ddde3d1/CostaLab-scMEGA-205b3ca/DESCRIPTION' ...
─ preparing 'scMEGA':
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building 'scMEGA_0.2.0.tar.gz'
Installing package into '/home/data/ssy342/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
ERROR: dependency 'destiny' is not available for package 'scMEGA'
* removing '/home/data/ssy342/R/x86_64-pc-linux-gnu-library/4.1/scMEGA'
Warning message:
In i.p(...) :
  installation of package '/home/data/ssy342/Rtmp/RtmpBelAZf/file1f338c54f19dd1/scMEGA_0.2.0.tar.gz' had non-zero exit status`

When we try to install using devtools::install_local("./scMEGA-0.2.0.tar.gz"), we still get the same error.

How can we solve this problem? We hope you can help.

ArchR input

Hi! Congratulations on the paper and the package, it looks like it's going to be very useful.

I would like to run scMEGA with my 10X multiome data, and we normally do the RNA analysis with Seurat, and ATAC with ArchR. Your tutorial has a Seurat object for ATAC (probably generated with Signac). Could you please add specific information on how to generate the Seurat object from an ArchR project, rather than Signac? We prefer ArchR for ATAC analysis and I can't find an ArchR function to generate a Seurat object with ATAC peaks/counts.

Thank you so much!

get the gene activity matrix from ArchR

Hi!
I would like to know how to get the gene activity matrix used in the example from ArchR. When I used the gene activity matrix function, I got a different object, like this:

gene.activity
class: SummarizedExperiment
dim: 14103 6791
metadata(0):
assays(1): GeneScoreMatrix
rownames: NULL
rowData names(6): seqnames start ... name idx
colnames(6791): #AAACGAAAGGATCCTT-1
#AAACGAAAGGCATGCA-1 ... #TTTGTGTTCGATCGCG-1
#TTTGTGTTCGATCTTT-1
colData names(15): DoubletEnrichment DoubletScore ... Clusters Celltype

Using scMEGA for two treatment groups?

Thanks for creating this wonderful tool! In the vignettes, you add the trajectory before selecting TFs. In my case, I'm not as interested in comparing between cell types but rather a control and treated group. Is this possible and if so, how would I go about doing that? Am I able to skip the trajectory step?

Big data takes too long for cells to pair

Hi! Thanks for this great work!
My data is a merge of three periods and contains a lot of cells, so this step ran for a very long time without finishing:

df.pair <- PairCells(object = coembed.sub, reduction = "harmony",pair.by = "tech", ident1 = "ATAC", ident2 = "RNA")
Getting dimensional reduction data for pairing cells...
Pairing cells using geodesic mode...
Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
KNN subgraphs detected: 1
Skipping subgraphs with either ATAC/RNA cells fewer than: 50
Pairing cells for subgraph No.1
Total ATAC cells in subgraph: 26748
Total RNA cells in subgraph: 29030
Subgraph size: 26748
Search threshold being used: 10700
Constructing KNN based on geodesic distance to reduce search pairing search space
Number of cells being paired: 26748 ATAC and 26748 RNA cells
Determing pairs through optimized bipartite matching ..

Can I compute obj.pair for each period separately first and then merge the results?

SelectGenes use other genome

Hi!
I noticed that the genomes available in SelectGenes are hg19, hg38, mm9, and mm10.
Can I use genomes of other species?
If so, how exactly should I do it?

problem with SelectTFs

Hello there,
I'm trying to run scMEGA on 10X multiome RNA+ATAC data obtained from rat tissue. When I run:
res <- SelectTFs(object = objG,
tf.assay = "chromvar",
rna.assay = "RNA",
atac.assay = "ATAC",
trajectory.name = "Trajectory",
return.heatmap = TRUE,
cor.cutoff = 0.1)
I get:
Warning: No layers found matching search pattern provided
Error in GetAssayData():
! No layers are found
Here is the backtrace:
Backtrace:

  1. └─scMEGA::SelectTFs(...)
  2.   ├─base::suppressMessages(...)
  3.   │ └─base::withCallingHandlers(...)
  4.   └─scMEGA::GetTrajectory(...)
  5.     ├─SeuratObject::GetAssayData(object, assay = assay, slot = slot)
  6.     └─SeuratObject:::GetAssayData.Seurat(object, assay = assay, slot = slot)
  7.       ├─SeuratObject::GetAssayData(object = object[[assay]], layer = layer)
  8.       └─SeuratObject:::GetAssayData.StdAssay(object = object[[assay]], layer = layer)
  9.         └─rlang::abort("No layers are found")

Can you help me, please? Is it because I'm working with the rat genome, or is it something else entirely?
Best
David

Motif matching when using mouse data

Hi! I was running into an issue when selecting TFs in my mouse data, because the motif names did not match many genes in the gene exp data. I ended up altering the SelectTFs and GetTFGeneCorrelation code to

rownames(trajMM) <- stringr::str_to_title(object@assays[[atac.assay]]@[email protected])

(although maybe a biomaRt matching or something would be more accurate) and am getting better results, so I thought I'd note it!
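A minimal base-R sketch of the case conversion described above, assuming the motif names are upper-case symbols and the mouse gene symbols capitalise only the first letter. `to_mouse_case` and the example names are hypothetical; `stringr::str_to_title` gives the same result for single-word symbols. As the post notes, changing case is only a heuristic, and an orthology lookup would be more accurate.

```r
# Hypothetical helper: capitalise the first letter, lower-case the rest.
to_mouse_case <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

motif_names <- c("RUNX1", "NFKB1", "ATF4")    # illustrative motif names
rna_genes   <- c("Runx1", "Nfkb1", "Col1a1")  # illustrative mouse gene symbols

converted <- to_mouse_case(motif_names)
matched   <- intersect(converted, rna_genes)  # names found in the expression data
unmatched <- setdiff(converted, rna_genes)    # names that still need manual mapping
```

Inspecting `unmatched` shows which motifs would still be dropped after the conversion.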

AddTargetAssay

Hi

I am trying to run
pbmc.t.cells <- AddTargetAssay(object = pbmc.t.cells, df.grn = df.grn2)

Instead of the warning shown in the vignettes:

Warning in if (is.na(df.grn)) {: the condition has length > 1 and only the first element will be used

The program is returning:

Error in if (is.na(df.grn)) { : the condition has length > 1

Do you have any idea what is going on?

Thank you,
Debora
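The difference between the warning and the error is consistent with a change in R itself: since R 4.2.0, an `if()` condition of length greater than one is an error rather than a warning. A minimal base-R sketch of why `is.na()` on a data frame produces such a condition, and of a length-safe alternative. `is_missing` is a hypothetical helper, not part of scMEGA, and the values in `df.grn` are illustrative.

```r
# Toy data frame shaped like the GRN table (values are illustrative only).
df.grn <- data.frame(
  tf = c("ATF4", "ATF6"),
  gene = c("KIT", "ACAP3"),
  weights = c(0.81, 0.81)
)

# is.na() on a data frame returns a logical matrix with one entry per cell,
# so using it directly inside if() gives a condition of length > 1.
na_check <- is.na(df.grn)
n_conditions <- length(na_check)

# Hypothetical length-safe check: TRUE only for NULL or a single NA value.
is_missing <- function(x) {
  is.null(x) || (is.atomic(x) && length(x) == 1 && is.na(x))
}

safe_result <- is_missing(df.grn)  # FALSE: a real data frame was supplied
```

Until the check inside the package is changed, running under an R version older than 4.2.0 would be expected to give the warning instead of the error.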

snATAC data does not have a $Harmony dimension reduced matrix

Hi!
I noticed that I needed to use $Harmony in the tutorial to prepare the data

#add dimension reduced matrix
harmony_matDR <- proj@reducedDims$Harmony$matDR

but I only have $UMAP in my data, because my samples are not duplicated.
In this case, can I still follow this tutorial? How can I get the final obj.atac?

error in SelectTFs or GetTrajectory

Dear scMEGA team,

Thank you for developing this useful package.
I am working with Arabidopsis datasets - GSE173834 for the scATAC data and my own scRNA-seq data. When I integrated them, at the step of SelectTFs I got this error

Creating Trajectory Group Matrix..
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions

When I try to debug, it looks like, under GetTrajectory, Matrix::rowMeans fails. However, the cell_names has multiple cell names, so this should not be a vector. I will be grateful if you could help me on that. Please let me know what details you need, I will share.

Regards,
Rahul
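One common way this base R message arises (not necessarily the cause here) is a matrix subset that selects a single column and is silently dropped to a vector before `rowMeans()` is called. A minimal sketch with a toy matrix; `mat` and `cells_in_bin` are hypothetical names, not taken from the scMEGA source.

```r
# Toy expression matrix: 3 genes x 2 cells.
mat <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("cellA", "cellB"))
)

# Suppose only one cell falls into a trajectory bin.
cells_in_bin <- "cellA"

# Default subsetting drops the column dimension, leaving a plain vector.
dropped <- mat[, cells_in_bin]
err <- tryCatch(rowMeans(dropped), error = function(e) conditionMessage(e))

# drop = FALSE keeps a 3 x 1 matrix, so rowMeans() works.
kept <- mat[, cells_in_bin, drop = FALSE]
bin_means <- rowMeans(kept)
```

Counting how many cells fall into each trajectory group would show whether any group contains only one cell.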

Use obj.pair to show the expression of marker gene

Hi!
I use obj.pair to look at the expression of marker genes.
Which values are plotted here: the gene expression matrix from scRNA, the gene score matrix from scATAC, or expression values produced by the integration algorithm?

obj.pair
An object of class Seurat
67334 features across 5396 samples within 2 assays
Active assay: RNA (17352 features, 0 variable features)
1 other assay present: ATAC
4 dimensional reductions calculated: pca, umap, harmony, umap_harmony

p <- FeaturePlot(obj.pair, features = c("COL1A1"),reduction ="umap_harmony",min.cutoff = "q10", max.cutoff = "q90")

CoembedData - Error: None of the provided refdata elements are valid.

Hi,

Thanks for maintaining a really nice R package. I am trying to use the CoembedData function as follows:

obj.coembed <- CoembedData(
  RNA,
  ATAC, 
  gene.activities, 
  weight.reduction = "umap", 
  verbose = T
)

with the following Seurat objects:

RNA
An object of class Seurat 
29875 features across 4224 samples within 2 assays 
Active assay: integrated (2000 features, 2000 variable features)
 2 layers present: data, scale.data
 1 other assay present: RNA
 2 dimensional reductions calculated: pca, umap


ATAC
An object of class Seurat 
149666 features across 5309 samples within 2 assays 
Active assay: peaks (110042 features, 110041 variable features)
 2 layers present: counts, data
 1 other assay present: RNA
 2 dimensional reductions calculated: lsi, umap

but I get the following error messages:

Performing data integration using Seurat...
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix
  |====================================================================================================| 100%
Running CCA
Merging objects
Finding neighborhoods
Finding anchors
	Found 10350 anchors
Filtering anchors
	Retained 2825 anchors
Warning: Please provide a matrix that has the same number of columns as the number of reference cells used in anchor finding.
Number of columns in provided matrix : 2976
Number of columns required           : 4224
Skipping element 1.
Error: None of the provided refdata elements are valid.
In addition: Warning messages:
1: In LayerData.Assay5(object = assays[[i]], layer = lyr, fast = TRUE) :
  multiple layers are identified by counts.1 counts.2
 only the first layer is used
2: In LayerData.Assay5(object = object[[assay]], layer = layer, ...) :
  multiple layers are identified by data.1 data.2
 only the first layer is used

I think this error likely stems from a missing metadata processing step shown in the pre-processing script. However, I am unable to follow along with the data processing script because I am processing my ATAC data with Signac, not ArchR (I am working with plant datasets and have been unsuccessful in generating the input genome objects required by ArchR). Do you think this is the most likely cause of the error, or am I missing some other crucial component? Thank you very much for your help. Please let me know if you need additional information; my sessionInfo is included below.

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scMEGA_1.0.2            rtracklayer_1.60.1      hdf5r_1.3.8             GenomicFeatures_1.52.2  AnnotationDbi_1.62.2   
 [6] Biobase_2.60.0          GenomicRanges_1.52.1    GenomeInfoDb_1.36.4     IRanges_2.34.1          S4Vectors_0.38.2       
[11] BiocGenerics_0.46.0     patchwork_1.1.3         readr_2.1.4             readxl_1.4.3            tidyr_1.3.0            
[16] ggplot2_3.4.4           dplyr_1.1.3             magrittr_2.0.3          Signac_1.11.0           Seurat_4.9.9.9081      
[21] SeuratObject_4.9.9.9093 sp_2.1-1               

loaded via a namespace (and not attached):
  [1] destiny_3.14.0              matrixStats_1.0.0           spatstat.sparse_3.0-2       bitops_1.0-7               
  [5] httr_1.4.7                  RColorBrewer_1.1-3          doParallel_1.0.17           tools_4.3.1                
  [9] sctransform_0.4.1           utf8_1.2.3                  R6_2.5.1                    lazyeval_0.2.2             
 [13] uwot_0.1.16                 withr_2.5.1                 prettyunits_1.2.0           gridExtra_2.3              
 [17] progressr_0.14.0            factoextra_1.0.7            cli_3.6.1                   spatstat.explore_3.2-3     
 [21] fastDummies_1.7.3           labeling_0.4.3              robustbase_0.99-0           spatstat.data_3.0-1        
 [25] proxy_0.4-27                ggridges_0.5.4              pbapply_1.7-2               Rsamtools_2.16.0           
 [29] R.utils_2.12.2              parallelly_1.36.0           TTR_0.24.3                  rstudioapi_0.15.0          
 [33] RSQLite_2.3.1               generics_0.1.3              BiocIO_1.10.0               ica_1.0-3                  
 [37] spatstat.random_3.1-6       vroom_1.6.4                 car_3.1-2                   Matrix_1.6-1.1             
 [41] fansi_1.0.5                 abind_1.4-5                 R.methodsS3_1.8.2           lifecycle_1.0.3            
 [45] scatterplot3d_0.3-44        yaml_2.3.7                  carData_3.0-5               SummarizedExperiment_1.30.2
 [49] BiocFileCache_2.8.0         Rtsne_0.16                  grid_4.3.1                  blob_1.2.4                 
 [53] promises_1.2.1              crayon_1.5.2                miniUI_0.1.1.1              lattice_0.20-41            
 [57] cowplot_1.1.1               KEGGREST_1.40.1             pillar_1.9.0                boot_1.3-28                
 [61] rjson_0.2.21                future.apply_1.11.0         codetools_0.2-18            fastmatch_1.1-4            
 [65] leiden_0.4.3                glue_1.6.2                  pcaMethods_1.92.0           data.table_1.14.8          
 [69] remotes_2.4.2.1             vcd_1.4-11                  vctrs_0.6.4                 png_0.1-8                  
 [73] spam_2.9-1                  cellranger_1.1.0            gtable_0.3.4                cachem_1.0.8               
 [77] S4Arrays_1.0.6              mime_0.12                   tidygraph_1.2.3             RcppEigen_0.3.3.9.3        
 [81] survival_3.5-5              SingleCellExperiment_1.22.0 RcppRoll_0.3.0              pheatmap_1.0.12            
 [85] iterators_1.0.14            ellipsis_0.3.2              fitdistrplus_1.1-11         ROCR_1.0-11                
 [89] nlme_3.1-162                xts_0.13.1                  bit64_4.0.5                 progress_1.2.2             
 [93] filelock_1.0.2              RcppAnnoy_0.0.21            rprojroot_2.0.3             irlba_2.3.5.1              
 [97] KernSmooth_2.23-20          colorspace_2.1-0            DBI_1.1.3                   nnet_7.3-18                
[101] smoother_1.1                tidyselect_1.2.0            processx_3.8.2              bit_4.0.5                  
[105] compiler_4.3.1              curl_5.1.0                  xml2_1.3.5                  desc_1.4.2                 
[109] DelayedArray_0.26.7         plotly_4.10.2               scales_1.2.1                hexbin_1.28.3              
[113] DEoptimR_1.1-3              lmtest_0.9-40               callr_3.7.3                 rappdirs_0.3.3             
[117] stringr_1.5.0               digest_0.6.33               goftest_1.2-3               spatstat.utils_3.0-3       
[121] XVector_0.40.0              htmltools_0.5.6.1           pkgconfig_2.0.3             MatrixGenerics_1.12.3      
[125] dbplyr_2.3.4                fastmap_1.1.1               ggthemes_4.2.4              rlang_1.1.1                
[129] htmlwidgets_1.6.2           shiny_1.7.5.1               farver_2.1.1                zoo_1.8-12                 
[133] jsonlite_1.8.7              BiocParallel_1.34.2         R.oo_1.25.0                 RCurl_1.98-1.12            
[137] GenomeInfoDbData_1.2.10     dotCall64_1.1-0             munsell_0.5.0               Rcpp_1.0.11                
[141] viridis_0.6.4               reticulate_1.34.0           stringi_1.7.12              ggraph_2.1.0               
[145] zlibbioc_1.46.0             MASS_7.3-58.3               plyr_1.8.9                  pkgbuild_1.4.2             
[149] parallel_4.3.1              listenv_0.9.0               ggrepel_0.9.4               deldir_1.0-9               
[153] graphlayouts_1.0.1          Biostrings_2.68.1           splines_4.3.1               tensor_1.5                 
[157] hms_1.1.3                   ps_1.7.5                    ranger_0.15.1               igraph_1.5.1               
[161] spatstat.geom_3.2-7         RcppHNSW_0.5.0              reshape2_1.4.4              biomaRt_2.56.1             
[165] XML_3.99-0.14               laeken_0.5.2                tweenr_2.0.2                tzdb_0.4.0                 
[169] foreach_1.5.2               httpuv_1.6.11               VIM_6.2.2                   RANN_2.6.1                 
[173] purrr_1.0.2                 polyclip_1.10-6             future_1.33.0               scattermore_1.2            
[177] ggforce_0.4.1               xtable_1.8-4                restfulr_0.0.15             e1071_1.7-13               
[181] RSpectra_0.16-1             later_1.3.1                 viridisLite_0.4.2           class_7.3-21               
[185] tibble_3.2.1                memoise_2.0.1               GenomicAlignments_1.36.0    cluster_2.1.4              
[189] ggplot.multistats_1.0.0     globals_0.16.2 

NetCentPlot error

Hi! Thanks for your work.
I have the data frame to build the network from:

head(df.grn2)
tf gene weights
77 ATF4 GABARAPL1 0.8221601
85 ATF4 GRIK3 0.8021985
100 ATF4 KIT 0.8130254
102 ATF4 KRAS 0.8125377
166 ATF4 SEC22B 0.8518962
223 ATF6 ACAP3 0.8103851
netobj <- graph_from_data_frame(df.grn2, directed = TRUE)
V(netobj)$type <- ifelse(V(netobj)$name %in% df.grn2$tf, "TF/Gene", "Gene")

Looking at the object, it looks normal

netobj
IGRAPH 54cba5f DN-B 355 788 --
+ attr: name (v/c), type (v/c), weights (e/n)
+ edges from 54cba5f (vertex names):
 [1] ATF4 ->GABARAPL1 ATF4 ->GRIK3    ATF4 ->KIT    ATF4 ->KRAS
 [5] ATF4 ->SEC22B    ATF6 ->ACAP3    BACH2->ALX1   BACH2->ANKH
 [9] BACH2->CDKN1C    BACH2->CRISPLD1 BACH2->EBF2   BACH2->ETV6
[13] BACH2->GLIPR1    BACH2->HECTD2   BACH2->IGFBP7 BACH2->ITGB8
+ ... omitted several edges

However, when I run

p <- NetCentPlot(netobj, "RUNX1")
Error in layout_with_focus(graph, v = focus, weights = weights, iter = niter, :
g must be a connected graph.

I checked netobj

is_connected(netobj, mode = "weak")
[1] FALSE

I don't know how to get the final NetCentPlot. Can you offer any help?
thanks a lot!
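Since the error says the layout needs a connected graph, one possible workaround is to keep only the edges in the same weakly connected component as the focus TF before building the graph. A minimal base-R sketch on a toy edge list shaped like `df.grn2`; `component_of` is a hypothetical helper, the edges are illustrative, and whether this subset is appropriate for the analysis is an assumption.

```r
# Toy edge list shaped like df.grn2 (edges and weights are illustrative only).
df.grn <- data.frame(
  tf = c("ATF4", "ATF4", "RUNX1", "RUNX1", "BACH2"),
  gene = c("KIT", "KRAS", "KIT", "ETV6", "ALX1"),
  weights = c(0.81, 0.81, 0.85, 0.80, 0.82),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first search that ignores edge direction,
# returning every node weakly connected to `focus`.
component_of <- function(edges, focus) {
  visited <- character(0)
  frontier <- focus
  while (length(frontier) > 0) {
    visited <- union(visited, frontier)
    neighbours <- c(
      edges$gene[edges$tf %in% frontier],  # targets of frontier nodes
      edges$tf[edges$gene %in% frontier]   # regulators of frontier nodes
    )
    frontier <- setdiff(unique(neighbours), visited)
  }
  visited
}

keep <- component_of(df.grn, "RUNX1")

# Keep only edges whose endpoints are both inside that component.
df.sub <- df.grn[df.grn$tf %in% keep & df.grn$gene %in% keep, ]
```

The filtered data frame could then be passed to `graph_from_data_frame()` in place of the full table, which yields a graph that is weakly connected by construction.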

PairCells error

I am following your vignette with a Seurat object that combines scRNA and scATAC data, but I get the following error message when I run PairCells(). The scMEGA part of my code is below:

coembed.sub <- RunDiffusionMap(coembed_harmon2, reduction = "harmony")

cols <- ArchR::paletteDiscrete([email protected][, "clusters_merge"])

p1 <- DimPlot(coembed.sub, group.by = "clusters_merge", label = TRUE,
reduction = "dm", shuffle = TRUE, cols = cols) +
xlab("DC 1") + ylab("DC 2")

p1

p2 <- DimPlot(coembed.sub, group.by = "cm_clusters", label = TRUE,
reduction = "dm", shuffle = TRUE, cols = cols) +
xlab("DC 1") + ylab("DC 2")

p2

DimPlot(coembed.sub, reduction = "dm",
group.by = "clusters_merge", split.by = "assay", cols = cols)

DimPlot(coembed.sub, reduction = "dm",
group.by = "cm_clusters", split.by = "assay", cols = cols)

df.pair <- PairCells(object = coembed.sub, reduction = "harmony",
pair.by = "assay", ident1 = "ATAC", ident2 = "RNA")

Getting dimensional reduction data for pairing cells...
Pairing cells using geodesic mode...
Constructing KNN graph for computing geodesic distance ..
Error in `diag<-`(`*tmp*`, value = 0) :
only matrix diagonals can be replaced

If you could help I would really appreciate it.
Thanks
Chris

The use of Co-embedding function

Hi, sorry to bother you. I have a question about running the co-embedding function. If I have not run Harmony or another method to remove batch effects from the snRNA-seq and snATAC-seq datasets, what weight.reduction value should I use?

Do you have any suggestions? Any help would be greatly appreciated.

GRNSpatialPlot

Hello scMEGA team,

Thank you for developing this tool.
I am trying to plot my samples with GRNSpatialPlot, but the images come out stretched and deformed. The same happens with the Seurat command SpatialFeaturePlot, but there I just added crop=F and it was fixed.
For GRNSpatialPlot I didn't find a similar option.

Best,
Debora

Error in Running SelectTFs

Hi, sorry to bother you. When I run the SelectTFs function, I get the following error: Error in cor.test.default(mat1[x, ], mat2[x, ]) : not enough finite observations.
I don't know what went wrong. Any suggestions?
Thanks!
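For reference, base R's `cor.test()` with the default Pearson method raises this message when fewer than three complete (non-NA) pairs remain for a row. A minimal sketch that reproduces the message on toy data and counts usable pairs per row; `mat1` and `mat2` reuse the names from the error message, but their contents here are made up, and where the NAs come from in a real run is not something this sketch can tell.

```r
# Toy matrices: rows are TFs, columns are trajectory bins (values are made up).
mat1 <- rbind(tf1 = c(1, 2, 3, 4), tf2 = c(1, NA, NA, 2))
mat2 <- rbind(tf1 = c(2, 4, 5, 9), tf2 = c(NA, 3, 1, 5))

# Number of complete (non-NA) pairs available for each row.
n_pairs <- rowSums(!is.na(mat1) & !is.na(mat2))

# Row tf2 has a single complete pair, so cor.test() stops with the message.
err <- tryCatch(
  cor.test(mat1["tf2", ], mat2["tf2", ]),
  error = function(e) conditionMessage(e)
)

# Only test rows that have at least three complete pairs.
ok_rows <- names(n_pairs)[n_pairs >= 3]
pvals <- sapply(ok_rows, function(r) cor.test(mat1[r, ], mat2[r, ])$p.value)
```

Running the same `n_pairs` count on the real TF activity and expression matrices would show which rows have too few usable values.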

Error for Install package

Hi! Is there any other way to install the scMEGA package?
I get many package conflicts during installation, even in a newly created conda environment.
I hope to hear from you. Thanks for your time.
