scMEGA: Single-cell Multiomic Enhancer-based Gene regulAtory network inference
Home Page: https://costalab.github.io/scMEGA
License: Other
Hi,
We are very interested in scMEGA, but we ran into problems during installation, as follows:
`devtools::install_github("CostaLab/scMEGA")
Downloading GitHub repo CostaLab/scMEGA@HEAD
Skipping 6 packages not available: SummarizedExperiment, S4Vectors, GenomicRanges, IRanges, ComplexHeatmap, destiny
✔ checking for file '/home/data/ssy342/Rtmp/RtmpBelAZf/remotes1f338c5ddde3d1/CostaLab-scMEGA-205b3ca/DESCRIPTION' ...
─ preparing 'scMEGA':
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building 'scMEGA_0.2.0.tar.gz'
Installing package into '/home/data/ssy342/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
ERROR: dependency 'destiny' is not available for package 'scMEGA'
Warning message:
In i.p(...) :
installation of package '/home/data/ssy342/Rtmp/RtmpBelAZf/file1f338c54f19dd1/scMEGA_0.2.0.tar.gz' had non-zero exit status`
When we try to install using devtools::install_local("./scMEGA-0.2.0.tar.gz"), we get the same error.
How can we solve this problem? We hope you can help.
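A possible starting point, not confirmed by the maintainers: the six packages skipped in the log above are distributed through Bioconductor rather than CRAN, so devtools cannot fetch them on its own. The sketch below checks which of them are missing and installs them with BiocManager before retrying. `missing_pkgs` is a hypothetical helper, and whether `destiny` is available depends on the Bioconductor release that matches your R version, which is an assumption here.

```r
# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The dependencies listed as "not available" in the log above.
bioc_deps <- c("SummarizedExperiment", "S4Vectors", "GenomicRanges",
               "IRanges", "ComplexHeatmap", "destiny")

# Only attempt the installation in an interactive session (needs network access).
if (interactive()) {
  to_install <- missing_pkgs(bioc_deps)
  if (length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(to_install)
  }
  devtools::install_github("CostaLab/scMEGA")
}
```

If `destiny` still fails to install, the problem is probably with that package for your R/Bioconductor combination rather than with scMEGA itself.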
Hi! Congratulations on the paper and the package, it looks like it's going to be very useful.
I would like to run scMEGA with my 10X multiome data, and we normally do the RNA analysis with Seurat, and ATAC with ArchR. Your tutorial has a Seurat object for ATAC (probably generated with Signac). Could you please add specific information on how to generate the Seurat object from an ArchR project, rather than Signac? We prefer ArchR for ATAC analysis and I can't find an ArchR function to generate a Seurat object with ATAC peaks/counts.
Thank you so much!
Hi!
I would like to know how to obtain the gene activity matrix used in the example from ArchR. When I used the gene activity matrix function,
I got a different object, like this:
gene.activity
class: SummarizedExperiment
dim: 14103 6791
metadata(0):
assays(1): GeneScoreMatrix
rownames: NULL
rowData names(6): seqnames start ... name idx
colnames(6791): #AAACGAAAGGATCCTT-1
#AAACGAAAGGCATGCA-1 ... #TTTGTGTTCGATCGCG-1
#TTTGTGTTCGATCTTT-1
colData names(15): DoubletEnrichment DoubletScore ... Clusters Celltype
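One thing that stands out in the printout above is `rownames: NULL`, while `rowData` has a `name` column. A minimal sketch, assuming the downstream steps expect a plain gene-by-cell matrix with gene symbols as row names: extract the assay from the SummarizedExperiment and label its rows from `rowData(...)$name`. `gene_score_matrix` is a hypothetical helper, not part of ArchR or scMEGA.

```r
library(SummarizedExperiment)

# Turn the SummarizedExperiment shown above into a gene-by-cell matrix
# whose row names are the gene symbols stored in rowData(...)$name.
gene_score_matrix <- function(se, assay_name = "GeneScoreMatrix") {
  mat <- assay(se, assay_name)
  rownames(mat) <- rowData(se)$name
  mat
}
```

With the object from the post this would be called as `gene.activity <- gene_score_matrix(gene.activity)`. The column names may also need to match the cell names of the ATAC Seurat object.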
Thanks for creating this wonderful tool! In the vignettes, you add the trajectory before selecting TFs. In my case, I'm not as interested in comparing between cell types but rather a control and treated group. Is this possible and if so, how would I go about doing that? Am I able to skip the trajectory step?
Hi! Thanks for this great work!
My data is a merge of three time periods and contains many cells, so the following step ran for a very long time without finishing:
df.pair <- PairCells(object = coembed.sub, reduction = "harmony",pair.by = "tech", ident1 = "ATAC", ident2 = "RNA")
Getting dimensional reduction data for pairing cells...
Pairing cells using geodesic mode...
Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
KNN subgraphs detected: 1
Skipping subgraphs with either ATAC/RNA cells fewer than: 50
Pairing cells for subgraph No.1
Total ATAC cells in subgraph: 26748
Total RNA cells in subgraph: 29030
Subgraph size: 26748
Search threshold being used: 10700
Constructing KNN based on geodesic distance to reduce search pairing search space
Number of cells being paired: 26748 ATAC and 26748 RNA cells
Determing pairs through optimized bipartite matching ..
I would like to know whether I can compute obj.pair for each period first and then merge the results.
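A rough sketch of the bookkeeping for that idea, under the untested assumption that pairing each period separately gives acceptable matches: run PairCells() on each period's subset, then stack the resulting tables. `combine_pairs` is a hypothetical helper, and the toy tables below only stand in for PairCells() output; their column names (`ATAC`, `RNA`) are assumed.

```r
# Hypothetical helper: stack per-period pairing tables into one data frame
# and record which period each pair came from.
combine_pairs <- function(pair_list) {
  stopifnot(!is.null(names(pair_list)))
  tagged <- Map(function(df, period) {
    df$period <- period
    df
  }, pair_list, names(pair_list))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

# Toy input standing in for the output of PairCells() run on each period.
pairs <- list(
  early = data.frame(ATAC = c("a1", "a2"), RNA = c("r1", "r2")),
  late  = data.frame(ATAC = "a3", RNA = "r3")
)
df.pair <- combine_pairs(pairs)
```

Each subset has far fewer cells than the merged object, which should shrink the bipartite matching problem, but cells can then only be paired within a period.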
Hi!
I noticed that for SelectGenes
the available genomes are: hg19, hg38, mm9, and mm10.
Can I use the genomes of other species?
If so, how exactly should I do it?
Hello there,
I'm trying to run scMEGA on 10X multiome RNA+ATAC data obtained from rat tissue. When I run:
res <- SelectTFs(object = objG,
tf.assay = "chromvar",
rna.assay = "RNA",
atac.assay = "ATAC",
trajectory.name = "Trajectory",
return.heatmap = TRUE,
cor.cutoff = 0.1)
I get:
Warning: No layers found matching search pattern provided
Error in GetAssayData():
! No layers are found
Here is the backtrace:
Backtrace:
▆
├─SeuratObject::GetAssayData(object, assay = assay, slot = slot)
└─SeuratObject:::GetAssayData.Seurat(object, assay = assay, slot = slot)
├─SeuratObject::GetAssayData(object = object[[assay]], layer = layer)
└─SeuratObject:::GetAssayData.StdAssay(object = object[[assay]], layer = layer)
└─rlang::abort("No layers are found")
Can you help me, please? Is it because I'm working with the rat genome? Or is it something else entirely?
Best
David
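The backtrace above ends in GetAssayData.StdAssay, which belongs to the Seurat v5 assay classes, so the failure appears to be about a missing layer rather than the rat genome. A small diagnostic sketch, assuming SeuratObject 5.0.0 or later is installed: list the layers each assay actually holds and compare them with what the function requests. `list_assay_layers` is a hypothetical helper.

```r
# Hypothetical diagnostic: report which layers each assay holds.
# Requires SeuratObject >= 5.0.0 for Layers().
list_assay_layers <- function(object) {
  assays <- SeuratObject::Assays(object)
  stats::setNames(
    lapply(assays, function(a) SeuratObject::Layers(object[[a]])),
    assays
  )
}
```

Calling `list_assay_layers(objG)` should show whether the `RNA`, `ATAC`, and `chromvar` assays each contain a `data` layer. If one does not, normalizing that assay first may be enough, though this has not been verified against scMEGA.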
With the new version of Seurat, some functions no longer work as expected, for example GetTrajectory.
Dear author,
I noticed that the script for cleaning the data and preparing the necessary objects is at https://costalab.ukaachen.de/open_data/scMEGA/Fibroblast/01_prepare_data.html. However, I could not find the project and the required folder ("../../../VisiumHeartRevision/IntegrativeAnalysis/Fibroblast/data/snATAC") referenced in the tutorial. Where can I find them?
Thank you.
Hi! I was running into an issue when selecting TFs in my mouse data, because the motif names did not match many genes in the gene exp data. I ended up altering the SelectTFs and GetTFGeneCorrelation code to
rownames(trajMM) <- stringr::str_to_title(object@assays[[atac.assay]]@[email protected])
(although maybe a biomaRt matching or something would be more accurate) and am getting better results, so I thought I'd note it!
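For readers who want to try the same renaming without stringr, here is a base R sketch of the idea: convert upper-case motif names to the capitalized form typical of mouse gene symbols, then check how many match the expression data. `to_mouse_symbol` and the toy vectors are hypothetical. Not every mouse symbol follows this pattern, so an ortholog table remains the more accurate route, as noted above.

```r
# Convert upper-case motif names (e.g. "RUNX1") to mouse-style symbols ("Runx1").
to_mouse_symbol <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

# Toy data: motif names from the TF assay and gene names from the RNA assay.
motif_names <- c("RUNX1", "ATF4", "NFE2L2")
gene_names  <- c("Runx1", "Atf4", "Nfe2l2", "Col1a1")

# Motifs that now have a matching gene in the expression data.
matched <- intersect(to_mouse_symbol(motif_names), gene_names)
```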
Hi
I am trying to run
pbmc.t.cells <- AddTargetAssay(object = pbmc.t.cells, df.grn = df.grn2)
Instead of receiving the warning shown in the vignettes:
Warning in if (is.na(df.grn)) {: the condition has length > 1 and only the first element will be used
The program is returning:
Error in if (is.na(df.grn)) { : the condition has length > 1
Do you have any idea what is going on?
Thank you,
Debora
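Some background that likely explains the difference: since R 4.2.0, an `if` condition of length greater than one is an error instead of a warning, and `is.na()` on a data frame returns one value per cell. The sketch below shows a check that stays length one for any input. `is_missing_arg` is a hypothetical helper illustrating the pattern, not the fix used in scMEGA.

```r
# Hypothetical replacement for `if (is.na(df.grn))`: treat the argument as
# missing only when it is NULL or a single NA, never when it is a data frame.
is_missing_arg <- function(x) {
  is.null(x) || (is.atomic(x) && length(x) == 1 && is.na(x))
}

df.grn <- data.frame(tf = "ATF4", gene = "KIT", weights = 0.81)
is_missing_arg(df.grn)  # FALSE
is_missing_arg(NA)      # TRUE
```

Until the package code is updated, running the vignette under an R version older than 4.2.0 would reproduce the warning instead of the error.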
Hi!
I noticed that I needed to use $Harmony in the tutorial to prepare the data
#add dimension reduced matrix
harmony_matDR <- proj@reducedDims$Harmony$matDR
but I only have $UMAP in my data, because my samples are not duplicated.
In this case, can I still use this tutorial? How can I get the final obj.atac?
Dear scMEGA team,
Thank you for developing this useful package.
I am working with Arabidopsis datasets - GSE173834 for the scATAC data and my own scRNA-seq data. When I integrated them, at the step of SelectTFs I got this error
Creating Trajectory Group Matrix..
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
When I try to debug, it looks like, under GetTrajectory, Matrix::rowMeans fails. However, the cell_names has multiple cell names, so this should not be a vector. I will be grateful if you could help me on that. Please let me know what details you need, I will share.
Regards,
Rahul
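One common way to hit this message, offered as a guess rather than a confirmed diagnosis of GetTrajectory: subsetting a matrix to a single column returns a plain vector, and `rowMeans()` then fails. This would happen if one trajectory bin contained exactly one cell. The base R sketch below reproduces the situation on a toy matrix and shows the `drop = FALSE` guard.

```r
# Toy gene-by-cell matrix.
mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))

# A group that happens to contain a single cell.
one_cell <- "c1"

# mat[, one_cell] is a plain vector (no dim), so rowMeans() would fail on it.
is.null(dim(mat[, one_cell]))

# drop = FALSE keeps the matrix shape even for a single column.
group_mean <- rowMeans(mat[, one_cell, drop = FALSE])
```

Checking how many cells fall into each trajectory bin would show whether this applies to your data.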
Hi!
I use obj.pair to look at the expression of a marker gene.
I want to know which values are plotted here: the gene expression matrix from scRNA, the gene score matrix from scATAC, or expression values produced by the integration algorithm?
obj.pair
An object of class Seurat
67334 features across 5396 samples within 2 assays
Active assay: RNA (17352 features, 0 variable features)
1 other assay present: ATAC
4 dimensional reductions calculated: pca, umap, harmony, umap_harmony
p <- FeaturePlot(obj.pair, features = c("COL1A1"),reduction ="umap_harmony",min.cutoff = "q10", max.cutoff = "q90")
Hi,
Thanks for maintaining a really nice R package. I am trying to use the CoembedData function as follows:
obj.coembed <- CoembedData(
RNA,
ATAC,
gene.activities,
weight.reduction = "umap",
verbose = T
)
with the following Seurat objects:
RNA
An object of class Seurat
29875 features across 4224 samples within 2 assays
Active assay: integrated (2000 features, 2000 variable features)
2 layers present: data, scale.data
1 other assay present: RNA
2 dimensional reductions calculated: pca, umap
ATAC
An object of class Seurat
149666 features across 5309 samples within 2 assays
Active assay: peaks (110042 features, 110041 variable features)
2 layers present: counts, data
1 other assay present: RNA
2 dimensional reductions calculated: lsi, umap
but I get the following error messages:
Performing data integration using Seurat...
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Centering and scaling data matrix
|====================================================================================================| 100%
Running CCA
Merging objects
Finding neighborhoods
Finding anchors
Found 10350 anchors
Filtering anchors
Retained 2825 anchors
Warning: Please provide a matrix that has the same number of columns as the number of reference cells used in anchor finding.
Number of columns in provided matrix : 2976
Number of columns required : 4224
Skipping element 1.
Error: None of the provided refdata elements are valid.
In addition: Warning messages:
1: In LayerData.Assay5(object = assays[[i]], layer = lyr, fast = TRUE) :
multiple layers are identified by counts.1 counts.2
only the first layer is used
2: In LayerData.Assay5(object = object[[assay]], layer = layer, ...) :
multiple layers are identified by data.1 data.2
only the first layer is used
I think this error likely stems from some missing metadata processing step shown in the pre-processing script. However, I am unable to follow along with the data processing script because I am processing my ATAC data with Signac, not ArchR (I am working with plant datasets and have been unsuccessful generating the input genome objects required by ArchR). Do you think this is the most likely cause of this error - or am I missing some other crucial component. Thank you very much for your help, please let me know if you need additional info, I'll include my sessionInfo below.
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scMEGA_1.0.2 rtracklayer_1.60.1 hdf5r_1.3.8 GenomicFeatures_1.52.2 AnnotationDbi_1.62.2
[6] Biobase_2.60.0 GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2
[11] BiocGenerics_0.46.0 patchwork_1.1.3 readr_2.1.4 readxl_1.4.3 tidyr_1.3.0
[16] ggplot2_3.4.4 dplyr_1.1.3 magrittr_2.0.3 Signac_1.11.0 Seurat_4.9.9.9081
[21] SeuratObject_4.9.9.9093 sp_2.1-1
loaded via a namespace (and not attached):
[1] destiny_3.14.0 matrixStats_1.0.0 spatstat.sparse_3.0-2 bitops_1.0-7
[5] httr_1.4.7 RColorBrewer_1.1-3 doParallel_1.0.17 tools_4.3.1
[9] sctransform_0.4.1 utf8_1.2.3 R6_2.5.1 lazyeval_0.2.2
[13] uwot_0.1.16 withr_2.5.1 prettyunits_1.2.0 gridExtra_2.3
[17] progressr_0.14.0 factoextra_1.0.7 cli_3.6.1 spatstat.explore_3.2-3
[21] fastDummies_1.7.3 labeling_0.4.3 robustbase_0.99-0 spatstat.data_3.0-1
[25] proxy_0.4-27 ggridges_0.5.4 pbapply_1.7-2 Rsamtools_2.16.0
[29] R.utils_2.12.2 parallelly_1.36.0 TTR_0.24.3 rstudioapi_0.15.0
[33] RSQLite_2.3.1 generics_0.1.3 BiocIO_1.10.0 ica_1.0-3
[37] spatstat.random_3.1-6 vroom_1.6.4 car_3.1-2 Matrix_1.6-1.1
[41] fansi_1.0.5 abind_1.4-5 R.methodsS3_1.8.2 lifecycle_1.0.3
[45] scatterplot3d_0.3-44 yaml_2.3.7 carData_3.0-5 SummarizedExperiment_1.30.2
[49] BiocFileCache_2.8.0 Rtsne_0.16 grid_4.3.1 blob_1.2.4
[53] promises_1.2.1 crayon_1.5.2 miniUI_0.1.1.1 lattice_0.20-41
[57] cowplot_1.1.1 KEGGREST_1.40.1 pillar_1.9.0 boot_1.3-28
[61] rjson_0.2.21 future.apply_1.11.0 codetools_0.2-18 fastmatch_1.1-4
[65] leiden_0.4.3 glue_1.6.2 pcaMethods_1.92.0 data.table_1.14.8
[69] remotes_2.4.2.1 vcd_1.4-11 vctrs_0.6.4 png_0.1-8
[73] spam_2.9-1 cellranger_1.1.0 gtable_0.3.4 cachem_1.0.8
[77] S4Arrays_1.0.6 mime_0.12 tidygraph_1.2.3 RcppEigen_0.3.3.9.3
[81] survival_3.5-5 SingleCellExperiment_1.22.0 RcppRoll_0.3.0 pheatmap_1.0.12
[85] iterators_1.0.14 ellipsis_0.3.2 fitdistrplus_1.1-11 ROCR_1.0-11
[89] nlme_3.1-162 xts_0.13.1 bit64_4.0.5 progress_1.2.2
[93] filelock_1.0.2 RcppAnnoy_0.0.21 rprojroot_2.0.3 irlba_2.3.5.1
[97] KernSmooth_2.23-20 colorspace_2.1-0 DBI_1.1.3 nnet_7.3-18
[101] smoother_1.1 tidyselect_1.2.0 processx_3.8.2 bit_4.0.5
[105] compiler_4.3.1 curl_5.1.0 xml2_1.3.5 desc_1.4.2
[109] DelayedArray_0.26.7 plotly_4.10.2 scales_1.2.1 hexbin_1.28.3
[113] DEoptimR_1.1-3 lmtest_0.9-40 callr_3.7.3 rappdirs_0.3.3
[117] stringr_1.5.0 digest_0.6.33 goftest_1.2-3 spatstat.utils_3.0-3
[121] XVector_0.40.0 htmltools_0.5.6.1 pkgconfig_2.0.3 MatrixGenerics_1.12.3
[125] dbplyr_2.3.4 fastmap_1.1.1 ggthemes_4.2.4 rlang_1.1.1
[129] htmlwidgets_1.6.2 shiny_1.7.5.1 farver_2.1.1 zoo_1.8-12
[133] jsonlite_1.8.7 BiocParallel_1.34.2 R.oo_1.25.0 RCurl_1.98-1.12
[137] GenomeInfoDbData_1.2.10 dotCall64_1.1-0 munsell_0.5.0 Rcpp_1.0.11
[141] viridis_0.6.4 reticulate_1.34.0 stringi_1.7.12 ggraph_2.1.0
[145] zlibbioc_1.46.0 MASS_7.3-58.3 plyr_1.8.9 pkgbuild_1.4.2
[149] parallel_4.3.1 listenv_0.9.0 ggrepel_0.9.4 deldir_1.0-9
[153] graphlayouts_1.0.1 Biostrings_2.68.1 splines_4.3.1 tensor_1.5
[157] hms_1.1.3 ps_1.7.5 ranger_0.15.1 igraph_1.5.1
[161] spatstat.geom_3.2-7 RcppHNSW_0.5.0 reshape2_1.4.4 biomaRt_2.56.1
[165] XML_3.99-0.14 laeken_0.5.2 tweenr_2.0.2 tzdb_0.4.0
[169] foreach_1.5.2 httpuv_1.6.11 VIM_6.2.2 RANN_2.6.1
[173] purrr_1.0.2 polyclip_1.10-6 future_1.33.0 scattermore_1.2
[177] ggforce_0.4.1 xtable_1.8-4 restfulr_0.0.15 e1071_1.7-13
[181] RSpectra_0.16-1 later_1.3.1 viridisLite_0.4.2 class_7.3-21
[185] tibble_3.2.1 memoise_2.0.1 GenomicAlignments_1.36.0 cluster_2.1.4
[189] ggplot.multistats_1.0.0 globals_0.16.2
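A possible reading of the CoembedData output above, stated as an assumption: the warnings report split layers (`data.1`, `data.2`) with only the first one used, which would explain why 2976 columns were supplied when 4224 reference cells were expected. The sketch below joins the split layers of one assay before co-embedding. `join_assay_layers` is a hypothetical helper, and `JoinLayers()` requires SeuratObject 5.0.0 or later.

```r
# Hypothetical helper: collapse split layers (counts.1, counts.2, ...) of one
# assay into single layers. Requires SeuratObject >= 5.0.0.
join_assay_layers <- function(object, assay = "RNA") {
  object[[assay]] <- SeuratObject::JoinLayers(object[[assay]])
  object
}
```

With the objects from the post this would be `RNA <- join_assay_layers(RNA, "RNA")`, followed by rerunning CoembedData. Whether the active `integrated` assay also needs to be switched back to `RNA` has not been checked.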
Hi! Thanks for your work.
I've got the object to build the network on
head(df.grn2)
tf gene weights
77 ATF4 GABARAPL1 0.8221601
85 ATF4 GRIK3 0.8021985
100 ATF4 KIT 0.8130254
102 ATF4 KRAS 0.8125377
166 ATF4 SEC22B 0.8518962
223 ATF6 ACAP3 0.8103851
netobj <- graph_from_data_frame(df.grn2, directed = TRUE)
V(netobj)$type <- ifelse(V(netobj)$name %in% df.grn2$tf, "TF/Gene", "Gene")
Looking at the object, it looks normal
netobj
IGRAPH 54cba5f DN-B 355 788 --
p <- NetCentPlot(netobj, "RUNX1")
Error in layout_with_focus(graph, v = focus, weights = weights, iter = niter, :
g must be a connected graph.
I checked netobj
is_connected(netobj, mode = "weak")
[1] FALSE
I don't know how to get the final NetCentPlot. Can you offer any help?
Thanks a lot!
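Since the error says the graph must be connected and `is_connected()` returns FALSE, one workaround is to plot only the weakly connected component that contains the focus TF. The igraph sketch below does this on a toy network. `component_of` is a hypothetical helper, and it has not been tested together with NetCentPlot.

```r
library(igraph)

# Keep only the weakly connected component that contains the focus node.
component_of <- function(graph, focus) {
  comp <- components(graph, mode = "weak")
  focus_id <- comp$membership[match(focus, V(graph)$name)]
  keep <- V(graph)$name[comp$membership == focus_id]
  induced_subgraph(graph, vids = keep)
}

# Toy network: A -> B -> C forms one component, X -> Y another.
edges <- data.frame(tf = c("A", "B", "X"), gene = c("B", "C", "Y"))
g <- graph_from_data_frame(edges, directed = TRUE)

sub <- component_of(g, "A")
is_connected(sub, mode = "weak")
```

Applied to the post, this would be `NetCentPlot(component_of(netobj, "RUNX1"), "RUNX1")`. Nodes outside that component are dropped from the plot.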
I am trying your vignette with a Seurat object that combines scRNA and scATAC, but I get the following error message when I run PairCells(). The scMEGA part of my code is below:
coembed.sub <- RunDiffusionMap(coembed_harmon2, reduction = "harmony")
cols <- ArchR::paletteDiscrete([email protected][, "clusters_merge"])
p1 <- DimPlot(coembed.sub, group.by = "clusters_merge", label = TRUE,
reduction = "dm", shuffle = TRUE, cols = cols) +
xlab("DC 1") + ylab("DC 2")
p1
p2 <- DimPlot(coembed.sub, group.by = "cm_clusters", label = TRUE,
reduction = "dm", shuffle = TRUE, cols = cols) +
xlab("DC 1") + ylab("DC 2")
p2
DimPlot(coembed.sub, reduction = "dm",
group.by = "clusters_merge", split.by = "assay", cols = cols)
DimPlot(coembed.sub, reduction = "dm",
group.by = "cm_clusters", split.by = "assay", cols = cols)
df.pair <- PairCells(object = coembed.sub, reduction = "harmony",
pair.by = "assay", ident1 = "ATAC", ident2 = "RNA")
Getting dimensional reduction data for pairing cells...
Pairing cells using geodesic mode...
Constructing KNN graph for computing geodesic distance ..
Error in `diag<-`(`*tmp*`, value = 0) :
only matrix diagonals can be replaced
If you could help I would really appreciate it.
Thanks
Chris
Hi, sorry to bother you. I have a question about running Co-embedding(). If I have not run Harmony or another batch-correction method on my snRNA-seq and snATAC-seq datasets, what weight.reduction value should I use?
Do you have any suggestions? Any help would be greatly appreciated.
Hello scMEGA team,
Thank you for developing this tool.
I am trying to plot my samples with GRNSpatialPlot, but the images come out stretched and deformed. The same happens with the Seurat command SpatialFeaturePlot, but there adding crop=F fixed it.
For GRNSpatialPlot I didn't find a similar option.
Best,
Debora
Hi, sorry to bother you. When I run the SelectTFs function, I get the following error: Error in cor.test.default(mat1[x, ], mat2[x, ]) : not enough finite observations.
I don't know what went wrong, any suggestions?
Thanks!
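For context, `cor.test()` with the default Pearson method stops with this message when fewer than three paired finite values remain, so at least one TF row probably consists mostly of NA, NaN, or Inf values. A base R sketch for spotting such rows before the correlation is computed follows. `rows_testable` and the toy matrices are hypothetical and only mirror the `mat1`/`mat2` names in the error.

```r
# Flag rows where both matrices have at least `min_n` paired finite values;
# cor.test() with the Pearson method needs at least three.
rows_testable <- function(mat1, mat2, min_n = 3) {
  rowSums(is.finite(mat1) & is.finite(mat2)) >= min_n
}

# Toy matrices: tf1 has four usable pairs, tf2 only one.
mat1 <- rbind(tf1 = c(1, 2, 3, 4), tf2 = c(NA, NA, 1, 2))
mat2 <- rbind(tf1 = c(2, 1, 4, 3), tf2 = c(1, 2, 3, NaN))

keep <- rows_testable(mat1, mat2)
```

Rows where `keep` is FALSE are the ones that would trigger the error. Inspecting them in the TF activity and expression matrices may show where the non-finite values come from.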
Hi! Is there any other way to install the scMEGA package?
I run into many package conflicts during installation, even when I create a new conda environment.
I hope to hear from you. Thanks for your time.