immunogenomics / harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
Home Page: https://portals.broadinstitute.org/harmony/
License: Other
Dear all,
I'd appreciate your suggestions on the following scRNA-seq analysis case with Harmony / Seurat 3.1. The question is: which of the analysis strategies described below would you recommend?
Suppose we have 4 batches of scRNA-seq data from these experiments:
WT_batch1, WT_batch2, A_batch1, A_batch2
WT_batch3, WT_batch4, B_batch3, B_batch4
What is the optimal way to analyze these datasets? Several analysis strategies are possible:
STRATEGY A.
WT_batch1, WT_batch2 : to produce WT_batch_1_2
A_batch1, A_batch2 : to produce A_batch_1_2
WT_batch3, WT_batch4 : to produce WT_batch_3_4
B_batch3, B_batch4 : to produce B_batch_3_4
STRATEGY B.
Use the Seurat merge function to combine all the raw data (WT_batch1, WT_batch2, A_batch1, A_batch2, WT_batch3, WT_batch4, B_batch3, B_batch4) into one large matrix.
Could I then apply Harmony across all the experiments with those 2 replicates, and afterwards, how could I call the Seurat function FindMarkers (FindMarkers(object, ident.1, ident.2)) so as to specify the replicates in ident.1 and ident.2?
Any other analysis strategy? Any suggestions or comments would be very welcome. Thanks a lot!
bogdan
Hi.
First of all, thank you for this nice software.
I'm trying to use the package to integrate the following datasets:
I want to integrate all of them and explore condition 1 vs condition 2 in the resulting clusters.
Following the initial steps described in the Seurat V3 Interface, I need to join all the count matrices. The problem I'm facing is that the datasets have different numbers of rows (features). The 10x datasets don't have this problem, as they all have the same number. However, the inDrop datasets differ (both from the 10x datasets and between conditions).
What is the recommended way to proceed? Should I use a merged Seurat object instead?
Many thanks in advance.
Best,
Victor.
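One possible way to line up matrices with different feature sets, sketched in base R under the assumption that rows are named genes and columns are cells: keep only the genes shared by all datasets before column-binding. The toy matrices and names below are invented for illustration. This is not a recommendation from the Harmony authors, and dropping the non-shared genes may or may not suit a given analysis.

```r
# Two toy count matrices with partly different gene sets (hypothetical data).
tenx <- matrix(1:6, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
indrop <- matrix(1:8, nrow = 4,
                 dimnames = list(c("g2", "g3", "g4", "g5"), c("cell3", "cell4")))

# Genes present in both datasets.
shared_genes <- intersect(rownames(tenx), rownames(indrop))

# Subset by name so rows are in the same order, then bind the cells together.
combined <- cbind(tenx[shared_genes, , drop = FALSE],
                  indrop[shared_genes, , drop = FALSE])

print(shared_genes)
print(dim(combined))
```

Subsetting by name rather than by position keeps the rows aligned even when the two matrices list their genes in a different order.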
Hi,
I am trying to use harmony within a jupyter notebook. For the meta_data variable, I have a python factor stored but when I run the following (substituted with my variable names):
HarmonyMatrix(pc_mat, meta_data, vars_use, theta=4)
I get an error saying:
Error in if (ncol(pc_mat) != N) { : argument is of length zero
The issue is that when N <- nrow(meta_data) is run in the function HarmonyMatrix, it returns 0.
Could you confirm the format that is expected by the variable meta_data? Should it be a factor? If so, shouldn't the check in HarmonyMatrix be N <- length(meta_data)?
I hope this makes sense!
Thanks,
Catherine
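A short base R sketch of why a check like this can fail, assuming meta_data reaches R as a plain factor (the variable names are made up for illustration). nrow() only reports a count for objects that have a dim attribute, so on a factor it returns NULL, and comparing a number against NULL gives a zero-length logical, which is what if() rejects with "argument is of length zero". Wrapping the factor in a data frame is one way to obtain a row count; whether HarmonyMatrix expects exactly this input is not confirmed in the thread.

```r
# Hypothetical batch labels, standing in for the factor passed from Python.
batch <- factor(c("run1", "run1", "run2", "run2"))

# A factor has no dim attribute, so nrow() returns NULL rather than a count.
n_from_factor <- nrow(batch)

# Comparing a number with NULL yields logical(0); if (logical(0)) raises
# "argument is of length zero".
comparison <- 4 != n_from_factor

# Wrapping the factor in a one-column data frame gives a usable row count.
meta_df <- data.frame(batch = batch)
n_from_df <- nrow(meta_df)

print(is.null(n_from_factor))
print(length(comparison))
print(n_from_df)
```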
I can't open the page https://github.com/immunogenomics/harmony2019
Has this page moved to another place?
Hi,
I was wondering why you don't include filtering of cells (e.g. nGene, % mitochondrial genes) in your example Seurat workflow?
Thanks
Catherine
Thanks for this excellent software.
I am trying to apply Harmony to my dataset, which has at least two covariates to handle.
I was wondering how to choose suitable theta values for different covariates, and how to interpret this parameter.
When I run 'RunHarmony' on a Seurat object, it returns a warning message:
Quick-TRANSfer stage steps exceeded maximum (= 6687600)
What is going on here? Will this affect the final result?
I am trying to install Harmony and it is running into the following error
> install_github("immunogenomics/harmony")
Downloading GitHub repo immunogenomics/harmony@master
√ checking for file 'C:\Users\vg272\AppData\Local\Temp\RtmpiyxWkm\remotes1f94490d690a\immunogenomics-harmony-1a6d77a/DESCRIPTION' ...
- preparing 'harmony':
√ checking DESCRIPTION meta-information ...
- cleaning src
- running 'cleanup.win'
- checking for LF line-endings in source and make files and shell scripts
- checking for empty or unneeded directories
- looking to see if a 'data/datalist' file should be added
- building 'harmony_1.0.tar.gz'
Warning: file 'harmony/cleanup' did not have execute permissions: corrected
Warning: file 'harmony/configure' did not have execute permissions: corrected
Installing package into ‘C:/Users/vg272/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package 'harmony' ...
** using staged installation
** libs
C:/Rtools/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG -I"C:/Users/vg272/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/vg272/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/vg272/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c RcppExports.cpp -o RcppExports.o
C:/Rtools/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-36~1.1/include" -DNDEBUG -I"C:/Users/vg272/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/vg272/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/vg272/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c harmony.cpp -o harmony.o
harmony.cpp: In member function 'void harmony::init_cluster_cpp(unsigned int)':
harmony.cpp:69:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (C > 0 && C < K) {
^
harmony.cpp: In member function 'CUBETYPE harmony::moe_ridge_get_betas_cpp()':
harmony.cpp:234:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (unsigned k = 0; k < K; k++) {
^
C:/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o harmony.dll tmp.def RcppExports.o harmony.o -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lRlapack -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lRblas -lgfortran -lm -lquadmath -LC:/PROGRA~1/R/R-36~1.1/bin/x64 -lR
installing to C:/Users/vg272/Documents/R/win-library/3.6/00LOCK-harmony/00new/harmony/libs/x64
** R
** data
** byte-compile and prepare package for lazy loading
Error: (converted from warning) package 'Rcpp' was built under R version 3.6.2
Execution halted
ERROR: lazy loading failed for package 'harmony'
* removing 'C:/Users/vg272/Documents/R/win-library/3.6/harmony'
Error: Failed to install 'harmony' from GitHub:
(converted from warning) installation of package ‘C:/Users/vg272/AppData/Local/Temp/RtmpiyxWkm/file1f9445a26efc/harmony_1.0.tar.gz’ had non-zero exit status
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.2.1 usethis_1.5.1 RcppArmadillo_0.9.800.3.0
[4] RcppAnnoy_0.0.14 Rcpp_1.0.3 monocle3_0.2.0
[7] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.2
[10] BiocParallel_1.20.1 matrixStats_0.55.0 GenomicRanges_1.38.0
[13] GenomeInfoDb_1.22.0 IRanges_2.20.1 S4Vectors_0.24.1
[16] Biobase_2.46.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] viridis_0.5.1 pkgload_1.0.2 viridisLite_0.3.0 gtools_3.8.1
[5] assertthat_0.2.1 GenomeInfoDbData_1.2.2 remotes_2.1.0 sessioninfo_1.1.1
[9] backports_1.1.5 pillar_1.4.3 lattice_0.20-38 glue_1.3.1
[13] digest_0.6.23 RColorBrewer_1.1-2 XVector_0.26.0 colorspace_1.4-1
[17] Matrix_1.2-17 plyr_1.8.5 pkgconfig_2.0.3 zlibbioc_1.32.0
[21] purrr_0.3.3 scales_1.1.0 processx_3.4.1 RANN_2.6.1
[25] gdata_2.18.0 tibble_2.1.3 ggplot2_3.2.1 ellipsis_0.3.0
[29] withr_2.1.2 ROCR_1.0-7 lazyeval_0.2.2 cli_2.0.1
[33] magrittr_1.5 crayon_1.3.4 ps_1.3.0 memoise_1.1.0
[37] fs_1.3.1 fansi_0.4.1 MASS_7.3-51.4 gplots_3.0.1.1
[41] pkgbuild_1.0.6 prettyunits_1.1.0 tools_3.6.1 lifecycle_0.1.0
[45] stringr_1.4.0 munsell_0.5.0 callr_3.4.0 compiler_3.6.1
[49] caTools_1.17.1.3 rlang_0.4.2 grid_3.6.1 RCurl_1.95-4.12
[53] rstudioapi_0.10 bitops_1.0-6 testthat_2.3.1 gtable_0.3.0
[57] codetools_0.2-16 curl_4.3 reshape2_1.4.3 R6_2.4.1
[61] gridExtra_2.3 dplyr_0.8.3 rprojroot_1.3-2 desc_1.2.0
[65] KernSmooth_2.23-15 stringi_1.4.5 tidyselect_0.2.5
Please help!
Hello everyone,
I've been trying to install Harmony, however I always get an error message saying that I don't have necessary tools to compile a package.
I'm using macOS High Sierra version 10.13.6.
I have devtools version 2.0.1 installed.
The RStudio version is 1.1.463.
And I have Xcode version 10.1 installed.
Any help?
This is the error message I get:
Thank you
Hi,
I am a bit confused as to how I can save multiple Harmony reductions in a single Seurat object. I thought it required me to change the "reduction.save" parameter, but I can't get this to work.
Please could you give me an example of how to do this?
Many thanks,
Lucy
Dear Harmony Team,
Thanks for the tool!
I was running this tool on plant scRNA-seq data; however, it produced a warning when combined with Seurat.
I tried to use the "new object name", but that also fails. Thanks in advance for any help!
My code and error are given below
Regards,
Rahul
set.seed(42)
ara.data <- Read10X(data.dir = "E:/Arabidopsis datasets/Zhang/root_atlas/root_matrix/TAIR/")
ara <- CreateSeuratObject(counts = ara.data, min.cells = 3, project = "Zhang_SC")
dim(ara)
ara[["percent.mt"]] <- PercentageFeatureSet(ara, pattern = "^ATM")
ara[["percent.ct"]] <- PercentageFeatureSet(ara, pattern = "^ATC")
ara <- subset(ara, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 5 & percent.ct < 5)
VlnPlot(ara, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.ct"), ncol = 4)
dim(ara)
ara <- NormalizeData(ara, normalization.method = "LogNormalize")
ara <- FindVariableFeatures(ara, selection.method = "vst",
nfeatures = 2000, verbose = FALSE)
ara <- ScaleData(ara, verbose = FALSE)
ara <- RunPCA(ara, npcs = 30, verbose = FALSE)
ara <- RunHarmony(ara,"orig.ident",assay.use="RNA")
Harmony converged after 4 iterations
Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity
Hi, I'm trying to install harmony in an Anaconda environment using the conda skeleton cran command. Unfortunately, doing so fails unless there is at least one release for the repo. Would it be possible to add a release to make the repo compatible with conda?
This might be a naive question, but since harmony typically is not used to correct data in the original input space (i.e. gene expression) and instead works with PCA embeddings, I'm wondering if it would be appropriate to attempt to impute gene expression based off of the new embedding coordinates and the original PCA loadings (i.e. E*L, where E is an mxp matrix of m cells and p PCs, and L is a pxn matrix of loadings for the p PCs and n genes).
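The E*L product described above can be sketched in base R with prcomp() on toy data. This only illustrates the linear algebra: the data are random, the names are made up, and nothing here is Harmony's own code. Replacing E with a corrected embedding of the same shape would be the step the question proposes; whether the result is a meaningful corrected expression matrix is exactly the open question.

```r
set.seed(1)
# Toy expression matrix: 20 cells (rows) x 6 genes (columns).
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# PCA with centering only; pca$x holds embeddings, pca$rotation holds loadings.
pca <- prcomp(X, center = TRUE, scale. = FALSE)

p <- 3
E <- pca$x[, 1:p]             # m x p embeddings (cells x PCs)
L <- t(pca$rotation[, 1:p])   # p x n loadings (PCs x genes)

# Rank-p approximation in gene space: E %*% L, plus the gene means
# that prcomp() subtracted before the decomposition.
X_hat <- sweep(E %*% L, 2, pca$center, "+")

# Using all PCs recovers the original matrix up to rounding error.
X_full <- sweep(pca$x %*% t(pca$rotation), 2, pca$center, "+")

print(dim(X_hat))
print(isTRUE(all.equal(X_full, X, check.attributes = FALSE)))
```

With fewer PCs than genes, X_hat is a low-rank approximation rather than an exact reconstruction, so any variation outside the retained PCs is lost.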
Hi, I'm running into an error when trying to call HarmonyMatrix in another package:
Error in cpp_object_initializer(.self, .refClassDef, ...) :
could not find function "cpp_object_initializer"
This seems to be due to an issue with Rcpp, see RcppCore/Rcpp#168 for an explanation. Adding import(methods,Rcpp) to the NAMESPACE file fixes the problem.
Hi, it seems reduction.save does not work when using RunHarmony. I get the following error when I try to use something other than "harmony" for reduction.save.
Error in `[[.Seurat`(object, reduction) :
Cannot find 'harmony' in this Seurat object
Upon checking the code, I realised that 2 lines in RunHarmony.R are incorrect. I think in lines 326 and 334 "harmony" should be replaced by reduction.save.
Hey,
The repo for LISI is empty and I am searching for your computation of LISI in the paper. Is there a code that you can share? Do you know of a python implementation?
Thanks!
tau is set to 0 by default. I wonder if this snippet is working as intended?
Line 170 in 1a6d77a
For the toy example, we have the following:
> nclust
[1] 50
> tau
[1] 0
> nclust * tau
[1] 0
> (N_b / (nclust * tau))
half jurkat t293
Inf Inf Inf
> -(N_b / (nclust * tau))
half jurkat t293
-Inf -Inf -Inf
> -(N_b / (nclust * tau)) ^ 2
half jurkat t293
-Inf -Inf -Inf
> exp(-(N_b / (nclust * tau)) ^ 2)
half jurkat t293
0 0 0
> 1 - exp(-(N_b / (nclust * tau)) ^ 2)
half jurkat t293
1 1 1
> theta * (1 - exp(-(N_b / (nclust * tau)) ^ 2))
half jurkat t293
1 1 1
We end up with c(1, 1, 1) for theta.
I guess you intended for the default to be c(2, 2, 2) instead of c(1, 1, 1), right?
Lines 147 to 149 in 1a6d77a
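The arithmetic in the console session above can be wrapped in a small base R function to see how tau behaves. The batch sizes and the starting theta values below are made up; only the expression itself is taken from the transcript. With tau = 0 the multiplier is exactly 1, so theta comes out unchanged, which suggests the c(1, 1, 1) above reflects the theta that went in rather than an effect of tau.

```r
# Expression from the snippet under discussion.
scale_theta <- function(theta, N_b, nclust, tau) {
  theta * (1 - exp(-(N_b / (nclust * tau))^2))
}

# Hypothetical inputs: batch names follow the toy example, counts are made up.
N_b <- c(half = 3000, jurkat = 1500, t293 = 1500)
theta <- c(half = 2, jurkat = 2, t293 = 2)
nclust <- 50

# tau = 0: N_b / 0 is Inf in R, exp(-Inf) is 0, so the multiplier is exactly 1.
theta_tau0 <- scale_theta(theta, N_b, nclust, tau = 0)

# tau > 0: the multiplier falls below 1, more so for smaller batches.
theta_tau100 <- scale_theta(theta, N_b, nclust, tau = 100)

print(theta_tau0)
print(theta_tau100)
```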
Hi,
This is not an issue, but a question.
Harmony takes the PCA embeddings and works with this data. Do you know what would happen if another dimensionality reduction technique were used instead? Would this make the results unreliable, as Harmony is tailored towards PCA embeddings?
Kind regards,
Connor
I have a 10X single-cell dataset consisting of four treatments and two replicates (8 samples). There are batch effects between my samples. To do batch correction between my samples, which is ideal:
to do batch correction across all eight samples, or
to do batch correction only between biological replicates of the same treatment and use the corrected treatment-specific matrix for further analysis to compare between treatments?
Hi, thanks for the great tool! It works when I'm correcting for one variable only, but I can't make it work with multiple variables. How many iterations should I run? I tried 10 and 20, and also increased nclust to 130 or 150 (I have ~80k cells to align for some of my data), but it never reaches convergence and, upon inspection, the dimensionality reduction matrix contains only NaN. The PCA looks fine; I'm computing 50 PCs on 2500-3000 variable genes and using all of them to run Harmony.
Is there some optimal set of params to use when correcting for multiple covariates?
thank you!
When installing the package, I get the following error. Could you please help me solve it?
###################################
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'harmony'
Disclaimer: This is not an issue, rather a self-serving request for additional control over how covariates are treated during integration. I am curious if there is a plan to incorporate nested covariate structures in the model? There are occasions when an experimentalist (errrrm .... asking for a friend) did not consider the effects of separating treatment/control in separate batches. In such instances, it is desirable to remove variance between only a subset of the data points, rather than all observations. Are there any recommended strategies/tricks for tailoring harmony arguments in these situations? I have been performing separate analysis of each treatment/control (let's say different tissues) individually. The ideal situation would be to fully integrate all data into a single analysis, by effectively controlling for batch effects using a nested approach, thus conserving predetermined biological sources of variation.
Any suggestions, workarounds, or potential directions are welcome.
P.s. The ability to include numeric covariates (such as size factors) would also be nice. I promise to not overfit.
Hi, your package looks very interesting for performing data integration. I was wondering if there might be plans to submit to Bioconductor and in the process incorporate support for the SingleCellExperiment (SCE) class for more easily integrating into existing scRNA-seq workflows that rely on SCE? Thanks!
Hi,
Can the seurat v3 SCTransform module be used prior to harmony?
eg:
pbmc_harmony_integrated <- merge(pbmc1, y=c(pbmc2,pbmc3, pbmc4), add.cell.ids = c("Donor1","Donor2","Donor3","Donor4"), project = "pbmc_combined") %>% SCTransform(vars.to.regress = "percent.mt", return.only.var.genes =FALSE, verbose = TRUE) %>% RunPCA(pc.genes = [email protected], verbose = FALSE) %>% RunHarmony(c("donor.number"),plot_convergence = TRUE)
Thanks,
Shams
Dear,
Will Harmony support scanpy in the future?
The corrected expression matrix is different every time I re-run Harmony, so the results are not reproducible. It seems that there are some random factors in the Harmony algorithm. Can you fix this and add some documentation on the detailed usage?
It still refers to bioRxiv but could be updated to Nature Methods.
I tried running Harmony (latest GitHub version) on a Seurat object and got an error:
> seurat_obj = RunHarmony(object = seurat_obj, group.by.vars = "orig.ident")
Harmony 1/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 2/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 3/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 4/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 5/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 6/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 7/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 8/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 9/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 10/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in data.use %*% cell.embeddings : non-conformable arguments
Do you know what could be the problem?
Hi,
I've tried running Harmony on a dataset of cultured cell lines and patient tumour samples. There were about 26 patient cell culture lines, each representing a separate scRNA-seq experiment, as well as several tumours, each having at least one replicate scRNA-seq experiment. PCA was computed using variable genes found via the mean.var.plot method. It appears that when I run Harmony using default settings, there's a jump in the objective function. I don't think this is expected behavior, so I was wondering if you had any recommendations for diagnosing what might be going wrong here.
Here's the plot of objective function vs iterations:
Here's a PCA plot of the data prior to correction:
And the harmony coordinates after:
I've found that genes differentially expressed between bulk tumours and cell lines (found using bulk RNA-seq data) are indeed differentially expressed between the tumour and cell line cells for this scRNA data, so my hunch is that most of the differences observed between tumour and line cells are due to actual biological effects.
Thanks for any help you might be able to give.
I can give you sessionInfo details or code on request if necessary.
Grats on the paper! Well deserved.
I'm trying to use Harmony in a workflow, and I'm a sucker for replicability. However, I've now run the same data through Harmony three times, ultimately leading to three different downstream UMAP manifolds and partitions. Is there some way that I could seed whatever RNG elements Harmony uses, even if it's via a similarly ugly trick as calling np.random.seed(0) before calling scrublet to make that method deterministic?
EDIT: Adding the command I call, which may or may not be relevant: hem = HarmonyMatrix(pca, batch, theta=4, verbose=FALSE, do_pca=FALSE), on the PCA coordinates and a batch vector ported from an AnnData object.
EDIT EDIT: Found an old issue, apparently set.seed(1) helps with regards to this stuff. Any chance of the remaining 10% of the way to determinism showing up soon?
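The effect of set.seed() mentioned in the edit can be shown in base R with a randomly initialised clustering step. kmeans() is used here purely as a stand-in for any procedure that draws on R's random number generator; the sketch assumes the run-to-run variation comes from R's RNG and does not reproduce Harmony's internals. The data and function names are made up.

```r
# Toy "PCA coordinates": 100 cells x 5 PCs, generated once and then reused.
set.seed(0)
pcs <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

# Fix the RNG state immediately before the randomly initialised step.
cluster_with_seed <- function(seed) {
  set.seed(seed)
  kmeans(pcs, centers = 4, nstart = 1)$cluster
}

run_a <- cluster_with_seed(1)
run_b <- cluster_with_seed(1)

# Same seed, same data, same call: the cluster assignments match exactly.
print(identical(run_a, run_b))
```

In a notebook that calls R through a %%R cell, the set.seed() call would presumably need to sit in the same R session, before the call that uses the RNG.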
Hi,
I have a problem installing "harmony". Here is the screenshot of the error.
RStudio pops up the following window.
When I hit "Yes", it led me to the following webpage, but I still don't know what to do.
https://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/
Here is my Session info in case you need that.
Could you please give me some advice to fix the issue? Thank you so much!
Hello, I have the following error when trying to install harmony, thanks for looking over and the help!
Installing package into ‘C:/Users/Jeff's PC/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
*** arch - i386
c:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c harmony.cpp -o harmony.o
harmony.cpp: In member function 'void harmony::init_cluster_cpp(unsigned int)':
harmony.cpp:69:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (C > 0 && C < K) {
^
harmony.cpp: In member function 'CUBETYPE harmony::moe_ridge_get_betas_cpp()':
harmony.cpp:234:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (unsigned k = 0; k < K; k++) {
^
c:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o harmony.dll tmp.def RcppExports.o harmony.o -LC:/PROGRA1/R/R-361.2/bin/i386 -lRlapack -LC:/PROGRA1/R/R-361.2/bin/i386 -lRblas -lgfortran -lm -lquadmath -LC:/PROGRA1/R/R-361.2/bin/i386 -lR
installing to C:/Users/Jeff's PC/Documents/R/win-library/3.6/00LOCK-harmony/00new/harmony/libs/i386
*** arch - x64
c:/Rtools/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppArmadillo/include" -I"C:/Users/Jeff's PC/Documents/R/win-library/3.6/RcppProgress/include" "-DUSE_FLOAT_TYPES=0" -O2 -Wall -mtune=generic -c harmony.cpp -o harmony.o
harmony.cpp: In member function 'void harmony::init_cluster_cpp(unsigned int)':
harmony.cpp:69:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (C > 0 && C < K) {
^
harmony.cpp: In member function 'CUBETYPE harmony::moe_ridge_get_betas_cpp()':
harmony.cpp:234:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (unsigned k = 0; k < K; k++) {
^
c:/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o harmony.dll tmp.def RcppExports.o harmony.o -LC:/PROGRA1/R/R-361.2/bin/x64 -lRlapack -LC:/PROGRA1/R/R-361.2/bin/x64 -lRblas -lgfortran -lm -lquadmath -LC:/PROGRA1/R/R-361.2/bin/x64 -lR
installing to C:/Users/Jeff's PC/Documents/R/win-library/3.6/00LOCK-harmony/00new/harmony/libs/x64
** R
** data
** byte-compile and prepare package for lazy loading
Error: unexpected symbol in "setwd('C:/Users/JEFF'S"
Execution halted
ERROR: lazy loading failed for package 'harmony'
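The last error in this log (unexpected symbol in "setwd('C:/Users/JEFF'S") points at the apostrophe in the user directory name: inside a single-quoted R string it ends the string early. The base R sketch below reproduces that parse failure with a hypothetical path and shows that the same path parses when wrapped in double quotes. It only demonstrates the quoting problem; installing into a library path without an apostrophe would presumably sidestep it, but that is not tested here.

```r
# Hypothetical path containing an apostrophe, similar to the one in the log.
bad_call  <- "setwd('C:/Users/JEFF'S PC/Documents')"
good_call <- "setwd(\"C:/Users/JEFF'S PC/Documents\")"

# parse() only checks syntax; nothing is evaluated, so no directory is changed.
bad_result <- tryCatch(parse(text = bad_call),
                       error = function(e) conditionMessage(e))
good_result <- parse(text = good_call)

print(bad_result)           # a parse error message
print(length(good_result))  # one parsed expression
```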
Hi, guys, thanks for your great tool!
I have a very large dataset comprising several levels of batches: 3 platforms, 10+ datasets and 100+ patients.
I want to use Harmony to reduce these unwanted sources of variance, but I don't know how to set the order of the covariates and their weights (i.e. 'theta') for my data. Do you have any suggestions?
BTW, does the order of covariates affect Harmony's result?
Hi,
I'm using the wrapper RunHarmony on a Seurat object on which I already performed a PCA and selected the dimensions using the ElbowPlot. Where can I find the statistics of the Harmony reduction so that I can choose the appropriate number of dimensions for downstream analysis?
Hi,
Sorry if this is a naive question, but can Harmony be used in cases where only some of the cell populations are overlapping between samples, or does it assume that the same cell populations are present in all samples?
Many thanks,
Lucy
Hi,
Harmony is a great tool for me. However, my data is from three different samples with distinct treatments. What should I do when I run harmony to remove the batch effect?
Thanks!
Hi,
I tried to install Harmony using the code below,
but I receive the error below.
library(devtools)
install_github("immunogenomics/harmony")
Can anyone help me get this working in R?
Thanks a lot
Sara
Downloading GitHub repo immunogenomics/harmony@master
✓ checking for file ‘/private/var/folders/nb/57kr63yx2q53xww_zr7pgfrmncmyb0/T/RtmpGRTsNz/remotes106870bdc345/immunogenomics-harmony-1a6d77a/DESCRIPTION’ ...
─ preparing ‘harmony’:
✓ checking DESCRIPTION meta-information ...
─ cleaning src
─ running ‘cleanup’
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘harmony_1.0.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] devtools_2.2.1 usethis_1.5.1
[3] gridExtra_2.3 dplyr_0.8.3
[5] Seurat_3.1.2 cowplot_1.0.0
[7] ggplot2_3.2.1 SingleCellExperiment_1.6.0
[9] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[11] BiocParallel_1.18.1 matrixStats_0.55.0
[13] Biobase_2.44.0 GenomicRanges_1.36.1
[15] GenomeInfoDb_1.20.0 IRanges_2.18.3
[17] S4Vectors_0.22.1 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] backports_1.1.5 sn_1.5-4 plyr_1.8.5
[4] igraph_1.2.4.2 lazyeval_0.2.2 splines_3.6.0
[7] listenv_0.8.0 TH.data_1.0-10 digest_0.6.23
[10] htmltools_0.4.0 fansi_0.4.1 gdata_2.18.0
[13] memoise_1.1.0 magrittr_1.5 cluster_2.1.0
[16] ROCR_1.0-7 remotes_2.1.0 globals_0.12.5
[19] RcppParallel_4.4.4 R.utils_2.9.2 sandwich_2.5-1
[22] prettyunits_1.1.0 colorspace_1.4-1 ggrepel_0.8.1
[25] xfun_0.12 callr_3.4.0 crayon_1.3.4
[28] RCurl_1.95-4.12 jsonlite_1.6 zeallot_0.1.0
[31] survival_3.1-8 zoo_1.8-7 ape_5.3
[34] glue_1.3.1 gtable_0.3.0 zlibbioc_1.30.0
[37] XVector_0.24.0 leiden_0.3.1 pkgbuild_1.0.6
[40] future.apply_1.4.0 scales_1.1.0 mvtnorm_1.0-12
[43] bibtex_0.4.2.2 Rcpp_1.0.3 metap_1.2
[46] plotrix_3.7-7 viridisLite_0.3.0 reticulate_1.14
[49] rsvd_1.0.2 SDMTools_1.1-221.2 tsne_0.1-3
[52] htmlwidgets_1.5.1 httr_1.4.1 gplots_3.0.1.2
[55] RColorBrewer_1.1-2 ellipsis_0.3.0 TFisher_0.2.0
[58] ica_1.0-2 pkgconfig_2.0.3 R.methodsS3_1.7.1
[61] farver_2.0.1 uwot_0.1.5 tidyselect_0.2.5
[64] labeling_0.3 rlang_0.4.2 reshape2_1.4.3
[67] munsell_0.5.0 tools_3.6.0 cli_2.0.1
[70] ggridges_0.5.2 evaluate_0.14 stringr_1.4.0
[73] yaml_2.2.0 npsurv_0.4-0 processx_3.4.1
[76] knitr_1.26 fs_1.3.1 fitdistrplus_1.0-14
[79] caTools_1.17.1.4 purrr_0.3.3 RANN_2.6.1
[82] pbapply_1.4-2 future_1.15.1 nlme_3.1-143
[85] R.oo_1.23.0 compiler_3.6.0 rstudioapi_0.10
[88] curl_4.3 plotly_4.9.1 png_0.1-7
[91] testthat_2.3.1 lsei_1.2-0 tibble_2.1.3
[94] stringi_1.4.5 ps_1.3.0 desc_1.2.0
[97] RSpectra_0.16-0 lattice_0.20-38 Matrix_1.2-18
[100] multtest_2.40.0 vctrs_0.2.1 mutoss_0.1-12
[103] pillar_1.4.3 lifecycle_0.1.0 BiocManager_1.30.10
[106] Rdpack_0.11-1 lmtest_0.9-37 RcppAnnoy_0.0.14
[109] data.table_1.12.8 bitops_1.0-6 irlba_2.3.3
[112] gbRd_0.4-11 R6_2.4.1 KernSmooth_2.23-16
[115] sessioninfo_1.1.1 codetools_0.2-16 pkgload_1.0.2
[118] MASS_7.3-51.5 gtools_3.8.1 assertthat_0.2.1
[121] rprojroot_1.3-2 withr_2.1.2 sctransform_0.2.1
[124] mnormt_1.5-5 multcomp_1.4-12 GenomeInfoDbData_1.2.1
[127] grid_3.6.0 tidyr_1.0.0 rmarkdown_2.0
[130] Rtsne_0.15 numDeriv_2016.8-1.1
I first tried out Harmony from within a scanpy / JupyterLab framework, following the notebook from the Teichmann group (pancreas-5-Harmony-kBET.ipynb; see: https://github.com/Teichlab/bbknn/tree/master/examples), about a year ago (Feb 2019). At that time, I could easily reproduce the batch effect correction on that pancreas dataset, and I have used Harmony since then. Now, on a new system, I installed the most recent GitHub version of Harmony and ran the same notebook. The run completed but the output looks drastically different. I tried it on other datasets to which I had applied Harmony before and I see the same differences: it generates a large to very large number of clusters that neither integrate well nor separate well. I am attaching the earlier (2019) and the recent (2020) UMAPs for review. The relevant code that I applied:
%%R -i pca -i batch -o hem
library(harmony)
library(magrittr)
hem <- HarmonyMatrix(pca, batch, theta=4)
hem = data.frame(hem)
Where pca is X_pca from the pca step in scanpy and batch is the reference to the batch variable to be sampled. I tried different thetas (0 to 5; not shown). There is a change but it is minimal.
2019:
2020:
I really like the fast processing that comes with Harmony. Hence, I would be interested in feedback on any changes in Harmony processing that might explain the results I observed.
Hi,
Thanks for developing harmony, it is definitely a great tool for sample integration.
One question I have: I know there are PCElbowPlot for viewing the SD change for each PC, and MetageneBicorPlot for CCA to view the correlation strength for each CC. Is there a similar function for Harmony that visualizes some metric for selecting the most appropriate Harmony dimensions for tSNE and clustering, other than DimHeatmap?
Thanks,
Shuyang
Hi there,
I figure it's in the pipeline, but it seems that, although still in beta, Seurat v3's data structures have been decided for the full release. I'm just wondering if there's a projected timeline to support direct input of a Seurat v3.x object?
David
Hi, I successfully installed it, but when I run it I still get the following error. Epithelial is a Seurat object.
#############
Epithelial <- RunHarmony(Epithelial,"Batch")
Starting harmony
Using top 20 PCs
Error in cpp_object_initializer(.self, .refClassDef, ...) :
could not find function "cpp_object_initializer"
Hi, I tried to install harmony to re-analyze my single cells data, but during the installation I get an error (see below), what can I do?
Thanks
Hugo
#script
library(devtools)
install_github("immunogenomics/harmony")
#error during installation
harmony.cpp:236:1: warning: control may reach end of non-void function [-Wreturn-type]
}
^
1 warning generated.
clang++ -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o harmony.so RcppExports.o harmony.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin15/6.1.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin15/6.1.0'
ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [harmony.so] Error 1
ERROR: compilation failed for package ‘harmony’
Thanks for developing Harmony, and integrating it with Seurat. It is simple to use, and surely faster.
However, I had a few issues:
Firstly, there was no documentation for the functions. ?RunHarmony did not return any help. This was a bit frustrating, as I could not understand the parameters.
Secondly, in the vignette 'Aligning 10X PBMCs' presented, I think the command,
system.time(pbmc %<>% RunHarmony("stim", theta = 2, plot_convergence = TRUE, nclust = 50,
max.iter.cluster = 100, max.iter.harmony = 4))
should have max.iter.harmony = 10, as the run shows 10 iterations from 1/10 till the end.
This was confusing to me as I did not understand this incongruity at first. Just a small thing to fix :)
Third and last, I actually did not get RunHarmony to run. I got the error:
Error in if (return_object) { : argument is not interpretable as logical
I couldn't understand why this was happening. The command would run its iterations, converge, plot, and then give the error.
Thus, I went inside the code and ran all the commands following HarmonyMatrix individually. They all ran perfectly, and I could get harmony to work. I cannot point exactly to the reason why this error is occurring.
One more thing, and this is only to clear up a confusion I had: 'dims.use' is not actually passed to HarmonyMatrix. Why is it calculated in the RunHarmony command?
Sorry to combine multiple issues together. But once Harmony started working, the speed and results were impressive. Hopefully, these things can be sorted out!
I have a 30000 * 110000 dgCMatrix. When I run the HarmonyMatrix function, it always throws an error:
Error in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
BLAS/LAPACK routine 'DLASCL' gave error code -4
Calls: HarmonyMatrix -> ncol -> -> do.call ->
Execution halted
Even when I give HarmonyMatrix a 110000 * 20 PCA matrix and set do_pca=F, I get the same error.
Hi, thank you for the amazing tool!
I have a question about interpreting the convergence plot. Is it the case that the lower the objective function the better, and the higher the harmony_idx the better? Sometimes I get a plot like this, where the dots jump up in the final iteration and the message says it has converged. Is it supposed to look like this?
Thank you!
Can you provide an example of "meta_data"? What format should it be in, and what should it contain?
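For readers with the same question, here is a minimal sketch of what such a table could look like. It assumes meta_data is a data frame with one row per cell, in the same order as the cells of the expression or PCA matrix, and one column per variable. The column names (cell_id, dataset, donor), the values, and the pca_mat object named in the final comment are all made up for illustration.

```r
# Hypothetical metadata for six cells; all names and values are illustrative.
meta_data <- data.frame(
  cell_id = paste0("cell_", 1:6),
  dataset = c("batch1", "batch1", "batch1", "batch2", "batch2", "batch2"),
  donor   = c("d1", "d2", "d1", "d3", "d3", "d4"),
  stringsAsFactors = FALSE
)

# One row per cell, one column per variable.
str(meta_data)

# Number of cells per batch.
table(meta_data$dataset)

# The batch column would then be named when calling Harmony, for example
# (assumed call with a hypothetical pca_mat, not run here):
# harmony_embeddings <- HarmonyMatrix(pca_mat, meta_data, "dataset", do_pca = FALSE)
```

The key point of the sketch is that the variable to integrate over lives in a named column, so the same table can carry several covariates at once.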
Hi,
thank you for your great tool - it works very well for our experiments with relatively high batch effects. We have two different experimental conditions (treated and untreated) with 3 replicates each, and we use Seurat (v3) to merge those experiments. Do you have any suggestions or guidelines on when (before or after batch correction) and how to merge the experiments when using harmony and Seurat?
Thank you in advance!
Some of the cells are included in 2 blocks instead of 1 block.
Here's what I see when I run Harmony with 97 cells:
idx_min 0
idx_max 4
idx_list 0 1 2 3 4
idx_min 4
idx_max 9
idx_list 4 5 6 7 8 9
idx_min 9
idx_max 14
idx_list 9 10 11 12 13 14
idx_min 14
idx_max 19
idx_list 14 15 16 17 18 19
...
idx_min 87
idx_max 92
idx_list 87 88 89 90 91 92
idx_min 92
idx_max 96
idx_list 92 93 94 95 96
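Notice that the last index of each block is also the first index of the next block (for example, cells 4, 9 and 14 each appear in two blocks).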
This probably doesn't matter too much, but I guess you intended for each cell to be included in a single block.
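The intended behaviour can be sketched in a few lines of R. This is not Harmony's actual code; make_blocks and block_size are hypothetical names, and the block size of 5 is chosen only to resemble the output above. The sketch splits the 0-based cell indices into consecutive blocks so that each cell lands in exactly one block.

```r
# Split 0-based cell indices 0..(n_cells - 1) into consecutive,
# non-overlapping blocks of at most `block_size` cells.
make_blocks <- function(n_cells, block_size) {
  idx <- seq_len(n_cells) - 1L        # 0-based indices, as in the output above
  block_id <- idx %/% block_size      # integer division gives one block id per cell
  unname(split(idx, block_id))
}

blocks <- make_blocks(n_cells = 97, block_size = 5)

# Every cell appears exactly once across all blocks.
all_idx <- unlist(blocks)
stopifnot(length(all_idx) == 97, !any(duplicated(all_idx)))

blocks[[1]]   # 0 1 2 3 4
blocks[[2]]   # 5 6 7 8 9
```

Because the block id is derived from the index by integer division, a boundary cell such as 4 belongs to the first block only, and the next block starts at 5.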
Hi Harmony developers,
thanks for the great package, really enjoy how easy and fast it is to plug harmony into existing Seurat pipelines.
I used harmony previously to align cells from different genotypes with success. I now want to integrate cells sampled at different developmental time points. My Seurat object has all the processing needed, and the variable is encoded as a character in the meta.data table. I also tried running it as a factor but got the same error.
I'm running harmony like this:
expression_seurat_final_test <- RunHarmony(expression_seurat_final, "Age")
and get the following cryptic error:
Starting harmony
Using top 50 PCs
Error in harmonyObj$setup(pc_mat, phi, Pr_b, sigma, theta, max.iter.cluster, :
Expecting a single value: [extent=49].
The error clearly seems to come from the following call of the HarmonyMatrix function but I could not figure out what is causing it exactly:
Line 106 in 6f07162
I would highly appreciate some help with this problem!
Thanks in advance,
Florian
Hi,
I want to extract the top genes from the harmony results, but the gene values are very large. Is this normal? And can I select a knee value as the threshold for extracting the top genes, as in PCA?
Thanks,
Ran