jkrijthe / rtsne

R wrapper for Van der Maaten's Barnes-Hut implementation of t-Distributed Stochastic Neighbor Embedding

License: Other

R 30.43% C++ 69.57%

rtsne's Introduction

Installation

To install from CRAN:

install.packages("Rtsne") # Install Rtsne package from CRAN

To install the latest version from the github repository, use:

if(!require(devtools)) install.packages("devtools") # If not already installed
devtools::install_github("jkrijthe/Rtsne")

Usage

After installing the package (see Installation above), use the following code to run a simple example.

library(Rtsne) # Load package
iris_unique <- unique(iris) # Remove duplicates
set.seed(42) # Sets seed for reproducibility
tsne_out <- Rtsne(as.matrix(iris_unique[,1:4])) # Run TSNE
plot(tsne_out$Y,col=iris_unique$Species,asp=1) # Plot the result

Details

This R package offers a wrapper around the Barnes-Hut TSNE C++ implementation of [2] [3]. Changes were made to the original code to allow it to function as an R package and to add additional functionality and speed improvements.

References

[1] L.J.P. van der Maaten and G.E. Hinton. “Visualizing High-Dimensional Data Using t-SNE.” Journal of Machine Learning Research 9(Nov):2579-2605, 2008.

[2] L.J.P. van der Maaten. “Accelerating t-SNE using tree-based algorithms.” Journal of Machine Learning Research 15.1:3221-3245, 2014.

[3] https://lvdmaaten.github.io/tsne/

rtsne's People

Contributors

by321, daniel-wells, jkrijthe, laurentgatto, michaelchirico, richierocks, talgalili

rtsne's Issues

Rtsne error: "Remove duplicates before running TSNE."

I have a matrix with dimensions 19481x9187, and I get the error message "Remove duplicates before running TSNE." when I run the code below:

dim(est)
[1] 19481 9187
is.matrix(est)
[1] TRUE
est.rtsne<-Rtsne(X=est, check_duplicates=FALSE, dims=2, initial_dims=10)

Do you have some suggestions?

Variance needed for both axes

Hi!
We know in PCA results there's a 'percentVar' for explaining contributions of each of components.
Is there a similar one for tSNE dimensionality reduction?
Thanks!

> str(tsne_out)
List of 13
 $ theta              : num 0.5
 $ perplexity         : num 30
 $ N                  : int 149
 $ origD              : int 4
 $ Y                  : num [1:149, 1:2] -17.7 -20.1 -20 -20.4 -17.7 ...
 $ costs              : num [1:149] 1.03e-04 -2.29e-04 -5.16e-05 -3.76e-04 -3.55e-05 ...
 $ itercosts          : num [1:20] 44.6 45.2 43 45.4 44.3 ...
 $ stop_lying_iter    : int 250
 $ mom_switch_iter    : int 250
 $ momentum           : num 0.5
 $ final_momentum     : num 0.8
 $ eta                : num 200
 $ exaggeration_factor: num 12

benchmarks

How does Rtsne compare to other tsne implementations for R?

Rtsne on large dataset

Hi developer,
I have a dataset of 40 million observations and 40 variables. May I know how to reduce the dimension of this type of large dataset, get the coordinates of 40 million observations to visualize in 2-dimension?
Thanks,
-Ivy

is_distance = TRUE experimental ?

in the man page it says supplying a precomputed distance matrix is experimental, could you comment on that? is it safe to use? thanks!

Multicore support

Is there a way to run this in multiple cores to speed up the execution time?

tsne versus som

I have successfully generated a SOM for a dataset of ~20k variables (drugs) for observations (GI50) against 422 protein targets. Using the som_init tools (ratio of first 2 principal components) for setting SOM dimensions yields a SOM of ~1000 clusters (xdim=20 ydim=50).
The 422 protein targets are labelled according to the protein target with a suffix associated with 7 protein classes. Each SOM cluster's members can be parsed for which of the 7 protein classes are most represented within each SOM cluster.

My problem (question) is how to generate a tsne result where each element is colored according to the 7 protein classes. Any suggestions would be appreciated.

In order to complete this analysis, enabling parallelization would be very helpful. Is there a 'parallel' option on the tsne command.

Respectfully,
llevoc
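
A possible way to approach the colouring question above, sketched in base R. The names `embedding` and `target_class` are hypothetical: `embedding` stands in for the `Y` matrix returned by Rtsne (random numbers are used here only so the sketch runs without the real data), and `target_class` is assumed to be a factor with one class label per embedded row, in the same row order.

```r
# Sketch: colour a 2-D embedding by a class label (base R only).
set.seed(1)
embedding <- matrix(rnorm(40), ncol = 2)                 # placeholder for tsne_out$Y
target_class <- factor(rep(c("class_A", "class_B"), each = 10))  # hypothetical labels

# Labels must line up with the rows of the embedding.
stopifnot(nrow(embedding) == length(target_class))

# One colour per class level, then one colour per point.
class_cols <- grDevices::hcl.colors(nlevels(target_class), "Dark 3")
point_cols <- class_cols[as.integer(target_class)]

plot(embedding, col = point_cols, pch = 19, asp = 1,
     xlab = "tSNE1", ylab = "tSNE2")
legend("topright", legend = levels(target_class), col = class_cols, pch = 19)
```

With seven classes the same code applies unchanged, since the palette size follows `nlevels(target_class)`.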

RTsne output sometimes gives fewer rows than the input data frame

I'm unsure if this is intended? And if it is, is there a way to see what rows are not calculated? I want to be able to cbind the results to my original data (so that i can plot it using color labels from the original data), but a mismatch between input and output dimensions obviously makes this impossible.

I'm running RTsne on a 15000x3000 data frame with no duplicates, it gives no errors or warnings and completes without issues, but the Y table in the output has ~500 less rows than the input.

Cannot install Rtsne on R-devel 3.3.0

Hi there

I'm developing a package, scater, which benefits from using Rtsne. I have been using Rtsne successfully on previous versions of R, but having just switched to R-devel 3.3.0, I can no longer get Rtsne to install, either with install.packages or installing directly from Github - both show the error output shown below.

Hope you can advise how to get the package to install, as it is very useful for our work.

Best wishes
Davis

> devtools::install_github("jkrijthe/Rtsne")
Downloading GitHub repo jkrijthe/Rtsne@master
Installing Rtsne
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ  \
  --no-save --no-restore CMD INSTALL  \
  '/private/var/folders/11/qp2jn0wx7t5bgjb74nkwbl3h0000gp/T/Rtmp0xzeRY/devtools57fa675732f5/jkrijthe-Rtsne-2489f20'  \
  --library='/Library/Frameworks/R.framework/Versions/3.3/Resources/library'  \
  --install-tests 

* installing *source* package ‘Rtsne’ ...
** libs
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c RcppExports.cpp -o RcppExports.o
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c Rtsne.cpp -o Rtsne.o
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c sptree.cpp -o sptree.o
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c tsne.cpp -o tsne.o
clang++ -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o Rtsne.so RcppExports.o Rtsne.o sptree.o tsne.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/lib/gcc/x86_64-apple-darwin13.0.0/4.8.2 -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/usr/local/lib/gcc/x86_64-apple-darwin13.0.0/4.8.2'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Rtsne.so] Error 1
ERROR: compilation failed for package ‘Rtsne’
* removing ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rtsne’
Error: Command failed (1)

session info

> sessionInfo()
R Under development (unstable) (2016-01-20 r69964)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] scater_0.1.12       ggplot2_2.0.0       Biobase_2.31.3      BiocGenerics_0.17.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3      lattice_0.20-33  XML_3.98-1.3     bitops_1.0-6    
 [5] grid_3.3.0       plyr_1.8.3       gtable_0.1.2     stats4_3.3.0    
 [9] scales_0.3.0     viridis_0.3.2    limma_3.27.9     Matrix_1.2-3    
[13] splines_3.3.0    tools_3.3.0      munsell_0.4.2    RCurl_1.95-4.7  
[17] colorspace_1.2-6 VGAM_1.0-0       gridExtra_2.0.0 

Seg fault problem

When I run multiple Rtsne calls in parallel, I get this error:

`An irrecoverable exception occurred. R is aborting now ...

*** caught segfault ***
address 0x1a1, cause 'memory not mapped'

Traceback:
1: .Call("Rtsne_Rtsne_cpp", PACKAGE = "Rtsne", X, no_dims_in, perplexity_in, theta_in, verbose, max_iter, distance_precomputed, Y_in, init, stop_lying_iter_in, mom_switch_iter_in, momentum_in, final_momentum_in, eta_in, exaggeration_factor_in)
`
I have already reinstalled R and all packages

I really don't understand where the problem is coming from...

Failed to install the Rtsne package on the MacOS

Hi,
I am trying to install the package using either install.packages("Rtsne") or if(!require(devtools)) install.packages("devtools") followed by devtools::install_github("jkrijthe/Rtsne"). However, both fail with the same error: `clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Rtsne.so] Error 1
ERROR: compilation failed for package ‘Rtsne’

  • removing ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rtsne’
    Warning message:
    In i.p(...) :
    installation of package ‘/var/folders/9b/tp5f_t7s3qjg4n83xx22_z100000gn/T//RtmpVRF3Zb/file715344d8011c/Rtsne_0.16.tar.gz’ had non-zero exit status`
    Do you have any suggestion for this error?

Thank you very much!

Optionally skip prcomp step

3 cases where this would be useful:

  1. You want to run tsne on the full dataset, without pca (maybe the data are already correlated, centered, and scaled)
  2. You have a large dataset and want to run the prcomp step once, followed by many different tsne embeddings. Perhaps you want to try many different embedding dimensions to see which one works best for your analysis, and you don't want to re-run prcomp each time
  3. You have another pre-processing method you'd like to try (e.g. irlba or ICA).
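
The second case can be sketched as follows, assuming the `pca = FALSE` argument that appears in a later issue on this page is available in the installed version. PCA is computed once with `prcomp`, and the scores are reused for several embeddings. The iris data and the choice of `dims` values are illustrative only, and the Rtsne calls are guarded so the sketch still runs when the package is not installed.

```r
# Sketch: compute PCA once, then reuse the scores for several t-SNE runs.
X <- as.matrix(unique(iris)[, 1:4])          # rows = samples, columns = variables

pca <- prcomp(X, center = TRUE, scale. = FALSE)   # PCA is computed a single time
n_keep <- min(50, ncol(pca$x))                    # keep at most 50 components
scores <- pca$x[, seq_len(n_keep), drop = FALSE]

if (requireNamespace("Rtsne", quietly = TRUE)) {
  embedding_dims <- c(2, 3)
  embeddings <- lapply(embedding_dims, function(d) {
    set.seed(42)                                   # same seed for each run
    # pca = FALSE skips the internal prcomp step and uses `scores` as given
    Rtsne::Rtsne(scores, dims = d, pca = FALSE)$Y
  })
  names(embeddings) <- paste0("dims_", embedding_dims)
}
```

Any other preprocessing that yields a samples-by-features matrix (the third case) could be substituted for the `prcomp` call in the same way.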

Visualise iterations periodically during the runtime

It would be great if there could be a plotting option after a set number of iterations during the runtime of the algorithm. This would make it easy to see the progress and decide on the right number of iterations.

Right now when I try to find the sweet spot(# iterations) I have to call Rtsne() multiple times with different parameters. This is time consuming and can be optimised with frequent feedback through plots like this http://projector.tensorflow.org/.

Error: protect(): protection stack overflow

Hey there!

I'm getting the following error when trying to run t-SNE on a large matrix of about ~3 million records.
Tried to increase the max-ppsize parameter R gets, but unfortunately no luck there.
Any ideas how to fix that?

Thanks a bunch!
Chen

colors in e.g. plot

Great work on this. Just reading your example on the Iris dataset. Should it be col=iris_unique$Species since you've dropped a duplicate record?

Use R's RNG

It'd be nice to have this code use R's random number generator, such that each run will use a different seed, unless set.seed() is used to explicitly set the RNG.

E.g. you could use rand_seed=floor(runif(1, -1e12, 1e12))

Passing ... to prcomp

Currently Rtsne has ... but doesn't use them to pass parameters to inner functions. Would you consider a patch to pass them to prcomp?

The reason for this is to allow centering and scaling (default is FALSE) the data, which seems to make a difference in our case - see this issue (in particular from this comment on) for details.

An alternative to passing ... to prcomp would be to have explicit scale (and possible center) parameters in Rtsne.

Failed to load the distance matrix from aligned DNA sequences

Hi,

Thanks for your work. I am trying to use Rtsne to analyze the distance matrix from the aligned DNA sequences attached below.

The error reported is

Error in .check_tsne_params(nrow(X), dims = dims, perplexity = perplexity, :
perplexity is too large for the number of samples

Here is the code:
`
library(ape)
library(Rtsne)

dna <- read.dna(file = "Norovirus.aligned.txt", format = 'fasta')
D <- dist.dna(dna, model = "TN93")
D_uniq <- unique(D)
dna.tsne <- Rtsne(as.matrix(normalize_input(as.matrix(D_uniq))))

or
dna <- read.dna(file = "Norovirus.aligned.txt", format = 'fasta')
D <- dist.dna(dna, model = "TN93")
dna.tsne <- Rtsne(as.matrix(D), is_distance = TRUE)
`

Norovirus.aligned.txt

R: 3.5.1
OS: MacOS 10.13.6

Any help would be highly appreciated!

Cheers!

Performance Improvements via PCA

I see there is an effort to increase performance via parallel tSNE in #16. However, for some datasets the PCA (via SVD) preprocessing step dominates the runtime.

For example, take an RNA expression dataset of single cells. There are 5069 cells and 23556 genes (which I subset down to 5k).

# Download data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75330)
marques <- data.table::fread("curl ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE75nnn/GSE75330/suppl/GSE75330%5FMarques%5Fet%5Fal%5Fmol%5Fcounts2%2Etab%2Egz | zcat")
mat_marques <- t(as.matrix(marques[,-"cellid", with=FALSE]))

# subset to 5k genes and do PCA via SVD
system.time(marques_pca_A <- prcomp(mat_marques[,1:5000]))
   user  system elapsed
503.787   5.419 511.918

# run Rtsne
system.time(marques_rtsne_A <- Rtsne::Rtsne(marques_pca_A$x[,1:50], pca=FALSE))
   user  system elapsed
 39.383   0.816  40.448

The PCA step takes 12x longer than the tSNE step. It may not be obvious to some users that it's the PCA step which is taking most of the time, or that there are faster alternatives. I think it would be good to have a note in the manpage recommending doing the PCA step separately with a faster package for large datasets, e.g. flashpcaR:

system.time(marques_pca_B <- flashpcaR::flashpca(mat_marques[,1:5000], ndim=50, stan="center"))
   user  system elapsed 
  3.400   0.273   3.682

Or perhaps a warning message when attempting to run Rtsne on a matrix when both dimensions are largeish, e.g.

if(min(dim(X)) > 2500){
  warning("Using pca=TRUE can be slow for large datasets. Consider computing PCA separately with a different package (e.g. flashpcaR).")
}

(the time complexity of SVD is O(min(pn^2, np^2)) )

Memory not released with multi-threading

Hi.

I noticed that when running Rtsne with num_threads=0, the used memory can increase dramatically and doesn't get released after Rtsne is finished, even after a call to gc(). This is especially noticeable on a high-core machine with 64 threads. When run with num_threads=1, memory is 181MB before running Rtsne and 205MB after the call. With num_threads=0 (on a 64-thread machine), the used memory after Rtsne is 6604MB. Even after calling gc() I cannot release the memory. In fact, when setting num_threads artificially high to 2014, the used memory is 20GB.

Not 100% sure if my way to calculate memory usage using /proc is correct.

Any thoughts?

mem.used <- function(digits = 0) {
    file <- paste("/proc", Sys.getpid(), "stat", sep = "/")
    what <- vector("character", 52)
    vsz <- as.numeric(scan(file, what = what, quiet = TRUE)[23])
    vsz <- vsz / (1024**2) ## MB
    mem <- paste0(round(vsz, digits), "MB")
  mem
}

X <- matrix(rnorm(1000*100),1000,100)

mem.used()
system.time( pos <- Rtsne::Rtsne(t(X), num_threads = 1) )
mem.used()

system.time( pos <- Rtsne::Rtsne(t(X), num_threads = 128) )
mem.used()

gc()
mem.used()

itercost increment

Hi,
Thanks for your invaluable package.
I am reviewing the code in order to better understand your code organization vs the multicore implementation.
At L220, shouldn't it be costi++?

Rtsne/src/tsne.cpp

Lines 219 to 220 in 14b195f

itercost[costi] = C;
itercost++;

Is the error (stored in itercost IIUC) monitored/used somewhere in the code?
Best.

Single-threaded on Windows, multi-threaded on Linux

Hi. The Rtsne function runs on only a single thread on Windows, while it is parallelized on Linux. Both operating systems are running on the same computer.

However, the same happens with running only PCA as well, which t-SNE performs in the beginning. So this is probably not an issue of this package, but I couldn't find a solution anywhere and wanted to try my chances here. Thanks in advance.

add CXX flags for openmp

openmp branch was not compiling with openmp enabled. While PKG_CFLAGS have been specified in Makevars, the code checks #ifdef _OPENMP, which requires openmp to be enabled during compilation as well. This can be fixed by adding the following line to Makevars:
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
Thanks!

Clearer Parameter Documentation

For example, for X it would be worthwhile to specify that the rows are samples and the columns are variables. The opposite is usually true in bioinformatics. Also, perplexity is vaguely described as "numeric; Perplexity parameter". It would be good to explicitly state the range of values it can take, to avoid the "Perplexity is too large." error (i.e. at most (N - 1) / 3, where N is the number of samples).
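
To make the bound concrete, here is a minimal sketch that computes the largest perplexity allowed under the (N - 1) / 3 limit described in the issues on this page. The helper name `max_perplexity` is hypothetical and not part of Rtsne. Rounding down with `floor` is a conservative choice made for this sketch, since perplexity does not have to be an integer.

```r
# Sketch: largest perplexity allowed by the (N - 1) / 3 limit.
# `max_perplexity` is a hypothetical helper, not an Rtsne function.
max_perplexity <- function(n_samples) {
  stopifnot(n_samples >= 4)          # below 4 samples no perplexity >= 1 fits
  floor((n_samples - 1) / 3)         # round down to stay within the limit
}

X <- as.matrix(unique(iris)[, 1:4])  # rows = samples, columns = variables
n <- nrow(X)

# Keep the commonly used value of 30 when the data set is large enough,
# otherwise fall back to the largest allowed value.
perp <- min(30, max_perplexity(n))
```

The resulting `perp` can then be passed as the `perplexity` argument, for example `Rtsne(X, perplexity = perp)`.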

The effect with the order of features

Hi
I have some questions about Rtsne. For the iris dataset, the default column order is "Sepal.Length Sepal.Width Petal.Length Petal.Width", but I get a different Rtsne result when I change the order to "Petal.Length Petal.Width Sepal.Length Sepal.Width" (or other orders). I am confused that the result is affected by the order of the features.
What's more, I have tried two different pipelines and got different results. The only difference between the two pipelines is:
tsne1 <- Rtsne(iris_matrix)
tsne2 <- Rtsne(dist(iris_matrix))

why check_duplicates?

There was some speculation over at this SO question about what if any problems could arise from setting check_duplicates to FALSE. I just wanted to hear it from the devs, since the man page doesn't offer a rationale. Is it an estimation or speed/memory performance concern, or both?

My inkling would be to omit all dups and then assign them the same vector solved for the one included in the estimation. Think that's valid?
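
The idea in the last paragraph can be sketched as follows: drop duplicate rows, embed only the unique rows, then copy each representative's coordinates back to its duplicates. This is only an illustration of that proposal, not a statement about how Rtsne handles duplicates internally. The helper `row_key` is hypothetical, and the Rtsne call is guarded so the sketch still runs when the package is not installed.

```r
# Sketch: embed unique rows only, then map coordinates back to all rows.
X <- as.matrix(iris[, 1:4])                 # contains some duplicated rows

is_dup <- duplicated(X)                     # TRUE for repeats of an earlier row
X_unique <- X[!is_dup, , drop = FALSE]

# Hypothetical helper: one string key per row, used to match rows exactly.
row_key <- function(m) apply(m, 1, paste, collapse = "\r")

# For every original row, the index of its representative in X_unique.
map <- match(row_key(X), row_key(X_unique))
stopifnot(!anyNA(map), all(X_unique[map, , drop = FALSE] == X))

if (requireNamespace("Rtsne", quietly = TRUE)) {
  set.seed(42)
  Y_unique <- Rtsne::Rtsne(X_unique)$Y      # embedding of the unique rows
  Y_full <- Y_unique[map, , drop = FALSE]   # duplicates share coordinates
  stopifnot(nrow(Y_full) == nrow(X))
}
```

Because `Y_full` has one row per original observation, it can be combined with the original data (for example with `cbind`) without a row mismatch.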

Reproducing R results in "pure" C++

Hello Jesse:

Is there a way to reproduce R results just in C++? The other repository:

https://github.com/lvdmaaten/bhtsne

has been updated recently as well, but I can see that the TSNE::run() function has 11 parameters there and 19 parameters in your repository. I tried downloading your C++ files but, contrary to Maaten's files, I can't compile them. If there is a way to compile your C++ files or to reproduce R results with Maaten's files, please let me know.

Regards,
Nik

constraint to dims now %in% 1,2,3 intended?

after example(Rtsne) in 0.15

> Rtsne(iris_matrix, theta=0.1, num_threads = 2, dim=4)
Error in .check_tsne_params(nrow(X), dims = dims, perplexity = perplexity,  : 
  dims should be either 1, 2 or 3

Is this necessary? Previous use of larger values of dims seemed useful. If
this is the way forward, please note in man page. Thanks!

Predicting on new data

It looks like the scikit learn folks are considering an implementation of Barnes-Hut t-SNE that allows for predictions on new data. (They're implementing fit and transform methods, rather than a single fit_transform method).

Would it be possible to do that here, and add a predict method to Rtsne?

Multiple tsne plots comparable

Dear developer:

I am interested using Rtsne for my research project. I would like to do research about the effects of different normalization methods on clustering using my high-dimensional dataset. I first normalized the original dataset using different normalization methods. Then, I showed several tsne plots (one plot for each normalized dataset), compared these plots and checked which normalization method is helpful for clustering algorithm. May I ask:

Are the x and y axes comparable across the plots for different normalized data? Could I compare these plots directly (e.g. the plot which shows clusters obviously means the dataset is well separated into each cluster than other plots). Here is my code to generate multiple tsne plots:

tsne_fit <- normalized_data %>% Rtsne()
tsne_df <- tsne_fit$Y %>% as.data.frame() %>% rename(tSNE1 = 'V1', tSNE2 = 'V2') %>% mutate(ID = substr(rownames(normalized_data), 7, 7))
tsne_df %>% ggplot(aes(x = tSNE1, y = tSNE2, color = ID)) + geom_point()

Thank you so much for your help!

Is num_threads = 1.0 really using all the threads?

Hi,
In my hands (Windows 10, Rtsne 0.15), when I set num_threads = 4, the elapsed time is shorter and all cores seem to be used. If I leave the default (1.0), the computation takes longer, although the doc says that the default will use all available threads.
To experiment, I set num_threads to 0. It detects and uses all cores.
If I set it to -1, R crashes, although I thought it would use all available cores minus 1 ;-)
Is 0 usage undocumented or Windows specific?
Best.

distance is squared within the Rtsne function when theta set to 0 and distance matrix provided

When a distance matrix is provided as input, I have noticed that the distance is squared within the Rtsne function before passing it to the Rtsne_cpp function, when the exact tsne is used:

if(is_distance & theta == 0) X <- X^2

Is there a specific reason to do so? Is it to have a roughly equivalent approach to the case where a raw data matrix is provided, as the data is transformed with a squared Euclidean distance by default (in the C++ code)? If so, why is it done only when the tsne is exact (theta == 0)?

Thanks in advance for your time, this is a very useful package.

Maximum perplexity constraint

Rtsne limits the maximum perplexity value to (N-1)/3, where N is the number of data points. Is there a reasoning behind this constraint? I can't seem to justify it based on the theory in the papers, so I was curious about potential issues with increasing it above this threshold.

Thank you in advance!

Wish: prevent deteriorating between iterations

I am surprised that Rtsne does not improve (or stay the same) from iteration to iteration. Increasing the number of iterations eventually cycles between solutions of similar quality, but does not keep the best ones (e.g. for Rtsne(swiss, perplexity=15, max_iter=1000, verbose=TRUE)). Couldn't that be changed?

Best, Ulrike

Using kNN matrix directly in Barnes-Hut

It would be very useful to have a knn option, as an alternative to the experimental is_distance option for very large matrices, such that given n x k distance matrix can directly be plugged into bhTSNE.

CRAN submission and/or version bump

It would be useful to bump the package version (to 0.12.1, for example), to make it easier for dependencies to make use of new features in Rtsne. With a new version number, my package could import `Rtsne (>= 0.12.1)` and make sure the pca pre-processing arguments are available.

A push to CRAN would be helpful at some point, to make sure that the latest version of Rtsne is installed/updated automatically.

Error in complete.cases(hto_dist_mtx) : long vectors not supported yet: complete_cases.c:192

Hi,

I am using the RunTSNE function from Seurat which calls Rtsne. I am analysing hashtag antibody data and have generated a distance matrix as input to tSNE.

hto_dist_mtx <- as.matrix(dist(t(GetAssayData(seurat_obj, assay = "HTO"))))

seurat_obj <- RunTSNE(seurat_obj, distance.matrix = hto_dist_mtx,
                      perplexity = 100, reduction.name = "tsnedist")

I am getting the error below. What does this mean and is there a way for me to fix this? Does it relate to the size of the matrix? I have 60,000 cells and six hashtag antibodies.

Error in complete.cases(hto_dist_mtx) : long vectors not supported yet: complete_cases.c:192

Traceback:

9. complete.cases(object)
8. na.fail.default(X)
7. na.fail(X)
6. Rtsne.default(X = object, dims = dim.embed, pca = FALSE, ...)
5. Rtsne(X = object, dims = dim.embed, pca = FALSE, ...)
4. RunTSNE.matrix(object = distance.matrix, assay = DefaultAssay(object = object),
seed.use = seed.use, tsne.method = tsne.method, dim.embed = dim.embed,
reduction.key = reduction.key, is_distance = TRUE, ...)
3. RunTSNE(object = distance.matrix, assay = DefaultAssay(object = object),
seed.use = seed.use, tsne.method = tsne.method, dim.embed = dim.embed,
reduction.key = reduction.key, is_distance = TRUE, ...)
2. RunTSNE.Seurat(seurat_obj, distance.matrix = hto_dist_mtx, perplexity = 100,
reduction.name = "tsnedist")
1. RunTSNE(seurat_obj, distance.matrix = hto_dist_mtx, perplexity = 100,
reduction.name = "tsnedist")

Many thanks,
Lucy

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /Filers/package/R-base/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /Filers/package/R-base/4.2.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cellhashR_1.0.3         forcats_0.5.2           stringr_1.4.1          
 [4] dplyr_1.0.10            purrr_0.3.5             readr_2.1.3            
 [7] tidyr_1.2.1             tibble_3.1.8            ggplot2_3.3.6          
[10] tidyverse_1.3.2         sp_1.5-0                SeuratObject_4.1.2.9003
[13] Seurat_4.2.0.9001      

loaded via a namespace (and not attached):
  [1] readxl_1.4.1          backports_1.4.1       plyr_1.8.7           
  [4] igraph_1.3.5          lazyeval_0.2.2        splines_4.2.0        
  [7] listenv_0.8.0         scattermore_0.8       digest_0.6.30        
 [10] htmltools_0.5.3       fansi_1.0.3           magrittr_2.0.3       
 [13] tensor_1.5            googlesheets4_1.0.1   cluster_2.1.4        
 [16] ROCR_1.0-11           tzdb_0.3.0            globals_0.16.1       
 [19] modelr_0.1.9          matrixStats_0.62.0    vroom_1.6.0          
 [22] spatstat.sparse_3.0-0 rmdformats_1.0.4      colorspace_2.0-3     
 [25] rvest_1.0.3           ggrepel_0.9.1         haven_2.5.1          
 [28] xfun_0.34             crayon_1.5.2          jsonlite_1.8.3       
 [31] progressr_0.11.0      spatstat.data_3.0-0   survival_3.4-0       
 [34] zoo_1.8-11            glue_1.6.2            polyclip_1.10-4      
 [37] gtable_0.3.1          gargle_1.2.1          leiden_0.4.3         
 [40] future.apply_1.9.1    abind_1.4-5           scales_1.2.1         
 [43] DBI_1.1.3             spatstat.random_2.2-0 miniUI_0.1.1.1       
 [46] Rcpp_1.0.9            viridisLite_0.4.1     xtable_1.8-4         
 [49] reticulate_1.26       spatstat.core_2.4-4   bit_4.0.4            
 [52] htmlwidgets_1.5.4     httr_1.4.4            RColorBrewer_1.1-3   
 [55] ellipsis_0.3.2        ica_1.0-3             farver_2.1.1         
 [58] pkgconfig_2.0.3       uwot_0.1.14           dbplyr_2.2.1         
 [61] deldir_1.0-6          utf8_1.2.2            labeling_0.4.2       
 [64] tidyselect_1.2.0      rlang_1.0.6           reshape2_1.4.4       
 [67] later_1.3.0           munsell_0.5.0         cellranger_1.1.0     
 [70] tools_4.2.0           cli_3.4.1             generics_0.1.3       
 [73] broom_1.0.1           ggridges_0.5.4        evaluate_0.17        
 [76] fastmap_1.1.0         yaml_2.3.6            goftest_1.2-3        
 [79] bit64_4.0.5           knitr_1.40            fs_1.5.2             
 [82] fitdistrplus_1.1-8    RANN_2.6.1            pbapply_1.5-0        
 [85] future_1.28.0         nlme_3.1-160          mime_0.12            
 [88] ggrastr_1.0.1         xml2_1.3.3            compiler_4.2.0       
 [91] rstudioapi_0.14       beeswarm_0.4.0        plotly_4.10.0        
 [94] png_0.1-7             spatstat.utils_3.0-1  reprex_2.0.2         
 [97] stringi_1.7.8         rgeos_0.5-9           lattice_0.20-45      
[100] Matrix_1.5-1          vctrs_0.5.0           pillar_1.8.1         
[103] lifecycle_1.0.3       spatstat.geom_3.0-3   lmtest_0.9-40        
[106] RcppAnnoy_0.0.20      data.table_1.14.4     cowplot_1.1.1        
[109] irlba_2.3.5.1         httpuv_1.6.6          patchwork_1.1.2      
[112] R6_2.5.1              bookdown_0.29         promises_1.2.0.1     
[115] KernSmooth_2.23-20    gridExtra_2.3         vipor_0.4.5          
[118] parallelly_1.32.1     codetools_0.2-18      MASS_7.3-58.1        
[121] assertthat_0.2.1      withr_2.5.0           sctransform_0.3.5    
[124] mgcv_1.8-41           parallel_4.2.0        hms_1.1.2            
[127] grid_4.2.0            rpart_4.1.19          rmarkdown_2.17       
[130] googledrive_2.0.0     Rtsne_0.16            shiny_1.7.3          
[133] lubridate_1.8.0       ggbeeswarm_0.6.0     

missing datapoint for big data?

The number of rows in the output is off by 1 (see below), but it works for the iris dataset with no problem.

> dim(juice(rec))
[1] 7939 1479
> #juice(rec, starts_with("ledger"))
> library(Rtsne)
> ts=Rtsne(juice(rec), check_duplicates=FALSE, pca=TRUE, perplexity=30, theta=0.5, dims=2)
> dim(ts$Y)
[1] 7938    2
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tsne_0.1-3           janitor_0.3.0        bindrcpp_0.2        
 [4] broom_0.4.2          factoextra_1.0.5     dbscan_1.1-1        
 [7] Rtsne_0.13           purrr_0.2.3          readr_1.1.1         
[10] tidyr_0.7.1          tibble_1.3.4         ggplot2_2.2.1       
[13] tidyverse_1.1.1      recipes_0.1.0        dplyr_0.7.4         
[16] RevoUtils_10.0.6     RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] ggrepel_0.7.0     Rcpp_0.12.13      lubridate_1.6.0   lattice_0.20-35  
 [5] class_7.3-14      digest_0.6.12     assertthat_0.2.0  ipred_0.9-6      
 [9] psych_1.7.8       R6_2.2.2          cellranger_1.1.0  plyr_1.8.4       
[13] httr_1.3.1        rlang_0.1.2       lazyeval_0.2.0    readxl_1.0.0     
[17] kernlab_0.9-25    rpart_4.1-11      Matrix_1.2-11     labeling_0.3     
[21] splines_3.4.2     CVST_0.2-1        ddalpha_1.3.1     gower_0.1.2      
[25] stringr_1.2.0     foreign_0.8-69    munsell_0.4.3     compiler_3.4.2   
[29] modelr_0.1.1      pkgconfig_2.0.1   mnormt_1.5-5      dimRed_0.1.0     
[33] tidyselect_0.2.2  nnet_7.3-12       prodlim_1.6.1     DRR_0.0.2        
[37] RcppRoll_0.2.2    ggpubr_0.1.5      MASS_7.3-47       grid_3.4.2       
[41] nlme_3.1-131      jsonlite_1.4      gtable_0.2.0      magrittr_1.5     
[45] scales_0.5.0      stringi_1.1.5     reshape2_1.4.2    timeDate_3012.100
[49] robustbase_0.92-7 xml2_1.1.1        lava_1.5.1        tools_3.4.2      
[53] forcats_0.2.0     glue_1.1.1        DEoptimR_1.0-8    sfsmisc_1.1-1    
[57] hms_0.3           parallel_3.4.2    survival_2.41-3   yaml_2.1.14      
[61] colorspace_1.3-2  rvest_0.3.2       knitr_1.17        bindr_0.1        
[65] haven_1.1.0   

THANK YOU
