beringresearch / ivis

Dimensionality reduction in very large datasets using Siamese Networks

Home Page: https://beringresearch.github.io/ivis/

License: Apache License 2.0

Python 92.93% R 6.81% TeX 0.26%

data-visualization dimensionality-reduction machine-learning neural-network siamese-neural-network

ivis's Issues

Add a vignette to the R package

Hello,

For JOSS review.

Is your feature request related to a problem? Please describe.

The R package lacks documentation of an application to a real-life dataset.

Describe the solution you'd like

Please add a vignette to the R package demonstrating at least one example application to a single-cell dataset. Basically, the equivalent of the scanpy workflow here.

A convenient way to use the pbmc3k dataset for demonstration purposes is the Bioconductor TENxPBMCData package.

Suggested code:

library(TENxPBMCData)
tenx_pbmc3k <- TENxPBMCData(dataset = "pbmc3k")

Ideally, consider using the vignette (or a separate one) to also give an introduction to the functionality of the R package.
It is not necessary to duplicate information already described in the documentation of the Python package (DRY principle); you may simply include a link to the main page.

Describe alternatives you've considered

A working example of an R workflow could also be included in the documentation of the Python package, although this is probably unnecessarily difficult to maintain.
Ideally, that example would be run and tested for every new release of the Python and R source code.

Additional context
Once you have an R vignette written, you should also consider using pkgdown to automatically create a GitHub website including the full package documentation.

`NotFittedError` after caching and reloading fitted `Ivis` instance

The issue

A fitted Ivis instance is not adequately preserved when joblib.dump() is used to save it. Consequently, when Ivis is used as part of a sklearn.pipeline.Pipeline object with memory != None, errors occur.
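
One mechanism that could produce this symptom is an estimator that drops its learned model when it is pickled. The sketch below is a toy illustration using only the standard-library `pickle` module; `ToyEstimator`, its `model_` attribute, and its `__getstate__` are hypothetical and are not taken from the ivis source.

```python
import pickle


class ToyEstimator:
    """Hypothetical stand-in for an estimator that wraps a trained model."""

    def __init__(self):
        self.model_ = None  # populated by fit()

    def fit(self, X):
        # Pretend this dict is a trained neural network.
        self.model_ = {"mean": sum(X) / len(X)}
        return self

    def transform(self, X):
        if self.model_ is None:
            raise RuntimeError("Model was not fitted yet.")
        return [x - self.model_["mean"] for x in X]

    def __getstate__(self):
        # Assumption for illustration: the learned model is dropped on pickling.
        state = dict(self.__dict__)
        state["model_"] = None
        return state


fitted = ToyEstimator().fit([1.0, 2.0, 3.0])
restored = pickle.loads(pickle.dumps(fitted))

print(fitted.transform([2.0]))  # the in-memory object still works
try:
    restored.transform([2.0])
except RuntimeError as err:
    print("after round trip:", err)
```

If something similar happens inside `Ivis`, any code path that serializes and reloads the estimator would hand back an object that looks unfitted.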

Minimal reproducible examples

Two examples are provided herein: one with sklearn.pipeline.Pipeline, and another with joblib only (sklearn uses joblib in sklearn.pipeline.Pipeline, so I thought this second example could help).

Environment

A virtual environment was created specifically for this project, wherein all modules specified in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

Runtime

python=3.9.5

Relevant modules

ivis=2.0.4
tensorflow=2.5.0

Example with `sklearn.pipeline.Pipeline`

Script

import tempfile
import ivis

from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing, svm


X, y = datasets.load_iris(return_X_y=True)

pipeline_with_ivis = pipeline.Pipeline([
    ("normalize", preprocessing.MinMaxScaler()),
    ("project", None),
    ("classify", None),
], memory=tempfile.mkdtemp())

parameter_grid = {
    "project": (ivis.Ivis(verbose=0),),
    "project__k": (15,),

    "classify": (ensemble.RandomForestClassifier(), svm.SVC()),
    "classify__random_state": (2021,)
}

grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, verbose=3,
                                           return_train_score=True).fit(X, y)  # should fail
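
A possible reason why a cached pipeline step can work the first time and fail on reuse is sketched below. It assumes the cache acts like a simple memoizer that returns the live object on a miss and a deserialized copy on a hit. That is my understanding of how a disk cache such as joblib's typically behaves, but it is not verified here. `ToyTransformer` and `cached_fit` are hypothetical names for illustration only.

```python
import os
import pickle
import tempfile


class ToyTransformer:
    """Hypothetical transformer whose learned state is dropped on pickling."""

    def __init__(self):
        self.offset_ = None

    def fit(self, X):
        self.offset_ = min(X)
        return self

    def __getstate__(self):
        state = dict(self.__dict__)
        state["offset_"] = None  # assumption: learned state is not serialized
        return state


def cached_fit(cache_dir, transformer, X):
    """Toy memoization: compute on a miss, load from disk on a hit."""
    path = os.path.join(cache_dir, "fit.pkl")  # single fixed key, for brevity
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)  # hit: a deserialized copy
    fitted = transformer.fit(X)
    with open(path, "wb") as fh:
        pickle.dump(fitted, fh)
    return fitted  # miss: the live in-memory object


cache_dir = tempfile.mkdtemp()
X = [3, 1, 2]
first = cached_fit(cache_dir, ToyTransformer(), X)   # cache miss
second = cached_fit(cache_dir, ToyTransformer(), X)  # cache hit
print(first.offset_, second.offset_)
```

Under these assumptions, the first candidate in a grid search would use the live fitted transformer, while later candidates that reuse the same cached step would receive a copy with its fitted state missing.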

Log with errors

Fitting 10 folds for each of 2 candidates, totalling 20 fits
[CV 1/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=  11.3s
[CV 2/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   4.3s
[CV 3/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   8.6s
[CV 4/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   3.9s
[CV 5/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   6.4s
[CV 6/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   5.8s
[CV 7/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   4.5s
[CV 8/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   5.3s
[CV 9/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.667) total time=   4.3s
[CV 10/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   3.8s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 1/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 2/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 3/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
[CV 4/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
[CV 5/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 6/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
[CV 7/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 8/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 9/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 10/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [0.88666667        nan]
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the train scores are non-finite: [ 1. nan]
  warnings.warn(

Example without `sklearn.pipeline.Pipeline`

Script

import ivis
import joblib

from sklearn import datasets, model_selection


X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33, random_state=42)
model = ivis.Ivis(k=15, batch_size=15, verbose=0).fit(X_train, y_train)

joblib.dump(model, "ivis.pkl")

new_model = joblib.load("ivis.pkl")

model.transform(X_test)      # should work
new_model.transform(X_test)  # should fail

Log with errors

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<USER_FOLDER>\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\211.7142.13\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "<USER_FOLDER>\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\211.7142.13\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "<REPOSITORY_ROOT>/playground3.py", line 20, in <module>
    new_model.transform(X_test)  # should fail
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\ivis\ivis.py", line 329, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.

Discussion

As seen in the example with sklearn.pipeline.Pipeline and sklearn.model_selection.GridSearchCV, everything runs smoothly when Ivis is fitted the first time for all folds. When the model is cached and retrieved for the subsequent runs, however, errors happen because at least Ivis.encoder is missing. Upon experimentation, it was found that even after loading Ivis.encoder, errors happened with the reloaded model, indicating that other important attributes were not properly pickled.

Although I never tested such functions, it seems that saving and loading capabilities were already developed for Ivis in Ivis.save_model() and Ivis.load_model(). However, to ensure that Ivis is pickleable, it would be ideal to transfer and adapt this functionality to Ivis.__getstate__() and Ivis.__setstate__() (the latter of which does not exist AFAIK) so that pickle and joblib know how to pickle an Ivis instance. This would enable its employment in Pipeline objects with memory != None, thus significantly speeding up the hyper-parameter fine-tuning process performed by GridSearchCV.
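
The `__getstate__` / `__setstate__` idea above can be sketched with a toy class. This is only an illustration of the pickle protocol, not ivis code: `ToyEstimator`, its attributes, and the use of a `threading.Lock` as the stand-in for a non-picklable member (such as a Keras model) are all assumptions made for the example.

```python
import pickle
import threading


class ToyEstimator:
    """Hypothetical stand-in for an estimator that owns an unpicklable resource."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.weights_ = None            # set by fit()
        self._lock = threading.Lock()   # stands in for a non-picklable member

    def fit(self, values):
        self.weights_ = [v * self.scale for v in values]
        return self

    def transform(self, values):
        if self.weights_ is None:
            raise RuntimeError("Model was not fitted yet.")
        return [v + w for v, w in zip(values, self.weights_)]

    def __getstate__(self):
        # Copy the instance dict and drop what pickle cannot handle.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the dropped resource.
        self.__dict__.update(state)
        self._lock = threading.Lock()


fitted = ToyEstimator(scale=2.0).fit([1.0, 2.0])
restored = pickle.loads(pickle.dumps(fitted))
```

In a real estimator the dropped member would presumably be serialised to bytes or to a temporary file inside `__getstate__` and reloaded in `__setstate__`, rather than simply recreated.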

Ivis produces different embeddings from a saved model across multiple tensorflow sessions

Running ivis.transform on a pre-built model across different TensorFlow sessions produces different embeddings. Embeddings are consistent within a session, but once the TensorFlow session is restarted and the model is reloaded, the embeddings change.

This was introduced in the 2.0 upgrade; earlier versions of ivis behaved as expected. It also seems to affect only larger datasets: I don't see it on the iris dataset, but it is present on a dataset with 500k+ rows.

Things I checked that seem to be ok:

model loading: model weights, optimizer weights all appear to be consistent between sessions. So this isn't an issue with incorrect initialisation

toggling GPU training: the bug seems to be present when running both CPU and GPU transformations.

data normalization: data normalization stays consistent i.e. input data is not altered between sessions.

How to get stable results?

Hello Folks,

thank you for all the work on this lib. I have a question about reproducibility: Is there a way to set a random seed or random state and get stable results?

I'm trying to achieve this with:

import random
import numpy
random.seed(42)
numpy.random.seed(42)

I'm aware that these are not thread-safe, which may be why the results are not reproducible. Is there any way to enforce this?
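
Since ivis runs on TensorFlow, seeding only `random` and `numpy` probably leaves TensorFlow's own generator unseeded. The helper below is a minimal sketch that seeds all three when they are installed. `seed_everything` is a hypothetical name, and this is not a documented ivis feature; it may still not be enough, because approximate nearest-neighbour indexing and multi-threaded or GPU operations can introduce their own nondeterminism.

```python
import os
import random


def seed_everything(seed):
    """Seed the random number generators a TensorFlow-based model is likely to touch."""
    # Only affects hash randomisation in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2.x API
    except (ImportError, AttributeError):
        pass  # TensorFlow missing, or an older release without this function


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
```

Calling the helper once at the top of a script, before any model is built, is the usual placement.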

Under tensorflow 2.2.0 ivis fails to load a saved model

To reproduce:

pip3 install tensorflow --upgrade

from sklearn.datasets import load_iris
from ivis import Ivis

X = load_iris()['data']
y = load_iris()['target']

# Supervised and unsupervised modes result in the same error
ivis = Ivis(k=5, batch_size=8).fit(X, y)
ivis.save_model('tmp.ivis')

model = Ivis()
model.load_model('tmp.ivis')

This results in AttributeError: 'Model' object has no attribute '_make_predict_function'

Custom generator for training on out-of-memory datasets

In https://bering-ivis.readthedocs.io/en/latest/oom_datasets.html, for out-of-memory datasets, you say to train on h5 files that exist on disk.

In my case, I can't use h5 files, but I could use a custom generator which yields numpy array batched data.

Is there a way to provide batched data through a custom generator function? Something like keras' fit_generator.

Thank you

Add multi-label support to supervised ivis

Propose to encode multi-label response variables using sklearn's MultiLabelBinarizer.

`chunk_size` in knn set to 0

Describe the bug
It seems chunk_size in ivis.data.neighbour_retrieval.knn is set to 0 for my dataset, which has shape (6, 784).

Stack trace

Building KNN index
100%|█████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1318.07it/s]
Extracting KNN neighbours
Traceback (most recent call last):
  File "main.py", line 17, in <module>
    viz.visualize(*embeddings, dpi=150)
  File "/Users/ryedida/Desktop/CSC522/userdata_mining/visualization/embeddings.py", line 76, in visualize
    x = self._reduce_dims(arg)
  File "/Users/ryedida/Desktop/CSC522/userdata_mining/visualization/embeddings.py", line 50, in _reduce_dims
    return ivis.fit_transform(arg)
  File "/usr/local/lib/python3.8/site-packages/ivis/ivis.py", line 336, in fit_transform
    self.fit(X, Y, shuffle_mode)
  File "/usr/local/lib/python3.8/site-packages/ivis/ivis.py", line 314, in fit
    self._fit(X, Y, shuffle_mode)
  File "/usr/local/lib/python3.8/site-packages/ivis/ivis.py", line 179, in _fit
    self.neighbour_matrix = AnnoyKnnMatrix.build(X, path=self.annoy_index_path,
  File "/usr/local/lib/python3.8/site-packages/ivis/data/neighbour_retrieval/knn.py", line 60, in build
    return cls(index, X.shape, path, k, search_k, precompute, include_distances, verbose)
  File "/usr/local/lib/python3.8/site-packages/ivis/data/neighbour_retrieval/knn.py", line 47, in __init__
    self.precomputed_neighbours = self.get_neighbour_indices()
  File "/usr/local/lib/python3.8/site-packages/ivis/data/neighbour_retrieval/knn.py", line 93, in get_neighbour_indices
    return extract_knn(
  File "/usr/local/lib/python3.8/site-packages/ivis/data/neighbour_retrieval/knn.py", line 189, in extract_knn
    for i in range(0, data_shape[0], chunk_size):
ValueError: range() arg 3 must not be zero

Desktop (please complete the following information):

OS: macOS Big Sur
Browser [e.g. chrome, safari]: Firefox
Version [e.g. 22]: 87.0

Additional context
Python 3.8. embedding_dims was set to 2, k was set to 3.
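
The `range() arg 3 must not be zero` error is what Python raises when the step of `range` is 0. The report does not show how `chunk_size` is computed, so the sketch below assumes it comes from an integer division of the row count by a worker count, which yields 0 whenever there are fewer rows than workers (6 rows here). The function names and the `n_workers` parameter are hypothetical.

```python
def unguarded_chunk_starts(n_rows, n_workers):
    """Assumed failure mode: the step becomes 0 when n_rows < n_workers."""
    chunk_size = n_rows // n_workers           # 6 // 8 == 0
    return list(range(0, n_rows, chunk_size))  # ValueError if chunk_size == 0


def chunk_starts(n_rows, n_workers):
    """Same split, but the step is never allowed to drop below 1."""
    chunk_size = max(1, n_rows // n_workers)
    return list(range(0, n_rows, chunk_size))
```

With the guard, a 6-row dataset is simply split into single-row chunks instead of failing.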

Applying ivis on sequences

I would like to apply ivis to high-dimensional time series/sequence data. Is there a way to achieve this with the current version?

Suggest implementing `predict_proba` and `predict` methods for Ivis object.

This will enable ivis to be used in sklearn GridSearchCV and to utilise various scoring functions.

Enable registration or passing of a custom triplet loss function

In Python, Ivis.__init__ accepts a distance: str keyword argument, which selects a predefined triplet loss function for that distance metric from a dictionary. Currently, one of the ways to provide a custom distance function is to monkeypatch ivis.nn.losses.get_loss_functions. Other ways to accomplish the same are even messier from the perspectives of usage and implementation.

The nature of dimensionality reduction, especially when dealing with one-hot-encoded categorical features, sometimes requires custom ways to calculate loss. Under the hood, ivis has the ability to enable custom loss functions, but any such offerings need to be implemented in a clean and API-idiomatic manner.

A custom distance function requires its own triplet loss implementation. Ivis.__init__ could support an additional keyword argument (e.g. triplet_loss: Callable[..., ...] = ...) for users to be able to pass their own.

Alternatively, it could simply be passed inside the existing distance kwarg, with its signature changing to distance: Union[str, Callable[..., ...]].

Another way would be to make the losses dictionary built by ivis.nn.losses.get_loss_functions a module-level loss function registrar.

Additionally, docs and examples need to be updated on how to correctly implement a custom loss function. With all currently available distance metrics, the triplet loss implementation follows a very similar pattern, and should not be too daunting to attempt to implement.
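
The two proposals above (a `Union[str, Callable]` argument and a module-level registrar) can be combined. The following is a sketch of the pattern only, not ivis's implementation: `register_loss`, `resolve_loss`, `_LOSS_REGISTRY`, and `toy_margin_loss` are hypothetical names, and the toy loss works on precomputed scalar distances rather than tensors.

```python
from typing import Callable, Dict, Union

LossFn = Callable[..., float]
_LOSS_REGISTRY: Dict[str, LossFn] = {}  # hypothetical module-level registrar


def register_loss(name: str) -> Callable[[LossFn], LossFn]:
    """Decorator that stores a loss function under a string key."""
    def decorator(fn: LossFn) -> LossFn:
        _LOSS_REGISTRY[name] = fn
        return fn
    return decorator


def resolve_loss(distance: Union[str, LossFn]) -> LossFn:
    """Accept either a registered name or a user-supplied callable."""
    if callable(distance):
        return distance
    try:
        return _LOSS_REGISTRY[distance]
    except KeyError:
        raise ValueError(
            f"Unknown distance {distance!r}; registered: {sorted(_LOSS_REGISTRY)}"
        ) from None


@register_loss("toy_margin")
def toy_margin_loss(d_pos: float, d_neg: float, margin: float = 1.0) -> float:
    """Toy hinge-style triplet loss on precomputed distances (not ivis's loss)."""
    return max(d_pos - d_neg + margin, 0.0)
```

With this shape, a constructor would call `resolve_loss(distance)` once, so string names keep working while callables pass straight through.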

How is the performance of ivis comparing to other single cell embedding methods?

General questions about algorithm design and usage.
Hi,
This is a great new method for learning low-dimensional embeddings of high-dimensional single-cell data. But how does it compare to other scRNA-seq embedding methods? There are many methods for scRNA-seq dimensionality reduction, for example ZIFA [1], ZINB-WaVE [2], DCA [3], scVI [4], scvis [5], scScope [6], etc. Most of them are zero-inflated matrix factorization methods or denoising/zero-inflated autoencoders. Thanks.

[1] Pierson, Emma, and Christopher Yau. "ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis." Genome biology 16.1 (2015): 241.
[2] Risso, Davide, et al. "A general and flexible method for signal extraction from single-cell RNA-seq data." Nature communications 9.1 (2018): 284.
[3] Eraslan, Gökcen, et al. "Single-cell RNA-seq denoising using a deep count autoencoder." Nature communications 10.1 (2019): 390.
[4] Lopez, Romain, et al. "Deep generative modeling for single-cell transcriptomics." Nature methods 15.12 (2018): 1053.
[5] Ding, Jiarui, Anne Condon, and Sohrab P. Shah. "Interpretable dimensionality reduction of single cell transcriptome data with deep generative models." Nature communications 9.1 (2018): 2002.
[6] Deng, Yue, et al. "Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning." Nature methods 16.4 (2019): 311.

How does ivis compare to UMAP?

I read the paper for this project and found that the method is similar to UMAP (both are based on KNN). So, what are the differences between these methods?

Cosine distance is not compatible with TF import

Current behaviour when using distance='cosine' throws an error:

AttributeError: module 'tensorflow.keras.losses' has no attribute 'cosine_distance'

Potential fix is changing module imports to: tf.compat.v1.losses.cosine_distance

Implement parallel retrieval of kNNs

Issue running the example code to test the R package

Hello,

For JOSS review.

I am running into the following issue when running the example R code (given in this README.md) in an R console in my terminal.

The package was successfully installed in an R console in my terminal as described in #28

The main error below seems to be: UnboundLocalError: local variable 'a' referenced before assignment.

> library(ivis)
> 
> model <- ivis(k = 3, batch_size = 3)
Using TensorFlow backend.
/Users/kevin/miniconda3/envs/ivis/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
  return f(*args, **kwds)
> 
> X = data.matrix(iris[, 1:4])
> model = model$fit(X)
Building KNN index
/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/data/knn.py:15: FutureWarning: The default argument for metric will be removed in future version of Annoy. Please pass metric='angular' explicitly.
  index = AnnoyIndex(X.shape[1])
100%|███████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 127667.53it/s]
Extracting KNN from index
/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/data/knn.py:92: FutureWarning: The default argument for metric will be removed in future version of Annoy. Please pass metric='angular' explicitly.
  self.index = AnnoyIndex(n_dims)
 11%|████████▌                                                                  | 17/150 [00:00<00:00, 141.19it/s]
WARNING:tensorflow:From /Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  UnboundLocalError: local variable 'a' referenced before assignment

Detailed traceback: 
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/ivis.py", line 209, in fit
    self._fit(X, Y, shuffle_mode)
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/ivis.py", line 147, in _fit
    triplet_network(base_network(self.model_def, input_size),
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/nn/network.py", line 41, in base_network
    return default_base_network(input_shape)
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/nn/network.py", line 61, in default_base_network
    x = AlphaDropout(0.1)(x)
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/keras/layers/noise.py", line 165, in call
    return K
> 
> xy = model$transform(X)
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  AttributeError: 'NoneType' object has no attribute 'predict'

Detailed traceback: 
  File "/Users/kevin/miniconda3/envs/ivis/lib/python3.7/site-packages/ivis/ivis.py", line 248, in transform
    embedding = self.encoder.predict(X, verbose=self.verbose)

Session info reported below

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin18.5.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ivis_1.1.3

loaded via a namespace (and not attached):
[1] compiler_3.6.0     BiocManager_1.30.4 Matrix_1.2-17      tools_3.6.0       
[5] Rcpp_1.0.2         reticulate_1.13    grid_3.6.0         jsonlite_1.6      
[9] lattice_0.20-38

Add progress indicator to index generator

Implement thread-safe generator for multiple workers while using HDF5 for training

Add n_jobs hyper parameter to control number of cores that are dedicated to multiprocessing tasks

Wrong README file referenced in the setup script

Minor bug that I noticed while working on a conda-forge recipe for ivis. The setup script references README.md, but that file is not packaged with the source distribution (README.txt is).

with open('README.md', encoding='utf-8') as f:
    long_description = f.read()
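
One way to make the setup script tolerant of this mismatch is to try several README names and fall back to an empty description. The helper below is a sketch; `read_long_description` and its `candidates` parameter are hypothetical. Shipping README.md in the source distribution (for instance through MANIFEST.in) would be the alternative fix.

```python
import tempfile
from pathlib import Path


def read_long_description(directory, candidates=("README.md", "README.txt")):
    """Return the text of the first README candidate found, or an empty string."""
    for name in candidates:
        path = Path(directory) / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return ""


# Demonstration on a throwaway directory that only contains README.txt.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.txt").write_text("plain-text readme", encoding="utf-8")
    found = read_long_description(tmp)
    missing = read_long_description(tmp, candidates=("README.md",))
```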

model_save: optimizer is not compatible with pickle

When I call save_model after fitting a supervised Ivis instance, saving fails with an error. It looks like some part of the optimizer cannot be pickled.

Replicate:

import ivis

# X, y: feature matrix and label vector (not defined in the original report)
i = ivis.Ivis(embedding_dims=10, n_epochs_without_progress=5)
i.fit(X, y)
i.save_model("model.ivis")

Traceback (most recent call last):
  File "src/ivis_persist.py", line 69, in <module>
    ivises[output].save_model(f"models/{output}.ivis")
  File "/Users/pbaumgartner/anaconda3/envs/env/lib/python3.7/site-packages/ivis/ivis.py", line 404, in save_model
    pkl.dump(self.model_.optimizer, f)
AttributeError: Can't pickle local object 'make_gradient_clipnorm_fn.<locals>.<lambda>'

System Info:
Running ivis==2.0.0 on macOS with python 3.7.
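
The traceback names a lambda defined inside `make_gradient_clipnorm_fn`. Pickle stores functions by qualified name, so a lambda created inside another function cannot be pickled. The sketch below reproduces that general behaviour with standard-library code only; the function names are hypothetical and this is not TensorFlow's or ivis's code.

```python
import functools
import pickle


def make_clip_fn_lambda(limit):
    # A lambda created inside a function is a "local object";
    # pickle cannot look it up by name.
    return lambda value: min(limit, value)


def make_clip_fn_partial(limit):
    # functools.partial over a built-in pickles as a reference to `min`
    # plus the bound argument.
    return functools.partial(min, limit)


def is_picklable(obj):
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False
```

This suggests two possible workarounds on the library side: avoid pickling the optimizer object directly, or persist only its configuration and weights.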

tensorflow's Filling shuffle buffer is slow to fill up on very large datasets

Before each epoch, tensorflow fills up a shuffle buffer:

Filling up shuffle buffer (this may take a while): 69 of 79

This is not optimal behaviour for large datasets. Potential solution here: tensorflow/tensorflow#30646 (comment)

Add support for sparse numpy matrices

Potentially use .toarray() within batch generator.

Add JOSS Badge to README

Just noticed you don't have the JOSS badge on your README - it would be helpful if you could add the badge to link directly to the corresponding article:

Unstable result on dimensionality reduction

I'm trying Ivis for dimensionality reduction on Iris dataset. My code is as follows:

from ivis import Ivis
ivis_model = Ivis(embedding_dims=3, k=5, verbose=False, model='hinton')
ivis_data = ivis_model.fit_transform(data.drop(["species"], axis=1).values)

The problem is that each time I run the above code chunk, I get different results. What could cause this instability?

Installing Ivis results in the installation of "tensorflow" even when "tensorflow-gpu" is installed.

Describe the bug
Because tensorflow is in setup.py, installing ivis results in the installation of the tensorflow package, which is the CPU-only version of Tensorflow, even if the user already has tensorflow-gpu installed. When the user subsequently uses Ivis (or anything else dependent on Tensorflow), the CPU version will be used. In order to utilize the GPU version of tensorflow, the user must uninstall tensorflow and reinstall tensorflow-gpu after installing ivis.

To Reproduce
pip install ivis

Expected behavior
The user most likely does not expect the installation of a package that depends on Tensorflow to install the CPU version of Tensorflow when they already have the GPU version installed.

Additional context
This has been discussed in tensorflow/tensorflow#7166. One way around this problem is to remove tensorflow from the list of requirements in setup.py and include it in extras_require (see Edward developer @dustinvtran's comment).
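
The extras_require approach can be sketched as the keyword arguments one would pass to setuptools' `setup()`. The package name and the contents of `install_requires` below are placeholders, not ivis's real metadata.

```python
# Illustrative keyword arguments for setuptools.setup(); all values are placeholders.
setup_kwargs = {
    "name": "example-package",          # hypothetical package name
    "install_requires": ["numpy"],      # TensorFlow deliberately left out
    "extras_require": {
        "cpu": ["tensorflow"],          # pip install example-package[cpu]
        "gpu": ["tensorflow-gpu"],      # pip install example-package[gpu]
    },
}

# In a real setup.py this would be followed by: setup(**setup_kwargs)
```

A plain `pip install` then leaves any existing TensorFlow build untouched, at the cost of requiring users to pick an extra explicitly.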

Consider adding automated tests

Also part of the JOSS-Review.

Please consider adding automated tests for the R package.

Reproducibility

Hello,

How can we get reproducible results regarding the seed?
Is there an argument regarding the initial_state for example that we can pass?

Thanks,
Regards

Windows compatibility?

Really excited to compare Ivis to UMAP on a project I am currently working on.

The server I have access to is a Windows 10 machine, with a Python 3.7 Anaconda environment.

Following the install instructions and trying to run the MNIST example, I am seeing the following error: TypeError: can't pickle annoy.Annoy objects

Loss always going towards 1, all embeddings transform to the same values???

I have a dataset consisting of 1200D concatenated average-pooled FastText vectors (2 separate documents going through 2 FastText models, for a total of 4 * 300D vectors per example).

This dataset reduces in dimensionality and projects properly when run through UMAP, but when run through Ivis (installed through pip), the loss always goes towards 1 and the embeddings for every example are nearly identical.

This seems like a bug. I can't provide the exact data to reproduce it, but it is easy to generate some FastText word vectors and try ivis on them.

Example of what I am seeing for the same data (top is Ivis, bottom is UMAP)

Ivis seems to provoke errors when composing a sklearn.pipeline.Pipeline passed to sklearn.model_selection.GridSearchCV and executed in parallel

The problem

I noticed that errors are thrown when Ivis is a step in a sklearn.pipeline.Pipeline that is passed to sklearn.model_selection.GridSearchCV to fine-tune hyper-parameters across all estimators/transformers, and GridSearchCV has n_jobs=-1 (i.e., when executions within GridSearchCV are parallel). This does not happen when n_jobs=1 (i.e., when the executions within GridSearchCV are sequential).

Since Pipeline globally regulates the n_jobs parameter, thus not supporting the parallelization of only specific steps, this problem forces the global use of n_jobs=1, which noticeably slows down the fine-tuning process by underusing the computational power of the setup in which the script is being executed (even in parts where n_jobs=-1 would work).

Environment

A virtual environment was created specifically for this repository, in which all modules listed in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

Runtime

python=3.8.4

Relevant modules

ivis=2.0.3
tensorflow=2.5.0

Minimal reproducible example

Code

if __name__ == "__main__":
    import tempfile
    import ivis

    from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing
    from os import environ

    environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

    X, y = datasets.load_iris(return_X_y=True)

    pipeline_with_ivis = pipeline.Pipeline([
        ("normalize", preprocessing.MinMaxScaler()),
        ("project", ivis.Ivis()),
        ("classify", ensemble.RandomForestClassifier()),
    ], memory=tempfile.mkdtemp())

    parameter_grid = {
        "project__k": (15,),
        "project__verbose": (True,),

        "classify__random_state": (2021,)
    }

    grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, n_jobs=-1,
                                               return_train_score=True, verbose=3).fit(X, y)

Error

<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 212, in extract_knn
    process.start()
  File "C:\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\process.py", line 39, in _Popen
    return Popen(process_obj)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\popen_loky_win32.py", line 70, in __init__
    child_env.update(process_obj.env)
AttributeError: 'KnnWorker' object has no attribute 'env'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 341, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 303, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 591, in __call__
    return self._cached_call(args, kwargs)[0]
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 761, in call
    output = self.func(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 754, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 350, in fit_transform
    self.fit(X, Y, shuffle_mode)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 328, in fit
    self._fit(X, Y, shuffle_mode)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 190, in _fit
    self.neighbour_matrix = AnnoyKnnMatrix.build(X, path=self.annoy_index_path,
  File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 63, in build
    return cls(index, X.shape, path, k, search_k, precompute, include_distances, verbose)
  File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 48, in __init__
    self.precomputed_neighbours = self.get_neighbour_indices()
  File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 96, in get_neighbour_indices
    return extract_knn(
  File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 236, in extract_knn
    process.terminate()
  File "C:\Python38\lib\multiprocessing\process.py", line 133, in terminate
    self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
  warnings.warn("Estimator fit failed. The score on this train-test"

[...]

<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [nan]
  warnings.warn(

Discussion

From writing and experimenting with the example above, my understanding is that, since sklearn uses joblib and ivis uses multiprocessing, these modules might not be playing well with each other for some reason.

I would rule out the hypothesis that nested estimators/transformers with their own parallel routines are the problem: estimators like sklearn.ensemble.RandomForestClassifier can be set to n_jobs=-1 without problems within the Pipeline passed to GridSearchCV.

I am particularly affected by this issue because I want to employ ivis in projects that involve hyper-parameter fine-tuning using cross-validation via GridSearchCV with concurrent executions. I attempted to diagnose the problem, but to no avail, which is why I bring this issue to your attention.

Observation: another part of this problem is a design choice that does not adhere to the sklearn API guidelines; I propose and detail a solution in #95. That design choice does not cause the aforementioned error, but might cause other errors that could affect the same use scenario (Pipeline in GridSearchCV running in parallel).

Issue installing and running ivis R package in RStudio

Hello,

For JOSS review.

The installation instructions fail when run in the RStudio environment:

> devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
Downloading GitHub repo beringresearch/ivis@master
✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpud6pnU/remotesbe4d59017fdb/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’ ...
─  preparing ‘ivis’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘ivis_1.1.3.tar.gz’
   
* installing *source* package ‘ivis’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘ivis’:
 .onLoad failed in loadNamespace() for 'ivis', details:
  call: path.expand(path)
  error: invalid 'path' argument
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
Error: Failed to install 'ivis' from GitHub:
  (converted from warning) installation of package ‘/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T//Rtmpud6pnU/filebe4d71713083/ivis_1.1.3.tar.gz’ had non-zero exit status

However, it does work fine when run in the console (Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64 x86_64):

> devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
Downloading GitHub repo beringresearch/ivis@master
   checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’
─  preparing ‘ivis’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘ivis_1.1.3.tar.gz’
   
* installing *source* package ‘ivis’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (ivis)

Moreover, the ivis package (installed from the terminal) can be loaded from an R console in a terminal, but throws the following error when loaded in RStudio:

> library(ivis)
Error: package or namespace load failed for ‘ivis’:
 .onLoad failed in loadNamespace() for 'ivis', details:
  call: path.expand(path)
  error: invalid 'path' argument

This is most likely due to conda not being on the PATH in RStudio:

# RStudio
> system("echo $PATH")
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin:/opt/local/bin
# Console
> system("echo $PATH")
/Users/kevin/miniconda3/bin:/Users/kevin/miniconda3/condabin:/usr/local/opt/imagemagick@6/bin:/Users/kevin/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin

Is there a recommended way to set up an environment to run ivis in RStudio, or are users only expected to run it from a terminal R console?

Thanks!

embedding dim cannot be 1

Describe the bug
embedding_dims set to be 1 will lead to ->
ValueError: Invalid reduction dimension 1 for input with 1 dimensions. for 'loss/stacked_triplets_loss/Sum' (op: 'Sum') with input shapes: [?], [] and with computed input tensors: input[1] = <1>.

To Reproduce

import numpy as np
from tensorflow.keras.datasets import mnist
from ivis import Ivis

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = np.reshape(X_train, (len(X_train), 28 * 28))
X_test = np.reshape(X_test, (len(X_test), 28 * 28))

model = Ivis(embedding_dims=1)
model.fit(X_train, Y_train)

Expected behavior
It should work when reducing to one dimension.
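The wording of the error ("Invalid reduction dimension 1 for input with 1 dimensions") is what a sum over axis 1 produces when it is handed a rank-1 tensor. A minimal NumPy sketch of that failure mode follows. It assumes, without having checked the ivis loss code, that the size-1 embedding axis is being dropped somewhere before the per-sample sum; `row_sums` is a hypothetical helper, not an ivis function.

```python
import numpy as np


def row_sums(x):
    """Sum over axis 1, the way a per-sample reduction in a loss would."""
    return np.sum(x, axis=1)


two_d = np.ones((4, 1))  # batch of 4, embedding size 1, rank preserved
one_d = np.ones(4)       # same values after the size-1 axis has been squeezed away

print(row_sums(two_d).shape)  # the rank-2 case reduces without complaint

try:
    row_sums(one_d)
    failed = False
except ValueError as err:  # NumPy's AxisError is a subclass of ValueError
    failed = True
    print("reduction failed:", err)
```

If that assumption holds, keeping the embedding as shape (batch, 1) rather than (batch,) before the reduction would avoid the error.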

Versions

OS: [GCC 7.3.0] on linux
Ivis 1.7.1
Python 3.6.7

Default usage of 'annoy.index' is problematic.

I am benchmarking this method on a cluster that uses a shared file system. The problem is that the Ivis class creates a file, annoy.index, without first checking whether it already exists. An existing file is overwritten without any warning or error, presumably causing issues for the program that created it.

~~I see that this can be remedied by setting annoy_index_path; however, this limitation is not obvious from the documentation and is likely to cause confusion.~~
Actually it appears this cannot be done. This argument tells the program to load from this file, so there is no option to actually change the name of the index file ...

Edit:
Also, the annoy.index file is still created even if I set build_index_on_disk=False
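For illustration, one way a per-job index file could be named and created without clobbering another job's file is sketched below. This is not ivis code: `unique_index_path` and `claim_path` are hypothetical helpers, and whether ivis can be pointed at such a path is exactly the open question in this issue.

```python
import os
import tempfile
import uuid


def unique_index_path(directory=None, prefix="annoy", suffix=".index"):
    """Return a path that is very unlikely to collide with another job's file.

    The process ID alone is not enough on a shared file system, because two
    nodes can reuse the same PID, so a random UUID is appended as well.
    """
    directory = directory or tempfile.gettempdir()
    name = f"{prefix}-{os.getpid()}-{uuid.uuid4().hex}{suffix}"
    return os.path.join(directory, name)


def claim_path(path):
    """Create `path` exclusively; raise FileExistsError instead of overwriting."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    return path


if __name__ == "__main__":
    path = claim_path(unique_index_path())
    try:
        claim_path(path)  # a second claim on the same path must fail
    except FileExistsError:
        print("refusing to overwrite", path)
    finally:
        os.remove(path)
```

The exclusive-create flag (`O_EXCL`) turns a silent overwrite into an explicit error, which is the behaviour the report above is asking for.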

Attempt to apply to non-function

Hi,

I am reviewing for JOSS-Reviews.

I think I was able to install the package. Unfortunately when I run the example code, I run into the following error:

model <- ivis(k = 3)
Error in ivis_object$Ivis(embedding_dims = embedding_dims, k = k, distance = distance, :
attempt to apply non-function

My session info:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ivis_1.2.3

loaded via a namespace (and not attached):
[1] compiler_3.6.0  Matrix_1.2-17   tools_3.6.0     yaml_2.2.0      Rcpp_1.0.2      reticulate_1.13
[7] grid_3.6.0      jsonlite_1.6    lattice_0.20-38

tiny error in R-package/README.md

X = data.matrix(iris[:, 1:4])
is not valid R. Omit the colon, i.e. write iris[, 1:4].

Extend readme in R package

This is part of JOSS-Review

It would be nice to showcase the computed ivis visualisation of the data in the R package README. Consider adding the following code:

library(ggplot2)

# xy is assumed to be the two-column embedding produced by the README example
dat <- data.frame(x = xy[, 1],
                  y = xy[, 2],
                  species = iris$Species)

ggplot(dat, aes(x = x, y = y)) +
  geom_point(aes(color = species)) +
  theme_classic()

Handling of mixed type datasets with categorical features

From the README file:

both categorical and continuous features are handled well

How do we handle categorical features? Is one-hot-encoding enough?

In UMAP you can use different distances for one-hot-encoded categorical features (e.g. dice, jaccard etc.) and continuous features, then you perform an "intersection" (see lmcinnes/umap#58).

How can we handle mixed type datasets in ivis? Can we just use it on a dataset with continuous features and one-hot-encoded categorical features mixed together?
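To make the question concrete, here is a minimal sketch of what "continuous features and one-hot-encoded categorical features mixed together" would look like as a single input matrix. It only builds the matrix with NumPy; it does not call ivis and makes no claim about how well ivis handles such input. `one_hot` and `min_max` are hypothetical helpers and the data are toy values.

```python
import numpy as np


def one_hot(labels):
    """One-hot encode a 1-D array of category labels."""
    labels = np.asarray(labels).ravel()
    categories, codes = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.size, categories.size), dtype=np.float32)
    encoded[np.arange(labels.size), codes] = 1.0
    return encoded, categories


def min_max(x):
    """Scale each column to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=np.float32)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (x - lo) / span


# Toy data: two continuous columns and one categorical column.
continuous = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]])
colour = np.array(["red", "green", "red"])

encoded, categories = one_hot(colour)
X = np.hstack([min_max(continuous), encoded])
print(X.shape)  # (3, 4): two scaled continuous columns plus two indicator columns
```

Scaling the continuous columns to [0, 1] puts them on the same range as the indicator columns, so no single feature type dominates a Euclidean distance purely because of its units.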

Thank you very much

questions about distance and rescaling

Hi.

I'm trying out Ivis after reading your paper. Nice work and excellent documentation.

Could you clarify what you mean by "margin – The distance that is enforced between points by the triplet loss functions"? (Emphasis added.) This sounds like "all distances are set to this value." From a quick read of your code, it looks like this factor is added to all distances during loss calculations.

Also, your examples generally include a call to MinMaxScaler. Is this required by Ivis, i.e. does it make assumptions about the scale of the distances between points, whether in the loss calculations or elsewhere?

Thanks!
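For reference, this is how the margin enters the standard triplet loss, written out in NumPy. This is the textbook formulation, max(d(a, p) - d(a, n) + margin, 0), and is only assumed to resemble what ivis does; the ivis variants may differ in detail. `triplet_loss` is a hypothetical helper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss per row: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0)


a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[1.0, 0.0], [1.0, 0.0]])
n = np.array([[3.0, 0.0], [1.5, 0.0]])

# Row 0: the negative is already more than `margin` farther away than the
# positive, so the loss is zero. Row 1: it is only 0.5 farther, so 0.5 remains.
print(triplet_loss(a, p, n, margin=1.0))

# The same points scaled by 10 with the same margin: both triplets now
# satisfy the margin, so the loss depends on the scale of the data.
print(triplet_loss(10 * a, 10 * p, 10 * n, margin=1.0))
```

In this formulation the margin is not a distance that points are set to. It is the minimum gap by which the negative must be farther from the anchor than the positive before a triplet stops contributing to the loss. Because it is compared against raw distances, its effect depends on how the inputs are scaled.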

Import reticulate::virtualenv_remove in NAMESPACE

Describe the bug
The install_ivis() command throws an error if an "ivis" environment already exists because the reticulate::virtualenv_remove() function is not imported in the NAMESPACE.

To Reproduce
Steps to reproduce the behavior:

> install_ivis()
[...successful installation...]

> install_ivis()
Creating a virtual environment (ivis)
Error in virtualenv_remove("ivis") : 
  could not find function "virtualenv_remove"

Expected behavior
Running install_ivis() when it is already installed should not throw an error.
https://github.com/beringresearch/ivis/blob/master/R-package/R/install_ivis.R#L2

Additional context
I can see that the import is declared here.
It is just about roxygenizing the package to update the NAMESPACE file

Bug with index.build(ntrees)

Hello,

I'm trying to run the ivis examples (both the simple iris one and the mnist one), and I keep getting this error whenever model fitting is called (running this on Debian). Any thoughts?

In [7]: embeddings = ivis.fit_transform(mnist.data)

Error truncating file: Invalid argument
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-d5f1692c2b85> in <module>
----> 1 embeddings = ivis.fit_transform(mnist.data)

/opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit_transform(self, X, Y, shuffle_mode)
    289         """
    290
--> 291         self.fit(X, Y, shuffle_mode)
    292         return self.transform(X)
    293

/opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit(self, X, Y, shuffle_mode)
    269         """
    270
--> 271         self._fit(X, Y, shuffle_mode)
    272         return self
    273

/opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in _fit(self, X, Y, shuffle_mode)
    146                 print('Building KNN index')
    147             build_annoy_index(X, self.annoy_index_path,
--> 148                               ntrees=self.ntrees, verbose=self.verbose)
    149
    150         datagen = generator_from_index(X, Y,

/opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/data/knn.py in build_annoy_index(X, path, ntrees, verbose)
     28
     29     # Build n trees
---> 30     index.build(ntrees)
     31     if platform.system() == 'Windows':
     32         index.save(path)

Exception: Invalid argument

Meaning of "Observations" on https://bering-ivis.readthedocs.io/en/latest/hyperparameters.html

I was just looking at the hyperparameters section of the Ivis docs (https://bering-ivis.readthedocs.io/en/latest/hyperparameters.html). I'm a little confused about what "observations" means exactly, so any help would be appreciated! Thanks!

classification_weight Parameter

Hi,
The section in the docs about metric learning here, talks about a classification_weight parameter. However, the Ivis class does not have such a parameter.

Could anyone explain why that is?

Distance-weighted random sampling of non-neighbor negatives

Not a fully-baked feature request, just a directional hunch. I've found the conclusions from the paper "Sampling Matters in Deep Embedding Learning" pretty intuitive: (1) the method for choosing negative samples is critical to the overall embedding, maybe more than the specific loss function, and (2) a distance-weighted sampling of negatives had some nice properties during training and better results compared to uniform random sampling or oversampling hard cases.

I'm brand-new to Annoy, not confident on the implementation details or performance changes here, but I suspect that the prebuilt index could be used for both positive and negative sampling. An example: the current approach draws random negatives in sequence and chooses the first index not in a neighbor list. A distance-weighted approach for choosing a negative for each triplet might work like this:

1. Draw a random set of candidate negatives
2. Drop any candidate negatives already in the neighbor list
3. Choose from the remaining set of candidates with probabilities proportional to f(dist(i, j)), where f(dist) could be just 1/dist, 1/sqrt(dist), etc.

Annoy gives us the dist(i, j) without much of a performance hit. Weighted choice of the candidate negatives puts a (tunable) thumb on the scale for triplets that contain closer/harder-negative matches.

This idea probably does increase some hyperparameter selection headaches. I think the impactful choices here are the size of the initial set of candidate negatives and (especially) f(dist).
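A rough sketch of the procedure described above, to make the two tunable pieces (candidate set size and the weighting function) explicit. To stay self-contained it computes brute-force Euclidean distances with NumPy instead of querying an Annoy index, and it weights candidates by 1/dist so that closer (harder) negatives are favoured. `sample_negative` and its parameters are hypothetical names, not part of ivis.

```python
import numpy as np


def sample_negative(anchor_idx, X, neighbors, n_candidates=50,
                    weight_fn=lambda d: 1.0 / d, rng=None):
    """Pick one negative for `anchor_idx`, favouring closer candidates."""
    rng = rng or np.random.default_rng()

    # 1. Draw a random set of candidate negatives.
    size = min(n_candidates, len(X))
    candidates = rng.choice(len(X), size=size, replace=False)

    # 2. Drop the anchor itself and anything already in the neighbor list.
    excluded = list(set(neighbors) | {anchor_idx})
    candidates = candidates[~np.isin(candidates, excluded)]
    if candidates.size == 0:
        raise ValueError("no valid negative among the drawn candidates")

    # 3. Choose among the rest with probability proportional to the weight.
    #    Brute-force distances stand in for the distances Annoy would return.
    dists = np.linalg.norm(X[candidates] - X[anchor_idx], axis=1)
    weights = weight_fn(np.maximum(dists, 1e-12))  # guard against dist == 0
    probs = weights / weights.sum()
    return int(rng.choice(candidates, p=probs))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
negative = sample_negative(0, X, neighbors=[1, 2, 3], n_candidates=10, rng=rng)
print(negative)
```

Passing `weight_fn=lambda d: 1.0 / np.sqrt(d)` gives the softer weighting mentioned above, and a constant weight recovers uniform sampling over the surviving candidates.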

Add contributor guidelines

This is part of the JOSS-Review

Please consider adding clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, or 3) seek support.

Add ability to save and load ivis model from R

embedding fails when batch size is greater than total dimensions of the dataset

R pkg fit() call finishes but subprocess doesn't terminate

This model consistently feels like a magic trick, thanks for contributing!

Bug
I'm running the ivis R package (v1.7.1) (more system details below). I can get model$fit() and model$transform() working just fine and producing substantive results. However, when the R process finishes and returns the fitted model, I'm seeing continued sky-high system usage. The R process calling ivis has definitely completed and is back at a command prompt, but in htop I can see the RStudio GUI process (parent of the rsession process) occupying at least 2 full cores. Some process further down is not stopping when the R process gets the returned value. (Restarting the R session does kill it.)

I don't understand enough of the ivis-through-reticulate toolchain to provide more helpful diagnostics in this first report, but happy to run experiments and document further.

Environment

ivis R package(v1.7.1), installed from Github (56a8479) 14 Apr 2020
reticulate (v1.15), 2020-04-02 CRAN (R 3.6.2)
R 3.6.2 on MacOS 10.14.6 (18G4032)

platform       x86_64-apple-darwin15.6.0   
arch           x86_64                      
os             darwin15.6.0                
system         x86_64, darwin15.6.0        
status                                     
major          3                           
minor          6.2                         
year           2019                        
month          12                          
day            12                          
svn rev        77560                       
language       R                           
version.string R version 3.6.2 (2019-12-12)
nickname       Dark and Stormy Night

Add some sort of logging to keep track of algorithm's progress during long jobs

Ivis is not able to run inference on a sparse matrix

To reproduce:

from ivis import Ivis
from sklearn.datasets import fetch_rcv1
from sklearn.utils import resample

rcv1 = fetch_rcv1()
rcv1.data.shape
X, y = resample(rcv1.data, rcv1.target, replace=False, n_samples=1000, random_state=1234)

ivis = Ivis(epochs=1)
ivis.fit(X)

embeddings = ivis.transform(X)
