theislab / dca

Deep count autoencoder for denoising scRNA-seq data

License: Apache License 2.0

Python 99.79% R 0.21% Shell 0.01%

dca's Introduction

DCA is a deep count autoencoder network that denoises scRNA-seq data and removes the dropout effect. It takes the count structure, overdispersed nature and sparsity of the data into account by using a deep autoencoder with a zero-inflated negative binomial (ZINB) loss function.

See our manuscript and tutorial for more details.

Installation

pip

For a traditional Python installation of the count autoencoder and the required packages, use

$ pip install dca

conda

Alternatively, install the count autoencoder and the required packages with Conda (most easily obtained via the Miniconda Python distribution). Once Conda is available, run the following command:

$ conda install -c bioconda dca

Usage

You can run the autoencoder from the command line:

dca matrix.csv results

where matrix.csv is a CSV/TSV-formatted raw count matrix with genes in rows and cells in columns. Cell and gene labels are mandatory.
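
As a minimal sketch of the input layout described above, the snippet below builds a small genes-by-cells count matrix with pandas and writes it as CSV with both sets of labels. The gene names, cell names and counts are made up for illustration, and the file is written to a temporary folder rather than next to your data.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy data: 4 genes (rows) x 3 cells (columns) of raw integer counts.
counts = pd.DataFrame(
    np.array([[0, 3, 1],
              [5, 0, 0],
              [2, 2, 7],
              [0, 0, 4]]),
    index=["geneA", "geneB", "geneC", "geneD"],  # gene labels
    columns=["cell1", "cell2", "cell3"],         # cell labels
)

# Write to a temporary folder, keeping both the header row and the index column.
path = os.path.join(tempfile.mkdtemp(), "matrix.csv")
counts.to_csv(path)

# Read the file back to confirm that labels and counts survive the round trip.
roundtrip = pd.read_csv(path, index_col=0)
print(roundtrip)
```

A file written this way has the cell labels in the first row and the gene labels in the first column, which is the labeled layout the usage note asks for.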

Results

The output folder contains the main output file (representing the mean parameter of the ZINB distribution) as well as some additional matrices in TSV format:

mean.tsv is the main output of the method and represents the mean parameter of the ZINB distribution. It has the same dimensions as the input file (except that zero-expression genes or cells are excluded) and is formatted as a gene x cell matrix. Additionally, mean_norm.tsv contains the library size-normalized expression of each cell and gene. See the normalize_total function from Scanpy for details about the default library size normalization method used in DCA.
pi.tsv and dispersion.tsv contain the dropout probabilities and the dispersion for each cell and gene. Their dimensions are the same as those of mean.tsv and the input file.
reduced.tsv contains the hidden representation of each cell (in a 32-dimensional space by default), that is, the activations of the bottleneck neurons.
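
The following is a small sketch of how one of these TSV matrices could be loaded for downstream analysis. It assumes that the file carries row labels in the first column and column labels in the first row, which is not confirmed here; `load_dca_matrix` is a hypothetical helper, and the toy `mean.tsv` is written by the snippet itself so that it runs without a real DCA results folder.

```python
import os
import tempfile

import pandas as pd


def load_dca_matrix(path):
    """Read a labeled TSV matrix (assumed: labels in first row and first column)."""
    return pd.read_csv(path, sep="\t", index_col=0)


# Stand-in for a results folder: write a toy gene x cell matrix as mean.tsv.
outdir = tempfile.mkdtemp()
toy = pd.DataFrame(
    [[0.4, 2.9], [5.1, 0.2]],
    index=["geneA", "geneB"],
    columns=["cell1", "cell2"],
)
toy.to_csv(os.path.join(outdir, "mean.tsv"), sep="\t")

mean = load_dca_matrix(os.path.join(outdir, "mean.tsv"))
print(mean)
```

If a real output file turns out to have no labels, the `index_col=0` argument would need to be dropped and `header=None` added.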

Use the -h option to see all available parameters and defaults.

Hyperparameter optimization

You can run the autoencoder with the --hyper option to perform a hyperparameter search.

dca's Issues

--hyper (no output/empty output)

When I try to apply --hyper, I get an empty architecture structure and no output file.

Here is the message printed after the empty output:
[]
Traceback (most recent call last):
  File "/usr/local/bin/dca", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/dca/__main__.py", line 154, in main
    train.train_with_args(args)
  File "/usr/local/lib/python3.7/dist-packages/dca/train.py", line 121, in train_with_args
    hyper(args)
  File "/usr/local/lib/python3.7/dist-packages/dca/hyper.py", line 93, in hyper
    test_fn(objective, hyper_params, save_model=None)
  File "/usr/local/lib/python3.7/dist-packages/kopt/hyopt.py", line 68, in test_fn
    res = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/kopt/hyopt.py", line 607, in __call__
    self._assert_optim_metric(model)
  File "/usr/local/lib/python3.7/dist-packages/kopt/hyopt.py", line 553, in _assert_optim_metric
    "add_eval_metrics: {0}".format(eval_metrics))
ValueError: optim_metric: 'loss' not in either sets of the losses:
model.metrics_names: []
add_eval_metrics: []

no 'mean_norm.tsv' created

I am trying to run your application from the command line in its most basic form ('dca input.tsv outputdir'), however the output directory does not contain the 'mean_norm.tsv' mentioned in the wiki. Is this file no longer created or is there an argument I am missing?

I am running dca in a Miniconda virtual environment and 'pip show dca' shows the version as 0.2.3

Which keras version to use?

Hello, I am trying to install dca but keep getting errors related to keras.

Which dca, keras, tensorflow versions to use?

Value error while running dca with scanpy

I came across this error when running dca(adata) on a scanpy object in a Python 3.6.3 environment:

View of AnnData object with n_obs × n_vars = 1448 × 20615
    obs: 'cell_type'
    var: 'gene_ids', 'feature_types'

dca: Calculating reconstructions...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-6-d4407240a7c2> in <module>
----> 1 dca(adata)

/opt/applications/python/3.6.3/gnu/lib/python3.6/site-packages/dca/api.py in dca(adata, mode, ae_type, normalize_per_cell, scale, log1p, hidden_size, hidden_dropout, batchnorm, activation, init, network_kwds, epochs, reduce_lr, early_stop, batch_size, optimizer, learning_rate, random_state, threads, verbose, training_kwds, return_model, return_info, copy)
    193 
    194     hist = train(adata[adata.obs.dca_split == 'train'], net, **training_kwds)
--> 195     res = net.predict(adata, mode, return_info, copy)
    196     adata = res if copy else adata
    197 

/opt/applications/python/3.6.3/gnu/lib/python3.6/site-packages/dca/network.py in predict(self, adata, mode, return_info, copy, colnames)
    402 
    403         # warning! this may overwrite adata.X
--> 404         super().predict(adata, mode, return_info, copy=False)
    405         return adata if copy else None
    406 

/opt/applications/python/3.6.3/gnu/lib/python3.6/site-packages/dca/network.py in predict(self, adata, mode, return_info, copy)
    200             adata.uns['dca_loss'] = self.model.test_on_batch({'count': adata.X,
    201                                                               'size_factors': adata.obs.size_factors},
--> 202                                                              adata.raw.X)
    203         if mode in ('latent', 'full'):
    204             print('dca: Calculating low dimensional representations...')

/opt/applications/tensorflow/1.15.0/python3.6/gnu/lib/python3.6/site-packages/keras/engine/training.py in test_on_batch(self, x, y, sample_weight)
   1486             ins = x + y + sample_weights
   1487         self._make_test_function()
-> 1488         outputs = self.test_function(ins)
   1489         return unpack_singleton(outputs)
   1490 

/opt/applications/tensorflow/1.15.0/python3.6/gnu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2977                     return self._legacy_call(inputs)
   2978 
-> 2979             return self._call(inputs)
   2980         else:
   2981             if py_any(is_tensor(x) for x in inputs):

/opt/applications/tensorflow/1.15.0/python3.6/gnu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2915                 array_vals.append(
   2916                     np.asarray(value,
-> 2917                                dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
   2918         if self.feed_dict:
   2919             for key in sorted(self.feed_dict.keys()):

/opt/applications/python/3.6.3/gnu/lib/python3.6/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: setting an array element with a sequence.
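
The TypeError in this traceback mentions 'csr_matrix', which suggests that a SciPy sparse matrix reached code that expects a dense array. Below is a minimal sketch of converting a sparse matrix to a dense NumPy array; `to_dense` is a hypothetical helper, and whether densifying adata.X (and adata.raw) before calling dca resolves this particular report is an assumption, not something confirmed in the thread.

```python
import numpy as np
from scipy import sparse


def to_dense(X):
    """Return X as a dense NumPy array, converting SciPy sparse input if needed."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Toy sparse count matrix (2 cells x 3 genes), standing in for adata.X.
X_sparse = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.float32))
X_dense = to_dense(X_sparse)
print(type(X_dense), X_dense.shape)
```

Note that densifying a large matrix can use a lot of memory, so this is only practical when the dense matrix fits in RAM.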

cannot recognize fluidigm RNA-seq raw counts

Hello author,

I was running dca on RNA-seq raw count data generated by Fluidigm. However, the program terminated with the following error message:
AssertionError: Make sure that the dataset (adata.X) contains unnormalized count data.

I double-checked the input and am sure that it is an unnormalized count matrix (gene by cell). Do you have any suggestions? Thanks!

Version of Pandas

I ran the following command on my dataset.

dca SRA779509_SRS3805247.tsv DCA --type zinb

I changed the TensorFlow version to 1.15. However, it looks like DCA also expects an older version of pandas. The model was trained and the output files mean.tsv and latent.tsv were produced, after which execution terminated with an error saying that a pandas Series has no attribute 'reshape'.

...
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2020-06-06 17:12:58,635 [WARNING] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2020-06-06 17:12:59.027542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Traceback (most recent call last):
  File "/usr/local/bin/dca", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/dca/__main__.py", line 149, in main
    train.train_with_args(args)
  File "/usr/local/lib/python3.6/dist-packages/dca/train.py", line 174, in train_with_args
    net.write(adata, args.outputdir, mode='full', colnames=predict_columns)
  File "/usr/local/lib/python3.6/dist-packages/dca/network.py", line 543, in write
    write_text_matrix(adata.var['X_dca_dispersion'].reshape(1, -1),
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'reshape'

In the newer versions of pandas, it is replaced with pandas.Series.values.reshape. Is there any way to fix this without changing the version?
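
For illustration, here is a small sketch of the replacement expression mentioned above, going through the underlying NumPy array instead of calling reshape on the Series itself. The Series contents are made up; applying the same change to the failing line would mean editing the installed dca/network.py by hand, which is an assumption about a possible workaround rather than an official fix.

```python
import pandas as pd

# Toy stand-in for adata.var['X_dca_dispersion'].
dispersion = pd.Series([0.5, 1.2, 3.0], index=["geneA", "geneB", "geneC"])

# Recent pandas has no Series.reshape; reshape the underlying NumPy array instead.
row = dispersion.values.reshape(1, -1)
print(row.shape)
```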

denoisesubset option error

Hello,

I tried to use the --denoisesubset option, using a .txt file of circa 100 genes. When I run dca on my data with the option, I get the following error

dca data_ready.csv results_DCA --denoisesubset genes.txt
dca: Successfully preprocessed 20896 genes and 12945 cells.
dca: Subset of 116 genes will be denoised.
dca: Calculating reconstructions...
f"Data matrix has wrong shape {value.shape}, "
ValueError: Data matrix has wrong shape (12945, 116), need to be (12945, 20896).

Can you figure out what the problem is? I guess that the denoised matrix includes only the denoised genes, while the output should also contain all the other initial genes.

Thanks for the help, it will be highly appreciated!

Denoise and Latent mode / Poisson

Hi,

Thanks a lot for the paper and the package. I'm new to neural networks.

I have a few questions:

When mode='latent', we get the bottleneck layer, which is the latent representation of the cells. Is it also possible to get the matrices of the encoder and decoder layers, as well as the W parameters for each layer?
I saw that you have implemented the Poisson autoencoder (with Poisson as the loss function) in the package, but you did not discuss it in your paper. Did you run any tests with it? What kind of results did you get?

Thanks again,
Best,
Inaki

Understanding SliceLayer and ColWiseMultLayer

Gokcen,

I'm trying to implement your concepts for a different (and far simpler) application, and your Jupyter notebook and DCA code have been very helpful. I have been able to reproduce your toy results fairly easily, with a few exceptions (I'm working in R Keras, which has a few limitations).

Can you help me understand a few things?

Is the ColWiseMultLayer different from using keras.layers.Multiply()?
What exactly is the SliceLayer doing? Is it different from simply concatenating the input tensors? The loss functions don't seem to be splitting the incoming tensors in any way.

I'm currently running as a multiple output model, but I'm not sure how to deal with the scale mismatch of the loss components.

Thanks!

TypeError: Categorical is not ordered for operation max

I'm getting the following error when running DCA. Any idea what might be causing this?

TypeError: load() missing 1 required positional argument: 'Loader'

After running the command "dca matrix.csv results" as per README.md, in the terminal, I am getting a bunch of errors.

Traceback (most recent call last):
  File "C:\Users\Sruthi Srinivasan\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Sruthi Srinivasan\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\Scripts\dca.exe\__main__.py", line 7, in <module>
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\dca\__main__.py", line 152, in main
    from . import train
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\dca\train.py", line 25, in <module>
    from .hyper import hyper
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\dca\hyper.py", line 6, in <module>
    from kopt import CompileFN, test_fn
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\kopt\__init__.py", line 10, in <module>
    from . import hyopt
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\kopt\hyopt.py", line 13, in <module>
    from kopt.config import db_host, db_port, save_dir
  File "C:\Users\Sruthi Srinivasan\PycharmProjects\pythonProject1\venv\lib\site-packages\kopt\config.py", line 60, in <module>
    _config = yaml.load(open(_config_path))
TypeError: load() missing 1 required positional argument: 'Loader'
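
The last frame calls yaml.load() without a Loader argument, which PyYAML 6.0 and later reject. The sketch below shows the two call forms that work on current PyYAML; the configuration text is made up for illustration, and it is an assumption that kopt's config file can be read with the safe loader.

```python
import io

import yaml

# Hypothetical config content; the real kopt config file may look different.
config_text = "db_host: localhost\ndb_port: 27017\n"

# Option 1: pass an explicit Loader to yaml.load().
config_a = yaml.load(io.StringIO(config_text), Loader=yaml.SafeLoader)

# Option 2: yaml.safe_load() is shorthand for the same safe loader.
config_b = yaml.safe_load(config_text)

print(config_a, config_b)
```

Since the failing call sits inside the installed kopt package, another route that avoids editing third-party code is to install a PyYAML release older than 6.0 in the environment, where the argument was still optional.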

Could you please give a possible solution for this problem?

Tensorflow version?

Hi,

I installed dca through conda and tried to call it through the scanpy function sc.external.pp.dca(), but I got an error message from tensorflow. Below is the code:

import numpy as np
import pandas as pd
import scanpy as sc
adata = sc.read_10x_mtx('./data/filtered_gene_bc_matrices/hg19/',  var_names='gene_ids', cache=True)
result = sc.external.pp.dca(adata)

And the error message I got is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/udd/spqis/.conda/envs/scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_dca.py", line 154, in dca
    return_model=return_model)
  File "/udd/spqis/.conda/envs/scanpy/lib/python3.7/site-packages/dca/api.py", line 149, in dca
    tf.set_random_seed(random_state)
AttributeError: module 'tensorflow' has no attribute 'set_random_seed'

My guess is that conda got me a tensorflow version that is incompatible with dca. Can this be fixed? Thank you!
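
The attribute named in the error, tf.set_random_seed, is the TensorFlow 1.x spelling; TensorFlow 2.x exposes tf.random.set_seed instead. As a hedged sketch of a version-tolerant call, the hypothetical helper below picks whichever name the given module offers. It is demonstrated with stand-in objects so that it runs without TensorFlow installed, and it is not part of dca.

```python
from types import SimpleNamespace


def set_tf_seed(tf_module, seed):
    """Seed TensorFlow using whichever seeding function the module exposes."""
    random_ns = getattr(tf_module, "random", None)
    if random_ns is not None and hasattr(random_ns, "set_seed"):
        random_ns.set_seed(seed)         # TensorFlow 2.x name
    else:
        tf_module.set_random_seed(seed)  # TensorFlow 1.x name


# Stand-ins that mimic the two API generations and record which call was made.
calls = []
fake_tf1 = SimpleNamespace(set_random_seed=lambda s: calls.append(("tf1", s)))
fake_tf2 = SimpleNamespace(
    random=SimpleNamespace(set_seed=lambda s: calls.append(("tf2", s)))
)

set_tf_seed(fake_tf1, 42)
set_tf_seed(fake_tf2, 42)
print(calls)
```

With a real installation the call would be `set_tf_seed(tf, 42)` after `import tensorflow as tf`. Seeding is only one of the API differences between the two major versions, so installing a TensorFlow release that matches what the dca version expects may still be necessary.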

Some questions about DCA implementation

Hi, I wonder why the ZINB and NB loss implementations in the code differ from their theoretical definitions.

I notice that in the paper, NB and ZINB are defined as:

However, I found that in the part of the codes, ZINB loss's NB is defined as:

Therefore, I wonder where the gamma function is. Thanks a lot!
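
As background for this question, the sketch below writes the log of a negative binomial probability mass function in the common mean/dispersion parameterization, using only the Python standard library. In log space the gamma functions become log-gamma terms, and the factorial x! becomes lgamma(x + 1). This is a generic textbook form under the stated parameterization, not a copy of DCA's loss code; the names `mu` and `theta` are chosen here for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log NB pmf with mean mu and dispersion theta.

    pmf(x) = Gamma(x + theta) / (Gamma(x + 1) * Gamma(theta))
             * (theta / (theta + mu)) ** theta * (mu / (theta + mu)) ** x
    """
    return (
        math.lgamma(x + theta) - math.lgamma(x + 1.0) - math.lgamma(theta)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


mu, theta = 3.0, 2.0

# Sanity checks: the probabilities should sum to about 1 and have mean about mu.
probs = [math.exp(nb_log_pmf(x, mu, theta)) for x in range(500)]
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
print(total, mean)
```

A negative log-likelihood loss is the negation of this expression, which is why log-gamma terms can appear with flipped signs in a loss function.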

ImportError: cannot import name 'stacked_violin'

I just installed it with pip3 install dca in Python 3.6.2 and then ran it with the command line "dca GSM3892344.expres.ensemble.csv ./test". This fails with: ImportError: cannot import name 'stacked_violin'. Could you please tell me whether this is because the installed scanpy version is not compatible with the installed dca?

scale data

Dear,
In the DCA tutorial, there is no sc.pp.scale(adata) between sc.pp.log1p(adata) and sc.pp.pca(adata), although it is present in Scanpy’s reimplementation of Seurat (https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html).

Is it necessary to use sc.pp.scale()?

there is a term tf.lgamma(y_true+1.0) in the NB loss function. Is it from NB pdf?

Error in running dca

Dear Gökçen,

I have installed dca using Anaconda, but when I try to run dca, importing keras from dca fails.
Anaconda installation of dca:
conda install -c bioconda dca
All setup conditions are the same as setup.py file. I use jupyter notebook and check the version of packages:
scanpy==1.4 anndata==0.6.18 numpy==1.15.4 scipy==1.2.1 pandas==0.24.1 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1
keras== 2.0.8, six== 1.12.0, h5py== 2.9.0, kopt== 0.1.0

I can work with scanpy without any problem. However, when I import dca in the Jupyter server using:
from dca.api import dca
it throws an error like this:

from keras.engine.base_layer import InputSpec
ModuleNotFoundError: No module named 'keras.engine.base_layer'

Have you ever met this case? Do you have any suggestions?
Thanks a lot and nice day,

Best,
Hoa Tran

When using --hyper, no output matrix produced

Hi, thanks for making the source code available. The "reproducibility" folder is also great. Regarding using DCA to perform hyperparameter search, isn't this same command supposed to produce a mean.tsv file just like with running DCA regularly? All I get at the end are the json files in the "train_models" folder

Side note, but does this message have anything to do with that issue?

"Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA"

Mode 'full' does not return the correct latent representation

In mode 'full', the original data is overwritten with
adata.X = self.model.predict({'count': adata.X, 'size_factors': adata.obs.size_factors})
before the latent representation is created using:
adata.obsm['X_dca'] = self.encoder.predict({'count': adata.X, 'size_factors': adata.obs.size_factors})
"X_dca" therefore does not contain the accurate latent representation if this mode is used.

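To make the ordering problem described in this report concrete, here is a toy sketch with a linear stand-in for the encoder and decoder. The weights, shapes and function names are hypothetical and have nothing to do with the trained DCA network; the point is only that encoding after the input has been overwritten gives a different result from encoding the original counts.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(5, 2))  # hypothetical encoder: 5 genes -> 2 latent dims
W_dec = rng.normal(size=(2, 5))  # hypothetical decoder: 2 latent dims -> 5 genes


def encode(X):
    return X @ W_enc


def reconstruct(X):
    return encode(X) @ W_dec


counts = rng.poisson(2.0, size=(4, 5)).astype(float)  # 4 cells x 5 genes

# Order described in the issue: X is overwritten first, then encoded.
X = reconstruct(counts.copy())
latent_after_overwrite = encode(X)

# Alternative order: compute the latent representation from the original counts.
latent_from_counts = encode(counts)
denoised = reconstruct(counts)

print(np.allclose(latent_after_overwrite, latent_from_counts))
```

Computing the latent representation before the reconstruction, or from a saved copy of the input, is one way to avoid the mismatch.
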
10x genomics h5 format for huge data

For large datasets, e.g. the 1.3 million cells in the mouse brain dataset,

is there support for the h5 format?

Thanks

python version

In the setup.py of dca, I see that you depend on Python 3.5. But anndata, which you use, relies on f-strings, which were introduced in Python 3.6.

Output file description

Dear Gökçen,

Thank you so much for developing DCA. Would it be possible to get a short description of output files produced by the dca command line usage?

Best regards,

Vedran

Error in running dca

Hello,

I have a problem when running "dca matrix.csv results":

File "/usr/local/lib/python3.6/site-packages/six.py", line 82, in _import_module
    __import__(name)
File "/usr/local/lib/python3.6/tkinter/__init__.py", line 36, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'

I tried one solution, sudo apt-get install tk-dev, but it didn't work.

Does anyone know how to solve the problem with this module?

Best,
Monika

smaller expression value after imputation?

Hi,

I ran dca with scanpy and checked the maximum value of the matrix,

np.max(adata_imputed.X)
185.14267

which is even smaller than the original raw counts' maximum

np.max(adata.layers["counts"])
296

How should I interpret this? My own explanation would be that dca is an autoencoder-based denoiser and is not guaranteed to act as a missing-data imputation tool. Am I right? Could you explain this in more detail in terms of the theory behind autoencoders?

Look forward to your reply. Thanks.

autoencoder.api.autoencode doesn't work

Hi there. Saw you at NIPS. Trying to get this thing going on my data.

autoencoder.api.autoencode seems to have a bug. If I have a numpy array of cells by genes, what is the most straightforward way to run your method on my data? Thanks!

Dependency on deprecated tensorflow function

I tried dca in a scanpy pipeline in a Python 3.8 environment and got this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-9bb00469e9a1> in <module>
      1 ###----- DCA denoising
----> 2 dca(adata)

~/anaconda3/envs/sc_py3.8/lib/python3.8/site-packages/dca/api.py in dca(adata, mode, ae_type, normalize_per_cell, scale, log1p, hidden_size, hidden_dropout, batchnorm, activation, init, network_kwds, epochs, reduce_lr, early_stop, batch_size, optimizer, learning_rate, random_state, threads, verbose, training_kwds, return_model, return_info, copy)
    147     random.seed(random_state)
    148     np.random.seed(random_state)
--> 149     tf.set_random_seed(random_state)
    150     os.environ['PYTHONHASHSEED'] = '0'
    151 

AttributeError: module 'tensorflow' has no attribute 'set_random_seed'

Use count data as input but encounter assertion error

DCA reports assertion errors when I use certain raw count scRNA-seq datasets as input.

Traceback (most recent call last):
  File "/home/Elvis/anaconda3/envs/dca-env/bin/dca", line 8, in <module>
    sys.exit(main())
  File "/home/Elvis/anaconda3/envs/dca-env/lib/python3.6/site-packages/dca/__main__.py", line 149, in main
    train.train_with_args(args)
  File "/home/Elvis/anaconda3/envs/dca-env/lib/python3.6/site-packages/dca/train.py", line 113, in train_with_args
    test_split=args.testsplit)
  File "/home/Elvis/anaconda3/envs/dca-env/lib/python3.6/site-packages/dca/io.py", line 69, in read_dataset
    assert np.all(X_subset.astype(int) == X_subset), norm_error
AssertionError: Make sure that the dataset (adata.X) contains unnormalized count data.
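
The assertion shown in the traceback compares the data with its integer-cast version, so any fractional value makes it fail. The sketch below applies the same comparison to a NumPy array and reports where the offending entries are; `non_integer_entries` is a hypothetical diagnostic helper and the two toy matrices are made up. Fractional "counts", as some quantification tools produce, are one possible cause, but that is an assumption about these reports.

```python
import numpy as np


def non_integer_entries(X):
    """Return (row, col) positions whose values change when cast to int."""
    X = np.asarray(X)
    return np.argwhere(X.astype(int) != X)


raw = np.array([[0.0, 3.0], [5.0, 1.0]])          # integer-valued counts
fractional = np.array([[0.0, 1.5], [2.0, 0.25]])  # would fail the check

print(non_integer_entries(raw))
print(non_integer_entries(fractional))
```

Running such a check on the input matrix before calling dca can show whether only a handful of entries are affected or the whole matrix is non-integer.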

Installation via conda

Hi,

DCA can now be installed using conda (on bioconda channel):

conda install -c bioconda dca

I can update the README file to add this information if you want.

Bérénicd

--hyper

Hi theislab,

I tried to use the --hyper parameter in dca by running dca -t --hyper pbmc.g949_c10k.msk90.csv hyper_dca, but I got the following error:

/mnt/lfs2/rui/vir/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "/mnt/lfs2/rui/vir/bin/dca", line 11, in <module>
    load_entry_point('DCA==0.1', 'console_scripts', 'dca')()
  File "/mnt/lfs2/rui/vir/lib/python3.5/site-packages/DCA/__main__.py", line 135, in main
    train.train_with_args(args)
  File "/mnt/lfs2/rui/vir/lib/python3.5/site-packages/DCA/train.py", line 103, in train_with_args
    hyper(args)
  File "/mnt/lfs2/rui/vir/lib/python3.5/site-packages/DCA/hyper.py", line 15, in hyper
    ds = io.create_dataset(args.input,
AttributeError: module 'DCA.io' has no attribute 'create_dataset'

Input and Output

Dear author,

Thanks for developing the software! I have a few questions for which I did not find a clear answer in your paper:

Is the input a count matrix?
Is the output a count matrix? Has it been normalized by library size, or log2-transformed?
I tried to use a raw count matrix as input, but found that the output values are as large as 400+ while three quantiles are between 0 and 1. Is that expected?

Thanks!

pip install : ModuleNotFoundError: No module named 'keras.objectives'

I'm struggling to get the pip install to work.
After
$ pip install dca
$ python
>>> from dca.api import dca

I get the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/calderwa/opt/anaconda3/envs/test/lib/python3.6/site-packages/dca/api.py", line 15, in <module>
    from .train import train
  File "/Users/calderwa/opt/anaconda3/envs/test/lib/python3.6/site-packages/dca/train.py", line 24, in <module>
    from .network import AE_types
  File "/Users/calderwa/opt/anaconda3/envs/test/lib/python3.6/site-packages/dca/network.py", line 27, in <module>
    from keras.objectives import mean_squared_error
ModuleNotFoundError: No module named 'keras.objectives'

I find that with the bioconda install I get the same error message.

Please see below for the full output from the pip install step:

test ❯ pip install dca
Collecting dca
Using cached DCA-0.3.2-py3-none-any.whl (26 kB)
Collecting scikit-learn
Using cached scikit_learn-0.24.2-cp36-cp36m-macosx_10_13_x86_64.whl (7.2 MB)
Collecting six>=1.10.0
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting keras>=2.4
Using cached keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
Collecting pandas
Using cached pandas-1.1.5-cp36-cp36m-macosx_10_9_x86_64.whl (10.2 MB)
Collecting tensorflow>=2.0
Using cached tensorflow-2.6.0-cp36-cp36m-macosx_10_11_x86_64.whl (198.9 MB)
Collecting numpy>=1.7
Using cached numpy-1.19.5-cp36-cp36m-macosx_10_9_x86_64.whl (15.6 MB)
Collecting kopt
Using cached kopt-0.1.0-py2.py3-none-any.whl
Collecting scanpy
Using cached scanpy-1.7.2-py3-none-any.whl (10.3 MB)
Collecting h5py
Using cached h5py-3.1.0-cp36-cp36m-macosx_10_9_x86_64.whl (2.9 MB)
Collecting flatbuffers~=1.12.0
Using cached flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting termcolor~=1.1.0
Using cached termcolor-1.1.0-py3-none-any.whl
Collecting protobuf>=3.9.2
Downloading protobuf-3.18.0-cp36-cp36m-macosx_10_9_x86_64.whl (1.0 MB)
|████████████████████████████████| 1.0 MB 3.4 MB/s
Collecting google-pasta~=0.2
Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting astunparse~=1.6.3
Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting grpcio<2.0,>=1.37.0
Using cached grpcio-1.40.0-cp36-cp36m-macosx_10_10_x86_64.whl (3.9 MB)
Collecting absl-py~=0.10
Using cached absl_py-0.13.0-py3-none-any.whl (132 kB)
Collecting tensorflow-estimator~=2.6
Using cached tensorflow_estimator-2.6.0-py2.py3-none-any.whl (462 kB)
Collecting opt-einsum~=3.3.0
Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting gast==0.4.0
Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting typing-extensions~=3.7.4
Using cached typing_extensions-3.7.4.3-py3-none-any.whl (22 kB)
Collecting six>=1.10.0
Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting wrapt~=1.12.1
Using cached wrapt-1.12.1-cp36-cp36m-macosx_10_9_x86_64.whl
Collecting tensorboard~=2.6
Using cached tensorboard-2.6.0-py3-none-any.whl (5.6 MB)
Collecting keras-preprocessing~=1.1.2
Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Requirement already satisfied: wheel~=0.35 in ./opt/anaconda3/envs/test/lib/python3.6/site-packages (from tensorflow>=2.0->dca) (0.37.0)
Collecting clang~=5.0
Using cached clang-5.0-py3-none-any.whl
Collecting cached-property
Using cached cached_property-1.5.2-py2.py3-none-any.whl (7.6 kB)
Requirement already satisfied: setuptools>=41.0.0 in ./opt/anaconda3/envs/test/lib/python3.6/site-packages (from tensorboard~=2.6->tensorflow>=2.0->dca) (52.0.0.post20210125)
Collecting markdown>=2.6.8
Using cached Markdown-3.3.4-py3-none-any.whl (97 kB)
Collecting google-auth<2,>=1.6.3
Using cached google_auth-1.35.0-py2.py3-none-any.whl (152 kB)
Collecting requests<3,>=2.21.0
Using cached requests-2.26.0-py2.py3-none-any.whl (62 kB)
Collecting tensorboard-plugin-wit>=1.6.0
Using cached tensorboard_plugin_wit-1.8.0-py3-none-any.whl (781 kB)
Collecting werkzeug>=0.11.15
Using cached Werkzeug-2.0.1-py3-none-any.whl (288 kB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
Using cached google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
Using cached tensorboard_data_server-0.6.1-py3-none-macosx_10_9_x86_64.whl (3.5 MB)
Collecting cachetools<5.0,>=2.0.0
Using cached cachetools-4.2.2-py3-none-any.whl (11 kB)
Collecting pyasn1-modules>=0.2.1
Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting rsa<5,>=3.1.4
Using cached rsa-4.7.2-py3-none-any.whl (34 kB)
Collecting requests-oauthlib>=0.7.0
Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting importlib-metadata
Using cached importlib_metadata-4.8.1-py3-none-any.whl (17 kB)
Collecting pyasn1<0.5.0,>=0.4.6
Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting charset-normalizer~=2.0.0
Using cached charset_normalizer-2.0.5-py3-none-any.whl (37 kB)
Collecting urllib3<1.27,>=1.21.1
Using cached urllib3-1.26.6-py2.py3-none-any.whl (138 kB)
Requirement already satisfied: certifi>=2017.4.17 in ./opt/anaconda3/envs/test/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->tensorflow>=2.0->dca) (2021.5.30)
Collecting idna<4,>=2.5
Using cached idna-3.2-py3-none-any.whl (59 kB)
Collecting oauthlib>=3.0.0
Using cached oauthlib-3.1.1-py2.py3-none-any.whl (146 kB)
Collecting dataclasses
Using cached dataclasses-0.8-py3-none-any.whl (19 kB)
Collecting zipp>=0.5
Using cached zipp-3.5.0-py3-none-any.whl (5.7 kB)
Collecting future
Using cached future-0.18.2-py3-none-any.whl
Collecting scipy
Using cached scipy-1.5.4-cp36-cp36m-macosx_10_9_x86_64.whl (28.8 MB)
Collecting hyperopt
Using cached hyperopt-0.2.5-py2.py3-none-any.whl (965 kB)
Collecting pyyaml
Using cached PyYAML-5.4.1-cp36-cp36m-macosx_10_9_x86_64.whl (249 kB)
Collecting matplotlib
Using cached matplotlib-3.3.4-cp36-cp36m-macosx_10_9_x86_64.whl (8.5 MB)
Collecting joblib>=0.11
Using cached joblib-1.0.1-py3-none-any.whl (303 kB)
Collecting threadpoolctl>=2.0.0
Using cached threadpoolctl-2.2.0-py3-none-any.whl (12 kB)
Collecting networkx>=2.2
Using cached networkx-2.5.1-py3-none-any.whl (1.6 MB)
Collecting cloudpickle
Using cached cloudpickle-2.0.0-py3-none-any.whl (25 kB)
Collecting tqdm
Using cached tqdm-4.62.2-py2.py3-none-any.whl (76 kB)
Collecting decorator<5,>=4.3
Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3
Using cached pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting kiwisolver>=1.0.1
Using cached kiwisolver-1.3.1-cp36-cp36m-macosx_10_9_x86_64.whl (61 kB)
Collecting python-dateutil>=2.1
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pillow>=6.2.0
Using cached Pillow-8.3.2-cp36-cp36m-macosx_10_10_x86_64.whl (3.0 MB)
Collecting cycler>=0.10
Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting pytz>=2017.2
Using cached pytz-2021.1-py2.py3-none-any.whl (510 kB)
Collecting seaborn
Using cached seaborn-0.11.2-py3-none-any.whl (292 kB)
Collecting umap-learn>=0.3.10
Using cached umap_learn-0.5.1-py3-none-any.whl
Collecting anndata>=0.7.4
Using cached anndata-0.7.6-py3-none-any.whl (127 kB)
Collecting legacy-api-wrap
Using cached legacy_api_wrap-1.2-py3-none-any.whl (37 kB)
Collecting tables
Using cached tables-3.6.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (4.4 MB)
Collecting numba>=0.41.0
Using cached numba-0.53.1-cp36-cp36m-macosx_10_14_x86_64.whl (2.2 MB)
Collecting patsy
Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting packaging
Using cached packaging-21.0-py3-none-any.whl (40 kB)
Collecting natsort
Using cached natsort-7.1.1-py3-none-any.whl (35 kB)
Collecting statsmodels>=0.10.0rc2
Using cached statsmodels-0.12.2-cp36-cp36m-macosx_10_15_x86_64.whl (9.5 MB)
Collecting sinfo
Using cached sinfo-0.3.4-py3-none-any.whl
Collecting xlrd<2.0
Using cached xlrd-1.2.0-py2.py3-none-any.whl (103 kB)
Collecting llvmlite<0.37,>=0.36.0rc1
Using cached llvmlite-0.36.0-cp36-cp36m-macosx_10_9_x86_64.whl (18.5 MB)
Collecting pynndescent>=0.5
Using cached pynndescent-0.5.4-py3-none-any.whl
Collecting get-version>=2.0.4
Using cached get_version-2.1-py3-none-any.whl (43 kB)
Collecting stdlib-list
Using cached stdlib_list-0.8.0-py3-none-any.whl (63 kB)
Collecting numexpr>=2.6.2
Using cached numexpr-2.7.3-cp36-cp36m-macosx_10_9_x86_64.whl (101 kB)
Installing collected packages: urllib3, pyasn1, numpy, idna, charset-normalizer, zipp, typing-extensions, threadpoolctl, six, scipy, rsa, requests, pyasn1-modules, oauthlib, llvmlite, joblib, cachetools, scikit-learn, requests-oauthlib, pytz, python-dateutil, pyparsing, pillow, numba, kiwisolver, importlib-metadata, google-auth, decorator, dataclasses, cycler, cached-property, xlrd, werkzeug, tqdm, tensorboard-plugin-wit, tensorboard-data-server, stdlib-list, pynndescent, protobuf, patsy, pandas, packaging, numexpr, networkx, natsort, matplotlib, markdown, h5py, grpcio, google-auth-oauthlib, get-version, future, cloudpickle, absl-py, wrapt, umap-learn, termcolor, tensorflow-estimator, tensorboard, tables, statsmodels, sinfo, seaborn, pyyaml, opt-einsum, legacy-api-wrap, keras-preprocessing, keras, hyperopt, google-pasta, gast, flatbuffers, clang, astunparse, anndata, tensorflow, scanpy, kopt, dca
Successfully installed absl-py-0.13.0 anndata-0.7.6 astunparse-1.6.3 cached-property-1.5.2 cachetools-4.2.2 charset-normalizer-2.0.5 clang-5.0 cloudpickle-2.0.0 cycler-0.10.0 dataclasses-0.8 dca-0.3.2 decorator-4.4.2 flatbuffers-1.12 future-0.18.2 gast-0.4.0 get-version-2.1 google-auth-1.35.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.40.0 h5py-3.1.0 hyperopt-0.2.5 idna-3.2 importlib-metadata-4.8.1 joblib-1.0.1 keras-2.6.0 keras-preprocessing-1.1.2 kiwisolver-1.3.1 kopt-0.1.0 legacy-api-wrap-1.2 llvmlite-0.36.0 markdown-3.3.4 matplotlib-3.3.4 natsort-7.1.1 networkx-2.5.1 numba-0.53.1 numexpr-2.7.3 numpy-1.19.5 oauthlib-3.1.1 opt-einsum-3.3.0 packaging-21.0 pandas-1.1.5 patsy-0.5.1 pillow-8.3.2 protobuf-3.18.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pynndescent-0.5.4 pyparsing-2.4.7 python-dateutil-2.8.2 pytz-2021.1 pyyaml-5.4.1 requests-2.26.0 requests-oauthlib-1.3.0 rsa-4.7.2 scanpy-1.7.2 scikit-learn-0.24.2 scipy-1.5.4 seaborn-0.11.2 sinfo-0.3.4 six-1.15.0 statsmodels-0.12.2 stdlib-list-0.8.0 tables-3.6.1 tensorboard-2.6.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.6.0 tensorflow-estimator-2.6.0 termcolor-1.1.0 threadpoolctl-2.2.0 tqdm-4.62.2 typing-extensions-3.7.4.3 umap-learn-0.5.1 urllib3-1.26.6 werkzeug-2.0.1 wrapt-1.12.1 xlrd-1.2.0 zipp-3.5.0

~ 1m 38s
test ❯ python
Python 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 12:58:59)
[GCC Clang 10.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.

from dca.api import dca

conda installation error

I installed the latest conda (conda 4.8.0).

I cannot install dca successfully.

(dca) xxx:~$ conda install -c bioconda dca
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package kopt conflicts for:
dca -> kopt
Package scanpy conflicts for:
dca -> scanpy
Package keras conflicts for:
dca -> keras[version='>=2.0.8']
Package numpy conflicts for:
dca -> numpy[version='>=1.7']
Package python conflicts for:
dca -> python[version='>=3.6']
Package scikit-learn conflicts for:
dca -> scikit-learn
Package six conflicts for:
dca -> six[version='>=1.10.0']
Package h5py conflicts for:
dca -> h5py
Package pandas conflicts for:
dca -> pandas

why KeyError: 'rmsprop'?

dca: Successfully preprocessed
Traceback (most recent call last):
File "", line 1, in
File "/home/anaconda3/envs/dca/lib/python3.6/site-packages/dca/api.py", line 194, in dca
hist = train(adata[adata.obs.dca_split == 'train'], net, **training_kwds)
File "/home/anaconda3/envs/dca/lib/python3.6/site-packages/dca/train.py", line 48, in train
optimizer = opt.__dict__[optimizer](...)
KeyError: 'rmsprop'

conda install failure

conda install always hangs at "Solving environment". Please provide a runnable environment through conda or Docker so that the paper is more useful!

pytorch version

I am very interested in DCA, but the TensorFlow version is complicated to set up. Could you provide a PyTorch version so that more researchers can use it?

After using --nocheckcounts, an error occurs: numpy() is only available when eager execution is enabled.

I tried enabling eager execution, but then another error said that DCA needs TensorFlow v2+.

BTW, if I use --checkcounts, another error asks me to make sure my dataset is not normalized. However, I am sure it is unnormalized and has cell and gene labels.
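
One way to double-check the input independently of DCA is to test whether every entry is a finite, non-negative whole number. The sketch below is only an illustration in NumPy: `looks_like_raw_counts` is a hypothetical helper, not DCA's own check, whose exact logic is not shown in this thread.

```python
import numpy as np


def looks_like_raw_counts(matrix):
    """Return True if every entry is a finite, non-negative whole number.

    Hypothetical helper for illustration; DCA's internal check may differ.
    """
    values = np.asarray(matrix, dtype=float)
    if not np.all(np.isfinite(values)):
        return False
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


# Made-up matrices: one with integer counts, one with fractional values.
raw = np.array([[0, 3, 1], [7, 0, 0]])
normalized = np.array([[0.0, 1.5, 0.5], [2.3, 0.0, 0.0]])

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

If a matrix that should hold raw counts fails this kind of test, stray NaN values or counts stored as text after a bad parse are worth ruling out first.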

package versions

I am running into trouble using this package. I tried different versions of keras, tensorflow, etc., yet various errors still pop up. Can anyone who is using the package successfully kindly provide the version information of the packages required for dca? Thanks in advance!
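
When comparing setups, it helps to post the exact versions from the environment in question. The sketch below is a minimal way to collect them; the package list is taken from the pip log and tracebacks in this thread, and `installed_versions` is a hypothetical helper name.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters need the importlib-metadata backport
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the pip log and tracebacks in this thread.
report = installed_versions(
    ["dca", "tensorflow", "keras", "scanpy", "anndata", "numpy", "kopt"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a report makes it much easier for others to reproduce a failing or a working combination.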

pre trained model

Good morning. Thanks for this work. How can I use a pre-trained DCA model on a different dataset?

small error

in loss.py
134 theta = tf.minimum(self.theta, 1e6)
should be 1e-6
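
For reference, an element-wise minimum acts as an upper cap on its input. The sketch below uses `np.minimum` as a stand-in for `tf.minimum`, with made-up theta values, to show what each constant would do. It does not claim which constant the authors intended.

```python
import numpy as np

# Made-up dispersion values spanning several orders of magnitude.
theta = np.array([0.5, 10.0, 1e9])

capped_large = np.minimum(theta, 1e6)   # only values above 1e6 are changed
capped_small = np.minimum(theta, 1e-6)  # every value above 1e-6 is replaced

print(capped_large)
print(capped_small)
```

With 1e6 the cap only touches extremely large values, whereas with 1e-6 every value in this example collapses to the same tiny constant.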

Not able to run DCA from command line

Dear Gökçen,

Thanks a lot for developing DCA. I am really looking forward to using your method on my data; however, I am not able to run it from the command line (although the installation was successful). Here is my attempt:

dca test.csv dca_test

Using TensorFlow backend.
WARNING: This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
Traceback (most recent call last):
File "/home/gu/miniconda3/envs/work/bin/dca", line 11, in
load_entry_point('DCA==0.1', 'console_scripts', 'dca')()
File "/home/gu/miniconda3/envs/work/lib/python3.5/site-packages/DCA/main.py", line 135, in main
train.train_with_args(args)
File "/home/gu/miniconda3/envs/work/lib/python3.5/site-packages/DCA/train.py", line 108, in train_with_args
test_split=args.testsplit)
File "/home/gu/miniconda3/envs/work/lib/python3.5/site-packages/DCA/io.py", line 64, in read_dataset
assert 'n_count' not in adata.obs, norm_error
AttributeError: 'AnnData' object has no attribute 'obs'

0:00:01.420 - total wall time
Attached is an example of my input (first 100 rows and 100 columns). The error is the same with the full dataset.

test.csv.zip

I appreciate any help on how to fix this.
Thanks
Gustavo

Normalization in mean_norm.tsv output

Dear developers,

What do the values in the mean_norm.tsv file mean? They do not sum to 1.0 per cell, although the description says that the 'file contains the library size-normalized expressions of each cell and gene'. Please explain how I should read the output and whether it can be transformed into something like TPM.

Thanks,

Andrzej
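
The README points to Scanpy's normalize_total for the default normalization. With its default target, that function rescales each cell to the median of the per-cell totals, so normalized cells sum to that median rather than to 1.0. The NumPy sketch below illustrates this with made-up counts, under the assumption that mean_norm.tsv follows the same convention. The final step rescales each cell to one million, which gives a counts-per-million style scale; it is not TPM, since TPM also requires gene lengths.

```python
import numpy as np

# Made-up matrix in gene x cell layout (rows are genes, columns are cells).
counts = np.array([
    [10.0, 0.0, 5.0],
    [30.0, 20.0, 5.0],
    [60.0, 20.0, 10.0],
])

library_sizes = counts.sum(axis=0)   # total counts per cell
target = np.median(library_sizes)    # assumed target: median library size
normalized = counts / library_sizes * target

# Rescale each cell to one million for a CPM-like scale.
cpm_like = normalized / normalized.sum(axis=0) * 1e6

print(normalized.sum(axis=0))  # every cell sums to the median library size
print(cpm_like.sum(axis=0))    # every cell sums to one million
```

Dividing each column by its own sum, as in the last step, works on any library size-normalized matrix regardless of the target that was used.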

input and output dimensions

How can I make the output file have the same dimensions as the input file, that is, without removing the zero-expression genes or cells?
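
Until this is supported directly, one workaround is to pad the output back to the input labels after the run. The pandas sketch below assumes both tables are in gene x cell layout with labels as index and columns; `restore_input_shape` is a hypothetical helper, and filling the removed genes or cells with zeros is an assumption, not DCA behavior.

```python
import pandas as pd


def restore_input_shape(denoised, original, fill_value=0.0):
    """Reindex a denoised gene x cell table to the labels of the input table.

    Genes or cells that were filtered out are filled with `fill_value`
    (an assumption; choose whatever placeholder suits the analysis).
    """
    return denoised.reindex(
        index=original.index, columns=original.columns, fill_value=fill_value
    )


# Made-up example: gene "g2" and cell "c3" are all zero and were dropped.
original = pd.DataFrame(
    [[1, 2, 0], [0, 0, 0], [4, 0, 0]],
    index=["g1", "g2", "g3"],
    columns=["c1", "c2", "c3"],
)
denoised = pd.DataFrame(
    [[1.2, 1.9], [3.8, 0.4]], index=["g1", "g3"], columns=["c1", "c2"]
)

restored = restore_input_shape(denoised, original)
print(restored.shape)
```

With real files, the two tables could be loaded with `pd.read_csv(path, sep="\t", index_col=0)` for mean.tsv and the matching separator for the input file.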

--transpose/-t option

In the description it says:

 input                 Input is raw count data in TSV/CSV or H5AD (anndata)
                        format. Row/col names are mandatory. Note that TSV/CSV
                        files must be in gene x cell layout where rows are
                        genes and cols are cells (scRNA-seq convention).Use
                        the -t/--transpose option if your count matrix in cell
                        x gene layout. H5AD files must be in cell x gene
                        format (stats and scanpy convention).

  -t, --transpose       Transpose input matrix (default: False)

My CSV files are in cell x gene format.

I used the command:
dca -t input.csv out_name
But in the log file, I found that dca still treated genes as cells and cells as genes.

Am I doing something wrong? Thanks!
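
As a workaround while the flag is being investigated, the file can be transposed before it is passed to dca, so that -t is not needed. The sketch below uses only the standard library; `transpose_csv` is a hypothetical helper, it assumes a rectangular table, and it loads the whole file into memory, which may not suit very large matrices.

```python
import csv
import os
import tempfile


def transpose_csv(src_path, dst_path):
    """Write a transposed copy of a CSV file (rows become columns)."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(zip(*rows))


# Small demonstration with a made-up cell x gene table.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "cells_by_genes.csv")
    dst = os.path.join(tmp, "genes_by_cells.csv")
    with open(src, "w", newline="") as handle:
        csv.writer(handle).writerows(
            [["", "geneA", "geneB"], ["cell1", "1", "2"], ["cell2", "3", "4"]]
        )
    transpose_csv(src, dst)
    with open(dst, newline="") as handle:
        transposed = list(csv.reader(handle))

print(transposed)
```

The transposed file is in gene x cell layout, which the help text above describes as the default for TSV/CSV input.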

dca is not a package

I tried installing dca using both conda and pip3, and I get the following error when I try to import the package. I am using VS Code for my scripting.

from dca.api import dca
ModuleNotFoundError: No module named 'dca.api'; 'dca' is not a package

Thank you.
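
Python raises "'dca' is not a package" when the name dca resolves to a single module file instead of a package directory. A common cause, offered here as a possibility rather than a confirmed diagnosis, is a local script named dca.py that shadows the installed package. The sketch below shows one way to see what a name resolves to; `describe_module` is a hypothetical helper.

```python
import importlib.util


def describe_module(name):
    """Report where `name` resolves and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Packages have submodule search locations; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


# Standard-library examples: `json` is a package, `string` is a single module.
print(describe_module("json"))
print(describe_module("string"))
# In the affected environment, run describe_module("dca") and check whether
# the origin points at a local file such as a script named dca.py.
```

If the origin turns out to be a file in the working directory, renaming that file should let the import find the installed package again.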

inconsistent requirement

Hi,

Recent updates to scanpy made this package incompatible with it.
If you use scanpy>=1.1a (with any input), you will hit the following error at the saving stage:

  File "/usr/local/lib/python3.5/site-packages/DCA/train.py", line 168, in train_with_args
    net.predict(adata, predict_columns)
  File "/usr/local/lib/python3.5/site-packages/DCA/network.py", line 404, in predict
    res = super().predict(adata, colnames=colnames, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/DCA/network.py", line 220, in predict
    sc.write('output', adata)
  File "/usr/local/lib/python3.5/site-packages/scanpy/readwrite.py", line 159, in write
    if is_valid_filename(filename):
  File "/usr/local/lib/python3.5/site-packages/scanpy/readwrite.py", line 510, in is_valid_filename
    elif ext[-1][1:] in avail_exts:
IndexError: list index out of range

The source of the problem is this commit of scanpy.

Regards :D

csv.gz support

Hello,

DCA seems to support only csv, not csv.gz.
My csv files are very large to begin with.
Is it possible to add support for csv.gz?

Thanks!
Ray
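
Until compressed input is supported, one workaround is to decompress the file to a plain csv before running dca. The sketch below streams the data with the standard library so the file is never held in memory at once; `gunzip_file` is a hypothetical helper and the file names are made up.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_path, dst_path):
    """Stream-decompress a .gz file to dst_path without loading it into memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Small demonstration with a made-up gene x cell table.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "matrix.csv.gz")
    csv_path = os.path.join(tmp, "matrix.csv")
    payload = b",cell1,cell2\ngeneA,1,0\ngeneB,0,3\n"
    with gzip.open(gz_path, "wb") as handle:
        handle.write(payload)
    gunzip_file(gz_path, csv_path)
    with open(csv_path, "rb") as handle:
        restored = handle.read()

print(restored == payload)
```

The decompressed file can then be passed to the command line as usual, at the cost of the extra disk space.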

Unable to run dca with mode='denoise'. 'ValueError: setting an array element with a sequence.'

Hello,

Thank you for developing this package! I am able to run it with mode='latent', but not with mode='denoise'. I have tried installing different versions of numpy and tensorflow, but nothing seems to work. I am using the following code.

import numpy as np
import pandas as pd
import scanpy as sc
from dca.api import dca

adata = sc.read_10x_mtx(
    './data/filtered_gene_bc_matrices/hg19/', #following example in scanpy tutorial
    var_names='gene_symbols',               
    cache=True)     

sc.pp.filter_genes(adata, min_counts=1)
dca(adata, mode='denoise', return_info=True)

The error message includes the following.

//anaconda3/envs/scenv/lib/python3.6/site-packages/dca/api.py in dca(adata, mode, ae_type, normalize_per_cell, scale, log1p, hidden_size, hidden_dropout, batchnorm, activation, init, network_kwds, epochs, reduce_lr, early_stop, batch_size, optimizer, learning_rate, random_state, threads, verbose, training_kwds, return_model, return_info, copy)
193
194 hist = train(adata[adata.obs.dca_split == 'train'], net, **training_kwds)
--> 195 res = net.predict(adata, mode, return_info, copy)
196 adata = res if copy else adata
197

//anaconda3/envs/scenv/lib/python3.6/site-packages/dca/network.py in predict(self, adata, mode, return_info, copy, colnames)
402 name='mean')(self.decoder_output)
403 output = ColwiseMultLayer([mean, self.sf_layer])
--> 404 output = SliceLayer(0, name='slice')([output, disp, pi])
405
406 zinb = ZINB(pi, theta=disp, ridge_lambda=self.ridge, debug=self.debug)

//anaconda3/envs/scenv/lib/python3.6/site-packages/dca/network.py in predict(self, adata, mode, return_info, copy)
200 adata.uns['dca_loss'] = self.model.test_on_batch({'count': adata.X,
201 'size_factors': adata.obs.size_factors},
--> 202 adata.raw.X)
203
204 if mode in ('latent', 'full'):

//anaconda3/envs/scenv/lib/python3.6/site-packages/keras/engine/training.py in test_on_batch(self, x, y, sample_weight)
1486 ins = x + y + sample_weights
1487 self._make_test_function()
-> 1488 outputs = self.test_function(ins)
1489 return unpack_singleton(outputs)
1490

//anaconda3/envs/scenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in call(self, inputs)
2977 return self._legacy_call(inputs)
2978
-> 2979 return self._call(inputs)
2980 else:
2981 if py_any(is_tensor(x) for x in inputs):

//anaconda3/envs/scenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2915 array_vals.append(
2916 np.asarray(value,
-> 2917 dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
2918 if self.feed_dict:
2919 for key in sorted(self.feed_dict.keys()):

//anaconda3/envs/scenv/lib/python3.6/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)

ValueError: setting an array element with a sequence.

This is the problem I have with mode='denoise'. I could run dca with mode='latent', but I would need the mean output, which I could not find in the anndata object even if return_info=True.

Thank you!

I found that the tutorial page is broken

I cannot find the dca code on the tutorial page. Could you please fix it? Thanks a lot!

DCA optimizer KeyError

I just installed the dca extension to use in scanpy and came across an issue regarding the optimizer. The issue occurs regardless of the optimizer chosen. As the command line option of dca does not support data in h5 format I did not try this option so far.

The error occurs with every dataset I tried, so I will not include one and will simply refer to a scanpy adata object with raw counts.

import scanpy.external as sce
sce.pp.dca(adata)

Gives me the following trace:

/Users/marcglettig/miniconda3/envs/mg/lib/python3.8/site-packages/kopt/config.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  _config = yaml.load(open(_config_path))
dca: Successfully preprocessed 17374 genes and 3844 cells.
WARNING:tensorflow:From /Users/marcglettig/miniconda3/envs/mg/lib/python3.8/site-packages/keras/layers/normalization.py:524: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2021-05-23 12:11:17,344 [WARNING] From /Users/marcglettig/miniconda3/envs/mg/lib/python3.8/site-packages/keras/layers/normalization.py:524: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/marcglettig/miniconda3/envs/mg/lib/python3.8/site-packages/dca/train.py:41: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-05-23 12:11:18,311 [WARNING] From /Users/marcglettig/miniconda3/envs/mg/lib/python3.8/site-packages/dca/train.py:41: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-21-8e2e71f66817> in <module>
----> 1 sce.pp.dca(adata)

~/miniconda3/envs/mg/lib/python3.8/site-packages/scanpy/external/pp/_dca.py in dca(adata, mode, ae_type, normalize_per_cell, scale, log1p, hidden_size, hidden_dropout, batchnorm, activation, init, network_kwds, epochs, reduce_lr, early_stop, batch_size, optimizer, random_state, threads, learning_rate, verbose, training_kwds, return_model, return_info, copy)
    152         raise ImportError('Please install dca package (>= 0.2.1) via `pip install dca`')
    153 
--> 154     return dca(
    155         adata,
    156         mode=mode,

~/miniconda3/envs/mg/lib/python3.8/site-packages/dca/api.py in dca(adata, mode, ae_type, normalize_per_cell, scale, log1p, hidden_size, hidden_dropout, batchnorm, activation, init, network_kwds, epochs, reduce_lr, early_stop, batch_size, optimizer, learning_rate, random_state, threads, verbose, training_kwds, return_model, return_info, copy, check_counts)
    199     }
    200 
--> 201     hist = train(adata[adata.obs.dca_split == 'train'], net, **training_kwds)
    202     res = net.predict(adata, mode, return_info, copy)
    203     adata = res if copy else adata

~/miniconda3/envs/mg/lib/python3.8/site-packages/dca/train.py in train(adata, network, output_dir, optimizer, learning_rate, epochs, reduce_lr, output_subset, use_raw_as_output, early_stop, batch_size, clip_grad, save_weights, validation_split, tensorboard, verbose, threads, **kwds)
     53 
     54     if learning_rate is None:
---> 55         optimizer = opt.__dict__[optimizer](clipvalue=clip_grad)
     56     else:
     57         optimizer = opt.__dict__[optimizer](lr=learning_rate, clipvalue=clip_grad)

KeyError: 'rmsprop'
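
The traceback shows that train.py looks the optimizer up by its exact name with `opt.__dict__[optimizer]`, so the call fails whenever the module behind `opt` has no attribute spelled exactly 'rmsprop'. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the installed Keras exposes only the class name RMSprop. The sketch below uses a stand-in namespace instead of the real Keras module to show the failure and a case-insensitive lookup; `resolve_by_name` and the stand-in class are hypothetical.

```python
import types


def resolve_by_name(namespace, name):
    """Look up `name` in a module-like namespace, ignoring letter case."""
    members = vars(namespace)
    if name in members:
        return members[name]
    for key, value in members.items():
        if key.lower() == name.lower():
            return value
    raise KeyError(name)


class RMSprop:
    """Stand-in class; not the real Keras optimizer."""


# Stand-in for an optimizers module that only defines the class name.
fake_optimizers = types.SimpleNamespace(RMSprop=RMSprop)

# The exact lowercase key is absent, which is what raises KeyError above.
print("rmsprop" in vars(fake_optimizers))
# The case-insensitive lookup still finds the class.
print(resolve_by_name(fake_optimizers, "rmsprop") is RMSprop)
```

If this is indeed the cause, passing the optimizer under the name that the installed version defines, or pinning a Keras version that still defines the lowercase name, would be the two directions to try.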

AssertionError: DCA claims adata does not contain counts

I'm trying to run DCA on count data, but I keep getting the error that my AnnData object allegedly does not contain count data. This is however not the case...

Here's a screenshot of the issue:
