
Comments (7)

martinjzhang commented on August 22, 2024

Fixed. The issue is due to a small discrepancy between different pandas versions.
#85
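
Because the fix traces the failure to a pandas version difference, it can help to attach the exact package versions when reporting a mismatch like this one. Below is a minimal sketch using only the standard library. The helper name `report_versions` and its default package list are illustrative assumptions, not part of scDRS.

```python
import platform
from importlib import metadata


def report_versions(packages=("scdrs", "pandas", "numpy", "scipy")):
    """Return {name: version} for the interpreter and the given packages.

    The default package list is an illustrative guess, not a list taken
    from the scDRS documentation.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running this in both a passing and a failing environment gives two short listings that can be compared side by side.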

from scdrs.

martinjzhang commented on August 22, 2024

Hi, v1.0.3 is in the main branch. We may have updated the test data. Can you install from the main branch and run the tests again?


hoholee commented on August 22, 2024

Same error with v1.0.3:

$ python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items

tests/test_CLI.py F..                                                                                                                                                                                                                  [100%]

================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________

    def test_score_cell_cli():
        """
        Test CLI `scdrs compute-score`
        """
        # Load toy data
        ROOT_DIR = scdrs.__path__[0]
        H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
        COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
        assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
        assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"

        tmp_dir = tempfile.TemporaryDirectory()
        tmp_dir_path = tmp_dir.name
        dict_df_score = {}
        for gs_species in ["human", "mouse"]:
            gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
            # call compute_score.py
            cmds = [
                f"scdrs compute-score",
                f"--h5ad_file {H5AD_FILE}",
                "--h5ad_species mouse",
                f"--gs_file {gs_file}",
                f"--gs_species {gs_species}",
                f"--cov_file {COV_FILE}",
                "--ctrl_match_opt mean_var",
                "--n_ctrl 20",
                "--flag_filter_data False",
                "--weight_opt vs",
                "--flag_raw_count False",
                "--flag_return_ctrl_raw_score False",
                "--flag_return_ctrl_norm_score False",
                f"--out_folder {tmp_dir_path}",
            ]
            subprocess.check_call(" ".join(cmds), shell=True)
            dict_df_score[gs_species] = pd.read_csv(
                os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
                sep="\t",
                index_col=0,
            )
        # consistency between human and mouse
        assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)

        df_res = dict_df_score["mouse"]

        REF_COV_FILE = os.path.join(
            ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
        )
        df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
>       compare_score_file(df_res, df_ref_res)

tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

df_res =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -1.627243  1.000000  0.956739     0.019207  -1.714034
df_res_ref =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -2.305674  1.000000  0.991680     0.003628  -2.394591

    def compare_score_file(df_res, df_res_ref):
        """
        Compare df_res
        """

        col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
        for col in col_list:
            v_ = df_res[col].values
            v_ref = df_res_ref[col].values
            err_msg = "Inconsistent values: {}\n".format(col)
            err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
                "OBS", "REF", "DIF", "REL_DIF"
            )
            for i in range(v_.shape[0]):
                err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
                    v_[i],
                    v_ref[i],
                    v_[i] - v_ref[i],
                    np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
                )
>           assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E           AssertionError: Inconsistent values: norm_score
E             |      OBS      |      REF      |      DIF      |    REL_DIF    |
E             |   4.445e+00   |   6.326e+00   |  -1.881e+00   |   2.973e-01   |
E             |   6.038e+00   |   5.916e+00   |   1.216e-01   |   2.056e-02   |
E             |   4.697e+00   |   5.552e+00   |  -8.552e-01   |   1.540e-01   |
E             |   5.186e+00   |   7.299e+00   |  -2.112e+00   |   2.894e-01   |
E             |   6.072e+00   |   5.779e+00   |   2.927e-01   |   5.065e-02   |
E             |  -6.976e-01   |  -5.614e-01   |  -1.362e-01   |   2.427e-01   |
E             |  -1.192e+00   |  -1.582e+00   |   3.897e-01   |   2.463e-01   |
E             |  -2.219e+00   |  -2.312e+00   |   9.325e-02   |   4.033e-02   |
E             |   1.216e+00   |   1.157e+00   |   5.952e-02   |   5.146e-02   |
E             |  -4.155e+00   |  -3.166e+00   |  -9.896e-01   |   3.126e-01   |
E             |   2.262e+00   |   1.505e+00   |   7.576e-01   |   5.035e-01   |
E             |  -2.240e+00   |  -3.798e+00   |   1.558e+00   |   4.102e-01   |
E             |   7.692e-01   |   1.052e+00   |  -2.824e-01   |   2.686e-01   |
E             |   2.888e-01   |  -1.237e-01   |   4.126e-01   |   3.334e+00   |
E             |  -4.752e-01   |  -8.706e-01   |   3.954e-01   |   4.541e-01   |
E             |  -3.281e+00   |  -3.768e+00   |   4.869e-01   |   1.292e-01   |
E             |  -1.792e+00   |  -2.232e+00   |   4.397e-01   |   1.970e-01   |
E             |  -7.435e-01   |  -6.558e-01   |  -8.775e-02   |   1.338e-01   |
E             |  -3.577e-01   |  -4.232e-01   |   6.545e-02   |   1.547e-01   |
E             |  -1.968e+00   |  -2.191e+00   |   2.235e-01   |   1.020e-01   |
E             |  -3.799e-01   |  -2.172e-01   |  -1.626e-01   |   7.487e-01   |
E             |   7.900e-02   |  -1.761e-01   |   2.551e-01   |   1.449e+00   |
E             |   8.555e-01   |   7.654e-01   |   9.011e-02   |   1.177e-01   |
E             |  -2.135e-01   |  -3.305e-01   |   1.170e-01   |   3.541e-01   |
E             |  -1.905e+00   |  -2.228e+00   |   3.232e-01   |   1.451e-01   |
E             |  -3.454e+00   |  -2.705e+00   |  -7.495e-01   |   2.771e-01   |
E             |  -2.037e+00   |  -2.207e+00   |   1.692e-01   |   7.670e-02   |
E             |  -4.795e-01   |  -3.563e-01   |  -1.232e-01   |   3.458e-01   |
E             |  -2.691e+00   |  -3.141e+00   |   4.506e-01   |   1.434e-01   |
E             |  -1.627e+00   |  -2.306e+00   |   6.784e-01   |   2.942e-01   |
E
E           assert False
E            +  where False = <function allclose at 0x7f4a28366270>(array([ 4.4454584 ,  6.037902  ,  4.6971283 ,  5.186194  ,  6.071957  ,\n       -0.6976079 , -1.1924832 , -2.2186813 , ...900415,  0.8554982 , -0.21349816, -1.9051081 ,\n       -3.4541266 , -2.037314  , -0.47953042, -2.690723  , -1.6272427 ]), array([ 6.3260064 ,  5.916272  ,  5.5523157 ,  7.2986684 ,  5.7792473 ,\n       -0.5613674 , -1.5821338 , -2.3119287 , ...612725,  0.7653889 , -0.33054087, -2.228345  ,\n       -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E            +    where <function allclose at 0x7f4a28366270> = np.allclose

tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.1s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.1s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.1s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.4s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 272.68it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 286.57it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 37.78s ========================================================================================================
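
For reference, the failing assertion calls `np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True)`, whose documented element-wise criterion is `|obs - ref| <= atol + rtol * |ref|` with a default `atol` of `1e-8`. The sketch below restates that criterion in plain Python for the first three rows of the table above. `is_close` is a hypothetical helper, it ignores the NaN handling that `equal_nan=True` adds, and the inputs are the rounded values printed in the log.

```python
def is_close(obs, ref, rtol=1e-2, atol=1e-8):
    """Scalar form of the criterion documented for numpy.allclose:
    |obs - ref| <= atol + rtol * |ref|  (NaN handling omitted).
    """
    return abs(obs - ref) <= atol + rtol * abs(ref)


# First three (OBS, REF) pairs from the norm_score table above,
# as rounded in the log output.
pairs = [(4.445, 6.326), (6.038, 5.916), (4.697, 5.552)]

for obs, ref in pairs:
    rel_dif = abs((obs - ref) / ref)
    print(
        f"obs={obs:.3f} ref={ref:.3f} "
        f"rel_dif={rel_dif:.3e} close={is_close(obs, ref)}"
    )
```

Even the closest of these three pairs differs by roughly 2%, which is above the 1% relative tolerance, so the check fails for the column as a whole.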


hoholee commented on August 22, 2024

I've also tried scDRS v1.0.3 with multiple Python versions (3.8-3.12), and the test passed only with Python 3.8 for some reason:

python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.19, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
plugins: anyio-3.7.1
collected 3 items

tests/test_CLI.py ...                                                                                                                                                                                                                  [100%]

============================================================================================================= 3 passed in 46.72s =============================================================================================================
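
A failure that appears to depend on the Python version can also come from dependencies that resolve to different releases under each interpreter. One way to check, sketched here under assumptions, is to diff `pip freeze` output from a passing and a failing environment. `parse_freeze` and `diff_envs` are hypothetical helpers, and the two listings are placeholders rather than the environments from this thread.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style lines ("name==version") into a dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and entries that are not pinned with "==".
        if "==" not in line or line.startswith("#"):
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def diff_envs(passing, failing):
    """Return {name: (passing_version, failing_version)} where they differ."""
    a, b = parse_freeze(passing), parse_freeze(failing)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Placeholder listings: these names and versions are made up for
# illustration and are NOT taken from the environments in this thread.
passing_env = "pkg-a==1.0.0\npkg-b==2.1.0\n"
failing_env = "pkg-a==1.0.0\npkg-b==2.2.0\npkg-c==0.3.0\n"

print(diff_envs(passing_env, failing_env))
```

In practice the two strings would be the saved output of `pip freeze` from each environment; a package missing from one side shows up with `None` for that side.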


KangchengHou commented on August 22, 2024

Somewhat strangely, I couldn't replicate this error with either Python 3.9 or 3.10.

For example, in https://colab.google/ (Python 3.10):

!python --version
!pip install git+https://github.com/martinjzhang/scDRS.git

import os
import pandas as pd
import scdrs

DATA_PATH = scdrs.__path__[0]
H5AD_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.cov")
GS_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.gs")

# Load .h5ad file, .cov file, and .gs file
adata = scdrs.util.load_h5ad(H5AD_FILE, flag_filter_data=False, flag_raw_count=False)
df_cov = pd.read_csv(COV_FILE, sep="\t", index_col=0)
df_gs = scdrs.util.load_gs(GS_FILE)

# Preprocess the .h5ad data and compute the scDRS score
scdrs.preprocess(adata, cov=df_cov)
gene_list = df_gs['toydata_gs_mouse'][0]
gene_weight = df_gs['toydata_gs_mouse'][1]
df_res = scdrs.score_cell(adata, gene_list, gene_weight=gene_weight, n_ctrl=20)

print(df_res.iloc[:4])


hoholee commented on August 22, 2024

Strange indeed... Maybe something is wrong with my conda setup. But I can't think of any reason why only norm_score is affected, or why this depends on the Python version.

Thanks for your efforts to pinpoint the issue. I'm closing this for now unless someone else runs into it. I'd still recommend updating the installation instructions in the tutorial to v1.0.3.


martinjzhang commented on August 22, 2024

I replicated this issue (with exactly the same norm_score values as @hoholee's) using conda + py39 on a local HPC. This might be a Python version issue. I will look into this further.

