GithubHelp home page GithubHelp logo

biosyy / daism-xmbd Goto Github PK

View Code? Open in Web Editor NEW

This project forked from xmuyulab/daism-xmbd

1.0 0.0 0.0 9.67 MB

Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks

License: MIT License

Dockerfile 0.87% Python 98.30% Shell 0.83%

daism-xmbd's Introduction

DAISM-DNN

We propose data augmentation through in silico mixing with deep neural networks (DAISM-DNN) to achieve highly accurate and unbiased immune-cell proportion estimation from bulk RNA sequencing (RNA-seq) data. Our method tackles the batch effect problem by creating a data-specific training dataset from a small subset of calibration samples with ground truth cell proportions which is further augmented with publicly available RNA-seq data from purified cells, single-cell RNA-seq (scRNA-seq) data or CITE-seq data.

A pre-print describing the method is available at bioRxiv: DAISM-DNN^XMBD^: Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks

Installation

Pip

It is recommended to create a new conda environment:

conda create -n daism python=3.7

Activate this environment:

conda activate daism

And run the following command to install daism via pip:

pip install daism

The main dependencies listing below will be installed with daism.

python (v3.7.7)
torch (v1.5.1)
pandas (v1.2.4)
numpy (v1.18.1)
scikit-learn (v0.24.2)
argh (v0.26.2) 
anndata (v0.7.6)
scanpy (v1.8.1)
tqdm (v4.46.0)

Docker

We provide a docker image with DAISM-DNN installed: https://hub.docker.com/r/zoelin1130/daism

Pull the docker image:

docker pull zoelin1130/daism:1.0

Create a container (GPU):

docker run --gpus all -i -t --name run_daism -v example/:/workspace/example/ zoelin1130/daism:1.0 /bin/bash

Create a container (CPU):

docker run -i -t --name run_daism -v example/:/workspace/example/ zoelin1130/daism:1.0 /bin/bash

run_daismis your container name. It is strongly recommended to add -v parameter for implementing data and scripts mounting: mount the local volume example (from your machine) to /workspace/example/ (to your container) instead of directly copy them into the container.

Cell Types Supported

The example we provide contains the following cell types. The purified dataset for data augmentation can be downloaded from:https://figshare.com/s/3c230f06565e0a1cccc1

pbmc8k.h5ad contains 5 cell types: B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells.

pbmc8k_fine.h5ad contains 11 cell types: naive.B.cells, memory.B.cells, naive.CD4.T.cells, memory.CD4.T.cells,naive.CD8.T.cells, memory.CD8.T.cells, regulatory.T.cells, monocytes, macrophages, myeloid.dendritic.cells, NK.cells.

RNA_TPM_coarse.h5ad contains 5 cell types: B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells.

Note: each cell type needs to be named according to above format.

DAISM-DNN can support the prediction of any cell types, as long as calibration samples with ground truth and purified expression profiles of corresponding cell types are provided.

Usage

In our example below, we set working directory to daism. Use -h to print out help information on DAISM-DNN modules.

# If you git clone the repository to local, you can use the following command.
cd daism
python daism.py -h

# if you installed daism via pip, you can use following command to print out help information.
daism -h

DAISM-DNN consists of four modules:

DAISM modules:

daism DAISM -platform S -caliexp ../example/caliexp.txt -califra ../example/califra.txt -aug ../example/pbmc8k.h5ad -N 16000 -testexp ../example/testexp.txt -net coarse -outdir output/

DAISM is a one-stop mode to run DAISM-DNN, which integrates simulation, training and prediction in one module. Example: we use pbmc8k.h5ad, a single cell RNA-seq dataset, as purified samples for data augmentation and put it under the example directory. So we use S for platform parameter. The calibration data is an RNA-seq expression profile caliexp.txt. And we use coarse network architecture. (We have two network architectures, namely coarse and fine.)

simulation modules:

We have two training set simulation modules. One is DAISM_simulation which using DAISM strategy in generating mixtures.

daism DAISM_simulation -platform S -caliexp ../example/caliexp.txt -califra ../example/califra.txt -aug ../example/pbmc8k.h5ad -N 16000 -testexp ../example/testexp.txt -outdir ./

The other is Generic_simulation which generates training set only using purified cells.

daism Generic_simulation -platform S -aug ../example/pbmc8k.h5ad -N 16000 -testexp ../example/testexp.txt -outdir ./

training modules:

# If you use DAISM_simulation mode:
daism training -trainexp ./output/DAISM_mixsam.txt -trainfra ./output/DAISM_mixfra.txt -net coarse -outdir ./

# If you use Generic_simulation mode:
daism training -trainexp ./output/Generic_mixsam.txt -trainfra ./output/Generic_mixfra.txt -net coarse -outdir ./

We use the DAISM-generated mixtures DAISM_mixsam.txt and corresponding artificial cell fractions DAISM_mixfra.txt to train the neural networks.

prediction modules:

daism prediction -testexp ../example/testexp.txt -model ./output/DAISM_model.pkl -celltype ./output/DAISM_celltypes.txt -feature ./output/DAISM_feature.txt -net coarse -outdir ./

Both the result file and the process files will be saved in the output folder.

daism-xmbd's People

Contributors

cxli914 avatar steinslee avatar wx115 avatar zoelin1130 avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.