ebecht / dr_benchmark
License: MIT License
This repository contains the code to reproduce our benchmark of dimensionality reduction methods, with a focus on single-cell data. The same code, bundled with the input data and the outputs of the computations, is available at https://figshare.com/s/9c3a0136f12b97f1dadd

Our article is now published at https://www.nature.com/articles/nbt.4314 and an earlier version is available as a preprint at https://www.biorxiv.org/content/early/2018/04/10/298430

To run the benchmark

If you want to re-run the dimensionality reduction methods:
- Install python3 and the umap-learn package (using pip or https://github.com/lmcinnes/umap)
- Install FIt-SNE (https://github.com/KlugerLab/FIt-SNE), then copy the FIt-SNE folder to this directory. Edit the fast_tsne.R file and replace the default "fast_tsne_path" argument with "../FIt-SNE/bin/fast_tsne"
- Install scvis (https://bitbucket.org/jerry00/scvis-dev)

Please remember that every script requires you to set the working directory to the folder containing the script before running it.

If you only want to re-generate the graphs, you do not need to re-run the heavy computations: their results are saved and bundled together with the code. The full benchmark takes about a week to complete on our 8-core computer, but if you only benchmark UMAP and t-SNE it should take only a few hours. Most scripts either run dimensionality reduction or make figures, usually not both.
Here is a brief summary of what each script does:

** Main figure scripts **
* Generate the embeddings and corresponding main figures of the paper
./Han/data_parsing.R : Parsing input data for the Han[Bone marrow + PBMC] dataset
./Han/graphs_full_dataset.R : Filtering the Han[Bone marrow + PBMC] dataset
./Han/DR_on_data_subset.R : Dimensionality reduction on the subset of the Han[Bone marrow + PBMC] dataset
./Han/graphs_paper.R : Graphs for the Han[Bone marrow + PBMC] dataset (Fig 2CDEF)
./Samusik/Samusik_01_data_load_and_DR.R : Parsing, dimensionality reduction and graphs for the Samusik_01 dataset (Fig 2AB)
./Wong/RunFCStSNEone TraffickTNK 10k.R : Downsamples to 10k cells per sample, then runs t-SNE and UMAP on the Wong dataset. This script is used within our lab so that non-programmers can use these tools, and it is shared as is. It is an interface to functions defined in ./Wong/FCStSNEone.R
./Wong/data_parsing.R : Data parsing for the Wong datasets
./Wong/Wong.R : Graphs for the Wong datasets (Fig 1 ABCD)
./Wong/FCStSNEone.R : Not meant to be run directly

** Extended benchmark scripts [Supplementary Notes] **
* Parse inputs for the benchmark [Supplementary Notes and most of the supplementary figures]
./Han_400k/data_parsing_full_dataset.R : Parsing the Han_400k dataset
./Han_400k/DR.R : Running UMAP, t-SNE, FIt-SNE and FIt-SNE l.e. on the Han_400k dataset
./Samusik/Samusik_all_data_load_and_DR.R : Parsing the Samusik_all dataset and running UMAP, t-SNE, FIt-SNE and FIt-SNE l.e.
* Generate embeddings for the benchmark
./Benchmark_global/DR.R : Generates embeddings of subsamples with t-SNE, FIt-SNE, FIt-SNE l.e. and UMAP for the Wong, Han_400k and Samusik datasets
./Benchmark_global/scvis.R : Runs scvis on the above datasets
./Benchmark_global/rdata/Single seeds/combine_seeds.R : Recompiles the benchmark results if the benchmark could not be completed in one go. It also adds the scvis.R outputs to the DR.R outputs
* Generate plots for the benchmark
./Benchmark_global/graphs_benchmark.R [Fig 3, Fig 4b, Fig 5, Fig 6, Fig S5, Fig S6]
./Benchmark_global/random_forests.R [Fig 4A]
* Misc
./Benchmark_global/tSNE_parameters.R : Small benchmark of the perplexity and maximum number of iterations for t-SNE
./FIt-SNE/fast_tsne.R : R wrapper for FIt-SNE. Install using the release at https://github.com/KlugerLab/FIt-SNE .
./utils.R : Some functions that are used throughout the other scripts. You may need to specify your python executable here (defaults to /usr/bin/python3)
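The Wong script above is described as downsampling to 10k cells per sample before running t-SNE and UMAP. The following is a minimal Python sketch of that kind of per-sample downsampling, not a port of the R code in ./Wong/RunFCStSNEone TraffickTNK 10k.R. It assumes cells are given as (sample_id, row) pairs; the function name `downsample_per_sample`, this data layout and the fixed seed are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def downsample_per_sample(cells, max_per_sample=10_000, seed=42):
    """Keep at most `max_per_sample` randomly chosen cells from each sample.

    `cells` is an iterable of (sample_id, row) pairs (hypothetical layout).
    Samples with fewer cells than the cap are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_sample = defaultdict(list)
    for sample_id, row in cells:
        by_sample[sample_id].append(row)

    result = []
    for sample_id, rows in by_sample.items():
        if len(rows) > max_per_sample:
            # Sampling without replacement: no cell is picked twice
            rows = rng.sample(rows, max_per_sample)
        result.extend((sample_id, row) for row in rows)
    return result


# Usage: sample "A" is capped at 10k cells, sample "B" is kept whole
cells = [("A", i) for i in range(25_000)] + [("B", i) for i in range(3_000)]
sub = downsample_per_sample(cells)
counts = Counter(sample_id for sample_id, _ in sub)
print(counts)
```

Capping each sample separately, rather than subsampling the pooled cells, keeps small samples from being drowned out by large ones in the embedding.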
Hi, I'm new to R, and thank you for your contributions! I'd like to rerun the code but need some help.
In Wong.R (in the Wong folder), line 2 loads the library en.lab, but I cannot find it online. Is it a typo, or which package do I need to download?
Thank you in advance!
Hi,
Thanks for this interesting improvement to dimensionality reduction and for sharing the code.
I noticed that no cofactor is used when transforming the Samusik CyTOF data (I didn't check the other datasets). Is there a reason for that?
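For context on the question above: cytometry intensities are commonly transformed as asinh(x / cofactor), with a cofactor of 5 being a frequent choice for CyTOF data (larger values such as 150 are often used for fluorescence flow cytometry). The cofactor sets where the transform switches from roughly linear near zero to roughly logarithmic. Below is a minimal Python sketch, assuming a plain list of intensities; the function name `arcsinh_transform` is hypothetical, and this is not the transform used in the repository's scripts.

```python
import math


def arcsinh_transform(values, cofactor=5.0):
    """Apply asinh(x / cofactor) to each intensity.

    cofactor=1.0 makes the division a no-op, i.e. "no cofactor".
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]


# Compare the transform with a CyTOF-style cofactor and without one
raw = [0.0, 5.0, 100.0, 1000.0]
with_cofactor = arcsinh_transform(raw, cofactor=5.0)
without_cofactor = arcsinh_transform(raw, cofactor=1.0)
print(with_cofactor)
print(without_cofactor)
```

For large x, asinh(x / c) is close to asinh(x) - ln(c), so on bright signals the cofactor acts mostly as a constant shift; its main effect is to compress low, noise-level counts toward zero.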