This project forked from satijalab/seurat-data

Dataset distribution for Seurat

License: GNU General Public License v3.0

SeuratData

SeuratData is a mechanism for distributing datasets in the form of Seurat objects using R's internal package and data management systems. It represents an easy way for users to get access to datasets that are used in the Seurat vignettes.

Installation

SeuratData can be installed from GitHub with devtools:

devtools::install_github('satijalab/seurat-data')

Getting Started

When SeuratData is loaded, it displays a list of installed datasets along with the version of Seurat used to create each one (similar to the startup message of metapackages such as tidyverse). This message can be suppressed with suppressPackageStartupMessages.

> library(SeuratData)
── Installed datasets ───────────────────────────────────────────────────────────── SeuratData v0.1.0 ──
✔ cbmc   3.0.0                                          ✔ panc8  3.0.0
✔ ifnb   3.0.0                                          ✔ pbmc3k 3.0.0

───────────────────────────────────────────────── Key ──────────────────────────────────────────────────
✔ Dataset loaded successfully
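
A minimal sketch of a quiet load follows. The requireNamespace guard and the loaded_quietly variable are additions for illustration, so that the snippet also runs on a machine where SeuratData is not installed.

```r
# Attach SeuratData without printing the dataset listing.
# The requireNamespace() guard is an illustrative addition: it lets the
# snippet run (and report why nothing was attached) when SeuratData is absent.
if (requireNamespace("SeuratData", quietly = TRUE)) {
  suppressPackageStartupMessages(library(SeuratData))
  loaded_quietly <- TRUE
} else {
  message("SeuratData is not installed; see the Installation section.")
  loaded_quietly <- FALSE
}
```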

To see a manifest of all available datasets, use AvailableData; this manifest will update as new datasets are uploaded to our data repository.

> AvailableData()
                     Dataset Version                                                        Summary species            system ncells                                                            tech         notes Installed InstalledVersion
cbmc.SeuratData         cbmc   3.0.0                   scRNAseq and 13-antibody sequencing of CBMCs   human CBMC (cord blood)   8617                                                        CITE-seq          <NA>      TRUE            3.0.0
hcabm40k.SeuratData hcabm40k   3.0.0 40,000 Cells From the Human Cell Atlas ICA Bone Marrow Dataset   human       bone marrow  40000                                                          10x v2          <NA>     FALSE            3.0.0
ifnb.SeuratData         ifnb   3.0.0                              IFNB-Stimulated and Control PBMCs   human              PBMC  13999                                                          10x v1          <NA>      TRUE            3.0.0
panc8.SeuratData       panc8   3.0.0               Eight Pancreas Datasets Across Five Technologies   human Pancreatic Islets  14892                SMARTSeq2, Fluidigm C1, CelSeq, CelSeq2, inDrops          <NA>      TRUE            3.0.0
pbmc3k.SeuratData     pbmc3k   3.0.0                                     3k PBMCs from 10X Genomics   human              PBMC   2700                                                          10x v1          <NA>      TRUE            3.0.0
pbmcsca.SeuratData   pbmcsca   3.0.0           Broad Institute PBMC Systematic Comparative Analysis   human              PBMC  31021 10x v2, 10x v3, SMARTSeq2, Seq-Well, inDrops, Drop-seq, CelSeq2 HCA benchmark     FALSE            3.0.0
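
The manifest above prints like an ordinary data frame, so it can presumably be filtered with base R. The sketch below uses a hand-copied subset of the rows and columns shown above so that it runs without SeuratData; with the package attached, the result of AvailableData() would take the place of manifest. The helper name not_installed is hypothetical.

```r
# Hand-copied subset of the manifest printed above (three of the six rows,
# four of the columns), so the example runs without SeuratData installed.
manifest <- data.frame(
  Dataset   = c("cbmc", "hcabm40k", "pbmcsca"),
  species   = c("human", "human", "human"),
  ncells    = c(8617, 40000, 31021),
  Installed = c(TRUE, FALSE, FALSE),
  row.names = c("cbmc.SeuratData", "hcabm40k.SeuratData", "pbmcsca.SeuratData"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: datasets that are listed but not yet installed,
# sorted from the largest cell count to the smallest.
not_installed <- function(manifest) {
  missing <- manifest[!manifest$Installed, c("Dataset", "ncells"), drop = FALSE]
  missing[order(missing$ncells, decreasing = TRUE), , drop = FALSE]
}

not_installed(manifest)
```

The row names of the result are the package names, which is the form that functions such as citation expect.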

Datasets are installed with InstallData; this function accepts either a dataset name (e.g. pbmc3k) or the corresponding package name (e.g. pbmc3k.SeuratData). InstallData automatically attaches the installed dataset package, so the dataset can be loaded and used immediately.

> InstallData("pbmc3k")
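
The relationship between the two accepted name forms can be sketched in a few lines of base R. The helper as_package_name is hypothetical and is not claimed to be how InstallData resolves names internally; it only illustrates the naming convention described above.

```r
# Hypothetical helper (not part of SeuratData): map either form of a
# dataset identifier to the package-name form "<dataset>.SeuratData".
as_package_name <- function(x) {
  suffix <- ".SeuratData"
  ifelse(endsWith(x, suffix), x, paste0(x, suffix))
}

# Both spellings resolve to the same package name.
as_package_name(c("pbmc3k", "pbmc3k.SeuratData"))
```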

A dataset is loaded with the data function:

> data("pbmc3k")
> pbmc3k
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features)

Dataset documentation and information

Every dataset provided has a help page, accessed with the standard help function:

> ?pbmc3k
> ?ifnb

A full command list for the steps taken to generate each dataset is present in the examples section of these help pages.

Dataset packages often bundle citation information as well. It can be accessed by passing the package name, not the dataset name, to the citation function:

> citation('cbmc.SeuratData')

To cite the CBMC dataset, please use:

  Stoeckius et al. Simultaneous epitope and transcriptome measurement in
  single cells. Nature Methods (2017)

A BibTeX entry for LaTeX users is

  @Article{,
    author = {Marlon Stoeckius and Christoph Hafemeister and William Stephenson and Brian Houck-Loomis and Pratip K Chattopadhyay and Harold Swerdlow and Rahul Satija and Peter Smibert},
    title = {Simultaneous epitope and transcriptome measurement in single cells},
    journal = {Nature Methods},
    year = {2017},
    doi = {10.1038/nmeth.4380},
    url = {https://www.nature.com/articles/nmeth.4380},
  }

Rationale and Implementation

We created SeuratData in order to distribute datasets for Seurat vignettes in as painless and reproducible a way as possible. We also wanted to give users the flexibility to selectively install and load datasets of interest, to minimize disk storage and memory use.

To accomplish this, we opted to distribute datasets through individual R packages. Under the hood, SeuratData uses and extends standard R functions, such as install.packages for dataset installation, available.packages for dataset listing, and data for dataset loading.

SeuratData therefore serves as a more specific package manager (similar to a metapackage) for R. We provide wrappers around R's package management functions, extend them to provide relevant metadata about each dataset, and set default settings (for example, the repository where data is stored) to facilitate easy installation.
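
The wrapper pattern described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not SeuratData's actual source: the function names datasets_from_index, list_datasets and install_dataset are hypothetical, and repo_url is a placeholder the caller must supply, since the repository address is not given here. Only available.packages and install.packages are real functions from R's utils package.

```r
# Hypothetical helper: reduce an available.packages()-style index (a
# character matrix with "Package" and "Version" columns) to dataset rows.
datasets_from_index <- function(index) {
  is_dataset <- endsWith(index[, "Package"], ".SeuratData")
  pkgs <- index[is_dataset, , drop = FALSE]
  data.frame(
    Dataset = sub("\\.SeuratData$", "", pkgs[, "Package"]),
    Version = pkgs[, "Version"],
    row.names = pkgs[, "Package"],
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper around available.packages(); repo_url is a
# placeholder for the data repository and is not defined in this sketch.
list_datasets <- function(repo_url) {
  datasets_from_index(utils::available.packages(repos = repo_url))
}

# Hypothetical wrapper around install.packages() that accepts either a
# dataset name or a package name.
install_dataset <- function(name, repo_url) {
  pkg <- if (endsWith(name, ".SeuratData")) name else paste0(name, ".SeuratData")
  utils::install.packages(pkg, repos = repo_url)
}

# Offline demonstration with a mock index; the last row stands in for a
# package that is not a dataset and should be filtered out.
mock_index <- matrix(
  c("pbmc3k.SeuratData", "3.0.0",
    "ifnb.SeuratData",   "3.0.0",
    "notADataset",       "1.0.0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Version"))
)
datasets_from_index(mock_index)
```

Keeping the index parsing separate from the network call means the metadata logic can be exercised without a repository, while the two wrappers only fix defaults on top of the standard functions.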

Contributors

mojaveazure, austinhartman, andrewwbutler, kant, kou, bbimber