ilia-kats / mudata

MuData-compatible storage for Bioconductor's MultiAssayExperiment

Home Page: https://ilia-kats.github.io/MuData/

R 100.00%
bioconductor scrna-seq multimodal-omics multi-omics anndata mudata

mudata's Introduction

MuData

MuData is a package that provides I/O functionality for .h5mu files and MultiAssayExperiment objects.

You can learn more about multimodal data containers in the reference mudata documentation.

Installation

MuData uses rhdf5 to access .h5mu and .h5ad files. For the time being, the bioc-devel version of rhdf5 is required.

rhdf5 and MuData can be installed by running

remotes::install_github("grimbough/rhdf5")
remotes::install_github("ilia-kats/MuData")

We use rhdf5 rather than hdf5r to stay compatible with the rest of the Bioconductor ecosystem. In particular, using hdf5r would make integration with other packages that build on rhdf5, such as HDF5Array, much more difficult, if not impossible. We have implemented upstream the HDF5 features that the .h5ad format, and consequently the .h5mu format, relies on, including file creation properties and object references.

Quick start

MuData provides a set of I/O operations for multimodal data.

MuData implements writeH5MU(), which saves MultiAssayExperiment objects to .h5mu files. These files can then be used in workflows in other programming languages, for example with the muon Python library or the Muon.jl Julia library. readH5MU() reads .h5mu files back into MultiAssayExperiment objects.

Writing files

Start with an existing dataset, e.g. a MultiAssayExperiment object with five distinct modalities:

library(MultiAssayExperiment)
data(miniACC)

writeH5MU() saves the object to an .h5mu file:

library(MuData)
writeH5MU(miniACC, "miniACC.h5mu")

Reading files

readH5MU() reads the file back into a MultiAssayExperiment object:

miniACC <- readH5MU("miniACC.h5mu")

Relevant projects

Other R packages for multimodal I/O include:

mudata's People

Contributors

gtca, ilia-kats, votti

mudata's Issues

`Unknown encoding dict`

First of all, thanks for writing this, pure-R IO for anndata is going to be incredibly useful!

I have some anndata files that I created with R (using zellkonverter, so they might be a bit unusual). When I try to load them, I get the following:

> MuData::readH5AD("/vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/0.2/original/6d526063e52b119fc18ac359cc7a5667.h5ad")
Error in .local(x, ..., value) : 
  replacement 'metadata' value must be a list
In addition: Warning message:
In read_group(h5autoclose(view & "uns")) : Unknown encoding dict

It fails right at the end of read_modality:

metadata(se) <- read_group(h5autoclose(view & "uns"))

Any idea what's going wrong here? When I try to explore the file with h5ls, I get:

$ h5ls --data /vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/0.2/original/6d526063e52b119fc18ac359cc7a5667.h5ad/uns/ 
X_name                   Dataset {SCALAR}
    Data:
        (0) "X"

I'm not really sure what this means, but it doesn't look like a dictionary to me.
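
The error message itself says that the value assigned to metadata() was not a list, so read_group() presumably returned something else (possibly NULL) after warning about the unrecognised "dict" encoding. A minimal sketch of a defensive wrapper in base R follows; as_metadata_list() is a hypothetical helper for illustration and is not part of MuData:

```r
# Hypothetical helper (not part of MuData): coerce whatever was read from
# the "uns" group into a list, so that it is a valid metadata() replacement.
as_metadata_list <- function(x) {
  if (is.null(x)) {
    return(list())   # nothing could be read: use empty metadata
  }
  if (is.list(x)) {
    return(x)        # already a list: keep as-is
  }
  list(x)            # atomic value (e.g. a scalar string): wrap it
}

empty   <- as_metadata_list(NULL)
kept    <- as_metadata_list(list(X_name = "X"))
wrapped <- as_metadata_list("X")
```

With such a guard in place, a file whose uns group cannot be decoded would load with empty metadata instead of failing at the very end of reading.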

NaN in categorical column fails reading

I have a mudata/anndata dataset exported with anndata=0.7.8.

When trying to read it, I get the following error while reading var:

Error in factor(as.integer(values), labels = labels_items): invalid 'labels'; length 4732 should be 1 or 4733
Traceback:

1. readH5AD(file)
2. read_modality(h5, backed)
3. read_with_index(h5autoclose(view & "var"))
4. read_dataframe(dataset)
5. lapply(columnorder, function(name) {
 .     col <- group & name
 .     values <- read_attribute(col)
 .     if (H5Aexists(col, "categories")) {
 .         attr <- H5Aopen(col, "categories")
 .         labels <- H5Aread(attr)
 .         if (!is(labels, "H5Ref")) {
 .             warning("found categories attribute for column ", 
 .                 name, ", but it is not a reference")
 .         }
 .         else {
 .             labels <- H5Rdereference(labels, h5loc = col)
 .             labels_items <- H5Dread(labels)
 .             n_labels <- length(unique(values))
 .             if (length(labels_items) > n_labels) {
 .                 labels_items <- labels_items[seq_len(n_labels)]
 .             }
 .             values <- factor(as.integer(values), labels = labels_items)
 .             H5Dclose(labels)
 .         }
 .         H5Aclose(attr)
 .     }
 .     H5Dclose(col)
 .     values
 . })
6. FUN(X[[i]], ...)
7. factor(as.integer(values), labels = labels_items)
8. stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
 .     nlab, length(levels)), domain = NA)

I found that the cause is that the column contains NA values, which are represented as -1 in the categorical codes but have no matching label in the categories.

Would you be interested in a PR with a fix?
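
The mismatch described above can be reproduced in base R without any HDF5 file. The sketch below uses made-up codes and categories; decode_categorical() is a hypothetical helper, not the function used by MuData, and it assumes zero-based codes with negative values marking missing entries:

```r
# Made-up example data: zero-based codes, -1 marks a missing value.
codes <- c(0L, 2L, -1L, 1L, 2L)
categories <- c("B cell", "T cell", "NK cell")

# The direct approach fails: factor() sees four distinct values
# (-1, 0, 1, 2) but only three labels.
direct <- tryCatch(
  factor(codes, labels = categories),
  error = function(e) e
)

# Hypothetical helper: turn negative codes into NA first, then index the
# categories (on-disk codes are zero-based, R indexing is one-based).
decode_categorical <- function(codes, categories) {
  codes <- as.integer(codes)
  codes[codes < 0L] <- NA_integer_
  factor(categories[codes + 1L], levels = categories)
}

decoded <- decode_categorical(codes, categories)
```

Passing levels explicitly keeps every category, including ones that do not occur in the column, which the truncation by length(unique(values)) in the traceback does not guarantee.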

The provided H5Identifier is not a dataset identifier.

Hi there, thanks for the useful package!
I have run into the following error when loading a .h5mu file exported from Python's mudata.
The mudata file was exported with anndata=0.8.0 and mudata=0.2.3.

Is this another instance of the changed anndata on-disk format?

Error:

Warning message in read_group(attr):
“Unknown encoding categorical”
Error: Error in h5checktype(). The provided H5Identifier is not a dataset identifier.
Traceback:

1. readH5MU(fn_mudata)
2. read_with_index(h5autoclose(h5 & "obs"))
3. read_dataframe(dataset)
4. lapply(columnorder, function(name) {
 .     col <- group & name
 .     values <- read_attribute(col)
 .     if (H5Aexists(col, "categories")) {
 .         attr <- H5Aopen(col, "categories")
 .         labels <- H5Aread(attr)
 .         if (!is(labels, "H5Ref")) {
 .             warning("found categories attribute for column ", 
 .                 name, ", but it is not a reference")
 .         }
 .         else {
 .             labels <- H5Rdereference(labels, h5loc = col)
 .             labels_items <- H5Dread(labels)
 .             n_labels <- length(unique(values))
 .             if (length(labels_items) > n_labels) {
 .                 labels_items <- labels_items[seq_len(n_labels)]
 .             }
 .             values <- factor(as.integer(values), labels = labels_items)
 .             H5Dclose(labels)
 .         }
 .         H5Aclose(attr)
 .     }
 .     H5Dclose(col)
 .     values
 . })
5. FUN(X[[i]], ...)
6. H5Dclose(col)
7. h5checktype(h5dataset, "dataset")
8. stop("Error in ", fctname, ". The provided H5Identifier is not a dataset identifier.", 
 .     call. = FALSE)
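
The "Unknown encoding categorical" warning is consistent with the anndata 0.8 on-disk format, which stores a categorical column as a group holding separate codes and categories datasets rather than as a single dataset. Calling H5Dclose() on such a group would fail as shown in the traceback. The sketch below uses rhdf5's high-level functions on a small temporary file to tell the two layouts apart; the file layout and the column name celltype are made up for illustration, and this is not MuData's actual reading code:

```r
library(rhdf5)

# Build a tiny file that mimics a categorical column stored as a group.
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "obs")
h5createGroup(path, "obs/celltype")
h5write(c(0L, 1L, -1L, 0L), path, "obs/celltype/codes")
h5write(c("B", "T"), path, "obs/celltype/categories")

# h5ls() reports the object type of every entry in the file.
listing <- h5ls(path)
otype <- listing$otype[listing$group == "/obs" & listing$name == "celltype"]
is_group <- otype == "H5I_GROUP"

if (is_group) {
  # Group layout: read codes and categories separately.
  codes <- as.integer(h5read(path, "obs/celltype/codes"))
  categories <- as.character(h5read(path, "obs/celltype/categories"))
  codes[codes < 0L] <- NA_integer_   # negative codes mark missing values
  celltype <- factor(categories[codes + 1L], levels = categories)
} else {
  # Dataset layout: the column is a single dataset.
  celltype <- h5read(path, "obs/celltype")
}
```

Checking the object type before choosing how to read (and how to close) the column avoids treating a group identifier as a dataset identifier.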

Error in is.factor(values) && levels(values) == c("FALSE", "TRUE") when using readH5AD

Dear developers,

I am trying to read an h5ad file into R with MuData and encountered the following error:

Error in is.factor(values) && levels(values) == c("FALSE", "TRUE") : 
  'length = 2' in coercion to 'logical(1)'

This is my command:

library(Seurat)
library(MuData)
library(tidyverse)

fca_head <- readH5AD('s_fca_biohub_head_10x.h5ad')

I am not sure what is going wrong. The dataset loads fine with scanpy.read_h5ad in Python. I also tried down-sampling to a subset of cells in Python and saving a new h5ad file, but the down-sampled file cannot be read in R either.

The file s_fca_biohub_head_10x.h5ad is available from Fly cell atlas: https://cloud.flycellatlas.org/index.php/s/LAEybPc2HZnpzKs
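
The message points at the right-hand side of &&: levels(values) == c("FALSE", "TRUE") compares element-wise and yields a logical vector of length two, which recent R versions refuse to use as a single condition. A minimal sketch of a length-one check in base R follows; is_boolean_factor() is a hypothetical name and the example values are made up:

```r
# Hypothetical helper: TRUE only for a factor whose levels are exactly
# "FALSE" and "TRUE". identical() always returns a single logical value,
# so it is safe on the right-hand side of &&.
is_boolean_factor <- function(values) {
  is.factor(values) && identical(levels(values), c("FALSE", "TRUE"))
}

flags  <- factor(c("TRUE", "FALSE", "TRUE"))   # levels sort to FALSE, TRUE
labels <- factor(c("a", "b", "c"))

flags_ok  <- is_boolean_factor(flags)
labels_ok <- is_boolean_factor(labels)
ints_ok   <- is_boolean_factor(1:3)
```

A factor recognised this way could then be converted with as.logical(as.character(values)) if a logical column is wanted.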
