Add SpatialData as output about spatialvi HOT 20 CLOSED

nf-core commented on September 28, 2024
Add SpatialData as output

from spatialvi.

Comments (20)

grst commented on September 28, 2024

Made a PR to spatialdata-io.

cavenel commented on September 28, 2024

I don't think the scanpy.read_visium function handles 2.1.0, specifically not the cytassist true/false format.

Hmmm I see, right now we use a "local" version of read_visium (see https://github.com/nf-core/spatialtranscriptomics/blob/1826d0137d980dfc0d33ee448ff535a1f48755e2/bin/read_st_data.py#L17), as scanpy doesn't have scverse/scanpy#2424 released yet. But this is only for Space Ranger 2.0 and not 2.1.0? Maybe we should limit the Space Ranger version to 2.0 then.

you don't need to convert anndata to spatialdata and back. If you want anndata, you can just do

Ok I understand. I was more thinking to keep the h5ad output (maybe optional), as many people might still use h5ad anndata files for downstream analysis. But then I guess we can do something like:

adata = sdata.tables["adata1"]
adata.write_h5ad(outputFilename)

Still a bit sad to lose the low res images in the exported file.

grst commented on September 28, 2024

Not a big fan of limiting the max supported spaceranger version.

Do spatialdata-io and/or squidpy support spaceranger 2.1.0? Then we could use this to load the spaceranger output and convert it to whatever format is needed.

the lowres images, as well as hires and all relevant metadata, are stored in spatialdata, where transformations between coordinate systems are possible. What is stored in anndata with the original scanpy/squidpy functions is a subset (and not really actionable) of the image data and metadata.

It should still be possible to build an AnnData object that is compatible with sc.pl.spatial from SpatialData without too much effort? I know that scanpy.spatial will go away eventually, but for a transition period it would still be good to support it.

giovp commented on September 28, 2024

Do spatialdata-io and/or squidpy support spaceranger 2.1.0? Then we could use this to load the spaceranger output and convert it to whatever format is needed.

spatialdata-io is the only one that actually handles all the edge cases robustly, across spaceranger versions and configurations

It should still be possible to build an AnnData object that is compatible with sc.pl.spatial from SpatialData without too much effort?

yes, this is something I believe we discussed at some point in the past @LLehner @LucaMarconato, but it is pretty low on the priority list

giovp commented on September 28, 2024

I think a good question to ask, that might resolve the doubts is: who is this pipeline for?

  • if it is for scanpy analysts who only look at the image once and only for visualization, then I think fixing the squidpy/scanpy read visium function or using your own is totally fine.
  • if this pipeline is for production-level processing (e.g. you'll have >100 datasets, supporting multiple spaceranger versions, as well as various kinds of pathology microscopes, with images acquired at different magnifications, maybe having to integrate pathology info (e.g. segmentation)), then you really want to use spatialdata

LucaMarconato commented on September 28, 2024

How about adding a to_legacy_anndata() in SpatialData?

We could add this to spatialdata-io, I updated this issue to track it scverse/spatialdata-io#47. It should be quite straightforward to implement; if someone has time to make a PR I am happy to review.

fasterius commented on September 28, 2024

I'm not sure if TissUUmaps is compatible with that data type, but it could potentially be included as an optional output format with a parameter - @cavenel ? Reading the docs I see that the format itself is still under active development and changes are to be expected, so I'm thinking that it'd be nice if it's a bit more stable for any addition to the pipeline.

grst commented on September 28, 2024

It's true that it is still under active development. The image storage of SpatialData is based on OME-NGFF and can be considered stable. The metadata part will still evolve. See the corresponding discussion on the scverse Zulip.

On the scverse side, SpatialData and Squidpy are the future of spatial analysis. scanpy.spatial is already kind of unmaintained and will be deprecated not too far in the future.

cavenel commented on September 28, 2024

Hi @grst!
I have been looking a bit at NGFF, SpatialData and zarr support. I see pros and cons in adding SpatialData output to the pipeline.
First, as @fasterius mentioned, it's quite new. I think it can be good to see how it evolves in the coming 6 months. Things like consolidated metadata are still being discussed (scverse/spatialdata#278).
Also, I am personally not a huge fan of zarr format for this kind of output. For big datasets, it means creating millions of files per sample, which can easily reach limits of some cluster systems, or even be a problem for file transfer. I understand the benefits for Cloud Object Storage like s3, but not sure it's the best for most users.

Let's keep this issue open for now and see how things evolve on this topic in the community, and maybe add it as an optional output format soon!

grst commented on September 28, 2024

In principle, zarr supports various backends, including single-file formats such as .zip, .n5 or .sqlite: https://zarr.readthedocs.io/en/stable/api/storage.html

Not sure if this is supported by SpatialData/OME-NGFF, but at least technically it shouldn't be very hard to support it.
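
To illustrate the single-file idea, here is a minimal sketch that packs an existing directory-based store into one uncompressed zip archive using only the Python standard library. The store name, the fake chunk files, and the helper `zip_directory_store` are hypothetical. Entry names are kept relative to the store root, which is the layout a zip-based Zarr store is expected to use. Whether SpatialData or OME-NGFF tooling can open such an archive is not established in this thread.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory_store(store_dir: Path, archive: Path) -> int:
    """Pack every file under store_dir into one uncompressed zip; return the file count."""
    files = sorted(p for p in store_dir.rglob("*") if p.is_file())
    # ZIP_STORED avoids compressing chunks that are usually compressed already.
    with zipfile.ZipFile(archive, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for path in files:
            # Entry names are relative to the store root.
            zf.write(path, arcname=path.relative_to(store_dir).as_posix())
    return len(files)


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "sample.zarr"  # hypothetical store name
    (store / "images" / "0").mkdir(parents=True)
    for i in range(16):  # fake chunk files standing in for real image chunks
        (store / "images" / "0" / f"0.{i}").write_bytes(b"\x00" * 8)
    (store / ".zgroup").write_text('{"zarr_format": 2}')

    archive = Path(tmp) / "sample.zarr.zip"
    n_packed = zip_directory_store(store, archive)
    with zipfile.ZipFile(archive) as zf:
        n_entries = len(zf.namelist())
```

This only addresses the file-count and transfer concern; it says nothing about read performance from the archive.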

giovp commented on September 28, 2024

hi all, chiming in on the discussion prompted by @grst .

My understanding is that this pipeline only concerns the processing of 10x genomics Visium data. Couple of questions on this:

  • Will it support only 10x genomics visium or does it plan to support other type of techs (e.g. 10x xenium, vizgen merscope etc.) ?
  • Does it support all the versions of spaceranger outputs, and if yes, does it rely on squidpy/scanpy for that?

On the spatialdata support, while I understand the caution and agree that a delay of a few months is probably useful, we (@scverse) do plan to only support spatialdata as the data format for spatial transcriptomics technologies moving forward. The original anndata-spatial is in fact already limiting for the 10x visium case. For example, it does not support saving the full-res microscopy image, which is the only useful piece of info to do any type of image analysis with visium (as the high-res and low-res are only useful for visualization).

As for the Zarr backend, while it's true that there shouldn't be limits in the scope (also on the ngff specs side), I do not see full support of formats other than zarr in the near future, unfortunately. On this topic, we will follow the developments of the OME community moving forward.

melonora commented on September 28, 2024

Hi @grst! I have been looking a bit at NGFF, SpatialData and zarr support. I see pros and cons in adding SpatialData output to the pipeline. First, as @fasterius mentioned, it's quite new. I think it can be good to see how it evolves in the coming 6 months. Things like consolidated metadata are still being discussed (scverse/spatialdata#278). Also, I am personally not a huge fan of zarr format for this kind of output. For big datasets, it means creating millions of files per sample, which can easily reach limits of some cluster systems, or even be a problem for file transfer. I understand the benefits for Cloud Object Storage like s3, but not sure it's the best for most users.

Let's keep this issue open for now and see how things evolve on this topic in the community, and maybe add it as an optional output format soon!

Just wanted to chime in. To the best of my knowledge, yes, a large dataset can produce a lot of files, but the number is also determined by the chunk size, which can be adjusted.
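
As a rough illustration of how chunk size drives the file count, the sketch below counts the chunks of a single array for two chunk shapes. The image shape and both chunk shapes are made-up values, and the count ignores metadata files and any additional multiscale pyramid levels, so treat it as an order-of-magnitude estimate only.

```python
import math


def count_chunks(shape, chunks):
    """Number of chunks an array of `shape` is split into for a given chunk shape."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# Hypothetical full-resolution RGB image, channels first.
image_shape = (3, 40000, 40000)

small_chunks = count_chunks(image_shape, (1, 256, 256))
large_chunks = count_chunks(image_shape, (3, 4096, 4096))
print(small_chunks, large_chunks)
```

With a directory-based store each chunk is typically one file, so the larger chunk shape reduces the file count by several orders of magnitude here, at the cost of reading more data per access.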

cavenel commented on September 28, 2024

Hi, after some thinking and discussing, here are some answers and more questions about SpatialData:

My understanding is that this pipeline only concerns the processing of 10x genomics Visium data. Couple of questions on this:

  • Will it support only 10x genomics visium or does it plan to support other type of techs (e.g. 10x xenium, vizgen merscope etc.) ?

We will stick to visium for this pipeline. We might rename it eventually to make that clearer.

  • Does it support all the versions of spaceranger outputs, and if yes, does it rely on squidpy/scanpy for that?

It will only support version 2.1.0, and relies on scanpy to convert to AnnData.

I am trying to understand what the best approach would be today if we wanted to export SpatialData instead of AnnData. And probably still keep the option to export AnnData at the end.

  • can we convert AnnData (h5ad) to SpatialData (zarr) and/or the other way around?
  • can we use scanpy to do downstream analysis (filtering, clustering, get spatially variable genes, etc.) directly on SpatialData objects?

giovp commented on September 28, 2024

It will only support version 2.1.0, and relies on scanpy to convert to AnnData.

I don't think the scanpy.read_visium function handles 2.1.0, specifically not the cytassist true/false format.

can we convert AnnData (h5ad) to SpatialData (zarr) and/or the other way around?

you don't need to convert anndata to spatialdata and back. If you want anndata, you can just do

import anndata as ad

adata = ad.read_zarr("/path/to/spatialdata.zarr/table/table")

What adata won't have are the low res images in adata.uns["spatial"][<library_id>]["images"] (or scale_factors).

can we use scanpy to do downstream analysis (filtering, clustering, get spatially variable genes, etc.) directly on SpatialData objects?

yes

import scanpy as sc

sc.pp.normalize_total(sdata.tables["adata1"])
sc.pp.log1p(sdata.tables["adata1"])
...

giovp commented on September 28, 2024

Still a bit sad to lose the low res images in the exported file.

the lowres images, as well as hires and all relevant metadata, are stored in spatialdata, where transformations between coordinate systems are possible. What is stored in anndata with the original scanpy/squidpy functions is a subset (and not really actionable) of the image data and metadata.

cavenel commented on September 28, 2024

I was also thinking of MultiQC output from the downstream analysis, and in particular if we want to use tools like checkatlas (#40 (comment) and becavin-lab/checkatlas#34). I think that would only work with the AnnData objects and not the SpatialData ones. If we go for a custom multiqc module then we could work directly from SpatialData, but probably more work.

grst commented on September 28, 2024

Checkatlas is still somewhat work in progress and currently doesn't support spatial.
If we use it only to generate statistics on the gene expression data, what @giovp suggested should work fine:

import anndata as ad

adata = ad.read_zarr("/path/to/spatialdata.zarr/table/table")
adata.write_h5ad("adata.h5ad")

If they are going to support spatial QC, then we should push them towards supporting SpatialData directly.

cavenel commented on September 28, 2024

spatialdata-io is the only one that actually handles all the edge cases robustly, across spaceranger versions and configurations

Ok so then I see two options:

  • using spatialdata-io to read the space ranger output (and get rid of our own implementation of read_visium in scanpy).
  • do all filtering, normalization, clustering, etc, on sdata.tables["adata1"]
  • save as zarr SpatialData
  • export AnnData objects without images (and use this for checkatlas and co.)

Cons: not possible to plot with sc.pl.spatial, and the exported AnnData will not have images.

Or:

  • Skip SpatialData for now, and continue with AnnData as we do.
  • Adapt our local implementation of read_visium to support SpaceRanger 2.1.0

Cons: we lose the SpatialData export, and having local read_visium code is ugly.

Am I missing something here?

grst commented on September 28, 2024

How about adding a to_legacy_anndata() in SpatialData?
Should really be just applying all transformations to the image and adding it to sdata.table.uns["spatial"] or am I missing something? Could also supersede scverse/scanpy#2424

fasterius commented on September 28, 2024

I think a good question to ask, that might resolve the doubts is: who is this pipeline for?

  • if it is for scanpy analysts who only look at the image once and only for visualization, then I think fixing the squidpy/scanpy read visium function or using your own is totally fine.
  • if this pipeline is for production-level processing (e.g. you'll have >100 datasets, supporting multiple spaceranger versions, as well as various kinds of pathology microscopes, with images acquired at different magnifications, maybe having to integrate pathology info (e.g. segmentation)), then you really want to use spatialdata

That is actually a really good point that I don't think we've thought much about. Given that nf-core as a whole is meant to support large-scale production use, I think that would be the way we want to go. The first group of users could simply be served by also exporting an AnnData object and leaving it at that.
