Comments (20)
made a PR to spatialdata_io
from spatialvi.
I don't think the scanpy.read_visium function handles 2.1.0, specifically not the cytassist true/false format.
Hmmm I see, right now we use a "local" version of read_visium (see https://github.com/nf-core/spatialtranscriptomics/blob/1826d0137d980dfc0d33ee448ff535a1f48755e2/bin/read_st_data.py#L17), as scanpy doesn't have scverse/scanpy#2424 released yet. But this is only for Space Ranger 2.0 and not 2.1.0? Maybe we should limit the Space Ranger version to 2.0 then.
you don't need to convert anndata to spatialdata and back. If you want anndata, you can just do
Ok, I understand. I was thinking more of keeping the h5ad output (maybe optional), as many people might still use h5ad AnnData files for downstream analysis. But then I guess we can do something like:
adata = sdata.tables["adata1"]
adata.write_h5ad(outputFilename)
Still a bit sad to lose the low-res images in the exported file.
from spatialvi.
Not a big fan of limiting the max supported spaceranger version.
Do spatialdata-io and/or squidpy support spaceranger 2.1.0? Then we could use this to load the spaceranger output and convert it to whatever format is needed.
the lowres images, as well as hires and all relevant metadata, are stored in spatialdata, where transformations between coordinate systems are possible. What is stored in anndata with the original scanpy/squidpy functions is a subset of the image data and metadata (and not really actionable)
It should still be possible to build an AnnData object that is compatible with sc.pl.spatial from SpatialData without too much effort? I know that scanpy.spatial will go away eventually, but for a transition period it would still be good to support it.
from spatialvi.
Do spatialdata-io and/or squidpy support spaceranger 2.1.0? Then we could use this to load the spaceranger output and convert it to whatever format is needed.
spatialdata-io is the only one that actually handles all the edge cases robustly, across spaceranger versions and configurations
It should still be possible to build an AnnData object that is compatible with sc.pl.spatial from SpatialData without too much effort?
yes, this is something I believed we discussed at some point in the past @LLehner @LucaMarconato but pretty low in priority list
from spatialvi.
I think a good question to ask, which might resolve the doubts, is: who is this pipeline for?
- if it is for scanpy analysts, who only look at the image once and only for visualization, then I think fixing the squidpy/scanpy read_visium function or using your own is totally fine.
- if this pipeline is for production-level processing (e.g. you'll have >100 datasets, supporting multiple spaceranger versions, as well as various kinds of pathology microscopes, with images acquired at different magnifications, maybe having to integrate pathology info (e.g. segmentation)), then you really want to use spatialdata
from spatialvi.
How about adding a to_legacy_anndata() in SpatialData?
We could add this to spatialdata-io
, I updated this issue to track it scverse/spatialdata-io#47. It should be quite straightforward to implement; if someone has time to make a PR I am happy to review.
from spatialvi.
I'm not sure if TissUUmaps is compatible with that data type, but it could potentially be included as an optional output format with a parameter - @cavenel? Reading the docs I see that the format itself is still under active development and changes are to be expected, so I'm thinking that it'd be nice if it were a bit more stable before adding it to the pipeline.
from spatialvi.
It's true that it is still under active development. The image storage of SpatialData is based on OME-NGFF and can be considered stable. The metadata part will still evolve. See the corresponding discussion on the scverse zulip.
On the scverse side, SpatialData and Squidpy are the future of spatial analysis. scanpy.spatial is already kind of unmaintained and will be deprecated in the not-too-distant future.
from spatialvi.
Hi @grst!
I have been looking a bit at NGFF, SpatialData and zarr support. I see pros and cons in adding SpatialData output to the pipeline.
First, as @fasterius mentioned, it's quite new. I think it can be good to see how it evolves in the coming 6 months. Things like consolidated metadata are still being discussed (scverse/spatialdata#278).
Also, I am personally not a huge fan of zarr format for this kind of output. For big datasets, it means creating millions of files per sample, which can easily reach limits of some cluster systems, or even be a problem for file transfer. I understand the benefits for Cloud Object Storage like s3, but not sure it's the best for most users.
Let's keep this issue open for now and see how things evolve on this topic in the community, and maybe add it as an optional output format soon!
from spatialvi.
In principle, zarr supports various backends, including single-file formats such as .zip, .n5 or .sqlite: https://zarr.readthedocs.io/en/stable/api/storage.html
Not sure if this is supported by SpatialData/OME-NGFF, but at least technically it shouldn't be very hard to support it.
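To make the single-file idea concrete, here is a standard-library-only sketch that packs a directory-based Zarr store into one .zip file. The store name and layout are hypothetical stand-ins; zarr-python's ZipStore can read archives laid out like this, but check the storage docs for your zarr version before relying on it.

```python
# Illustrative sketch (stdlib only): one .zip file instead of a file tree.
import json
import pathlib
import tempfile
import zipfile

tmp = pathlib.Path(tempfile.mkdtemp())
store = tmp / "spatialdata.zarr"          # hypothetical directory store
(store / "table").mkdir(parents=True)
(store / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
(store / "table" / ".zgroup").write_text(json.dumps({"zarr_format": 2}))

archive = tmp / "spatialdata.zarr.zip"
with zipfile.ZipFile(archive, "w") as zf:
    for path in sorted(store.rglob("*")):
        if path.is_file():
            # arcnames are store-relative, matching the directory layout
            zf.write(path, path.relative_to(store).as_posix())

with zipfile.ZipFile(archive) as zf:
    print(sorted(zf.namelist()))          # ['.zgroup', 'table/.zgroup']
```

This keeps every chunk and metadata key addressable by the same relative path as in the directory store, which is why zip-backed stores are a drop-in option at the storage layer.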
from spatialvi.
hi all, chiming in on the discussion prompted by @grst .
My understanding is that this pipeline only concerns the processing of 10x genomics Visium data. Couple of questions on this:
- Will it support only 10x genomics visium or does it plan to support other types of tech (e.g. 10x xenium, vizgen merscope etc.)?
- Does it support all the versions of spaceranger outputs, and if yes, does it rely on squidpy/scanpy for that?
On the spatialdata support, while I understand the caution and agree that a few months' delay is probably useful, we (@scverse) do plan to only support spatialdata as the data format for spatial transcriptomics technologies moving forward. The original anndata-spatial format is in fact already limiting for the 10x visium case. For example, it does not support saving the full-res microscopy image, which is the only useful piece of info for doing any type of image analysis with visium (the high-res and low-res images are only useful for visualization).
As for the Zarr backend, while it's true that there shouldn't be limits in scope (also on the ngff specs side), I do not see full support for formats other than zarr in the near future, unfortunately. On this topic, we will follow the developments of the OME community moving forward.
from spatialvi.
Hi @grst! I have been looking a bit at NGFF, SpatialData and zarr support. I see pros and cons in adding SpatialData output to the pipeline. First, as @fasterius mentioned, it's quite new. I think it can be good to see how it evolves in the coming 6 months. Things like consolidated metadata are still being discussed (scverse/spatialdata#278). Also, I am personally not a huge fan of zarr format for this kind of output. For big datasets, it means creating millions of files per sample, which can easily reach limits of some cluster systems, or even be a problem for file transfer. I understand the benefits for Cloud Object Storage like s3, but not sure it's the best for most users.
Let's keep this issue open for now and see how things evolve on this topic in the community, and maybe add it as an optional output format soon!
Just wanted to chime in. To the best of my knowledge yes it can be that for a large dataset a lot of files are created, but this is also determined by the chunk size which can be adjusted.
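To put rough numbers on the file-count point above: a Zarr array writes one file per chunk, so the chunk shape directly controls how many files a sample produces. A quick back-of-the-envelope helper (the image dimensions below are hypothetical, roughly the size of a full-res Visium microscopy image):

```python
import math

def n_chunk_files(shape, chunks):
    # One file per chunk: product of ceil(dim / chunk) over all axes.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Hypothetical full-resolution RGB image, ~40k x 40k pixels.
print(n_chunk_files((3, 40_000, 40_000), (1, 256, 256)))      # 73947 files
print(n_chunk_files((3, 40_000, 40_000), (3, 4_096, 4_096)))  # 100 files
```

So the same array can land anywhere from hundreds to tens of thousands of files per image depending on the chosen chunk size, which is the knob to turn before worrying about filesystem inode limits.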
from spatialvi.
Hi, after some thinking and discussing, here are some answers and more questions about SpatialData:
My understanding is that this pipeline only concerns the processing of 10x genomics Visium data. Couple of questions on this:
- Will it support only 10x genomics visium or does it plan to support other types of tech (e.g. 10x xenium, vizgen merscope etc.)?
We will stick to visium for this pipeline. We might rename it eventually to make that clearer.
- Does it support all the versions of spaceranger outputs, and if yes, does it rely on squidpy/scanpy for that?
It will only support version 2.1.0, and relies on scanpy to convert to AnnData.
I am trying to understand what the best approach would be today if we wanted to export SpatialData instead of AnnData. And probably still keep the option to export AnnData at the end.
- can we convert AnnData (h5ad) to SpatialData (zarr) and/or the other way around?
- can we use scanpy to do downstream analysis (filtering, clustering, get spatially variable genes, etc.) directly on SpatialData objects?
from spatialvi.
It will only support version 2.1.0, and relies on scanpy to convert to AnnData.
I don't think the scanpy.read_visium function handles 2.1.0, specifically not the cytassist true/false format.
can we convert AnnData (h5ad) to SpatialData (zarr) and/or the other way around?
you don't need to convert anndata to spatialdata and back. If you want anndata, you can just do
import anndata as ad
adata = ad.read_zarr("/path/to/spatialdata.zarr/table/table")
what adata won't have are the low-res images in adata.uns["spatial"][<library_id>]["images"] (or scale_factors).
can we use scanpy to do downstream analysis (filtering, clustering, get spatially variable genes, etc.) directly on SpatialData objects?
yes
import scanpy as sc
sc.pp.normalize_total(sdata.tables["adata1"])
sc.pp.log1p(sdata.tables["adata1"])
...
from spatialvi.
Still a bit sad to lose the low-res images in the exported file.
the lowres images, as well as hires and all relevant metadata, are stored in spatialdata, where transformations between coordinate systems are possible. What is stored in anndata with the original scanpy/squidpy functions is a subset of the image data and metadata (and not really actionable)
from spatialvi.
I was also thinking of MultiQC output from the downstream analysis, and in particular if we want to use tools like checkatlas (#40 (comment) and becavin-lab/checkatlas#34). I think that would only work with the AnnData objects and not the SpatialData ones. If we go for a custom multiqc module then we could work directly from SpatialData, but probably more work.
from spatialvi.
Checkatlas is still somewhat work in progress and currently doesn't support spatial.
If we use it only to generate statistics on the gene expression data, what @giovp suggested should work fine:
adata = ad.read_zarr("/path/to/spatialdata.zarr/table/table")
adata.write_h5ad("adata.h5ad")
If they are going to support spatial QC, then we should push them towards supporting SpatialData directly.
from spatialvi.
spatialdata-io is the only one that actually handles all the edge cases robustly, across spaceranger versions and configurations
Ok so then I see two options:
- using spatialdata-io to read the space ranger output (and get rid of our own implementation of read_visium in scanpy)
- do all filtering, normalization, clustering, etc. on sdata.tables["adata1"]
- save as zarr SpatialData
- export AnnData objects without images (and use this for checkatlas and co.)

Cons: not possible to plot with sc.pl.spatial, and the exported AnnData will not have images.
Or:
- Skip SpatialData for now, and continue with AnnData as we do.
- Adapt our local implementation of read_visium to support Space Ranger 2.1.0

Cons: we lose the SpatialData export, and having local read_visium code is ugly.
Am I missing something here?
from spatialvi.
How about adding a to_legacy_anndata() in SpatialData?
Should really be just applying all transformations to the image and adding it to sdata.table.uns["spatial"], or am I missing something? Could also supersede scverse/scanpy#2424
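For reference, the legacy layout that sc.pl.spatial expects could be assembled along these lines. This is a hand-rolled sketch, not the eventual spatialdata-io function: the key names follow the scanpy.read_visium convention, the input values are hypothetical, and the actual hard part (applying the SpatialData coordinate transformations to produce correctly aligned images) is assumed to have happened upstream.

```python
def legacy_spatial_uns(library_id, hires_img, lowres_img, scalefactors):
    """Sketch of the adata.uns["spatial"] layout expected by sc.pl.spatial.

    hires_img/lowres_img are assumed to already be rendered in their
    downscaled coordinate systems; scalefactors is the Space Ranger
    scalefactors dict (e.g. tissue_hires_scalef, spot_diameter_fullres).
    """
    return {
        library_id: {
            "images": {"hires": hires_img, "lowres": lowres_img},
            "scalefactors": scalefactors,
        }
    }

# Hypothetical usage: stand-in images instead of real (y, x, 3) arrays.
uns_spatial = legacy_spatial_uns(
    "sample_A",
    hires_img=[[0]],
    lowres_img=[[0]],
    scalefactors={"tissue_hires_scalef": 0.08, "tissue_lowres_scalef": 0.02},
)
print(sorted(uns_spatial["sample_A"]))  # ['images', 'scalefactors']
```

The spot coordinates would additionally need to go into adata.obsm["spatial"] in full-resolution pixel units, which is what the scalefactors are multiplied against at plot time.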
from spatialvi.
I think a good question to ask, which might resolve the doubts, is: who is this pipeline for?
- if it is for scanpy analysts, who only look at the image once and only for visualization, then I think fixing the squidpy/scanpy read_visium function or using your own is totally fine.
- if this pipeline is for production-level processing (e.g. you'll have >100 datasets, supporting multiple spaceranger versions, as well as various kinds of pathology microscopes, with images acquired at different magnifications, maybe having to integrate pathology info (e.g. segmentation)), then you really want to use spatialdata
That is actually a really good point that I don't think we've thought much about. Given that nf-core as a whole aims to support large-scale production use, I think that is the way we want to go. The first group of users could simply be served by also exporting an AnnData object and leaving it at that.
from spatialvi.