
Pipeline for processing spatially-resolved gene counts with spatial coordinates and image data. Designed for 10x Genomics Visium transcriptomics.

Home Page: https://nf-co.re/spatialvi

License: MIT License

Languages: HTML 3.89%, Python 1.30%, Nextflow 87.47%, Groovy 0.59%, SCSS 6.74%
Topics: nf-core, nextflow, workflow, pipeline, 10x-genomics, 10xgenomics, bioinformatics, image-processing, microscopy, rna-seq

spatialvi's Introduction

nf-core/spatialvi

GitHub Actions CI Status · GitHub Actions Linting Status · AWS CI · Cite with Zenodo · nf-test

Nextflow · run with conda · run with docker · run with singularity · Launch on Seqera Platform

Get help on Slack · Follow on Twitter · Follow on Mastodon · Watch on YouTube

Introduction

nf-core/spatialvi is a bioinformatics analysis pipeline for Visium spatial transcriptomics data from 10x Genomics. It can either start from raw data and run Space Ranger itself, or take data already processed by Space Ranger. The pipeline currently consists of the following steps:

  1. Raw data processing with Space Ranger (optional)
  2. Quality controls and filtering
  3. Normalisation
  4. Dimensionality reduction and clustering
  5. Differential gene expression testing

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

You can run the pipeline using:

nextflow run nf-core/spatialvi \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
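As a minimal sketch, a params file passed via -params-file could look like the following YAML, using only the two parameters shown in the command above (see the parameter documentation for the full set):

# params.yaml
input: samplesheet.csv
outdir: results

which can then be supplied as:

nextflow run nf-core/spatialvi -profile docker -params-file params.yaml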

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/spatialvi was originally developed by The Jackson Laboratory [1], up to the 0.1.0 tag. It was further developed in a collaboration between the National Bioinformatics Infrastructure Sweden and the National Genomics Infrastructure within SciLifeLab; it is currently developed and maintained by Erik Fasterius and Christophe Avenel.

Many thanks to others who have helped out along the way too, especially Gregor Sturm!

[1] Supported by grants from the US National Institutes of Health, U24CA224067 and U54AG075941. Original authors: Dr. Sergii Domanskyi, Prof. Jeffrey Chuang and Dr. Anuj Srivastava.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #spatialvi channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

spatialvi's People

Contributors

apeltzer, cavenel, fasterius, grst, nf-core-bot, sdomanskyi


spatialvi's Issues

Module compilation error

Description of the bug

A bug appeared when I tested the pipeline.

Command used and terminal output

Command:

nextflow run nf-core/spatialtranscriptomics -profile test,docker -r dev

Output:

N E X T F L O W  ~  version 22.04.5
Launching `https://github.com/nf-core/spatialtranscriptomics` [amazing_wescoff] DSL2 - revision: 40e2fe8125 [dev]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/spatialtranscriptomics v1.0dev
------------------------------------------------------
Core Nextflow options
  revision                  : dev
  runName                   : amazing_wescoff
  containerEngine           : docker
  launchDir                 : /home/jmh/tmp
  workDir                   : /home/jmh/tmp/work
  projectDir                : /home/jmh/.nextflow/assets/nf-core/spatialtranscriptomics
  userName                  : jmh
  profile                   : test,docker
  configFiles               : /home/jmh/.nextflow/assets/nf-core/spatialtranscriptomics/nextflow.config

Input/output options
  input                     : https://raw.githubusercontent.com/nf-core/test-datasets/spatialtranscriptomics/testdata/test-dataset-subsampled/samplesheet.csv

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Max job request options
  max_cpus                  : 4
  max_memory                : 6.GB
  max_time                  : 1.h

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/spatialtranscriptomics for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/spatialtranscriptomics/blob/master/CITATIONS.md
------------------------------------------------------
Project directory:  /home/jmh/.nextflow/assets/nf-core/spatialtranscriptomics

Module compilation error
- file : /home/jmh/.nextflow/assets/nf-core/spatialtranscriptomics/./workflows/../subworkflows/local/../../modules/local/tasks.nf
- cause: unable to resolve class JsonSlurper
 @ line 201, column 19.
       sample_info = new JsonSlurper().parse(new File(fileName))
                     ^

Module compilation error
- file : /home/jmh/.nextflow/assets/nf-core/spatialtranscriptomics/./workflows/../subworkflows/local/../../modules/local/tasks.nf
- cause: unable to resolve class JsonSlurper
 @ line 339, column 19.
       sample_info = new JsonSlurper().parse(new File(fileName))
                     ^

2 errors


Relevant files

No response

System information

- Nextflow version: 22.04.5
- Container engine: Docker
- OS: CentOS Linux
- Version of nf-core/spatialtranscriptomics: dev

Full-size test dataset

Description of feature

Once Space Ranger support is fully implemented, the "full-size" test dataset should be updated to run the workflow including the Space Ranger step.

Version exporting of `leidenalg` and `SpatialDE` does not work

Description of the bug

Upon checking the contents of pipeline_info/versions.yml I found that the entries for leidenalg and SpatialDE were empty, which they should not be. It seems that using <module>.__version__ does not work for these modules, for some reason. Tested with both Conda and the Docker image utilised in the pipeline. Partial fix in #50.

Command used and terminal output

$ python -c "import SpatialDE; print(SpatialDE.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'SpatialDE' has no attribute '__version__'
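A possible workaround, sketched below under the assumption that the packages are installed as distributions named "SpatialDE" and "leidenalg", is to read the installed distribution metadata instead of the module attribute (this is the approach used for SpatialDE in the versions.yml command shown in a later issue):

from importlib.metadata import version, PackageNotFoundError

def package_version(name):
    # Return the installed distribution version, or "unknown" if the
    # package cannot be found under that distribution name.
    try:
        return version(name)
    except PackageNotFoundError:
        return "unknown"

print(package_version("SpatialDE"))
print(package_version("leidenalg"))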

Relevant files

No response

System information

No response

Use `to_legacy_anndata` function from `spatialdata_io` instead of custom code

Description of feature

We are currently using custom code as a work-around for the to_legacy_anndata function from spatialdata_io, which did not exist when the SpatialData work in this pipeline was initiated. This function has now been merged into the SpatialData repo, but there is an issue with using it: scverse/spatialdata-io#134.

As soon as this (and any other possible issues) is fixed, we should adopt the upstream function instead of the custom one we have now.
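A rough sketch of what adopting the upstream function might look like, assuming it is exposed as spatialdata_io.experimental.to_legacy_anndata and that the default arguments suffice (the exact import path and signature should be checked against the spatialdata-io documentation):

import spatialdata as sd
from spatialdata_io.experimental import to_legacy_anndata  # import location assumed; verify against spatialdata-io docs

# Read a SpatialData Zarr store (file name is illustrative)
sdata = sd.read_zarr("sample.zarr")

# Convert to a legacy AnnData object for tools that expect one
adata = to_legacy_anndata(sdata)
adata.write_h5ad("sample.h5ad")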

Missing SpatialDE package in Docker container for ST_SPATIAL_DE process

Description of the bug

I am running the nf-core/spatialtranscriptomics pipeline on an EC2 instance with files in S3 and encountered an error in the NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE process:

ModuleNotFoundError: No module named 'SpatialDE'
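A quick way to verify the container contents before a full pipeline run is to check which of the expected packages are importable; the package list below is illustrative:

import importlib.util

# Report whether each expected analysis package can be found in the environment
for pkg in ("SpatialDE", "leidenalg", "scanpy"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING'}")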

Command used and terminal output

nextflow run \
    nf-core/spatialtranscriptomics \
    --input s3://<bucket_name>/s3_samplesheet.csv \
    --outdir s3://<bucket_name>/test_outputs/ \
    --spaceranger_reference s3://<bucket_name>/refdata-gex-mm10-2020-A.tar.gz \
    --spaceranger_probeset s3://<bucket_name>/Visium_Mouse_Transcriptome_Probe_Set_v1.0_mm10-2020-A.csv \
    -profile docker \
    -r dev \
    --max_memory 60.GB \
    --max_cpus 12


Output:
Apr-03 19:02:53.856 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 7; name: NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE (Visium_FFPE_Mouse_Brain); status: COMPLETED; exit: 1; error: -; workDir: /home/ec2-user/work/98/d4d2aae315f4018c786ee85f63d92a]
Apr-03 19:02:53.860 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE (Visium_FFPE_Mouse_Brain); work-dir=/home/ec2-user/work/98/d4d2aae315f4018c786ee85f63d92a
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE (Visium_FFPE_Mouse_Brain)` terminated with an error exit status (1)
Apr-03 19:02:53.881 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE (Visium_FFPE_Mouse_Brain)'

Caused by:
  Process `NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE (Visium_FFPE_Mouse_Brain)` terminated with an error exit status (1)

Command executed:

  quarto render st_spatial_de.qmd         -P input_adata_processed:st_adata_processed.h5ad         -P n_top_spatial_degs:14         -P output_spatial_degs:st_spatial_de.csv
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SPATIALTRANSCRIPTOMICS:ST:ST_DOWNSTREAM:ST_SPATIAL_DE":
      quarto: $(quarto -v)
      leidenalg: $(python -c "import leidenalg; print(leidenalg.version)")
      scanpy: $(python -c "import scanpy; print(scanpy.__version__)")
      SpatialDE: $(python -c "from importlib.metadata import version; print(version('SpatialDE'))")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  
  Starting python3 kernel.../opt/conda/lib/python3.10/site-packages/IPython/paths.py:69: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
    warn("IPython parent '{0}' is not a writable location,"
  /opt/conda/lib/python3.10/site-packages/IPython/paths.py:69: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
    warn("IPython parent '{0}' is not a writable location,"
  Done
  
  Executing 'st_spatial_de.ipynb'
    Cell 1/9...Done
    Cell 2/9...Done
    Cell 3/9...
  
  An error occurred while executing the following cell:
  ------------------
  import scanpy as sc
  import pandas as pd
  import SpatialDE
  from matplotlib import pyplot as plt
  ------------------
  
  
  ---------------------------------------------------------------------------
  ModuleNotFoundError                       Traceback (most recent call last)
  Cell In[3], line 3
        1 import scanpy as sc
        2 import pandas as pd
  ----> 3 import SpatialDE
        4 from matplotlib import pyplot as plt
  
  ModuleNotFoundError: No module named 'SpatialDE'

Work dir:
  /home/ec2-user/work/98/d4d2aae315f4018c786ee85f63d92a

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Relevant files

No response

System information

AWS EC2 instance type: m7i-flex.4xlarge
AMI (OS): Amazon Linux 2023
Executor: local on EC2 instance
nf-core/spatialtranscriptomics version: dev

Remove redundant code

There are several places (or whole scripts in some cases) that are not actually used, and they should be removed. A non-exhaustive list:

  • Workflow ST_PROPER
  • Workflow INPUT_CHECK

Add MultiQC report

Description of feature

Having individual QC reports for each sample is nice, but it would be cool to have one aggregated report that gives an overview of all samples to quickly identify problematic ones.

MultiQC is an obvious choice here, but depending on how much customization is necessary, a custom notebook would also be an option.
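One lightweight option, sketched here under the assumption that per-sample QC metrics can be collected into a single table, is MultiQC's custom-content mechanism, which picks up files with an _mqc suffix and renders them in the aggregated report (sample names and metric values below are purely illustrative):

import pandas as pd

# Hypothetical per-sample QC metrics gathered from the individual reports
metrics = pd.DataFrame(
    {
        "sample": ["sample_A", "sample_B"],
        "spots_before_filtering": [4992, 4687],
        "spots_after_filtering": [4310, 3998],
        "median_genes_per_spot": [2150, 1870],
    }
).set_index("sample")

# Files ending in "_mqc.tsv" are detected by MultiQC as custom content
# and shown as a table in the aggregated report.
metrics.to_csv("spatialvi_qc_summary_mqc.tsv", sep="\t")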

Auto-detection of samplesheet type

Description of feature

The samplesheet is currently specified using --input as per standard nf-core practice, while the --run_spaceranger parameter defines whether that samplesheet describes data pre- or post-Space Ranger processing. An alternative is some kind of auto-detection of the samplesheet type, which could be done by checking the samplesheet header. Is this something that would be useful and help end-users, or would it just add hassle?
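A minimal sketch of what header-based auto-detection could look like; the column names used here (spaceranger_dir, fastq_dir) are illustrative and not necessarily the pipeline's actual samplesheet schema:

import csv

def detect_samplesheet_type(path):
    # Guess whether a samplesheet describes raw FASTQ input or
    # pre-processed Space Ranger output, based on its header row.
    with open(path, newline="") as handle:
        header = set(next(csv.reader(handle)))
    if "spaceranger_dir" in header:   # assumed column for post-Space Ranger data
        return "processed"
    if "fastq_dir" in header:         # assumed column for raw data
        return "raw"
    raise ValueError(f"Unrecognised samplesheet header: {sorted(header)}")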

Add SpatialData as output

Description of feature

Scverse very recently released SpatialData, their new data structure for spatial data that builds upon the OME-NGFF standard.

The main advantages are that it is more general, copes better with large images and, being effectively an OME-NGFF Zarr file, is interoperable with a lot of tools.

I suggest using this as the output instead of (or in addition to) the Squidpy AnnData (Squidpy can work with SpatialData objects directly).
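A rough illustration of what producing a SpatialData output could look like, assuming the Visium reader in spatialdata-io and its default arguments (reader name and paths should be verified against the spatialdata-io documentation):

from spatialdata_io import visium  # reader name assumed; verify against spatialdata-io docs

# Read a Space Ranger output directory into a SpatialData object
sdata = visium("spaceranger/outs")

# Write it out as an OME-NGFF-based Zarr store
sdata.write("sample.zarr")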

Add Docker containers

The workflow currently uses Singularity definition files, which should be converted to Dockerfiles and hosted properly on Docker Hub. The Singularity images do not pin specific versions for the vast majority of packages; this needs to be added. Existing containers should be reused for processes where applicable.

Update spaceranger to 2.x

Description of feature

One feature that's crucial for us is that spaceranger supports CytAssist. This was added in v2.0.

As an aside, it would be great if the corresponding Dockerfiles were stored in this repo (or in nf-core/modules, as appropriate).

Add reporting of pipeline results

There is currently no proper reporting or collection of pipeline results; this should be added. R Markdown, Jupyter notebooks, MultiQC modules or some custom HTML are all alternatives that should be discussed.

Module compilation error - cause: unable to resolve class JsonSlurper

Description of the bug

Hello,

I am getting this module compilation error when I try to run the test profile. I have Groovy 4.0.5 installed (Nextflow version 22.04.5). Any advice would be much appreciated.

Command used and terminal output

nextflow spatialtranscriptomics/main.nf -profile test,singularity --input /PATH/TO/samplesheet.csv

Module compilation error
- file : /Users/xxxx/Documents/SpatialTranscriptomics/spatialtranscriptomics/./workflows/../subworkflows/local/../../modules/local/tasks.nf
- cause: unable to resolve class JsonSlurper
 @ line 201, column 19.
       sample_info = new JsonSlurper().parse(new File(fileName))
                     ^

Module compilation error
- file : /Users/xxxxx/Documents/SpatialTranscriptomics/spatialtranscriptomics/./workflows/../subworkflows/local/../../modules/local/tasks.nf
- cause: unable to resolve class JsonSlurper
 @ line 340, column 19.
       sample_info = new JsonSlurper().parse(new File(fileName))

Relevant files


System information

- Nextflow version: 22.04.5
- Hardware: HPC and MacBook Pro (M1)
- Executor: local
- Container engine: Singularity
- OS: CentOS or macOS
- Version of nf-core/spatialtranscriptomics: dev

Harmonisation between Python / R

We would like to stick to Python for the analysis steps, except where there are R-specific tools that have no equivalent in Python (or where something is clearly better suited to R). There are some steps in the current workflow that might be moved from an R implementation to Python, such as normalisation (a Scanpy-based sketch follows the list below). Which other steps could be moved to Python, and how should they be implemented?

  • Normalisation: can be done in Scanpy
  • Deconvolution: can be done in Scanpy
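As a sketch of the normalisation point above, the standard Scanpy equivalent would be library-size normalisation followed by a log1p transform; the file names here are illustrative, not the pipeline's actual intermediates:

import scanpy as sc

# Load the filtered AnnData object from the QC step (name illustrative)
adata = sc.read_h5ad("st_adata_filtered.h5ad")

# Library-size normalisation followed by log1p transformation
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

adata.write_h5ad("st_adata_norm.h5ad")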

Add proper workflow/process inputs and outputs

Currently, scripts are run from processes within (sub)workflows, but there is no proper tracking of inputs or outputs; this is instead done via dummy files. The scripts themselves use various hard-coded paths that happen to work with each other, which should be changed. Each workflow and process needs properly defined inputs, outputs and connections to the others, none of which currently exist.

Add checks for QC filtering

Description of feature

Currently, the filtering of e.g. minimum counts or genes in the quality control report has no related checks to see whether there are any spots left after the filtering. The filtering itself will work just fine, but will produce non-informative errors downstream, which should be improved. The check should output an informative message about (1) which filtering step raised the problem, (2) what the relevant parameter was set to (either default or user-specified) and (3) that the user should re-run with a lowered threshold.

I'm not sure whether this type of check should stop the pipeline or not. Such failures currently do stop the pipeline, but we could also re-work the QC report in some fashion so that it still outputs something - this could be useful functionality with regard to using e.g. checkatlas or MultiQC for aggregating multiple reports, as discussed in #40.
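A minimal sketch of such a check, assuming a Scanpy-based filtering step and a hypothetical min_counts parameter, that reports which filter failed, what it was set to and what the user should do:

import scanpy as sc

def filter_spots_with_check(adata, min_counts):
    # Filter spots on total counts and fail with an informative message
    # if no spots survive (parameter name and threshold are illustrative).
    n_before = adata.n_obs
    sc.pp.filter_cells(adata, min_counts=min_counts)
    if adata.n_obs == 0:
        raise SystemExit(
            f"ERROR: the 'min_counts' filter (set to {min_counts}) removed all "
            f"{n_before} spots; please re-run the pipeline with a lower threshold."
        )
    return adata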

It should be possible to provide manual image alignments on a per-sample basis

Description of the bug

Currently, I can only specify a single "manual alignment" file for the entire workflow. However, these JSON files are inherently associated with a specific sample. It should be possible to provide them on a per-sample basis via the samplesheet.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Implement tests using nf-test

Description of feature

It would be great to have the entire workflow tested: not only that it runs through, but also that the outputs are checked for plausibility.

Currently the Space Ranger part is not tested at all - I guess part of the challenge is to generate appropriately subsampled test data.

Rename the pipeline

Description of feature

Given the addition of other spatial nf-core pipelines, it was decided that we should rename this pipeline to something more descriptive, since it will only be dealing with Visium data. While "spatial transcriptomics" is still used interchangeably with "Visium", it would be clearer if the pipeline name reflected the fact that it handles only Visium data.

Some proposals from Slack so far:

  • visiumflow
  • nextvisium
  • visiumcore
  • visiumpipe
  • spatialseq
  • spatialflow
  • visium

Add documentation for all workflow parts

There is some documentation for the pipeline as a whole, but this should be extended with more details and explanations. The parameter documentation also needs to be extended.
