
vitessce / vitessce-data


Utils for loading HuBMAP data formats

License: MIT License

Languages: Python 84.59%, Shell 15.41%
Topics: gehlenborglab, kharchenkolab, single-cell, imaging, omics, spatial, hubmap, vitessce, hidivelab

vitessce-data's Introduction

Vitessce logo

Visual Integration Tool for Exploration of Spatial Single-Cell Experiments

Screenshot of Vitessce with Linnarsson data; the same data, zoomed in to cellular scale

Why Vitessce

Interactive

Vitessce consists of reusable interactive views including a scatterplot, spatial+imaging plot, genome browser tracks, statistical plots, and control views, built on web technologies such as WebGL.

Integrative

Vitessce enables visual analysis of multi-modal assay types which probe biological systems through techniques such as microscopy, genomics, and transcriptomics.

Serverless

Visualize large datasets stored in static cloud object stores such as AWS S3. No need to manage or pay for expensive compute infrastructure for visualization purposes.

Usage

Vitessce can be used in React projects by installing the package from NPM:

npm install vitessce

For more details, please visit the documentation.

Development

First install PNPM v8. We develop and test against NodeJS v18.6.0 and NPM 8.13.2.

Note: NodeJS may require the max_old_space_size value to be increased:

. ./scripts/set-node-options.sh

Check out the project, cd into it, and then:

pnpm install
pnpm run build
pnpm run start-demo

The development server will refresh the browser as you edit the code.

Further details for internal developers can be found within dev-docs.

VSCode note: We are currently using a nightly version of TypeScript, which supports @import statements in JSDoc comments. To use VSCode features like jump-to-implementation with this syntax, install the TypeScript Nightly extension.

Changesets

We use changesets to manage the changelog. Therefore, when making code changes, do not edit CHANGELOG.md directly. Instead, run pnpm changeset, follow the prompts, and commit the resulting markdown files along with the code changes.

Branches

Please use one of the following naming conventions for new branches:

  • {github-username}/{feature-name}
  • {github-username}/fix-{issue-num}

Pull requests

We use squash merging for pull requests.

Monorepo organization

Meta-updater script

To preview (dry run) and then apply the meta-updater changes:

pnpm run meta-dryrun
pnpm run meta-update

Testing

The end-to-end tests depend on a demo build:

pnpm run build-demo

  • To run all tests, both unit and end-to-end: ./scripts/test.sh
  • To run only the unit tests: pnpm run test

Linting

pnpm run lint

To allow the linter to perform automated fixes during linting: pnpm run lint-fix

Troubleshooting

The following commands can be helpful in case the local environment gets into a broken state:

  • pnpm install
  • pnpm run clean: removes build/bundle directories and all tsconfig.tsbuildinfo files (used by TypeScript's Project References).
    • pnpm run build: need to re-build subpackages after this type of cleanup.
  • pnpm run clean-deps: removes all node_modules directories, including those nested inside subpackages.
    • pnpm install: need to re-install dependencies after this type of cleanup.

Deployment

Before running any of the deployment scripts, confirm that you have installed the AWS CLI and are in the appropriate AWS account:

$ aws iam list-account-aliases --query 'AccountAliases[0]'
"gehlenborglab"

Staging

To build the current branch and push the "minimal" demo and docs sites to S3, run this script:

./scripts/push-demos.sh

This will build the demo and docs, push both to S3, and finally open the docs deployment in your browser.

Publish staged development site

After doing a manual test of the deployment of the dev site, if it looks good, copy it to dev.vitessce.io:

./scripts/copy-dev.sh https://{url returned by scripts/deploy-release.sh or scripts/push-demos.sh}

Note: if you need to obtain this URL later, look for the script output line of the form:

Copy dev to https://s3.amazonaws.com/vitessce-data/demos/$DATE/$HASH/index.html

Publish staged docs to vitessce.io

After doing a manual test of the deployment of the docs, if it looks good, copy it to vitessce.io:

./scripts/copy-docs.sh https://{url returned by scripts/deploy-release.sh or scripts/push-demos.sh}

Note: if you need to obtain this URL later, look for the script output line of the form:

Copy docs to https://s3.amazonaws.com/vitessce-data/docs-root/$DATE/$HASH/index.html

Release

Releasing refers to publishing all sub-packages to NPM and creating a corresponding GitHub release.

Note: releasing does not currently result in automatic deployment of the documentation or development sites (see the Deployment section above).

From GitHub Actions

When there are changesets on the main branch, the changesets/action bot will run ./scripts/changeset-version.sh --action and make a pull request titled "Create release".

  • This pull request remains open until ready to make a release. The bot will update the pull request as new changesets are added to main.

Once this "Create release" pull request is merged, the next time release.yml is executed on GitHub Actions, the following will occur:

  • changesets/action will run ./scripts/changeset-publish.sh --action, which:
    • publishes to NPM
    • creates a new git tag for the release
  • softprops/action-gh-release will generate a GitHub release based on the git tag, using the latest changelog entries for the release notes.

From local machine

pnpm run build
pnpm run bundle
pnpm run build-json-schema

./scripts/changeset-version.sh
./scripts/changeset-publish.sh # runs pnpm publish internally

Version bumps

In this project we try to follow semantic versioning. The following are examples of things that would require a major, minor, or patch type of bump.

Patch version bumps

Bug fixes, minor feature improvements, additional view types, additional coordination types, and additional file type implementations are possible in a patch version bump.

When a coordination type is added, it must be reflected by a new view config JSON schema with an incremented version property, and a new view config upgrade function to enable previous view config versions to remain compatible. The default schema version parameter of the VitessceConfig constructor may also change to reflect the new schema version.

Minor version bumps

Changes to the props or function signature of an exported helper function or React component for plugin views. Major feature improvements or additions.

Major version bumps

Changes to exported constant values, such as view types and coordination types, such that previous code using these values may no longer run successfully. Changes to the React props of the main <Vitessce /> component. Major behavior changes or interface updates. Changes to the directory structure or filenames in the dist/ directory that could break existing import statements.

Related repositories

  • Viv: A library for multiscale visualization of high-resolution multiplexed tissue data on the web.
  • HiGlass: A library for multiscale visualization of genomic data on the web.
  • vitessce-python: Python API and Jupyter widget.
  • vitessce-r: R API and R htmlwidget.
  • vitessce-data: Scripts to generate sample data.

Old presentations

Citation

To cite Vitessce in your work, please use:

@article{keller2021vitessce,
  title = {{Vitessce: a framework for integrative visualization of multi-modal and spatially-resolved single-cell data}},
  author = {Keller, Mark S. and Gold, Ilan and McCallum, Chuck and Manz, Trevor and Kharchenko, Peter V. and Gehlenborg, Nils},
  journal = {OSF Preprints},
  year = {2021},
  month = oct,
  doi = {10.31219/osf.io/y8thv}
}

If you use the image rendering functionality, please additionally cite Viv:

@article{manz2022viv,
  title = {{Viv: multiscale visualization of high-resolution multiplexed bioimaging data on the web}},
  author = {Manz, Trevor and Gold, Ilan and Patterson, Nathan Heath and McCallum, Chuck and Keller, Mark S. and Herr, II, Bruce W. and Börner, Katy and Spraggins, Jeffrey M. and Gehlenborg, Nils},
  journal = {Nature Methods},
  year = {2022},
  month = may,
  doi = {10.1038/s41592-022-01482-7}
}

vitessce-data's People

Contributors

evanbiederstedt, ilan-gold, keller-mark, manzt, mccalluc, thomaslchan


vitessce-data's Issues

Run black on ./python dir

I don't believe we have a code formatter specified for the python scripts like we do for most javascript stuff we work on. It would be my preference to run black once on ./python and require it for future changes:

black --line-length 79 ./python # to comply with flake8

Perhaps add to our test script?:

# ./test.sh
start black
black --line-length 79 --check ./python
end black

Currently adding this would yield:

vitessce-data ❯ ./test.sh
travis_fold:start:black
black
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/create_hdf5_fixtures.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/cao_tsv_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/counts_hdf5_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/cytokit_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/cluster.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/dries_json_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/wang_csv_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/loom_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/imzml_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/cell_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/img_hdf5_reader.py
would reformat /Users/trevormanz/GitHub/hubmap/vitessce-data/python/delaunay.py
Oh no! 💥 💔 💥
12 files would be reformatted, 1 file would be left unchanged.

when running ./test.sh

Translate into CWL

To prepare for running HuBMAP pre-processing on Airflow, translate these into CWL.

Missing .zgroups in Spraggins preview?

Loading https://portal.stage.hubmapconsortium.org/preview/multimodal-molecular-imaging-data, I get a number of 404s for .zgroup files. The UI functionality seems fine, but is there a way we can get rid of the 404s? (Either put stub files at those URLs, or understand why they are being requested, and change what we're doing on the client side?) cc @ilan-gold.

https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.ims.zarr/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/0/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/1/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/2/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/3/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/4/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/5/.zgroup
https://vitessce-data.storage.googleapis.com/0.0.27/master_release/spraggins/spraggins.mxif.zarr/6/.zgroup
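
If we go with the stub-file option, note that a Zarr v2 .zgroup file is just a tiny JSON document declaring the format version. A minimal sketch of how the stubs could be generated locally before upload (the local directory layout here is hypothetical):

import json
from pathlib import Path

# Hypothetical local mirror of the paths that are 404ing above.
store_root = Path("spraggins")
missing_groups = ["spraggins.ims.zarr"] + [
    f"spraggins.mxif.zarr/{level}" for level in range(7)
]

for group in missing_groups:
    zgroup_path = store_root / group / ".zgroup"
    zgroup_path.parent.mkdir(parents=True, exist_ok=True)
    # A Zarr v2 group marker only declares the format version.
    zgroup_path.write_text(json.dumps({"zarr_format": 2}))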

Impute cell data for mermaid

  • Instead of just finding the centroid, we could generate a hull or octagon around the dots... but we should see if we can get the image segmentation first.
  • For cell factor data, we could count the number of molecules in each cell.

Check with Nils about priorities before investing time here.

Re-add requirements.txt for pip adherents

Thanks for the excellent work!

I'm someone who prefers to use pip, as I find it quicker when using virtualenvs. I see the appeal of conda, but I've never liked using it very much for a variety of reasons.

Would it be possible to re-add the requirements.txt file and add installation instructions in the README? You could give users the option of using conda or pip.

Naturally, if this is acceptable, I'm happy to make a PR myself.

Thanks, Evan

Use bioformats2raw to convert zarr images to ome-zarr

Now that we can support the outputs of bioformats2raw, we should use that tool to generate our zarr-based image pyramids. However, bioformats2raw does not create the omero metadata which we'd like to use to specify how to render our images. They've put a lot of thought into what this information should look like, and I think we should replicate it for our purposes. We can include this both in the zarr source itself and in the image metadata in raster.json.

https://github.com/ome/omero-ms-zarr/blob/master/spec.md

"id": 1,                              # ID in OMERO
"name": "example.tif",                # Name as shown in the UI
"channels": [                         # Array matching the c dimension size
    {
        "active": true,
        "coefficient": 1,
        "color": "0000FF",
        "family": "linear",
        "inverted": false,
        "label": "LaminB1",
        "window": {
            "end": 1500,
            "max": 65535,
            "min": 0,
            "start": 0
        }
    }
],
"rdefs": {
    "defaultT": 0,                    # First timepoint to show the user
    "defaultZ": 118,                  # First Z section to show the user
    "model": "color"                  # "color" or "greyscale"
}

Visual bug in zarr image

I noticed this while working on the CODEX stuff. Not sure what the move is here. One option would be to fix the zarr for the release. Another would be to use TIFF. A final option would be to leave it as-is. This is probably Nils' call. It only appears at one zoom level.

I checked using a colormap to make sure it wasn't some weird deckgl wrapping thing, and it looks like that is the actual image data. Every time we have run into this, it has been an unintended result of padding, so based on that past experience, I'm putting this here as opposed to Viv.

Screen Shot 2020-05-01 at 10 58 19 AM

Screen Shot 2020-05-01 at 10 58 55 AM

CC: @manzt

Add a demo dataset with thousands of genes

If we have a processed dataset containing >5000 genes, or at least an amount comparable to what we expect to see from HuBMAP scRNA-seq experiments, we can build a demo for vitessce.io to easily test the scalability of our gene-related components in vitessce before they reach the portal-ui repo.

Reduce size of Linnarsson fake-data

From Chuck:

Consider making it smaller: start with a smaller test input.... if there is a difference in the output, you want the diff to be readable.

This will require updating the .loom file for Linnarsson and all of the fake-data JSON expected-output files.

Refactor for Packaging

Overview

Right now we have code all over the place for creating Vitessce data/configs:

https://github.com/hubmapconsortium/portal-containers
https://github.com/hubmapconsortium/vitessce-data
https://github.com/hubmapconsortium/portal-ui/blob/master/context/app/api/vitessce.py

This is problematic, as it makes launching new Vitessce configs difficult and hard to communicate to people not familiar with our code. This problem is only going to expand, and as we gain users (probably other data portals), it would be good to have not only schemas for validating the data, but also a way of reliably generating the data.

The overarching goal here is to take in a Pandas DataFrame and output compliant Arrow (in the future), Zarr, OME-TIFF, and JSON data for Vitessce. A secondary goal could be to also create Vitessce configurations based on what data has been generated - basically pre-defined view configurations based on certain standard inputs (i.e., a genes/clusters + raster + cells/cell-sets combination without a scatterplot gives what we have for CODEX, and with a scatterplot gives Linnarsson minus one of the scatterplots).

I'll organize this issue by data type.

Genes/Clusters (Heatmap)

Our genes and clusters schemas convey very similar information, i.e., data per observation and a max for rendering. We should think about merging these, if possible, since if we can show one, we can show the other:

https://github.com/hubmapconsortium/portal-containers/blob/fb1910324fc796ff4b7d4e643de27ff2861e7d8c/containers/sprm-to-json/context/main.py#L125-L160

https://github.com/hubmapconsortium/vitessce-data/blob/master/python/cluster.py

https://github.com/hubmapconsortium/vitessce-data/blob/master/snakemake/satija/src/convert_h5ad_to_zarr.py

This might require an arrow loader if it's too hard to parse out data properly using only one schema in the client across the two use cases, since they are used differently.

In any case, I think a function that takes in a Pandas DataFrame containing a Cell x Gene matrix and outputs JSON/Arrow should be the goal here. The index of such a DataFrame would be cell names and the column names genes. This will help with Cells/Cell-Sets.

df_genes
            Actin       CD107a        CD11c       CD20          CD21  CD31         CD3e          CD4         CD45        CD45RO         CD68           CD8       DAPI_2         E_CAD   Histone_H3          Ki67  Pan_CK    Podoplanin
Unnamed: 0                                                                                                                                                                                                                            
1             0.0  3825.083089  2172.038856   0.000000  13118.704545   0.0  2619.149560  2258.743646  3018.150782  13766.025415  2475.430352  17811.810362  2472.491447  13831.021750  2155.434995  12023.281769     0.0  12854.526882
2             0.0  3158.566135  1905.015101   6.866331   9662.850531   0.0  2279.843261  2059.656600  2866.507131   9865.706096  2220.703160  10513.558166  1972.618289  10445.596337  1802.067673   8310.784396     0.0   9166.099972
3             0.0  2112.107533  1464.033661   0.935408   8152.397926   0.0  1778.593705  1477.261827  2401.413574   7463.324054  1703.527838   6728.968341  2594.646470   8001.948144  1467.260735   6173.303675     0.0   7050.821325
4             0.0  2409.139601  1568.258547  30.035613  12435.782407   0.0  1835.470442  1643.249288  2789.540598   7843.279558  1962.359687   7357.050570  2328.332977  11190.447293  1503.501068   6625.033120     0.0   8061.569801
5             0.0  1789.038279  1165.606538  23.199695   6595.104505   0.0  1401.826389  1163.010501  1994.819783   5216.277778  1378.526423   4899.289804  1745.914973   6385.073679  1220.704268   4540.830454     0.0   4463.399051
...           ...          ...          ...        ...           ...   ...          ...          ...          ...           ...          ...           ...          ...           ...          ...           ...     ...           ...
2653          0.0  1528.167373  1040.252119  71.731638   9857.117232   0.0  1133.142655  1081.707627  2482.951977   5863.394068  1245.564972   6276.619350  2695.375000   7168.248588  1072.548729   5214.332627     0.0   5677.270480
2654          0.0   866.767553   579.135481   7.370484   3924.449898   0.0   698.100375   555.293286  1207.978357   2482.735515   713.964724   1805.677062  1886.900818   2124.561350   615.980061   1431.171097     0.0   1684.441207
2655          0.0  1534.898357   949.947653   1.008920   6614.136854   0.0  1718.979343  1471.665023  1850.167840   6816.869014  1180.052113   4810.176761  1911.350939   5107.615493   918.007746   4728.398592     0.0   5064.655399
2656          0.0  1643.330193  1080.667150  23.054348   6832.027778   0.0  1456.217874  1124.606763  2271.074879   5281.138406  1362.480193   5671.768116  1566.910870   5627.569565   986.648792   4990.973913     0.0   5253.209420
2657          0.0  2407.073093  2120.567444   2.307910  12124.994703   0.0  4122.323093  3009.756356  3979.926907  14120.478814  2581.693856  12566.961511  2934.979520  11720.578390  1956.343220  11260.825212     0.0  12085.653249

[2657 rows x 18 columns]
>>> generate_cell_by_gene(df_genes)
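
To make the proposal concrete, here is a minimal sketch of what such a function could look like; the output layout (a per-gene max plus per-cell values, loosely mirroring the existing genes schema) is an assumption for illustration, not a settled design:

import json

import pandas as pd


def generate_cell_by_gene(df: pd.DataFrame, out_path: str = "genes.json") -> dict:
    """Convert a Cell x Gene DataFrame (index = cell IDs, columns = genes) to JSON.

    Assumed layout: {gene: {"max": float, "cells": {cell_id: value}}}.
    """
    genes = {
        gene: {
            "max": float(df[gene].max()),
            "cells": {str(cell): float(value) for cell, value in df[gene].items()},
        }
        for gene in df.columns
    }
    with open(out_path, "w") as f:
        json.dump(genes, f)
    return genes

For the future Arrow output, the same DataFrame could presumably be handed to pyarrow.Table.from_pandas.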

Cell-Sets/Cells

@keller-mark knows best (feel free to comment/edit this issue!) but this is a little bit more complicated since the two are intertwined, but not necessary/sufficient in both directions (like the above); that is, one could have "Cells" without "Cell-sets" but not really "Cell-Sets" without "Cells."

Like the above, we want a function that takes in a Pandas DataFrame and outputs JSON/Arrow, but the structure of the DataFrame is a little bit hairier (not just a labeled Cell x Gene matrix where the labels are basically unchecked). I foresee us needing to either strongly define an API or rely on a properly named DataFrame (i.e., each column has a specific name like poly or xy). I think we should probably go the route of an API so we have something like:

>>> df
                                                        Shape  Actin       CD107a        CD11c       CD20          CD21  CD31         CD3e  ...          Ki67  Pan_CK    Podoplanin  Mean  Covariance  Total  Mean All  Shape Vectors
id                                                                                                                                  ...                                                                                      
1           [[0.0, 100.5], [1.0232, 100.5232], [1.7536, 10...    0.0  3825.083089  2172.038856   0.000000  13118.704545   0.0  2619.149560  ...  12023.281769     0.0  12854.526882     4           6      6         2              3
2           [[0.0, 130.5], [1.0798, 130.5798], [1.8667, 13...    0.0  3158.566135  1905.015101   6.866331   9662.850531   0.0  2279.843261  ...   8310.784396     0.0   9166.099972     2           2      3         3              3
3           [[0.0, 647.5], [0.6596, 646.8404], [1.4515, 64...    0.0  2112.107533  1464.033661   0.935408   8152.397926   0.0  1778.593705  ...   6173.303675     0.0   7050.821325     6           2      6         4              1
4           [[0.4782, 736.0218], [0.4782, 736.0218], [0.95...    0.0  2409.139601  1568.258547  30.035613  12435.782407   0.0  1835.470442  ...   6625.033120     0.0   8061.569801     6           2      1         4              2
5           [[0.9636, 890.5], [0.9636, 890.5], [1.6556, 89...    0.0  1789.038279  1165.606538  23.199695   6595.104505   0.0  1401.826389  ...   4540.830454     0.0   4463.399051     3           2      1         1              1
...                                                       ...    ...          ...          ...        ...           ...   ...          ...  ...           ...     ...           ...   ...         ...    ...       ...            ...
2653        [[1005.0357, 298.5], [1005.5179, 298.5], [1005...    0.0  1528.167373  1040.252119  71.731638   9857.117232   0.0  1133.142655  ...   5214.332627     0.0   5677.270480     6           1      2         4              2
2654        [[1006.0, 531.5], [1004.9692, 531.4692], [1004...    0.0   866.767553   579.135481   7.370484   3924.449898   0.0   698.100375  ...   1431.171097     0.0   1684.441207     1           1      2         6              3
2655        [[1005.193, 599.5], [1005.193, 599.5], [1004.5...    0.0  1534.898357   949.947653   1.008920   6614.136854   0.0  1718.979343  ...   4728.398592     0.0   5064.655399     3           2      1         1              3
2656        [[1005.233, 754.5], [1005.233, 754.5], [1004.4...    0.0  1643.330193  1080.667150  23.054348   6832.027778   0.0  1456.217874  ...   4990.973913     0.0   5253.209420     3           2      1         1              3
2657        [[1006.0, 389.5], [1005.4694, 390.0306], [1004...    0.0  2407.073093  2120.567444   2.307910  12124.994703   0.0  4122.323093  ...  11260.825212     0.0  12085.653249     2           4      1         2              3

[2657 rows x 24 columns]

generate_cells(df, poly="Shape", genes=["CD11c", "CD20", ...], factors=["Mean", "Mean All", ...]....) 

where each string argument is a column in the dataframe df to be put into the json portion corresponding roughly to the arg key. The index of this dataframe will be cell ids, just like the above.
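
As a rough sketch of the API route (the column names and the per-cell output keys below are illustrative assumptions, not the actual cells.json schema):

import json

import pandas as pd


def generate_cells(df: pd.DataFrame, poly: str, genes: list, factors: list,
                   out_path: str = "cells.json") -> dict:
    """Convert a per-cell DataFrame (index = cell IDs) into a JSON mapping.

    Each named argument selects the column(s) of `df` that feed the
    corresponding key of each cell's entry.
    """
    cells = {}
    for cell_id, row in df.iterrows():
        cells[str(cell_id)] = {
            "poly": [[float(x), float(y)] for x, y in row[poly]],
            "genes": {g: float(row[g]) for g in genes},
            "factors": {f: str(row[f]) for f in factors},
        }
    with open(out_path, "w") as f:
        json.dump(cells, f)
    return cells

# e.g. generate_cells(df, poly="Shape", genes=["CD11c", "CD20"], factors=["Mean", "Mean All"])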

I think Cell_sets is going to be a little harder. Maybe you could add something about this @keller-mark here in terms of what input data could look like.

Raster

This one is tricky as well. We should probably support both tiff and zarr via a flag. We'll need to set up the docker container for bioformats2raw/raw2ometiff as a dependency (which I think can be done via the setup.py file). Beyond that, the other major pain point will be input data. Are we expecting numpy arrays? dask arrays? zarr stores? File paths? Perhaps all 4 can be possible?

generate_raster(ome_tiff="/path/to/my_file.ome.tif", output_tiff=True)
# or
generate_raster(np_array=my_image, output_zarr=True)

@manzt can probably comment on this as well. I imagine most people will input OME-TIFF to bioformats2raw, but I think we can also handle other inputs and use our custom pyramid generator or something python-specific (in contrast to bioformats2raw) that Glencoe writes.
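
A hedged sketch of how the input dispatch could look (the bioformats2raw invocation and the single-resolution zarr fallback are placeholders; a real implementation would need the docker mounts and pyramid generation discussed above):

import subprocess

import numpy as np
import zarr


def generate_raster(ome_tiff: str = None, np_array: np.ndarray = None,
                    out_path: str = "raster_out", output_zarr: bool = True):
    """Dispatch on the input type: file path vs. in-memory array."""
    if ome_tiff is not None:
        # Placeholder: in practice, run the bioformats2raw container with mounts.
        subprocess.run(["bioformats2raw", ome_tiff, out_path], check=True)
    elif np_array is not None and output_zarr:
        # Single-resolution zarr store; pyramid levels would be added here.
        zarr.save(out_path, np_array)
    else:
        raise ValueError("Provide either ome_tiff or np_array")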

Molecules

I think this will be relatively straightforward, like the genes data - an input DataFrame with the index being molecule names, plugged into an API, is what we will use:

>>> df
             x_um         y_um
gene                          
Gad2  1278.683956  6020.642260
Gad2  1326.970330  6023.884788
Gad2  1292.026844  6059.337093
Gad2  1300.886241  6097.786264
Gad2  1232.410068  6102.884182
...           ...          ...
Mup5  3161.427603  5192.594981
Mup5  3099.698528  5221.596008
Mup5  3084.582240  5297.234605
Mup5  3054.192051  5342.142346
Mup5  3058.963217  5348.150185

[3841412 rows x 2 columns]

>>> generate_molecules(df, x="x_um", y="y_um")
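
A minimal sketch, assuming the output simply groups [x, y] pairs under each molecule name (the exact molecules JSON layout should be checked against the schema):

import json

import pandas as pd


def generate_molecules(df: pd.DataFrame, x: str, y: str,
                       out_path: str = "molecules.json") -> dict:
    """Group per-molecule coordinates by the DataFrame index (molecule names)."""
    molecules = {
        str(name): group[[x, y]].to_numpy().tolist()
        for name, group in df.groupby(level=0)
    }
    with open(out_path, "w") as f:
        json.dump(molecules, f)
    return molecules

# e.g. generate_molecules(df, x="x_um", y="y_um")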

Raster data model proposal

Here are some preliminary thoughts on defining a more complete raster.json. From vitessce/GLOSSARY.md:

A raster source will cause SourcePublisher to broadcast raster.add events which are subscribed to by Spatial components, and linked to particular layers.

The raster.add event also will subscribe a Channel component (for selecting colors, sliders, selections), which uses an id to publish changes in these events to particular layers in the Spatial component.

Therefore, if multiple channel selection components are desired (i.e. viewing both ims and mxif data) these images should be described in separate raster.json schemas. However, if a single channel/color selector is desired, then we want to be flexible and allow for raster.json to describe multiple "images" which will not have different controllers. The following are several examples of a schema I envision:

Example 1 -- Use case: single image url, single color/selector components

{
  "id": "ims",
  "images": [
    {
      "url": "https://vitessce-data.storage.googleapis.com/0.0.24/master_release/spraggins/spraggins.ims.zarr",
      "type": "zarr",
      "name": "spraggins ims",
      "dimensions": [
        {
          "name": "mz",
          "type": "ordinal",
          "values": ["675.5366", "703.5722", "721.4766"]
        },
        { "name": "y", "type": "quantitative" },
        { "name": "x", "type": "quantitative" }
      ],
      "transform": {
        "scale": 20.0,
        "translate": [601, 951]
      }
    }
  ]
}

Example 2 -- Use case: multiple image urls, single color/selector component + component to trigger raster.add event

{
  "id": "ims-mxif",
  "images": [
    {
      "id": "ims",
      "url": "https://vitessce-data.storage.googleapis.com/0.0.24/master_release/spraggins/spraggins.ims.zarr",
      "type": "zarr",
      "name": "spraggins ims",
      "dimensions": [
        {
          "name": "mz",
          "type": "ordinal",
          "values": ["675.5366", "703.5722", "721.4766"]
        },
        { "name": "y", "type": "quantitative" },
        { "name": "x", "type": "quantitative" }
      ],
      "transform": {
        "scale": 20.0,
        "translate": [601, 951]
      }
    },
    {
      "url": "https://vitessce-data.storage.googleapis.com/0.0.24/master_release/spraggins/mxif_pyramid",
      "type": "tiff",
      "name": "spraggins mxif",
      "dimensions": [
        {
          "name": "channel",
          "type": "nominal",
          "values": [
            "Cy3 - Synaptopodin (glomerular)",
            "Cy5 - THP (thick limb)",
            "DAPI - Hoescht (nuclei)",
            "FITC - Laminin (basement membrane)"
          ]
        },
        { "name": "y", "type": "quantitative" },
        { "name": "x", "type": "quantitative" }
      ],
      "isPyramid": true
    }
  ]
}

Example 3-- Use case: multiple image urls, single color/selector component, shared attributes in source

Here rather than repeating shared information, we can set dimensions and type globally for the "images".

{
  "id": "codex",
  "dimensions": [
    { "name": "channel", "type": "nominal", "values": [...] },
    { "name": "z", "type": "ordinal", "values": [...] },
    { "name": "y", "type": "quantitative" },
    { "name": "x", "type": "quantitative" }
  ],
  "type": "tiff",
  "images": [
      {"name": "x0 y0", "url": "..."},
      {"name": "x0 y1", "url": "..."},
      {"name": "x0 y2", "url": "..."},
      {"name": "x0 y3", "url": "..."},
      {"name": "x0 y4", "url": "..."}
  ]
}

Example 4 -- Use case: multiple image urls, multiple color/selector components, no selector to trigger the raster.add event

Both schemas are valid: the former defines the "image"-specific information within the images key, but since there is just one image in this source, we can also define it using a flattened structure, as in mxif.raster.json.

ims.raster.json

{
  "id": "ims",
  "images": [
    {
      "url": "https://vitessce-data.storage.googleapis.com/0.0.24/master_release/spraggins/spraggins.ims.zarr",
      "type": "zarr",
      "name": "spraggins ims",
      "dimensions": [
        {
          "name": "mz",
          "type": "ordinal",
          "values": ["675.5366", "703.5722", "721.4766"]
        },
        { "name": "y", "type": "quantitative" },
        { "name": "x", "type": "quantitative" }
      ],
      "transform": {
        "scale": 20.0,
        "translate": [601, 951]
      }
    }
  ]
}

mxif.raster.json

{
  "id": "mxif",
  "url": "https://vitessce-data.storage.googleapis.com/0.0.24/master_release/spraggins/mxif_pyramid",
  "type": "tiff",
  "name": "spraggins mxif",
  "dimensions": [
    {
      "name": "channel",
      "type": "nominal",
      "values": [
        "Cy3 - Synaptopodin (glomerular)",
        "Cy5 - THP (thick limb)",
        "DAPI - Hoescht (nuclei)",
        "FITC - Laminin (basement membrane)"
      ]
    },
    { "name": "y", "type": "quantitative" },
    { "name": "x", "type": "quantitative" }
  ],
  "isPyramid": true
}

Instead of checking in bash, python should skip if file exists

With cell_reader.py producing so many outputs now, we need to be smarter about only regenerating what is needed. Instead of having the file-existence check in bash, it should be in Python, and there should be a separate check for each output file.
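
A minimal sketch of the per-output check being proposed (the helper name and call pattern are illustrative only):

import os


def write_if_missing(out_path, write_fn):
    """Skip regeneration when the output file already exists."""
    if os.path.exists(out_path):
        print(f"Skipping {out_path}: already exists")
        return False
    write_fn(out_path)
    return True

# e.g. write_if_missing("cells.json", write_cells_json)  # write_cells_json is a hypothetical writer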

Colors for the raster data

I think certain channels have certain colors traditionally associated with them so setting this sort of information would be nice. Like, DAPI is usually blue, from what I understand.

pip install doesn't work on python 3.8

I made a fresh environment and was curious whether I'd have any problems with 3.8. I got this error:

$ pip install -r requirements.txt
Collecting loompy==2.0.16
  Downloading loompy-2.0.16.tar.gz (31 kB)
Collecting pypng==0.0.19
  Downloading pypng-0.0.19.tar.gz (293 kB)
     |████████████████████████████████| 293 kB 5.3 MB/s
Collecting scikit-learn==0.20.3
  Downloading scikit-learn-0.20.3.tar.gz (11.8 MB)
     |████████████████████████████████| 11.8 MB 6.9 MB/s
Collecting iiif==1.0.6
  Downloading iiif-1.0.6.tar.gz (546 kB)
     |████████████████████████████████| 546 kB 21.4 MB/s
Collecting pandas==0.24.1
  Downloading pandas-0.24.1.tar.gz (11.8 MB)
     |████████████████████████████████| 11.8 MB 11.7 MB/s
    ERROR: Command errored out with exit status 1:
     command: /opt/anaconda3/envs/v-data/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/setup.py'"'"'; __file__='"'"'/private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/pip-egg-info
         cwd: /private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/
    Complete output (18 lines):
    Traceback (most recent call last):
      File "/opt/anaconda3/envs/v-data/lib/python3.8/site-packages/pkg_resources/__init__.py", line 360, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/private/var/folders/2f/yvyq4r852yxg3xf902p52w5h0000gn/T/pip-install-bjzh85dh/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/opt/anaconda3/envs/v-data/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1145, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/opt/anaconda3/envs/v-data/lib/python3.8/site-packages/pkg_resources/__init__.py", line 362, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Each dataset should get own script

The processing for the Dries-Yuan data ended up in linnarsson-osmfish.sh. Each dataset should get its own script, with one master script to run them all.

Create tiles from hdf5 image data

Copied from Slack:

pete [12:29 PM]

Check out clodius/tiles/mrmatrix.py
also check out this file: https://github.com/higlass/clodius/blob/pkerpedjiev/tsv-to-mrmatrix/scripts/tsv-to-mrmatrix.py
it’s brand new but I used it on a 87K x 87K matrix

pete [12:33 PM]

it loads a dense matrix as a tsv file into an hdf5 file and then creates a data pyramid

pete [1 month ago]

you should be able to test it out using higlass-python

pete [1 month ago]

import h5py
import higlass
import higlass.client as hgc
import higlass.tilesets as hgti

f = h5py.File('../blah.h5', 'r')

ts1 = hgti.mmatrix('../blah.h5')

tr1 = hgc.Track('heatmap', 
                position='center',
                height=400,
                tileset=ts1)

view1 = hgc.View([tr1])
(display, server, viewconf) = higlass.display([view1])

pete [1 month ago]

https://github.com/higlass/higlass-python

pete [12:33 PM]

in particular take a look at the coarsen function which takes an hdf5 array and creates the downsampled layers
the 120 lines of code should be pretty straightforward. Let me know if you have any issues.
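
For reference, a minimal sketch of the kind of coarsening Pete describes: repeated 2x downsampling of a dense 2D array by summing 2x2 blocks (an illustration only, not the clodius implementation):

import numpy as np


def coarsen(matrix: np.ndarray, num_levels: int) -> list:
    """Build a simple data pyramid by summing 2x2 blocks at each level."""
    levels = [matrix]
    current = matrix
    for _ in range(num_levels - 1):
        h, w = current.shape
        # Trim to even dimensions, then sum each 2x2 block.
        current = current[: h - h % 2, : w - w % 2]
        current = current.reshape(
            current.shape[0] // 2, 2, current.shape[1] // 2, 2
        ).sum(axis=(1, 3))
        levels.append(current)
    return levels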

Color output

test.sh has a lot of output; some color would help identify sections.

Check that bucket target == branch?

Nothing enforces that the bucket target is reset when we begin new work... but we could do that. Perhaps, like the demos, check that there are no uncommitted changes, and probably also prohibit running from master?

Would it be weird to have the script itself set the target, based on the branch name?

Create and use ome-tiff-tiler docker container

There are a lot of tricky details that may be hard to make platform-independent: a docker container would be good for this. It probably has two mounts: the OME-TIFF, and the directory where the tiles should be created. Everything else should be an environment variable. And maybe make a pythonic wrapper, so the end user doesn't need to think in terms of envvars.

Load Ruben's data

Here's what he wrote on slack: (There are files to download there, too.)


Hi Chuck, my apologies again for my late reply, it’s been a very hectic last weeks for me and GC wants me to submit the Giotto pipeline together with Qian’s visualization tool, before working on integrating it with other tools

Nevertheless I’ve shared a processed giotto object with you and an R script with a function that converts the giotto object into a json format and you can select the factors and mappings that you want.

To use the giotto object you probably want to install giotto, see https://github.com/RubD/Giotto

Single-cell spatial analysis pipeline. Contribute to RubD/Giotto development by creating an account on GitHub.

It’s all still under active development, but here you can find some more instructions.
When Giotto is installed you can also access the raw expression matrices and cell centroid coordinates for the cortex and OB data, as they are part of the package

Anyway, let me know if you have any questions. Unfortunately I’ll be on holidays the first 2 weeks of July, but I should have much more time after I’m back.

I noticed that the previous one did not have the spatial locations of the cell centroids
so I’ve added that as well

Try libvips for tiling

Brew install didn't work for me... I think I should upgrade my OS; even then, I'm not sure how much faster it will be.
