funkelab / gunpowder Goto Github PK

View Code? Open in Web Editor NEW

69.0 69.0 53.0 30.73 MB

A library to facilitate machine learning on multi-dimensional images.

Home Page: https://funkelab.github.io/gunpowder/

License: MIT License

Python 99.32% Makefile 0.10% Dockerfile 0.28% Singularity 0.31%

computer-vision gunpowder machine-learning multidimensional pipeline

gunpowder's People

Contributors

Stargazers

Watchers

gunpowder's Issues

Documentation search links are broken

When you search on the GitHub.io documentation, it provides a list of links (as intended) but all of the links result in 404 error.

Artifact in output N5 for all-zero blocks

Some all-zero blocks in N5 volumes created by N5Write erroneously contain part of the last non-zero block in the batch data array (in scan order).

This does not occur for Hdf5Write, so is not in the data array itself, and I've verified it's actually in the N5 and not a bug in BDV/n5-view.

z5py version 1.4.1

It seems most likely to be an artifact from Hdf5LikeWrite/z5py slicing/broadcasting or empty/uniform fill value block optimization. cc @constantinpape

I'll try to create a minimal reproduction, but wanted to create an issue first in case anyone else has encountered this.

ZarrSource doesn't support Stores

Current Gunpowder ask for Zarr path in order to get the data. Which limite the use cases.
E.g:

Remote data
A new Zarr Store

I think it will be more useful to have Zarr Store input instead of Zarr Path https://github.com/funkey/gunpowder/blob/773b5578b01d16d34ef3cc466904ab876bba8bf1/gunpowder/ext/zarr_file.py#L11
And relay on Zarr to provide the data

@funkey

node_dependencies batch_id

The node_dependencies branch contains a crop and merge step for batches at the end of the provide function of a BatchFilter. This leads to a new Batch being created and returned with a new batch_id.
If a pipeline gets long, or is run by many threads in parallel, it seems like the batch_id will be incremented excessively and the over head of retrieving a lock could become significant.

One option to resolve this that seems quite nice is to replace the batch_id with a request_id since that should be more consistent and useful for grouping batches that belong together.

make recipe for running tests and cleaning

I get a test failure due to a directory not existing, when running python2 tests/test_suite.py - not sure if I'm doing something wrong.

======================================================================
ERROR: test_output (cases.tensorflow_train.TestTensorflowTrain)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cbarnes/work/synapse_detection/gunpowder/tests/cases/tensorflow_train.py", line 115, in test_output
    batch = pipeline.request_batch(request)
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/nodes/batch_provider.py", line 139, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/batch_provider_tree.py", line 45, in provide
    return self.output.request_batch(request)
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/nodes/batch_provider.py", line 139, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/nodes/batch_filter.py", line 133, in provide
    self.process(batch, request)
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/nodes/generic_train.py", line 151, in process
    self.train_step(batch, request)
  File "/home/cbarnes/work/synapse_detection/gunpowder/gunpowder/tensorflow/nodes/train.py", line 189, in train_step
    checkpoint_name)
  File "/home/cbarnes/.pyenv/versions/gp2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1675, in save
    raise exc
ValueError: Parent directory of tf_graph_checkpoint_100 doesn't exist, can't save.

Also, tests erroring out leaves a bunch of files scattered in the project root; it would be nice to be able to make clean those away, or even more ideally, for TestCases which generate files create a temp directory where all those files go, to be deleted in a tearDownClass method.

Add keras support

include keras as external module (similar to caffe)
implement keras.Train
implement keras.Process

Make loss scale a gunpowder volume

Currently, the loss scale is computed inside the caffe.Train node. To have better control over it, and to include it in snapshots, error scale should be a new VolumeType LOSS_SCALE.

Update Chunk node for batch volumes

Chunk is still using the deprecated (and non-functional) input and output ROIs of a batch request and provider specs. This needs to be updated to general volume ROIs.

RandomLocation memory issues

I recently started facing memory issues in a simple pipeline. I was able to pin the issue down to RandomLocation, which is strange because I have not observed any issues before. I do have a minimum working example, for which memory usage continuously increases, amassing 4GB within the first minute (when run with the --use-random-location option):

import glob

from gunpowder import ArrayKey, BatchRequest, Coordinate, Hdf5Source, RandomLocation, build, Pad

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--use-random-location', action='store_true')
args = parser.parse_args()

RAW_KEY   = ArrayKey('RAW')

pattern = '/groups/saalfeld/home/hanslovskyp/data/from-arlo/interpolated-combined/sample_*.h5'
fn      = glob.glob(pattern)[0]

shape        = Coordinate((43, 430, 430))
voxel_size   = Coordinate((360.0, 36.0, 36.0))
shape_world  = voxel_size * shape

data_provider = Hdf5Source(fn, datasets={RAW_KEY: 'volumes/raw'})

request = BatchRequest()
request.add(RAW_KEY, shape_world, voxel_size=voxel_size)
train_pipeline = data_provider

if args.use_random_location:
    print('Using RandomLocation')
    train_pipeline += RandomLocation()
else:
    print('Using Pad')
    train_pipeline += Pad(RAW_KEY, None)

print("Starting training...")
with build(train_pipeline) as b:
    for i in range(500000):
        b.request_batch(request)

print("Training finished")

Usage:

# Just pad, no noticable increase in memory use:
python gunpowder_memory_issues_mwe.py
# Random location, memory use increases significantly:
python gunpowder_memory_issues_mwe.py --use-random-location

I tried on my workstation in a fresh conda environment:

conda create -n gunpowder-memory python=3.6
conda activate gunpowder-memory
pip install git+https://github.com/funkey/augment
pip install git+https://github.com/funkey/[email protected]

I will try on my laptop, as well, to confirm.

RandomLocation surprised me as the potential cause for this issue as it seems to have been stable for a long time.

Duplicated loop

Loops beginning on https://github.com/funkey/gunpowder/blob/03d15ee0dcd63e1250f851f108c487fff30a473a/gunpowder/torch/nodes/train.py#L265
and https://github.com/funkey/gunpowder/blob/03d15ee0dcd63e1250f851f108c487fff30a473a/gunpowder/torch/nodes/train.py#L289 are duplicates.

Example of RasterizePoints for sparse GT

Hello friends, Andrew here. Hope all is well on the farm.

I've got some volumes that are sparsely labeled, a few positive voxels in each volume. I would like to represent the labels as a list of points and render the volume on the fly with RasterPoints, but I'm a little hazy on the details.

Suppose I've got an h5 like:

/volumes/raw - (n_channels, x_extent, y_extent, z_extent)
/volumes/labels - (n_labels, n_dim)

Would this be the proper way to define the PointsKey?

raw = gp.ArrayKey('RAW')
labels = gp.PointsKey('GT')

source =
    gp.Hdf5Source(
        'example.hdf',
        {
            raw: 'volumes/raw',
            labels: 'volumes/labels'
        }
    )

And then I use the rasterizer like RasterizePoints(raw, labels)?

Would you expect this approach to outperform pre-rasterizing and using full volumes for GT?

Torch multiprocessing probelms

What do I want? The ability to use a pipeline that looks something like:

pipeline = gunpowder.torch.nodes.Predict(...) + gunpowder.torch.nodes.Scan(...)

What goes wrong:
First error: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Attempted to solve with using multiprocessing.set_start_method("spawn")
Second error: RuntimeError: context has already been set
I think this is the first problem that needs fixing: importing gunpowder should not set the spawn method before a user gets the chance to determine what the spawn method should be. I've tracked this down to the class variable __next_id created in the class Batch in file "gunpowder/batch.py". It seems this line sets the spawn method to "fork" if no method has been set yet.
Additionally the LocalServer in "gunpowder/tensorflow/local_server.py" also has a class variable that leads to the spawn method being set on import.

Further problems:
After removing the two mentioned variables that cause Multiprocessing to set a default spawn method, you can set the spawn method with multiprocessing.set_spawn_method("spawn"). You then get an error: AssertionError: can only join a started process. This seems to be getting thrown in "gunpowder/producer_pool.py" on line 76 at self.__watch_dog.join(). It seems to happen

Does this mean we can't use double underscore methods if we want to be multiprocessing friendly?

Continuing:
After removing all "__{name}" methods and replacing them with "_{name}" in ProducerPool, the precache.py tests still seem to hang forever. This seems to be an issue with logging errors in tests, so I created some simple scripts to do the same thing outside of a testing framework. Now I get the following error:
RuntimeError: Attempted to send CUDA tensor received from another process; this is not currently supported. Consider cloning before sending.
The stack trace here is particularly unhelpful so I am having a lot of difficulty figuring out where this is coming from.

See branch torch-multiprocessing for changes I have made to explore this problem.

TypeError: init() takes at least 2 arguments (1 given)

I have built the gunpowder image using docker first. When I try to run a sample, such as https://github.com/funkey/gunpowder/blob/master/examples/cremi/request_batch.py, I get a bug:
root@28d57bb6feec:/userhome/huangwei/mala/mala/cremi# python request_batch.py
DEBUG:gunpowder.array:Registering array key RAW
DEBUG:gunpowder.array:Registering array key GT_LABELS
DEBUG:gunpowder.array:Registering array key GT_LABELS_2
DEBUG:gunpowder.array:Registering array key GT_LABELS_4
DEBUG:gunpowder.array:Registering array key GT_IGNORE
DEBUG:gunpowder.array:Registering array key GT_AFFINITIES
DEBUG:gunpowder.array:Registering array key GT_BOUNDARY
DEBUG:gunpowder.array:Registering array key GT_BOUNDARY_GRADIENT
DEBUG:gunpowder.array:Registering array key GT_BOUNDARY_DISTANCE
DEBUG:gunpowder.array:Registering array key LOSS_SCALE
DEBUG:gunpowder.array:Registering array key ALPHA_MASK
Traceback (most recent call last):
File "request_batch.py", line 148, in
train()
File "request_batch.py", line 50, in train
for sample in ['sample_A_20160501.hdf','sample_B_20160501.hdf','sample_C_20160501.hdf']
File "request_batch.py", line 50, in
for sample in ['sample_A_20160501.hdf','sample_B_20160501.hdf','sample_C_20160501.hdf']
TypeError: init() takes at least 2 arguments (1 given)
I don't understand it and don't know how to deal with it.
I guess it could go wrong in Normalize() function or RandomLocation() function.
I'd like you to tell me the some solution. Thank you very much.

Add support for points

Similar to volumes, add support to request and augment sets of points. This will be used for annotations like synaptic sites and skeletons.

z5py-based tests don't run on CI

As discussed in #29

z5py has a runtime dependency on some GCC-related libraries which are more recent than those available on conda and travis. More recent systems don't seem to have this issue. It is difficult to trigger this issue in a minimal test case like https://github.com/clbarnes/z5_boost_bug , but may be resolved by future updates to z5py, travis, conda, or gunpowder.

Possible performance bug with predict nodes and shared memory

Turns out that the default value for max_shared_memory
https://github.com/funkey/gunpowder/blob/e523b49ca846a9fd46ab6fc0dd1040cc4a4d53b4/gunpowder/tensorflow/nodes/predict.py#L71
does not allocate 1GB of rather 4GB because the value type ctypes.c_float is used when for creating the RawArray.
https://github.com/funkey/gunpowder/blob/e523b49ca846a9fd46ab6fc0dd1040cc4a4d53b4/gunpowder/tensorflow/nodes/predict.py#L90

With 4GB per each input and output array, this means each predict worker is allocating 8GB of shared memory. With 4 workers, job memory should then be at least 32GB. Now, here is the performance bug. I did not know about the shared memory requirement and have always run my inference pipeline with 4 workers with only 8GB of memory (to minimize my resource usage counting :)). You'd think that gunpower would run out of memory and be killed, but actually it will not! Turns out, the reason for this is because Python multiprocessing package creates a temp file and mmap that whenever RawArray is used: src. So it does not matter how many workers are being run and how much shared memory is being allocating Python will happily chug along although with possible slow downs of memory being swapped in and out to disk.

The most immediate slow down is during initialization when the array is set to zero. I have seen my inference jobs taking more than 15m to initialize all of the RawArrays to disk tempfiles (vs less than 1m if there were enough memory).

The second-order bug would be during runtime and data is paged out from main memory to disk. In my inference jobs, once getting past the initialization, 8GB was actually enough for four workers, but I can imagine scenarios where not enough memory was requested for the jobs and data is paged out on every transfer. I don't know exactly what the mechanism is inside the OS for it to decide when to page something out from a memory mapped file, but we probably should avoid this scenario at all time because it can be an opaque performance bug.

My recommendations are:

At the very least, the max_shared_memory argument should be made more transparent to the user. Maybe something like shared_memory_per_worker_GB and then calculate the appropriate max_shared_memory from that.
The default for max_shared_memory should be decreased substantially. I'm guessing that most production jobs won't be transferring more than a few hundred MBs, so maybe the default should be capped to something like 64MB or 128MB and the more experimental users can increase it accordingly.

Compatibility with TensorFlow v2.x

I saw the gunpowder documentation and it looks like TensorFlow v1, would I need to use tf.compat.v1 in TensorFlow v2 or is Gunpowder compatible with v2.x?

Docs v1.2 search links broken

All links in an example search currently lead to a 404.

Consistency of provider specs

A provider of a limited volume (like an Hdf5Source) should map VolumeType to a Roi in ProviderSpec.volume_rois. If an unlimited volume is provided (like in RandomLocation or Pad with infinite padding), ProviderSpec.volume_rois should map VolumeType to None.

Same goes for point_rois.

Check that this is done for all geometry changing nodes:

Functionality questions for ome-zarr formatted data

Greetings from the Mehta Lab and apologies in advance for the long post!
I am attempting to use gunpowder as a dataloader for (float32) data in the ome-zarr format. I have run into a few issues trying to get functionality to work with some data I have. I have enumerated some questions I have below.

Support for multiple zarr stores in the OME-HCS Zarr format
If I have data stored in ome-zarr format as a series of hierarchical groups (row > col > position > data_arrays), when I create datasets inside of a source node, they need to be specified by inputting the full hierarchy path to the dataset source:

raw = gp.ArrayKey('RAW')
source = gp.ZarrSource(
    filename=zarr_dir,
    datasets={raw: 'Row_0/Col_1/Pos_1/arr_0'},
    array_specs={raw: gp.ArraySpec(interpolatable=True)}
)

Because of this format, we store arrays containing data in different rows that are all part of one 'dataset' in different zarr stores. Is it possible to create a single source that can access multiple zarr stores?

Inconsistent behavior of BatchRequest objects
When applying some augmentations (for example the SimpleAugment node), re-usage of a BatchRequest without redefining the request or pipelines will randomly result in data returned with the wrong indices:

For example, I define a dataset and a pipeline with and without a simple augmentation node:

raw = gp.ArrayKey('RAW')

source = gp.ZarrSource(
    zarr_dir,  # the zarr container
    {raw: 'Row_0/Col_1/Pos_1/arr_0'},  # arr_0 is 3 channels of 3D image stacks, dims: (1, 3, 41, 2048, 2048) 
    {raw: gp.ArraySpec(interpolatable=True)} 
)

simple_augment = gp.SimpleAugment(transpose_only=(-1,-2))

pipelines = [source, source + simple_augment]

Then I define a batch request:

request = gp.BatchRequest()
request[raw] = gp.Roi((0,0,0,0,0), (1,3,1,768,768))

Then I use that request to generate two batches from each pipeline in sequence:

#First loop is fine, second loop has 2nd dimension flipped/transposed
for n in range(2):
  batches = []
  for pipeline in pipelines: #for both augmented and plain pipeline
    with gp.build(pipeline):
      batch = pipeline.request_batch(request) #get batch
      batches.append(batch)

  # visualize the content of the batches
  fig, ax = plt.subplots(len(pipelines), 3, figsize = (14,10))
  for i in range(len(pipelines)):
    for j in range(3):
      ax[i][j].imshow(batches[i][raw].data[0,j,0])
  ax[0][1].set_title('source')
  ax[1][1].set_title('source + aug')    
  plt.show()

The result is the following:
Visualization of batch from loop 1 Visualization of batch from loop 2

I am confused as to why the behavior changes when the data, pipeline, and batch request haven't changed. Is there a reason that the second augmentation batch returns with reversed channels?

Thanks!!

Can't run the cremi example.

Hi,
I tried to run the cremi example from: http://funkey.science/gunpowder/tutorial.html.

I did this in a conda environment and ended up with the following error. It comes when calling the build function. Does anyone know what to do about this?

	ERROR:gunpowder.build:something went wrong during the setup of the pipeline, calling tear down
	Traceback (most recent call last):
	  File "train.py", line 147, in <module>
	    train(200000)
	  File "train.py", line 139, in train
	    with gp.build(pipeline):
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/build.py", line 12, in __enter__
	    self.batch_provider.setup()
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
	    self.__rec_setup(self.output)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
	    self.__rec_setup(upstream_provider)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
	    self.__rec_setup(upstream_provider)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
	    self.__rec_setup(upstream_provider)
	  [Previous line repeated 4 more times]
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
	    provider.setup()
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
	    self.__rec_setup(self.output)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
	    self.__rec_setup(upstream_provider)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
	    self.__rec_setup(upstream_provider)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
	    provider.setup()
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/nodes/hdf5like_source_base.py", line 73, in setup
	    with self._open_file(self.filename) as data_file:
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/gunpowder/nodes/hdf5_source.py", line 39, in _open_file
	    return h5py.File(filename, 'r')
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
	    swmr=swmr)
	  File "/home/sid/anaconda3/envs/tf-py36-gunpowder/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
	    fid = h5f.open(name, flags, fapl=fapl)
	  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
	  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
	  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
	OSError: Unable to open file (truncated file: eof = 149520384, sblock->base_addr = 0, stored_eof = 1476647581)

The conda environment I have installed has python=3.6, scikit-image==0.14.2, numpy==1.16, tensorflow-gpu==1.14.0, Nvidia Driver Version: 410.78 CUDA Version: 10.0.

As this is run on the examples provided by the cremi dataset, sample_A_padded_20160501.hdf, it seems a bit strange. Maybe someone can point out my mistake.

Thank you.

merging batch requests only updates roi

I'm trying to write a node that requires its input to be of a specific datatype, plus a cast node to meet that requirement but have been running into issues.
I think what's happening is that the current process for merging batch requests (for spatial arrays that are updated in place) only updates the roi, not the other attributes in a spec. As an example, in Normalize setting dtype = None has no effect which can lead to errors if downstream nodes request a specific datatype. https://github.com/funkey/gunpowder/blob/b074e502cd90e27cdc14ed54cd8dace3a6b2b530/gunpowder/nodes/normalize.py#L44-L48
Is this the intended behavior and requesting a specific datatype is a bad idea?
I identified this as the relevant part of the code:
https://github.com/funkey/gunpowder/blob/b074e502cd90e27cdc14ed54cd8dace3a6b2b530/gunpowder/batch_request.py#L100-L116

User-defined VolumeTypes

Currently, the volume types are pre-defined, and each new type has to be added to the library. This makes it difficult to add a new volume type on a per-usecase basis.

Even worse, metadata (so far only interpolate) is defined decentralized, it is the volume creator's responsibility to set it correctly.

Volume types and their meta-data should be defined centrally, in a way that users can add their own.

Question: 2D data from 3D provider

Hi,

I have a question: Is there an option to drop a spatial dimension in a pipeline?
Currently I have 3D volume (c, x,y,z) and would like to extract random (x,y) slices.
I figured, that something like:

pipeline = source + gp.RandomLocation() + gp.Squeeze([raw], axis=3)
with build(pipeline):
    batch = pipeline.request_batch(
        BatchRequest({
            raw: ArraySpec(roi=Roi((0, 0), (100, 100))),
       })
    )

Would probably work, but I get the error message:

gunpowder/nodes/random_location.py:236: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = [0:100, 0:100, 0:100] (100, 100, 100), by = (0, 0)

    def shift(self, by):
        '''Shift this ROI.'''
    
>       return Roi(self.__offset + by, self.__shape)

gunpowder/roi.py:258: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = (0, 0, 0), other = (0, 0)

    def __add__(self, other):
    
        assert isinstance(other, tuple), f"can only add Coordinate or tuples to Coordinate. {type(other)} is invalid"
>       assert self.dims() == len(other), "can only add Coordinate of equal dimensions"
E       AssertionError: can only add Coordinate of equal dimensions
E       assert 3 == 2
E         +3
E         -2

gunpowder/coordinate.py:46: AssertionError

So I am stuck here and would appreciate any help....

Eric

rename attribute resolution to voxel_size

Currently, the attribute resolution is used to specify the size of voxels ([usually in nanometer]) of the underlying datasets/volumes. This information is important as some operations are carried out in voxel while other operations are carried out in physical space.

A renaming from resolution to voxel_size would make the meaning of this attribute more clear.

ZarrWrite upstream of Scan with multiple workers leads to empty patches in written array

Below is a minimal example of what I'd like to do. Note that the request size (in voxels) is not a power of 2.
The chunk size for the new array is chosen automatically, in this example it is set to (64, 128, 128) .

~~Presumably due to the leading dimension of the chunk size being smaller than its counterpart in the request~~ Presumably as the chunk size does not divide the request size, the array on disk has some empty patches from conflicting concurrent write operations on the same chunk.

Can we simply expose the chunk size of a new array in ZarrWrite to avoid this?

import datetime
import numpy as np
import gunpowder as gp

timestamp = datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S")

voxel_size = gp.Coordinate((5,) * 3)
input_size = gp.Coordinate((110,) * 3) * voxel_size

raw = gp. ArrayKey('RAW')

request = gp.BatchRequest()
request.add(raw, input_size)


class Source(gp.BatchProvider):
    def __init__(self, voxel_size, array_key):
        self.shape = gp.Coordinate((512,) * 3) * voxel_size
        self.roi = gp.Roi((0, 0, 0), self.shape)
        self.array_key = array_key

        self.dtype = np.uint8
        data = np.random.randint(0, 256, (512,) * 3, dtype=self.dtype)

        self.manual_spec = gp.ArraySpec(
            roi=self.roi,
            voxel_size=voxel_size,
            dtype=data.dtype,
            interpolatable=True,
        )

        self.array = gp.Array(data, self.manual_spec)

    def setup(self):
        self.provides(self.array_key, self.manual_spec)

    def provide(self, request):
        out = gp.Batch()
        out[self.array_key] = self.array.crop(request[self.array_key].roi)
        return out


pipeline = (
    Source(voxel_size, raw)
    + gp.ZarrWrite(
        dataset_names={raw: 'raw'},
        output_filename=f"{timestamp}.zarr",
    )
    + gp.Scan(reference=request, num_workers=5)
)

with gp.build(pipeline) as p:
    p.request_batch(gp.BatchRequest())

SimpleAugment.process should iterate over keys from request

SimpleAugment.process gets array keys from batch, when it would make more sense to use the keys in the request:

for (array_key, array) in batch.arrays.items():

SimpleAugment should only modify the arrays in the request, not any other array that may be attached to the batch.

The most simple way to fix this would be to add a check

if array_key not in request:
   continue

in the appropriate places. I can create a PR but would need to know if you agree on that, first.

gunpowder.Stack only output a single random image

I am trying to use the gunpowder.Stack class to request a stack of random images from the source (saying 5).
The returned image stacks always have one new random image appended on the previous batch request. (saying 4 old images + 1 new image).

To reproduce the issue:

source = gp.ZarrSource(
    pathZarr,
    {
      raw: 'tr/tr000',
      gt: 'gt/gt000'
    },
    {
      raw: gp.ArraySpec(interpolatable=True),
      gt: gp.ArraySpec(interpolatable=False)
    })

pipeline = (
    source
    + gp.RandomLocation()
    + gp.Stack(5)
)
    
request = gp.BatchRequest()
request[raw] = gp.Roi((0, 0), (256, 256))
request[gt] = gp.Roi((0, 0), (256, 256))

with gp.build(pipeline):
    for i in range(3):
        batch = pipeline.request_batch(request)
        imshow(batch[raw].data, batch[gt].data) # custom plot function

The image shows outputs from batch requests:

# custom imshow function
def imshow(*args, n=5, figurezize=(10, 4)):
    num_args = len(args)
    plt.figure(figsize=figurezize)
    for row_idx, image_set in enumerate(args): 
        image_set = np.array(image_set)
        n = min(n, image_set.shape[0])
        for col_idx in range(n):
            ax = plt.subplot(num_args, n, row_idx * n + col_idx + 1)
            plt.imshow(image_set[col_idx])
            plt.axis('off')
    plt.show()

AddAffinities node casts labels to np.int32, why?

AddAffinities node does this:

affinities = seg_to_affgraph(
            batch.arrays[self.labels].data.astype(np.int32),
            self.affinity_neighborhood
).astype(self.dtype)

and in seg_to_affgraph:

aff = np.zeros((nEdge,)+shape,dtype=np.int32)

which gives empty affs if labels has really big uint64 values, which it might. is there a reason why labels is cast such a way? it seems to work alright if i change the type cast to np.uint64

thanks!

Keyboard interrupt does not work

It looks like KeyboardInterrupt exceptions might be getting caught inside a try/except block

Update link to documentation

The linked http://funkey.science/gunpowder (Readme and description of repo) returns a 404.
I was able to guess the new url https://funkelab.github.io/gunpowder/ ;).

Check empty should not use sum != 0

In the tensorflow prediction node if skip_empty==True the check statement seems incorrect:

I am referring to
https://github.com/funkey/gunpowder/blob/bb204c84293b637137b23195b8de18caf5ca1197/gunpowder/tensorflow/nodes/predict.py#L118
where the prediction is not skipped if all
batch[array_key].data.sum() are not equal to zero.

However, the sum of all values can be zero even if there are non zero values in the input image. I suggest we replace it by
if batch[array_key].data.any():
as proposed here:
https://stackoverflow.com/questions/18395725/test-if-numpy-array-contains-only-zeros

DaisyRequestBlocks fails to create a daisy context

For making large-scale predictions, I want to swap a Scan node for the DaisyRequestBlocks node, which as far as I understand should avoid Scan's behaviour of filling up memory with the composite Batch that gets assembled from Scan's requests.

Here's the error I get:

ERROR:daisy.context:DAISY_CONTEXT environment variable not found!
Process Process-2:
Traceback (most recent call last):
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bengallusser/code/src/gunpowder/gunpowder/nodes/daisy_request_blocks.py", line 98, in __get_chunks
    daisy_client = daisy.Client()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/client.py", line 66, in __init__
    self.context = Context.from_env()
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/site-packages/daisy/context.py", line 30, in from_env
    tokens = os.environ['DAISY_CONTEXT'].split(':')
  File "/home/bengallusser/miniconda3/envs/lsd/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'DAISY_CONTEXT'

DaisyRequestBlocks does not pass a context to the daisy client.
Do I have to to specify the DAISY_CONTEXT manually?

I'm using gunpowder=1.1.5 and tried combining with both daisy-0.2.1 available on PyPI and the daisy branch 0.3-dev, I get the same error.

Add rule to build docs

I noticed there's no rule in the makefile for building docs. That sure would be nice.

Looks like the following works from the gunpowder top-level dir: sphinx-build docs/build/ docs/

Do I have that right?

Node that simply maps numpy array to new value

For prototyping it would be very useful to have a node that applies a user-provided mapping to the voxels of an array for a key/multiple keys (the shape of the data may not change). Is there a node with such functionality in gunpowder?

I currently have a MapNumpyArray node that does exactly that:

class MapNumpyArray(BatchFilter):
    def __init__(
            self,
            mapping,
            *keys):
        super(BatchFilter, self).__init__()
        self.keys    = keys
        self.mapping = mapping

    def setup(self):
        pass

    def prepare(self, request):
        pass

    def process(self, batch, request):

        for key in self.keys:
            if not key in request:
                logger.debug('Ignoring key %s: not in request %s', key, request)
                continue

            assert key in batch.arrays, 'Requested key %s not in batch arrays %s' % (key, batch)

            array = batch.arrays[key]
            logger.debug('Array data dtype for key %s before mapping: %s', key, array.data.dtype)
            mapped_data = self.mapping(array.data)
            assert mapped_data.shape == array.data.shape, 'Mapping must not change shape of data: Original shape is {}, mapped shape is {}'.format(array.data.shape, mapped_data.shape)
            array.data = mapped_data
            logger.debug('Array data dtype for key %s after mapping: %s', key, batch.arrays[key].data.dtype)

Could this be useful for the broader gunpowder community? If so, I'll file a PR.

Passing cuda device as an argument to Train, Predict nodes

It would be helpful if we could pass a cuda device explicitly to the train and predict nodes for DGX systems, to be able to spawn multiple models on different cards simultaneously on a single machine.
Currently both scripts default to "cuda:0" on device availability: "cuda:0"

I have added this into a fork of the repo, wanted to discuss if this is something useful to integrate here.

Best,
Samia

Seeded batch request randomness

Have batch requests carry a seed, used to draw random numbers, such that requesting the same batch with the same seed two times results in exactly the same outcome.

This facilitates testing, debugging, and "peeking": Nodes like Reject can have a "peek" at what a part of a batch would look like by requesting a subset of the batch. On acceptance, they can request the rest. This requires that multiple requests are equally randomized.

random seed bug in stack node

The stack node copies the request and calls provide:
https://github.com/funkey/gunpowder/blob/773b5578b01d16d34ef3cc466904ab876bba8bf1/gunpowder/nodes/batch_provider.py#L184-L188

In provide request_batch is called for each element in the batch;
https://github.com/funkey/gunpowder/blob/773b5578b01d16d34ef3cc466904ab876bba8bf1/gunpowder/nodes/stack.py#L28-L31
In request_batch the random seed is updated, once for each element, so at the end it is batch_size steps further:
https://github.com/funkey/gunpowder/blob/773b5578b01d16d34ef3cc466904ab876bba8bf1/gunpowder/nodes/batch_provider.py#L176

So far everything is fine. The problem is in the following iterations. When a new batch is requested, the random seed of the original request is advanced only once. Therefore when it is back at the stack node it has the same state as after the first element of the batch of the previous iteration. Thus the first batch_size-1 elements in the second iteration are the same as the last batch_size-1 elements of the first iteration, and so on, it shifts by one in every iteration.
I hope this description makes somewhat sense.

Unit-test batch providers

pip install does not include augment dependency

See funkey/augment#1

Py3 compatibility

https://pythonclock.org/

I got the tests to pass with a single change in nodes/chunk.py and a couple of coercions to integer in the test code.

If there's good reasons (like dependency issues) for keeping things old school, that's fine (and it'd be good to document that here, and in the README and other docs), but if it's not likely to be huge amounts of work then I can give it a crack.

Points for clarification:

Classic division. I've imported future division in every file which did any division, for consistency - it seems from existing coercions that true division is wanted practically all of the time anyway. The only case where I'm not sure this is true is in Coordinate (division dunder methods) and Roi (get_center).

Rename `BatchRequest.volumes` -> `BatchRequest.volume_rois`

This better reflects the purpose of this member: The request contains ROIs, only the batch contains the volumes. Similar for BatchRequest.points.

Automatic discovery of needed inputs

Whenever a BatchProvider needs A to provide B, it should be sufficient to request B. The provider can extend the request to include A. If the request already contains A, the provider extends A to the ROI needed, and crops back to serve the original request.

This can be facilitated by adding helper methods to BatchRequest and Batch:

BatchRequest.merge(request) will merge request into another request. Existing volume/point ROIs are enlarged, if necessary
Batch.crop(request) will crop each volume/point in batch to the ROIs in request (and throw an exception if this is not possible)

Better handling of nonspatial arrays, rois that are None

Original issue: Batch.get_total_roi() kept throwing an error because it assumes every ArraySpec has a non-None Roi. For nonspatial arrays, this is not true. For spatial arrays, this SHOULD be true, but there are no checks (so when I forgot to mark an array as nonspatial, it gave me a confusing error that NoneType has no attribute get_begin()).

The ArraySpec documentation currently says that 1) roi can be none for BatchProviders that allow requests for arrays everywhere, but 2) will always be set for array specs that are part of an Array. 2) is not true for nonspatial arrays. 1) seems weird, because we have unbounded Rois to handle this case.

I'm not sure if the documentation is outdated, or if it still reflects the code accurately, but it seems like ArraySpecs should always have a roi that is not None, unless they are nonspatial. I'd propose adding this check to ArraySpec, and then handling the nonspatial case in functions like Batch.get_total_roi(), but a deeper dive is needed to make sure this doesn't break anything else like BatchProvider.

Combination of RandomLocation and Reject provide repeated batches from the same source Roi

A standard gunpowder pipeline with RandomLocation and Reject leads to consecutive batches that come from the exact same source Roi, supposedly due to a bug in Reject's random seeding.

Below is a minimal example that showcases this.
Running with logging.setLevel(logging.DEBUG) sheds light at the problem in detail: Due to the fixed random seeding of requests, if n batches with Rois r0, r1 ... rn-1 are rejected, r1,r2, ..., rn-1 will be queried again in next request coming from downstream, leading to n consecutive identical Rois rn that are actually passed downstream by Reject.

I have implemented a quick fix by updating request._random_seed in Reject's loop, which requests batches until a feasible one is found, by putting request._random_seed += random.randint(0, 2**63 - 1)in line 75 in Reject https://github.com/funkey/gunpowder/blob/773b5578b01d16d34ef3cc466904ab876bba8bf1/gunpowder/nodes/reject.py#L75.
However, I do not fully understand how the request seeding is designed to work, so this is probably not the proper fix for this problem. Happy to discuss and then file a PR.

You might need to use gunpowder's master branch to run this without other errors.

import logging

import numpy as np
from skimage.morphology import disk

import gunpowder as gp

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


class Source(gp.BatchProvider):
    """Toy source with a padded 2D disk, raw and labels array."""

    def __init__(self, raw_key, labels_key, voxel_size):

        self.voxel_size = voxel_size
        self.raw_key = raw_key
        self.labels_key = labels_key

        data = disk(64)
        data = np.pad(data, 96, mode='constant')

        roi = gp.Roi((0,) * 2, data.shape)

        # RAW
        noise = np.random.normal(0, 0.5, data.shape)
        raw = data + noise
        raw = raw.clip(0, None)
        raw = raw / raw.max()
        raw = (raw * 255).astype(np.uint8)

        self.raw_spec = gp.ArraySpec(
            roi=roi,
            voxel_size=self.voxel_size,
            dtype=raw.dtype,
            interpolatable=True
        )
        self.raw_array = gp.Array(raw, self.raw_spec)

        # LABELS
        labels = data
        self.labels_spec = gp.ArraySpec(
            roi=roi,
            voxel_size=self.voxel_size,
            dtype=np.uint8,
            interpolatable=False
        )
        self.labels_array = gp.Array(labels, self.labels_spec)

    def setup(self):

        self.provides(self.raw_key, self.raw_spec)
        self.provides(self.labels_key, self.labels_spec)

    def provide(self, request):

        out = gp.Batch()

        if self.raw_key in request:
            out[self.raw_key] = self.raw_array.crop(request[self.raw_key].roi)
        if self.labels_key in request:
            out[self.labels_key] = self.labels_array.crop(
                request[self.labels_key].roi)

        return out


class SaveBlockPosition(gp.BatchFilter):
    """Save offset and shape of a provided 2D array into non-spatial array for
    logging purposes.

    Args:

        raw (:class:`ArrayKey`):

            Data :class:`ArrayKey`, whose coordinates we log.

        log (:class:`ArrayKey`):

            Non-spatial :class:`ArrayKey` to store the coordinates.
    """

    def __init__(self, raw: gp.ArrayKey, log: gp.ArrayKey):

        self.raw = raw
        self.log = log

    def setup(self):

        self.provides(
            self.log,
            gp.ArraySpec(nonspatial=True)
        )

    def prepare(self, request):
        deps = gp.BatchRequest()
        deps[self.raw] = request[self.log].copy()
        return deps

    def process(self, batch, request):

        log = np.zeros((2, 2), dtype=int)
        roi = batch.arrays[self.raw].spec.roi
        log[0] = np.copy(roi.get_offset())
        log[1] = np.copy(roi.get_shape())

        logger.debug(f'Block offset and shape: {log}')

        batch = gp.Batch()
        batch[self.log] = gp.Array(
            log, gp.ArraySpec(nonspatial=True)
        )

        return batch


voxel_size = gp.Coordinate((1,) * 2)
input_size = gp.Coordinate((64,) * 2) * voxel_size

raw = gp.ArrayKey('RAW')
labels = gp.ArrayKey('LABELS')
log = gp.ArrayKey('LOG')

request = gp.BatchRequest()
request.add(raw, input_size)
request.add(labels, input_size)

request[log] = gp.ArraySpec(nonspatial=True)

pipeline = (
    Source(raw, labels, voxel_size)
    + SaveBlockPosition(
        raw=raw,
        log=log
    )

    + gp.RandomLocation()

    + gp.Reject(
        mask=labels,
        min_masked=0.05,
        reject_probability=0.9,
    )
)

with gp.build(pipeline) as p:
    for i in range(100):
        batch = p.request_batch(request)
        logger.info(
            f"Returned ROI with original offset and shape {batch[log].data}")

v1.0 not on PyPI

I can only find release 0.3.1 on PyPI

Did you stop releasing gunpowder on PyPI?

Contrib folder / gunpowder node exchange

Most projects using gunpowder have a few custom gunpowder nodes to enhance the pipeline. While some of these nodes are tailored to a specific project, I've also stumbled upon quite some nodes that are not part of the gunpowder core, but which were immediately applicable and very beneficial for my work.
I would like to create some form of exchange, for example in the form of a contrib folder, where gunpowder users can upload, exchange, discuss and develop new nodes and ideas.
Happy to hear your thoughts.

ROIs in world units

Rois should all be in world units. The following nodes need to be adjusted:

Weird logic for provided spec in RandomProvider

I was searching for places where we might be iterating through dictionaries using items() and deleting elements, since this is not allowed with python3, and I found this puzzling bit of code.

https://github.com/funkey/gunpowder/blob/e6ee429ab7b3c784aaafc041d11dacfbc3a9f8ed/gunpowder/nodes/random_provider.py#L41-L49

I didn't look too closely to see what the right solution is, but deleting a key after first checking to make sure that it is not there seems wrong. If the correct solution does involve deleting elements from a dictionary while iterating over it, we should use list(<dict>.items()).

SimpleAugment transposes axes in the ROI request, instead of transposing the axes in the array

If the global ROI is a subset of an entire dataset and therefore has three different offsets for x, y and z , the current transpose can lead to requests far away from the global ROI.

funkelab / gunpowder Goto Github PK

gunpowder's People

Contributors

Stargazers

Watchers

Forkers

gunpowder's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs