girder / girder_worker

Distributed task execution engine with Girder integration, developed by Kitware

Home Page: http://girder-worker.readthedocs.io/

License: Apache License 2.0

CMake 0.19% Python 91.39% Shell 2.76% Makefile 0.52% Dockerfile 0.45% JavaScript 2.81% Stylus 0.50% Pug 1.40%
data-science data-analytics kitware

girder_worker's Introduction

Girder Worker

A flexible, simple script execution engine that integrates with the Girder data management system to run distributed batch jobs.

The worker supports running tasks in many environments including Python, R, and Docker, and supports fetching and pushing data to and from various data sources, including Girder.

Visit the documentation for more details and API documentation.

We'd love for you to contribute to Girder Worker.

girder_worker's People

Contributors

aashish24, alex-r-bigelow, brianhelba, cdeepakroy, cjh1, curtislisle, danlamanna, dorukozturk, fogleman, jbeezley, jcfr, jeffbaumes, jjnesbitt, jwodder, kotfic, manthey, matthewma7, mgrauer, msmolens, pchoisel, predicative, sebastienbarre, subdavis, uyedaj, waxlamp, xarthisius, zachmullen, zackgalbreath


girder_worker's Issues

Bind volumes in Docker

Provide a set of volumes that can be mounted in all docker containers. This could be in the config file, for instance.

Specifically, this could be used to allow docker containers to access a list of volumes in a standardized way.

One possible format would be a JSON-encoded list:

[docker]
volumes=["/home/ubuntu/files:/opt/files:ro","/home/ubuntu/data:/opt/data:ro"]

I'd recommend defaulting mounted volumes to read-only unless they are explicitly specified as read-write.

Ansible role doesn't set plugins_enabled in config file

To activate plugins, they must be added to the config file line plugins_enabled=.

I've fixed this for my own case by adding the following to the bottom of pip.yml. Two problems with this approach: it changes worker.dist.cfg in place (it should instead alter a worker.local.cfg), and it may not form the plugins_enabled line correctly when multiple plugins are given.

- lineinfile:
     dest: "{{ girder_worker_path }}/girder_worker/worker.dist.cfg"
     regexp: '^plugins_enabled='
     line: 'plugins_enabled={{ girder_worker_plugins }}'

Ability to report progress from girder_worker tasks using named pipes

This issue involves developing a way to enable girder_worker tasks to report progress using named pipes that are currently used to stream read/write inputs/outputs.

It would be nice to abstract away the creation of progress reporting pipes in the task code by developing an API function that can simply be called to report progress.
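To make the idea concrete, here is a minimal sketch of what such a task-facing helper could look like. The function name, the event fields, and the pipe path are all hypothetical, not the real girder_worker API; the demo uses a regular file standing in for the worker-created named pipe.

```python
import json
import os
import tempfile

def report_progress(pipe_path, current, total, message=''):
    """Hypothetical helper: hide the pipe handling behind one call."""
    with open(pipe_path, 'a') as pipe:
        # One JSON event per line, so the worker can parse incrementally.
        pipe.write(json.dumps(
            {'current': current, 'total': total, 'message': message}) + '\n')

# Demo: a plain file stands in for the FIFO the worker would create.
pipe_path = os.path.join(tempfile.mkdtemp(), 'progress_pipe')
report_progress(pipe_path, 3, 10, 'blurring image')
```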

Docker from within docker

When running girder_worker inside of a docker container, running docker tasks can be problematic.

For reference, we have three components we need to refer to: (1) host - the host machine, (2) worker docker - girder_worker running in a docker container, and (3) task docker - the task we want to run in a docker container via the girder_worker docker plugin.

There are two basic approaches to run a docker container from within another docker container.

Approach 1: run the task docker on the host, which can be done by mounting two volumes on the worker docker: -v /usr/bin/docker:/usr/bin/docker -v /var/run/docker.sock:/var/run/docker.sock (assuming host and worker_docker are appropriate flavors of linux).

Approach 2: https://github.com/jpetazzo/dind , where a bash script does a bunch of magic as part of the start up of the task docker.

I don't like Approach 2. We already run a script in the task docker that assumes the task docker is Debian-like and has bash. This doesn't work in Alpine or BusyBox, where there are no groupadd and useradd commands. Approach 2 would further limit what can be run as a task.

In Approach 1, the task docker can run, but the mount points for the temporary directory and scripts directory end up referring to locations on the host when they were intended to refer to locations on the worker docker. A crude workaround would be to copy our script to the temp directory (requiring a change in the docker plugin) and to set tmp_root to a path that is identical on the host and the worker docker and is volume-mounted between the two.

I'm not sure if Approach 2 would work easily, as I haven't actually tried it out.

Any other recommendations on how to accomplish running docker tasks in a docker worker?

Dealing with string inputs

I was interested in having a dynamic text input (let's say the filename where I would like to save the blurred image, in the context of the example given in the girder_worker documentation). I created the outputFileName object and fed it to the execution of the save_image task.
I followed the convention of the input image (the lenna object in the code below) to create outputFileName.
However, I got the error posted below. Can you explain whether this is a bug, or point out the right way to handle it? I read the whole girder_worker documentation but couldn't figure it out.
---------------------python code-----------girderWorkerStandAlone.py--------------

import girder_worker
from girder_worker.specs import Workflow
wf = Workflow()

#create an input object
lenna = {
    'type': 'image',
    'format': 'png',
    'url': 'https://upload.wikimedia.org/wikipedia/en/2/24/Lenna.png'
}

'''
task to save the image
'''
save_image = {
    'inputs': [
        {'name': 'the_image', 'type': 'image', 'format': 'pil'},
        {'name': 'file_Name', 'type': 'string', 'format': 'text'}
        ],
    'outputs': [],
    'script': '''
from datetime import datetime
#file_Name = '/home/vagrant/tangelo/tangelo_demo/proj/lenna__'+ datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]+'.jpg'
the_image.save(file_Name)
'''
}

outputFileName = {
        'type': 'string',
        'format': 'jpg',
        'text': "/home/vagrant/tangelo/tangelo_demo/proj/lennaimage.jpg"
}

#running the single task
output = girder_worker.run(save_image, {'the_image': lenna, 'file_Name': outputFileName})

-----------error----------

Traceback (most recent call last):
  File "girderWorkerStandAlone.py", line 37, in <module>
    output = girder_worker.run(save_image, {'the_image': lenna, 'file_Name': outputFileName})
  File "/home/vagrant/girder_env/local/lib/python2.7/site-packages/girder_worker/utils.py", line 291, in wrapped
    return fn(*args, **kwargs)
  File "/home/vagrant/girder_env/local/lib/python2.7/site-packages/girder_worker/__init__.py", line 251, in run
    d, **dict({'task_input': task_input}, **kwargs))
  File "/home/vagrant/girder_env/local/lib/python2.7/site-packages/girder_worker/io/__init__.py", line 104, in fetch
    return _fetch_map[mode](spec, **kwargs)
  File "/home/vagrant/girder_env/local/lib/python2.7/site-packages/girder_worker/io/__init__.py", line 31, in _inline_fetch
    return spec['data']
KeyError: 'data'
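For what it's worth, the traceback ends in _inline_fetch returning spec['data'], which suggests inline-mode inputs are expected to carry their value under a data key rather than text. A sketch of a spec that would satisfy that code path (whether string inputs should also accept a text key is a separate question):

```python
# The value goes under 'data' (the key _inline_fetch reads), not 'text'.
outputFileName = {
    'type': 'string',
    'format': 'text',
    'data': '/home/vagrant/tangelo/tangelo_demo/proj/lennaimage.jpg'
}
```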

Using XML or JSON spec to create a girder worker

Hi,

I was interested to know whether girder worker can use an XML or JSON spec to populate and run a workflow. I haven't seen this in the docs.
Rather than hard-coding all the tasks of the workflow, loading them from such a representation would be more dynamic.

Thanks.
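Since girder_worker task specs are plain dicts, nothing prevents keeping each task in a JSON file and loading it at run time instead of hard-coding it. A minimal sketch (the spec below is illustrative, not taken from the docs):

```python
import json

spec_json = '''
{
  "inputs": [{"name": "the_image", "type": "image", "format": "png"}],
  "outputs": [],
  "mode": "python",
  "script": "pass"
}
'''
# Parse the JSON into the dict form girder_worker expects;
# girder_worker.run(spec, ...) would then consume it as usual.
spec = json.loads(spec_json)
```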

Docker: support flag-type container args

The Slicer CLI specification supports boolean input types, but does not allow them to be passed in the form of --flag=false, only --flag or not. We currently don't support this if the flag is a task input, but we could do so using a new token syntax in container_args, e.g.

$flag{--bool-flag-name}

This token would be bound to a boolean input that causes --bool-flag-name to be passed to the CLI if true and omitted if false.
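Under this proposal, a docker task spec might look like the following sketch (the $flag token is the proposed syntax, not yet implemented; the other fields follow the existing docker task spec conventions):

```json
{
  "mode": "docker",
  "inputs": [{"id": "use_flag", "type": "boolean", "format": "boolean"}],
  "container_args": ["--input", "/data/in.png", "$flag{--bool-flag-name}"]
}
```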

Name of ansible role

In the README, it says the name of the role is girder.girder_worker, but in ansible galaxy the role name is girder.girder-worker. Easy enough to fix in the README, but since this is already under the girder namespace, would it be better for the role name to be girder.worker?

Application plugins not being loaded after entry point refactor

[celery]
app_main=girder_worker
# broker=mongodb://127.0.0.1/girder_worker
broker=amqp://[email protected]/

[girder_worker]
# Root dir where temp files for jobs will be written
tmp_root=/tmp
# tmp_keep=true
# Comma-separated list of plugins to enable
plugins_enabled=r,girder_io,docker
# Colon-separated list of additional plugin loading paths
plugin_load_path=

girder-worker gives warning about pickle

Running girder-worker gives the following warning. It seems we should programmatically set the specified option to get rid of this warning.

[2016-06-10 09:05:18,289: WARNING/MainProcess] /Users/jeff/.virtualenvs/girder_worker/lib/python2.7/site-packages/celery/apps/worker.py:161: CDeprecationWarning:
Starting from version 3.2 Celery will refuse to accept pickle by default.

The pickle serializer is a security concern as it may give attackers
the ability to execute any command.  It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.

If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::

    CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']

You must only enable the serializers that you will actually use.


  warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))

Fixed version requirements cause problems

I have a job that requires Pillow version >= 3.4, but girder_worker has Pillow 3.2.0 as a hard requirement. The core plugins will NOT load with a different version of Pillow. Requirements should allow a version range or only specify a minimum version unless there is a known reason to have a fixed version.

Since requirements.txt should be the frozen versions used for testing, setup.py should either specify packages explicitly (rather than reading the file) or, when reading requirements.txt, relax simple == version pins to ~=.
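The second option could look like the following sketch (helper name hypothetical), which setup.py would run over the lines read from requirements.txt:

```python
import re

def relax_pins(requirements):
    """Convert exact '==' pins to compatible-release '~=' specifiers."""
    relaxed = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        # Only rewrite the first '==' of a simple 'name==x.y.z' pin.
        relaxed.append(re.sub(r'==', '~=', line, count=1))
    return relaxed
```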

Add timer for flushing output

We rate-limit the flushing of output up to the job log, but we do so synchronously, so undesirable things can happen. Namely, if a bunch of output is written all at once while the timer is still waiting, and then nothing is written for a very long time, that output just sits there, unseen by the user. We should use a mechanism akin to the twisted reactor's callLater(), where each call to write() pushes a timeout a couple of seconds into the future that will perform the flush if nothing else is written.

Resolve error that occurs when docker task is run as non-root user

@zachmullen

When I run a docker task that uses matplotlib, I get the following error:

Pulling docker image: dsarchive/histomicstk
Running container: "docker run -u 1000 -v /media/common/EmoryImageAnnotationPlatform/code/girder_worker/srclnx/tmp/tmp8mAMxi:/data dsarchive/histomicstk StandaloneColorDeconvolution --stainColor_3 "0.0, 0.0, 0.0" /data/A.png "0.65, 0.7, 0.29" "0.07, 0.99, 0.11" /data/stain_1.png /data/stain_2.png /data/stain_3.png"
Traceback (most recent call last):
  File "StandaloneColorDeconvolution/StandaloneColorDeconvolution.py", line 3, in <module>
    import skimage.io
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/__init__.py", line 15, in <module>
    reset_plugins()
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 93, in reset_plugins
    _load_preferred_plugins()
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 73, in _load_preferred_plugins
    _set_plugin(p_type, preferred_plugins['all'])
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 85, in _set_plugin
    use_plugin(plugin, kind=plugin_type)
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 255, in use_plugin
    _load(name)
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 299, in _load
    fromlist=[modname])
  File "/build/miniconda/lib/python2.7/site-packages/skimage/io/_plugins/matplotlib_plugin.py", line 3, in <module>
    import matplotlib.pyplot as plt
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
    rcParams = rc_params()
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 965, in rc_params
    fname = matplotlib_fname()
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 794, in matplotlib_fname
    configdir = _get_configdir()
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 649, in _get_configdir
    return _get_config_or_cache_dir(_get_xdg_config_dir())
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 626, in _get_config_or_cache_dir
    return _create_tmp_config_dir()
  File "/build/miniconda/lib/python2.7/site-packages/matplotlib/__init__.py", line 555, in _create_tmp_config_dir
    tempdir = os.path.join(tempdir, 'matplotlib-%s' % getpass.getuser())
  File "/build/miniconda/lib/python2.7/getpass.py", line 158, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 1000'

A fix for the same error is reported here on the matplotlib repository, but it won't be available until matplotlib 1.5.2 is released.

@jcfr proposed a fix to a similar problem here
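A workaround that sidesteps the getpass.getuser() call entirely, independent of those fixes, is to point matplotlib at a writable config directory via the MPLCONFIGDIR environment variable, which matplotlib checks before falling back to a per-user temp dir. A sketch, abbreviating the container invocation above:

```shell
# -e MPLCONFIGDIR gives matplotlib a writable config dir, so it never
# needs to resolve the (missing) uid; the rest of the command is unchanged.
docker run -u 1000 -e MPLCONFIGDIR=/tmp/matplotlib \
    -v /tmp/tmp8mAMxi:/data \
    dsarchive/histomicstk StandaloneColorDeconvolution /data/A.png
```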

The string "text" format should be named "string"

This is a silly inconsistency on my part when putting together the initial types and formats. The other basic types all use the type name as the default format (number, integer, boolean).

The least breaking way to do this is probably to duplicate the small number of converters to add a string format that behaves exactly like text, including a back and forth noop converter between the two formats. We can then remove text from the docs and eventually deprecate it.

Normalize to single quotes

There is no flake8 rule for this, but we can at least go through and convert all double quotes to single quotes where possible/reasonable. Currently the codebase looks a bit disjointed.

Support chaining celery tasks

Deliverables:

  • Demonstrate syntax like the following from girder REST endpoint:
	(task1.s() | task2.s()).delay()
  • Demonstrate asynchronously calling a girder worker task from another girder worker task (e.g. like taskflow - allows for conditionals etc).

Technical Notes:

  • Note on dependencies:
    • We need a REST endpoint for creating child jobs because in the case of a chain, the second task is actually “produced” from a girder_worker instance.
    • Parent child relationships need to exist inside girder because otherwise there is no way to visually relate chains of tasks
    • Celery 3 has no clear way to relate one task to another. To allow chaining we need Celery 4 to be the default

Depends on:

  • Creation of girder job models via REST endpoints in worker plugin (girder/girder#2018)
  • Allow parent-child relationships between girder job models (girder/girder#2017)
  • Clean up Celery 3 & 4 support in app.py (#139)

Mark timed out tasks as such in Girder

If a task times out (via Celery), the task failure hook should be able to tell that the exception that caused the task to fail was a time limit exception. GW should then mark the Girder job as timed out rather than failed; this would require a new "Timed out" status to be added to Girder.

Docker-gc removes too many docker images

Docker-gc is too aggressive in what it removes. Specifically, I have installed docker images via the slicer_cli_web plugin, and then those images get removed by docker-gc, which breaks the cli enumeration.

Create packaging test

Right now we have no way of knowing whether our packaging process works. We should do something akin to what we do in girder to test that our sdist package works.

converter_path raises NetworkXNoPath exception without specifying src and dest types

@zachmullen @danlamanna

worker.format.converter_path raises NetworkXNoPath but does not specify source and target types between which the conversion is being made.

Below is my task spec:

{
  "auto_convert": true,
  "cleanup": true,
  "inputs": {
    "foreground_threshold": {
      "data": "160",
      "format": "json",
      "mode": "inline",
      "type": "number"
    },
    "inputImageFile": {
      "api_url": "http://localhost:8080/api/v1",
      "format": "string",
      "id": "573cc705a848737a396e529e",
      "mode": "girder",
      "name": "Easy1.png",
      "resource_type": "item",
      "token": "NOWAfxaQe3Es7SfGMa4VbEDFhnamHhKyJCjBX54nh0Yv1yGNd1eJxZcN6Gh0EplM",
      "type": "string"
    },
    "local_max_search_radius": {
      "data": "10",
      "format": "json",
      "mode": "inline",
      "type": "number"
    },
    "max_radius": {
      "data": "7",
      "format": "json",
      "mode": "inline",
      "type": "number"
    },
    "min_nucleus_area": {
      "data": "80",
      "format": "json",
      "mode": "inline",
      "type": "number"
    },
    "min_radius": {
      "data": "4",
      "format": "json",
      "mode": "inline",
      "type": "number"
    },
    "stain_1": {
      "data": "hematoxylin",
      "format": "json",
      "mode": "inline",
      "type": "string"
    },
    "stain_2": {
      "data": "eosin",
      "format": "json",
      "mode": "inline",
      "type": "string"
    },
    "stain_3": {
      "data": "null",
      "format": "json",
      "mode": "inline",
      "type": "string"
    }
  },
  "jobInfo": {
    "headers": {
      "Girder-Token": "S3N0OqmQ5dOHXW4YMpNKT8PE5I1jnxWlBMacouZ0PkuVby9VgM5G7tFZ6VNYEnM1"
    },
    "logPrint": true,
    "method": "PUT",
    "reference": "573cca08a8487305d18b1a4f",
    "url": "http://localhost:8080/api/v1/job/573cca08a8487305d18b1a4f"
  },
  "outputs": {
    "outputNucleiAnnotationFile": {
      "api_url": "http://localhost:8080/api/v1",
      "format": "string",
      "mode": "girder",
      "name": "Easy1_nuclei.anot",
      "parent_id": "573cc72ca848737a396e52a0",
      "parent_type": "folder",
      "token": "NOWAfxaQe3Es7SfGMa4VbEDFhnamHhKyJCjBX54nh0Yv1yGNd1eJxZcN6Gh0EplM",
      "type": "string"
    },
    "outputNucleiMaskFile": {
      "api_url": "http://localhost:8080/api/v1",
      "format": "string",
      "mode": "girder",
      "name": "Easy1_seg.png",
      "parent_id": "573cc72ca848737a396e52a0",
      "parent_type": "folder",
      "token": "NOWAfxaQe3Es7SfGMa4VbEDFhnamHhKyJCjBX54nh0Yv1yGNd1eJxZcN6Gh0EplM",
      "type": "string"
    }
  },
  "task": {
    "container_args": [
      "NucleiSegmentation",
      "--foreground_threshold",
      "160",
      "--local_max_search_radius",
      "10",
      "--max_radius",
      "7",
      "--min_nucleus_area",
      "80",
      "--min_radius",
      "4",
      "--stain_1",
      "hematoxylin",
      "--stain_2",
      "eosin",
      "--stain_3",
      "null",
      "/mnt/girder_worker/data/Easy1.png",
      "/mnt/girder_worker/data/Easy1_seg.png",
      "/mnt/girder_worker/data/Easy1_nuclei.anot"
    ],
    "docker_image": "dsarchive/histomicstk:dev",
    "inputs": [
      {
        "format": "string",
        "id": "inputImageFile",
        "name": "Input Image",
        "target": "filepath",
        "type": "string"
      },
      {
        "default": {
          "data": 160,
          "format": "number"
        },
        "format": "number",
        "id": "foreground_threshold",
        "type": "number"
      },
      {
        "default": {
          "data": 10,
          "format": "number"
        },
        "format": "number",
        "id": "local_max_search_radius",
        "type": "number"
      },
      {
        "default": {
          "data": 7,
          "format": "number"
        },
        "format": "number",
        "id": "max_radius",
        "type": "number"
      },
      {
        "default": {
          "data": 80,
          "format": "number"
        },
        "format": "number",
        "id": "min_nucleus_area",
        "type": "number"
      },
      {
        "default": {
          "data": 4,
          "format": "number"
        },
        "format": "number",
        "id": "min_radius",
        "type": "number"
      },
      {
        "default": {
          "data": "hematoxylin",
          "format": "string"
        },
        "format": "string",
        "id": "stain_1",
        "type": "string"
      },
      {
        "default": {
          "data": "eosin",
          "format": "string"
        },
        "format": "string",
        "id": "stain_2",
        "type": "string"
      },
      {
        "default": {
          "data": "null",
          "format": "string"
        },
        "format": "string",
        "id": "stain_3",
        "type": "string"
      }
    ],
    "mode": "docker",
    "name": "NucleiSegmentation",
    "outputs": [
      {
        "format": "string",
        "id": "outputNucleiMaskFile",
        "name": "Output Nuclei Segmentation Mask",
        "path": "Easy1_seg.png",
        "target": "filepath",
        "type": "string"
      },
      {
        "format": "string",
        "id": "outputNucleiAnnotationFile",
        "name": "Output Nuclei Annotation File",
        "path": "Easy1_nuclei.anot",
        "target": "filepath",
        "type": "string"
      }
    ],
    "pull_image": true
  },
  "validate": false
}

which raises the following exception:

<class 'networkx.exception.NetworkXNoPath'>: 
  File "/media/common/EmoryImageAnnotationPlatform/code/girder_worker/srclnx/girder_worker/__main__.py", line 28, in run
    retval = girder_worker.run(*pargs, **kwargs)
  File "girder_worker/utils.py", line 295, in wrapped
    return fn(*args, **kwargs)
  File "girder_worker/__init__.py", line 277, in run
    {'task_input': task_input, 'fetch': False}, **kwargs))
  File "girder_worker/__init__.py", line 150, in convert
    Validator(type, output['format'])):
  File "girder_worker/format/__init__.py", line 113, in converter_path
    raise NetworkXNoPath
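The requested fix is simply to name the endpoints when raising. A simplified stand-in for worker.format.converter_path (the real function resolves (type, format) nodes in the conversion graph; this sketch keeps only the path lookup):

```python
import networkx as nx

def converter_path(source, target, graph):
    """Return a conversion path, naming both endpoints on failure."""
    try:
        return nx.shortest_path(graph, source, target)
    except nx.NetworkXNoPath:
        # Include the (type, format) pairs instead of a bare raise.
        raise nx.NetworkXNoPath(
            'No converter path from %r to %r' % (source, target))
```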

Running tasks in parallel

Hi,

Does girder worker support parallel execution of tasks inside a workflow?
For instance, let's say I have a workflow with tasks T1, T2, T3, T4... Is there a way to run tasks T2 and T3 in parallel after task T1 completes?

Thanks.

Consider removing compatibility error in docker plugin

This references the exception raised in the docker plugin when starting on a non-Linux platform. The native docker binaries for Windows and Mac have been out for a while now. I've been using the docker plugin on Mac without any issues. Perhaps we could replace the error with a warning about updating to native docker instead?

girder_worker errors when dockerd is run with --selinux-enabled

Running girder_worker with the docker plugin enabled when dockerd was started with the flag --selinux-enabled results in errors relating to file access and chmod when attempting to run a container. Cf. the output below. Starting dockerd without this flag results in a clean run.

INFO:root:Created LRU Cache for 'tilesource' with 1934 maximum size
WARNING:ctk_cli.module:'reference' attribute of 'file' is not part of the spec yet (CTK issue #623)

>> CLI Parameters ...

Namespace(analysis_mag=20.0, analysis_roi=[14504.0, 17107.0, 767.0, 811.0], analysis_tile_size=4096.0, foreground_threshold=60.0, inputImageFile='/mnt/girder_worker/data/TCGA-02-0010-01Z-00-DX4.07de2e55-a8fe-40ee-9e98-bcb78050b9f7.svs/TCGA-02-0010-01Z-00-DX4.07de2e55-a8fe-40ee-9e98-bcb78050b9f7.svs', local_max_search_radius=10.0, max_radius=30.0, min_fgnd_frac=0.5, min_nucleus_area=80.0, min_radius=20.0, outputNucleiAnnotationFile='/mnt/girder_worker/data/output.anot', reference_mu_lab=[8.63234435, -0.11501964, 0.03868433], reference_std_lab=[0.57506023, 0.10403329, 0.01364062], scheduler_address='', stain_1='hematoxylin', stain_2='eosin', stain_3='null')
Traceback (most recent call last):
  File "NucleiDetection/NucleiDetection.py", line 368, in <module>
    main(CLIArgumentParser().parse_args())
  File "NucleiDetection/NucleiDetection.py", line 182, in main
    raise IOError('Input image file does not exist.')
IOError: Input image file does not exist.
[2017-04-25 10:44:36,516] ERROR: Error setting perms on docker tempdir /home/neal/work/DSA-dev/tmp/tmpYxc3a3.
STDOUT: 
STDERR:chmod: /mnt/girder_worker/data: Permission denied
chmod: /mnt/girder_worker/data: Permission denied

Exception: Docker tempdir chmod returned code 1.
  File "/home/neal/work/DSA-dev/virtualenv/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/neal/work/DSA-dev/virtualenv/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/neal/work/DSA-dev/girder_worker/girder_worker/tasks.py", line 17, in run
    return core.run(*pargs, **kwargs)
  File "/home/neal/work/DSA-dev/girder_worker/girder_worker/core/utils.py", line 122, in wrapped
    return fn(*args, **kwargs)
  File "/home/neal/work/DSA-dev/girder_worker/girder_worker/core/__init__.py", line 366, in run
    events.trigger('run.finally', info)
  File "/home/neal/work/DSA-dev/girder_worker/girder_worker/core/events.py", line 73, in trigger
    handler['handler'](e)
  File "/home/neal/work/DSA-dev/girder_worker/girder_worker/plugins/docker/__init__.py", line 99, in task_cleanup
    raise Exception('Docker tempdir chmod returned code %d.' % p.returncode)
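One mitigation commonly used on SELinux-enabled Docker hosts, rather than dropping --selinux-enabled from dockerd, is the :z/:Z volume label, which asks Docker to relabel the bind-mounted content so the container may access it. A sketch using the paths from the run above:

```shell
# :z applies a shared SELinux label; :Z a private, container-specific one.
docker run -v /home/neal/work/DSA-dev/tmp/tmpYxc3a3:/mnt/girder_worker/data:z \
    dsarchive/histomicstk NucleiDetection
```

Whether the docker plugin should add such labels automatically when SELinux is detected is part of what this issue would need to decide.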

Streaming IO support in docker mode

Just talked with @cdeepakroy about this and wanted to dump the proposal here so people could comment on it before I actually get started.

As of right now, IO for docker mode is limited to the following:

  • Files mapped into the container as file inputs
  • Command line arguments as inputs
  • stdout/stderr pipes from the container process as outputs

All of this is done synchronously due to the design of our execution model, i.e. the output must be completely gathered by the worker before it is sent on its way. I propose adding two new features:

  1. The ability to stream task outputs as they happen rather than waiting for the entire process to complete before sending the first byte of output. This will involve codifying streaming types for IO in some way or another, and I'd only be targeting their usage in the docker mode at first, but they could be generalized to other modes later.
  2. The ability to use named pipes instead of just stdout/stderr, since those might be overloaded for things like logging. This could even be used to support something like progress reporting by having a special progress pipe. Ostensibly, docker should support mapping FIFO named pipes into the container just like normal files.

Please provide any feedback on these features, as well as ideas on how to structure the API/spec for them.
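For feature 2, the worker-side mechanics can be sketched with os.mkfifo: create a FIFO that a container process could have mapped in like a normal file, and stream its contents as they arrive instead of waiting for the process to exit. The writer below is a stand-in for the containerized task (all names illustrative):

```python
import os
import tempfile
import threading

fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, 'output_pipe')
os.mkfifo(fifo_path)  # POSIX named pipe; docker can bind-mount it like a file

chunks = []

def reader():
    # Opens block until a writer connects; then each chunk can be
    # forwarded downstream immediately rather than buffered to the end.
    with open(fifo_path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            chunks.append(chunk)

t = threading.Thread(target=reader)
t.start()

# Stand-in for the containerized task writing to the mapped pipe:
with open(fifo_path, 'wb') as f:
    f.write(b'partial output')

t.join()
```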

Ability to have girder_worker copy the whole parent item for input file parameters

Issue

This issue concerns the development of a mechanism to instruct girder_worker to optionally download/copy the whole parent item of input file parameters into the docker container.

Background

I faced this need while developing an infrastructure to automatically generate REST end-points for slicer execution model CLIs. A large part of this infrastructure is the conversion from slicer execution model xml spec to girder_worker task spec along with the input/output bindings.

For girder_worker tasks where one of the inputs is a girder item, if the item contains a single file then girder_worker will copy the file to the worker machine and provide the docker task with a path to that file. If instead the input girder item contains multiple files, girder_worker will copy all files of the item into a newly created directory on the worker machine and give the docker task the path of that directory.

With image files, the following two scenarios occur:

  • Image information spanning multiple files: This is the case with dicom series images wherein the header information is stored in the .mhd file and image data is stored in dicom files located adjacent to the .mhd file.
  • Image information is in a single file: This is the case with images stored in .mha, .nrrd, and .png formats wherein both the header and image data is stored in a single file.

To support both the aforementioned scenarios, I would like to map input parameters of type image in slicer xml spec to a file on girder and have an ability to instruct girder_worker to copy the whole parent item into a directory on the worker machine and get a path to the specified file within it for use in the docker task.

When pulling docker images without a tag, we get ALL tags

This was introduced as an issue in PR #96.

This happens both with implicit pulls and explicit pulls, including running BusyBox.

Specifically, there is no equivalent of not specifying --all-tags for the pull command when accessed through Python. For BusyBox, we can fix this by adding :latest to the image name. For the general case, before we pull or run an image, we would need to check whether it has a tag or digest. If not, we would check whether a local image of that name already exists and, failing that, ask the remote server (probably hub.docker.com) which tags are available, then use one of them (latest preferred).
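The tag/digest check is sketched below (helper name hypothetical). It only covers the :latest default; the remote tag query for the general case would still be separate:

```python
def ensure_tag(image):
    """Append ':latest' when the image reference has no tag or digest."""
    if '@' in image:
        return image  # pinned by digest, leave as-is
    # A colon after the last slash separates repository from tag;
    # earlier colons belong to a registry host:port.
    if ':' in image.rsplit('/', 1)[-1]:
        return image
    return image + ':latest'
```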
