yadage's Introduction

yadage - yaml based adage

This package reads and executes workflows adhering to the workflow JSON schemas defined at https://github.com/yadage/yadage-schemas such as the ones stored in the community repository https://github.com/yadage/yadage-workflows. For executing the individual steps it mainly uses the packtivity python bindings provided by https://github.com/yadage/packtivity.

Example Workflow

cat << 'EOF' > workflow.yml
stages:
- name: hello_world
  dependencies: [init]
  scheduler:
    scheduler_type: singlestep-stage
    parameters:
      name: {step: init, output: name}
      outputfile: '{workdir}/hello_world.txt'
    step:
      process:
        process_type: 'string-interpolated-cmd'
        cmd: 'echo Hello my Name is {name} | tee {outputfile}'
      publisher:
        publisher_type: 'frompar-pub'
        outputmap:
          outputfile: outputfile
      environment:
        environment_type: 'docker-encapsulated'
        image: busybox
EOF

You can try this workflow via

yadage-run -p name="John Doe"
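How the parameters end up in the command can be pictured with a small sketch. It assumes the `string-interpolated-cmd` template is filled in with Python `str.format` semantics; the `workdir` value is a made-up placeholder, and in a real run yadage and packtivity resolve it themselves.

```python
# Hypothetical working directory; yadage chooses the real one at run time.
workdir = "/tmp/workdir"

# Parameters as declared in the stage above, with {workdir} already resolved.
parameters = {
    "name": "John Doe",
    "outputfile": "{workdir}/hello_world.txt".format(workdir=workdir),
}

# The cmd template from the step definition.
cmd_template = "echo Hello my Name is {name} | tee {outputfile}"

# Assumption: interpolation behaves like str.format on the template.
cmd = cmd_template.format(**parameters)
print(cmd)
```

The resulting string is the shell command that would run inside the busybox container.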

For more thorough examples, please see the documentation

Possible Backends:

Yadage can run on various backends such as multiprocessing pools, ipython clusters, or celery clusters. If human intervention is needed for certain steps, it can also be run interactively.

Published versions of related packages (main dependencies of yadage)

  • packtivity
  • yadage-schemas
  • adage

yadage's People

Contributors

alintulu, lukasheinrich, matthewfeickert, vvolkl

yadage's Issues

upgrade pydot2 to be forward-setuptools-friendly

Current behaviour

Yadage depends on pydot2 for visualisation:

$ rg -C 3 pydot2 setup.py 
49-        "viz": [
50-            # manually adding extras of adage[extra] because of pip
51-            # issue https://github.com/pypa/pip/issues/3189
52:            "pydot2",
53-            "pygraphviz",
54-            "pydotplus",
55-        ],

The last release of pydot2 happened in January 2014.

With the new versions of setuptools, this leads to installation problems. Here is a small demonstrator:

$ mkvirtualenv test -p python3.8
$ pip install --upgrade pip setuptools
$ cd src/yadage
$ pip install '.[viz]'
...
   Complete output (1 lines):
    error in pydot2 setup command: use_2to3 is invalid.

Workaround

Do not update setuptools, then things will work.

Expected behaviour

It would be really great to make yadage dependent on latest-greatest dot-handling packages, instead of relying on the too old pydot2 package. For example, upgrade to pydot, which is being actively maintained.

Notes

Ditto for other viz dependencies, e.g. pydotplus was last released in December 2014. (But it works with latest setuptools.)

Issue installing on OSX

Hi!

Just wanted to post that I had issues installing yadage via pip. I had been getting errors as in the attached output log (output.txt) from a simple pip install yadage. After googling around for a bit, I finally found that manually updating jq to its HEAD fixed the issue as done in:

doloopwhile/pyjq#1

After the jq install was finished, the straightforward pip install worked fine.

Just in case this comes up -- If others run into this, it might be good to have a note in the docs?

Thanks!
-Larry

`yadage.manualcli.preview` failing

The CI is currently failing for PR #108 with the following error

______________________________ test_manual_remove ______________________________

tmpdir = local('/tmp/pytest-of-runner/pytest-0/test_manual_remove0')

    def test_manual_remove(tmpdir):
        runner = CliRunner()
        workdir = os.path.join(str(tmpdir), "workdir")
        metadir = os.path.join(str(tmpdir), "metadir")
        statefile = os.path.join(str(tmpdir), "state.json")
        result = runner.invoke(
            yadage.manualcli.init,
            [
                workdir,
                "workflow.yml",
                "-t",
                "tests/testspecs/local-helloworld",
                "-s",
                "filebacked:" + statefile,
                "-p",
                "par=value",
                "--metadir",
                metadir,
            ],
        )
        assert result.exit_code == 0
    
        result = runner.invoke(
            yadage.manualcli.preview, ["-s", "filebacked:" + statefile, "/init"]
        )
>       assert result.exit_code == 0
E       assert 1 == 0
E        +  where 1 = <Result TypeError("object of type 'dict_keyiterator' has no len()")>.exit_code

I'm not sure what is going on here, so if @lukasheinrich has input on this that would be great.
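For reference, here is a minimal sketch of how this kind of TypeError arises in plain Python. It does not pinpoint where in yadage the call happens; the `graph_nodes` dictionary is a made-up stand-in for whatever mapping is being iterated.

```python
# Made-up stand-in for a mapping of workflow nodes.
graph_nodes = {"init": {}, "hello_world": {}}

# iter() over a dict yields a dict_keyiterator, which has no length.
keys = iter(graph_nodes)
try:
    len(keys)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Materialising the keys into a list (or calling len() on the dict itself)
# avoids the problem.
count = len(list(graph_nodes))
print(count)
```

If a dependency started returning iterators where it used to return lists, wrapping the result in `list(...)` before calling `len()` is the usual fix.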

coerce dataarg PR causing regression in utils.prepare_meta

PR #92 introduced a regression that is hitting recast-atlas and by proxy is also hitting reana-workflow-engine-yadage (c.f. reanahub/reana-workflow-engine-yadage#220 and reanahub/reana-workflow-engine-yadage#219) when it got into patch release 0.20.2.

This is showing up in calls to the steering API where metadir=None

    ys = YadageSteering.create(
        metadir=metadir,

    def create(cls, **kwargs):
        dataopts = kwargs.get("dataopts") or {}
        if kwargs["dataarg"].startswith("local:"):
            dataarg = kwargs["dataarg"].split(":", 1)[1]
            metadir = kwargs.get("metadir")
            metadir = metadir or "{}/_yadage/".format(dataarg)
            if dataopts.get("overwrite") and os.path.exists(metadir):
                shutil.rmtree(metadir)
        else:
            metadir = kwargs["metadir"]
        accept_metadir = kwargs.pop("accept_metadir", False)
        kw = copy.deepcopy(kwargs)
        kw["metadir"] = metadir
        prepare_meta(
            metadir, accept_metadir
        )  # meta must be here because data model might store stuff here

where prepare_meta is being passed a metadir of None

yadage/yadage/utils.py

Lines 222 to 229 in 333fedc

def prepare_meta(metadir, accept=False):
    """
    prepare workflow meta-data directory
    :param metadir: the meta-data directory name
    :param accept: whether to accept an existing metadata directory
    """
    if os.path.exists(metadir):
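One way to picture a guard against this is the sketch below. The helper name `resolve_metadir` and its error behaviour are hypothetical and not part of yadage; only the `local:` prefix handling and the `_yadage/` default are taken from the `create` snippet above.

```python
def resolve_metadir(dataarg, metadir=None):
    """Hypothetical helper: never hand None on to prepare_meta."""
    if metadir is not None:
        return metadir
    if dataarg.startswith("local:"):
        # Same default as in YadageSteering.create for local data.
        return "{}/_yadage/".format(dataarg.split(":", 1)[1])
    # Assumption: without a local path there is no sensible default,
    # so fail early with a clear message instead of a TypeError in os.stat.
    raise ValueError(
        "metadir must be given explicitly for dataarg {!r}".format(dataarg)
    )


print(resolve_metadir("local:/tmp/work"))
print(resolve_metadir("local:/tmp/work", "/tmp/meta"))
```

Failing early like this would at least turn the opaque `stat: path should be string ...` error into a message that names the missing argument.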

Minimal failing example

(base) feickert@ThinkPad-X1:/tmp$ pyenv virtualenv 3.9.6 venv
(base) feickert@ThinkPad-X1:/tmp$ pyenv activate venv
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install --upgrade pip setuptools wheel
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install 'yadage[viz]==0.20.2'
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install recast-atlas[local]
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip list | grep 'adage\|recast\|pydot'
adage              0.10.3
pydot              1.4.2
pydotplus          2.0.2
recast-atlas       0.1.9
yadage             0.20.2
yadage-schemas     0.10.7
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip list
Package            Version
------------------ ---------
adage              0.10.3
attrs              21.4.0
certifi            2021.10.8
charset-normalizer 2.0.11
checksumdir        1.2.0
click              8.0.3
decorator          5.1.1
glob2              0.7
idna               3.3
jq                 1.2.2
jsonpath-rw        1.4.0
jsonpointer        2.2
jsonref            0.2
jsonschema         4.4.0
mock               4.0.3
networkx           2.6.3
packtivity         0.14.24
pip                22.0.3
ply                3.11
psutil             5.9.0
pydot              1.4.2
pydotplus          2.0.2
pygraphviz         1.7
pyparsing          3.0.7
pyrsistent         0.18.1
PyYAML             6.0
recast-atlas       0.1.9
requests           2.27.1
setuptools         60.8.1
six                1.16.0
urllib3            1.26.8
wheel              0.37.1
yadage             0.20.2
yadage-schemas     0.10.7
(venv) feickert@ThinkPad-X1:/tmp$ recast run testing/busyboxtest --backend local --tag hello
2022-02-07 11:45:57,129 | packtivity.asyncback |   INFO | configured pool size to 12
2022-02-07 11:45:57,141 | recastatlas.subcomma |  ERROR | caught exception
Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 21, in run_workflow
    run_workflow(**spec)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_api.py", line 19, in run_workflow
    with steering_ctx(*args, **kwargs):
  File "/home/feickert/.pyenv/versions/3.9.6/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_api.py", line 89, in steering_ctx
    ys = YadageSteering.create(
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_object.py", line 61, in create
    prepare_meta(
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/utils.py", line 229, in prepare_meta
    if os.path.exists(metadir):
  File "/home/feickert/.pyenv/versions/3.9.6/lib/python3.9/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/subcommands/run.py", line 56, in run
    run_sync(name, spec, backend=backend)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/__init__.py", line 77, in run_sync
    BACKENDS[backend].run_workflow(name, spec)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 23, in run_workflow
    raise FailedRunException
recastatlas.exceptions.FailedRunException
Error: Workflow failed

Example of bug location

There was a very small number of changes to yadage src between v0.20.1 and v0.20.2 (c.f.: v0.20.1...v0.20.2) and if I locally revert PR #92 and install this version in the same working Python virtual environment as above then things pass

(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ pwd
/home/feickert/Code/GitHub/yadage/yadage
(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ git log | head -n 14
commit b83c5470d8dc6762d1971b5dd704ac4a3488841f
Author: Matthew Feickert <[email protected]>
Date:   Mon Feb 7 11:35:48 2022 -0600

    Revert "coerce dataarg (#92)"
    
    This reverts commit b03a637f2afa53f5967472e5df7d17b229905e0a.

commit 333fedcc57db752d398bf1f1a63c2c166788768f
Author: Matthew Feickert <[email protected]>
Date:   Thu Feb 3 14:30:13 2022 -0600

    Bump version: 0.20.1 → 0.20.2

(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ python -m pip --quiet install --upgrade -e .
(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ python -m pip show yadage
Name: yadage
Version: 0.20.2
Summary: yadage - YAML based adage
Home-page: https://github.com/yadage/yadage
Author: Lukas Heinrich
Author-email: [email protected]
License: UNKNOWN
Location: /home/feickert/Code/GitHub/yadage/yadage
Requires: adage, checksumdir, click, glob2, jq, jsonpath_rw, jsonpointer, jsonref, jsonschema, packtivity, psutil, pyyaml, requests, yadage-schemas
Required-by:
(venv) feickert@ThinkPad-X1:/tmp$ cd /tmp/
(venv) feickert@ThinkPad-X1:/tmp$ recast run testing/busyboxtest --backend local --tag hello
2022-02-07 12:03:09,834 | packtivity.asyncback |   INFO | configured pool size to 12
2022-02-07 12:03:10,246 |      yadage.creators |   INFO | no initialization data
2022-02-07 12:03:10,246 |    adage.pollingexec |   INFO | preparing adage coroutine.
2022-02-07 12:03:10,246 |                adage |   INFO | starting state loop.
2022-02-07 12:03:10,321 |     yadage.wflowview |   INFO | added </hello:0|defined|unknown>
2022-02-07 12:03:11,011 |     yadage.wflowview |   INFO | added </world:0|defined|unknown>
2022-02-07 12:03:11,880 |    adage.pollingexec |   INFO | submitting nodes [</hello:0|defined|known>]
2022-02-07 12:03:11,881 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2022-02-07 12:03:11,882 |      pack.hello.step |   INFO | starting file logging for topic: step
2022-02-07 12:03:13,733 |           adage.node |   INFO | node ready </hello:0|success|known>
2022-02-07 12:03:13,734 |    adage.pollingexec |   INFO | submitting nodes [</world:0|defined|known>]
2022-02-07 12:03:13,735 |      pack.world.step |   INFO | starting file logging for topic: step
2022-02-07 12:03:15,650 |           adage.node |   INFO | node ready </world:0|success|known>
2022-02-07 12:03:15,671 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2022-02-07 12:03:15,671 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2022-02-07 12:03:15,671 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 2 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2022-02-07 12:03:18,196 |                adage |   INFO | adage state loop done.
2022-02-07 12:03:18,197 |                adage |   INFO | execution valid. (in terms of execution order)
2022-02-07 12:03:18,197 |                adage |   INFO | workflow completed successfully.
2022-02-07 12:03:18,197 |  yadage.steering_api |   INFO | done. dumping workflow to disk.
2022-02-07 12:03:18,198 |  yadage.steering_api |   INFO | visualizing workflow.
2022-02-07 12:03:18,545 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result testing/busyboxtest recast-hello:
--------------
[]

Release patch release v0.20.3 with API breaking changes reverted

As outlined in #116 (comment), to deal with the fact that PR #92 introduced an API breaking change into patch release v0.20.2 we should

  • Make branch v0.20.x in yadage where we revert PR #92 so there isn't an API breaking change in it. This branch is a release branch for v0.20.X bug fixes into the future.
  • Ensure the test suite is passing on branch v0.20.x
  • Make yadage release v0.20.3 off of branch v0.20.x so that we can get the visualization fixes.
  • Test v0.20.3 against recast-atlas and reana-workflow-engine-yadage and once that is working we make a PR that is the equivalent of reanahub/reana-commons#333.

A bug in step.yml when including {workdir} in a comment in script

I want to report an error when running reana workflows if I have {workdir} as a part of a comment in step.yaml. To reproduce the issue, one can use this example.

The example runs as expected when following its README, or by simply checking it out and running:

source /afs/cern.ch/user/r/reana/public/reana/bin/activate
export REANA_SERVER_URL=https://reana.cern.ch/
export REANA_ACCESS_TOKEN=<your_reana_token>
reana-client create -n myanalysis
export REANA_WORKON=myanalysis
reana-client upload
reana-client start

However, if I add line 14 to workflow/steps.yml as shown in the attached picture, it won't run. For the failed workflow, reana-client logs -w <workflow> gives the following errors:

2022-04-16 20:08:34,310 | adage | MainThread | ERROR | some weird exception caught in adage process loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/adage/__init__.py", line 51, in run_polling_workflow
    for stepnum, controller in enumerate(coroutine):
  File "/usr/local/lib/python3.8/site-packages/adage/pollingexec.py", line 93, in adage_coroutine
    process_dag(controller,submit_decider)
  File "/usr/local/lib/python3.8/site-packages/adage/pollingexec.py", line 64, in process_dag
    controller.submit_nodes(nodes)
  File "/usr/local/lib/python3.8/site-packages/adage/wflowcontroller.py", line 45, in submit_nodes
    ctrlutils.submit_nodes(nodes, self.backend)
  File "/usr/local/lib/python3.8/site-packages/adage/controllerutils.py", line 99, in submit_nodes
    nodeobj.resultproxy = backend.submit(nodeobj.task)
  File "/usr/local/lib/python3.8/site-packages/yadage/backends/federatedbackend.py", line 23, in submit
    return self.routedsubmit(task)
  File "/usr/local/lib/python3.8/site-packages/yadage/backends/packtivitybackend.py", line 83, in routedsubmit
    return self.backends["packtivity"].submit(
  File "/usr/local/lib/python3.8/site-packages/reana_workflow_engine_yadage/externalbackend.py", line 89, in submit
    job = build_job(spec["process"], parameters, state, self.config)
  File "/usr/local/lib/python3.8/site-packages/packtivity/syncbackends.py", line 101, in build_job
    return handler(process, parameters, state)
  File "/usr/local/lib/python3.8/site-packages/packtivity/handlers/process_handlers.py", line 36, in interp_script
    script = process_spec["script"].format(**flattened_kwargs)
KeyError: 'workdir'

The issue is only related to step.yml: if the workflow doesn't use step.yml but only workflow.yml, it seems to work.

Since this is just a comment in a bash command, it is not expected to be interpreted as a variable. So the same behaviour should be expected with and without the additional line.
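The traceback ends in a plain `str.format` call on the script, which explains the KeyError: `str.format` treats every `{...}` as a placeholder, comments included. A minimal sketch of the behaviour, with a made-up script and parameter set, plus the brace-doubling escape that `str.format` itself supports:

```python
# Made-up script text: the comment mentions {workdir}, the command uses {name}.
script_with_comment = "# results go to {workdir}\necho {name}\n"
parameters = {"name": "John Doe"}  # note: no 'workdir' key

try:
    script_with_comment.format(**parameters)
    error = None
except KeyError as exc:
    error = exc.args[0]
print(error)

# Doubling the braces makes str.format emit them literally.
escaped = "# results go to {{workdir}}\necho {name}\n"
rendered = escaped.format(**parameters)
print(rendered)
```

Whether doubled braces survive the rest of the yadage/REANA pipeline unchanged is an assumption that would need checking; the sketch only shows the `str.format` side.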

image

`yadage-validate` exit codes

We'd like to use yadage-validate in Travis CI recipes. I noticed that yadage always returns exit code 0 even in case of failures.

Example with good YAML:

$ yadage-validate world_population_analysis.yaml 
workflow validates against schema
$ echo $?
0

Example with bad YAML:

$ yadage-validate invalid.yaml 
ERROR:yadage.validator_workflow:validation error
Traceback (most recent call last):
  File "/home/simko/.virtualenvs/yadage/lib/python2.7/site-packages/yadage/validator_workflow.py", line 36, in main
    workflow, toplevel=toplevel, schemadir=schemadir, validate=True)
  File "/home/simko/.virtualenvs/yadage/lib/python2.7/site-packages/yadage/workflow_loader.py", line 5, in workflow
    data = yadageschemas.load(source, toplevel, schema_name, schemadir, validate)
  File "/home/simko/.virtualenvs/yadage/lib/python2.7/site-packages/yadageschemas/__init__.py", line 172, in load
    validator(schema_name,schemadir).validate(data)
  File "/home/simko/.virtualenvs/yadage/lib/python2.7/site-packages/jsonschema/validators.py", line 130, in validate
    raise error
ValidationError: {'step': {'process': {'cmd': 'jupyter nbconvert --output-dir="{outputdir}" world_population_analysis.ipynb', 'process_type': 'string-interpolated-cmd'}, 'publisher': {'publisher_type': 'frompar-pub', 'outputmap': {'outputfile': 'outputfile'}}, 'environment': {u'imagetag': u'latest', 'image': 'reanahub/reana-demo-worldpopulation', u'envscript': u'', 'environment_type': 'docker-encapsulated', u'env': {}, u'resources': []}}, 'scheduler_type': 'singlestep-stage', 'parameters': None} is not valid under any of the given schemas

Failed validating u'oneOf' in schema[u'properties'][u'stages'][u'items'][u'properties'][u'scheduler']:
    {u'oneOf': [{u'$ref': u'scheduler/singlestep-stage-schema.json#'},
                {u'$ref': u'scheduler/multistep-stage-schema.json#'}],
     u'type': u'object'}

On instance[u'stages'][0][u'scheduler']:
    {'parameters': None,
     'scheduler_type': 'singlestep-stage',
     'step': {'environment': {u'env': {},
                              'environment_type': 'docker-encapsulated',
                              u'envscript': u'',
                              'image': 'reanahub/reana-demo-worldpopulation',
                              u'imagetag': u'latest',
                              u'resources': []},
              'process': {'cmd': 'jupyter nbconvert --output-dir="{outputdir}" world_population_analysis.ipynb',
                          'process_type': 'string-interpolated-cmd'},
              'publisher': {'outputmap': {'outputfile': 'outputfile'},
                            'publisher_type': 'frompar-pub'}}}
workflow does not validate against schema
$ echo $?
0

It would be more comfortable if yadage-validate returned 0 when things are OK and 1 otherwise, as is the usual convention with other validating programs.
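A minimal sketch of the requested behaviour, using only the standard library. The `validate` function is a hypothetical stand-in for the real schema validation, not yadage's implementation; the point is only that the failure path returns a non-zero status.

```python
import sys


def validate(document):
    """Hypothetical stand-in for schema validation: require a 'stages' list."""
    if not isinstance(document.get("stages"), list):
        raise ValueError("'stages' must be a list")


def run_validation(document):
    """Return 0 when the document validates and 1 otherwise."""
    try:
        validate(document)
    except ValueError as exc:
        print("workflow does not validate against schema: {}".format(exc),
              file=sys.stderr)
        return 1
    print("workflow validates against schema")
    return 0


# A command-line entry point would finish with: sys.exit(run_validation(doc))
good_status = run_validation({"stages": []})
bad_status = run_validation({})
```

With a status like this, a CI recipe can rely on the command failing the build when a workflow is invalid.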

Allow empty arguments

Context

There are certain occasions where a Yadage step argument may or may not be provided, making it optional. Let's imagine a user would like to propagate a sequence of space-separated key=value pairs downstream. Something like:

--my_optional_arg "alpha=0.5 beta=0.8 ..."

All these values have corresponding default values, so in certain scenarios it makes sense to leave the argument empty so that the defaults are picked:

--my_optional_arg ""

However, this is not always possible in the current state of Yadage. 🚫

The problem

Sometimes, analyses use input YAML files containing some of the analysis configuration values. When specifying the path to these files, a convenient way to do so is to pass just the file name and set the dataopt argument initdir to the highest-level folder containing them.

yadage-run \
    .workdir \
    workflow.yml \
    -p ... \
    -p ... \
    -d initdir=...

In those cases, Yadage treats empty string parameters as some kind of path: all init step arguments which are empty strings get filled with the path specified in -d initdir=<PATH>, which is not what the user wants.

This behaviour comes from the use of the relative workflow flag.

How to replicate

Launch a yadage analysis leaving an empty argument, and using the -d initdir=... syntax.

Possible workaround

Yadage syntax allows the interpolation of strings, so there is a lot of flexibility when it comes to argument propagation. However, as far as I know, there is no syntax for optional arguments. The only workaround I found is to interpolate not only the argument value, but the argument key too:

...

process_type: string-interpolated-cmd
cmd: my_binary --mandatory_arg_A 100 --mandatory_arg_B "input.yml" {optional_args}
# Where optional_args is "--my_optional_arg 'alpha=0.5 beta=0.8'"

I find this approach quite inconvenient, as the string interpolation syntax masks the optional argument keys (in this case there is only --my_optional_arg), making future iterations harder to debug.


Is there a native way to handle this?
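To make the desired behaviour concrete, here is a sketch of what "optional" would mean when the command line is assembled. It runs outside of Yadage; the function `build_command` is hypothetical, and the binary and flag names are the ones from the example above.

```python
import shlex


def build_command(my_optional_arg=""):
    """Hypothetical helper: add --my_optional_arg only when it has a value."""
    args = ["my_binary", "--mandatory_arg_A", "100",
            "--mandatory_arg_B", "input.yml"]
    if my_optional_arg:
        # The flag name stays visible here instead of hiding in a parameter.
        args += ["--my_optional_arg", my_optional_arg]
    # Quote each argument so values with spaces survive the shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_command())
print(build_command("alpha=0.5 beta=0.8"))
```

An empty value simply drops the flag, so the binary falls back to its defaults and nothing gets filled in with the initdir path.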
