packtivity

This package aims to collect implementations of both synchronous and asynchronous execution of preserved, but parametrized scientific computational tasks that come with batteries included, i.e. with a full specification of their software dependencies. In that sense they are packaged activities -- packtivities.

This package provides tools to validate and execute data processing tasks that are written according to the "packtivity" JSON schemas defined in yadage-schemas.

Packtivities define

  • the software environment,
  • parametrized process descriptions (what programs to run within these environments), and
  • human- and machine-readable outputs (as JSON) describing the resulting data fragments.

At run-time they are paired with a concrete set of parameters, supplied as JSON documents, and with external storage/state to actually execute these tasks.

Packtivity in Yadage

This package is used by yadage to execute the individual steps of yadage workflows.

Example Packtivity spec

This packtivity spec is part of a number of yadage workflows. It runs the Delphes detector simulation on a HepMC file and outputs events in the LHCO and ROOT file formats. The packtivity is stored in a public location from which it can later be retrieved:

process:
  process_type: 'string-interpolated-cmd'
  cmd: 'DelphesHepMC  {delphes_card} {outputroot} {inputhepmc} && root2lhco {outputroot} {outputlhco}'
publisher:
  publisher_type: 'frompar-pub'
  outputmap:
    lhcofile: outputlhco
    rootfile: outputroot
environment:
  environment_type: 'docker-encapsulated'
  image: lukasheinrich/root-delphes
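To illustrate the string-interpolated-cmd process type above, here is a minimal sketch (plain Python string formatting, not packtivity's actual implementation; the parameter values are placeholders) of how a parameter document would be interpolated into the command template:

```python
# Sketch of 'string-interpolated-cmd' interpolation; packtivity's real
# implementation may differ, this just shows the idea.
cmd_template = (
    "DelphesHepMC  {delphes_card} {outputroot} {inputhepmc} "
    "&& root2lhco {outputroot} {outputlhco}"
)

# Parameters as they might appear in the JSON parameter document
# (paths are illustrative placeholders).
parameters = {
    "delphes_card": "delphes/cards/delphes_card_ATLAS.tcl",
    "inputhepmc": "/workdir/pythia/output.hepmc",
    "outputroot": "/workdir/output.root",
    "outputlhco": "/workdir/output.lhco",
}

command = cmd_template.format(**parameters)
print(command)
```

The publisher section then maps the outputroot and outputlhco parameters to the named output fragments rootfile and lhcofile.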

Usage

You can run the packtivity synchronously by specifying the spec (which can point to GitHub), all necessary parameters, and an external state (via the --read and --write flags):

packtivity-run -t from-github/phenochain delphes.yml \
  -p inputhepmc="$PWD/pythia/output.hepmc" \
  -p outputroot="'{workdir}/output.root'" \
  -p outputlhco="'{workdir}/output.lhco'" \
  -p delphes_card=delphes/cards/delphes_card_ATLAS.tcl \
  --read pythia --write outdir

Asynchronous Backends

To facilitate the use of distributed resources, a number of asynchronous backends can be specified. Here is an example for IPython Parallel clusters:

packtivity-run -b ipcluster --asyncwait \
  -t from-github/phenochain delphes.yml \
  -p inputhepmc="$PWD/pythia/output.hepmc" \
  -p outputroot="'{workdir}/output.root'" \
  -p outputlhco="'{workdir}/output.lhco'" \
  -p delphes_card=delphes/cards/delphes_card_ATLAS.tcl \
  --read pythia --write outdir

You can replace the --asyncwait flag with --async in order to get a JSON-serializable proxy representation with which to check on the job status later. By default the proxy information is written to proxy.json (customizable via the -x flag):

packtivity-run -b celery --async \
  -t from-github/phenochain delphes.yml \
  -p inputhepmc="$PWD/pythia/output.hepmc" \
  -p outputroot="'{workdir}/output.root'" \
  -p outputlhco="'{workdir}/output.lhco'" \
  -p delphes_card=delphes/cards/delphes_card_ATLAS.tcl \
  --read pythia --write outdir

At a later point in time you can then check on the job via:

packtivity-checkproxy proxy.json

External Backends

Users can implement their own backends to handle the JSON documents describing the packtivities. A custom backend can be enabled by using the fromenv backend and setting an environment variable that specifies the module holding the backend and proxy classes. The format of the environment variable is module:backendclass:proxyclass. E.g.:

export PACKTIVITY_ASYNCBACKEND="externalbackend:ExternalBackend:ExternalProxy"
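As a rough sketch of what such a module could look like (the method names submit, ready, result, and the json/fromJSON round-trip are illustrative assumptions, not packtivity's documented backend interface):

```python
# externalbackend.py -- illustrative sketch only; the method names used
# here are assumptions, not packtivity's documented API.

class ExternalProxy:
    """JSON-serializable handle to a submitted job."""

    def __init__(self, job_id):
        self.job_id = job_id

    def json(self):
        return {"proxyname": "ExternalProxy", "job_id": self.job_id}

    @classmethod
    def fromJSON(cls, data):
        return cls(data["job_id"])


class ExternalBackend:
    """Dispatches packtivity tasks to some external service."""

    def __init__(self):
        self._results = {}

    def submit(self, spec, parameters, state, metadata):
        # A real backend would hand the task to a scheduler; here we
        # just record it and pretend it finished immediately.
        job_id = metadata.get("name", "job-0")
        self._results[job_id] = {"status": "done"}
        return ExternalProxy(job_id)

    def ready(self, proxy):
        return proxy.job_id in self._results

    def result(self, proxy):
        return self._results[proxy.job_id]
```

Because the proxy round-trips through JSON, a tool like packtivity-checkproxy can later reconstruct it from proxy.json and query the backend for the job status.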


packtivity's Issues

Tests fail on Celery for Python 3.7 only

=================================== FAILURES ===================================
__________________________________ test_known __________________________________

    def test_known():
        for known_backend in [
            "celery",
            "multiproc:4",
            "multiproc:auto",
            "foregroundasync",
            "externalasync:default",
        ]:
>           b = backend_from_string(known_backend)

tests/test_backends.py:13: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
packtivity/backendutils.py:192: in backend_from_string
    return backends[k]["default"](backendstring, backendopts)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

backendstring = 'celery', backendopts = {}

    @backend("celery")
    def celery_backend(backendstring, backendopts):
>       backend = asyncbackends.CeleryBackend(**backendopts)
E       AttributeError: module 'packtivity.asyncbackends' has no attribute 'CeleryBackend'

packtivity/backendutils.py:132: AttributeError
_________________________________ test_celery __________________________________

    def test_celery():
>       from packtivity.asyncbackends import CeleryProxy
E       ImportError: cannot import name 'CeleryProxy' from 'packtivity.asyncbackends' (/home/runner/work/packtivity/packtivity/packtivity/asyncbackends.py)

tests/test_proxies.py:5: ImportError

RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode

The use of bufsize=1 in subprocess.Popen calls

if stdin_content:
    log.debug("stdin: \n%s", stdin_content)
    argv = shlex.split(command_string)
    log.debug("argv: %s", argv)
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        bufsize=1,
        close_fds=True,
    )
    proc.stdin.write(stdin_content.encode("utf-8"))
    proc.stdin.close()
else:
    proc = subprocess.Popen(
        shlex.split(command_string),
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE,
        bufsize=1,
        close_fds=True,
    )

is causing runtime warnings of the form (example: https://gitlab.cern.ch/recast-atlas/examples/helloworld)

/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:848: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:842: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
/home/feickert/.pyenv/versions/3.8.11/lib/python3.8/subprocess.py:848: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
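A likely fix, sketched here under the assumption that line buffering was never actually in effect (the pipes are opened in binary mode, so Python falls back to the default buffer size anyway), is simply to drop bufsize=1:

```python
import shlex
import subprocess

command_string = "echo hello"  # placeholder command for illustration

# Omitting bufsize=1 silences the RuntimeWarning; with binary-mode pipes
# the default buffer size was being used regardless, so behavior is
# unchanged.
proc = subprocess.Popen(
    shlex.split(command_string),
    stderr=subprocess.STDOUT,
    stdout=subprocess.PIPE,
    close_fds=True,
)
output = proc.stdout.read()
proc.wait()
```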

Minimal example

feickert@ThinkPad-X1:/tmp$ pyenv virtualenv 3.8.7 packtivity-issue77
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ pyenv activate packtivity-issue77
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ python -m pip install --upgrade pip 'setuptools<58.0.0' wheel  # c.f. https://github.com/reanahub/reana-client/issues/558
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ python -m pip install 'recast-atlas[local]==0.1.8' six
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ git clone ssh://[email protected]:7999/recast-atlas/examples/helloworld.git
(packtivity-issue77) feickert@ThinkPad-X1:/tmp$ cd helloworld/
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ cat run.sh 
#!/bin/bash

export RECAST_AUTH_USERNAME=secret
export RECAST_AUTH_PASSWORD=secret
export RECAST_AUTH_TOKEN=secret

eval "$(recast auth setup -a ${RECAST_AUTH_USERNAME} -a ${RECAST_AUTH_PASSWORD} -a ${RECAST_AUTH_TOKEN} -a default)"
eval "$(recast auth write --basedir authdir)"

$(recast catalogue add "${PWD}")
recast catalogue ls
recast catalogue describe examples/helloworld
recast catalogue check examples/helloworld

recast run examples/helloworld --backend local --tag debug
(packtivity-issue77) feickert@ThinkPad-X1:/tmp/helloworld$ bash run.sh 
You password is stored in the environment variables RECAST_AUTH_USERNAME,RECAST_AUTH_PASSWORD,YADAGE_SCHEMA_LOAD_TOKEN,YADAGE_INIT_TOKEN,RECAST_REGISTRY_USERNAME,RECAST_REGISTRY_PASSWORD,RECAST_REGISTRY_HOST,PACKTIVITY_AUTH_LOCATION. Run `eval $(recast auth destroy)` to clear your password or exit the shell.
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/feickert/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Wrote Authentication Data to authdir (Note! This includes passwords/tokens)
NAME                               DESCRIPTION                                                 EXAMPLES            TAGS                
atlas/atlas-conf-2018-041          ATLAS MBJ                                                   default                                 
examples/checkmate1                CheckMate Tutorial Example (Herwig + CM1)                   default                                 
examples/checkmate2                CheckMate Tutorial Example (Herwig + CM2)                   default                                 
examples/helloworld                An example recast configuration of ATLAS                    default                                 
examples/rome                      Example from ATLAS Exotics Rome Workshop 2018               default,newsignal                       
testing/busyboxtest                Simple, lightweight Functionality Test                      default                                 

examples/helloworld 
--------------------
description  : An example recast configuration of ATLAS
author       : lukasheinrich
toplevel     : /tmp/helloworld/specs

Nice job! Everything looks good.
2021-12-14 22:51:22,531 | packtivity.asyncback |   INFO | configured pool size to 12
2021-12-14 22:51:22,573 |      yadage.creators |   INFO | initializing workflow with initdata: {'name': 'hello'} discover: True relative: True
2021-12-14 22:51:22,573 |    adage.pollingexec |   INFO | preparing adage coroutine.
2021-12-14 22:51:22,573 |                adage |   INFO | starting state loop.
2021-12-14 22:51:22,627 |     yadage.wflowview |   INFO | added </init:0|defined|unknown>
2021-12-14 22:51:23,339 |     yadage.wflowview |   INFO | added </hello_world:0|defined|unknown>
2021-12-14 22:51:24,247 |    adage.pollingexec |   INFO | submitting nodes [</init:0|defined|known>]
2021-12-14 22:51:24,739 |       pack.init.step |   INFO | publishing data: <TypedLeafs: {'name': 'hello'}>
2021-12-14 22:51:24,739 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2021-12-14 22:51:25,732 |           adage.node |   INFO | node ready </init:0|success|known>
2021-12-14 22:51:25,732 |    adage.pollingexec |   INFO | submitting nodes [</hello_world:0|defined|known>]
2021-12-14 22:51:25,733 | pack.hello_world.ste |   INFO | starting file logging for topic: step
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:844: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:838: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdin = io.open(p2cwrite, 'wb', bufsize)
/home/feickert/.pyenv/versions/3.8.7/lib/python3.8/subprocess.py:844: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
2021-12-14 22:51:27,780 |           adage.node |   INFO | node ready </hello_world:0|success|known>
2021-12-14 22:51:27,800 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2021-12-14 22:51:27,800 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2021-12-14 22:51:27,801 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 2 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2021-12-14 22:51:30,227 |                adage |   INFO | adage state loop done.
2021-12-14 22:51:30,227 |                adage |   INFO | execution valid. (in terms of execution order)
2021-12-14 22:51:30,227 |                adage |   INFO | workflow completed successfully.
2021-12-14 22:51:30,227 |  yadage.steering_api |   INFO | done. dumping workflow to disk.
2021-12-14 22:51:30,228 |  yadage.steering_api |   INFO | visualizing workflow.
2021-12-14 22:51:30,605 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result examples/helloworld recast-debug:
--------------
- name: My Result
  value: Hello my Name is hello

Other information

This has been seen in many other places online as well.

jqlang release v1.7 breaks packtivity

jq (the language) had its first release in 5 years with jq v1.7, which includes some breaking changes.

The jq Python library added support for jqlang v1.7 in its v1.6.0 release, so jq v1.6.0 is likewise a breaking release for packtivity.

Action plan

  • Pin the jq dependency to a release of <1.6.0

Use new base image for Dockerfile as CentOS 8 is EOL

The current builds of the Dockerfile fail at

Step 2/6 : RUN dnf install -y python3
 ---> Running in fc62e180257f
CentOS Linux 8 - AppStream                      265  B/s |  38  B     00:00    
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
The command '/bin/sh -c dnf install -y python3' returned a non-zero code: 1

which is happening because

CentOS 8 went EOL at the end of December [2021] and in line with all the public announcements, the content of the CentOS 8 repos has been moved to vault.centos.org.

So that means that all CentOS 8 Dockerfiles are now broken forever. 😢

We need to choose a new base image for packtivity's Dockerfile. Do we fall back to CentOS 7, switch to Debian, or move to Fedora?

Allow for alternative Singularity data mount or better mount point detection

Issue

Currently packtivity selects the root path of a user's $HOME to use as the data mount point:

def run_containers_in_singularity_runtime(config, state, log, metadata, race_spec):
    import tempfile
    import shutil

    tmpdir_home = tempfile.mkdtemp(prefix="_sing_home_")
    tmpdir_work = tempfile.mkdtemp(prefix="{}/".format(tmpdir_home))
    homemount = "/".join(os.path.expanduser("~").split("/")[:2])
    cmdline = singularity_execution_cmdline(
        state,
        log,
        metadata,
        race_spec,
        dirs={"work": tmpdir_work, "home": tmpdir_home, "datamount": homemount},
    )

This means somewhere like LXPLUS8 you get

$ echo $HOME
/afs/cern.ch/user/f/feickert

and so

datamount="/afs"

while if you're somewhere like the Analysis Facility at UChicago you'd get

$ echo $HOME
/home/feickert

and so

datamount="/home"

Example

This works in simple setups, but it can cause problems given the way Singularity interacts with the local file system when mounting. For example, if a user makes a Python virtual environment, installs recast-atlas[local], and tries to run the examples/rome workflow at the UChicago AF with a script like

#!/bin/bash

export PACKTIVITY_CONTAINER_RUNTIME=singularity
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"

mkdir -p "${SINGULARITY_CACHEDIR}"

# Confirm workflow
recast catalogue ls
recast catalogue describe examples/rome
recast catalogue check examples/rome

recast run examples/rome --backend local --tag examples-rome

it will fail as the steps that the eventselection stage runs through in the container include

source /home/atlas/release_setup.sh

With the data mount set to /home the command packtivity runs would be something like

singularity exec -C  -B /home:/home --pwd /tmp/_sing_home_82sbnqfs/r6cndyad -H /tmp/_sing_home_82sbnqfs docker://reanahub/reana-demo-atlas-recast-eventselection:1.0 sh -c bash

which given how Singularity handles bind mounts means that the path /home/atlas in the container doesn't exist anymore as it has gotten clobbered by the UChicago filesystem's /home, causing the workflow to fail.

Proposed Solution or Idea

There should either be some way to set an alternative datamount in

def run_containers_in_singularity_runtime(config, state, log, metadata, race_spec):

(maybe via an environment variable?), or there should be an alternative method for determining the datamount (this seems hard to do in general). @lukasheinrich might have smarter ideas here.
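A minimal sketch of the environment-variable idea (the variable name PACKTIVITY_SINGULARITY_DATAMOUNT is hypothetical, not an existing packtivity option):

```python
import os


def determine_datamount():
    # Hypothetical override: PACKTIVITY_SINGULARITY_DATAMOUNT is not an
    # existing packtivity setting, just an illustration of the proposal.
    override = os.environ.get("PACKTIVITY_SINGULARITY_DATAMOUNT")
    if override:
        return override
    # Current behavior: first path component of the user's $HOME,
    # e.g. /afs on LXPLUS or /home at the UChicago AF.
    return "/".join(os.path.expanduser("~").split("/")[:2])
```

With such an override, a UChicago AF user hitting the /home clobbering described above could point the data mount somewhere harmless without code changes.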
