jhclark / ducttape

A workflow management system for researchers who heart Unix.

Home Page: http://jhclark.github.com/ducttape

License: Other

Makefile 0.20% Shell 1.83% Awk 0.03% Scala 78.08% TeX 0.69% Emacs Lisp 0.34% Vim Script 1.91% CSS 3.28% JavaScript 11.40% HTML 2.24%

ducttape's People

Contributors

armatthews, dowobeha, jhclark, neubig, nschneid

ducttape's Issues

Bound output that isn't created should trigger an error

When a task has a bound output and fails to create it, ducttape should report an error after the task completes. In the example below, the task writes to mydir/ rather than the bound path my_dir/, so the bound output is never created.

task foo > out=my_dir/file.txt {
   mkdir mydir
   echo "hello" > mydir/file.txt
}
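A minimal sketch of the requested post-task check, written as a standalone bash function. The name check_bound_outputs is hypothetical and not part of ducttape; it only assumes that the list of bound output paths is known once the task has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not ducttape's implementation): report every declared
# bound output that does not exist after the task has finished.
# Prints each missing path to stderr; returns 1 if any is missing, else 0.
check_bound_outputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "ERROR: bound output not created: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the task above, calling check_bound_outputs my_dir/file.txt after the task body has run would return a non-zero status, because the body only creates mydir/file.txt.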

Make multiple instances play nice together

When multiple instances of ducttape are run against the same destination directory, the first instance should be allowed to finish any tasks that it believes it is responsible for. The second instance should attempt those tasks only after the first instance has either completed them successfully or run them to completion and failed.
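One possible way to get this behavior is a per-task lock directory, since mkdir either creates the directory or fails atomically. The sketch below is an assumption for illustration only: the .lock location and the function names are hypothetical and do not reflect ducttape's actual layout.

```shell
#!/usr/bin/env bash
# Hypothetical per-task lock. mkdir (without -p) fails if the directory
# already exists, so only one instance can hold the lock at a time.
acquire_task_lock() {
  local task_dir="$1"
  local lock="$task_dir/.lock"   # assumed lock location
  mkdir -p "$task_dir" || return 1
  if mkdir "$lock" 2>/dev/null; then
    echo "$$" > "$lock/pid"      # record the owner for debugging
    return 0
  fi
  return 1
}

# Release the lock once the task has succeeded or failed.
release_task_lock() {
  rm -rf "$1/.lock"
}

# Block until the lock is free, polling every $2 seconds (default 1).
wait_for_task_lock() {
  local task_dir="$1" interval="${2:-1}"
  until acquire_task_lock "$task_dir"; do
    sleep "$interval"
  done
}
```

A second instance would call wait_for_task_lock before touching a task directory. This sketch does not handle a first instance that crashes while holding the lock; a real implementation would need some form of stale-lock detection.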

Unexpected behavior when input parameter is empty string

In the following example, the value of the input variable ${irstlm_dir} is set to the current working directory. This is confusing and unexpected, since the assigned value is the empty string.

The workaround is to declare irstlm_dir as a parameter (:: irstlm_dir) instead of an input (< irstlm_dir).

task moses < irstlm_dir="" {
  if [[ -n "${irstlm_dir}" && -d "${irstlm_dir}" ]]; then
    irstlm_flag="--with-irstlm=${irstlm_dir}"
  else
    irstlm_flag=""
  fi

  ./bjam ${irstlm_flag}
}

Keep an atomic version number for runs of the workflow

This will be implemented as a directory inside the $CONF directory, which stores:

  • a copy of the .tape file
  • a copy of the .conf file used to run the workflow
  • the command line invocation of ducttape
  • which vertices were planned
  • the PID and hostname of the managing process
  • the version number assigned to that run of the workflow
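The list above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the versions directory passed as the first argument, and the file names inside each version directory are hypothetical. Recording which vertices were planned is omitted because it depends on ducttape's planner.

```shell
#!/usr/bin/env bash
# Hypothetical: claim the next free version number by atomically creating
# <versions_dir>/<n>. mkdir without -p fails if the directory exists, so two
# concurrent runs cannot claim the same number.
next_version() {
  local versions_dir="$1"
  local n=1
  mkdir -p "$versions_dir" || return 1
  [ -w "$versions_dir" ] || return 1   # avoid looping forever on a read-only dir
  until mkdir "$versions_dir/$n" 2>/dev/null; do
    n=$((n + 1))
  done
  echo "$n"
}

# Hypothetical: record one run. Arguments: versions dir, .tape file, .conf
# file, followed by the command line invocation to store.
record_run() {
  local versions_dir="$1" tape="$2" conf="$3"
  shift 3
  local v dir
  v=$(next_version "$versions_dir") || return 1
  dir="$versions_dir/$v"
  cp "$tape" "$dir/workflow.tape"
  cp "$conf" "$dir/workflow.conf"
  printf '%s\n' "$*" > "$dir/invocation.txt"              # command line
  printf '%s %s\n' "$$" "$(hostname)" > "$dir/manager.txt" # PID and hostname
  echo "$v" > "$dir/version.txt"
  echo "$v"
}
```

Using directory creation to claim the number keeps the allocation atomic without a separate counter file.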

Allow branch grafts for config variables

Currently, branch grafts are supported only as extensions of task variable references, e.g.

in=$output@task_name[BranchName:branchToGraft]

However, a config variable may contain a branch point:

global { var=(X: x1=1 x2=2) }

The user might foreseeably want to use a graft with this:

in=$var[X:x1]

However, this last line is not currently allowed.

Allow output without using sandboxed dir, at least in simple use cases.

New users find it confusing that output files are buried in a deeply nested directory structure. This can be off-putting, especially when running very simple example files.

It would be nice to have a flag that allows directory sandboxing to be turned off.

Turning off sandboxing may not be possible or practical in complex use cases involving lots of branching. But at least for simple cases with little or no branching, this proposed feature would be nice to have.

Branch grafting fails for params

The following should succeed, but does not:

task preproc :: in=(DataSet: train=big.txt test=small.txt) {
   wc -l ${in}
}

task trainer :: in=$in@preproc[DataSet:train] {
   wc -l ${in}
}

Implement schedulers

The examples 1-shell.tape and 2-sge.tape in syntax/tutorial/6-schedulers should work, but don't.

Allow submitter variables to be omitted in tasks

Certain submitter variables may be needed only in special circumstances, or may have reasonable default values set by the submitter administrator. It would be nice if the submitter block could be written so that such variables are passed to the actual submitter executable iff they are defined. Tasks that use the submitter should then be allowed to omit these variables in order to get the environment-specific default value.

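submitter sge :: vmem .... {
  action run {
    ....
    # Quote the variable and test for non-emptiness; an unquoted empty
    # ${vmem} would make the [ test itself fail with a syntax error.
    if [ -n "${vmem}" ]; then
      echo "#$ -l virtual_free=${vmem}" >> $wrapper
    fi
    ...
  }
}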

Note that the following task does NOT define .vmem:

task hello :: .submitter=sge {
echo hello
}
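The "pass it only if defined" behavior can be sketched in plain bash, outside of any submitter block. The function name vmem_directive is hypothetical; the sketch only shows a quoted non-emptiness test that also works when the variable is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SGE directive only when a non-empty vmem
# value is given. ${1:-} keeps this safe under `set -u` when no argument
# is passed at all.
vmem_directive() {
  local vmem="${1:-}"
  if [ -n "$vmem" ]; then
    echo "#\$ -l virtual_free=${vmem}"
  fi
}
```

With a value such as 4g the function prints one directive line; with an empty or missing value it prints nothing, so the scheduler's default would apply.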

Use of params in submitter triggers duplicate runs

Steps to reproduce: running the following code produces the output below, in which hello_bad is checked, considered, and run twice.

submitter bad :: foo COMMANDS {
  action run {
    wrapper="ducttape_job.sh"
    echo "echo foo=${foo}" >> $wrapper
    echo "$COMMANDS" >> $wrapper
    bash $wrapper
  }
}

task hello_bad :: .submitter=bad .foo=bar {
  echo bad
}

submitter good :: COMMANDS {
  action run {
    wrapper="ducttape_job.sh"
    echo "$COMMANDS" >> $wrapper
    bash $wrapper
  }
}

task hello_good :: .submitter=good {
  echo good
}

DuctTape v0.2
By Jonathan Clark

Have 19 previous workflow versions
Using default one-off realization plan
Checking for completed steps...
Checking hello_good/baseline
Task incomplete hello_good/baseline: No previous output
Checking hello_bad/baseline
Task incomplete hello_bad/baseline: No previous output
Checking hello_bad/baseline
Task incomplete hello_bad/baseline: No previous output
Finding packages...
Found 0 packages
Checking for already built packages...
Checking inputs...
Work plan:
RUN: /path/to/hello_good/baseline
RUN: /path/to/hello_bad/baseline
Retreiving code and building...
Moving previous partial output to the attic...
Considering hello_good/baseline
Considering hello_bad/baseline
Considering hello_bad/baseline
Executing tasks...
Running hello_good in /path/to/hello_good/baseline
good
Running hello_bad in /path/to/hello_bad/baseline
foo=bar
bad
Running hello_bad in /path/to/hello_bad/baseline
foo=bar
bad
foo=bar
bad

SBT not detecting some unit test failures

This was originally observed as of revision af3cd96, before the deadlock detection code in the unpacked walker unit test was fixed.

To reproduce (for that revision):

sbt test

Notice that the test fails with a large stack trace, but sbt reports success.

Allow nested branch points

Currently nested branch points fail:

x=(BranchPointOne: first=(BranchPointTwo: second=x.txt))

This should be implemented by extending MetaHyperDags to support phantom vertices as internal vertices (not just as source vertices).

Implement visualization tool to view run status

Once a tape has launched, it would be nice to be able to determine (via a command-line tool, GUI, and/or web-based tool) which tasks have completed, which are in progress, and which have not yet started.

Switch-case statement for branching

The following is proposed syntax for switch-case statements in ducttape. It allows pattern matching on branch points that have already been defined by an upstream task:

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}

CLI needs a way to invalidate everything

Sometimes everything is messed up, and you don't need the old results moved to the attic. There should be a way to tell ducttape to delete things instead of moving them to the attic.

Allow shorthand syntax for importing config/global variables

The following is currently required to use a global or config variable:

global {
   foo=bar
}

task hi :: foo=${foo} {
   echo ${foo}
}

We should consider allowing the following shorthand syntax:

global {
  foo=bar
}

task hi :: ${foo} {
   echo ${foo}
}

Implement The Attic

The attic is a directory inside the $CONF directory where partial and invalidated outputs are stored in lieu of deleting them.
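A rough sketch of what moving output into the attic might look like. The layout attic/<version>/<task>/<realization> and the function name move_to_attic are assumptions for illustration, not ducttape's actual behavior.

```shell
#!/usr/bin/env bash
# Hypothetical: move a task realization directory (e.g. hello_bad/baseline)
# into <attic_dir>/<version>/<task>/ instead of deleting it.
move_to_attic() {
  local output_dir="$1" attic_dir="$2" version="$3"
  # Nothing to preserve if the task never produced output.
  [ -e "$output_dir" ] || return 0
  local task_name
  task_name=$(basename "$(dirname "$output_dir")")
  local dest="$attic_dir/$version/$task_name"
  mkdir -p "$dest" || return 1
  mv "$output_dir" "$dest/"
}
```

Keying the destination on the workflow version keeps partial output from different runs of the same task from colliding.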
