jhclark / ducttape
A workflow management system for researchers who heart Unix.
Home Page: http://jhclark.github.com/ducttape
License: Other
When a task has a bound output and the task fails to create that output, ducttape should throw an error after the task completes.
task foo > out=my_dir/file.txt {
mkdir mydir
echo "hello" > mydir/file.txt
}
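The requested check could look roughly like the following shell sketch. The helper name `check_bound_output` and the temporary directory are hypothetical and are not part of ducttape; the sketch only illustrates verifying that a declared output exists once the task body has returned.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ducttape): report an error when a task's
# declared output path is missing after the task body has finished.
check_bound_output() {
  local out="$1"
  if [ ! -e "$out" ]; then
    echo "ERROR: task completed but bound output is missing: $out" >&2
    return 1
  fi
}

# Reproduce the mismatch from the example: the output is bound to
# my_dir/file.txt, but the task body writes to mydir/file.txt.
workdir="$(mktemp -d)"
(
  cd "$workdir" || exit 1
  mkdir mydir
  echo "hello" > mydir/file.txt
)

if check_bound_output "$workdir/my_dir/file.txt" 2>/dev/null; then
  status="ok"
else
  status="missing"
fi
echo "bound output: $status"
rm -rf "$workdir"
```

Here the task body itself exits successfully, so only an explicit existence check after completion catches the missing file.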
A working versioner for git should be bundled with ducttape, so that users don't have to define one.
When multiple instances of ducttape are run targeted at the same destination directory, the first instance should be allowed to finish any tasks that it believes it is responsible for. The second instance of ducttape should only attempt to complete its tasks after the first instance either successfully completes those tasks or runs the task to completion and fails.
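One common way to get this behavior is a lock directory per task, since `mkdir` either creates the directory or fails atomically. The sketch below is an illustration under that assumption, not ducttape's implementation; the function names `acquire_lock` and `release_lock` and the `.lock` directory name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not ducttape's implementation): use an atomic mkdir
# as a per-task lock so that a second instance waits for the first.
acquire_lock() {
  local lock_dir="$1" max_tries="${2:-10}" tries=0
  # mkdir fails if the directory already exists, so only one caller can win.
  until mkdir "$lock_dir" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      return 1   # gave up waiting for the other instance
    fi
    sleep 1
  done
}

release_lock() {
  rmdir "$1"
}

task_dir="$(mktemp -d)"
lock="$task_dir/.lock"

acquire_lock "$lock" && first="acquired"
# A second attempt while the lock is held gives up after one try.
if acquire_lock "$lock" 1; then second="acquired"; else second="blocked"; fi
release_lock "$lock"
# After the first instance releases the lock, the second can proceed.
acquire_lock "$lock" 1 && third="acquired"
release_lock "$lock"
rm -rf "$task_dir"
echo "$first $second $third"
```

A real implementation would also need to handle stale locks left behind by an instance that crashed before releasing them.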
There is no documentation for ducttape --purge; the option is not described even in the output of ducttape --help.
In the following example, the value of the input variable ${irstlm_dir} is set to the current working directory. This is confusing and unexpected, since the assigned value is the empty string.
The workaround is to use a parameter (:: irstlm_dir) instead of an input (< irstlm_dir).
task moses < irstlm_dir="" {
  if [[ -n "${irstlm_dir}" && -d "${irstlm_dir}" ]]; then
    irstlm_flag="--with-irstlm=${irstlm_dir}"
  else
    irstlm_flag=""
  fi
  ./bjam ${irstlm_flag}
}
The tutorial example 3-hyper/6-branch-grafting.tape should work, but currently fails.
This will be implemented as a directory in the $CONF directory, which in turn stores:
DirectoryArchitect assignOutFile refers to work dir. I believe this is a bug, since work isn't used anymore.
Currently, branch grafts are allowed only as extensions of task variable references, e.g.:
in=$output@task_name[BranchName:branchToGraft]
However, a config variable may contain a branch point:
global { var=(X: x1=1 x2=2) }
The user might foreseeably want to use a graft with this:
in=$var[X:x1]
However, this last line is not currently allowed.
Unit test should succeed, but currently fails.
The following should work, but does not:
task first > out {
echo foo > ${out}
}
task second :: a=$out@first {
cat ${a}
}
The parser for the bash code blocks should detect this error at parse time:
task foo < // this is a valid ducttape comment
{ // this is an illegal comment in bash
... bash code ...
}
The sample submitter in 6-schedulers/2-sge-simple.tape uses bogus SGE options
Workflow is invalid otherwise.
The following throws an Exception. If it's going to fail, it should fail with a sensible error message:
task foo :: j=8 {
echo ${j
}
New users may find it confusing that output files are buried in a deeply nested directory structure. This is potentially off-putting, especially when running very simple example files.
It would be nice to have a flag that allows directory sandboxing to be turned off.
Turning off sandboxing may not be possible or practical in complex use cases involving lots of branching. But at least for simple cases with little or no branching, this proposed feature would be nice to have.
The following should succeed, but does not:
task preproc :: in=(DataSet: train=big.txt test=small.txt) {
wc -l ${in}
}
task trainer :: in=$in@preproc[DataSet:train] {
wc -l ${in}
}
The examples 1-shell.tape and 2-sge.tape in syntax/tutorial/6-schedulers should work, but don't.
The example in syntax/tutorial/3-hyper/7-glob-branch.tape should work but currently fails.
Certain submitter variables are needed only in special circumstances, or may have reasonable default values set by the submitter administrator. It would be nice if a submitter block could be written so that such variables are passed to the actual submitter executable if and only if they are defined. Tasks that use the submitter could then omit these variables in order to use the environment-specific default value.
submitter sge :: vmem .... {
  action run {
    ....
    # Quote the variable so the test stays valid when vmem is empty
    if [ "${vmem}" != "" ]; then
      echo "#$ -l virtual_free=${vmem}" >> $wrapper
    fi
    ...
  }
}
task hello :: .submitter=sge {
echo hello
}
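The conditional part of this proposal can be tried out in plain bash. The sketch below assumes a hypothetical helper `write_wrapper` that builds a wrapper script; only the `virtual_free` resource line and the `vmem` variable come from the example above, and nothing here reflects how ducttape itself passes submitter variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an SGE resource line to the wrapper script
# only when the optional variable was given a value.
write_wrapper() {
  local wrapper="$1" vmem="${2:-}"
  echo '#$ -S /bin/bash' > "$wrapper"
  # -n is true only for a non-empty string; quoting keeps the test valid
  # when vmem is empty or unset.
  if [ -n "$vmem" ]; then
    echo "#\$ -l virtual_free=${vmem}" >> "$wrapper"
  fi
  echo 'echo hello' >> "$wrapper"
}

with_vmem="$(mktemp)"
without_vmem="$(mktemp)"
write_wrapper "$with_vmem" 4g
write_wrapper "$without_vmem"
# grep -c exits non-zero when nothing matches, hence the "|| true".
lines_with="$(grep -c 'virtual_free' "$with_vmem")"
lines_without="$(grep -c 'virtual_free' "$without_vmem" || true)"
echo "with=$lines_with without=$lines_without"
rm -f "$with_vmem" "$without_vmem"
```

When the variable is omitted, the wrapper contains no resource line at all, so the scheduler's own default applies.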
Steps to reproduce: running the following tape produces the output below.
submitter bad :: foo COMMANDS {
action run {
wrapper="ducttape_job.sh"
echo "echo foo=${foo}" >> $wrapper
echo "$COMMANDS" >> $wrapper
bash $wrapper
}
}
task hello_bad :: .submitter=bad .foo=bar {
echo bad
}
submitter good :: COMMANDS {
action run {
wrapper="ducttape_job.sh"
echo "$COMMANDS" >> $wrapper
bash $wrapper
}
}
task hello_good :: .submitter=good {
echo good
}
DuctTape v0.2
By Jonathan Clark
Have 19 previous workflow versions
Using default one-off realization plan
Checking for completed steps...
Checking hello_good/baseline
Task incomplete hello_good/baseline: No previous output
Checking hello_bad/baseline
Task incomplete hello_bad/baseline: No previous output
Checking hello_bad/baseline
Task incomplete hello_bad/baseline: No previous output
Finding packages...
Found 0 packages
Checking for already built packages...
Checking inputs...
Work plan:
RUN: /path/to/hello_good/baseline
RUN: /path/to/hello_bad/baseline
Retreiving code and building...
Moving previous partial output to the attic...
Considering hello_good/baseline
Considering hello_bad/baseline
Considering hello_bad/baseline
Executing tasks...
Running hello_good in /path/to/hello_good/baseline
good
Running hello_bad in /path/to/hello_bad/baseline
foo=bar
bad
Running hello_bad in /path/to/hello_bad/baseline
foo=bar
bad
foo=bar
bad
The following should work, but doesn't:
config {
modelblocks=/path/to/modelblocks
giza_dir=/path/to/giza
}
To allow for the use of highly experimental code such as the tape linked below, allow the following:
https://github.com/jhclark/ducttape/blob/master/syntax/crazy-ideas/lane-advanced-moses-package.tape
task foo :: .versioner=git .url="git://path/to/software.git" .ref=HEAD {
}
The following should be allowed and tested:
global {
var1=$var2
var2=42
}
This was originally observed as of revision af3cd96, before the deadlock detection code in the unpacked walker unit test was fixed.
To reproduce (for that revision):
sbt test
Notice that the test fails with a large stack trace, but sbt reports success.
Currently nested branch points fail:
x=(BranchPointOne: first=(BranchPointTwo: second=x.txt))
This should be implemented by extending MetaHyperDags to support phantom vertices as internal vertices (not just as source vertices).
Once a tape has launched, it would be nice to be able to determine (via a command-line tool, GUI, and/or web-based tool) which tasks have completed, which are in progress, and which have not yet started.
The following is proposed syntax for switch-case statements in ducttape:
It allows for pattern matching on branch points that have already been previously defined by some upstream task.
switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "hello $in"
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "hello $in"
  }
}
The examples in syntax/tutorial/2-packages should work. They don't.
Sometimes everything is messed up, and you don't need the old results moved to the attic. There should be a way to tell ducttape to delete things instead of moving them to the attic.
Create unit test suite for prefixtree.scala
The following should work, but doesn't:
global {
modelblocks=/path/to/modelblocks
giza_dir=/path/to/giza
}
When a workflow has been run with non-comparable versions, a warning should be generated, both before execution (so that the user can invalidate old versions if desired) and when reports are generated.
It would be nice to allow the following:
task foo
:: dir=path/to/dir
a=$dir/a.txt
b=$dir/b.txt {
... Bash code ...
}
The following is currently required to use a global or config variable:
global {
foo=bar
}
task hi :: foo=${foo} {
echo ${foo}
}
We should consider allowing the following shorthand syntax:
global {
foo=bar
}
task hi :: ${foo} {
echo ${foo}
}
The following does not work, but should:
global {
  foo=bar
}
task hi {
  echo ${foo}
}
Instead of the directory in which the workflow file resides.
It would be cool if this worked:
task hello { echo hello }
It doesn't now. Is there a good reason for this?
The following should work, but does not:
task foo :: a=(A: a1 a2 a3) {
echo ${a}
}
plan blocks should be recognized and used as filters instead of the current implementation, which reads in hacky plan files.
The following should be allowed, but is not:
action wrap > wrapper {
echo "#$ -S /bin/bash" >> wrapper
}
The workaround is to use single quotes:
action wrap > wrapper {
echo '#$ -S /bin/bash' >> wrapper
}
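For comparison, plain bash treats the two quoting styles identically here, because a `$` followed by a space is not a parameter expansion. This suggests the rejection comes from how the tape is parsed rather than from bash itself. A minimal check, with variable names chosen only for illustration:

```shell
#!/usr/bin/env bash
# In bash, a "$" followed by a space is not a parameter expansion, so the
# double-quoted and single-quoted forms yield the same text.
double_quoted="#$ -S /bin/bash"
single_quoted='#$ -S /bin/bash'
if [ "$double_quoted" = "$single_quoted" ]; then
  echo "identical: $double_quoted"
fi
```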
The following should succeed, but fails:
task foo > a=a.txt {
echo "hello" > $a
}
task bar < in=$a@foo {
cat $in
}
When redirecting ducttape output to a file, the color is a nuisance, since it shows up as weird characters in the output file.
It would be nice if the color could be turned off automatically in this case, or at least via a flag.
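A typical way to do this in shell is to test whether stdout is a terminal with `[ -t 1 ]`. The sketch below is only an illustration: the function `print_status` and the override variable `DISABLE_COLOR` are hypothetical, and ducttape itself would need the equivalent check in its own code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit ANSI color codes only when stdout is a terminal.
# DISABLE_COLOR is an assumed override variable, standing in for a flag.
print_status() {
  local msg="$1"
  if [ -t 1 ] && [ -z "${DISABLE_COLOR:-}" ]; then
    printf '\033[32m%s\033[0m\n' "$msg"
  else
    printf '%s\n' "$msg"
  fi
}

# Command substitution connects stdout to a pipe, so no escapes are emitted.
captured="$(print_status "Running hello in /path/to/hello/baseline")"
echo "$captured"
```

The same function prints green text when called directly from an interactive terminal, and plain text when its output is redirected to a file or pipe.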
The following should work:
global {
foo=bar
}
The attic is a directory inside the $CONF directory where partial and invalidated outputs are stored in lieu of deleting them.
3-hyper/4-realization-plans.tape contains syntax for referring to branches of a sequence branch point that I believe should work, but it doesn't. Is this intentional?
https://github.com/jhclark/ducttape/blob/master/tutorial/03-04-realization-plans.tape
The following should be allowed, but is not currently accepted by the parser:
/* Hello, world
The following should be allowed, but currently fails.
task foo
bar
< x
{
....bash code...
}