
A new open source tool to run CWL workflows on LSF

License: Other


cwlexec's Introduction

cwlexec


[Video] CWLEXEC: A new open source tool to run CWL workflows on LSF

cwlexec runs CWL (Common Workflow Language) workflows on IBM Spectrum LSF. It is written in Java, tested with Java 8, and has the following features:

  • Tight integration with IBM® Spectrum LSF
  • Leverages LSF features (such as native container support)
  • Implements CWL draft-3 and v1.0 with a few exceptions (SoftwareRequirement, the include directive, and remote locations in File/Directory specifications)

Install

Installing cwlexec is a simple process of downloading and extracting the package.

Before downloading the package, make sure you have installed IBM Spectrum LSF 10.1.0.3 (or later) and a Java Runtime Environment (version 8), and that you have set the JAVA_HOME environment variable.

Download the latest release package from https://github.com/IBMSpectrumComputing/cwlexec/releases and extract the package.

tar xzvf cwlexec-0.2.2.tar.gz

Add the extracted directory cwlexec-0.2.2, which contains the cwlexec command, to the PATH environment variable.
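
For example, a minimal environment setup (the JRE location and extraction path below are hypothetical; adjust them to your system):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk    # hypothetical JRE location
export PATH="$PATH:/opt/cwlexec-0.2.2"          # hypothetical extraction directory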

Run

Make sure that you have sourced the LSF environment, then run cwlexec or cwlexec -h to view the help.
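
For example (the LSF installation path is hypothetical; a standard installation ships a profile.lsf under its conf directory):

source /path/to/lsf/conf/profile.lsf    # makes bsub, bwait, bresume, etc. available in this shell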

The following is a typical command to run a CWL workflow:

cwlexec [options] workflow-description-location [input-settings-location]
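
For example, combining option flags that appear elsewhere in this README (-w for the work directory top and -o for the output directory; the file names are hypothetical):

cwlexec -w /shared/cwl-workdir -o /shared/results echo.cwl echo-job.yml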

Build

You can build the package from source. Make sure that you have Apache Maven installed.

git clone https://github.com/IBMSpectrumComputing/cwlexec.git # Clone cwlexec repo
cd cwlexec         # Switch to source directory
mvn package        # build package

After the build, the cwlexec-0.2.2.tar.gz package is generated in the target directory.

Test

cd cwlexec
mvn clean package # build package and run unit tests
cd src/test/integration-test
./run.sh

All 127 conformance test cases pass, except src/test/integration-test/v1.0/envvar.cwl, due to an LSF limitation: LSF does not support propagating the $HOME variable.

Run your conformance tests

For instructions on running conformance tests refer to https://github.com/common-workflow-language/common-workflow-language/blob/master/CONFORMANCE_TESTS.md

Features

cwlexec has the following features:

bsub options support

By default, cwlexec submits steps/jobs without any extra bsub options. cwlexec accepts a separate configuration file in JSON format for workflow execution through the --exec-config|-c option. This enables users to specify LSF-specific options while keeping CWL definitions generic and portable.

cwlexec -c myconfig.json myflow.cwl myinput.yml
Field       Type     Description
queue       String   Specify the LSF queue option -q <queue>
project     String   Specify the LSF project option -P <project>
rerunnable  Boolean  Specify the LSF rerunnable option -r
app         String   Specify the LSF application profile option -app <application>
processors  String   Specify the number of tasks in the LSF job, the same as bsub -n <the number of tasks in the job>
res_req     String   Specify the LSF resource requirement option -R <res_req>. Beware that this option
                     overrides any ResourceRequirement defined in the CWL document. If res_req is not
                     specified in exec-config, LSF maps the ResourceRequirement specification as follows:
                       coresMin: bsub -n coresMin
                       coresMax: bsub -n coresMin,coresMax
                       ramMin:   bsub -R "mem>ramMin"
                       ramMax:   bsub -M ramMax
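
For instance, a tool carrying the following ResourceRequirement (a hypothetical CWL fragment) would, with no res_req override configured, be submitted roughly as bsub -n 2 -R "mem>1024" -M 4096:

requirements:
  ResourceRequirement:
    coresMin: 2      # maps to bsub -n 2
    ramMin: 1024     # maps to bsub -R "mem>1024"
    ramMax: 4096     # maps to bsub -M 4096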

The configuration file supports workflow-level and step-level settings:

Workflow setting: The options in this part are applied to every workflow step. For example, if a user specifies a queue in this part, cwlexec adds the -q queue_name option for each step/job.

Step setting: The options in this part are applied only to the current step/job. If the current step is a subworkflow, the options are applied to each step in the subworkflow.

If the same option appears in both the workflow-level and the step-level configuration, the step-level setting overrides the workflow-level setting.

Examples of execution configuration:

  • Specify a queue and enable jobs to be rerunnable for all steps:
{
    "queue": "high",
    "rerunnable": true
}
  • Specify a queue for all steps, specify an application profile for step1, and specify a resource requirement for step2:
{
    "queue": "high",
    "steps": {
        "step1": {
            "app": "dockerapp"
        },
        "step2": {
            "res_req": "select[type==X86_64] order[ut] rusage[mem=512MB:swp=1GB:tmp=500GB]"
        }
    }
}
  • Specify a queue for all steps, enable the rerunnable option, specify resource requirements for mainstep, and specify the application profile for one subworkflow step:
{
    "queue": "high",
    "steps": {
        "mainstep": {
            "rerunnable": false,
            "res_req": "select[type==X86_64] order[ut] rusage[mem=512MB:swp=1GB:tmp=500GB]"
        },
        "subflow/step1": {
            "app": "dockerapp"
        }
    }
}

Docker Support

CWL's DockerRequirement indicates that a workflow component should be run in a Docker container, and specifies how to fetch or build the image.

Before you start, ensure you configure the following for your environment:

Docker Engine version 1.12 or later must be installed on an LSF server host. The Docker daemon must be started on this host and must be able to start containers successfully.

cwlexec has two ways to submit Docker jobs in LSF: use the bsub -app option to submit a job to a Docker application profile, or use bsub -R <res_req> to specify a Docker resource and use "docker run" directly.

Use a Docker application profile to submit jobs

The LSF administrator must complete the following configuration steps as prerequisites:

Note:

  • Use $LSB_CONTAINER_IMAGE in your application profile configuration; do not hard-code your image name. If $LSB_CONTAINER_IMAGE comes from a Docker registry, prefix it with your registry server path, for example image(register_server_path/$LSB_CONTAINER_IMAGE).
  • Specify your shell script for preparing Docker options in CONTAINER, such as (@/path/dockerOptions.sh). This script must be in a shared directory.
  • Create your dockerOptions.sh with the following content:
#!/bin/bash
# Emit each token of $LSB_CONTAINER_OPTIONS on its own line;
# LSF collects these lines as "docker run" options for the container job
for OPTION in $LSB_CONTAINER_OPTIONS
do
    echo $OPTION
done

cwlexec passes volume mappings (the work directory, inputs, outputs, and $HOME) and any envDef entries from EnvVarRequirement to the Docker job through the $LSB_CONTAINER_OPTIONS environment variable. You can add more options in dockerOptions.sh as needed, for example:

…
echo --rm
echo --net=host
echo --ipc=host
…

The end user must specify the Docker application profile in the app field of the exec-config file, for example:

app.json
{
    "steps": {
        "step1": {
            "app": "dockerapp"
        }
    }
}

Run workflow

cwlexec -c app.json docker.cwl docker-job.yml

Note: The Docker image must already be available and pullable with docker pull.

Specify a Docker resource to submit jobs

The LSF administrator must first complete the configuration steps that define the docker resource in the cluster.

End users must specify the docker resource in the res_req field of the exec-config file, for example:

res.json
{
    "steps": {
        "step1": {
            "res_req": "docker"
        }
    }
}

Run workflow

./cwlexec -c res.json docker.cwl docker-job.yml

Note: With this approach, the job submission user must be in the docker user group, which is a security concern; prefer bsub -app for submitting Docker jobs.

Interrupt an executing workflow

You can use Ctrl+C to interrupt an executing workflow. When cwlexec catches this signal, it exits with code 130 and the executing workflow exits: already-submitted jobs continue to run, but no new jobs are submitted.

Rerun a workflow

A workflow exits as soon as any of its steps exits. You can rerun the exited workflow with the workflow ID, and the workflow is rerun from the failed step.

cwlexec -r|--rerun [--debug] <workflow-id>

If the workflow has running jobs when it is rerun, the command prompts the user to kill them:

The workflow has running jobs. Do you want to kill them before rerunning the workflow? (Y/N)

Choose "Yes" to kill all running jobs before rerunning the workflow. Choose "No" to make the command exit without doing anything.

Post-failure script support

You can configure a post-failure script for a workflow. When a step exits, the post-failure script executes to try to recover the job.

  • The post-failure script can be configured at the step level or the workflow level, just like the bsub options in the exec-config configuration file.

  • When the script fails (that is, exits with a non-zero code), the exit code of the step is still the exit code from the job, not the one from the script.

The following environment variables are passed to the post-failure script:

Variable Description
CWLEXEC_JOB_ID job ID
CWLEXEC_JOB_BSUB bsub command
CWLEXEC_JOB_CMD job command
CWLEXEC_JOB_CWD job working directory
CWLEXEC_JOB_OUTDIR job output directory
CWLEXEC_JOB_RESREQ job resource requirement
CWLEXEC_RETRY_NUM number of retry attempts

To set up a post-failure script:

  • Create your post-failure script, for example, /path/recoverscript.sh
#!/bin/sh
JOB_ID=$CWLEXEC_JOB_ID
brequeue -aH $JOB_ID              # requeue the failed job in a held state
bmod -Z "job command" $JOB_ID     # modify the job's command
bresume $JOB_ID                   # release the held job so it reruns
  • Configure the post-failure script in the exec-config file, for example, postscript.json
{
    ...
    "post-failure-script": {
        "script": "/path/recoverscript.sh",
        "timeout": 10,
        "retry": 3
    }
    ...
}
Field   Required Description
script  Yes      The absolute path of the post-failure script
timeout No       The timeout of the post-failure script, in seconds. Default: 10 seconds
retry   No       The maximum number of retries. Default: 1
  • Run your workflow with post-failure script support
cwlexec -c postscript.json workflow.cwl workflow-job.yml

List executed workflows

The cwlexec --list|-l command lists information for all your submitted workflows, and the cwlexec --list|-l <workflow-id> command displays detailed information for a single workflow.

Field Description
ID The unique identifier of the workflow
Name The name of the workflow
Submit Time The time the workflow was submitted
Start Time The time the workflow started to execute
End Time The time the workflow finished
Exit State The workflow exit state, DONE or EXITED
Exit Code 0~255
Working Directory The workflow working directory
Output Directory The workflow output directory
CWL File The path of the workflow description file
Input Setting Files The path of the workflow input settings file
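
For example (the workflow ID below is hypothetical):

cwlexec -l                                          # one summary per submitted workflow
cwlexec -l ca52aa98-d283-4610-afde-b56a6b8e1ad9     # detailed view of a single workflow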

Exit Code Definition

If all steps of the workflow are done, the workflow is successful and its exit code is 0. By default, a workflow step is treated as done if its exit code is 0 and its outputs match the output schema; otherwise, the step is treated as exited.

If a user defines successCodes for a workflow step, the step is treated as done when its exit code is in the successCodes and its outputs match the output schema; otherwise, the step is treated as exited.

If any step in a workflow exits, the workflow exits and the command's exit code is the exit code of the exited step. When the workflow exits, all submitted jobs continue to run, but no new jobs are submitted.

Exit Code Description
0 The workflow is done
33 There is an unsupported feature in the workflow
130 The user pressed Ctrl+C to interrupt the workflow
250 The workflow input/output cannot be found
251 Failed to parse the workflow
252 Failed to load the workflow inputs
253 Failed to evaluate an expression in the workflow
254 Failed to capture the workflow/step output after the workflow/step is done
255 System exception; for example, the command arguments are wrong, the CWL workflow description file cannot be found, or the bsub/bwait command cannot be found
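
A minimal wrapper sketch that branches on these exit codes (the workflow and input file names are hypothetical):

#!/bin/sh
cwlexec myflow.cwl myinput.yml
rc=$?
case $rc in
    0)   echo "workflow done" ;;
    130) echo "workflow interrupted by Ctrl+C" ;;
    *)   echo "workflow exited with code $rc" ;;
esac
exit $rc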

Implementation

An overview of how cwlexec is implemented

Overview

cwlexec includes three packages:

  • com.ibm.spectrumcomputing.cwl.model: defines the Java beans for CWL documents
  • com.ibm.spectrumcomputing.cwl.parser: parses a CWL document into a Java object and binds the input settings to the parsed object
  • com.ibm.spectrumcomputing.cwl.exec: executes the workflow

Sequence Diagram

Working Directory

The workflow work directory stores intermediate files of the workflow execution. It must be a directory shared across the LSF cluster.

Each workflow's work directory is created under the work directory top that the user specifies with -w. By default, the top directory is $HOME/cwl-workdir. The work directory has the following structure:

WORKDIR_TOP
  |-workflow_id
      |- inputs
      |- ...
      |- step_id
      |    |- inputs
      |    |- ...
      |    |- output_id
      |    |- ...
      |- ...
  |- ...

The workflow ID is a globally unique ID (UUID).
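
For example (the shared path below is hypothetical):

cwlexec -w /shared/cwl-workdir myflow.cwl myinput.yml    # the -w top directory must be shared across the LSF cluster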

Record the workflow execution states

Workflow information and execution states are recorded in an embedded HyperSQL database. For each user of the cwlexec command, the embedded database records are persisted under $HOME/.cwlexec.

Two tables are used to persist the workflow records (see the ER diagram).

Workflow Execution

The execution sequence of a CWL workflow is as follows:

  1. Parse the CWL document into a Java object and resolve the dependencies for each step.
  2. Load the input settings and bind them to the parsed object (if needed).
  3. Evaluate the expressions in the parsed object.
  4. Traverse the parsed object and submit all of the workflow steps (the LSF primitives combined here are sketched after this list).
    • CommandLineTool steps are handled in one of three ways:
      1. Independent step: Build the step command from the step inputs and arguments, then submit (bsub) the step with the command. Set the step to running, record the LSF job ID, and send a start event (including the step job ID) to its main workflow.
      2. A step whose dependencies come from the main workflow inputs: Build the step command from the step inputs, arguments, and dependent main workflow inputs, then submit (bsub) the step with the command. Set the step to running, record the LSF job ID, and send a start event (including the step job ID) to its main workflow.
      3. A step whose dependencies come from other workflow steps' outputs: Create a placeholder execution script (a shell script with blank content) for this step, then submit (bsub -H) the step with the placeholder execution script. Set the step to waiting and record the LSF job ID.
    • If the step is a subworkflow, repeat the previous step.
    • If the step is a scatter step, create a placeholder script (exit 0) for it, then submit this step (bsub -H). Set the step to waiting and record the LSF job ID. After the scatter is done, change the step state to done, send a start event to its main workflow, then resume (bresume) this step.
  5. After the main workflow receives a step start event, it broadcasts the event to its waiting steps. When a step receives the start event, it checks its dependencies. If all the dependencies are ready (start events for all of them have been received), the step waits (bwait -w) for them to finish. After the wait succeeds, the step validates the dependencies' outputs. If all outputs are valid, it builds its command from those outputs and fills the command into the corresponding placeholder script. The step then sends a done event for each dependency step to its main workflow and is resumed (bresume). Finally, the step is set to running and a start event is sent to its main workflow.
  6. After the main workflow receives a step done event, the workflow counts the done steps. If all steps in the workflow are done, the workflow itself is done.
  7. If any wait (bwait) action fails, the step sends an exit event (including the exited step's job ID) to its main workflow.
  8. If any output validation fails, the step sends an exit event (including the exited step's job ID) to its main workflow.
  9. After the main workflow receives a step exit event, the workflow exits and all waiting steps are killed (bkill); running jobs continue to run.
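
The LSF primitives that this sequence combines can be sketched in isolation (a minimal illustration with hypothetical job IDs, paths, and command; this is not cwlexec source):

bsub -H /shared/workdir/step2/placeholder.sh    # submit the dependent step on hold; LSF replies, e.g., Job <1002>
bwait -w 'done(1001)'                           # block until the dependency job 1001 finishes
echo 'mytool --in dep-output.txt' > /shared/workdir/step2/placeholder.sh    # fill in the real command
bresume 1002                                    # release the held step so LSF executes it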

Community Contribution Requirement

Community contributions to this repository must follow the IBM Developer's Certificate of Origin (DCO) process, and are accepted only through GitHub pull requests:

  1. Contributor proposes new code to the community.

  2. Contributor signs off on the contribution; that is, attaches the DCO to certify that the contributor either originated the code or has the right to publish it. (The DCO template is included in this package.)

  3. IBM Spectrum LSF Development reviews the contribution to check for: i) applicability and relevancy of the functional content, and ii) any obvious issues.

  4. If accepted, the contribution is posted. If rejected, work goes back to the contributor and is not merged.

cwlexec's People

Contributors

dependabot[bot], jxngao, kittjx, liangasdfgmail, liuboxa, qiangjia, skeeey


cwlexec's Issues

Optional array workflow input does not use tool's default value

Hi,

We have a simple example workflow (foo_wf.cwl) that takes an optional string array as input. It calls foo.cwl which sets a default for the input array. This workflow works when given the string array as input, but when given no inputs it throws the following error in errfile.txt:

com.ibm.spectrumcomputing.cwl.model.process.parameter.type.NullValue cannot be cast to java.util.List

This only happens at the workflow level with the optional array input. I believe that since it is optional for the workflow, it should pass null as input to the step calling foo.cwl, which should then use the default array for the command line tool input because the input is null. This is the behavior for a non-array optional input.

OptionalArrayInputError.tar.gz

Default queue is used in a workflow even though another queue is specified in a config file

I have a simple foo.sh script which is wrapped with foo.cwl. foo_wf.cwl is a workflow which scatters over foo.cwl. When specifying a queue in a config file, all scatter jobs correctly hit that queue, but the final scatter gather job is sent to my default queue, not the queue specified in my config file.

See attached for a fully reproducible example (aside from queues 'priority' and 'short' being specified).

WrongQueueError.tar.gz

Fail to write scatter values upon scattering on files

We have a 3-step pipeline (map, foo, reduce) where map creates N files, foo transforms a file into another file, and reduce cats all files into one. When we scatter with CWLEXEC on foo, it only performs foo on one file and delivers it (with success) to reduce, even though there is a Java error involved:

16:28:26.595 default [pool-4-thread-2] ERROR c.i.s.c.e.u.outputs.OutputsCapturer - Fail to write scatter values

java.nio.file.FileAlreadyExistsException: /home/jmichael/CWLEXEC/FailToWriteScatterValuesError/workdir/a5efe1f2-2a7d-42d9-906b-049c154ffff2/foo/1.foo.txt
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.createFile(Files.java:632)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.writeScatterValues(OutputsCapturer.java:459)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.findScatterOutputValue(OutputsCapturer.java:444)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.findScatterOuputValue(OutputsCapturer.java:262)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputsByType(OutputsCapturer.java:181)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputs(OutputsCapturer.java:94)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.captureStepOutputs(LSFBwaitExecutorTask.java:373)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.makeStepSuccessful(LSFBwaitExecutorTask.java:142)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.waitSteps(LSFBwaitExecutorTask.java:132)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.run(LSFBwaitExecutorTask.java:97)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

FailToWriteScatterValuesError.tar.gz

Cannot use Array type with inputs

For tools which require a flag before each item in an array, we can use the methods described in the array-inputs tutorial. This works well with cwltool, but the flags do not seem to get passed to the baseCommand in cwlexec.

Attached is an example command 'foo' which takes multiple --INPUT files and cats them all to a single --OUTPUT file. With cwltool it works but with cwlexec it does not pass any of the --INPUT flags.

MultiInputError.tar.gz

Quotes not recognized in baseCommand

We have a simple workflow which uses baseCommand: [awk, '{print $2}'], and the interpreted command does not keep the quotes. Instead, it interprets the baseCommand as "baseCommand" : [ "awk", "{print $2}" ] (line 40 in the attached outfile.txt) and attempts to execute awk {print $2} (line 148), which fails.
BaseCommandError.tar.gz

Works with Jsrun?

Hello,
Could you say a little about how this works with jsrun? I am working on the Summit supercomputer at ORNL. Has anyone run this on Summit?

Thanks.

Error: Could not find or load main class

Command used:

./cwlexec /home/johnsoni/Innovation-Pipeline/workflows/QC/qc_workflow_wo_waltz.cwl ~/Innovation-Pipeline/test/workflows/EZ_QC_test.yaml
Error: Could not find or load main class com.ibm.spectrumcomputing.cwl.Application

I've downloaded and extracted the 0.2.2 release, is there any advice on this error?

Undefined javascript variable error

Working on #34 again and I can now reproduce the same error in my larger workflow.

Step 1 is split_reads which correctly scatters over the files now after the workaround proposed.

Step 2 scatters over those files and tries to generate a string for output_file from one of the files generated as part of step 1. I am using the following JavaScript to generate this string and it works in cwltool so I had assumed it was the correct approach:

      output_file:
        valueFrom: |
          ${  
            var s = inputs.R1_file.nameroot;
            s = s.replace(".R1","");
            return s + ".out";
          }   

However, the error I get with cwlexec is:

[var runtime={"tmpdir":"/home/jmichael/cwl-workdir/79cb5eaa-3438-497f-8be8-85fd9a5523c7","tmpdirSize":"15005232752754688","outdirSize":"15005232752754688","cores":"1","outdir":"/home/jmichael/cwl-workdir/79cb5eaa-3438-497f-8be8-85fd9a5523c7","ram":"1024"};, var inputs={"R
12:20:50.855 default [pool-5-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - Failed to wait for job process_reads <66124407>, Failed to evaluate the expression "${
  var s = inputs.R1_file.nameroot;
  s = s.replace(".R1","");
  return s + ".out";
}
": TypeError: Cannot read property "replace" from undefined in <eval> at line number 3

so it looks like it is not correctly using inputs.R1_file. Am I using the correct approach here? It appears to be the same general issue as in #34, but I don't know that I can use the same workaround, since I don't take anything in the inputs section in the scatter, so I can't use that as a source.

UndefinedVariableError.tar.gz

Undefined file on scatter

In an attempt to overcome #20 (using the same general example as in #33), I have moved the scatter down to the lowest level. However, I found that when building a filename with valueFrom inside the scatter, it returns undefined.

steps:
  foo:
    run: foo.cwl
    scatter: input_file
    in: 
      input_file: input_files
      output_filename:
        valueFrom: ${return inputs.input_file.nameroot + ".out";}
    out:
      [output_file]

However, this is not reproducible in cwltool where the filename gets correctly built. This seems like a cwlexec specific issue, but it could also be that I am not using best practices when building a string from within a scatter.

UndefinedFile.tar.gz

Failed to "Fill out the scatter gather result in the script"

command (just the usual):

cwlexec -p --workdir /home/user/<username>/output/ TranscriptsAnnotation-i5only-wf.cwl TranscriptsAnnotation-i5only-wf.test.job.yaml

cwlexec fails at the scattered functionalAnalysis step and reports the following:

[15:49:15.857] INFO  - The step (functionalAnalysis/runInterproscan) scatter of 1 jobs.
[15:49:15.857] INFO  - Started job (functionalAnalysis/runInterproscan_1) with
bsub \
-cwd \
/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/scatter1 \
-o \
%J_out \
-e \
%J_err \
-env \
all,TMPDIR=/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4 \
-R \
mem > 8192 \
-n \
3 \
/bin/sh -c 'interproscan.sh --outfile /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/transcript-01.p2_transcript-01.p2.i5_annotations --disable-precalc --goterms --pathways --tempdir /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan --input /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/splitSeqs/transcript-01.p2_transcript-01.p2.fasta --applications PfamA --formats TSV'
[15:49:15.877] INFO  - Job (functionalAnalysis/runInterproscan_1) was submitted. Job <1421> is submitted to default queue <normal>.
[15:49:15.877] INFO  - Started to wait for jobs by
bwait \
-w \
done(1421)
[15:50:07.854] INFO  - Fill out the scatter gather result in the script /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan
[15:50:07.855] ERROR - Failed to wait for job functionalAnalysis/runInterproscan <1415>, Failed to write file "/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan": /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan (No such file or directory)
[15:50:07.855] ERROR - The workflow (TranscriptsAnnotation-i5only-wf) exited with <255>.
[15:50:07.855] WARN  - killing waiting job (functionalAnalysis/runInterproscan) <1415>.
[15:50:07.855] WARN  - killing waiting job (functionalAnalysis/combineResults) <1418>.

Optional workflow input with tool-level default evaluates to null

Hi,

I have a simple example (attached) where the input to a workflow is an optional string. The command line tool has a default value for this input. When I use no outputs or a fixed output file name (as in #19), it works. If I modify the command line tool to now glob for $(inputs.foo) to return a file matching the name of the input string, it evaluates the input to null and fails to build the command in this instance. It should be evaluating the default input for the tool if the optional input for the workflow is not given.

EvalOptionalWorkflowInputError.tar.gz

basename is not recognized in JS evaluation of directory

The Directory specification requires a basename attribute but this is currently being evaluated as a null object by cwlexec since it is not included in the fields.

The attached shows a simple example of attempting to evaluate a basename of a directory where all required fields except basename are included so cwlexec fails:

09:50:13.340 default [pool-4-thread-1] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluate js expression "$(inputs.out_dir.basename)" with context
[var inputs={"sample":"MySample","out_dir":{"location":"/research/rgs01/home/clusterHome/kbrown1/DirectoryBasenameError/outdir/MySample","path":"/home/kbrown1/DirectoryBasenameError/workdir/d9623834-8552-4554-b8b5-c58184d22730/MySample","srcPath":"/research/rgs01/home/clusterHome/kbrown1/DirectoryBasenameError/outdir/MySample","listing":[],"class":"Directory"}};]
09:50:13.353 default [pool-4-thread-1] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.out_dir.basename)" to A null object
09:50:13.353 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - Failed to wait for job touch_sample <42464255>, null
09:50:13.354 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - The exception stacks:
java.lang.NullPointerException: null
    at java.lang.String.replace(String.java:2240)
    at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.parsePlaceholder(JSEvaluator.java:136)
    at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.parseExpr(JSEvaluator.java:171)
    at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.evaluate(JSEvaluator.java:56)
    at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.CommandOutputBindingEvaluator.evalGlob(CommandOutputBindingEvaluator.java:64)
    at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputs(OutputsCapturer.java:92)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.captureStepOutputs(LSFBwaitExecutorTask.java:376)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.makeStepSuccessful(LSFBwaitExecutorTask.java:140)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.waitSteps(LSFBwaitExecutorTask.java:133)
    at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.run(LSFBwaitExecutorTask.java:98)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

DirectoryBasenameError.tar.gz

Regression after June 7th +29 CWL conformance tests fail

On June 7th we ran the CWL conformance tests against 4ea1396 and there were 20 failures (same as before)

Today we ran the CWL conformance tests against the latest code 023b1b5 and there are 48 failures (28 more)

https://ci.commonwl.org/job/cwlexec/96/console

Newly failed tests

Collecting output for an array of glob patterns based on inputs fails

Hi,

We have a simple example that takes a string as input and outputs two files with names based on the input string. When using glob to find both files based on the name pattern $(inputs.name)_1.txt and $(inputs.name)_2.txt, the glob seems to interpret these as literal strings rather than evaluating $(inputs.name) in each case:

outfile.txt

182   "outputBinding" : {
183     "glob" : {
184       "patterns" : [ "$(inputs.name)_1.txt", "$(inputs.name)_2.txt" ],

then returns:

218 {
219   "out_file" : [ ]
220 }

If $(inputs.name) is changed to the exact string, it works, but it should evaluate these for pattern matching.

GlobOutputArrayError.tar.gz

CWLEXEC fails with Hibernate exception for workflows with more than 20 steps

We bump into this issue whenever we try to execute workflows with more than 20 steps in CWLEXEC, for instance 21 steps. Also, sometimes CWLEXEC hangs after it has reported the error. Could it be that sessions are not closed properly after each database transaction? We created a simple test workflow so it becomes easy for you to reproduce.

Here is the workflow:
test-workflow.zip

This is the command we are running:

cwlexec -debug -L -p -w <work-dir> -o <output-dir> test-workflow.cwl

CWLEXEC reports the following and exits or sometimes just hangs:

17:14:04.579 default [pool-3-thread-20] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_20) was submitted. Job <6076314> is submitted to default queue <research-rh74>.
17:14:04.579 default [pool-3-thread-16] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_16) was submitted. Job <6076305> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-12] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_12) was submitted. Job <6076312> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-14] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_14) was submitted. Job <6076313> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-21] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_21) was submitted. Job <6076319> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-18] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_18) was submitted. Job <6076322> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-17] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_17) was submitted. Job <6076317> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-13] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_13) was submitted. Job <6076321> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-2] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_2) was submitted. Job <6076316> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-5] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_5) was submitted. Job <6076310> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-1] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_1) was submitted. Job <6076324> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-15] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_15) was submitted. Job <6076318> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-4] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_4) was submitted. Job <6076320> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-19] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_19) was submitted. Job <6076307> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-3] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_3) was submitted. Job <6076325> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-7] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_7) was submitted. Job <6076306> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-6] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_6) was submitted. Job <6076315> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-11] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_11) was submitted. Job <6076309> is submitted to default queue <research-rh74>.
17:14:04.581 default [pool-3-thread-10] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_10) was submitted. Job <6076311> is submitted to default queue <research-rh74>.
17:14:04.581 default [pool-3-thread-9] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_9) was submitted. Job <6076323> is submitted to default queue <research-rh74>.
17:14:04.584 default [pool-3-thread-8] INFO  c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_8) was submitted. Job <6076308> is submitted to default queue <research-rh74>.
17:14:04.737 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Failed to submit the step touch_6, The internal connection pool has reached its maximum size and no connection is currently available!
17:14:04.743 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFBsubExecutorTask - The exception stacks:
org.hibernate.HibernateException: The internal connection pool has reached its maximum size and no connection is currently available!
	at org.hibernate.engine.jdbc.connections.internal.PooledConnections.poll(PooledConnections.java:82)
	at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.getConnection(DriverManagerConnectionProviderImpl.java:186)
	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:35)
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:106)
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:136)
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:254)
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:262)
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:214)
	at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:56)
	at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:409)
	at com.ibm.spectrumcomputing.cwl.exec.service.CWLInstanceService.updateCWLProcessInstance(CWLInstanceService.java:83)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBsubExecutorTask.runStep(LSFBsubExecutorTask.java:108)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBsubExecutorTask.run(LSFBsubExecutorTask.java:56)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17:14:04.744 default [pool-3-thread-6] DEBUG c.i.s.c.e.e.lsf.LSFWorkflowRunner - broadcast event EXIT, touch_6
17:14:04.746 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFWorkflowRunner - The workflow (test-wf) exited with <255>.

Fails to run when input CWLType is an array of File, Directory

Hi,

We have a simple tool (attached) that performs echo on an input, and accepts either a File or Directory. When we run it, it returns the error "too many types for one paramter". I've tested this with an array that is [File, string], [Directory, string], or [string, int] and it seems to work, but the combination of [File, Directory] throws this error.

TooManyTypesError.tar.gz

cwlexec fails to fail when a required input is not provided.

A simple workflow which requires a string runs even when no input is provided. cwltool fails to run on the same example as expected.

[jmichael(BASH)@nodecn011]: cwlexec foo.cwl 
[15:51:20.864] INFO  - Workflow ID: ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO  - Name: foo
[15:51:20.865] INFO  - Description file path: /research/rgs01/home/clusterHome/jmichael/cwlexec_bugs/non-optional-inputs/foo.cwl
[15:51:20.865] INFO  - Output directory: /home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO  - Work directory: /home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO  - Workflow "foo" started to execute.
[15:51:20.870] INFO  - Started job (foo) with
bsub \
-cwd \
/home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9 \
-o \
%J_out \
-e \
%J_err \
-env \
TMPDIR=/home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9 \
echo
[15:51:20.993] INFO  - Job (foo) was submitted. Job <61886769> is submitted to queue <normal>.
[15:51:21.009] INFO  - Started to wait for jobs by
bwait \
-w \
done(61886769)
[15:51:25.188] INFO  - The job (foo) <61886769> is done with stdout from LSF:

{ }
[jmichael(BASH)@nodecn011]: echo $?
0
[jmichael(BASH)@nodecn011]: cat foo.cwl 
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

baseCommand: echo

inputs:
  foo:
    type: string
    inputBinding:
      position: 1

outputs: []
[jmichael(BASH)@nodecn011]: cwltool foo.cwl 
/hpcf/apps/python/install/3.5.2/bin/cwltool 1.0.20180525185854
Resolved 'foo.cwl' to 'file:///research/rgs01/home/clusterHome/jmichael/cwlexec_bugs/non-optional-inputs/foo.cwl'
usage: foo.cwl [-h] --foo FOO [job_order]
foo.cwl: error: argument --foo is required
[jmichael(BASH)@nodecn011]: cwlexec --version
0.2.0
[jmichael(BASH)@nodecn011]: 

Unable to evaluate record types in InitialWorkDirRequirement

Hi,

We have a simple command line tool (attached) that takes a record type as input with two strings, one for a file name and one for a directory name. Using InitialWorkDirRequirement should set up the directory for use by the command, but it fails to evaluate the record in this context.

At line 202 in attached outfile.txt:
09:23:29.529 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.parameters.out_dir)" to A null object

However, it is able to parse the record properly for creating the base command, just not for the above step.
InitialWorkDirError.tar.gz

ExpressionTool cannot return multiple arrays

Here is a simple CWL script "int_to_array.cwl" to convert an int to an int array and a string array:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  number:
    type: int
    label: a positive integer

outputs:
  int_array:
    type: int[]
  str_array:
    type: string[]

expression: |
  ${ var s_arr = [], i_arr = [];
     for (var i = 0; i < inputs.number; i++) {
       s_arr.push('hello' + i + '.txt');
       i_arr.push(i);
     }
     return { "int_array": i_arr, "str_array": s_arr };
  }

This works with cwltool but does not work with cwlexec-0.2.2:

$ cwltool int_to_array.cwl int_to_array.yml
/research/rgs01/project_space/yu3grp/software_JY/yu3grp/conda_env/yulab_env/bin/cwltool 1.0.20190228155703
Resolved 'int_to_array.cwl' to 'file:///research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.cwl'
{
    "int_array": [0, 1, 2, 3],
    "str_array": ["hello0.txt", "hello1.txt", "hello2.txt", "hello3.txt"]
}
Final process status is success

$ cwlexec int_to_array.cwl int_to_array.yml
[17:24:24.592] INFO - Workflow ID: 20fed44a-9f27-4797-b886-28846559711f
[17:24:24.593] INFO - Name: int_to_array
[17:24:24.593] INFO - Description file path: /research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.cwl
[17:24:24.594] INFO - Input settings file path: /research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.yml
[17:24:24.594] INFO - Output directory: /home/lding/cwl-workdir/20fed44a-9f27-4797-b886-28846559711f
[17:24:24.594] INFO - Work directory: /home/lding/cwl-workdir/20fed44a-9f27-4797-b886-28846559711f
[17:24:24.594] INFO - Workflow "int_to_array" started to execute.
[17:24:24.871] INFO - Job (int_to_array) was submitted. Job <78813896> is submitted to queue .
[17:24:29.446] ERROR - Failed to wait for job int_to_array <78813896>, java.lang.String cannot be cast to java.lang.Long
[17:24:29.446] ERROR - The job (int_to_array) exited.

Arrays are not scattered when passed to subworkflows

Hi,

We have a simple example workflow that seems to be passing array inputs without scattering them to lower level scripts

top_workflow.cwl calls -> subworkflow.cwl calls -> echocat.cwl calls -> echocat.sh which takes 3 inputs (string, file, file).

subworkflow.cwl just has a single step which takes a string input and a File[] input and passes them to the command line tool. This works fine with CWLEXEC. When I use top_workflow.cwl to scatter over an array of strings or an array of arrays of files, they do not get scattered but are instead passed directly to the command line tool, where it fails because the shell script cannot use them this way: the string array arrives as a single string, and the File array of arrays as a single array. Attached is the example; in the output.txt file at line 646 the command is built incorrectly.

SubworkflowArrayScatterError.tar.gz

$(inputs.other_file.nameroot) evaluates to Null upon file scatter

We have a simple example where we are trying to concatenate english.txt with french.txt, german.txt, and spanish.txt via scatter using $(inputs.other_file.nameroot) in the CommandLineTool being scattered. This works as desired in cwltool, but in CWLEXEC, we get no output files and the following information in debug mode (found also in the 'outfile.txt', attached):

16:15:47.191 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.other_file.nameroot)" to A null object

NullJSObjectError.tar.gz

CWLEXEC doesn't correctly run several subworkflows

Hi!
I met an issue when testing running several subworkflows in CWLEXEC. My pipeline works fine in CWL (cwltool) but fails in CWLEXEC.
The structure of pipeline is very simple:
step 1:
-- subworkflow 1:
------ copy file from input to another file
step 2:
-- subworkflow 2:
------ grep the output of step 1 (by condition), output is stdout
------ copy result to another file

The error is:

------------------------------------------------------------
Successfully completed.
Resource usage summary:
    CPU time :                                   0.02 sec.
    Max Memory :                                 -
    Average Memory :                             -
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              -
    Max Threads :                                -
    Run time :                                   7 sec.
    Turnaround time :                            1 sec.
The output (if any) is above this job summary.

[13:32:02.086] INFO  - Fill out commands in the script <path>/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1:
grep 2  <command>
[13:32:02.090] INFO  - Resuming job (step-wf-2/step-subwf-1) <1896579> with
bresume \
1896579
[13:32:02.236] INFO  - Started to wait for jobs by
bwait \
-w \
done(1896579)
[13:32:04.773] INFO  - The job (step-wf-2/step-subwf-1) <1896579> is done with stdout from LSF:
------------------------------------------------------------
Job <<path>/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1> was submitted from host <host> by user <user> in cluster <cluster> at Wed Sep 11 13:32:00 2019
Job was executed on host(s) <host>, in queue <queue>, as user <user> in cluster <cluster> at Wed Sep 11 13:32:03 2019
<dirr> was used as the home directory.
<path/step-wf-2/step-subwf-1> was used as the working directory.
Started at Wed Sep 11 13:32:03 2019
Terminated at Wed Sep 11 13:32:03 2019
Results reported at Wed Sep 11 13:32:03 2019
------------------------------------------------------------
# LSBATCH: User input
path/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1
------------------------------------------------------------
Successfully completed.
Resource usage summary:
    CPU time :                                   0.02 sec.
    Max Memory :                                 -
    Average Memory :                             -
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              -
    Max Threads :                                -
    Run time :                                   2 sec.
    Turnaround time :                            3 sec.
The output (if any) is above this job summary.

[13:32:04.837] ERROR - Failed to wait for job step-wf-2/step-subwf-2 <1896578>, null
[13:32:04.837] ERROR - The workflow (test-pipeline) exited with <255>.
[13:32:04.837] WARN  - killing waiting job (step-wf-2/step-subwf-2) <1896578>.

I didn't meet this problem when running steps without subworkflows. But this case is very important for me because I use a similar structure with more complicated workflows and tools.

All scripts attached in archive.
for_issue.zip

Thank you!
Kate

Don't *copy* result files.

Currently, cwlexec copies files from the work directories to the output directory (here if I am correct).

If possible, avoid copying output files. These files can be huge (e.g. we usually have files 100 GB, but they can be much bigger; this is common with human whole genome sequencing files) and copying is really a waste of space and time. While space may not be a problem, because copies can be deleted after processing, time may be more of a problem in a network-based storage with tight requirements for short processing times (e.g. for routine cancer diagnostics).

Alternatives are (at least on POSIX filesystems):

  • Symlinking should always work. One may think about using relative symlinks, in case the output and work directories will be moved.
  • Hardlinking, if the work and output directories are on the same filesystem.

I am not sure what the standard says about it, but even if the standard says "do copy", for some of our workflows we'd rather drop CWL than accept copies.

It may be desirable to give the user the choice between copying, symlinking, or hardlinking. However, replacing a symlink with the pointed-to file is a small problem, so symlinking seems to be a reasonable default.

For both linking approaches file ownership may be more of an issue, because the access rights are identical for all hard/softlinks to the same data.

cwltool and cwlexec not scattering jobs in same way

I have a simple workflow in which I have 2 inputs, one is type: File, and the other is an array of files. I want to run a command in which each of the files in the array are used against the single input file from input1.
When I run cwltool it performs as expected. It runs in serial each file in the array against the single input1.
When I run cwlexec it gives me no errors or message, and the job immediately terminates.

the tool is:


cwlVersion: v1.0
class: CommandLineTool

hints:
  SoftwareRequirement:
    packages:
      bedtools:
        version: [ "2.25.0" ]

inputs:
  outputGenomeCov:
    type: File
    inputBinding:
      position: 1
      prefix: -a

  regionsBedFile:
    type: File
    inputBinding:
      position: 2
      prefix: -b

  allPositions:
    type: string
    default: "-c"
    inputBinding:
      position: 3
      prefix: -c

outputs:
  allDepthOutput:
    type: File
    outputBinding: {glob: $(inputs.regionsBedFile.basename)_AtoB.txt}

stdout: $(inputs.regionsBedFile.basename)_AtoB.txt

baseCommand: [bedtools, intersect]

And the workflow is:


cwlVersion: v1.0
class: Workflow

requirements:
 - class: ScatterFeatureRequirement

inputs:
  outputGenomeCov: File
  regionsBedFile: File[]

outputs:
  intersectAB:
    type: File[]
    outputSource: intersect/allDepthOutput

steps:
  intersect:
    run: 2_bedtoolsIntersect.cwl
    scatter: regionsBedFile
    in:
      outputGenomeCov: outputGenomeCov
      regionsBedFile: regionsBedFile
    out: [allDepthOutput]

The .yml file:

outputGenomeCov:
  class: File
  path: /path/to/input.txt

regionsBedFile:
 - {class: File, path: /path/to/bedfile1.bed}
 - {class: File, path: /path/to/bedfile2.bed}
 - {class: File, path: /path/to/bedfile3.bed}

I have another workflow that has designated output files created from the program, but bedtools prints its outputs to stdout. The other workflow works well, but this fails.
Thanks,
Dennis

Clarification: Why not use "bsub -w"?

My understanding

As far as I can tell from perusal of the cwlexec source and the description of its behavior here in the README:

  • cwlexec obtains job dependency information from the CWL input file.
  • it then submits one job for each stage in parallel (bsub).
  • some kind of "job wrapper" then waits for any upstream dependencies to finish (bwait) and actually starts the user's job once all dependencies are completed (bresume).

(If I misunderstand, please correct me!).

My question

LSF has built-in job dependency monitoring via bsub -w. Why does cwlexec dynamically monitor dependency states instead of offloading the job to LSF?

As a note, this would have the side effect of permitting reasoning about the CWL job from the LSF side using bjdepinfo, which might be useful in its own right. Unless bjdepinfo already tracks dependencies listed by bwait -- does it?

cwlexec doesn't support inputs of type Array<enum>

cwlexec reports the following and exits:
The variable type of the field [type] is not valid, "a valid CWL type" is required.

if the input port is defined like that:

inputs:
  - id: applications
    type:
      type: array
      items:
        type: enum
        name: applications
        symbols:
          - PfamA
          - TIGRFAM

On the other hand, cwltool and cwl-runner accept those type definitions.

coresMin used instead of ramMin

Hi,

We were testing the ResourceRequirement field in the CWL document and noticed that when using ramMin, the bsub command submits -R mem>coresMin

Looks like the error is likely simply fixed here by replacing coresMin with ramMin: https://github.com/IBMSpectrumComputing/cwlexec/blob/e3c19121ac9ec8db24f09c542931345a43bb4ef0/src/main/java/com/ibm/spectrumcomputing/cwl/exec/service/CWLLSFCommandServiceImpl.java#L177

Attached is a test case. Even though ramMin is set to 100, it uses coresMin (either the given value or null if it is not given where it produces an error).

ramMinError.tar.gz

Failed to bind value

After implementing the workaround in #34, I was able to get past that step and ran into a new problem at the next step in the workflow. In trying to rebuild a minimal example for this from scratch, I've run into new problems at the same step as #34 again.

My workflow is

  1. Split a bam file into R1/R2 fastq files (simulating this with a 'split_reads.sh' file so that my examples are not dependent on external software. This is accomplished with the split_reads.cwl CLT.

  2. Scatter over multiple input files using scatter_split.cwl. I was hoping the workaround in #34 would let me get past this part.

3+) Continue simulating my real workflow to reproduce the issues I'm seeing.

Step 2, above, is where I ran into issues on #34. In rebuilding a different workflow, I am getting new issues. Specifically, my CLT and scatter_split.cwl workflow both work in cwltool, but the scatter_split.cwl WF fails with CWLEXEC with the error Failed to bind value for [R1_file], The value cannot be found..

I've compared this workflow with the working flow from #34 and I think they are very similar, so I'm not sure why this one is failing. Is this an issue in CWLEXEC or my own code? The script 02_scatter_split_reads.sh in the attached example should reproduce this issue.

BindValueFailure.tar.gz

dockerOptions.sh as documented fails on short options

the pre-exec script dockerOptions.sh

#!/bin/bash
for OPTION in $LSB_CONTAINER_OPTIONS
do
    echo $OPTION
done

works fine with long options (--env=VAR=value) but fails with short options (-e VAR=value) due to the whitespace between option and value. Instead it should simply print out $LSB_CONTAINER_OPTIONS as-is:

#!/bin/bash
echo "$LSB_CONTAINER_OPTIONS"

Incorrect Null evaluation of file

#37 seems to be resolved now with my minimal example but when I ran my larger workflow again it still failed. After dissecting each piece I found that an unrelated parameter seems to be causing this same error.

When I change the example CLT / WF from #37 and allow the CLT to have an optional boolean and then edit the WF to interpret the boolean as z: {valueFrom: $(true)} I get the same std error I did in #37. However, the new z parameter is (I think) completely unrelated to the parameter involved with the std error.

UndefinedVariableError2.5.tar.gz

add the -n option to model.conf files

I have tried to setup an LSF.conf for a workflow that looks like this:

{
    "queue": "standard",
    "steps": {
        "step1": {
            "rerunnable": false,
            "res_req": "rusage[mem=20000]",
            "num_processors": 4
        }
    }
}

And it fails, so I stepped into your code here:
https://github.com/IBMSpectrumComputing/cwlexec/blob/master/src/main/java/com/ibm/spectrumcomputing/cwl/model/conf/FlowExecConf.java

and found that nothing handles the LSF -n option, which would distribute the job across multiple processors.

I'm tagging this as a feature enhancement because, without it, you can't make a job distributable without hard-coding the setting in the CWL source, which we don't want back-end users to have to do.

It seems like you could do it by adding the following:

    private int processors;
    ...

    public int getProcessors() {
        return processors;
    }

    /**
     * Sets the LSF num_processors (-n) option
     *
     * @param processors
     *            The number of processors for the LSF -n option
     */
    public void setProcessors(int processors) {
        this.processors = processors;
    }

to the files FlowExecConf.java and StepExecConf.java.

I don't know if there is anything else that needs changing, but this seems like a great feature to add.

Bad Scatter in subworkflow

I have a CommandLineTool (foo.cwl), a Workflow (inner.cwl) that calls foo.cwl, and another Workflow (outter.cwl) that scatters over inner.cwl. Both foo.cwl and inner.cwl work as intended, but outter.cwl does not properly scatter over inner.cwl. That is, instead of scattering over 'input_file' as it should, it simply calls 'foo' with all possible 'input_file' values as input.

It should be:

foo --INPUT file1.txt
foo --INPUT file2.txt

but instead, it invokes foo as

foo --INPUT file1.txt file2.txt

This may be related to issue #20, which I saw was recently moved to 'enhancement' rather than 'known issue'. In either case, I'm hoping we will be able to scatter over subworkflows in this way, as it will be very useful for our pipelines.

In the attached 'BadScatter.tar.gz' you will see that '01_foo.sh' and '02_inner.sh' give the expected output, but '03_outer.sh' fails: the invocation is a single call to foo with all possible input files, though the intent is for them to be scattered over as described above.

BadScatter.tar.gz

File literal not written to safe path

Test [89/128] Test file literal as input

Test failed: /home/jenkins/cwlexec-0.1/cwlexec --outdir=/tmp/tmpgf1k1pyg --quiet v1.0/cat3-tool.cwl v1.0/file-literal.yml
Test file literal as input
Returned non-zero
Failed to write file "/common-workflow-language-master/v1.0/v1.0/file1-78a66506": /common-workflow-language-master/v1.0/v1.0/file1-78a66506 (Permission denied)

Here the user doesn't have permission to write to /common-workflow-language-master/v1.0/v1.0/; the file literal should be written to a safe, user-writable path instead.
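
A minimal sketch of the safer behavior, assuming (hypothetically) that literals are materialized under the run's working directory rather than next to the CWL description; paths and names here are illustrative only:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch only; not the actual cwlexec code.
public class FileLiteralSketch {
    public static void main(String[] args) throws IOException {
        // Write the file literal into a user-writable working directory,
        // not into the (possibly read-only) directory of the CWL file.
        Path workdir = Files.createTempDirectory("cwlexec-work-");
        Path literal = workdir.resolve("file1-78a66506");
        Files.write(literal, "literal contents\n".getBytes());
        System.out.println("Wrote literal to " + literal);
    }
}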

Empty JS array

Following up on #38, I find that a new error is produced in my actual workflow, which I have now duplicated here. Specifically, it looks like the JS interpreter is passing values as [], so a Java IndexOutOfBoundsException is thrown when trying to evaluate the contents of the array. See lines 1739 and 1742 below (contained in cwlexec.out):

1739 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=output_file, type=string, value=[]) for process_reads
1740 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=threads, type=null, value=2) for process_reads
1741 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=K, type=null, value=NULL) for process_reads
1742 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=Y, type=null, value=[]) for process_reads
1743 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=k, type=null, value=NULL) for process_reads
1744 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=M, type=null, value=NULL) for process_reads
1745 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=R, type=null, value=NULL) for process_reads
1746 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=I, type=null, value=NULL) for process_reads
1747 12:41:59.518 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=fastq, type=File, value=File:/home/jmichael/cwl-workdir/b388f356-940f-4daf-9a99-c722488fc0d7/split_reads/scatter1/input1_R1.fastq.gz) for process_reads
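
To make the failure mode concrete, here is a minimal sketch (not cwlexec code): indexing into the empty list the JS interpreter apparently hands back throws exactly this exception, so a hypothetical fix would guard the access.

import java.util.Collections;
import java.util.List;

// Hypothetical sketch only; reproduces the reported failure mode.
public class EmptyArraySketch {
    public static void main(String[] args) {
        List<String> value = Collections.emptyList(); // value=[] from the JS interpreter
        // Unguarded, value.get(0) throws IndexOutOfBoundsException.
        // Guarded alternative, treating an empty array as an absent value:
        String first = value.isEmpty() ? null : value.get(0);
        System.out.println("Resolved input value: " + first);
    }
}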

UndefinedVariableError3.tar.gz

ScatterFeature does not accept list of files as input: Java ClassCastException thrown

I am not sure if I am doing anything wrong, but this works with the CWL reference implementation (cwltool). Any advice would be appreciated.

Here is the workflow and a YAML job description:
issue_43.zip

command:
$ unzip issue_43.zip
$ cd issue_43
$ cwlexec -X -p --workdir /home/user/output/ cmsearch-multimodel-wf.cwl jobs/cmsearch-multimodel-wf.test.job.yaml

cwlexec stops and reports the following:

16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: 1000 for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: 1000 for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The input (id=covariance_model_database, type=File, value=File:/home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm) of step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The input (id=query_sequences, type=File, value=File:/home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa) of step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - Has Shell Command, build commands as:
[cmsearch, --tblout, mrum-genome.fa.cmsearch_matches.tbl, -o, mrum-genome.fa.cmsearch.out, --cpu, 1, --noali, --hmmonly, -Z, 1000, /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm, /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa]
java.util.ArrayList cannot be cast to com.ibm.spectrumcomputing.cwl.model.process.parameter.type.file.CWLFile
16:13:44.235 default [Thread-3] DEBUG c.i.s.cwl.exec.CWLExec - Stop cwlexec...
16:13:44.236 default [Thread-3] DEBUG c.i.s.cwl.exec.CWLExec - cwlexec has been stopped
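
For illustration, the exception boils down to casting the scattered list of files to a single file; here is a minimal sketch with simplified stand-in types (not the actual cwlexec model classes):

import java.util.Arrays;

// Hypothetical sketch only; CWLFile is a simplified stand-in.
public class ScatterCastSketch {
    static class CWLFile { String path; }

    public static void main(String[] args) {
        // Under ScatterFeature the bound input is a list of files...
        Object bound = Arrays.asList(new CWLFile(), new CWLFile());
        // ...so an unconditional cast to a single file fails at runtime:
        try {
            CWLFile file = (CWLFile) bound; // throws ClassCastException
            System.out.println(file);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());
        }
    }
}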

ExpressionTool cannot access input directory listing files

Hi,

I have a simple ExpressionTool (attached) that I wanted to test with the recently added feature. It takes a directory as input and returns the directory's listing as a file array. However, I get an error when running it that states:

09:54:43.736 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFWorkflowRunner - Failed to capture output for job (directory_to_files): The file "/home/kbrown1/ExpressionToolDirectoryError/workdir/1bd21483-9530-48e9-bc2c-f7d129b428e9/2.tmp" cannot be accessed.

It also exits with exit code 0 instead of a non-zero exit code, but returns no output.

As far as I can tell, it seems to be an issue with the input directory's listing. I tested swapping the listing attribute for basename, to simply return the directory's name as a string, and this worked successfully. cwlexec seems to know what outputs to collect; it just can't actually collect them.
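
For context, the ExpressionTool relies only on each entry of the staged input directory being surfaced through the listing attribute; a minimal sketch of that expectation (illustrative, not cwlexec's model classes):

import java.io.File;

// Hypothetical sketch only; shows the expectation behind Directory.listing.
public class DirectoryListingSketch {
    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir")); // stand-in input directory
        // Each entry should become an element of the File[] output,
        // which is what the failing output-capture step needs to stage.
        for (File entry : dir.listFiles()) {
            System.out.println(entry.getName());
        }
    }
}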

ExpressionToolDirectoryError.tar.gz

Cannot support code fragment in expression

If an argument's valueFrom expression is a code fragment, e.g.

arguments:
  - prefix: -c
    valueFrom: |
      import json
      fileString = []
      with open("$(inputs.inputFile.path)", "r") as inputFile:
          for line in inputFile:
              fileString.append(line)
      with open("cwl.output.json", "w") as output:
          json.dump({"fileString": fileString}, output)

cwlexec cannot evaluate it correctly.
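
For illustration, the only evaluation such a block should need is substitution of the $(inputs.inputFile.path) parameter reference, with the rest of the fragment passed through verbatim; a minimal sketch of that substitution (values hypothetical, not cwlexec's evaluator):

// Hypothetical sketch only; not cwlexec's expression evaluator.
public class ValueFromSubstitutionSketch {
    public static void main(String[] args) {
        String fragment = "with open(\"$(inputs.inputFile.path)\", \"r\") as inputFile:";
        // Substitute the parameter reference; leave the code fragment untouched.
        String resolved = fragment.replace("$(inputs.inputFile.path)", "/tmp/input.txt");
        System.out.println(resolved);
    }
}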

Fails to resolve dependencies when one subworkflow relies on output of another

Hi,

We have a workflow with two steps. Each step calls a subworkflow that calls a command-line tool. When the steps/subworkflows are independent of each other, the run succeeds (two_input_workflow.cwl), but when one step relies on the output of another step (one_input_workflow.cwl), cwlexec fails to resolve the workflow, giving the following error:

09:18:12.103 default [pool-2-thread-1] ERROR c.i.s.c.e.e.CWLInstanceSchedulerTask - Fail to run one_input_workflow (Failed to resolve the step (flow2) dependencies.)
09:18:12.105 default [pool-2-thread-1] ERROR c.i.s.c.e.e.CWLInstanceSchedulerTask - The exception stacks:
com.ibm.spectrumcomputing.cwl.model.exception.CWLException: Failed to resolve the step (flow2) dependencies.
	at com.ibm.spectrumcomputing.cwl.exec.util.CWLStepBindingResolver.resolveStepInput(CWLStepBindingResolver.java:178)
	at com.ibm.spectrumcomputing.cwl.exec.util.CWLStepBindingResolver.resolveStepInput(CWLStepBindingResolver.java:142)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.prepareStepCommand(LSFWorkflowStepRunner.java:158)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.resovleExpectDependencies(LSFWorkflowStepRunner.java:111)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.<init>(LSFWorkflowStepRunner.java:65)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.addSteps(LSFWorkflowRunner.java:272)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.<init>(LSFWorkflowRunner.java:99)
	at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.runner(LSFWorkflowRunner.java:92)
	at com.ibm.spectrumcomputing.cwl.exec.executor.CWLInstanceSchedulerTask.schedule(CWLInstanceSchedulerTask.java:76)
	at com.ibm.spectrumcomputing.cwl.exec.executor.CWLInstanceSchedulerTask.run(CWLInstanceSchedulerTask.java:62)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The full output (1inp.out) is attached, along with both the failing and the succeeding example. As far as we can tell, this happens only with subworkflows.

SubworkflowDependenciesError.tar.gz

SingularityRequirement

We would like to run containers safely in a multi-user LSF cluster. Docker has many security issues stemming from dockerd running as root. Singularity is an alternative that is gaining more and more popularity in science.

In the foreseeable future, we need a way to execute Singularity containers (and thus, through Singularity's features, also Docker containers!) in our multi-user LSF cluster. We therefore need a SingularityRequirement analogous to the DockerRequirement in, e.g., cwltool.
