fredhutch / diy-cromwell-server

A repo containing instructions for running a Cromwell server on Gizmo at the Fred Hutch. (Contact Amy Paguirigan for questions.)
The testWorkflows/README.md file lists workflows that are no longer in the repo.
Caused by: com.mysql.cj.exceptions.InvalidConnectionAttributeException: The server time zone value 'PDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.
This seems to be an interaction between JDBC, Java 8, and MariaDB. The easy fix (i.e., not mucking about server-side) is to specify the time zone in the connection URL: serverTimezone=UTC (note the extreme case sensitivity).
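For example, the fixed connection URL would sit in the db section of the Cromwell database config; a sketch (the host, port, and database name below are placeholders, not values from this repo's config):

```hocon
database {
  db {
    # Appending serverTimezone=UTC avoids the "server time zone value
    # 'PDT' is unrecognized" error; host/port/database are placeholders.
    url = "jdbc:mysql://myhost:3306/cromdb?serverTimezone=UTC"
  }
}
```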
Update and test with the module when it's up.
Update the Singularity material with this info: https://cromwell.readthedocs.io/en/stable/tutorials/Containers/#singularity
Consolidate additional parameter adjustments if needed.
Let's shift to specifying the config file path when we kick off the server rather than as a param. This will make testing and adapting easier later.
Silly question: do you mean for the dirname testWorkflows/hellloSingularityHostname to have three Ls in helllo?
hey,
again, I'm using the examples as a starting point for understanding WDL syntax.
Tiny picky thing: in the variantCalling-workflow.wdl script, there is a comment on line 68 ("# Get the basename, i.e. strip the filepath and the extension"), but I don't think the script is doing that on line 69. It looks like it's just concatenating a couple of the specified inputs.
I'm sure I will want to learn how to do the filepath stripping sometime; I'll look elsewhere to find it.
Janet
hey!
I put this in Slack but realized that this is a better place for it.
I think memory requests made in the runtime{} block of a WDL are being ignored when running on the local cluster. I'll attach:
(1) a demo WDL
(2) an options JSON file that goes with it, simply to prevent caching
(3) the script.submit generated for the task
The CPU request is honored, but the memory request is not part of the sbatch command in script.submit.
Does that make sense?
thanks!
Janet
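A quick way to confirm the report is to check the generated script.submit for a memory flag; a minimal sketch using inlined sample content (the sbatch line below is illustrative, not copied from a real run):

```shell
# Build a sample script.submit resembling what Cromwell generates;
# note the absence of any --mem option on the sbatch line.
cat > /tmp/script.submit <<'EOF'
sbatch --cpus-per-task=2 -o /path/to/execution/stdout -e /path/to/execution/stderr /path/to/execution/script
EOF

# Report whether a memory request made it into the sbatch command.
if grep -q -- '--mem' /tmp/script.submit; then
  echo "memory request present"
else
  echo "memory request missing"
fi
```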
It looks like stderr (and likely stdout) is getting pruned during execution when Singularity is used to run the script:
VERBOSE [U=0,P=3126] print() Set messagelevel to: 5
VERBOSE [U=0,P=3126] init() Starter initialization
DEBUG [U=0,P=3126] load_overlay_module() Trying to load overlay kernel module
DEBUG [U=0,P=3126] load_overlay_module() Overlay seems not supported by the kernel
DEBUG [U=0,P=3126] get_pipe_exec_fd() PIPE_EXEC_FD value: 9
VERBOSE [U=0,P=3126] is_suid() Check if we are running as setuid
VERBOSE [U=0,P=3126] priv_drop() Drop root privileges
DEBUG [U=34152,P=3126] init() Read engine configuration
tail: shard-1/execution/stderr: file truncated
DEBUG [U=34152,P=3126] Master() Child exited with exit status 0
Note the "file truncated" message. The lines prior to it are output from singularity -d exec. The resulting stderr file in the execution directory doesn't contain any of the lines above the "file truncated" message.
I believe this happens when the execution script redirects stdout and stderr. The submit script uses Slurm options to save stdout/stderr to files in the execution directory. From a representative submit script (script.submit):
-o /fh/scratch/delete10/_HDC/user/mrg/cromwell-root/cromwell-executions/parseBatchFile/b9e0a739-23e3-485d-8bab-857349697458/call-test/shard-1/execution/stdout \
-e /fh/scratch/delete10/_HDC/user/mrg/cromwell-root/cromwell-executions/parseBatchFile/b9e0a739-23e3-485d-8bab-857349697458/call-test/shard-1/execution/stderr \
When the script itself (script) is run, the following lines truncate any output occurring between the startup of the job on the node and the execution of the script in the container (particularly Singularity startup messages):
tee '/cromwell-executions/parseBatchFile/b9e0a739-23e3-485d-8bab-857349697458/call-test/shard-1/execution/stdout' < "$outb9e0a739" &
tee '/cromwell-executions/parseBatchFile/b9e0a739-23e3-485d-8bab-857349697458/call-test/shard-1/execution/stderr' < "$errb9e0a739" >&2 &
Note that in the container (where script is run) the path /fh/scratch/delete10/_HDC/user/mrg/cromwell-root/cromwell-executions has been mounted at /cromwell-executions. tee (without other options) truncates the output file before writing to it, which removes any output between job start and script execution.
I suspect that adding -a to the tee command will fix this problem.
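The truncation is easy to demonstrate in isolation; a minimal sketch of the behavior (file path is arbitrary):

```shell
# Simulate Slurm having already written early job output to the stderr file.
echo "early job output" > /tmp/stderr_demo

# Plain tee truncates the file first, discarding the early output.
echo "container output" | tee /tmp/stderr_demo > /dev/null
wc -l < /tmp/stderr_demo    # 1 line: the early output is gone

# tee -a appends instead, preserving everything.
echo "early job output" > /tmp/stderr_demo
echo "container output" | tee -a /tmp/stderr_demo > /dev/null
wc -l < /tmp/stderr_demo    # 2 lines: both survive
```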
Put in the README, and in the docs in the wiki, that when using Cromwell right now (v49), our config includes a line to create softlinks instead of hardlinks for globs.
Currently that means you can use globs in outputs when you are using env modules, BUT that will break when you use a Docker/Singularity container. In that case, in the task where you need to glob the output, change it to this:
cp files-to-glob .
ls <glob pattern> > outputFiles.txt
------
output {
  Array[File] outputGlobFiles = read_lines("outputFiles.txt")
}
The current mechanism puts the database connection string (including username and password) in the process stack, where it is viewable by anyone on the system. This is somewhat undesirable: though I don't know precisely what harm could come of it, general "best practices" suggest we improve this.
I've done a little work, and it looks like we may be able to specify database connection parameters in a file separate from the general Cromwell config (e.g. fh-slurm-sing-cromwell.conf) using HOCON's include directive:
include required(classpath("application"))
include required(file("database.conf"))
###### FH Slurm Backend, with call caching, without docker/singularity
....
where database.conf contains the database section as specified by Cromwell:
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://mydb:32222/cromdb?rewriteBatchedStatements=true&serverTimezone=UTC"
    user = "username"
    password = "password"
    connectionTimeout = 5000
  }
}
The path above doesn't have to be in the current directory; my understanding is that we could specify any path, though I'm not sure whether things like "~" are expanded.
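Since database.conf would now hold the credentials, it should also be readable only by the server's user; a small sketch (the file contents below are placeholders, not real credentials):

```shell
# Write a placeholder database.conf (values are not real credentials).
cat > /tmp/database.conf <<'EOF'
database {
  db {
    user = "username"
    password = "password"
  }
}
EOF

# Restrict it to the owning user so the password isn't world-readable.
chmod 600 /tmp/database.conf
stat -c '%a' /tmp/database.conf
```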
When using singularity exec within tasks executed in a scatter, there is a race condition when the Docker/Singularity image isn't in the cache. On NFS-mounted home directories this apparently results in the error "stale NFS file handle".
To replicate it, it's necessary to remove the images from the Singularity cache (~/.singularity). It is also difficult to replicate with simple images (e.g. ubuntu); the Broad's GATK image seems to reproduce this error fairly reliably.
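If the root cause is concurrent shards racing to populate the cache, one workaround sketch is to serialize the image build with a lock file before the scattered singularity exec calls (the lock path and the protected command are hypothetical, not from this repo):

```shell
# Take an exclusive lock before touching the Singularity cache, so only
# one scattered task builds the SIF at a time; the others wait on the lock.
(
  flock 9
  # The real task would run the container here, e.g.:
  #   singularity exec docker://broadinstitute/gatk ... <command>
  echo "building image under lock"
) 9> /tmp/sif-cache.lock
```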
==> shard-0/execution/stderr <==
2020/04/10 07:06:02 debug unpacking entry path=root/.conda root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=53
2020/04/10 07:06:02 debug unpacking entry path=root/.conda/pkgs root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=53
2020/04/10 07:06:02 debug unpacking entry path=root/.conda/pkgs/urls root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=48
2020/04/10 07:06:02 debug unpacking entry path=root/.conda/pkgs/urls.txt root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=48
2020/04/10 07:06:02 debug unpacking entry path=root/.gradle root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=53
2020/04/10 07:06:02 debug unpacking entry path=root/gatk.jar root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=50
2020/04/10 07:06:02 debug unpacking entry path=root/run_unit_tests.sh root=/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e type=48
DEBUG [U=34152,P=14608] Full() Inserting Metadata
DEBUG [U=34152,P=14608] Full() Calling assembler
INFO [U=34152,P=14608] Assemble() Creating SIF file...
DEBUG [U=34152,P=14608] cleanUp() Cleaning up "/loc/scratch/46802618/rootfs-3371b095-7b34-11ea-ae11-002590e2b58e" and "/loc/scratch/46802618/bundle-temp-539962740"
FATAL [U=34152,P=14608] replaceURIWithImage() Unable to handle docker://broadinstitute/gatk@sha256:0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa uri: unable to build: while creating SIF: while creating container: writing data object for SIF file: copying data object file to SIF file: write /home/mrg/.singularity/cache/oci-tmp/0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa/gatk@sha256_0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa.sif: stale NFS file handle
==> shard-1/execution/stderr <==
2020/04/10 07:06:13 debug unpacking entry path=root/.conda root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=53
2020/04/10 07:06:13 debug unpacking entry path=root/.conda/pkgs root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=53
2020/04/10 07:06:13 debug unpacking entry path=root/.conda/pkgs/urls root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=48
2020/04/10 07:06:13 debug unpacking entry path=root/.conda/pkgs/urls.txt root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=48
2020/04/10 07:06:13 debug unpacking entry path=root/.gradle root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=53
2020/04/10 07:06:13 debug unpacking entry path=root/gatk.jar root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=50
2020/04/10 07:06:13 debug unpacking entry path=root/run_unit_tests.sh root=/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824 type=48
DEBUG [U=34152,P=3126] Full() Inserting Metadata
DEBUG [U=34152,P=3126] Full() Calling assembler
INFO [U=34152,P=3126] Assemble() Creating SIF file...
VERBOSE [U=34152,P=3126] Full() Build complete: /home/mrg/.singularity/cache/oci-tmp/0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa/gatk@sha256_0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa.sif
DEBUG [U=34152,P=3126] cleanUp() Cleaning up "/loc/scratch/46802619/rootfs-4740289f-7b34-11ea-ad57-002590e2b824" and "/loc/scratch/46802619/bundle-temp-701530895"
VERBOSE [U=34152,P=3126] handleOCI() Image cached as SIF at /home/mrg/.singularity/cache/oci-tmp/0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa/gatk@sha256_0dd5cb7f9321dc5a43e7667ed4682147b1e827d6a3e5f7bf4545313df6d491aa.sif
DEBUG [U=34152,P=3126] execStarter() Checking for encrypted system partition
.... output trimmed- log indicates this shard ran the container....
hey Amy,
I'm starting to look more closely at some actual WDL files to understand that end of things.
In testWorkflows/localBatchFileScatter, there is a local sample.batchfile.tsv file, but it's not being used. The parse.inputs.json instead points to a copy of sample.batchfile.tsv here: /fh/fast/paguirigan_a/pub/ReferenceDataSets/workflow_testing_data/WDL/batchFileScatter/sample.batchfile.tsv, which uses files hosted on S3 rather than on /fh/fast.
I think that's not what you intend; I'm guessing this workflow won't work for people who haven't got their AWS credentials set up. Does that make sense?
Janet
Commit used: 77c318a
"workflowName": "hello_hostname",
"workflowProcessingEvents": [
  {
    "cromwellId": "cromid-53866a4",
    "description": "PickedUp",
    "timestamp": "2020-04-07T21:16:47.895Z",
    "cromwellVersion": "47"
  },
  {
    "cromwellId": "cromid-53866a4",
    "description": "Finished",
    "timestamp": "2020-04-07T21:16:47.915Z",
    "cromwellVersion": "47"
  }
...
"status": "Failed",
"failures": [
  {
    "causedBy": [],
    "message": "/fh/scratch"
  }
This is what is in the failure message for my workflow using the baseConfig right now. I'm loading version 49, and when I request the version of the running server via the API I get v49, but in the workflow itself I'm still getting version 47, which was the previous one we were using.
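To see which version actually ran a workflow, the workflowProcessingEvents entries in its metadata can be inspected; a sketch using inlined sample JSON with the values from this report (a real run would fetch the metadata from the server's REST API instead):

```shell
# Sample metadata fragment (values copied from the report above).
cat > /tmp/meta.json <<'EOF'
{"workflowProcessingEvents": [{"description": "PickedUp", "cromwellVersion": "47"}]}
EOF

# Pull out the Cromwell version recorded for the workflow.
grep -o '"cromwellVersion": "[0-9]*"' /tmp/meta.json
```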