ewels / clusterflow Goto Github PK

View Code? Open in Web Editor NEW

97.0 14.0 27.0 6.67 MB

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.

Home Page: https://ewels.github.io/clusterflow/

License: GNU General Public License v3.0

Perl 92.64% R 0.83% Python 3.55% HTML 2.97%

clusterflow bioinformatics pipeline bioinfomatics-pipeline perl

clusterflow's Introduction

A user-friendly bioinformatics workflow tool

Cluster Flow is now archived

This project is no longer under active maintenance. You're welcome to use it, but no updates or bug fixes will be posted. We recommend using Nextflow together with nf-core instead.

Many thanks to everyone who used and supported Cluster Flow over the years.

Find Cluster Flow documentation with information and examples at https://ewels.github.io/clusterflow/

Cluster Flow is a pipelining tool to automate and standardise bioinformatics analyses on high-performance cluster environments. It is designed to be easy to use, quick to set up and flexible to configure.

Cluster Flow is written in Perl and works by launching jobs to a cluster (can also be run locally). Each job is a stand-alone Perl executable wrapper around a bioinformatics tool of interest.

Modules collect extensive logging information and Cluster Flow e-mails the user with a summary of the pipeline commands and exit codes upon completion.

Installation

You can find stable versions to download on the releases page.

You can get the development version of the code by cloning this repository:

git clone https://github.com/ewels/clusterflow.git

Once downloaded and extracted, create a clusterflow.config file in the script directory, based on clusterflow.config.example.

Next, you need to add the main cf executable to your PATH. This can be done as an environment module, with a symlink to bin or by adding to your ~/.bashrc file.

Finally, run the setup wizard (cf --setup) and genomes wizard (cf --add_genome) and you're ready to go! See the installation docs for more information.

Usage

Pipelines are launched by naming a pipeline or module and the input files. A simple example could look like this:

cf sra_trim *.fastq.gz

Most pipelines need reference genomes, and Cluster Flow has built in reference genome management. Parameters can be passed to modify tool behaviour.

For example, to run the fastq_bowtie pipeline (FastQC, TrimGalore! and Bowtie) with Human data, trimming the first 6bp of read 1, the command would be:

cf --genome GRCh37 --params "clip_r1=6" fastq_bowtie *.fastq.gz

Additional common Cluster Flow commands are as follows:

cf --genomes     # List available reference genomes
cf --pipelines   # List available pipelines
cf --modules     # List available modules
cf --qstat       # List running pipelines
cf --qdel [id]   # Cancel jobs for a running pipeline

Supported Tools

Cluster Flow comes with modules and pipelines for the following tools:

Read QC & pre-processing	Aligners / quantifiers	Post-alignment processing	Post-alignment QC
FastQ Screen	Bismark	bedtools (`bamToBed`, `intersectNeg`)	deepTools (`bamCoverage`, `bamFingerprint`)
FastQC	Bowtie 1	subread featureCounts	MultiQC
TrimGalore!	Bowtie 2	HTSeq Count	phantompeaktools (`runSpp`)
SRA Toolkit	BWA	Picard (`MarkDuplicates`)	Preseq
	HiCUP	Samtools (`bam2sam`, `dedup`, `sort_index`)	RSeQC (`geneBody_coverage`, `inner_distance`, `junction_annotation`, `junction_saturation`, `read_GC`)
	HISAT2
	Kallisto
	STAR
	TopHat

Citation

Please consider citing Cluster Flow if you use it in your analysis.

Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].
Philip Ewels, Felix Krueger, Max Käller, Simon Andrews
F1000Research 2016, 5:2824
doi: 10.12688/f1000research.10335.2

@article{Ewels2016,
author = {Ewels, Philip and Krueger, Felix and K{\"{a}}ller, Max and Andrews, Simon},
title = {Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].},
journal = {F1000Research},
volume = {5},
pages = {2824},
year = {2016},
doi = {10.12688/f1000research.10335.2},
URL = { + http://dx.doi.org/10.12688/f1000research.10335.2}
}

Contributions & Support

Contributions and suggestions for new features are welcome, as are bug reports! Please create a new issue. Cluster Flow has extensive documentation describing how to write new modules and pipelines.

There is a chat room for the package hosted on Gitter where you can discuss things with the package author and other developers: https://gitter.im/ewels/clusterflow

If in doubt, feel free to get in touch with the author directly: @ewels ([email protected])

Contributors

Project lead and main author: @ewels

Code contributions from: @s-andrews, @FelixKrueger, @stu2, @orzechoj @darogan and others. Thanks for your support!

License

Cluster Flow is released with a GPL v3 licence. Cluster Flow is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. For more information, see the licence that comes bundled with Cluster Flow.

clusterflow's People

Contributors

Stargazers

Watchers

clusterflow's Issues

Remove SGE vf memory option

Remove the option to use vf for memory assignment. Always use h_vmem instead.

@stu2 - is this going to mess up your work flow at all, or is this ok?

Write methylation coverage analysis module

Move the current coverage code out of bismark_methXtract module and in to own module.
Move the genome wide coverage report stuff from bismark_methXtract and just run coverage2cytosine instead.
Use new code written into ngi_visualizations for coverage analysis.

Update documentation

When we go for the version release, update the documentation.

Move --qstatcols to config

The --qstatcols command line flag should really be a configuration option. Move this into a config file instead.

Custom modules in ~/clusterflow/modules aren't found

Typo whereby custom user modules kept in the home clusterflow directory aren't found. Seems to be because there's a bug in clusterflow where it's looking in cf instead of clusterflow

Add contributing instructions.

Add a CONTRIBUTING.md file.

Simplify command line flags

There are now quite a lot of command line flags, and many can be simplified.

For instance, does --list_pipelines really need to be --list_pipelines or can it just be --pipelines? Also see if any others can be moved into config files.

trim_galore variable declaration bug

In the trim_galore module there's a bug where $stringency ends up undefined when it's not specified in the options. The fix in this case is just to declare the variable as the empty string at the top of the file.

Make UPPMAX genomes file

The uppmax file structure makes the wizard really difficult to use. Make a pre-cast genomes.config file for people to use.

Extend --make_config into --setup

At the moment, CF recommends that you manually add a bash alias to use qs and qsa instead of cf --qstat and cf --qstatall

Could be nice to get the cf --make_config wizard to look for these aliases using which and then append them to .bashrc if the user would like.

cf --setup broken

 $  source /home/randeer/.bash_profile
-bash: /home/randeer/.bash_profile: line 17: unexpected EOF while looking for matching `"'
-bash: /home/randeer/.bash_profile: line 23: syntax error: unexpected end of file

Errors generated by summary modules are not highlighted

The cf_all_runs_finished module isn't picking up errors (or highlights, I imagine) from summary module output. As such there is no highlight in the email when it fails.

Run file filename option

Add command line option to use a custom run file filename. The current run file filenames are called input_pipeline.run (where input is the first input file filename and pipeline is the pipeline being run). This is a problem if the same file is being used multiple times, as the runfile will be overwritten.

Enabling a custom filename means that clusterflow can be run many times in parallel.

Merge BI changes into main development flow

@s-andrews and @FelixKrueger - it seems that I can create a Pull Request from your fork to mine, even without push permissions on your fork. It looks like this will delete a lot of my additions though, so I need to figure out how to bring your changes in without wiping everything I've done since you forked.

Summary modules

Cluster Flow currently has a single hard-coded summary module which runs when all jobs in a pipeline are finished: cf_all_runs_finished.cfmod. This module effectively knits together all of the parallelised dependencies so that I can send a single report e-mail.

It would be nice if users could write their own modules with similar post-parallelisation functionality. Initially this could be for custom report generation within specific pipelines, though it could be extended to more complicated tasks in the future (eg. merging files and then running downstream modules on the result).

I think that the best way to handle this is by using a second special character. # is currently used to denote a module, so perhaps > could denote a module with collecting function. eg:

#trim_galore
    #bowtie
        >bowtie_report

..supplied with 3 files:

- trim_galore - bowtie \
- trim_galore - bowtie \
- trim_galore - bowtie \
                        - bowtie_report

This would partially future proof for subsequent modules. For example, the following would work:

#trim_galore
    #bowtie
        >merge_alignments
            >further_processing
                >report

..supplied with 3 files:

- trim_galore - bowtie \
- trim_galore - bowtie \
- trim_galore - bowtie \
                        - merge_alignments - further_processing - report

Note that this extension will not support splitting or partial merging. It's all or nothing.

Once a > module is used in a pipeline, no further # modules can be specified (doing so will raise an error). All subsequent summary modules will be processed in series (indentation will be ignored). In other words, the following would not work:

#trim_galore
    #bowtie
        >merge_alignments
            #further_processing_1
            #further_processing_2
            #further_processing_3
                >report

If written like this, summary modules will have to be supplied with multiple run files. I'm not sure how to handle this yet whilst maintaining compatibility with the way that modules currently work.

Add HPC option for local machine

Would be nice to be able to run pipelines locally instead of having to always submit jobs to the cluster.

Could perhaps write a bash file with the commands, in order. Then submit:

nohup bash cf_commands.sh &

Rewrite genome paths

Currently, genome paths are handled in a very inflexible way. Basically, a genome / bowtie / bowtie2 / gtf path has to be specified and each is dealt with manually. This is bad as it's difficult if not impossible to add other types of genome indices (bismark / other aligners) and involves quite a bit of code duplication.

Allow any @reference tag in the genomes.config file
Parse these into a single hash so we don't care what the tag was
Allow pipelines and modules to use the @require_reference <type> tag for anything that's in the hash
Update pipelines and modules to use this syntax
Alter modules to check for and include paths in a consistent manner

With this, the current listing and genomes wizard will have to be updated:

Listing can be much less verbose and more agnostic to tag name
Wizard should still start by asking for assembly / genome id but then allow any tag
Wizard should start by iterating through a directory and checking for known indices: .fa, .ebwt, .bt2 files, .gtf files and any other known structure
This will make wizard way faster and more accurate.
Confirm each detected path so that we can ignore bogus ones.

Create a test to run once installed

Set up a demo with some example data. Run through in docs.

Change downloads module behaviour

Currently, the downloads module takes URL and filename as parameters on the command line when called. It'd be better to do this as regular filenames in the run file - less parameters, also other modules will know how many files we're starting with.

methylkit module

Could be cool to write a summary module for methylation data using methylkit (see docs).

Especially nice would be the sample correlation plot and the clustering.

Maybe this would be a candidate for the first Cluster Flow module written in R?

Add config option for job parallelisation

On some very busy clusters, Cluster Flow can be slow due to it's style of submitting a separate queue job for every module. Whilst this is in theory the fastest way to process jobs because multiple steps in the same pipeline run can run side by side, in practice it can mean that the pipeline sits and waits for ten hours in the queue waiting to run a 10 second module which sends the completion e-mail.

So, to get around this, I could add a config option to choose how to parallelise. Three options: per_module (default), per_run and per_pipeline.

I have recently added functionality for Cluster Flow to run locally by running bash scripts. This behaviour could be extended to run the entire pipeline in series (per_pipeline) by submitting this bash script to the queue as a single job. For per_run, CF could write multiple bash scripts in this style and submit them each into a single queue job, with one final job containing summary modules, dependent on the others.

featureCounts

It'd be good to have a module for featureCounts (part of the subread package).

Perl error

Undefined subroutine &CF::Constants::dirname called at /pica/h1/randeer/projects/cluster-flow/clusterflow/source/CF/Constants.pm line 621, <STDIN> line 5.

warning about no "bowtie2 path defined" when you're not using a genome

There's a spurious warning about not having a bowtie2 path defined when you're not using a genome. This is from L549 in cf where the test is a simple else but should either be removed all together, or should be changed to elsif (defined ($GENOME))

Warning when no e-mail is set?

This warning is a bit ugly - print a nice warning which explains what's wrong and how to fix it.

Use of uninitialized value $EMAIL in concatenation (.) or string at /fastspace/bioinfo_apps/clusterflow_v0.1/cf line 481.

Bismark methylation extractor error

Bismark methylation extrator module threw the following warning and later failed:

Reference found where even-sized list expected at /bi/apps/clusterflow/0.4_devel/modules/../source/CF/Helpers.pm line 262.

This somehow seems to then lead to bismark not being given a reference genome:

bismark_methylation_extractor --multi 4  --ignore_r2 2 --bedGraph --counts  --cytosine_report --gzip -p --no_overlap --report  lane2_5pc_SCM_ox_ATCACGTT_L002_R1_val_1.fq.gz_bismark_bt2_pe.deduplicated.bam

Please specify a genome folder to proceed (full path only)

Note - I was messing around with this command recently to get it to do the coverage analysis, so the first warning could be a red herring.

Add submitted job IDs to a log file

Hi Phil, we recently had issues with CF runs failing for various reasons, probably mostly memory related issues. Once the jobs were killed there is no way of finding out what the job ID was and this made it almost impossible to find the folder (difficult to track when hundreds of single-cell RNA-Seqs are running at the same time...) or get information of what happened to the job using qacct -j <ID>. Something like this would be helpful:

Your job 259132 ("cf_bismark_1423231738_bismark_align_295") has been submitted
Your job 259133 ("cf_bismark_1423231738_bismark_deduplicate_005") has been submitted
Mmmkay?

Use command line argument flags instead of positions for modules

At the moment, modules receive most of the information that they need in a positional @ARGV array. This is a bit messy and easy to get into trouble with.

It would be nicer to use --flags to do this instead. However, these flags really need to be processed within the Helpers.pm function load_runfile_params(). Individual modules need to process some flags individually first though (--cores and --mem and so on).

Can we partially pass the options in the module first and then do the remaining ones in the load_runfile_params() function? Maybe this could work?

use Getopt::Long;
Getopt::Long::Configure("pass_through");

Then minimal module changes will be needed (just the extra config).

This all clears the way to make the summary modules work easier.

Specify expected time for modules

Most HPC systems have an option to specify expected job execution time, which affects queuing. At the moment, Cluster Flow basically ignores this (SLURM jobs default to two days for example), which hurts our queue performance.

It would be nice if modules are able to specify how long they thing they'll take to run. For SLURM jobs, if this is less than 15 minutes we can get the bonus of specifying --qos=short which will make them much faster.

cf --qstatall throws errors

cf --qstatall currently produces a few hundred errors such as:

Deep recursion on subroutine "CF::Headnodehelpers::parse_job_dependency_hash" at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 298.
Deep recursion on subroutine "CF::Headnodehelpers::parse_job_dependency_hash" at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 298.
...
Use of uninitialized value in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 459.
Use of uninitialized value in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 459.
...

Using GRID Engine and Cluster Flow 0.4_devel

Move documentation to github

Transfer the documentation to github wiki, makes it easier to keep it up to date.

Write UPPMAX-specific install wizard

Write a config wizard script for UPPMAX users which asks for project ID and other site specific things.

Alias this in as an extra command line flag as part of the module system perhaps? eg.:

alias 'cf --uppmax'='/usr/bin/modules/clusterflow/custom/uppmax_wizard.pl'

Write HTML log e-mail to file.

The main log file is very informative, but very long. The summary HTML e-mail is nice, but very temporary. Would be good to save it to disk!

Error with Bismark methylation extractor gwcov option

The gwcov option (#bismark_methXtract gwcov) tries to find scripts in your home directory and thus fails. Cheers, Felix

Error! Can't find capture regions BED file - running without capture information.
Error! Couldn't find find the coverage script: /home/phil/scripts/visualizations/bismark/bismark_coverage_curves.pl
Error! Couldn't find find the windows script: /home/phil/scripts/visualizations/bismark/bismark_window_sizes.pl

RSeQC

Module for RSeQC perhaps?

Edit: This probably isn't a single module, rather a group of several:

rseqc_inner_distance
rseqc_geneBody_coverage
rseqc_read_GC
rseqc_junctions (junctions_annotation.py and junction_saturation.py together in one module)
rseqc_read_distribution
rseqc_read_duplication
rseqc_tin?

Add support for running summary modules on command line

Currently, you can run a single module instead of a pipeline from the command line. However, it is always run as a regular module, so this is not compatible with summary modules. This is especially problematic when modules can be both normal and summary modules.

Simple to add - allow module names to be prefixed with either # or > when used on the command line (may need quotes). A dynamic pipeline is written from this, so everything should then work downstream.

Put in checks that module files exist

Make nicer errors + exit when module files can't be found. Currently this:

Can't exec "/fastspace/bioinfo_apps/clusterflow_v0.1/modules/Select_CoreGenes_from_cd_hit_results_9.cfmod": No such file or directory at /fastspace/bioinfo_apps/clusterflow_v0.1/cf line 826.`

Support for pymodules?

Hello,

I administer a SLURM cluster at a small research institute and I would love to use clusterflow to start standardizing our pipeline runs. That said, we use pymodules (https://web3.ccv.brown.edu/mhowison/pymodules/) instead of modules to manage our software. As far as I can gather, the syntax are nearly identical but there are some additional benefits to pymodules that I would hate to give up. Do you think that support for pymodules could be done? Thank you very much.

Sincerely,
Adrian Reich

No fastq_RRBS pipeline

There is a sra_bismark_RRBS but no fastq_bismark_RRBS. This is historical but lacking - duplicate the sra version but remove the sra_dump step.

Add @merge_regex option to config

Would be cool if Cluster Flow could automatically recognise fastq files that should be merged before processing.

Add @merge_regex to config so that people can define their own filename matches
Write new module to merge fastq files (gzipped and normal)
Update core to look for matches
- Group files accurately, taking into account --split_files
- Prepend merging module onto pipeline

htseq_counts cannot accept gtf files with 'gene_id' field

Hi Phil,

I've been trying to use htseq_counts.cfmod with the mouse genome .gtf file from mm10. In this gtf file the gene is marked by gene_id not ID. However, ID is hardcoded into the command that htseq_counts produces (line 103: -i 'ID' ), so it chokes. Could this be made into an argument for flexibility?

Cheers

edit: storing the ID tag in the clusterflow genome info for that genome/gtf set might be even better.

Make an example Python module

Probably start off by converting an existing module to Python for testing. Then generalise this to be an example module.

Clean up unrecognised output

For every run, there is always some unrecognised output which is highlighted in the log file. Find out where this is coming from, and sort it out.

Also, pad the -> and <- with some spaces.

Build UPPMAX jobstats module

It would be nice to build a module to look at the performance of modules - if they're using the resources that they allocate, especially time.

Uppmax have some nice tools to look at this. I think that a module could scrape the submission log text file to get Job IDs and submission resources, then compare this to what was actually used.

For example:

finishedjobinfo -j 5046208

gives: (split onto multiple lines for readability)

2015-04-13
16:28:09
jobid=5046208
jobstate=COMPLETED
username=phil
account=b12345678
nodes=m163
procs=2
partition=core
qos=normal
jobname=cf_samtools_sort_index_1428934402_samtools_sort_index_452
maxmemory_in_GiB=5.4
maxmemory_node=m163
timelimit=04:00:00
submit_time=2015-04-13T16:13:48
start_time=2015-04-13T16:16:22
end_time=2015-04-13T16:28:09
runtime=00:11:47
margin=03:48:13
queuetime=00:02:34

Also can run jobstats:

jobstats -p -v -d 5046208

Gives:

jobid   cluster jobstate    user    project jobname endtime runtime flag_list   booked  core_list   node_list   jobstats_list
5046208 milou   COMPLETED   phil    b2013064    cf_samtools_sort_index_1428934402_samtools_sort_index_452   2015-04-13T16:28:09 00:11:47    .   2   2   m163    /sw/share/slurm/milou/uppmax_jobstats/m163/5046208

And this:

Ideally, the module could run some custom stats and then build a summary HTML report or something? Maybe e-mail this? Could be cool if it could log some stats centrally somewhere too.

Finally, this may need an extra feature in CF to have the option to always append or prepend modules to every pipeline. Probably a good thing to have anyway, and should be a relatively easy config addon?

Rewrite job negotiation / module command line handling

Currently, each module is queries for each file group three times:

Each file group
- Each module
  - Number of cores?
  - Amount of memory?
  - Environment modules?

I want to add support to allow the module to predict the time that the job will take (see #45), which will make a fourth. This is starting to take a noticeable amount of time and is very inefficient.

Whilst I think we still need to query each module for each file group (the number of files / file size of input and other variable can vary across file groups), we certainly don't need to call the module four times for four variables (maybe more in the future).

Instead, I'd like to rebuild this system to a more robust, scaleable methodology. Aims and requirements:

One call to module per file group
As many request parameters as we like (scalable)
Return as many key: value pairs as the module likes
- If we miss a requested variable, use a sensible default
Can't be language specific (modules can be written in any language)

My suggestion is that we combine the current calls into one, eg:

module.cfmod.pl --mem $TOTAL_MEM --cores $TOTAL_CORES --modules --runfn $runfn

a helper function then parses these and if any of the 'request' parameters are found, the module returns a hash in some standard format: JSON, YAML, ini, XML etc. on STDOUT. eg:

{
  "cores": 16,
  "memory": "64G",
  "modules": ["bowtie", "samtools"],
  "time": "6:00:00"
}

This can then be interpreted by Cluster Flow for job submission. If none of these command line flags are present, the module will be run in executing mode (unless --help was there).

My preferred response would be JSON as in the example above. I think it's fairly clear and simple and widely supported. However, Perl doesn't seem to have a JSON parsing module as part of the core distribution and I don't like introducing new dependencies.

@s-andrews, @FelixKrueger, @stu2: Does anyone have any thoughts or suggestions on the above?

Whilst doing this rewrite I'd also make a load of new command line flags for execution time (--runfile, --job_id, --prev_job_id, --cores, --mem, multiple --param) instead of the current positional @ARGV mess (see #29).

Add command line flag for running locally

Add a command line to run this pipeline in local mode.. Maybe without the bash script stuff?

cf --qdel throws errors

Deleting pipelines sometimes throws errors:

Deleting jobs from Pipeline id trim_galore_1415031486

Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
Use of uninitialized value $pipelinekey in string eq at /bi/apps/clusterflow/0.4_devel/source/CF/Headnodehelpers.pm line 574.
3 jobs deleted:
fkrueger has deleted job 228541
fkrueger has deleted job 228542
fkrueger has deleted job 228543

warning about undefined $GENOME

There's a warning about undefined $GENOME when launching pipelines which don't need it. This is from L891 in cf and a couple of other places around there. Because no genome is specified then the interpolation there is wrong. I initially thought you'd just need to make $GENOME an empty string, but I think it's worse than that since you'll effectively end up passing the wrong number of arguments through to the memory request function if $GENOME is empty. I haven't looked at this thoroughly enough to see what the right fix is but I guess you'll probably know.

Stop cf from outputting every file name at launch

cf is usually run with a lot of files now. Currently it shows a 1 line warning about each one as it progresses, but this means that the output at the start is never seen.

Instead, scale these progress lines back to something more subtle that takes up less space. Maybe by using carriage returns (\r) instead of newlines (\n) or something?

Genomes wizard BWA bug

The --add_genome wizard seems to have a bug with the BWA addition, it added a Fasta file instead of a BWA file.

Also add STAR to the wizard?

ewels / clusterflow Goto Github PK

clusterflow's Introduction

A user-friendly bioinformatics workflow tool

Cluster Flow is now archived

Installation

Usage

Supported Tools

Citation

Contributions & Support

Contributors

License

clusterflow's People

Contributors

Stargazers

Watchers

Forkers

clusterflow's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs