phac-nml / galaxy_tools

Contains a set of Galaxy Tools mostly written by the Bioinformatics core at NML

License: Apache License 2.0

Languages: Shell 1.17%, Python 9.40%, Perl 78.28%, R 9.63%, HTML 1.52%

galaxy_tools's Issues

Update staramr to version 0.8.0 in Galaxy

Update the staramr Galaxy tool to version 0.8.0 (when released).

GitHub CI Failures

The GitHub CI may need to be updated. Test failures seem to be related to how docker is being run:

https://github.com/phac-nml/galaxy_tools/actions/runs/6030253372/job/16362390919?pr=239

Job in error state.. tool_id: staramr_search, exit_code: 125, stderr: docker: invalid spec: ::rw: empty section between colons.
See 'docker run --help'.
.

It may involve updating our pr.yaml to be consistent with the galaxyproject version.
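
The `::rw` in the log looks like a `-v host:container:mode` volume argument whose host and container paths were both empty. Assuming that is the cause (not confirmed against pr.yaml), a minimal Python sketch of building such a spec with an explicit check; `volume_spec` is a hypothetical helper, not part of Galaxy or planemo:

```python
def volume_spec(host_path: str, container_path: str, mode: str = "rw") -> str:
    """Build a `docker run -v` argument of the form host:container:mode.

    Hypothetical helper: Docker rejects a spec such as "::rw" because the host
    and container sections are empty, so fail early with a clearer message.
    """
    if not host_path or not container_path:
        raise ValueError(
            f"empty volume path (host={host_path!r}, container={container_path!r})"
        )
    return f"{host_path}:{container_path}:{mode}"


# Joining two empty paths with the mode reproduces the spec from the CI log.
print(":".join(["", "", "rw"]))                   # ::rw
print(volume_spec("/data/jobs", "/data/jobs"))    # /data/jobs:/data/jobs:rw
```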

getmlst tool on Galaxy

Hi all,

We would like to request an update to the getmlst tool in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/repository?repository_id=5634a03143d6f21d

The problem is that the tool outputs the profile sequences but does not output the MLST definitions for any species. Fixing the tool would be very useful for developing MLST analysis workflows.

Thanks,
Jayanthi Gangiredla

Biohansel does not always correctly pair dataset collections in output

When the reads used to build a paired collection for biohansel have no file extension (e.g. .fastq), the final output does not pair the files, even though the dataset collection does.

If I had to guess (and if I remember correctly), this is likely due to how the get_paired_fastq_filename function interacts with the $input.paired_collection.<forward|reverse> name, which I believe uses the name of the underlying dataset used to make the collection rather than the name within the collection itself.

Example follows:

Dataset 1 -> Correctly pairs output:

File names used to make up the paired collection:
- TestX_R1.fastq && TestX_R2.fastq
- TestY_R1.fastq && TestY_R2.fastq
Paired Collection (looks exactly the same as Dataset 2):
Output:

TestX | heidelberg | 0.5.0

Dataset 2 -> Outputs are separated

File names used to make up the paired collection:
- TestX_R1 && TestX_R2
- TestY_R1 && TestY_R2
Paired Collection:
Output:

TestX_R1 | heidelberg | 0.5.0
TestX_R2 | heidelberg | 0.5.0
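
One possible reading of the outputs above is that the `_R1`/`_R2` marker is only stripped when a `.fastq` extension follows it. A minimal sketch of a name-normalising helper that treats both cases alike, assuming Illumina-style `_R1`/`_R2` (or `_1`/`_2`) suffixes; `paired_sample_name` is hypothetical and is not the wrapper's `get_paired_fastq_filename`, nor does it settle which name Galaxy exposes:

```python
import re

# Hypothetical pattern: a trailing read-direction marker (_R1/_R2 or _1/_2),
# optionally followed by a FASTQ extension, optionally gzip-compressed.
_PAIR_SUFFIX = re.compile(r"_R?[12](\.(fastq|fq)(\.gz)?)?$", re.IGNORECASE)


def paired_sample_name(forward: str, reverse: str) -> str:
    """Return the sample name shared by a forward/reverse pair of read names."""
    forward_base = _PAIR_SUFFIX.sub("", forward)
    reverse_base = _PAIR_SUFFIX.sub("", reverse)
    if forward_base != reverse_base:
        raise ValueError(f"reads do not look paired: {forward!r}, {reverse!r}")
    return forward_base


print(paired_sample_name("TestX_R1.fastq", "TestX_R2.fastq"))  # TestX
print(paired_sample_name("TestX_R1", "TestX_R2"))              # TestX
```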

Spatyper Tool on Galaxy

Write wrapper for basic usage

spaTyper -f sequence.fasta -r sparepeats.fasta
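
The Galaxy wrapper itself would be an XML tool definition. As a sketch of the command it has to assemble, using only the `-f` and `-r` flags shown above, a hypothetical Python helper (`spatyper_command` and `run_spatyper` are illustrative names, and `spaTyper` is assumed to be on `PATH`):

```python
import subprocess
from typing import List


def spatyper_command(sequence_fasta: str, repeats_fasta: str) -> List[str]:
    """Build the basic spaTyper invocation as an argument list.

    Passing a list (rather than one shell string) keeps paths containing
    spaces intact as single arguments.
    """
    return ["spaTyper", "-f", sequence_fasta, "-r", repeats_fasta]


def run_spatyper(sequence_fasta: str, repeats_fasta: str) -> str:
    """Run spaTyper (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(
        spatyper_command(sequence_fasta, repeats_fasta),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```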

Updates to spatyper based on user feedback

There are two key improvements that our spatyper wrapper could use:

- Update the “repeats library multifasta” so that it is automatically downloaded from Ridom.
- The output of the spatyper tool is currently a collapsed version; the command output should include Sample #, Repeats and Type.

bio_hansel unique file name not present in output files.

When running a Galaxy workflow that includes bio_hansel, the file names are replaced with generic placeholders, so you cannot tell the SRRs apart.

Update stringmlst to use conda

The stringmlst tests are commented out because they do not use conda, so we cannot run them on Travis CI.

biohansel cannot handle whitespace in full path for subtype metadata file

https://github.com/phac-nml/galaxy_tools/blob/master/tools/biohansel/biohansel.xml#L121

Just need quotes around it please.
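
To illustrate why the quotes matter: `shlex.split` applies POSIX shell word splitting, so an unquoted path containing a space becomes two arguments. A minimal sketch (the path and `some_tool` are hypothetical); in a Galaxy wrapper the usual equivalent is wrapping the Cheetah variable in single quotes, e.g. `'$variable'`:

```python
import shlex

# Hypothetical metadata path containing a space.
metadata = "/galaxy/tool-data/my subtypes/metadata.tsv"

# Unquoted: shell word splitting turns the path into two arguments.
unquoted = shlex.split(f"some_tool {metadata}")
# Quoted: the path survives as a single argument.
quoted = shlex.split(f"some_tool {shlex.quote(metadata)}")

print(unquoted)  # ['some_tool', '/galaxy/tool-data/my', 'subtypes/metadata.tsv']
print(quoted)    # ['some_tool', '/galaxy/tool-data/my subtypes/metadata.tsv']
```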

The "refseq_masher" tool fails when there are spaces in the input dataset name

assembly_stats_txt.xml still at v1.0.1

https://github.com/phac-nml/galaxy_tools/blob/master/tools/assemblystats/assembly_stats_txt.xml starts:

<tool id="assemblystats" name="assemblystats" version="1.0.1">
	<description>Summarise an assembly (e.g. N50 metrics)</description>
	<requirements>
...

This is exactly the same version number as Konrad's original, which is also still on the Tool Shed. That is very confusing, as your version has a number of changes:

https://toolshed.g2.bx.psu.edu/view/nml/assemblystats/
https://toolshed.g2.bx.psu.edu/view/konradpaszkiewicz/assemblystats/

Biohansel does not recognize collections of SPAdes assemblies without .fasta extension

We tried to assemble a collection of .fastq WGS datasets with SPAdes and then run Biohansel on the resulting dataset list in Galaxy, but got an error: Biohansel does not see any input files. Perhaps the wrapper needs to be fixed. @apetkau @DarianHole
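
If the wrapper decides which inputs to accept from the file name extension (an assumption, not checked against biohansel.xml), one alternative is to inspect the content instead. A minimal sketch; `sniff_format` is a hypothetical helper that only looks at the first non-blank line:

```python
def sniff_format(lines):
    """Guess "fasta" or "fastq" from the first non-blank line; None if unclear.

    Hypothetical helper: it inspects content only, so a dataset named
    "contigs" (no extension) is treated the same as "contigs.fasta".
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip leading blank lines
        if text.startswith(">"):
            return "fasta"
        if text.startswith("@"):
            return "fastq"
        return None  # first real line is neither a FASTA nor a FASTQ header
    return None  # empty input
```

For example, `with open(path) as handle: fmt = sniff_format(handle)`.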

Add StarAMR 0.10.0 Galaxy wrapper

Improve "quasitools distance" tool description.

https://github.com/phac-nml/galaxy_tools/blob/master/tools/quasitools/distance.xml

The measure reported by the tool isn't directly an evolutionary measure but rather an approximation of one, and this should be made clearer.

Additionally, in the description of inputs, the FASTA file isn't necessarily a "consensus" file.

assembly_stats_txt.py does not return an error code

Currently code is:

https://github.com/phac-nml/galaxy_tools/blob/master/tools/assemblystats/assembly_stats_txt.py#L18

def stop_err(msg):
    sys.stderr.write('%s\n' % msg)
    sys.exit()

This returns a zero exit status, i.e. Galaxy will not report an error.

Quick fix:

import sys

def stop_err(msg):
    sys.stderr.write('%s\n' % msg)
    sys.exit(1)  # non-zero status so Galaxy marks the job as failed

A better fix is to replace all calls to stop_err with sys.exit(msg), which accepts a string, prints it to stderr and exits with return code 1.
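
A quick way to confirm the exit codes described above is to run each variant in a child interpreter and inspect its return code and stderr. A minimal sketch (`exit_status` is a hypothetical helper):

```python
import subprocess
import sys


def exit_status(snippet: str):
    """Run a Python snippet in a child interpreter; return (exit code, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    return result.returncode, result.stderr


print(exit_status("import sys; sys.exit()"))             # (0, '')
print(exit_status("import sys; sys.exit(1)"))            # (1, '')
print(exit_status("import sys; sys.exit('bad input')"))  # (1, 'bad input\n')
```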

Add E. coli species to the pointfinder

Hi there,
Is it possible to add E. coli to the PointFinder section of the Galaxy wrapper?

galaxy_tools/tools/staramr/staramr_search.xml, line 80 (commit 7ecd507):

<option value="campylobacter">Campylobacter</option>

Alternatively, could you add an option that lets users select their species of interest from the Galaxy history?

Many thanks

KAT sect sometimes fails with hard links

We are having issues where KAT sect fails when hard-linking the database files. It occurs randomly, but only when hundreds of concurrent KAT jobs are all attempting to use the same database file.

One solution would be to copy the file with cp instead of hard-linking it with ln.
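
A minimal sketch of a middle ground, assuming the database files are staged from Python: try the hard link first and fall back to a full copy if `os.link` raises. `stage_database` is a hypothetical helper, not KAT or wrapper code; always copying, as suggested above, is simpler at the cost of extra disk I/O and space.

```python
import os
import shutil


def stage_database(src: str, dest: str) -> str:
    """Place `src` at `dest`, preferring a hard link but falling back to a copy.

    Returns "link" or "copy" to show which path was taken. An existing `dest`
    is treated as an error rather than silently overwritten.
    """
    try:
        os.link(src, dest)
        return "link"
    except FileExistsError:
        raise
    except OSError:
        # e.g. cross-device link (EXDEV) or too many links (EMLINK)
        shutil.copy2(src, dest)
        return "copy"
```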

StarAMR-related DBs in Galaxy

Hi,

I have a question regarding managing StarAMR-related DBs in Galaxy.
I do not see any data tables or data manager (DM) in the staramr folder.
How do you manage the DBs then?

We would like to update the DBs on usegalaxy.fr.

Thanks for your help.

Bérénice

Bundle Collection cannot handle comma in dataset name

Median is wrong in the assembly_stats tool (aka fasta_summary.pl)

I noticed this on a real dataset where my Python code did not get the same median contig length. Reduced to a minimal test case:

$ cat /tmp/median.fasta 
>one
A
>two
AA
>three
AAA
>four
AAAA

There is an even number of sequences (lengths 1, 2, 3 and 4), so the median should be the mean of the two middle values, which is 2.5 in this trivial case.
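
For reference, a minimal Python sketch of the median as defined above, checked against the standard-library `statistics` module (`median_length` is a hypothetical helper):

```python
import statistics


def median_length(lengths):
    """Median of sequence lengths; averages the two middle values when n is even."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("median of an empty set of sequences")
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


lengths = [1, 2, 3, 4]  # the four sequences in /tmp/median.fasta
print(median_length(lengths))           # 2.5
print(statistics.median(lengths))       # 2.5 (standard-library equivalent)
print(statistics.median_high(lengths))  # 3 (upper middle value only)
```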

As can be seen below, fasta_summary.pl instead picks the larger of the two middle values (which, for sequence lengths, ensures the answer is an integer), so in practice the error will often be larger than in this example:

$ perl fasta_summary.pl -i /tmp/median.fasta -o /tmp && more /tmp/stats.txt 
  Directory '/tmp' exists, so the existing fasta_summary.pl output files will be overwritten
Statistics for read lengths:
	Min read length:	1
	Max read length:	4
	Mean read length:	2.50
	Standard deviation of read length:	1.12
	Median read length:	3
	N50 read length:	3

Statistics for numbers of reads:
	Number of reads:	4
	Number of reads >=1kb:	0
	Number of reads in N50:	2

Statistics for bases in the reads:
	Number of bases in all reads:	10
	Number of bases in reads >=1kb:	0
	GC Content of reads:	0.00 %

Simple Dinucleotide repeats:
	Number of reads with over 70% dinucleotode repeats:	0.00 % (0 reads)
	AT:	0.00 % (0 reads)
	CG:	0.00 % (0 reads)
	AC:	0.00 % (0 reads)
	TG:	0.00 % (0 reads)
	AG:	0.00 % (0 reads)
	TC:	0.00 % (0 reads)

Simple mononucleotide repeats:
	Number of reads with over 50% mononucleotode repeats:	50.00 % (2 reads)
	AA:	50.00 % (2 reads)
	TT:	0.00 % (0 reads)
	CC:	0.00 % (0 reads)
	GG:	0.00 % (0 reads)

CC original author @sujaikumar: given that the Perl script has been widely used, it may be simpler just to document this behaviour. Is there an official repository for this script?

Update staramr to version 1.0.0 in Galaxy

Update staramr to version 1.0.0 in Galaxy (when released).

Convert to conda

bio_hansel: handle interleaved and compressed reads

We need to add an option to allow interleaved and/or compressed reads as well. The default output of fastq-dump from the SRA Toolkit is compressed reads, so it would be best if bio_hansel could handle them.
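
A minimal sketch of the input handling this would need, assuming gzip compression and interleaved files where each forward FASTQ record is immediately followed by its mate; `open_reads`, `fastq_records` and `deinterleave` are hypothetical helpers, not bio_hansel functions:

```python
import gzip
from itertools import islice

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_reads(path):
    """Open a FASTQ file as text, decompressing it if it starts with gzip magic."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    lines = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def deinterleave(lines):
    """Split an interleaved FASTQ stream into (forward, reverse) record pairs."""
    records = fastq_records(lines)
    for forward in records:
        reverse = next(records, None)
        if reverse is None:
            raise ValueError("odd number of records in interleaved FASTQ")
        yield forward, reverse
```

Usage: `with open_reads(path) as handle: pairs = list(deinterleave(handle))`. Detecting gzip by its magic bytes rather than a `.gz` suffix also sidesteps the file-extension problems reported in other issues here.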

assemblystats does not have a README.rst / README.txt

Please rename README_ASSEMBLY_STATS to README.rst, README.txt or similar so that the Galaxy Tool Shed will display the information.

Update bio_hansel galaxy wrapper with Galaxy 17.09 & Conda 3

During testing of bio_hansel's wrapper, we discovered that bio_hansel fails Travis CI testing unless attrs is listed in the requirements. This shouldn't be necessary, as the recipe installs attrs anyway. We found that this is an issue with older versions of Galaxy and Conda. Once Galaxy is updated to version 17.09 and Conda to version 3.x.x, the wrapper should be updated accordingly.
Refer to commit 753a290 for the required work.

Assemblystats fails in the latest Galaxy release (20.01)

The default Python version in Galaxy is now 3.x, where it used to be 2.7.x. The tool needs to be updated.

Assemblystats gnuplot failed conda

We just re-installed gnuplot=5.0.4 on our CentOS 7 machines, and it looks like one of gnuplot's dependencies (libwebp) was upgraded from libwebp.so.6 to libwebp.so.7. We get the following stack trace from the tool:

gnuplot: error while loading shared libraries: libwebp.so.6: cannot open shared object file: No such file or directory

** WARNING: GNUplot pipe returned non-zero status: '32512'

** ERROR: Failed to create /Galaxy/jobs/005/496/5496674/working/dataset_1_files/histogram_bins.dat.png'**

Upgrade gnuplot to the latest version and check whether that fixes the issue and still works with Assemblystats.
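
The `32512` in the warning is a raw wait status, presumably Perl's `$?` after closing the gnuplot pipe; the exit code is `32512 >> 8`, i.e. 127, which is consistent with gnuplot failing to start because of the missing shared library. A minimal POSIX-only Python sketch of decoding such a value (`describe_wait_status` is a hypothetical helper):

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode a raw POSIX wait status (POSIX-only: uses os.WIFEXITED etc.)."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognised status {status}"


print(describe_wait_status(32512))  # exited with code 127
```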
