phac-nml / galaxy_tools

Contains a set of Galaxy Tools mostly written by the Bioinformatics core at NML

License: Apache License 2.0

Shell 1.17% Python 9.40% Perl 78.28% R 9.63% HTML 1.52%

galaxy_tools's People

Contributors

apetkau, bgruening, camytran, connorchato, damcorreia, dankein, darianhole, dfornika, emarinier, ericenns, glabbe, hexylena, jencabral, jennifertran, kbessonov1984, markiskander, matthew-fogel, mgopez, mvdbeek, peterk87, pvanheus, richarddavidsonthegreat, shiltemann, takadonet, thezetner, xialiu71

galaxy_tools's Issues

Biohansel does not always correctly pair dataset collections in output

When the reads used to build a paired collection have no extension (e.g. .fastq) in their names, the final biohansel output does not pair the files, even though the dataset collection itself is paired.

If I had to guess (and if I remember correctly), this is likely due to how the get_paired_fastq_filename function interacts with the $input.paired_collection.<forward|reverse> name, which I believe uses the underlying name of the dataset used to make the collection and not the name in the collection itself.

Example follows:

Dataset 1 -> Correctly pairs output:

  • File names used to make up the paired collection:

    • TestX_R1.fastq && TestX_R2.fastq
    • TestY_R1.fastq && TestY_R2.fastq
  • Paired Collection (looks exactly the same as Dataset 2):
    image

  • Output:

TestX | heidelberg | 0.5.0

Dataset 2 -> Outputs are separated

  • File names used to make up the paired collection:

    • TestX_R1 && TestX_R2
    • TestY_R1 && TestY_R2
  • Paired Collection:
    image

  • Output:

TestX_R1 | heidelberg | 0.5.0
TestX_R2 | heidelberg | 0.5.0
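
To illustrate the kind of behaviour I am guessing at, here is a minimal sketch. It is not the wrapper's actual get_paired_fastq_filename code: the regular expression and the sample_name and group_by_sample helpers are hypothetical, and I am assuming the pairing logic only strips the _R1/_R2 suffix when a fastq extension follows it.

```python
import re

# Hypothetical pattern: the read suffix is only recognised when it is
# followed by a fastq extension (an assumption, not the wrapper's real code).
PAIRED_FASTQ = re.compile(r"^(?P<sample>.+)_R[12]\.(?:fastq|fq)(?:\.gz)?$")


def sample_name(filename):
    """Return the sample name if the filename looks like a paired fastq,
    otherwise return the filename unchanged."""
    match = PAIRED_FASTQ.match(filename)
    return match.group("sample") if match else filename


def group_by_sample(filenames):
    """Group filenames under the sample name derived from each one."""
    groups = {}
    for filename in filenames:
        groups.setdefault(sample_name(filename), []).append(filename)
    return groups


with_extension = group_by_sample(["TestX_R1.fastq", "TestX_R2.fastq"])
without_extension = group_by_sample(["TestX_R1", "TestX_R2"])
```

Under that assumption, the names with an extension collapse to a single TestX entry, while the names without one stay as separate TestX_R1 and TestX_R2 entries, which matches the two outputs above.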

Updates to spatyper based on user feedback

There are 2 key improvements that our spatyper wrapper could use.

  1. Update the “repeats library multifasta” to be automatically downloaded from Ridom.
  2. The output from the spatyper tool is currently a collapsed version. The command output should include Sample #, Repeats and Type.

assembly_stats_txt.xml still at v1.0.1

https://github.com/phac-nml/galaxy_tools/blob/master/tools/assemblystats/assembly_stats_txt.xml starts:

<tool id="assemblystats" name="assemblystats" version="1.0.1">
	<description>Summarise an assembly (e.g. N50 metrics)</description>
	<requirements>
...

This is exactly the same version number as Konrad's original, which is also still on the tool shed. That makes it very confusing, as your version has a number of changes:

https://toolshed.g2.bx.psu.edu/view/nml/assemblystats/
https://toolshed.g2.bx.psu.edu/view/konradpaszkiewicz/assemblystats/

assembly_stats_txt.py does not return error code

Currently code is:

https://github.com/phac-nml/galaxy_tools/blob/master/tools/assemblystats/assembly_stats_txt.py#L18

def stop_err(msg):
    sys.stderr.write('%s\n' % msg)
    sys.exit()

This will return a zero exit code, i.e. no error will be reported by Galaxy.

Quick fix:

def stop_err(msg):
    sys.stderr.write('%s\n' % msg)
    sys.exit(1)

Better fix: replace all calls to stop_err with sys.exit, which accepts a string, prints it to stderr, and exits with return code one.
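
A small check of the difference, run in child interpreters so the exit codes can be observed. The run_child helper and the 'bad input' message are only for illustration and are not part of assembly_stats_txt.py.

```python
import subprocess
import sys


def run_child(source):
    """Run a Python snippet in a child interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", source], capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


# Mirrors the current stop_err: message on stderr, then a bare sys.exit().
bare_exit = "import sys; sys.stderr.write('bad input\\n'); sys.exit()"
# The suggested replacement: pass the message straight to sys.exit.
string_exit = "import sys; sys.exit('bad input')"

bare_code, bare_err = run_child(bare_exit)
string_code, string_err = run_child(string_exit)
```

Both variants write the message to stderr, but only the second one exits with a non-zero code that Galaxy can pick up.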

KAT sect sometimes fails with hard links

We are having issues where KAT sect fails while hard linking the database files. It occurs randomly, but only when hundreds of concurrent KAT jobs are all attempting to use the same database file.

One solution would be to use cp instead of hard linking the file with ln.
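
A rough sketch of a related option, in Python rather than in the wrapper's own command line: try the hard link first and fall back to a copy if linking fails. Note this is a variation on the suggestion above (which is to always copy), and the link_or_copy helper and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard link src to dst; if the link fails, copy the file instead."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # Covers failures such as an existing destination, link limits,
        # or a filesystem that does not support hard links.
        shutil.copy2(src, dst)
        return "copied"


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "database.bin")  # hypothetical database file
    staged = os.path.join(workdir, "job_database.bin")
    with open(source, "w") as handle:
        handle.write("kmers")
    method = link_or_copy(source, staged)
    with open(staged) as handle:
        staged_content = handle.read()
```

Always copying is simpler and avoids any contention on the shared file's links, at the cost of extra disk space and I/O per job.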

StarAMR-related DBs in Galaxy

Hi,

I have a question regarding managing StarAMR-related DBs in Galaxy.
I do not see any data tables in the staramr folder and any DM.
How do you manage the DBs then?

We would like to update the DBs on usegalaxy.fr.

Thanks for your help.

Bérénice

median wrong in assembly_stats tool aka fasta_summary.pl

I noticed this on a real dataset where my Python code did not get the same median contig length. Reduced to a minimal test case:

$ cat /tmp/median.fasta 
>one
A
>two
AA
>three
AAA
>four
AAAA

There is an even number of sequences, with lengths 1, 2, 3, 4, meaning the median should be the mean of the middle two values, which is 2.5 in this trivial case.
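
For comparison, a minimal Python sketch using the standard library on the same four sequences. The read_lengths helper is hypothetical and only handles simple FASTA like the test case; it is not my original comparison code.

```python
import statistics


def read_lengths(fasta_text):
    """Return the length of each sequence in a simple FASTA string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


FASTA = ">one\nA\n>two\nAA\n>three\nAAA\n>four\nAAAA\n"

lengths = read_lengths(FASTA)
median = statistics.median(lengths)            # mean of the middle two values
upper_median = statistics.median_high(lengths)  # larger of the middle two values
```

Here median is 2.5, while upper_median is 3, the larger of the two middle values.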

As can be seen below, fasta_summary.pl picks the larger of the middle two values instead (which in the context of sequences will ensure the answer is an integer), so in practice the error will often be larger:

$ perl fasta_summary.pl -i /tmp/median.fasta -o /tmp && more /tmp/stats.txt 
  Directory '/tmp' exists, so the existing fasta_summary.pl output files will be overwritten
Statistics for read lengths:
	Min read length:	1
	Max read length:	4
	Mean read length:	2.50
	Standard deviation of read length:	1.12
	Median read length:	3
	N50 read length:	3

Statistics for numbers of reads:
	Number of reads:	4
	Number of reads >=1kb:	0
	Number of reads in N50:	2

Statistics for bases in the reads:
	Number of bases in all reads:	10
	Number of bases in reads >=1kb:	0
	GC Content of reads:	0.00 %

Simple Dinucleotide repeats:
	Number of reads with over 70% dinucleotode repeats:	0.00 % (0 reads)
	AT:	0.00 % (0 reads)
	CG:	0.00 % (0 reads)
	AC:	0.00 % (0 reads)
	TG:	0.00 % (0 reads)
	AG:	0.00 % (0 reads)
	TC:	0.00 % (0 reads)

Simple mononucleotide repeats:
	Number of reads with over 50% mononucleotode repeats:	50.00 % (2 reads)
	AA:	50.00 % (2 reads)
	TT:	0.00 % (0 reads)
	CC:	0.00 % (0 reads)
	GG:	0.00 % (0 reads)


CC original author @sujaikumar - given the Perl script has been used widely it may be simpler just to document this behaviour? Is there an official repository for this script?

Update bio_hansel galaxy wrapper with Galaxy 17.09 & Conda 3

During testing of bio_hansel's wrapper, we discovered that bio_hansel fails Travis CI testing unless attrs is listed in the requirements. This shouldn't be necessary, as the recipe installs attrs anyway. We found this is an issue with the older versions of Galaxy and Conda. Once Galaxy is updated to version 17.09 and Conda is updated to version 3.x.x, refer to 753a290 for the required work.

Assemblystats gnuplot failed conda

We just re-installed gnuplot=5.0.4 on our CentOS 7 machines, and it looks like one of the dependencies for gnuplot (libwebp) got upgraded from libwebp.so.6 to libwebp.so.7. We are getting the following error output from the tool:

gnuplot: error while loading shared libraries: libwebp.so.6: cannot open shared object file: No such file or directory

** WARNING: GNUplot pipe returned non-zero status: '32512'

** ERROR: Failed to create /Galaxy/jobs/005/496/5496674/working/dataset_1_files/histogram_bins.dat.png'**

Upgrade gnuplot to the latest version and see if that fixes the issue and still works with Assemblystats.
