galaxyproteomics / tools-galaxyp
Galaxy Tool Shed repositories maintained and developed by the GalaxyP community
License: MIT License
Analogous to SearchGUI for library search engines, can we include DeNovoGUI for de novo sequencing?
The OpenMSTools MSGFPlusAdapter and XTandemAdapter allow only one input in param_fixed_modifications and in param_variable_modifications. After adding the first one by clicking on + Insert param_fixed_modifications, this option is still displayed, but nothing happens when clicking it again. The same is true for the variable modifications.
The Galaxy wrappers for MS-GF+ / XTandem allow adding more than one modification. There, the input is handled in a different way.
Recent Galaxy versions support a citations tag. This issue will track the process of adding citations to all tools.
All: I am rather new to this group & would like to introduce open-source applications that I frequently use in our metagenome/metaproteome workflows (primarily on HPC systems):
Omega2: https://bitbucket.org/omicsbio/omega2 & accompanying instructions: http://omega.omicsbio.org/instructions
[Purpose: metagenomics assembler which applies an overlap-graph approach rather than a de Bruijn graph approach. Works best for Illumina reads.]
[Presently undergoing significant development & may be worth introducing @ any upcoming events]
Omega2 data preprocessing prerequisites:
a. Sickle: https://github.com/najoshi/sickle
b. ecc.sh (an error correction component of BBMap): https://sourceforge.net/projects/bbmap/ & http://jgi.doe.gov/data-and-tools/bbtools/
Canu: https://github.com/marbl/canu (a fork of the Celera Assembler for MinION reads)
[Purpose: assembly of Oxford Nanopore Technologies MinION reads; documentation: https://github.com/marbl/canu]
Sipros3: https://github.com/Omics-Bio/Sipros3
[Purpose: Utilizes OpenMPI/MPI for the search of very large FASTA files (e.g. those from metagenome assemblies with millions of entries, etc.)]
[Sipros/ProRata is quite flexible for integration of protein ID with protein quantification, stable isotope probing, and PTM searches: http://sipros.omicsbio.org/ ]
[Under significant development & may also be of interest @ any upcoming events.]
UniFam: https://github.com/chaij/UniFam
[Purpose: Enables large-scale protein annotation with UniProt-based families.]
If any of these are potentially interesting applications to others in the group, please LMK-- I'd be pleased to be able to field questions on the tools and/or work to get them incorporated into the shed.
Thanks!
We have a new PeptideShaker and SearchGUI release in preparation.
Waiting for a new upstream release.
Can anyone comment on the status of OpenMS in Galaxy? The TS package is owned by galaxyp but is several years old and I don't see it in the github repo. @bgruening has it in his own tool repo and it looks active, but it is in the .tt_blacklist. Are either of these options suitable for a production server?
A next step for Unipept is adding support for a functional analysis next to the existing taxonomic analysis. While the problem sounds similar to the taxonomic one, there are a few problems:
Finally, the question of "what is the expected output of a functional analysis" remains. Many articles pick a type of functional annotation and simply include a pie chart. If you ask a biologist if he learned something from the pie chart, the answer is almost always "no". We should be able to do better than that. Suggestions or good examples of such data visualisations are always welcome.
What do we want to do with the datatypes in https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/proteinpilot?
Is this wrapper still used?
Should we migrate them to https://github.com/iracooke/proteomics-datatypes?
Can we remove the COPYING into ~ or put them into every readme.md file to get rid of it?
Now that we have one single galaxyp repository this would make much sense, imho.
@jmchilton ok with you? You are probably the copyright holder?
Answers from Bart Mesuere to questions from Pratik Jagtap:
So, the reason you don't get a result for the listed peptides is that they aren't tryptic peptides. And while we don't offer the advanced missed cleavage handling with the API, you can still get some results for them by "making them tryptic". You could simply add a processing step to the galaxy workflow to perform the regular expression to split them and then run them through the Unipept API. As an extra preprocessing step, you could also filter out peptides longer than 50 or shorter than 5, since we only have peptides with a length in the range of [5,50] in our database. I did this for the unmatched peptides in your excel file and attached the results. As you can see this gives a result for each of the peptides and in many cases the result is equally specific as the web interface. While this approach is not perfect, it's definitely better than reporting nothing at all, plus it's several times faster than the advanced missed cleavage handling.
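The preprocessing step described above could be sketched roughly as follows. This is only a minimal sketch: the exact regular expression is not given in the answer, so the usual trypsin rule (cleave after K or R unless the next residue is P) is assumed here, and the function names `tryptic_split` and `prepare_for_unipept` are made up for illustration. The length bounds 5 and 50 come from the answer itself.

```python
import re

# Length range of peptides held in the Unipept database, as stated above.
MIN_LEN, MAX_LEN = 5, 50


def tryptic_split(peptide):
    """Split a peptide after K or R, unless the next residue is P.

    Assumption: this is the standard trypsin cleavage rule; the original
    answer does not spell out the regular expression it used.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", peptide)
    return [f for f in fragments if f]  # drop the empty trailing piece


def prepare_for_unipept(peptides):
    """Make peptides tryptic and keep only fragments of length 5..50."""
    kept = []
    for peptide in peptides:
        for fragment in tryptic_split(peptide.strip().upper()):
            if MIN_LEN <= len(fragment) <= MAX_LEN:
                kept.append(fragment)
    return kept


if __name__ == "__main__":
    for fragment in prepare_for_unipept(["AAKPGGRTTTTTK", "MKAAAAAAR"]):
        print(fragment)
```

The resulting fragments could then be submitted to the Unipept API in place of the original non-tryptic peptides; how the per-fragment results are merged back per peptide is left open here.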
OpenMS will have a 2.0 release within the next month; we should track the process and generate wrappers for it.
The current process is tracked here: https://github.com/bgruening/galaxytools/tree/master/openms.
I have a general question about bugfixes to tools in the public toolshed.
I'm setting up a local Galaxy instance for proteomics, and I've run into some issues with the versions of wrappers maintained by the 'galaxyp' which are in the main public toolshed. For instance, the 'myrimatch' wrapper is from 2014 and seems to have fatal issues (backspaces inserted into the command line which cause failure). These issues have been fixed in the version in the sandbox and in this repository but don't seem to have made it back into the "stable" toolshed.
My short-term solution is to install the latest version from here (which also had some bugs but which I can submit PRs for directly). Given that the galaxy docs recommend avoiding sandboxed tools on production servers, I'm wondering if there's a system in place for fixing critical bugs to tools in the main instance even when the latest code might not be well-tested enough to put up there.
When installing OpenMS a single dependency (myrimatch) gives an error (seemingly a wrong link):
  File "/gpfs1/data/galaxy_server/galaxy-dev/lib/tool_shed/galaxy_install/install_manager.py", line 121, in install_and_build_package_via_fabric
    tool_dependency = self.install_and_build_package( install_environment, tool_dependency, actions_dict )
  File "/gpfs1/data/galaxy_server/galaxy-dev/lib/tool_shed/galaxy_install/install_manager.py", line 79, in install_and_build_package
    initial_download=True )
  File "/gpfs1/data/galaxy_server/galaxy-dev/lib/tool_shed/galaxy_install/tool_dependencies/recipe/recipe_manager.py", line 31, in execute_step
    initial_download=initial_download )
  File "/gpfs1/data/galaxy_server/galaxy-dev/lib/tool_shed/galaxy_install/tool_dependencies/recipe/step_handler.py", line 665, in execute_step
    dir = self.url_download( work_dir, downloaded_filename, url, extract=True, checksums=checksums )
  File "/gpfs1/data/galaxy_server/galaxy-dev/lib/tool_shed/galaxy_install/tool_dependencies/recipe/step_handler.py", line 165, in url_download
    raise Exception( err_msg )
Error downloading from URL http://getgalaxyp.msi.umn.edu/downloads/myrimatch-bin-linux-x86_64-gcc41-release-2_1_131.tar.bz2 : <urlopen error [Errno -2] Name or service not known>
There should be one option that controls whether all searches are done with decoy or not. Options to control decoy searches for individual search engines should be removed because they are confusing and would lead to weird mismatched scoring behaviour.
@jmchilton recent work on workflow scheduling and data collections will change a lot for the galaxyp project, and we should make our wrappers collection-aware.
Maybe we will soon have workflows within workflows and loop-like structures. This issue should track our progress on this.
Please add all tools that need to be adopted and one really complex example workflow.
Is this interesting for GalaxyP?
https://github.com/mpc-bioinformatics/pia
Our readme should be extended a little bit to highlight the community project, a few tools, aims ...
Generating proteomic databases from 16S rRNA / taxonomy data.
TOOL IDEA:
Given that most metagenomics studies are based on 16S ribosomal RNA-based taxonomy identifications, a tool that can take species names as input and fetch the corresponding proteomes (if available) from the UniProt website would be desirable. Researchers working in the field of metaproteomics told us that this would be a useful tool. Any ideas on the effort that would be required to build it?
Suggestions (From emails in November 2014)-
A) Suggestion by Ira Cooke (@iracooke Australia):
Uniprot has a great API … so if you know the species identifier (or list of them) you can get a customized database direct from Uniprot by downloading using a special url that contains all the taxonomic identifiers. This negates the need for a merge step.
This is an example (Dog and Mouse)
http://www.uniprot.org/uniprot/?query=taxonomy%3a9615+OR+taxonomy%3a10090&force=yes&format=fasta
I guess the trick would be to go from species names to taxon ids … since this is inherently fuzzy (species might be listed under a different name from what you expect). For my purposes I just do this by hand using uniprot via the ncbi taxonomy database … but if you have a bulk list of species names I wouldn’t be sure how to do it in an automated way (unless all the species names had a perfect match in the database).
I believe this is the best option as it is simple (just a galaxy tool), it doesn’t require storing data locally and it will always give the latest data. It is also precise as there is no reliance on parsing names.
One missing piece is the “Species -> TaxonID” tool, but could be done using a local download of the NCBI Taxonomy data (or a web API .. I haven’t looked but Uniprot might even provide this too). I’d actually say that you’re better off getting away from using species names if possible … to be precise you need the taxon id’s at some point anyway.
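To make suggestion A a bit more concrete, here is a minimal sketch of both pieces: a "Species -> TaxonID" lookup from a local NCBI Taxonomy names.dmp file, and construction of the download URL. The URL pattern is copied from the Dog and Mouse example above and may not match what UniProt serves today; the function names are hypothetical, and only exact (case-insensitive) scientific-name matches are handled, so the fuzzy-name problem mentioned above is not solved here.

```python
from urllib.parse import urlencode


def load_scientific_names(lines):
    """Map lower-cased scientific names to taxon IDs from names.dmp lines.

    names.dmp rows look like: tax_id<TAB>|<TAB>name<TAB>|<TAB>unique name<TAB>|<TAB>name class<TAB>|
    """
    names = {}
    for line in lines:
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[fields[1].lower()] = int(fields[0])
    return names


def species_to_taxon_ids(species, names):
    """Return (taxon IDs found, species names without an exact match)."""
    found, missing = [], []
    for name in species:
        taxon_id = names.get(name.strip().lower())
        if taxon_id is None:
            missing.append(name)
        else:
            found.append(taxon_id)
    return found, missing


def uniprot_fasta_url(taxon_ids):
    """Build a download URL following the pattern of the example above.

    Assumption: the endpoint and parameters are taken verbatim from the
    2014 example; they are not checked against the current UniProt API.
    """
    query = " OR ".join(f"taxonomy:{taxon_id}" for taxon_id in taxon_ids)
    params = urlencode({"query": query, "force": "yes", "format": "fasta"})
    return "http://www.uniprot.org/uniprot/?" + params


if __name__ == "__main__":
    demo = [
        "9615\t|\tCanis lupus familiaris\t|\t\t|\tscientific name\t|\n",
        "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n",
    ]
    ids, unmatched = species_to_taxon_ids(
        ["Canis lupus familiaris", "Mus musculus"], load_scientific_names(demo)
    )
    print(uniprot_fasta_url(ids), unmatched)
```

Reporting the unmatched names separately, rather than silently dropping them, keeps the imprecision of name-based lookup visible to the user.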
B) Suggestion by Lennart Martens (Belgium):
DBToolkit can do this from the local, complete UniProt file (in .dat format) for species as well as for entire taxons, specified as either the text string ('homo sapiens') or the TaxIDs (9606). As stated above, it does require a local version of the file, however.
Conclusion:
Most 16S rRNA studies offer lists of identified species (and strains). It would be a good idea to take such a list and either a) convert it into taxonomy identifiers or b) submit it as species names, using 1) the UniProt API or 2) some features from DBToolkit, in order to 3) generate a FASTA file of the available proteomes.
@jmchilton @iracooke I need to add support for converting Wiff files to the msconvert tools.
ProteoWizard Reader_ABI.cpp checks for the existence of a wiff scan file by appending ".scan" to the given input wiff file and searching for that filename in the same directory.
msconvert_wrapper.py, as currently coded, doesn't have an option to copy in the .scan file without also adding it explicitly to the inputs on the command line.
Any thoughts on how best to add wiff conversion support?
If no one has used the wiff datatype as yet, would it be acceptable to add explicit Metadata fields for the .wiff and .wiff.scan files, similar to BowtieIndex in lib/galaxy/datatypes/ngsindex.py or SnpSiftDbNSFP in lib/galaxy/datatypes/text.py, allowing direct use of ${input.extra_files_path}/${input.metadata.wiff} and ${input.extra_files_path}/${input.metadata.scan} in command line generation? This would allow original filenames to be used in the extra_files_path.
    from galaxy.datatypes.binary import Binary
    from galaxy.datatypes.metadata import MetadataElement

    class Wiff( Binary ):
        """Class for wiff files."""
        MetadataElement( name='reference_name', default='ABSCIEX' , desc='Reference Name', readonly=True, visible=True, set_in_upload=True, no_value='ABSCIEX' )
        # File names of the .wiff and .wiff.scan files inside extra_files_path
        MetadataElement( name="wiff", default=None, desc="reference_name.wiff", readonly=True, visible=True, no_value=None )
        MetadataElement( name="scan", default=None, desc="reference_name.wiff.scan", readonly=True, visible=True, no_value=None)
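As an alternative to new metadata fields, the wrapper could stage both files side by side so that the ".scan" lookup described above succeeds. The following is only a sketch under assumptions: `stage_wiff` and the `input.wiff` base name are hypothetical, and it is not how msconvert_wrapper.py currently works.

```python
import os
import tempfile


def stage_wiff(wiff_path, scan_path, work_dir, base_name="input.wiff"):
    """Symlink a wiff file and its scan file into work_dir under matching names.

    The scan file is linked as <base_name>.scan, which is the name the
    reader is described as looking for next to the wiff file.
    """
    wiff_link = os.path.join(work_dir, base_name)
    os.symlink(os.path.abspath(wiff_path), wiff_link)
    if scan_path is not None:
        os.symlink(os.path.abspath(scan_path), wiff_link + ".scan")
    return wiff_link


if __name__ == "__main__":
    # Demo with empty placeholder files standing in for Galaxy datasets.
    source_dir = tempfile.mkdtemp()
    work_dir = tempfile.mkdtemp()
    wiff = os.path.join(source_dir, "dataset_1.dat")
    scan = os.path.join(source_dir, "dataset_2.dat")
    for path in (wiff, scan):
        open(path, "wb").close()
    staged = stage_wiff(wiff, scan, work_dir)
    print(staged, os.path.exists(staged + ".scan"))
```

Only the staged wiff path would then be passed on the command line; the scan file is found implicitly through its name.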
Hejhej,
I have a question regarding current options in GalaxyP for protein quantitation, especially on stable isotope labels (e.g. SILAC). While there are a lot of good possibilities for peptide identification and protein inference, I do not see many options for quantitation. There are ProteinPilot and Scaffold, but neither is freeware. The integration of MaxQuant was discontinued, as @bgruening told me. I am not sure whether XPRESS and ASAPRatio from the Trans-Proteomic Pipeline have been integrated, but I would rather not recommend them to anybody. (That's why.)
So the only possibilities I see come from the OpenMS project, e.g. FeatureFinderMultiplex and ERPairFinder. Did I get this correct or am I overlooking quantitation functions of other (free) tools?
Tools needed for Data Independent Acquisition Swath Workflow using DIA-UmpireSE for signal extraction:
msconvert .wiff -> mzXML
DIA-Umpire SE mzXML -> *.mgf
msconvert .mgf -> mzXML
Xtandem mzXML -> pep.xml
PeptideProphet
Xinteract-iProphet
mayu
SpectraST
Spectrast2Spectrast_iRT
SpectraST_cons
Spectrast2tsv
TSVtoTRAML
OpenSwath
Hi everybody,
the idea is to have a suite of tools able to carry out quality filtering, assembling, clustering and ORF finding on metagenomic sequences. This would be really useful to generate (sample-matched) metagenome-based databases for metaproteomics, but also - more in general - for all microbiome/metagenome scientists.
Other issues and the Google SpreadSheet already mention SixGill and Omega3, with the former already ready for testing. IMO it would be worth considering adding other tools. Sequence clustering, for instance, is almost mandatory in certain cases to minimize sequence redundancy and reduce database size.
These are the tools we're currently using in our lab:
I'm not a developer, so I don't know if these tools can be packaged for Galaxy (or are already present in other Galaxy servers).
Do you think this could be useful? Can anybody have a look at these tools and/or propose better or simpler tools which do similar things?
Thanks!
Alessandro
Call to software developers and users:
Help the metaproteomics community to improve tools, documentation and workflows for metaproteomics research. If you ever wanted to contribute to a vibrant community, this is your event. We will get you started and explain everything to you. No coding skills necessary to contribute.
Google SpreadSheet: http://z.umn.edu/metaproteomics
https://gitter.im/GalaxyProteomics/Lobby#
https://github.com/galaxyproteomics/tools-galaxyp/issues
December 15th. Start time: 9:00 AM Central European Time.
Bjoern will be on Gitter at 9 AM Central European Time for European developers. He plans to have one session for ALL developers at 4 PM CET (10 AM EST, 9 AM Minneapolis, 7 AM Pacific Standard Time).
Datatypes are defined in their own repository. We should remove them from the ProteinPilot repository.
Moreover, I think they can be replaced with something already available in Galaxy, so there is no need to introduce yet another datatype.
The group datatype seems to be special, and there is a special wrapper to convert group to xml. If we include group2xml.exe XML $input $output into the ProteinPilot wrapper directly we can get rid of the datatype and one tool.
A TPP expert is needed here :)
planemo gained some recent features to strengthen linting support and we should lint all our tools.
This sounds like a real good hackathon task and does not involve much programming.
The parameter r_executable (type="data" format="txt" value="R") renders as a selection field in Galaxy, which only allows selecting datasets from the history.
This refers to revision a25d96e0d837 which should be the newest; using galaxy release 17.05.
Is it necessary to have an R installation available? On our systems R is not available by default, but needs to be "loaded" with a command (we use http://modules.sourceforge.net/). Are there R packages needed to run the tool?
Hi GalaxyP Community,
I would like to invite you to a hackathon on September 27th and 28th. We will package as many dependencies as possible into conda packages.
Conda is a new packaging system that Galaxy can use and that enables Travis tool testing, finally! It would be great if someone could join!
We will meet tomorrow here:
https://hangouts.google.com/call/35zl5pahj5hkppxokhuxvszx24e
and on gitter here:
https://gitter.im/GalaxyProteomics/Lobby
Some slides for a Conda quick-start can be found here: https://galaxy.slides.com/bgruening/conda-quick-start/live#/
A tool list that needs help can be found here: https://docs.google.com/spreadsheets/d/1C9p_XLiLyrbMoRVKS_H582TL1I5v9FKAv2dl5VAYxjw/edit#gid=0
Most of the tools without testing and a conda package are now TPP based.
I started to work on this here: https://github.com/bioconda/bioconda-recipes/compare/tpp?expand=1
But I did not succeed. Any brave soul who can help me with this gets free beer at the next GCC.
We could also think about splitting the different binaries into separate packages and not building the entire thing. Also we might want to build the latest TPP version and upgrade our tools.
This is the last big step missing for a completely tested GalaxyP repository - I think it's worth the pain!
For the time being it is best practice to include an explicit version in the package name, like package_openms_2_0.
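The naming convention can be written down as a small helper. This is just an illustrative sketch; `versioned_package_name` is a made-up name, and replacing every non-alphanumeric run with an underscore is an assumption generalised from the single example above.

```python
import re


def versioned_package_name(tool, version):
    """Build a package name with an explicit version, e.g. package_openms_2_0."""
    # Assumption: anything that is not a letter or digit becomes "_".
    tool_slug = re.sub(r"[^a-z0-9]+", "_", tool.lower()).strip("_")
    version_slug = re.sub(r"[^a-z0-9]+", "_", version.lower()).strip("_")
    return f"package_{tool_slug}_{version_slug}"


if __name__ == "__main__":
    print(versioned_package_name("OpenMS", "2.0"))
```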
The Galaxy project is migrating its wiki page and is looking for help on the 8th of December.
Please join and help us to improve our general documentation and create an awesome Community-Hub!
We now have an updated version of PeptideShaker that works with the latest beta release. As far as I can tell it works, and we could get rid of our ugly workarounds.
Many thanks to Marc, who fixed a lot of bugs on the PeptideShaker side today.
A few things that need to be addressed.
Several mycoplasma strains are very common contaminants in cell culture. Protein Database Downloader does not support mycoplasma so far.
Could the following strains be added to Protein Database Downloader:
As those would be often downloaded together and merged afterwards, maybe it would make sense to make it possible to download all of them at once. Maybe as Taxonomy: "Common mycoplasma contaminants (M. orale, M. arginini [...])"?
Add .shed.yml files to every repository to make use of planemo's upload feature.
https://github.com/dhmay/sixgill/wiki
Sixgill (Six-frame Genome-Inferred Libraries for LC-MS/MS) is a tool for using shotgun metagenomics sequencing reads to construct databases of 'metapeptides': short protein fragments for database search of LC-MS/MS metaproteomics data.
Tasks:
Dear Galaxy-P colleagues,
While testing the Galaxy-P Galaxy flavour with the SearchGUI version 2.9.0 tools, I changed the default "Precursor Ion Tolerance Units" to Da and got the following error: Error parsing the prec_ppm option: Found 2 where 0 or 1 was expected.
Cheers,
Yvan
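The error message says that prec_ppm only accepts 0 or 1, so the wrapper appears to pass 2 when Da is selected. A possible fix is to map the unit to the flag value explicitly. In the sketch below, the assignment 1 = ppm and 0 = Da is an assumption that should be checked against the SearchGUI command-line documentation, and `prec_ppm_value` is a hypothetical helper, not part of the existing wrapper.

```python
# Assumed mapping (not confirmed in this thread): 1 selects ppm, 0 selects Da.
PREC_PPM_VALUES = {"ppm": 1, "da": 0}


def prec_ppm_value(unit):
    """Translate a tolerance unit into a value accepted by the prec_ppm option."""
    key = unit.strip().lower()
    if key not in PREC_PPM_VALUES:
        raise ValueError(f"unsupported precursor tolerance unit: {unit!r}")
    return PREC_PPM_VALUES[key]


if __name__ == "__main__":
    for unit in ("ppm", "Da"):
        print(unit, "->", prec_ppm_value(unit))
```

Rejecting unknown units up front means a bad selection fails in the wrapper with a clear message instead of inside the search tool.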
@PratikDJagtap presented a Multi-omics Visualization Platform (MVP) plugin for the Integrative Genomics Viewer (IGV) at ASMS. I did not find any documentation online. How can it be integrated into Galaxy?
Referring to the Gitter discussion: Can we include this brand-new search engine: MSFragger?
code download available here: https://secure.nouvant.com/umich/technology/7143/license/633
Can someone (@PratikDJagtap) point me to the Galaxy-P proteogenomic workflow into which I should integrate my Omicron tools, e.g. CustomProDB and PSM2SAM? I checked the "Published workflows" section of the public Galaxy-P site and it's not there. We can discuss any design considerations for the fused workflow here.
I see there's a "Tool needed" label; it begs the question, why is there no "Workflow needed" label? Pinging @bgruening because I don't know who better to ask. ;)
It would be nice to have a tool for visualizing the results of mass spectrometry and peptide matching experiments: a tool which can visualize and show a list of peptide matches and which can be run from OpenMS. cc @jj-umn @bgruening @jmchilton
Links to metaproteomic public repositories
IDEA:
It would be a good idea to add a few common publicly available metaproteomics databases (see HOMD database for an example below) to the Protein database downloader tool in Galaxy (https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/dbbuilder).
Please suggest links for some useful metaproteomics databases.
HOMD database: ftp://ftp.homd.org/HOMD_annotated_genomes_archive/oral_microbiome_dynamic.aa.zip
This tool looks strange to me. If the purpose of this tool is to convert Windows newlines only for ProteinPilot, as the name suggests, we should include the tr magic in the ProteinPilot wrapper.
If such a tool is really missing in general, we should put it into a text-manipulation repository, e.g. here:
https://github.com/bgruening/galaxytools/tree/master/text_processing/text_processing
Anyone know if this is still a problem or does Galaxy take care of this?
@jmchilton @jj-umn ?
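For reference, the newline conversion itself is tiny, which supports folding it into the ProteinPilot wrapper. A minimal Python sketch in the same spirit as the tr approach (the function names are hypothetical, and only CRLF pairs are rewritten):

```python
def windows_to_unix_newlines(data: bytes) -> bytes:
    """Replace Windows line endings (CRLF) with Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(source_path, target_path):
    """Write a copy of source_path with Unix line endings to target_path."""
    # Binary mode avoids any platform-specific newline translation.
    with open(source_path, "rb") as source:
        converted = windows_to_unix_newlines(source.read())
    with open(target_path, "wb") as target:
        target.write(converted)


if __name__ == "__main__":
    print(windows_to_unix_newlines(b"line one\r\nline two\r\n"))
```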
More information here:
*All OpenMS tools can be generated automatically, the wrapper quality varies
I will simply add a tool_dependencies.xml file to point to msproteomicstools if this is ok.
Pyprophet has a conda package but lacks tests.
https://github.com/galaxyproteomics/tools-galaxyp/blob/ms_wiff_testing/tools/pyprophet/pyprophet.xml
We need to add a proper format option to the input param of feature_alignment:
https://github.com/galaxyproteomics/tools-galaxyp/blob/master/tools/feature_alignment/feature_alignment.xml#L37
Do we need to define new formats for: "openswath" or "peakview"?