lemaslab / rump

A Reproducible Untargeted Metabolomics Data Processing Pipeline

License: MIT License

Dockerfile 0.03% Nextflow 1.32% Shell 0.03% Python 1.96% HTML 96.51% R 0.14%

big-data docker machine-learning mass-spectrometry metabolomics-data nextflow pathway-analysis reproducible-research singularity statistical-analysis

rump's Introduction

Lemas Lab

This is the repository for the Lemas Lab Group. We use Jekyll to run our GitHub page. Feel free to fork it and make your own page, or submit a pull request!

Procedure to make changes

Log in to your GitHub account.
If you are using Windows and have never installed or used Git, please go to this website to install Git first. Otherwise, skip this step.
Open a command line and navigate to the folder where you want to save the website's source code.
Clone the repository to your local machine with the following command:

git clone https://github.com/lemaslab/lemaslab.github.io.git

Change into the cloned repository's directory.

cd lemaslab.github.io

Create your own branch. Skip this step if you already have your own branch.

git branch <branch-name>

Switch to the branch you are working on.

git checkout <branch-name>

Make your changes in the cloned repository to update the information on the webpage.
Test that the updated repository runs correctly on your local machine by following Requirements -> Step 2 -> Step 4 in this guideline.
Run the following commands to commit and push your changes:

git add -A

git commit -m '{describe your change}'

git push -u origin <branch-name>

Go to the webpage of your repository and make a pull request to the original repository by clicking Pull request (note: this button is beside the Compare button; it is NOT the New pull request button). Then wait for approval.

Add yourself

To add yourself to the page, create a file named <your_firstname>_<your_lastname>.md in the _people folder. Use the following template for the content of <your_firstname>_<your_lastname>.md:

---
name: Eva Dyer
position: underGradStudent
avatar: eva.jpg
twitter:
joined: 2014
---

<img width="300" src="{{site.baseurl}}/images/people/{{page.avatar}}" data-action="zoom">

### {section_name}
(section content)
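As an illustration only, the template above could be generated by a small script instead of being copied by hand. The helper names `people_page` and `people_path` are hypothetical and not part of the site repository; the accepted positions are the four classes listed in the detailed steps.

```python
# Hypothetical helper (not part of lemaslab.github.io): renders the
# _people/<first>_<last>.md content from the template above.
from pathlib import Path

# Position classes accepted by the site, per the "Add yourself" steps.
POSITIONS = {"pi", "underGradStudent", "gradStudent", "researchCoordinator"}

# Plain (non f-)string, so Jekyll's {{...}} Liquid tags stay literal.
IMG_TAG = ('<img width="300" src="{{site.baseurl}}/images/people/'
           '{{page.avatar}}" data-action="zoom">')


def people_page(first: str, last: str, position: str, joined: int,
                twitter: str = "") -> str:
    """Return the Markdown content of one lab member's page."""
    if position not in POSITIONS:
        raise ValueError(f"position must be one of {sorted(POSITIONS)}")
    lines = [
        "---",
        f"name: {first} {last}",
        f"position: {position}",
        f"avatar: {first}_{last}.jpg",  # photo stored in images/people/
        f"twitter: {twitter}".rstrip(),  # empty handle gives "twitter:"
        f"joined: {joined}",
        "---",
        "",
        IMG_TAG,
        "",
    ]
    return "\n".join(lines)


def people_path(repo_root: Path, first: str, last: str) -> Path:
    """Location of the page inside a clone of the site repository."""
    return repo_root / "_people" / f"{first}_{last}.md"
```

For example, `people_path(Path("."), "Eva", "Dyer").write_text(people_page("Eva", "Dyer", "underGradStudent", 2014))` would write the file from the repository root.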

Detailed procedures to add yourself:

Prepare your photo and crop it to a width:height ratio of 1:1.618.
Put your photo in the images/people/ folder and name it <your_firstname>_<your_lastname>.jpg.
Open the file _people/<your_firstname>_<your_lastname>.md and set name to your name, position to your position, avatar to <your_firstname>_<your_lastname>.jpg, and joined to the year you joined the lab. Choose position from four classes: pi, underGradStudent, gradStudent, researchCoordinator; the position determines the section of the page you appear in.
If you have more to tell, modify {section_name} and (section content) in the template. For example:

---
name: Eva Dyer
position: underGradStudent
avatar: Eva_Dyer.jpg
twitter:
joined: 2014
---

<img width="300" src="{{site.baseurl}}/images/people/{{page.avatar}}" data-action="zoom">

### Current Project
My current project is about designing a spaceship to prevent an alien attack.

Then follow the procedure above to make changes and update the webpage.
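The photo crop in the steps above (width:height = 1:1.618) reduces to a little arithmetic: keep the full width and trim the height, or keep the full height and trim the width, whichever fits. The sketch below computes a centered box; the function name is hypothetical, and the (left, upper, right, lower) tuple follows the convention of Pillow's `Image.crop`, although Pillow is not needed to run it.

```python
# Hypothetical helper: largest centered crop box with width:height = 1:1.618.
RATIO = 1.618  # height / width, from the W:H = 1:1.618 guideline


def portrait_crop_box(width: int, height: int, ratio: float = RATIO):
    """Return (left, upper, right, lower) pixel coordinates of the crop."""
    if height / width >= ratio:
        # Image is taller than needed: keep the full width, trim top/bottom.
        crop_w, crop_h = width, round(width * ratio)
    else:
        # Image is too wide: keep the full height, trim left/right.
        crop_w, crop_h = round(height / ratio), height
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return left, upper, left + crop_w, upper + crop_h
```

A square 1000 x 1000 photo gives the box (191, 0, 809, 1000), i.e. a 618-pixel-wide strip from the middle.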

Note: the following sections are outdated and will be updated soon.

Add posts

Adding a post is easy. All posts are located in the _posts folder and are arranged by date. Each post can be written in Markdown. Before writing, state the headers title, description, and categories; the description is shown when the post is shared on social media such as Facebook or Twitter. See the following headers:

---
title: Summer School in Computational Sensory-Motor Neuroscience (CoSMo)
description: all links to CoSMo summer school in computational neuroscience materials
categories: scientists
---

There are four categories to choose from: scientists, students, discussion, and blog. Each category is rendered in a different location.

How to add posts

Edit directly on GitHub: go to _posts, click New file, create a Markdown file (e.g., 2016-02-03-post-name.md), and start writing the post. GitHub also lets you preview it, which is convenient if you don't want to clone the repository.
Clone the repository: this works much like adding a post directly on GitHub. After cloning, add the new post file, then commit and push to the repository.

The changes will take approximately half a minute to render. You can see the new posts or changes on Lemas Lab Group!
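The post conventions above (a dated YYYY-MM-DD-post-name.md file name plus a title/description/categories header) can be sketched as two small functions. The function names and the slug rule (lowercase words joined by hyphens) are assumptions for illustration, not part of the site's tooling.

```python
# Sketch only: helper names and the slug rule are hypothetical; the file
# name pattern and header fields come from the instructions above.
import datetime
import re

CATEGORIES = {"scientists", "students", "discussion", "blog"}


def post_filename(title: str, date: datetime.date) -> str:
    """Turn a title into a dated file name, e.g. 2016-02-03-post-name.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{date.isoformat()}-{slug}.md"


def post_header(title: str, description: str, category: str) -> str:
    """Front matter with the title, description and categories fields."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    return "\n".join(["---", f"title: {title}", f"description: {description}",
                      f"categories: {category}", "---", ""])
```

Note that a title or description containing ": " would need quoting to remain valid YAML; this sketch does not add quotes.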

Add new publications

All publications from the lab are listed in publications.md. Please add your new publications yourself!

rump's People

Contributors

Stargazers

Watchers

Forkers

thulium1 cgpu thejacksonlaboratory nishachachad xinsongdu xkcococo gmhhope mshabdiz

rump's Issues

Asciinema Tutorial Video

Computational Resource Cost Table/Figure

Make a table/figure showing the relationship between raw data size and memory cost/running time, using a fixed resource allocation (e.g., 20 GB memory, 10 CPUs). Table 3 in the YAMP paper can serve as a sample figure.

Add Support Vector Machine (SVM) to Statistical Analysis

Check how MetaboAnalyst (https://www.metaboanalyst.ca/) performs SVM on metabolomics data, and add a similar function to RUMP. Sample data can be used to test your code. Steps:

Google the term and learn what SVM is.
Write code to create an SVM for the Sample data.
Analyze the input and output of your code, and add an argument parser (for Python or for R).
Check code quality using pylint (if using Python) or lintr (if using R), and improve the code based on the results.
Modify main.nf to integrate your new code into RUMP and test your code.
Make a pull request and wait for review.

It may help to run SVM in MetaboAnalyst on their sample data and observe what the results look like.
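To make the SVM step concrete, here is a minimal linear SVM trained with the Pegasos sub-gradient method in plain Python. This is a swapped-in illustration, not how MetaboAnalyst implements SVM (its version may use kernels or feature selection); a real RUMP step would more likely rely on a library such as scikit-learn. The function names are hypothetical, and labels are assumed to be -1/+1.

```python
# Sketch: linear SVM via Pegasos (hinge loss + L2 penalty), stdlib only.
# Names are hypothetical; labels must be -1 or +1.
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, n_iter: int = 2000,
                     seed: int = 0) -> List[float]:
    """Return weights; the last entry is the bias (also regularized here)."""
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]  # constant feature acts as the bias
    w = [0.0] * len(rows[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(rows))  # one random sample per step
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
        if margin < 1.0:  # hinge loss active: move towards the sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify one sample with the trained weights."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Each row of `X` would be one sample's feature intensities; in practice the features should be scaled first, since the hinge loss is sensitive to feature magnitudes.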

Add negative mode for pathway analysis

The Nextflow command would look like:

python3 !{python_mummichog_input_prepare} -i !{neg_vd_both_nobg} -o !{params.data_neg_nobg_both_mummichog} &&
mummichog1 -f !{params.data_neg_nobg_both_mummichog} -o !{params.data_neg_nobg_both_mummichog_out} -c !{params.cutoff}

Images

This issue includes all images that are used in the documentation

MZmine Peak Table Format Conversion

Add code to convert the MZmine peak table to a format that MetaboAnalyst can parse, which improves interoperability. Sample data can be used to test your code. Steps:

Observe the sample data format provided by MetaboAnalyst for statistical analysis.
Write code to convert Sample data to that format.
Test your code and see if your converted data can be input to MetaboAnalyst for statistical analysis.
Analyze the input and output of your code, and add an argument parser (for Python or for R).
Check code quality using pylint (if using Python) or lintr (if using R), and improve the code based on the results.
Modify main.nf to integrate your new code into RUMP and test your code.
Make a pull request and wait for review.

It may help to run some statistical analyses in MetaboAnalyst on their sample data and observe what the results look like.
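A possible shape for the conversion script, including the argument parser step, is sketched below. It assumes an MZmine 2 CSV export with "row m/z", "row retention time", and one "<sample> Peak area" column per sample, and a target layout with samples in columns, a "Label" row of group names, and one row per feature; both layouts should be checked against real files. The feature ID scheme, the groups CSV (columns sample,group), and all function names are hypothetical.

```python
# Sketch, assuming the MZmine export and MetaboAnalyst layouts described
# above; verify both against real files before use.
import argparse
import csv
from typing import Dict, List, Optional, Sequence

AREA_SUFFIX = " Peak area"


def convert(rows: List[Dict[str, str]],
            groups: Dict[str, str]) -> List[List[str]]:
    """Samples in columns, a "Label" row, then one row per feature."""
    sample_cols = [c for c in rows[0] if c.endswith(AREA_SUFFIX)]
    samples = [c[: -len(AREA_SUFFIX)] for c in sample_cols]
    table = [["Sample"] + samples,
             # Samples missing from the groups file are labelled "NA".
             ["Label"] + [groups.get(s, "NA") for s in samples]]
    for row in rows:
        # Feature ID built from m/z and retention time (hypothetical scheme).
        feature = f"{row['row m/z']}__{row['row retention time']}"
        table.append([feature] + [row[c] for c in sample_cols])
    return table


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert an MZmine peak table for MetaboAnalyst.")
    parser.add_argument("-i", "--input", required=True, help="MZmine CSV export")
    parser.add_argument("-g", "--groups", required=True,
                        help="CSV with columns sample,group")
    parser.add_argument("-o", "--output", required=True, help="output CSV")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    with open(args.input, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    with open(args.groups, newline="", encoding="utf-8") as fh:
        groups = {r["sample"]: r["group"] for r in csv.DictReader(fh)}
    with open(args.output, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(convert(rows, groups))
```

Saved as a script, it would end with the usual `if __name__ == "__main__": main()` guard and could then be called from main.nf with `-i`, `-g`, and `-o`.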

Data quality score reporting

Score calculation is based on this paper: https://pubmed.ncbi.nlm.nih.gov/31179719/

Code style adjustment

Adjust the code style to PEP 8 (https://realpython.com/python-pep8/). I will work on this, and will close this issue once finished.

Make MZmine Parameters Modifiable

Enable modification of MZmine parameters through a configuration file, so that users all over the world can adapt the pipeline to their own data.
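One way to use a configuration file is to keep a user-editable INI file and apply its values to an MZmine batch XML before running MZmine. The sketch assumes that batch files contain `<batchstep method="...">` elements with `<parameter name="...">value</parameter>` children (verify this against your MZmine version), and the INI layout (one section per batch-step method, one key per parameter name) is a hypothetical RUMP convention.

```python
# Sketch, assuming the batch XML layout described above; the INI layout
# (section = batch-step method, key = parameter name) is hypothetical.
import configparser
import xml.etree.ElementTree as ET


def apply_config(batch_xml: str, ini_text: str) -> str:
    """Return the batch XML with parameter values overridden from the INI."""
    config = configparser.ConfigParser(interpolation=None)  # allow "%" values
    config.optionxform = str  # keep parameter names case-sensitive
    config.read_string(ini_text)
    root = ET.fromstring(batch_xml)
    for step in root.iter("batchstep"):
        method = step.get("method", "")
        if not config.has_section(method):
            continue  # no user overrides for this step
        for param in step.iter("parameter"):
            name = param.get("name")
            if name is not None and config.has_option(method, name):
                param.text = config.get(method, name)
    return ET.tostring(root, encoding="unicode")
```

`ET.tostring` does not write an XML declaration; if MZmine requires one, it would have to be prepended when saving the file.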

Web development: data provenance graph

A good reference would be: https://view.qiime2.org/provenance/?src=https%3A%2F%2Fdocs.qiime2.org%2F2020.6%2Fdata%2Ftutorials%2Fmoving-pictures%2Ftaxa-bar-plots.qzv

Test of data processing parameter modification

See if a potential user can modify parameters for "chromatogram deconvolution" step based on this guideline: https://github.com/lemaslab/RUMP/wiki/8.-Data-processing-parameters-modification

Add PLS-DA to Statistical Analysis

Add PLS-DA to the pipeline (first look up what PLS-DA is). Sample data can be used to test your code. Steps:

Google the term and learn what PLS-DA is.
Write code to create a PLS-DA model for the Sample data.
Analyze the input and output of your code, and add an argument parser (for Python or for R).
Check code quality using pylint (if using Python) or lintr (if using R), and improve the code based on the results.
Modify main.nf to integrate your new code into RUMP and test your code.
Make a pull request and wait for review.

It may help to run PLS-DA in MetaboAnalyst on their sample data and observe what the results look like.
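For a two-class study, PLS-DA amounts to PLS regression on a 0/1 class vector. The sketch below computes component scores with the NIPALS (PLS1) algorithm on mean-centered, unscaled data in plain Python. It is an illustration only: MetaboAnalyst's PLS-DA may differ in scaling, number of components, and cross-validation, and the function name is hypothetical.

```python
# Sketch: two-class PLS-DA scores via NIPALS (PLS1) on mean-centered,
# unscaled data; classes are coded 0/1. Function name is hypothetical.
from typing import List, Sequence


def plsda_scores(X: Sequence[Sequence[float]], labels: Sequence[int],
                 n_components: int = 2) -> List[List[float]]:
    """Return one row of component scores [t1, t2, ...] per sample."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centered X
    y_mean = sum(labels) / n
    f = [yi - y_mean for yi in labels]  # centered class vector
    scores: List[List[float]] = [[] for _ in range(n)]
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the classes.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break  # nothing left that covaries with the class labels
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component explains what is left.
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        for i in range(n):
            scores[i].append(t[i])
    return scores
```

Plotting the first two score columns against each other, colored by class, gives the usual PLS-DA score plot.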

Adding CEU Mass Mediator to RUMP

Findings related to scientific software development from literature

Address problems pointed out by nf-core lint

Output from nf-core lint (https://nf-co.re/tools#linting-a-workflow):

% nf-core lint RUMP

                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 1.12.1



INFO     Testing pipeline: RUMP                                                                                           lint.py:201
Running lint checks: 0 of 17 » check_files_exist
CRITICAL Found test failures in `check_files_exist`, halting lint run.                                                    lint.py:242
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [!] 5 Test Warnings                                                                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ https://nf-co.re/errors#1: File not found: environment.yml                                                                        │
│ https://nf-co.re/errors#1: File not found: conf/base.config                                                                       │
│ https://nf-co.re/errors#1: File not found: .github/workflows/awstest.yml                                                          │
│ https://nf-co.re/errors#1: File not found: .github/workflows/awsfulltest.yml                                                      │
│ https://nf-co.re/errors#1: File should be removed: .travis.yml                                                                    │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [✗] 8 Tests Failed                                                                                                                │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ https://nf-co.re/errors#1: File not found: nextflow_schema.json                                                                   │
│ https://nf-co.re/errors#1: File not found: CHANGELOG.md                                                                           │
│ https://nf-co.re/errors#1: File not found: docs/README.md                                                                         │
│ https://nf-co.re/errors#1: File not found: docs/output.md                                                                         │
│ https://nf-co.re/errors#1: File not found: docs/usage.md                                                                          │
│ https://nf-co.re/errors#1: File not found: .github/workflows/branch.yml                                                           │
│ https://nf-co.re/errors#1: File not found: .github/workflows/ci.yml                                                               │
│ https://nf-co.re/errors#1: File not found: .github/workflows/linting.yml                                                          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────╮
│ LINT RESULTS SUMMARY  │
├───────────────────────┤
│ [✔]   9 Tests Passed  │
│ [!]   5 Test Warnings │
│ [✗]   8 Tests Failed  │
╰───────────────────────╯

Add Metabolomics Data Quality Control

Unknown Search Tool Development

CEU Mass Mediator has many drawbacks (e.g., its server is unstable, so users sometimes cannot connect to it, and its results lack information such as class and kingdom). We need to develop a tool for unknown search and incorporate it into RUMP.

Dependency conflict

MultiQC requires networkx 2.x, but mummichog needs networkx 1.x. This conflict needs to be resolved.

Add volcano plot

Integrate a volcano plot into the statistical analysis.

Add volcano plot (example plot can be found here: https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot/tutorial.html) to the pipeline. Sample data can be used to test your code. Steps:

Google the term and learn what a volcano plot is.
Write code to create a volcano plot for the Sample data.
Analyze the input and output of your code, and add an argument parser (for Python or for R).
Check code quality using pylint (if using Python) or lintr (if using R), and improve the code based on the results.
Modify main.nf to integrate your new code into RUMP and test your code (i.e., add volcano plots for data before and after blank subtraction, and add the volcano plot figure to the MultiQC report).
Make a pull request and wait for review.
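A volcano plot places each feature at (log2 fold change, -log10 p-value). The sketch below computes those coordinates with Welch's t statistic and a normal-approximation p-value, which is only reasonable for larger groups; `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the t-distribution p-value instead. It assumes positive intensities, and the cut-offs (|log2 FC| >= 1, p < 0.05) and function names are illustrative, not RUMP settings.

```python
# Sketch: Welch t statistic with a two-sided normal-approximation p-value.
# Assumes positive intensities; cut-offs and names are illustrative.
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def volcano_point(case: Sequence[float],
                  control: Sequence[float]) -> Tuple[float, float]:
    """Return (log2 fold change, -log10 p-value) for one feature."""
    fold_change = math.log2(mean(case) / mean(control))
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    if se == 0:
        raise ValueError("no within-group variation; t statistic undefined")
    t_stat = (mean(case) - mean(control)) / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided normal tail
    return fold_change, -math.log10(max(p_value, 1e-300))  # floor avoids log(0)


def classify(fold_change: float, neg_log10_p: float,
             fc_cut: float = 1.0, p_cut: float = 0.05) -> str:
    """Label a point "up", "down" or "ns" (not significant)."""
    if abs(fold_change) >= fc_cut and neg_log10_p >= -math.log10(p_cut):
        return "up" if fold_change > 0 else "down"
    return "ns"
```

With many features, the p-values would normally be adjusted for multiple testing before drawing the significance line.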

Add Manhattan Plot to Statistical Analysis

Add Manhattan Plot (example plot can be found in this paper: https://www.sciencedirect.com/science/article/abs/pii/S0022347616000287?via%3Dihub) to the pipeline. Sample data can be used to test your code. Steps:

Google the term and learn what a Manhattan plot is.
Write code to create a Manhattan plot for the Sample data.
Analyze the input and output of your code, and add an argument parser (for Python or for R).
Check code quality using pylint (if using Python) or lintr (if using R), and improve the code based on the results.
Modify main.nf to integrate your new code into RUMP and test your code.
Make a pull request and wait for review.

Figure 4B of this paper might be a good reference.
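The data behind a Manhattan plot is a list of (position, -log10 p) points plus a significance line. The sketch assumes features are ordered along the x-axis by m/z (the referenced figure may order them differently, e.g. by retention time or compound class) and draws a Bonferroni line at -log10(alpha / number of features). Function names are hypothetical.

```python
# Sketch: Manhattan plot coordinates, features ordered by m/z (assumed),
# with a Bonferroni threshold line. Names are hypothetical.
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, str, float]  # (x position, feature name, -log10 p)


def manhattan_points(features: Sequence[Tuple[str, float, float]],
                     alpha: float = 0.05) -> Tuple[List[Point], float]:
    """features holds (name, m/z, p-value); returns points and threshold."""
    ordered = sorted(features, key=lambda f: f[1])  # sort by m/z
    points = [(x, name, -math.log10(p))
              for x, (name, _mz, p) in enumerate(ordered)]
    threshold = -math.log10(alpha / len(features))  # Bonferroni correction
    return points, threshold


def significant(points: Sequence[Point], threshold: float) -> List[str]:
    """Names of features above the threshold line."""
    return [name for _x, name, y in points if y > threshold]
```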

Create GUI for RUMP

Example of creating GUI for application: https://github.com/Arseha/peakonly/blob/master/peakonly.py

Check the "Security" section on GitHub and learn how security alerts are triggered.

Add MSCombine

Steps (i.e., pipeline) of using MSCombine:

Load the positive data, negative data, and adducts table as spreadsheets in R.
Use FindCommon
Use RemoveMismatch
Use StudyRTdiff
Use FilterbyRT
Use CombinePolarities

Use MZmine config file directly instead of using Python

Remove "rump/batchfile_generator_pos_253.py" and "rump/batchfile_generator_neg_253.py", and add a config folder to store the config files.

Register with SciCrunch.org to receive an RRID

Logo Design

Simulation study

Documentation Improvements

Go through the documentation, try to run RUMP with example data, provide suggestions for documentation.

Version Release with Zenodo

https://guides.github.com/activities/citable-code/

XCMS Integration

Integrate XCMS into RUMP and select the overlap of peaks from XCMS and MZmine as the final peak table (reference: https://pubmed.ncbi.nlm.nih.gov/28752757/). Use IPO for XCMS parameter tuning.
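Selecting the overlap of the two peak tables comes down to matching peaks by m/z and retention time within tolerances. The sketch keeps, for each MZmine peak, the XCMS peak with the closest m/z inside a ppm window and a retention-time window. The tolerances (10 ppm, 0.1 min) are illustrative assumptions, not values from the cited study, and the function name is hypothetical.

```python
# Sketch: tolerances are illustrative assumptions; peaks are reduced to
# (m/z, retention time) pairs. Function name is hypothetical.
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, retention time in minutes)


def overlap(mzmine: Sequence[Peak], xcms: Sequence[Peak], ppm: float = 10.0,
            rt_tol: float = 0.1) -> List[Tuple[Peak, Peak]]:
    """Pair each MZmine peak with the closest-m/z XCMS peak within tolerance."""
    ref = sorted(xcms)
    ref_mz = [mz for mz, _rt in ref]
    pairs = []
    for mz, rt in mzmine:
        tol = mz * ppm * 1e-6  # ppm window in absolute m/z units
        best: Optional[Peak] = None
        i = bisect_left(ref_mz, mz - tol)  # first XCMS peak inside the window
        while i < len(ref) and ref[i][0] <= mz + tol:
            cand = ref[i]
            if abs(cand[1] - rt) <= rt_tol and (
                    best is None or abs(cand[0] - mz) < abs(best[0] - mz)):
                best = cand
            i += 1
        if best is not None:
            pairs.append(((mz, rt), best))
    return pairs
```

This matching is not one-to-one: a single XCMS peak can pair with several MZmine peaks, which a final version may need to resolve.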

Wiki Improvement

Include folder layout (e.g., https://github.com/alesssia/YAMP/wiki/Folders-layout-example)
Explain how to run the pipeline on AWS (e.g., https://github.com/alesssia/YAMP/wiki/How-to-use-AWS-Batch)

Different result folder name

Amnah's suggestion:

Give the result folder a different name for each run. She was trying to do a meta-analysis of several datasets with RUMP but couldn't find the result of each dataset, because the name of the result folder did not change. (I observe that in Mummichog, the result folder has a different name every time you run it.)
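A per-run folder name can be built from the dataset name and a timestamp, with a counter appended if two runs start in the same second. The "<dataset>_<YYYYMMDD_HHMMSS>" scheme and the function name are hypothetical; they only illustrate one way to satisfy this suggestion.

```python
# Sketch: hypothetical "<dataset>_<timestamp>" naming for result folders.
import datetime
from pathlib import Path
from typing import Optional


def make_result_dir(base: Path, dataset: str,
                    now: Optional[datetime.datetime] = None) -> Path:
    """Create and return a new, uniquely named result folder."""
    stamp = (now or datetime.datetime.now()).strftime("%Y%m%d_%H%M%S")
    suffix = 0
    while True:
        name = f"{dataset}_{stamp}" + (f"_{suffix}" if suffix else "")
        path = base / name
        try:
            path.mkdir(parents=True)  # fails if the folder already exists
            return path
        except FileExistsError:
            suffix += 1  # same second: try name_1, name_2, ...
```

Catching `FileExistsError` instead of checking `exists()` first avoids a race when two runs start at the same moment.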

Add running dependency information

Export OS, Python, R, and other software/tool/package versions to a file
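The version export could be a short script that queries the platform, the Python interpreter, installed Python packages, and R (via `R --version`, only if an `R` executable is on PATH), then writes JSON. The package list, the output file name, and the function names are hypothetical.

```python
# Sketch: record OS, Python, R, and package versions; names are hypothetical.
import json
import platform
import shutil
import subprocess
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_version(name: str) -> Optional[str]:
    """Installed version of a Python package, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def r_version() -> Optional[str]:
    """First line of `R --version`, or None when R is not on PATH."""
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    out = subprocess.run([r_exe, "--version"], capture_output=True,
                         text=True, check=False)
    lines = out.stdout.splitlines()
    return lines[0] if lines else None


def collect_versions(packages: Iterable[str]) -> Dict[str, object]:
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "r": r_version(),
        "packages": {name: package_version(name) for name in packages},
    }


def write_versions(path: str, packages: Iterable[str]) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(collect_versions(packages), fh, indent=2)
```

For example, `write_versions("software_versions.json", ["networkx", "pandas"])` would also make version conflicts such as the networkx one easier to diagnose.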