ricomnl / bioinformatics-pipeline-tutorial

A tutorial on how to create bioinformatics pipelines as bash scripts, Makefiles and using tools like Nextflow.

License: MIT License

Dockerfile 0.69% Makefile 4.81% Python 56.28% Nextflow 29.43% Shell 8.80%

Bioinformatics Pipeline Tutorial

This is the accompanying GitHub repository for this blog post: https://ricomnl.com/blog/bottom-up-bioinformatics-pipeline/.

Photo by Sigmund on Unsplash

Outline

The workflow we're going to wrap in a pipeline looks like this:

  1. Take a set of .fasta protein files
  2. Split each into peptides using a variable number of missed cleavages
  3. Count the number of cysteines in total as well as the number of peptides that contain a cysteine
  4. Generate an output report containing this information in a .tsv file
  5. Create an archive to share with colleagues
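
Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual scripts: the function names `digest` and `count_amino_acid` are hypothetical, and the defaults (cleaving after K or R, peptide lengths 4 to 75, counting `C`) are assumptions borrowed from the parameter values shown in the issue report further down.

```python
import re
from typing import List, Tuple


def digest(
    sequence: str,
    enzyme_regex: str = "[KR]",
    missed_cleavages: int = 0,
    min_length: int = 4,
    max_length: int = 75,
) -> List[str]:
    """Split a protein sequence into peptides (hypothetical helper).

    Cleaves after every residue matching enzyme_regex and also emits
    peptides that span up to `missed_cleavages` skipped cut sites.
    """
    # Cut positions: start of the sequence plus the index after each match.
    cuts = [0] + [m.end() for m in re.finditer(enzyme_regex, sequence)]
    if cuts[-1] != len(sequence):
        cuts.append(len(sequence))

    peptides = []
    for i in range(len(cuts) - 1):
        # Join 1 .. missed_cleavages + 1 adjacent fragments.
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptide = sequence[cuts[i]:cuts[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


def count_amino_acid(peptides: List[str], amino_acid: str = "C") -> Tuple[int, int]:
    """Return (total residue count, number of peptides containing the residue)."""
    total = sum(p.count(amino_acid) for p in peptides)
    containing = sum(1 for p in peptides if amino_acid in p)
    return total, containing


if __name__ == "__main__":
    peptides = digest("AAACKDDDDRCCEEK")
    print(peptides)                    # ['AAACK', 'DDDDR', 'CCEEK']
    print(count_amino_acid(peptides))  # (3, 2)
```

Raising `missed_cleavages` to 1 adds the two-fragment peptides `AAACKDDDDR` and `DDDDRCCEEK` to the result, which is why the cysteine counts depend on that setting.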

An example output protein report

Barplot charts showing the number of cysteines in peptides and amino acids

Prerequisites

macOS

# Add project to your path for this session.
export PATH="$PATH:$(pwd)"

# Open the terminal and install the command-line tools that Homebrew needs
xcode-select --install

# Install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install python3 by following this tutorial:
# https://opensource.com/article/19/5/python-3-default-mac

# Install make
brew install make

# Install git
brew install git

# Install matplotlib
pip3 install matplotlib

# Install Nextflow (https://www.nextflow.io/docs/latest/getstarted.html)
# macOS does not ship wget, so use the preinstalled curl (or run `brew install wget` first)
curl -s https://get.nextflow.io | bash
chmod +x nextflow
## Move Nextflow to a directory in your $PATH such as /usr/local/bin
mv nextflow /usr/local/bin/

Linux

# Install python3, git and make
sudo apt-get update
sudo apt-get install python3 git make

# Install matplotlib
sudo apt-get install python3-matplotlib

# Install Nextflow (https://www.nextflow.io/docs/latest/getstarted.html)
wget -qO- https://get.nextflow.io | bash
chmod +x nextflow
## Move Nextflow to a directory in your $PATH such as /usr/local/bin
sudo mv nextflow /usr/local/bin/
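
Once the installation steps are done, a quick check like the following can confirm that the tools are reachable. It is only a sketch: the tool list mirrors the prerequisites above, and the helper names `missing_tools` and `has_matplotlib` are made up for this example.

```python
import importlib.util
import shutil
from typing import Iterable, List

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("python3", "git", "make", "nextflow")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_matplotlib() -> bool:
    """Check whether matplotlib is importable by this interpreter."""
    return importlib.util.find_spec("matplotlib") is not None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not has_matplotlib():
        print("matplotlib is not installed for this Python interpreter")
    if not missing and has_matplotlib():
        print("All prerequisites found")
```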

bioinformatics-pipeline-tutorial's Issues

Redun pipeline is broken with orjson ~3.8

Hey Rico,

When downloading, installing, and running the main redun pipeline, one gets this error message:

Error message:

```
[redun] Execution duration: 0.84 seconds
[redun] *** Execution failed. Traceback (most recent task last):
[redun] Job eea2a311: File "/Users/falexwolf/repos/bioinformatics-pipeline-tutorial/wf/workflow.py", line 27, in main
[redun] def main(
[redun] amino_acid = 'C'
[redun] enzyme_regex = '[KR]'
[redun] executor =
[redun] input_dir = 'fasta/'
[redun] max_length = 75
[redun] min_length = 4
[redun] missed_cleavages = 0
[redun] Job 299f076a: File "/Users/falexwolf/repos/bioinformatics-pipeline-tutorial/bioinformatics_pipeline_tutorial/lib.py", line 269, in plot_count_task
[redun] def plot_count_task(input_count: File) -> File:
[redun] input_count = File(path=data/KLF4.count.tsv, hash=be612cb0)
[redun] File "/Users/falexwolf/repos/bioinformatics-pipeline-tutorial/bioinformatics_pipeline_tutorial/lib.py", line 278, in plot_count_task
[redun] counts_plot = plot_counts(output_path, counts)
[redun] File "/Users/falexwolf/repos/bioinformatics-pipeline-tutorial/bioinformatics_pipeline_tutorial/lib.py", line 108, in plot_counts
[redun] fig.write_image(tmp_file.path)
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/plotly/basedatatypes.py", line 3829, in write_image
[redun] return pio.write_image(self, *args, **kwargs)
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/plotly/io/_kaleido.py", line 267, in write_image
[redun] img_data = to_image(
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/plotly/io/_kaleido.py", line 144, in to_image
[redun] img_bytes = scope.transform(
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/kaleido/scopes/plotly.py", line 153, in transform
[redun] response = self._perform_transform(
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/kaleido/scopes/base.py", line 296, in _perform_transform
[redun] export_spec = self._json_dumps(dict(kwargs, data=data)).encode('utf-8')
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/kaleido/scopes/plotly.py", line 76, in _json_dumps
[redun] return pio.to_json(val, validate=False, remove_uids=False)
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/plotly/io/_json.py", line 199, in to_json
[redun] return to_json_plotly(fig_dict, pretty=pretty, engine=engine)
[redun] File "/opt/anaconda3/envs/base1/lib/python3.9/site-packages/plotly/io/_json.py", line 126, in to_json_plotly
[redun] opts = orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY
[redun] AttributeError: partially initialized module 'orjson' has no attribute 'OPT_NON_STR_KEYS' (most likely due to a circular import)
```

I tried downgrading plotly from 5.11 to 5.8, and it didn't help.

I then downgraded orjson from 3.8.1 to 3.5.4, and it fixed the issue.

If other people want to try out your code, I'd suggest pinning the plotly and orjson versions for the time being to keep your example running. Pinning such fundamental dependencies isn't great, but it's much better than an erroring workflow. ☺️
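
Until the versions are pinned in the project's dependency list, a small startup check along these lines could warn users early. This is only a sketch: the `KNOWN_GOOD` table and both helper names are hypothetical, and the only version taken from this report is orjson 3.5.4, which worked here.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Tuple

# Assumption: only the orjson version reported as working above is listed.
KNOWN_GOOD: Dict[str, str] = {"orjson": "3.5.4"}


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn '3.8.1' into (3, 8, 1), keeping at most three numeric parts."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def check_pins(pins: Dict[str, str] = KNOWN_GOOD) -> List[str]:
    """Return one message per package whose installed version differs from the pin."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (known-good: {wanted})")
            continue
        if version_tuple(installed) != version_tuple(wanted):
            problems.append(f"{name} {installed} installed, known-good is {wanted}")
    return problems


if __name__ == "__main__":
    for message in check_pins():
        print("warning:", message)
```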
