chosenobih / hamrlinc

This project forked from harrlol/hamrlnc

High-throughput pipeline for modified mRNA annotation and long intergenic non-coding RNA annotation

License: MIT License

Shell 57.43% R 37.67% Dockerfile 4.31% Perl 0.33% Python 0.26%

HAMRLNC: High-throughput Annotation of Modified Ribonucleotides and Long Non-Coding RNAs

[Figure: HAMRLNC workflow]

Overview

  • HAMRLNC is a multipurpose toolbox that streamlines the analysis pipelines for HAMR and Evolinc. The former was developed by Paul Ryvkin et al., and the latter by Andrew D.L. Nelson et al. HAMRLNC aims to make the original methods more accessible by automating the tedious pre-processing steps and by extending them with built-in post-processing steps, which let users visualize epitranscriptomic analyses in the context of their experimental conditions.
  • HAMRLNC is high-throughput: it performs RNA-modification annotation and long intergenic non-coding RNA (lincRNA) annotation at the scale of a whole BioProject. It performs constitutive trimming of acquired reads using Trim Galore and uses STAR as the default aligner; mapped reads are then pre-processed with selected tools from GATK, gffread, CPC2, Infernal, samtools, and others. Users can also opt to quantify transcripts alongside these steps.
  • HAMRLNC is optimized for partial parallel processing and modularization. Where hardware permits, specifying a larger thread count (-n) substantially speeds up a single run. If only part of the functionality is needed (e.g. only analyzing modified ribonucleotides), users can pass flags to activate just the modules they want. See below for more details.
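
As a rough illustration of the modular design, the sketch below turns a list of module names into the -k, -p, and -u switches from the argument table. The function `module_flags` and the short names `mod`, `lnc`, and `count` are hypothetical conveniences for this example and are not part of HAMRLNC.

```shell
# Hypothetical helper (not shipped with HAMRLNC): map module names to the
# documented switches -k (modification), -p (lncRNA), -u (featurecount).
module_flags() {
  flags=""
  for module in "$@"; do
    case "$module" in
      mod)   flags="$flags -k" ;;  # modification annotation workflow
      lnc)   flags="$flags -p" ;;  # lncRNA annotation workflow
      count) flags="$flags -u" ;;  # featurecount workflow
      *) echo "unknown module: $module" >&2; return 1 ;;
    esac
  done
  # Strip the leading space before printing the switch list.
  printf '%s\n' "${flags# }"
}

# Only the modified-ribonucleotide arm:
module_flags mod
# All three arms:
module_flags mod lnc count
```

The printed switches can then be appended to a HAMRLNC command line in place of a hand-typed `-k -p -u`.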

Command Line Arguments and Description

Read the Wiki for detailed descriptions of selected flags.

Command                                                           Description
Required
-o <pipeline output directory>                                    name of the directory where the HAMRLNC run output will be written
-c <filenames for each fastq.csv>                                 CSV file that maps each SRR accession (or FASTQ file name) to the name you want to use for that set of reads
-g <reference genome.fa>                                          FASTA file of the genome of the model organism
-i <reference genome annotation.gff3>                             GFF3 annotation file for the same genome (GFF3 is required; GTF is not accepted)
Optional
-l <minimum average read length>                                  default: auto-detect
-n [number of threads]                                            default=4
-r [perform fastqc]                                               default=false
-d [raw fastq folder]                                             default=NA
-t [trim raw fastq]                                               default=false
-D [raw bam folder]                                               default=NA
-b [sort raw bam]                                                 default=false
-I [STAR genome index folder]                                     default=NA
-k [activate modification annotation workflow]                    default=false
-p [activate lncRNA annotation workflow]                          default=false
-u [activate featurecount workflow]                               default=false
-f [HAMR filter]                                                  default=filter_SAM_number_hits.pl
-m [HAMR model]                                                   default=euk_trna_mods.Rdata
-Q [HAMR minimum quality score: 0-40]                             default=30
-C [HAMR minimum coverage: 0-∞]                                   default=10
-E [HAMR sequencing error: 0-1]                                   default=0.01
-P [HAMR maximum p-value: 0-1]                                    default=1
-F [HAMR maximum FDR: 0-1]                                        default=0.05
-O [Panther organism taxon ID]                                    default="3702"
-A [Panther annotation dataset]                                   default="GO:0008150"
-Y [Panther test type: FISHER or BINOMIAL]                        default="FISHER"
-R [Panther correction type: FDR, BONFERRONI, or NONE]            default="FDR"
-y [keep intermediate bam files]                                  default=false
-z [keep raw fastq files downloaded from SRA]                     default=false
-q [halt program upon completion of checkpoint 2]                 default=false
-G [attribute used for featurecount]                              default="gene_id"
-x [max intron length for lncRNA-annotation-unique STAR mapping]  default=NA
-H [SERVER alt path for panther]
-U [SERVER alt path for HAMRLNC]
-W [SERVER alt path for GATK]
-S [SERVER alt path for HAMR]
-J [SERVER alt path for CPC2]
-M [SERVER alt path for Rfam]
-h [help message]
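
Several HAMR options accept only a bounded range (-Q 0-40; -E, -P, and -F 0-1). The following is a minimal sketch of checking those bounds before launching a long run. The functions `in_range` and `check_hamr_thresholds` are hypothetical names for this example; whether HAMRLNC validates these values itself is not stated here.

```shell
# in_range VALUE MIN MAX: succeed when VALUE is a plain non-negative decimal
# (e.g. 30 or 0.05) and MIN <= VALUE <= MAX.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# check_hamr_thresholds Q E P F: arguments mirror -Q, -E, -P, -F;
# the ranges are the ones listed in the argument table.
check_hamr_thresholds() {
  in_range "$1" 0 40 || { echo "-Q must be within 0-40" >&2; return 1; }
  in_range "$2" 0 1  || { echo "-E must be within 0-1" >&2; return 1; }
  in_range "$3" 0 1  || { echo "-P must be within 0-1" >&2; return 1; }
  in_range "$4" 0 1  || { echo "-F must be within 0-1" >&2; return 1; }
}

# The documented defaults pass the check.
check_hamr_thresholds 30 0.01 1 0.05 && echo "thresholds OK"
```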

Running HAMRLNC

Required dependencies

  1. Linux-based computer, server, or cluster.
  2. Docker
  3. Minimum memory of 32 GB and minimum disk space of 120 GB; organisms with larger genomes, such as human, may require more.
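
The requirements above can be checked with a short pre-flight script before pulling the image. This is only a sketch for a Linux host: the function names are hypothetical, it reads /proc/meminfo, and it measures free space in the current directory.

```shell
# meets_minimum HAVE NEED: succeed when HAVE >= NEED (whole numbers).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# kib_to_gib KIB: convert kibibytes to whole gibibytes (rounded down).
kib_to_gib() {
  echo $(( $1 / 1024 / 1024 ))
}

# preflight: fail if docker is missing; warn if memory or disk look short.
# MemTotal is usually slightly below the installed RAM, so the memory and
# disk checks are warnings only, not hard failures.
preflight() {
  command -v docker >/dev/null 2>&1 || { echo "docker not found" >&2; return 1; }
  mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  disk_kib=$(df -Pk . | awk 'NR == 2 { print $4 }')
  meets_minimum "$(kib_to_gib "$mem_kib")" 32 \
    || echo "warning: less than 32 GB of memory detected" >&2
  meets_minimum "$(kib_to_gib "$disk_kib")" 120 \
    || echo "warning: less than 120 GB free in $(pwd)" >&2
}
```

Run `preflight` from the directory that will be mounted into the container, since that is where the output is written.
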
# pull HAMRLNC docker image:  
docker pull chosenobih/hamrlnc:v0.01
# clone HAMRLNC repo
git clone https://github.com/harrlol/HAMRLNC.git
cd HAMRLNC
# download the genome file for Arabidopsis thaliana from ENSEMBL
wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
# download the annotation file for Arabidopsis thaliana from ENSEMBL
wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.59.gff3.gz
gunzip Arabidopsis_thaliana.TAIR10.59.gff3.gz
# make sure your fa and gff3 files are in your working directory, and enter that directory
cd /your/working/directory
# run HAMRLNC with SRA IDs with all three arms activated
docker run \
  --rm -v "$(pwd)":/working-dir \
  -w /working-dir chosenobih/hamrlnc:v0.01 \
  -o test_run \
  -c /demo/demo_filenames.csv \
  -g Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
  -i Arabidopsis_thaliana.TAIR10.59.gff3 \
  -l 50 -n 8 -k -p -u
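
To run the pipeline on your own accessions instead of the demo file, the -c option needs a CSV that maps each SRR code to a sample name. The sketch below writes and sanity-checks such a file. The layout (one `accession,name` pair per line, no header) is an assumption made for this example, and the accessions and sample names are placeholders; consult the Wiki or the bundled demo file for the authoritative format.

```shell
# check_filenames_csv FILE: succeed when every line has exactly two
# non-empty comma-separated fields (assumed layout: accession,name).
check_filenames_csv() {
  awk -F',' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad + 0 }' "$1"
}

# Placeholder accessions and names; replace them with your own.
printf '%s\n' \
  "SRR0000001,control_rep1" \
  "SRR0000002,treated_rep1" > my_filenames.csv

check_filenames_csv my_filenames.csv && echo "my_filenames.csv looks well-formed"
```

Because the working directory is mounted at /working-dir, which is also the container's working directory, the file could then be passed as `-c my_filenames.csv` in place of `/demo/demo_filenames.csv`.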

Running HAMRLNC as an application on CyVerse's Discovery Environment

HAMRLNC has been integrated as an app on CyVerse's Discovery Environment (DE) and is available for use by researchers. Search for "HAMRLINC" and then select version 1.0.0. A short tutorial on how to run the app is available on the CyVerse wiki. CyVerse's DE provides an easy-to-use graphical user interface for running a range of life sciences computational pipelines.

Step-by-step walkthrough

For more detailed documentation and a step-by-step tutorial on running HAMRLNC with Docker, please visit the Wiki.

Issues

If you encounter any issues while running HAMRLNC, please open an issue on this GitHub repo, and we'll attend to it as soon as possible.

Copyright

Copyright (c) 2024 HAMRLNC Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
