adeslatt / methylseqcaptureseq
Megan Barefoot's PhD thesis project
For each command in the bash script, write a Nextflow process and test it with the test data (https://github.com/nf-core/test-datasets/tree/methylseq/testdata)
Piggyback on tools that already have Nextflow code written
Write new Nextflow code for biscuit and wgbs_tools
Create a new Dockerfile and build an image for wgbs_tools
Create Flowcraft components for each Nextflow process
Build a pipeline with Flowcraft
Run the Nextflow pipeline on CloudOS (Lifebit)
Starting from the bash scripts, build the required Nextflow pipeline for execution on Amazon or other cloud resources.
What is the best practice when we need to containerize another repository, such as wgbs_tools?
We need to containerize wgbs_tools for use in our Nextflow pipeline for methylseq.
#Dependencies wgbs_tools
samtools
python 3+
c++ boost library
#How to download:
git clone https://github.com/nloyfer/wgbs_tools.git
cd wgbs_tools
#Compile the cpp files:
python3 setup.py
#used to be this:
g++ -std=c++11 src/pat2beta/stdin2beta.cpp -o src/pat2beta/stdin2beta
g++ -std=c++11 src/pat_sampler/sampler.cpp -o src/pat_sampler/pat_sample
g++ -std=c++11 wgbs_tools/pipeline_wgbs/patter.cpp -o wgbs_tools/pipeline_wgbs/patter
g++ -std=c++11 wgbs_tools/pipeline_wgbs/match_maker.cpp -o wgbs_tools/pipeline_wgbs/match_maker
#setup reference genome
python3 wgbs_tools.py init_genome /path/to/genome.fa GENOME_NAME
#could loop in reference
#ln -s /path/to/genome.fa .
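The download and compile steps above could be captured in a container image. The following is a minimal sketch, not a tested build: the function name `write_wgbs_dockerfile`, the `ubuntu:20.04` base image, the Debian package names, and the `/opt/wgbs_tools` install path are all assumptions, and the samtools version shipped by the distribution may differ from the one the pipeline pins. The function only prints a Dockerfile; it does not build anything.

```shell
# Hypothetical helper: print a Dockerfile sketch for wgbs_tools to stdout.
# Base image, package names, and install path are assumptions, not tested.
write_wgbs_dockerfile() {
    cat <<'EOF'
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# samtools, python3, and the C++ Boost library are the listed dependencies;
# numpy and pandas are imported by init_genome.
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates git g++ libboost-all-dev samtools \
        python3 python3-numpy python3-pandas \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/nloyfer/wgbs_tools.git /opt/wgbs_tools
WORKDIR /opt/wgbs_tools
# Compile the cpp files, as in the steps above.
RUN python3 setup.py
EOF
}
```

One way to use it, assuming Docker is installed and `wgbs_tools` is an acceptable image tag: `write_wgbs_dockerfile > Dockerfile && docker build -t wgbs_tools .`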
#init_genome dependencies
import argparse
from utils_wgbs import delete_or_skip, validate_single_file, eprint, IllegalArgumentError, DIR
import re
import numpy as np
import pandas as pd
import os.path as op
import os
from itertools import groupby
import subprocess
from multiprocessing import Pool
import multiprocessing
#bam2pat
python3 wgbs_tools.py bam2pat BAM_PATH
#bam2pat dependencies
import os
import os.path as op
import argparse
import subprocess
from multiprocessing import Pool
from utils_wgbs import IllegalArgumentError, match_maker_tool, patter_tool, add_GR_args
from init_genome_ref_wgbs import chromosome_order
from pat2beta import pat2beta
from pipeline_wgbs.test import run_test
from genomic_region import GenomicRegion
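When converting the bam2pat step into a pipeline process, it may help to first wrap it so it can be run over several BAM files. The sketch below is one way to do that; the function `bam2pat_all` and the variables `WGBS_TOOLS` (path to `wgbs_tools.py`) and `DRY_RUN` are hypothetical names introduced here, and only the `python3 wgbs_tools.py bam2pat BAM_PATH` invocation comes from the notes above.

```shell
# Hypothetical wrapper: run bam2pat once per BAM file given as an argument.
# WGBS_TOOLS and DRY_RUN are illustrative variables, not wgbs_tools options.
bam2pat_all() {
    local script="${WGBS_TOOLS:-wgbs_tools.py}" bam
    for bam in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run.
            printf 'python3 %s bam2pat %s\n' "$script" "$bam"
        else
            # Stop at the first BAM that fails.
            python3 "$script" bam2pat "$bam" || return 1
        fi
    done
}
```

Setting `DRY_RUN=1` prints the commands without running them, which is a cheap way to check the file list before a long run.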
Divide the repository into two: methylSeqCaptureSeq-nf, which will contain the Nextflow pipeline for processing, and methylSeqCaptureSeq, which will contain the Jupyter notebooks with the analyses.
samtools version 1.9
trim_galore version 0.6.4_dev (dependencies: fastqc and cutadapt v. 2.8)
biscuit version 0.3.8.20180515
bamtools version 2.5.1
picard version 2.18.15-snapshot
wgbs_tools = github repository (dependencies python3, c++ boost library, samtools)
https://github.com/nloyfer/wgbs_tools
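Before running the bash scripts or building an image, a quick preflight check can confirm that the tools listed above are on the `PATH`. This is a minimal sketch: `missing_tools` is a hypothetical helper, it checks presence only and not the versions listed above, and the command names (in particular `picard`, which is often installed as a jar rather than a wrapper command) are assumptions about how the tools are installed.

```shell
# Hypothetical helper: print each named command that is not on PATH,
# one per line, and return non-zero if any is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_tools samtools trim_galore fastqc cutadapt biscuit bamtools picard` lists whichever of those commands cannot be found.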