README to Caulerpa chloroplast comparative genomics project

This repo contains the code and documentation for the comparative genomics of the Caulerpa chloroplast genomes. How these genomes were obtained is described in a separate repository (https://github.com/ejongepier/compgen-caulerpa-fin). Results are stored in the MinIO bucket fnwi/202203-compgen-chloroplast-jongepier/.

Usage

Preliminary chloroplast assembly

The purpose is to get a quick assembly to use as a reference for read mapping and filtering. The filtered reads will be used in the final assembly.

sbatch scripts/getorganelle.sh -f fnwi/202201-compgen-caulerpa-jongepier/results/qc/qc/sickle/${spp}_R1_sickle.fq.gz -r fnwi/202201-compgen-caulerpa-jongepier/results/qc/qc/sickle/${spp}_R2_sickle.fq.gz -o fnwi/202203-compgen-chloroplast-jongepier/results/assembly -d embplant_pt -p ${spp}-chloroplast
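
The command above expects `${spp}` to be set. A minimal sketch of driving it per species, assuming the species codes `clen` and `cpro` that appear in the assembly file names further down this README. The helper name `getorganelle_cmd` is hypothetical, and the loop only prints the commands rather than submitting them.

```shell
# Assumption: species codes are taken from the assembly file names used later in this README.
SPECIES="clen cpro"
QC=fnwi/202201-compgen-caulerpa-jongepier/results/qc/qc/sickle
OUT=fnwi/202203-compgen-chloroplast-jongepier/results/assembly

# Print the GetOrganelle submission command for one species code (hypothetical helper).
getorganelle_cmd() {
  printf 'sbatch scripts/getorganelle.sh -f %s/%s_R1_sickle.fq.gz -r %s/%s_R2_sickle.fq.gz -o %s -d embplant_pt -p %s-chloroplast\n' \
    "$QC" "$1" "$QC" "$1" "$OUT" "$1"
}

# Dry run: print one command per species.
for spp in $SPECIES; do
  getorganelle_cmd "$spp"
done
```

Printing first makes it easy to inspect the paths; the printed lines can be piped to `bash` once they look right.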

Post-process the assembly by changing the FASTA header, doubling the sequence and building the Bowtie2 database. Doubling the sequence allows read pairs to map concordantly across the junction of the circularized genome.
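
The post-processing step is not scripted in this README. The following is a minimal sketch of the header change and sequence doubling, assuming the assembly is a single-record FASTA; the function name `double_fasta` and the toy input are hypothetical, not part of the repository.

```shell
# double_fasta NAME < in.fa > out.fa
# Assumes a single-record FASTA. Replaces the header with NAME and writes the
# sequence twice, back to back, so reads spanning the circular junction can
# align contiguously.
double_fasta() {
  awk -v name="$1" '
    /^>/ { next }                            # drop the original header
    { gsub(/[ \t\r]/, ""); seq = seq $0 }    # join wrapped sequence lines
    END { print ">" name; print seq seq }    # new header, then sequence x2
  '
}

# Toy example: an 8 bp "genome" wrapped over two lines.
printf '>contig_1 circular\nACGT\nTTGA\n' | double_fasta clen-chloroplast
```

The doubled file could then be indexed with `bowtie2-build doubled.fa doubled`, where both file names are placeholders. Positions beyond the original genome length in the doubled reference correspond to the same positions minus that length.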

Filter chloroplast reads

The purpose is to extract the chloroplast reads that serve as input to metaSPAdes.

sbatch scripts/assembly-based-filtering-illumina.sh -f fnwi/202201-compgen-caulerpa-jongepier/results/qc/qc/sickle/${spp}_R1_sickle.fq.gz -r fnwi/202201-compgen-caulerpa-jongepier/results/qc/qc/sickle/${spp}_R2_sickle.fq.gz -d fnwi/202203-compgen-chloroplast-jongepier/results/assembly/${spp}-chloroplast-getorganelle-doubled -p ${spp} -o fnwi/202203-compgen-chloroplast-jongepier/results/readfilt
sbatch scripts/assembly-based-filtering-minion.sh -i fnwi/202011-metagenomics-caulerpa-anastasiabarilo/data/wgs/${spp}-minion-wgs-filtered.fastq.gz -d fnwi/202203-compgen-chloroplast-jongepier/results/assembly/${spp}-chloroplast-getorganelle-doubled -p ${spp} -o fnwi/202203-compgen-chloroplast-jongepier/results/readfilt

Chloroplast meta-assembly

Identify chloroplast ORFs

ORFs were identified with the EMBOSS getorf utility and annotated using blastp against NCBI nr. This analysis was run on blobfish.

Get chloroplast genome assemblies:

mkdir -p data/assemblies
mc cp fnwi/202201-compgen-caulerpa-jongepier/results/masurca/clen-masurca-organel/clen-chloroplast-genome.fa.gz data/assemblies/
mc cp fnwi/202201-compgen-caulerpa-jongepier/results/getorganelle/cpro-chloroplast-genome.fa.gz data/assemblies/

Get ORFs:

SPP=clen
mkdir -p results/getorfs/
gzip -d data/assemblies/${SPP}-chloroplast-genome.fa.gz
getorf -circular Y -reverse T -sequence data/assemblies/${SPP}-chloroplast-genome.fa -outseq results/getorfs/${SPP}-chloroplast-orfs.faa

Get NCBI db:

cd ~/Databases/ncbi-nr
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
gzip -d nr.gz
mv nr nr.faa
makeblastdb -dbtype 'prot' -in nr.faa
date > VERSION.txt
cd -

BLAST ORFs against nr:

blastp -db ~/Databases/ncbi-nr/nr -query results/getorfs/${SPP}-chloroplast-orfs.faa -outfmt 6
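
With `-outfmt 6`, blastp prints one tab-separated line per hit, with the query ID in column 1, the subject ID in column 2 and the bit score in column 12. To annotate each ORF with a single hit, one option is to keep only the highest-scoring line per query. The sketch below assumes the blastp output is piped in or has been saved to a file; the function name `best_hits` and the toy rows are hypothetical.

```shell
# best_hits < blast.outfmt6.tsv
# Keeps the highest bit score hit (column 12) for every query (column 1).
best_hits() {
  # Sort by query ID, then by bit score in descending order;
  # awk then keeps the first line it sees for each query.
  sort -t "$(printf '\t')" -k1,1 -k12,12nr | awk -F '\t' '!seen[$1]++'
}

# Toy example with three hits for two ORFs.
printf 'orf1\tA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t180\n'\
'orf1\tB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-60\t200\n'\
'orf2\tC\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-10\t60\n' | best_hits
```

Selecting on bit score rather than e-value avoids ties between hits whose e-values are reported as 0.0.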

Cleanup:

gzip results/getorfs/${SPP}-chloroplast-orfs.faa
mc cp results/getorfs/${SPP}-chloroplast-orfs.faa.gz fnwi/202203-compgen-chloroplast-jongepier/results/getorfs/
rm -r data results

Annotate chloroplast genomes

Chloroplast genomes were annotated with the GeSeq web utility.

Poolseq variant calling

sbatch scripts/depth-pileup-illumina.sh -f fnwi/202201-compgen-caulerpa-jongepier/results/assembly-based-filt/clen_R1_clen-chloroplast-noncirc-genome-mapped.fastq.gz -r fnwi/202201-compgen-caulerpa-jongepier/results/assembly-based-filt/clen_R2_clen-chloroplast-noncirc-genome-mapped.fastq.gz -d fnwi/202201-compgen-caulerpa-jongepier/results/masurca/clen-masurca-organel/clen-chloroplast-genome -p clen -o fnwi/202203-compgen-chloroplast-jongepier/results/pileup

Authors

ejongepier