
sablokgaurav / arabidopsis-maf-cap-accessions


arabidopsis-maf-cap-accessions. genome extraction, alignments, visualization, phylogenomics, ancestral tree

License: MIT License

Shell 0.07% R 0.03% Python 0.01% HTML 99.89% Nextflow 0.01%
arabidopsis arabidopsis-thaliana maf phylogeny phylogeny-plots visualization


arabidopsis-maf-cap-accessions: brief writeup

Arabidopsis alignments for the MAF and CAP gene clusters: multiple alignment and visualization of the aligned genes and clusters. The genes used are At5g65050-At5g65080, a cluster of 4 genes (the Maf2-5 cluster), and At2g13540 (CBP80, Cap-binding protein 80). This repository lists the corresponding genome assemblies and their download links, along with the code to download, align, annotate and draw the alignments. The analysis covers the arabidopsis genomes and accessions cited in the paper: https://www.nature.com/articles/s41586-023-06062-z#data-availability.

Analysis outline: because the genomes reported in the paper have not been annotated, miniprot was used to align the protein sequences of the corresponding genes to the genome of each accession. The matching regions were then extracted and aligned. The code and the run files are present within the corresponding project execution files.
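The extraction step described above could be sketched roughly as follows. This is not the repository's own script: the helper name `gff_mrna_to_bed`, the file names, and the use of bedtools for the final extraction are assumptions made for illustration. The helper turns the mRNA features of a miniprot GFF into BED intervals that a tool such as bedtools can then cut out of the genome.

```shell
# Hypothetical helper: convert the mRNA features of a miniprot GFF into BED intervals.
# GFF coordinates are 1-based inclusive; BED is 0-based half-open, so only the
# start coordinate is decremented. Comment lines (including ##PAF lines) are skipped.
gff_mrna_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        !/^#/ && $3 == "mRNA" { print $1, $4 - 1, $5, $9, ".", $7 }' "$1"
}

# Assumed invocation (needs miniprot and bedtools on PATH; file names are placeholders):
#   miniprot --gff genome.fasta proteins.faa > hits.gff
#   gff_mrna_to_bed hits.gff > hits.bed
#   bedtools getfasta -fi genome.fasta -bed hits.bed -s -name -fo hits.fasta
```

Keeping the strand in column 6 lets the extraction step reverse-complement hits on the minus strand, so all extracted regions share the same orientation before alignment.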

Methods

The accessions used for the analysis are listed here: accession

  • downloadrecords.sh: Run this to download the sequence records from the EBI/ENA. Alternatively, run the code below to generate the direct API download commands.
    *code for generating the direct APIs for the arabidopsis ENA records*
# print one curl command per GCA accession found in the links table
for i in $(grep GCA arabidopsisaccessionlinks.md | cut -f 2 -d "|");
do
         # the URL is single-quoted in the output so that ? and & are not interpreted by the shell
         echo "curl 'https://www.ebi.ac.uk/ena/browser/api/fasta/$i.1?download=true&gzip=true' -o $i.gz";
done

The generated direct API links for the arabidopsis genomes are listed in directapis.txt.

          *Normalize your headers by running this before running the analysis*
          # keep only the text before the first space and before the first dot
          cut -f 1 -d " " fastafile | cut -f 1 -d "." > output.fasta
          # the output fasta is used for all subsequent analyses
  • alignmentrecords.sh: Run this to make the corresponding alignments. This follows the lift-off approach of transferring the annotations onto each genome.
  • phylogeny.R: Run this to make the phylogeny.
  • visualfreq.R: Run this to make the alignment visualization.
  • mapalignment-phylogeny.R: Run this to make the alignment and its visualization.
  • generatemRNAs.py: Run this to extract the corresponding mRNAs.
  • genome-annotation-visualizer.R: Run this to make the visualization of the genomic features. The code is also available in the evoseq and genome-annotation repositories.
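The two preparatory steps in the list above (building the ENA download links and normalizing the FASTA headers) can be sketched as small shell functions. The function names are hypothetical and not part of the repository. The `.1` version suffix and the second `|`-separated column simply mirror the loop shown in the list, and are assumptions about the accession table layout.

```shell
# Hypothetical helper: build the ENA FASTA download URL for one assembly accession.
# The ".1" version suffix mirrors the README loop and may not hold for every assembly.
ena_fasta_url() {
    printf 'https://www.ebi.ac.uk/ena/browser/api/fasta/%s.1?download=true&gzip=true\n' "$1"
}

# Hypothetical helper: pull GCA accessions out of the second "|"-separated column
# of the links table and strip the padding spaces around them.
extract_accessions() {
    grep GCA "$1" | cut -f 2 -d '|' | tr -d ' '
}

# Hypothetical helper: trim each FASTA header at the first space and the first dot.
# Unlike the cut pipeline, this touches header lines only; sequence lines pass through.
normalize_headers() {
    awk '/^>/ { sub(/ .*/, ""); sub(/\..*/, "") } { print }' "$1"
}

# Assumed usage (performs network downloads, so it is left commented out):
#   extract_accessions arabidopsisaccessionlinks.md | while read -r acc; do
#       curl -o "$acc.gz" "$(ena_fasta_url "$acc")"
#   done
#   normalize_headers fastafile > output.fasta
```

Downloading directly in a loop avoids the intermediate file of printed curl commands, at the cost of losing the reviewable list that directapis.txt provides.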

How to read this github repository

  • allassembly.md all accessions that were studied

  • arabidopsisaccessionlinks.md links to the accessions and the corresponding ENA archives

  • arabidopsis_paper.pdf arabidopsis paper

  • directapis.txt directapis for the ena

  • cap_alignments folder containing cap alignments with a readme on how to generate them

  • cap_final_joined_fasta folder containing the final fasta, alignments, ancestral tree, phylogenetic tree, ancestral sequence, alignment visualization

  • cap_genes folder containing the cap genes

  • maf_alignments folder containing maf alignments with a readme on how to generate them

  • maf_final_joined_fasta folder containing the final fasta alignments, ancestral tree, phylogenetic tree, ancestral sequence, alignment visualization

  • maf_genes folder containing the maf genes

  • python_scripts python scripts for analysis

  • r_scripts r scripts for analysis

  • shell_scripts shell scripts for analysis

  • README.md README for the complete analysis

Folder read for the analysis

cap_final_joined_fasta: File listing cap_final_joined_fasta

    alignments can be run with the following: 
       for i in *.fasta; do echo prank -d=${i} -o=${i%.*}.aligned.fasta -showanc -showtree; done
├── all.cap.gff.clipped.gff: All aligned mRNA positions. 
├── capgenes.aligned.fasta.best.anc.dnd: best phylogenetic ancestral tree 
├── capgenes.aligned.fasta.best.anc.fas: best ancestral sequence 
├── capgenes.aligned.fasta.best.dnd: best phylogenetic tree 
├── capgenes.aligned.fasta.best.fas: alignment 
└── capview.html: visualization of alignment
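One thing to watch with the `*.fasta` loop above: once prank has run, its `-o` prefix (`capgenes.aligned.fasta`) also matches the glob on a second pass. A minimal sketch that skips such names is shown below. The helper name `prank_cmd` is hypothetical, and like the original loop it only prints the commands rather than running prank.

```shell
# Hypothetical helper: print the prank command for one input FASTA,
# printing nothing for names that are already prank output prefixes (*.aligned.fasta).
prank_cmd() {
    case "$1" in
        *.aligned.fasta) return 0 ;;
    esac
    printf 'prank -d=%s -o=%s.aligned.fasta -showanc -showtree\n' "$1" "${1%.*}"
}

# Print one command per FASTA file in the current directory.
for f in *.fasta; do
    [ -e "$f" ] || continue   # the glob matched nothing
    prank_cmd "$f"
done
```

The printed commands can be reviewed and then piped to `sh` to run the alignments.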

maf_final_joined_fasta: File listing maf_final_joined_fasta

├── AT5G65050.all.out.fasta : mRNA regions for the AT5G65050
├── AT5G65050.gff.clipped.gff : aligned position information for the AT5G65050
├── AT5G65060.all.out.fasta : mRNA regions for the AT5G65060
├── AT5G65060.gff.clipped.gff : aligned position information for the AT5G65060
├── AT5G65070.all.out.fasta : mRNA regions for the AT5G65070
├── AT5G65070.gff.clipped.gff : aligned position information for the AT5G65070
├── AT5G65080.all.out.fasta : mRNA regions for the AT5G65080
├── AT5G65080.gff.clipped.gff : aligned position information for the AT5G65080
├── final.all.linear.tar.bz : all arabidopsis accessions
├── maf_aligned_ancestral_tree : ancestral tree for each of the individuals.
├── maf_aligned_best : aligned regions for each of the individuals along with the visualization
└── maf_ancestral_sequence : ancestral sequences for each of them.

Uncompress the genome annotation archive with tar -xJf TAIR10_GFF3_genes.tar.xz. If you have any questions, I can be contacted at [email protected] or [email protected]

Gaurav
Academic Staff Member
Bioinformatics
Institute for Biochemistry and Biology
University of Potsdam
Potsdam, Germany
