
synthenic-families's Introduction

Pipeline to classify lncRNAs from different species into families according to syntenic relationships.

This pipeline was developed to classify genes from 4 nematode species (C. elegans, C. briggsae, C. remanei and C. brenneri), but it can be adapted for use with other species.

REQUIREMENTS: the WormBase WS248 orthologs file for C. elegans: ftp://ftp.wormbase.org/pub/wormbase/releases/WS248/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WS248.orthologs.txt

1st: for each species, obtain a file with a sorted list of gene IDs. In this list all chromosomes are concatenated.
#ex:
#WBGene00189949 
#WBGene00021681 
#WBGene00004274 
#XLOC_001892 
#WBGene00199899 
#WBGene00199597 

NOTE: this list can contain all genes in each genome, only protein coding genes, etc.
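
The following is a minimal sketch of one way to build such a list. It assumes gene coordinates are already available as (chromosome, start, gene ID) records and that "sorted" means ordered by position within each chromosome. The record layout, the coordinates and the function name `gene_order` are hypothetical and are not part of the pipeline.

```python
def gene_order(records):
    """Return gene IDs sorted by chromosome, then by start position."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return [gene_id for _chrom, _start, gene_id in ordered]


# Hypothetical coordinates, for illustration only.
records = [
    ("II", 500, "WBGene00004274"),
    ("I", 1200, "WBGene00021681"),
    ("I", 300, "WBGene00189949"),
    ("II", 900, "XLOC_001892"),
]

order = gene_order(records)
print("\n".join(order))  # one gene ID per line, chromosomes concatenated
```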

2nd: create a file including all orthology relationships for the four nematode species

#usage: python wormbase_orthoParalogsGH.py elegans_genes.list c_elegans.PRJNA13758.WS248.orthologs.txt celegans_ortho.out

NOTE: protein-coding genes are more likely to have annotated orthology relationships

3rd: compare gene order between species. The script produces a file including all lncRNAs that have conserved synteny with lncRNAs from any of the three other species

#usage: synteny_nematodesv4GH.py sp1_geneOrder.list sp2_geneOrder.list sp3_geneOrder.list sp4_geneOrder.list celegans_ortho.out 4spv4.out cluster overlap minSideGenes noHomology

- cluster (integer): number of genes considered on each side of a given lncRNA; ex: 3. NOTE: if set to 3, the cluster size considered is 3+3=6
- overlap (integer): minimum number of shared genes for each pairwise comparison between species; ex: 3
- minSideGenes (integer): minimum number of shared genes on each side of a given lncRNA for lncRNAs to be considered members of the same family; ex: 1
- noHomology (yes/no): yes: consider genes lacking orthology relationships; no: do not consider genes lacking homology

ex: python synteny_nematodesv4GH.py celegans_geneOrder.list cbriggsae_geneOrder.list cbrenneri_geneOrder.list cremanei_geneOrder.list celegans_ortho.out out 3 3 1 no
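
To illustrate how the cluster, overlap and minSideGenes parameters could interact, here is a minimal sketch of a pairwise neighbourhood test. It is not the code of synteny_nematodesv4GH.py: the function names, the `orthologs` dictionary layout and the toy gene IDs are all assumptions, and the sketch only compares neighbourhoods in the same orientation.

```python
def flanks(order, index, cluster):
    """Return the genes up to `cluster` positions left and right of order[index]."""
    left = order[max(0, index - cluster):index]
    right = order[index + 1:index + 1 + cluster]
    return left, right


def conserved(order1, i, order2, j, orthologs, cluster, overlap, min_side_genes):
    """Check whether order1[i] and order2[j] sit in a shared neighbourhood.

    `orthologs` (assumed layout) maps a species-1 gene ID to the set of its
    species-2 orthologs.
    """
    left1, right1 = flanks(order1, i, cluster)
    left2, right2 = flanks(order2, j, cluster)

    def shared(side1, side2):
        # Count species-1 genes with at least one ortholog in the species-2 flank.
        return sum(1 for g in side1 if orthologs.get(g, set()) & set(side2))

    n_left = shared(left1, left2)
    n_right = shared(right1, right2)
    # Total shared genes must reach `overlap`; each side must reach `min_side_genes`.
    return n_left + n_right >= overlap and min(n_left, n_right) >= min_side_genes


# Toy gene orders, for illustration only.
order1 = ["a1", "a2", "a3", "lnc1", "a4", "a5", "a6"]
order2 = ["b1", "b2", "b3", "lnc2", "b4", "b5", "b6"]
orthologs = {"a1": {"b1"}, "a3": {"b3"}, "a4": {"b4"}}

result = conserved(order1, 3, order2, 3, orthologs,
                   cluster=3, overlap=3, min_side_genes=1)
print(result)
```

With cluster=3, each lncRNA is compared over 3+3=6 neighbouring genes. In this toy case two genes are shared on the left and one on the right, so the pair passes with overlap=3 and minSideGenes=1 but would fail with overlap=4 or minSideGenes=2.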

4th: classify lncRNAs from the 4 species into families

#usage: python classifyFamiliesv5_VennGH.py cele cbrig cbren crem 4spv4.out 4spv4_Families.fam 4spv4_Families.txt 4spv4_FAMvenn.R 4spv4_GENESvenn.R >4spv4_Families.counts

- ex INPUT file: 4spv4.out
cele	cbren	XLOC_014828	XLOC_024666
cele	cbren	XLOC_014828	XLOC_024657
- ex INPUT file: cele -> #number of C. elegans genes to classify
- ex INPUT file: cbrig -> #number of C. briggsae genes to classify
- ex INPUT file: cbren -> #number of C. brenneri genes to classify
- ex INPUT file: crem -> #number of C. remanei genes to classify

- ex OUTPUT file: 4spv4_Families.fam
fam1	XLOC_010119cbrig
fam1	XLOC_019544crem
fam1	XLOC_024997cbren
- ex OUTPUT file: 4spv4_Families.txt
	fam1	  bren	brig	 rem
	fam2   	bren   rem
- ex OUTPUT file: 4spv4_Families.counts
	EleBrenBrig 56
	EleBrenRem 51
- ex OUTPUT file: 4spv4_GENESvenn.R -> R script to draw a venn diagram with the number of overlapped genes using R
- ex OUTPUT file: 4spv4_FAMvenn.R -> R script to draw a venn diagram with the number of overlapped families using R
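
As a rough illustration of the classification step, the sketch below groups the pairwise relationships from a file like 4spv4.out into families. It assumes that a family is a connected component of the pairwise synteny graph, which this README does not confirm as the method used by classifyFamiliesv5_VennGH.py. The function name `build_families` and the third input line are hypothetical.

```python
from collections import defaultdict


def build_families(lines):
    """Group genes linked by pairwise synteny into connected components."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in lines:
        sp1, sp2, gene1, gene2 = line.split()
        # Tag each gene ID with its species, as in the .fam example above.
        a, b = find(gene1 + sp1), find(gene2 + sp2)
        if a != b:
            parent[b] = a

    families = defaultdict(list)
    for member in parent:
        families[find(member)].append(member)
    return [sorted(members) for members in families.values()]


# First two lines come from the 4spv4.out example; the third is made up.
lines = [
    "cele\tcbren\tXLOC_014828\tXLOC_024666",
    "cele\tcbren\tXLOC_014828\tXLOC_024657",
    "cbrig\tcrem\tXLOC_010119\tXLOC_019544",
]

families = build_families(lines)
for n, members in enumerate(families, start=1):
    for member in members:
        print(f"fam{n}\t{member}")
```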

Contributors: cintapq
