promoter-analysis's Introduction

Promoter-analysis: revealing the potential role of transcription factors in a given physiological process using RNA-Seq data

These scripts were used for the research published at https://www.mdpi.com/2223-7747/9/9/1176/htm to:

  1. Search for cis-regulatory elements (CREs) within gene promoter regions using position weight matrices (PWMs) obtained from PlantPAN3.0 (http://plantpan.itps.ncku.edu.tw/index.html) and the MAST tool from the MEME Suite (http://meme-suite.org/). The scripts can be adapted to work with other databases and motif search tools;
  2. Combine CRE search results with differential gene expression (DGE) analysis to predict the potential master-regulators among plant transcription factors (TFs);
  3. Infer the TF families potentially responsible for the differential regulation of genes belonging to multigene families in which both up- and downregulated genes are well represented.

System requirements:

  • Multi-core CPU (for parallel computations)
  • Linux OS is recommended (tested on Ubuntu 14.04)
  • The MEME Suite must be installed on your system (http://meme-suite.org/)
  • RStudio (required for the automatic 'setwd()')
  • R packages: data.table, ggplot2, ggpubr, grid, gridExtra, reshape2, XML
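Before starting, the command-line requirements above can be checked with a short shell sketch. The function name `check_tools` is hypothetical and not part of this repository; `mast` is the MAST executable installed with the MEME Suite and `Rscript` is installed with R. The R packages and RStudio itself are not checked here.

```shell
#!/usr/bin/env bash
# check_tools NAME...   (hypothetical helper, not part of the repository)
# Prints one "missing: <name>" line for every command that is not on PATH
# and returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_tools mast Rscript` prints nothing and returns 0 when both commands are available.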

Instructions

If you want to save time and just use PlantPAN3.0 PWMs as the input for MAST, follow these steps:

  1. Create an empty folder on your machine (name it 'Promoter-analysis' or whatever you like).
  2. Download 'Run_MAST', 'MAST_XML_parser', 'TF_family_regulons_correlation_analysis', and 'TF_regulons_enrichment_analysis' folders from this repository and put them into the folder you have created.
  3. Download the ID mapping file (all plants) from PlantPAN3.0. Put the file 'ID_mapping_all_plant.txt' into the folder you have created. Alternative direct link to ID_mapping_all_plant.txt (2.8 MB)
  4. Use RimGubaev's script to extract promoters of your species' genes. Put the output file 'Promoters.fa' into 'Run_MAST' directory. Example output: Promoters.fa (56.3 MB)
  5. Download PlantPAN_TF_annotation_filtered.tsv and put it into the 'MAST_XML_parser' folder.
  6. Download PlantPAN_meme_motifs (959 KB) and put it into the 'Run_MAST' folder.
  7. Download PlantPAN_TF_annotation_filtered.tsv and put it into the 'Run_MAST' folder.
  8. Using the bash shell, change the current directory to 'Run_MAST' ($ cd full_path_to_the_folder_created_in_step_1/Run_MAST)
  9. Run 'run_MAST_parallel.sh' ($ bash run_MAST_parallel.bash). The output folder ('MAST_output') will appear in the current directory. Example output: MAST_output (1.34 GB).
  10. Open 'MAST_XML_parser.R' (located in 'MAST_XML_parser') in RStudio and run this script. The output file ('mast_output_full.tsv') will appear in the 'MAST_XML_parser' directory. Example output: mast_output_full.tsv (81.4 MB).
  11. Open 'Annotate_MAST_output_full.R' (located in 'MAST_XML_parser') in RStudio and run this script. The output file ('tf_analysis_input_annotated.tsv') will appear in the 'MAST_XML_parser' directory. Example output: tf_analysis_input_annotated.tsv (104.1 MB).
  12. Master-regulator prediction: put the table containing differential gene expression (DGE) data into the folder you created in step 1. NB: the table must contain the following columns: GeneID (text or numeric) and log2FC (numeric), as shown below:
GeneID log2FC
107809780 4.838
107760295 -1.706

(example gene expression table 1 (2 MB)). If you want to use this example, rename the file to 'Expression_table.tsv'. Then open 'TF_regulons_enrichment_analysis.R' (located in 'TF_regulons_enrichment_analysis') in RStudio and run this script. The output file ('DEG_enriched_regulons.tsv') will appear in the 'TF_regulons_enrichment_analysis' directory. Example output: DEG_enriched_regulons.tsv (174 B).
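The column requirements for the expression table can be checked before running the R script. The following is a minimal sketch, assuming the table is tab-separated with a header row (an assumption based on the '.tsv' file name). The function name `check_expression_table` is hypothetical, and the check is stricter than the R script may be: any log2FC value that is not a plain number (including 'NA') is reported.

```shell
#!/usr/bin/env bash
# check_expression_table FILE   (hypothetical helper, not part of the repository)
# Assumes a tab-separated table with a header row.
# Returns 0 if the header contains GeneID and log2FC and every log2FC value
# is numeric; otherwise prints the problems found and returns 1.
check_expression_table() {
    awk -F'\t' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("GeneID" in col) || !("log2FC" in col)) {
                print "header must contain GeneID and log2FC"
                bad = 1
                exit
            }
            next
        }
        # log2FC must look like a number, e.g. 4.838 or -1.706
        $(col["log2FC"]) !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ {
            printf "line %d: non-numeric log2FC \"%s\"\n", NR, $(col["log2FC"])
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_expression_table Expression_table.tsv && echo "table looks fine"`.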

  13. Prediction of TF families responsible for the regulation of a certain group of genes: put the table containing differential gene expression (DGE) data into the folder you created in step 1. NB: the differentially expressed genes you are interested in must be categorized in some way. For example, they may have GO terms or manually assigned groups:
GeneID Gene_group log2FC
107791722 Zinc finger proteins -9.473
107814985 Aquaporins 5.173

(example gene expression table 2 (categorized) (2 MB)). If you want to use this example, rename the file to 'Expression_table.tsv'.

Then open 'TF_family_regulons_correlation_analysis.R' (located in 'TF_family_regulons_correlation_analysis') in RStudio and run this script. The output PNG files will appear in the 'Significant/Non_significant' directories. Example output:

Expansins
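Since the family-level analysis targets gene groups in which both up- and downregulated genes are well represented, it may help to count them per group in the categorized table first. The sketch below assumes a tab-separated file whose header contains the Gene_group and log2FC columns. The function name `group_direction_counts` is hypothetical, and no threshold for "well represented" is applied because the README does not state one.

```shell
#!/usr/bin/env bash
# group_direction_counts FILE   (hypothetical helper, not part of the repository)
# Assumes a tab-separated table whose header contains Gene_group and log2FC.
# Prints "Gene_group<TAB>n_up<TAB>n_down" for each group, sorted by group name.
group_direction_counts() {
    awk -F'\t' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            g  = $(col["Gene_group"])
            fc = $(col["log2FC"]) + 0           # force numeric comparison
            seen[g] = 1
            if (fc > 0) up[g]++
            else if (fc < 0) down[g]++
        }
        END { for (g in seen) printf "%s\t%d\t%d\n", g, up[g], down[g] }
    ' "$1" | LC_ALL=C sort
}
```

Groups with a zero in either count column contain genes regulated in only one direction.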

If you're interested in how PlantPAN_TF_annotation_filtered.tsv and chunked PlantPAN_meme_motifs were produced, you may perform the following steps:

  1. Create an empty folder on your machine (name it 'Promoter-analysis' or whatever you like).
  2. Download all the folders from this repository into the folder you have created.
  3. Download PWMs of TF binding sites (all plants) from PlantPAN3.0. Put the file 'Transcription_factor_weight_matrix.txt' into the folder you have created. Alternative direct link to Transcription_factor_weight_matrix.txt (1.1 MB)
  4. Download the ID mapping file (all plants) from PlantPAN3.0. Put the file 'ID_mapping_all_plant.txt' into the folder you have created. Alternative direct link to ID_mapping_all_plant.txt (2.8 MB)
  5. Open 'PlantPAN_annotation_download.R' (located in 'PlantPAN_annotation_download') in RStudio and run this script. The output file ('PlantPAN_TF_annotation_filtered.tsv') will appear in the 'Output' directory. You may download this file here: PlantPAN_TF_annotation_filtered.tsv.
  6. Open 'Preparing_MAST_input.R' (located in 'Preparing_MAST_input') in RStudio and run this script. The output ('PlantPAN_meme_motifs' folder) will appear in the 'Preparing_MAST_input' directory. You may download it here: PlantPAN_meme_motifs (959 KB).
  7. Move 'PlantPAN_meme_motifs' into 'Run_MAST' folder.
  8. To perform further analysis, go to step 8 of the previous section.
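Before returning to step 8, it can help to confirm that 'Run_MAST' contains the inputs that steps 4, 6 and 7 of the previous section place there. The following is a minimal sketch; the function name `check_run_mast_inputs` is hypothetical, and it only tests that the entries exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# check_run_mast_inputs DIR   (hypothetical helper, not part of the repository)
# Prints "missing: <path>" for every expected input absent from DIR and
# returns 1 if anything is missing, 0 otherwise.
check_run_mast_inputs() {
    local dir=$1 status=0 name
    for name in Promoters.fa PlantPAN_meme_motifs PlantPAN_TF_annotation_filtered.tsv; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_run_mast_inputs full_path_to_the_folder_created_in_step_1/Run_MAST`.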