Promoter-analysis: revealing the potential role of transcription factors in a given physiological process using RNA-Seq data

These scripts were used for the research published at https://www.mdpi.com/2223-7747/9/9/1176/htm to:

Search for cis-regulatory elements (CREs) within gene promoter regions using position weight matrices (PWMs) obtained from PlantPAN3.0 (http://plantpan.itps.ncku.edu.tw/index.html) and the MAST tool from the MEME Suite (http://meme-suite.org/). The scripts can be adapted to work with other databases and motif search tools.
Combine CRE search results with differential gene expression (DGE) analysis to predict potential master regulators among plant transcription factors (TFs).
Infer which TF families may be responsible for the differential regulation of genes in multigene families where both up- and downregulated genes are well represented.

System requirements:

Multi-core CPU (for parallel computations)
Linux OS is recommended (tested on Ubuntu 14.04)
MEME Suite installed on your system (http://meme-suite.org/)
RStudio (required for the automatic 'setwd()')
R packages: data.table, ggplot2, ggpubr, grid, gridExtra, reshape2, XML
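
Before starting, it may help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of the repository: the function name 'missing_tools' is hypothetical, and it assumes that MEME Suite exposes its motif scanner as 'mast' and that R provides 'Rscript' on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (the tool names are assumptions about a typical install):
missing_tools mast Rscript
```

An empty result means both commands were found. The R packages listed above still have to be checked from within R.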

Instructions

If you want to save time and just use PlantPAN3.0 PWMs as the input for MAST, follow these steps:

1. Create an empty folder on your machine (name it 'Promoter-analysis' or whatever you like).
2. Download the 'Run_MAST', 'MAST_XML_parser', 'TF_family_regulons_correlation_analysis', and 'TF_regulons_enrichment_analysis' folders from this repository and put them into the folder you created.
3. Download the ID mapping file (all plants) from PlantPAN3.0. Put the file 'ID_mapping_all_plant.txt' into the folder you created. Alternative direct link to ID_mapping_all_plant.txt (2.8 MB)
4. Use RimGubaev's script to extract the promoters of your species' genes. Put the output file 'Promoters.fa' into the 'Run_MAST' directory. Example output: Promoters.fa (56.3 MB)
5. Download PlantPAN_TF_annotation_filtered.tsv and put it into the 'MAST_XML_parser' folder.
6. Download PlantPAN_meme_motifs (959 KB) and put it into the 'Run_MAST' folder.
7. Download PlantPAN_TF_annotation_filtered.tsv and put it into the 'Run_MAST' folder.
8. In a bash shell, change the current directory to 'Run_MAST' ($ cd full_path_to_the_folder_created_in_step_1/Run_MAST)
9. Run 'run_MAST_parallel.sh' ($ bash run_MAST_parallel.bash). The output folder ('MAST_output') will appear in the current directory. Example output: MAST_output (1.34 GB).
10. Open 'MAST_XML_parser.R' (located in 'MAST_XML_parser') in RStudio and run the script. The output file ('mast_output_full.tsv') will appear in the 'MAST_XML_parser' directory. Example output: mast_output_full.tsv (81.4 MB).
11. Open 'Annotate_MAST_output_full.R' (located in 'MAST_XML_parser') in RStudio and run the script. The output file ('tf_analysis_input_annotated.tsv') will appear in the 'MAST_XML_parser' directory. Example output: tf_analysis_input_annotated.tsv (104.1 MB).
Master-regulator prediction: put the table containing differential gene expression (DGE) data into the folder you created in step 1. NB: the table must contain the following columns: GeneID (text or numeric) and log2FC (numeric), as shown below:

GeneID	log2FC
107809780	4.838
107760295	-1.706

(example gene expression table 1 (2 MB)). If you want to use this example, rename the file to 'Expression_table.tsv'. Then open 'TF_regulons_enrichment_analysis.R' (located in 'TF_regulons_enrichment_analysis') in RStudio and run the script. The output file ('DEG_enriched_regulons.tsv') will appear in the 'TF_regulons_enrichment_analysis' directory. Example output: DEG_enriched_regulons.tsv (174 B).
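
Since the analysis depends on the GeneID and log2FC columns, a quick sanity check of the table before opening RStudio may save a failed run. This is a minimal sketch under stated assumptions: the function name 'check_expression_table' is hypothetical, the table is assumed to be tab-separated with a header row and Unix line endings, and the check is not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a tab-separated table has GeneID and
# log2FC header columns and that every log2FC value looks like a plain
# number (for example 4.838, -1.706 or 1e-3). Values such as NA are reported.
# Returns non-zero if any problem was found.
check_expression_table() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("GeneID" in col) || !("log2FC" in col)) {
                print "missing GeneID or log2FC column"
                bad = 1
                exit
            }
            next
        }
        $(col["log2FC"]) !~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ {
            print "line " NR ": non-numeric log2FC: " $(col["log2FC"])
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: check_expression_table Expression_table.tsv && echo "table looks fine"
```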

Prediction of TF families responsible for the regulation of a certain group of genes: put the table containing differential gene expression (DGE) data into the folder you created in step 1. NB: the differentially expressed genes you are interested in must be categorized in some way. For example, they may be annotated with GO terms or assigned to manually defined groups:

GeneID	Gene_group	log2FC
107791722	Zinc finger proteins	-9.473
107814985	Aquaporins	5.173

(example gene expression table 2 (categorized) (2 MB)). If you want to use this example, rename the file to 'Expression_table.tsv'.

Then open 'TF_family_regulons_correlation_analysis.R' (located in 'TF_family_regulons_correlation_analysis') in RStudio and run the script. The output PNG files will appear in the 'Significant' and 'Non_significant' directories.
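
This analysis targets gene groups in which both up- and downregulated genes are present, so it can be useful to see how a categorized table is balanced before running the script. The sketch below only counts genes per group. The function name 'count_up_down_per_group' is hypothetical, the table is assumed to be tab-separated with the Gene_group and log2FC headers shown above, and no threshold for "well represented" is implied, since the scripts' own criteria are not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each Gene_group print
#   group <TAB> number of genes with log2FC > 0 <TAB> number with log2FC < 0
# Genes with log2FC equal to 0 are counted in neither column.
count_up_down_per_group() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Gene_group" in col) || !("log2FC" in col)) {
                print "missing Gene_group or log2FC column" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            g = $(col["Gene_group"])
            fc = $(col["log2FC"]) + 0
            groups[g] = 1
            if (fc > 0) up[g]++
            else if (fc < 0) down[g]++
        }
        END {
            for (g in groups) printf "%s\t%d\t%d\n", g, up[g], down[g]
        }
    ' "$1" | LC_ALL=C sort
}

# Usage: count_up_down_per_group Expression_table.tsv
```

Groups with a zero in either count column contain genes regulated in one direction only.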

If you're interested in how PlantPAN_TF_annotation_filtered.tsv and chunked PlantPAN_meme_motifs were produced, you may perform the following steps:

1. Create an empty folder on your machine (name it 'Promoter-analysis' or whatever you like).
2. Download all the folders from this repository into the folder you created.
3. Download the PWMs of TF binding sites (all plants) from PlantPAN3.0. Put the file 'Transcription_factor_weight_matrix.txt' into the folder you created. Alternative direct link to Transcription_factor_weight_matrix.txt (1.1 MB)
4. Download the ID mapping file (all plants) from PlantPAN3.0. Put the file 'ID_mapping_all_plant.txt' into the folder you created. Alternative direct link to ID_mapping_all_plant.txt (2.8 MB)
5. Open 'PlantPAN_annotation_download.R' (located in 'PlantPAN_annotation_download') in RStudio and run the script. The output file ('PlantPAN_TF_annotation_filtered.tsv') will appear in the 'Output' directory. You may download this file here: PlantPAN_TF_annotation_filtered.tsv.
6. Open 'Preparing_MAST_input.R' (located in 'Preparing_MAST_input') in RStudio and run the script. The output ('PlantPAN_meme_motifs' folder) will appear in the 'Preparing_MAST_input' directory. You may download it here: PlantPAN_meme_motifs (959 KB).
7. Move 'PlantPAN_meme_motifs' into the 'Run_MAST' folder.
8. To perform further analysis, go to step 8 of the previous section.
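
Steps 8 and 9 of the previous section scan the promoters with the chunked motif files in parallel. As a rough illustration of that idea, and not the contents of 'run_MAST_parallel.sh', the sketch below starts one MAST job per file found in a motif folder. The names 'run_mast_chunks' and 'MAST_BIN' are hypothetical, every file in the folder is assumed to be a MEME-format motif chunk, and 'xargs -P' is assumed to be available (it is on GNU and BSD systems). The only MAST option used is '-oc', which sets the output directory and overwrites it if it exists.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run one MAST job per motif chunk, several at a time.
# MAST_BIN lets you point at a specific mast executable (default: mast on PATH).
MAST_BIN="${MAST_BIN:-mast}"

# Arguments: motif folder, promoter FASTA, output folder, number of parallel jobs.
run_mast_chunks() {
    local motif_dir="$1" promoters="$2" out_dir="$3" jobs="${4:-2}"
    mkdir -p "$out_dir"
    # Each chunk gets its own output sub-directory named after the chunk file.
    find "$motif_dir" -type f -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            chunk="$1"
            "$2" "$chunk" "$3" -oc "$4/$(basename "$chunk")"
        ' _ {} "$MAST_BIN" "$promoters" "$out_dir"
}

# Usage: run_mast_chunks PlantPAN_meme_motifs Promoters.fa MAST_output 4
```

Writing each chunk to a separate directory keeps parallel jobs from overwriting one another's output. The layout produced by the repository's own script may differ.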
