Cross-species comparison of transcription factor binding revealed cistrome plasticity during plant evolution
This README introduces the code used in this project. All scripts are in the bin folder.
We first detected novel motifs with HOMER. To overcome the limitations of this software, we developed a k-mer model to find the motifs that contribute most to the binding sites. For the input format, refer to the tables we uploaded. The input sequences are extracted from the peak summit regions.
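To illustrate the k-mer idea, here is a minimal sketch that ranks k-mers by how much more often they occur in peak-summit sequences than in background sequences. It is not the model in bin: the enrichment score, the pseudocount, and the names count_kmers and kmer_enrichment are assumptions made for this example, and it takes plain lists of sequences rather than the uploaded table format.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count, for each k-mer, the number of sequences containing it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Collect each k-mer once per sequence so long sequences are not over-weighted.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Skip k-mers containing ambiguous bases such as N.
        counts.update(kmer for kmer in seen if set(kmer) <= set("ACGT"))
    return counts


def kmer_enrichment(foreground, background, k=6, pseudocount=1.0):
    """Rank k-mers by the ratio of foreground to background occurrence.

    The scoring scheme is an assumption for illustration only.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    scores = {}
    for kmer in fg:
        fg_frac = (fg[kmer] + pseudocount) / (len(foreground) + pseudocount)
        bg_frac = (bg[kmer] + pseudocount) / (len(background) + pseudocount)
        scores[kmer] = fg_frac / bg_frac
    # Most enriched k-mers first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling kmer_enrichment(peak_sequences, background_sequences, k=6) returns a list of (k-mer, score) pairs, so the first entries are the candidates that contribute most under this simple score.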
The GLK target genes we identified can be divided into five groups according to their conservation across the five species. This script determines the conservation group of a target.
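The grouping can be sketched as follows, assuming that a target's group is simply the number of species (one to five) in which it is bound. That definition, the placeholder species labels, and the function names are assumptions for this example; the script in bin may define the groups differently.

```python
from collections import defaultdict

# Hypothetical placeholder labels; substitute the five species of the study.
SPECIES = ("species_A", "species_B", "species_C", "species_D", "species_E")


def conservation_group(bound_species, all_species=SPECIES):
    """Return the number of species in which a target is bound (1 to 5)."""
    bound = set(bound_species)
    unknown = bound - set(all_species)
    if unknown:
        raise ValueError(f"unknown species: {sorted(unknown)}")
    if not bound:
        raise ValueError("gene is not a target in any species")
    return len(bound)


def group_targets(targets, all_species=SPECIES):
    """Map each conservation group to the list of target identifiers in it.

    `targets` maps a target identifier (for example an ortholog group)
    to the collection of species in which it is bound.
    """
    groups = defaultdict(list)
    for gene_id, bound_species in targets.items():
        groups[conservation_group(bound_species, all_species)].append(gene_id)
    return dict(groups)
```

Under this assumption, a target bound in all five species falls in group 5 and a species-specific target falls in group 1.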
The GO result files from agriGO are uploaded, along with the script that plots them.
This script finds the pattern in the maize genome duplication.
The comparison of models with multiple features shows that each of our selected features has an important effect on the model. In the best-model script, we resample the training and test datasets 500 times to avoid errors caused by the small dataset; the resulting model achieves high accuracy.
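The resampling step can be sketched as repeated random train/test splitting. This is a minimal illustration, not the best-model script itself: the 20% test fraction, the fixed seed, the function name repeated_holdout, and the toy threshold classifier are all assumptions, and any model exposing a fit and a predict function could be plugged in instead.

```python
import random
from statistics import mean


def repeated_holdout(samples, labels, fit, predict,
                     n_repeats=500, test_fraction=0.2, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, int(round(len(samples) * test_fraction)))
    accuracies = []
    for _ in range(n_repeats):
        # Draw a fresh random split for every repeat.
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return mean(accuracies), accuracies


# Toy stand-in classifier (an assumption): threshold halfway between class means.
def fit_threshold(xs, ys):
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(positives) + mean(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0
```

Reporting the mean over all repeats, rather than the accuracy of a single split, is what reduces the dependence on one lucky or unlucky partition of a small dataset. The toy classifier needs both classes present in every training split, so it is suitable only for balanced demonstration data.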