GithubHelp home page GithubHelp logo

arundurvasula / maize-background-selection-simulations Goto Github PK

View Code? Open in Web Editor NEW
1.0 3.0 0.0 1.61 MB

A pipeline for both forward and coalescent simulations of background selection during maize domestication on Slurm

Shell 35.81% R 60.48% Python 3.71%

maize-background-selection-simulations's Introduction

###Purpose: To use approximate bayesian computation and a good rejection algorithm to create parameters that give summary statistics similar to the data found in Hufford et. al. 2012.

###Dependencies/Assumptions: To run this full pipeline you need the following:

  • The Sun Grid Engine
  • Working executables of:
    • sfs_code
    • convertSFS_CODE
    • ms
    • msstats
    • ABCreg
    • R
    • Python

If those executables are not already in your path, you must add them:

  1. create a file called ".bash_profile" in your home directory (if it doesn't exist)

  2. add this, substituting the paths to your exectuable:

     PATH=$PATH:/home/user/sfs_code/bin/:/home/user/msstats-0.3.1/bin/
     export PATH
    

###Forward Simulations: This pipeline uses sfs_code to simulate maize domestication forward in time.

Directions to recreate simulations:

  1. source submit_million_forward.sh
  2. source cleanup.sh
  3. source create_posterior.sh
  4. Graph bneck.0.post in R
  • You can test out those parameters with Rscript generate_[simtype]_params.r 1 2 3. The output will be on the terminal with parameters saved as 'paramsFile.1.2.3'

There is also a pipeline that uses slim to simulate maize domestication forward in time.

To use the slim pipeline:

  1. Make sure the folder slurm-log exists in /forward-sims/

  2. Run simulations: source ./submit_multiple_slim_arrays.sh

    This script submits the slim_pipeline.sh script multiple times, as specified in the the for loop. After the simulations finish, the files will be placed in slurm-log. There are 4 types of output files:

    • bneck.[JOBID].txt => holds the size of the population after the bottleneck for the simulation
    • paramsFile.[JOBID].txt => holds the all of the parameters used for simulation
    • slim-pipeline-out.[ARRAYID].[JOBID].txt => holds one line of summary statistics for a population
    • slim-pipeline-stderr.[ARRAYID].[JOBID].txt => contains any error messages generated during simulation as well as anything else sent to sdterr.
  3. Analysis: to run the analysis using ABCreg, you need a data file containing the mean of the summary staticstics and a prior file containing all of the simulated summary statistics. NOTE: file names and paths need to be changed. 3. Create data file (uses Hufford statistics grabs the mean using awk):

     	awk '{print($48,"\t",$37,"\t",$39, "\t",$40,"\t",$41,"\t",$42, "\t",$43, "\t")}' ~/sfs_code_workflow2/Hufford_et_al._2012_10kb_statistics/Hufford_et_al._2012_10kb_statistics.txt | grep -v "NA" | awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf sum[i]/NR "\t"}}' > data_file.txt
    
    1. Create prior file:

      1. Run slim-cleanup.sh to concatenate only valid stats and bottlenecks into one file.

      2. Remove stats labels from stats.txt

        awk '$1 ~ /[0-9]/ {print}' semifinal_stats.txt > final_stats.txt

      3. Grab only necessary stats:

        awk '{print($1,"\t",$2,"\t",$4, "\t",$5,"\t",$6,"\t",$8, "\t",$13, "\t")}' final_stats.txt > temp_sim_stats.txt

      4. Paste bottlenecks and sim stats together:

        paste bneck temp_sim_stats | grep -v 'nan' > prior_file.txt

    2. Run ABCreg:

      ~/ABCreg/reg -T -P 1 -S 7 -p prior_file.txt -d data_file.txt -b bneck -t 0.001

###Coalescent Simulations: This pipeline uses ms to simulate maize domestication backwards in time. Directions to recreate simulations:

  1. Rscript gen_tbs.r
  2. Rscript generate_ms_params.r
  3. source ../create_posterior_ms.sh
  4. Graph bneck.0.post in R
  • Change parameters in generate_[simtype]_params.r

###"Oh man I can't believe I did that"

  • The submit million scripts do a for loop twice that qsubs the sim script. The sim script submits an array job from 1-50,000 (max that SGE allows). Each job in the sim script has its own for loop that runs generate_[simtype]_params.r and the rest of the pipeline 10 times. So we have:

      2 array jobs x 50,000 jobs per array x 10 simulations per job = 1,000,000 simulations
    

====

Tricks for analysis

  • data file contains summary stats from observed data. It should only be 1 line. (no headers):

      .0054 -0.434
    

(the stats above are Pi and TajD)

  • prior file contains parameters and summary stats from the simulations. Each line should have the parameter value and the summary stats associated with it (no headers):

      0.835507987765595       65.7986 0.241462
      0.550444400170818       29.6863 -0.276919
      0.212850250769407       42.6132 0.212022
      0.781859162030742       106.241 -0.0995736
    

(the stats above are theta, Pi and TajD)

  • make sure you're using the right data. If you're simulating with --annotate N then use stats from noncoding windows.

  • make sure your stats are actually comparable: msstats gives theta pi per locus. Your data might have theta pi per base pair.

  • if your graph is crappy, try reducing the number of values in your posterior.

  • use cut -f [column #] to get a column of a file.

  • stats currently being compared:

      sims        obs
      1-S*         48-S_rhoMZ
      2-n1*        37-singletonsMZ
      4-theta*     39-ThetaWMZ
      5-pi*        40-ThetaPiMZ
      6-thetaH*    41-ThetaHMZ
      8-tajd      42-TajDMZ
      13-rm       43-rmin
    
  • Guide to final_stats (for reference when changing create_posterior.sh):

  • 1 S

  • 2 n1

  • 3 next

  • 4 theta

  • 5 pi

  • 6 thetaH

  • 7 Hprime

  • 8 tajd

  • 9 fulif

  • 10 fulid

  • 11 fulifs

  • 12 fulids

  • 13 rm

  • 14 rmmg

  • 15 nhaps

  • 16 hdiv

  • 17 wallb

  • 18 wallq

  • 19 rosasrf

  • 20 rosasru

  • 21 zns

maize-background-selection-simulations's People

Contributors

arundurvasula avatar rossibarra avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

maize-background-selection-simulations's Issues

ms path

Code breaks if someone has ms in different location. Should assume it's in path and maybe write something to check and exit with error if not?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.