Author: Timothy Johnstone, 2017
Development was never completed and has been paused indefinitely. Plenty of similar tools (e.g. Gradio) now exist that accomplish this in a much more elegant way. It was a great exercise, though.
Snakemake-driven pipeline for creating generative text models with tensorflow-char-rnn. Produces nice reports comparing models with various parameters.
(TODO: include some samples of what this can do here)
I recommend doing this on AWS EC2 or Google Compute Engine if you don't have your own box with a beefy, CUDA-enabled GPU with plenty of memory. GPUs speed up training a lot, especially if you have a lot of hidden states.
As of 11/26/17:
EC2 g2.2xlarge instances range between 0.25 and 0.80 USD per hour (0.65 in us-west-2), giving you access to 1 K520 GRID GPU with 4GB GPU memory. The AWS deep learning AMI does make things easier for beginners as everything is preinstalled and configured.
Google Compute just lowered GPU prices: they charge $0.45 per hour for each K80 GPU attached to an instance. K80s have about 2x the compute cores (2,496) and 3x the memory (12GB) of the K520 GRID resources available on an AWS g2.2xlarge. Cost/performance-wise, it's pretty much a no-brainer to use Google if you're willing to do a bit more setup.
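To put rough numbers on that comparison, here is a minimal sketch that uses only the prices and memory sizes quoted above. One assumption to keep in mind: the Google figure is the per-GPU price only and does not include the host instance, so the real hourly bill will be somewhat higher than what this prints.

```python
# Hourly prices and GPU memory as quoted above (as of 11/26/17).
# Assumption: the Google price covers only the attached K80 GPU, not the
# host instance, so it understates the full hourly cost.
OPTIONS = {
    "AWS g2.2xlarge (K520)": {"usd_per_hour": 0.65, "gpu_mem_gb": 4},
    "Google Compute (K80)": {"usd_per_hour": 0.45, "gpu_mem_gb": 12},
}


def usd_per_gb_hour(option):
    """Hourly price divided by GPU memory, a crude value-for-money metric."""
    return option["usd_per_hour"] / option["gpu_mem_gb"]


def training_cost(option, hours):
    """Total cost in USD of keeping the GPU running for `hours` hours."""
    return option["usd_per_hour"] * hours


for name, opt in OPTIONS.items():
    print(
        f"{name}: ${usd_per_gb_hour(opt):.4f} per GB of GPU memory per hour, "
        f"${training_cost(opt, 24):.2f} for a 24-hour run"
    )
```

Price per GB of GPU memory is only one way to slice it, but it is the metric that matters most once your models stop fitting on the smaller card.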
If you are computing on the cloud, remember to stop your instances when you're not working with them or you'll rack up costs.
- Spin up a g2.2xlarge on EC2, using Deep Learning AMI with Conda (Ubuntu) (ami-f1e73689)
- Log in as ubuntu
- pip install snakemake
- add ~/anaconda3/bin to your PATH
- fix your conda activate/deactivate scripts (mimicking conda/conda#5407, until 4.4.0 comes out)
See the included documentation (hopefully pretty beginner-friendly) here https://github.com/tgjohnst/auto_rnngine/blob/master/etc/cloud_setup.md
- clone this repo
- create a config yaml (examples provided in config/)
- maybe start a screen or tmux session so you can leave it running in the background
- run the pipeline with ./run_pipeline.sh config/config_file.yaml
- enjoy your robo-babble
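Since the point of a run is to compare models with various parameters, it can help to think of the config as a small search space that expands into one training job per combination. The sketch below shows that expansion in plain Python. The key names (`model`, `hidden_size`, `num_layers`) and the naming scheme are hypothetical and are not taken from this repo's example configs; check the files in config/ for the real format.

```python
from itertools import product

# Hypothetical search space. These keys are illustrative assumptions only,
# not the actual schema used by the example configs in config/.
GRID = {
    "model": ["lstm", "gru"],
    "hidden_size": [128, 256],
    "num_layers": [1, 2],
}


def expand_grid(grid):
    """Return one parameter dict per combination of values in `grid`."""
    keys = sorted(grid)  # fixed key order keeps the generated names stable
    combos = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # Build a readable, unique name from the parameter values.
        params["name"] = "_".join(f"{k}-{params[k]}" for k in keys)
        combos.append(params)
    return combos


for params in expand_grid(GRID):
    print(params["name"])
```

With three parameters of two values each, this yields eight models, which is worth keeping in mind when you are paying for GPU time by the hour.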
TODO - readme about monitoring the pipeline using the included logs as well as tensorboard
This is a personal project at the moment. Documentation will be updated once it is in working condition, but please let me know if you have any ideas or needs!
- Snakemake pipeline framework complete with standard YAML format
- Training runs successfully on EC2
- Sampling runs successfully on EC2
- Enable restarting half-finished trainings during training rule if previous run is detected (add completion sentinel)
- Test and write up environment setup on Google Cloud
- Run pipeline successfully on Google Cloud
- Allow for multiple model types to be tested in a single run
- Develop reporting scripts to compile results for each model
- Develop reporting scripts to assemble all model results for each run
- Add a rule at the end of each run to archive models to shared storage (bucket somewhere)
- Implement cluster options and configuration to work with SGE in an HPC environment
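The completion-sentinel idea from the list above could look roughly like the sketch below. This is not code from the pipeline: the sentinel file name and the three states are assumptions made up for illustration. The idea is that training writes an empty marker file only once it finishes, so a directory that has files but no marker can be treated as a half-finished run worth resuming.

```python
from pathlib import Path

# Hypothetical marker file name; the pipeline does not define one yet.
SENTINEL_NAME = "training.done"


def training_state(model_dir):
    """Classify a model directory as 'complete', 'resume', or 'fresh'."""
    model_dir = Path(model_dir)
    if (model_dir / SENTINEL_NAME).exists():
        return "complete"  # sentinel present: training finished earlier
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return "resume"  # leftover files but no sentinel: half-finished run
    return "fresh"  # missing or empty directory: start from scratch


def mark_complete(model_dir):
    """Write the sentinel once training has finished successfully."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / SENTINEL_NAME).touch()
```

In a Snakemake rule, the sentinel could serve as the declared output file, so a rerun would skip models that already finished and only pick up the interrupted ones.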
- Tiny Shakespeare
- Hearthstone card names
- Soccer player names (SoccerWiki)
- Beer names (OBDB)
- Beer descriptions (OBDB)
- Inspirational quotes
- Christmas songs
- Corporate slogans
- Harry Potter
- LotR
- Hitchhiker's guide