/Data
- /features: feature tables.
- /Seattle: place the raw Yelp reviews, merged inspection instances, and inspection record data here.
/Data Exploration
- Link inspection record
- Data cleaning
- Exploratory analysis and visualization
/Feature Generation
- /Ngram: code and output
- /Topic Modeling:
  - code
  - visualization demo in Jupyter Notebook
  - mallet: to replicate the labeled-LDA result:
    1. Use the R code to generate rev_tm_violation_rating.txt.
    2. Run the two commands below in the terminal:
       bin/mallet import-file --input rev_tm_violation_rating.txt --output yelp-short.seq --stoplist-file yelp.stops --label-as-features --keep-sequence --line-regex '([^\t]+)\t([^\t]+)\t(.*)'
       bin/mallet run cc.mallet.topics.LabeledLDA --input yelp-short.seq --output-topic-keys yelp-llda.keys --output-doc-topics docsAsTopicsProbs.txt
    rev_tm_violation_rating.txt is the file produced in step 1. The MALLET run in step 2 writes the topic keys to yelp-llda.keys and the per-document topic proportions to docsAsTopicsProbs.txt.
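The --line-regex passed to import-file expects three tab-separated fields per line: a document name, a label field, and the review text. The repository generates this file with R; the Python sketch below only illustrates the expected layout and checks a line against the same pattern. The helper names, the sample row, and the choice to join multiple labels with spaces are assumptions made for illustration, not taken from the repository's code.

```python
import re

# Same pattern as the --line-regex flag: name <TAB> label <TAB> text.
LINE_REGEX = re.compile(r"([^\t]+)\t([^\t]+)\t(.*)")


def format_line(doc_id, labels, text):
    """Build one input line: doc_id <TAB> space-joined labels <TAB> text."""
    # Collapse tabs and newlines inside the text so the three-field layout holds.
    clean_text = " ".join(text.split())
    return f"{doc_id}\t{' '.join(labels)}\t{clean_text}"


def check_line(line):
    """Return the (name, label, text) fields, or raise if the line is malformed."""
    match = LINE_REGEX.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match the import regex: {line!r}")
    return match.groups()


# Hypothetical review row; the ID and label names are made up for illustration.
line = format_line("rev_001", ["violation", "rating_2"], "Food was cold.\tService slow.")
fields = check_line(line)
```

A line with an empty name or label field fails the check, which is a quick way to spot rows that import-file would not parse as intended.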
/Machine Learning Pipeline
- /config: machine learning model training configuration yaml files
- /mloutput: model evaluation output
- classifiers.py: model training
  - python3 classifiers.py <config file path>
- MagicLoop.ipynb: run classifiers.py in Jupyter Notebook for handy visualization and model comparison
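To train with several configurations in one go, the command above can be repeated for each file in /config. The following is a minimal sketch, assuming the configuration files end in .yaml or .yml and that classifiers.py is in the working directory; build_commands and run_all are hypothetical helpers, not part of the repository.

```python
import subprocess
from pathlib import Path


def build_commands(config_dir, script="classifiers.py"):
    """Return one `python3 classifiers.py <config>` command per YAML file."""
    configs = sorted(
        p for p in Path(config_dir).iterdir() if p.suffix in {".yaml", ".yml"}
    )
    return [["python3", script, str(p)] for p in configs]


def run_all(config_dir):
    """Run the training script once per config file, stopping on the first failure."""
    for cmd in build_commands(config_dir):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_all("config") from the pipeline directory would then train one model set per configuration file, in alphabetical order of file name.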
Ran Bi, Shambhavi Mohan, Minjia Zhu @Uchicago Harris, CAPP