All work done in the Data Science Pipeline course is collected here. Nearly all of these Jupyter notebook files require an additional download of a subset of the Yelp dataset. The dataset each notebook needs is indicated by the file's naming convention. Alternatively, HTML versions are provided for you.
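
If you download the data yourself, the following is a minimal sketch of one way to read a Yelp dataset file, assuming the file is in JSON Lines form (one JSON object per line). The helper name `load_json_lines` and the file name in the usage comment are illustrative assumptions, not names taken from the notebooks.

```python
import json
from pathlib import Path


def load_json_lines(path, limit=None):
    """Read a JSON Lines file (one JSON object per line) into a list of dicts.

    `limit` optionally caps the number of records, which is handy when
    only a subset of a large file is needed.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            # Stop once the requested number of records has been read.
            if limit is not None and len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
    return records


# Example usage (hypothetical file name; adjust to the file you downloaded):
# businesses = load_json_lines("yelp_academic_dataset_business.json", limit=1000)
# print(len(businesses))
```

Passing `limit` keeps memory use small while exploring; the resulting list of dicts can be handed to `pandas.DataFrame(...)` if a table is preferred.
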
Guide to what's in each file:

Data Cleaning and Management
- Dataset manipulation, combination, cleaning, etc.
- OpenRefine usage

Exploratory Data Analysis
- Histogram
- Boxplot
- Scatterplot
- Q-Q plot
- Star plot

Statistics
- Chi-square
- Correlation coefficients
- Association rules
- T-testing

Machine Learning
- Regression techniques with gradient boosting
- Classification with logistic regression

Feature Engineering
- Cross-validation with gradient boosted regression
- Gaussian random projection with standard scaling
- PCA

Sentiment Analysis and Visualization
- TextBlob
- Visualization of data with graphical tools
- Natural language processing

For the project, see https://github.com/DanielWang2029/Amazon-Price-Prediction