prediction model of hotel reservation dataset
-
from sebadefs import distribution_analysis
based on https://github.com/cokelaer/fitter -
sklearn.preprocessing.RobustScaler
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html
-
df = pd.get_dummies()
-
Under-sampling is a technique to balance uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class. NOTE : We should use sampling methods only upon the train dataset otherwise it would be inacurate.
-
SVM-SMOTE, an over-sampling technique, is used to investigate how well it handles the trade-off. SMOTE, its ancestor, is a popular over-sampling technique which balances class distribution by synthetically generating new minority class instances along directions from existing minority class instances towards their nearest neighbours.SVM-SMOTE focuses on generating new minority class instances near borderlines with SVM so as to help establish boundary between classes.
https://towardsdatascience.com/5-smote-techniques-for-oversampling-your-imbalance-data-b8155bdbe2b5
https://arxiv.org/pdf/1106.1813.pdf
Note: Models were tested first without Under-Sampling and Over-sampling, then best models were tested again with Under-Sampling and Over-sampling preprocessing.