Solution post for the Kaggle competition Two Sigma Connect: Rental Listing Inquiries (RentHop)
The basic features come from Branden Murray's post, with some further revisions.
- encoded features: some categorical features turn out to have a fairly high impact on the prediction, which motivates the encodings below.
  - encode some numerical features, such as `bedrooms` and `distance_city`, on `manager_id` by taking the mean of the feature grouped by the categorical feature.
  - inspired by this, one can also encode important numerical features, such as `price`, on other features.
  - encode the prediction target (i.e. `interest_level`) on `manager_id`; this requires cross validation, so that a row's own label does not leak into its encoding.
  - encode other numerical features (e.g. `price`) on categorical features (e.g. `manager_id`) conditioned on `interest_level`, using the group mean (other statistics could also be defined); this also requires cross validation.
- geographical features: local price fluctuation, from plantsgo
- image features: also see the magic feature
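
The group-mean and cross-validated encodings listed above can be sketched as follows. This is a minimal illustration, not the code used in the repository: the helper names `group_mean_encode` and `oof_target_encode`, the toy data, and the binary column `is_high` (standing in for one class of `interest_level`) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def group_mean_encode(df, key, value):
    """Mean of a numerical column within each category (no target involved)."""
    return df.groupby(key)[value].transform("mean")


def oof_target_encode(df, key, target, n_splits=5, seed=0):
    """Out-of-fold mean of `target` per `key` (hypothetical helper).

    Each row is encoded with statistics computed on the other folds only,
    so a row never sees its own label. Categories that do not occur in the
    fitting folds fall back to the mean of those folds.
    """
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(df):
        fit_part = df.iloc[fit_idx]
        means = fit_part.groupby(key)[target].mean()
        prior = fit_part[target].mean()
        values = df.iloc[enc_idx][key].map(means).fillna(prior)
        encoded.iloc[enc_idx] = values.to_numpy()
    return encoded


# Toy data: column names follow the competition, values are made up.
toy = pd.DataFrame({
    "manager_id": ["a", "a", "a", "b", "b", "c", "c", "c", "d", "a"],
    "bedrooms": [1, 2, 3, 1, 1, 2, 2, 4, 0, 2],
    "is_high": [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
})
toy["manager_mean_bedrooms"] = group_mean_encode(toy, "manager_id", "bedrooms")
toy["manager_high_rate_oof"] = oof_target_encode(toy, "manager_id", "is_high")
```

The first helper needs no cross validation because it never touches the label. The second one does, since encoding a row with a mean that includes its own label would leak the target into the feature. Encoding the test set (typically with means fitted on the full training set) is omitted here.
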
- Trained different machine learning models, including xgboost, lightgbm, nn, adaboost, gb, rf, et, lsvc, lr, knn
  - check out https://github.com/bolaik/RenthopApartmentInterestLevel/tree/master/my_final_version/classifiers_ipynb
- Trained level-2 models, including xgb, nn, knn, lr, lightgbm; the xgb prediction was submitted because it had the best CV score.
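
The two-level setup can be illustrated with a small stacking sketch. It is only an approximation of the idea under stated assumptions: the data is synthetic, scikit-learn models are swapped in for xgboost, lightgbm, and the neural network so the example stays self-contained, and log loss is assumed as the CV score used for comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic three-class problem standing in for low/medium/high interest.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=3, random_state=0,
)

# Level-1 models (a small stand-in for the longer list in the text).
level1 = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
}
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Level 1: out-of-fold class probabilities, one block of columns per model.
oof_blocks = [
    cross_val_predict(model, X, y, cv=folds, method="predict_proba")
    for model in level1.values()
]
meta_features = np.hstack(oof_blocks)  # shape: (n_samples, n_models * n_classes)

# Level 2: fit candidate meta-models on the stacked probabilities and
# compare them by cross-validated log loss.
level2 = {
    "lr": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=1),
}
cv_scores = {}
for name, model in level2.items():
    proba = cross_val_predict(
        model, meta_features, y, cv=folds, method="predict_proba"
    )
    cv_scores[name] = log_loss(y, proba)

# The level-2 model with the lowest CV log loss would be the one to submit.
best_name = min(cv_scores, key=cv_scores.get)
```

Using out-of-fold predictions as level-2 inputs matters for the same reason as in the encodings: a level-1 model that has seen a row's label would pass an overconfident prediction for that row to the next level.
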