4th place solution for the Foursquare Location Matching competition
The solution write-up is available here
The solution shared here contains @theo's pipeline. It is slightly more complicated than the submitted pipeline, and unfortunately runs out of memory during inference. The submitted model does not use features from `fe_theo.py`, and uses fewer candidates.
- Competition data is available on the competition page
- Resources & dictionaries are available on Kaggle
Notebooks contain our training pipeline; they need to be run in the following order:

1. `Cleaning.ipynb`: Processes the data
2. `Matching.ipynb`: Creates pairs
3. `Level 1.ipynb`: Creates features for the level 1 model
4. `Classification.ipynb`: Using `LEVEL = 1`, trains the prefiltering level 1 model
5. `Level 2.ipynb`: Creates features for the level 2 model
6. `Classification.ipynb`: Using `LEVEL = 2`, trains the final model
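The two-level scheme above can be sketched as follows. This is a hypothetical, self-contained illustration (toy data and scikit-learn models, not the repository's actual code): a cheap level 1 model prunes candidate pairs with a loose threshold, then a stronger level 2 model scores only the survivors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy candidate-pair features standing in for the real matching features.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Level 1: lightweight prefiltering model, thresholded loosely for high recall.
level1 = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
keep = level1.predict_proba(X)[:, 1] > 0.1  # keep most plausible true pairs

# Level 2: final model trained and evaluated only on surviving candidates.
level2 = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X[keep], y[keep])
preds = level2.predict_proba(X[keep])[:, 1] > 0.5

print(keep.sum(), preds.sum())
```

The loose level 1 threshold trades precision for recall, since any true pair dropped at level 1 can never be recovered by level 2.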
```
src
├── inference
│   └── main.py           # Boosting inference functions
├── model_zoo
│   ├── catboost.py       # To train a CatBoost model
│   ├── lgbm.py           # To train a LightGBM model
│   └── xgb.py            # To train an XGBoost model
├── training
│   └── main_boosting.py  # Boosting training functions
├── utils
│   ├── logger.py         # Logging utils
│   └── plot.py           # Plotting utils
├── cleaning.py           # Functions for data cleaning
├── dtypes.py             # Handling pandas dataframe dtypes
├── fe_theo.py            # Theo's features
├── fe.py                 # Youri & Vincent's features
├── matching.py           # Functions for pairs matching
├── params.py             # Parameters
├── pp.py                 # Post-processing utils
└── ressources.py         # Resources used for cleaning / matching
```
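Since the full pipeline can run out of memory during inference, handling pandas dtypes carefully (the job of `dtypes.py`) matters. The actual implementation may differ; the sketch below shows the general idea with a hypothetical `downcast_dtypes` helper that shrinks numeric columns to the smallest dtype that fits.

```python
import numpy as np
import pandas as pd

def downcast_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns in place to reduce memory usage."""
    for col in df.select_dtypes(include="float").columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    for col in df.select_dtypes(include="integer").columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    return df

df = pd.DataFrame({"a": np.arange(1000, dtype="int64"), "b": np.random.rand(1000)})
before = df.memory_usage(deep=True).sum()
df = downcast_dtypes(df)
after = df.memory_usage(deep=True).sum()
print(before, after)  # memory footprint drops after downcasting
```

Downcasting floats to `float32` loses some precision, which is usually acceptable for engineered features but should be applied before, not after, any distance computations that need full precision.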