rali-udem / arpi_air_canada

Code base to work with Air Canada's data for the Tenth Montreal Industrial Problem Solving Workshop

Tenth Montreal Problem Solving Workshop / Air Canada

This code base:

  • was used to prepare the data for the workshop
  • shows how to load the data and manipulate it with the pandas library
  • shows how to write a dummy clusterer and how to evaluate it
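
As a rough illustration of the loading step, the sketch below reads a pickle with pandas and summarizes whatever it contains. It is not taken from the repository: load_dataset and describe are hypothetical helper names, and the toy columns (record_id, text) are stand-ins rather than the real schema. The structure of the workshop pickle (aircan-data-split-clean.pkl, see Data preparation) is not described here, so the code inspects the loaded object instead of assuming one.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load the object stored in a pickle file written by pandas.

    Nothing is assumed about the result (a single DataFrame, a dict or a
    tuple of DataFrames, ...); inspect it after loading.
    """
    return pd.read_pickle(path)


def describe(obj):
    """Return one summary line per DataFrame found in `obj`."""
    if isinstance(obj, pd.DataFrame):
        return [f"DataFrame with {len(obj)} rows, columns: {list(obj.columns)}"]
    if isinstance(obj, dict):
        return [f"{key}: {line}" for key, value in obj.items() for line in describe(value)]
    if isinstance(obj, (list, tuple)):
        return [f"[{i}] {line}" for i, value in enumerate(obj) for line in describe(value)]
    return [type(obj).__name__]


if __name__ == "__main__":
    # Stand-in data: these column names are hypothetical, not the real schema.
    toy = pd.DataFrame({"record_id": [1, 2, 3], "text": ["a", "b", "c"]})
    with tempfile.TemporaryDirectory() as tmp:
        pkl = Path(tmp) / "toy.pkl"
        toy.to_pickle(pkl)
        for line in describe(load_dataset(pkl)):
            print(line)
```

To try it on the real data, replace the temporary file with the path to the provided pickle. Only load pickles from a source you trust, since unpickling can execute arbitrary code.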

Sample code

A sample clusterer is provided in the file sample_clusterer.py. Its sole purpose is to show how to load the data, manipulate it, cluster it, and evaluate your algorithm. Feel free to do whatever you like with it. The evaluation framework can also be modified!
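
To give an idea of what a dummy clusterer and its evaluation can look like, here is a minimal standalone sketch. It does not reproduce sample_clusterer.py or the repository's evaluation framework: dummy_cluster and purity are hypothetical names, the grouping rule (first token of the text) is deliberately naive, purity is just one possible metric, and the example texts and labels are invented.

```python
from collections import Counter, defaultdict


def dummy_cluster(texts):
    """Assign each text to a cluster keyed by its first lowercased token."""
    cluster_ids = {}
    assignments = []
    for text in texts:
        tokens = text.lower().split()
        key = tokens[0] if tokens else ""
        # A key seen for the first time gets the next free integer id.
        cluster_id = cluster_ids.setdefault(key, len(cluster_ids))
        assignments.append(cluster_id)
    return assignments


def purity(predicted, gold):
    """Fraction of items whose gold label is the majority label of their cluster."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        return 0.0
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(predicted, gold):
        by_cluster[cluster_id][label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(predicted)


if __name__ == "__main__":
    # Invented examples, for illustration only.
    texts = [
        "engine oil leak",
        "engine cover panel loose",
        "seat broken",
        "seat recline stuck",
        "lavatory light out",
    ]
    gold = ["engine", "structure", "cabin", "cabin", "cabin"]
    predicted = dummy_cluster(texts)
    print(predicted)                # [0, 0, 1, 1, 2]
    print(purity(predicted, gold))  # 0.8
```

Purity rewards many small clusters (one cluster per item scores 1.0), so a real evaluation would normally pair it with a metric that penalizes over-splitting.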

Similarly, for labeling, take a look at sample_labeler.py (and also at sample_clusterer.py, which shows how to use pandas).

Data preparation

The final dataset used in the workshop (aircan-data-split-clean.pkl and the equivalent aircan-data-split-clean.xlsx) was produced from the original data set, in Excel form, with the recipe below.

You don't have to rerun this: just use the provided pickle, unless you have difficulty loading it.

The data split was 82.5% train, 7.3% dev, 10.2% test.

export INPUT_DIR=/your/input/directory
export OUTPUT_DIR=/your/output/directory

python import_excel.py ${INPUT_DIR}/10-july-2020/IVADO\ Data\ July\ 10\ 2020.xlsx ${OUTPUT_DIR}/aircan-data-2018-raw.pkl
python import_excel.py ${INPUT_DIR}/15-june-2020/IVADO\ Data\ 15\ June\ 2020.xlsx ${OUTPUT_DIR}/aircan-data-2019-raw.pkl
python sanitize.py ${OUTPUT_DIR}/aircan-data-2018-raw.pkl ${OUTPUT_DIR}/aircan-data-2018-clean.pkl
python sanitize.py ${OUTPUT_DIR}/aircan-data-2019-raw.pkl ${OUTPUT_DIR}/aircan-data-2019-clean.pkl
python combine_datasets.py ${OUTPUT_DIR}/aircan-data-2018-clean.pkl ${OUTPUT_DIR}/aircan-data-2019-clean.pkl ${OUTPUT_DIR}/aircan-data-full-clean.pkl
python split_dataset.py ${OUTPUT_DIR}/aircan-data-full-clean.pkl ${OUTPUT_DIR}/aircan-data-split-clean.pkl
python dump_to_excel.py ${OUTPUT_DIR}/aircan-data-split-clean.pkl ${OUTPUT_DIR}/aircan-data-split-clean.xlsx
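
For anyone who prefers to drive the recipe from Python, the same seven steps could be sketched as follows. This is not part of the repository: build_pipeline and run_pipeline are hypothetical helpers, the script and file names are copied from the commands above, and the sketch assumes it is run from the directory containing those scripts. It uses the current interpreter (sys.executable) instead of a bare python.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_pipeline(input_dir, output_dir):
    """Return the preparation commands as argument lists, in execution order."""
    inp, out = Path(input_dir), Path(output_dir)
    py = sys.executable
    raw = {year: out / f"aircan-data-{year}-raw.pkl" for year in ("2018", "2019")}
    clean = {year: out / f"aircan-data-{year}-clean.pkl" for year in ("2018", "2019")}
    full = out / "aircan-data-full-clean.pkl"
    split = out / "aircan-data-split-clean.pkl"
    excel = out / "aircan-data-split-clean.xlsx"
    steps = [
        [py, "import_excel.py", inp / "10-july-2020" / "IVADO Data July 10 2020.xlsx", raw["2018"]],
        [py, "import_excel.py", inp / "15-june-2020" / "IVADO Data 15 June 2020.xlsx", raw["2019"]],
        [py, "sanitize.py", raw["2018"], clean["2018"]],
        [py, "sanitize.py", raw["2019"], clean["2019"]],
        [py, "combine_datasets.py", clean["2018"], clean["2019"], full],
        [py, "split_dataset.py", full, split],
        [py, "dump_to_excel.py", split, excel],
    ]
    # subprocess accepts Path objects, but plain strings are easier to print.
    return [[str(arg) for arg in step] for step in steps]


def run_pipeline(input_dir, output_dir, dry_run=True):
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for command in build_pipeline(input_dir, output_dir):
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline("/your/input/directory", "/your/output/directory", dry_run=True)
```

Passing argument lists to subprocess.run avoids the shell entirely, so the spaces in the Excel file names need no escaping.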

Fabrizio G

Contributors

rali-udem, vletard, dahrs, felipe-git2020, smolpixel, ilan006