
ml-data-clasifier's Introduction

Tabular Waste Data Classification Model POCs

  1. Fast AI multi-label classification POC - EN
  2. spaCy & scikit-learn multi-label classification POC - EN, NL
  3. spaCy Entity Analysis POC - EN, NL

See the ML model here

Uses Dutch Waste Data

  • Construction and demolition waste
  • Packaging waste and recyclables
  • Electronic and electrical equipment
  • Vehicle and oily wastes
  • Healthcare and related wastes

The data is omitted from the repo for data sensitivity reasons.

Training of the Fast AI Machine Learning classification model:

This project uses Fast AI tabular neural nets for the ML classification model:

  • Using neural nets for analyzing tabular data
  • Loading data into a pandas DataFrame
  • Using categorical variables for entity embeddings (more on embeddings)
  • Using continuous variables (numeric values) for neural nets
  • Using 3 data sets: train, validation and test data

*Unfortunately, for data privacy reasons, the required data is not included in this repo. Please reach out or message if you will
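
As a rough illustration of the categorical/continuous split listed above, the sketch below loads a few made-up rows into a pandas DataFrame and converts the categorical column to integer codes, the form an embedding layer would index into. Only pureOrMixed is a field name from this project; wasteCategory, weightKg, the helper encode_categoricals and all values are hypothetical. It is not the project's actual training code.

```python
import pandas as pd

# Hypothetical rows; the real Dutch waste data is not included in the repo.
df = pd.DataFrame({
    "wasteCategory": ["construction", "packaging", "construction", "healthcare"],
    "pureOrMixed": [1, 0, 1, 1],           # already encoded as 0/1
    "weightKg": [1200.0, 35.5, 800.0, 12.0],
})

cat_names = ["wasteCategory"]               # categorical -> entity embeddings
cont_names = ["pureOrMixed", "weightKg"]    # continuous -> passed in as numbers


def encode_categoricals(frame, columns):
    """Return a copy with each listed column replaced by integer codes,
    plus the vocabulary needed to map the codes back to labels."""
    encoded = frame.copy()
    vocab = {}
    for col in columns:
        as_cat = encoded[col].astype("category")
        vocab[col] = list(as_cat.cat.categories)
        encoded[col] = as_cat.cat.codes
    return encoded, vocab


encoded_df, vocab = encode_categoricals(df, cat_names)
```

Each distinct category gets one integer code, so an embedding table for a column needs one row per entry in its vocabulary.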

Treating The Data:

1. Translation services

  • Google Translate API and service account
  • A client was set up to provide the translations from nl to en
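
A minimal sketch of what such a translation client could look like, assuming the google-cloud-translate package and its translate_v2 client. The helper names make_client and translate_nl_to_en and the key file path are hypothetical, and the repo's actual setup may differ.

```python
def make_client(service_account_json):
    """Build a Google Translate client from a service account key file."""
    # Imported lazily so translate_nl_to_en can be used with any object
    # exposing a compatible translate() method (e.g. a stub in tests).
    from google.cloud import translate_v2 as translate
    return translate.Client.from_service_account_json(service_account_json)


def translate_nl_to_en(client, texts):
    """Translate a list of Dutch strings to English, preserving order."""
    if not texts:
        return []
    results = client.translate(texts, source_language="nl", target_language="en")
    return [result["translatedText"] for result in results]
```

Usage would be along the lines of translate_nl_to_en(make_client("service-account.json"), descriptions), where the key file path and descriptions are placeholders.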

2. Augmenting data

  • Treating Boolean-like field value overrides - fields with 2 string options become integers 0 and 1
  • Fields such as pureOrMixed, with string values pure and mixed, become integers 1 or 0, to be set later as continuous variables in the tabular learner
  • Prefilling fields where possible - for example, the waste description field is prefilled with euralCodeDescription when underdefined
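
The two augmentation steps above might be sketched in pandas as follows. The column name wasteDescription, the choice of pure as 1 and mixed as 0, and the reading of "underdefined" as missing or blank are assumptions for illustration; the sample rows are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the omitted waste data.
df = pd.DataFrame({
    "pureOrMixed": ["pure", "mixed", "pure"],
    "wasteDescription": ["bricks and rubble", None, ""],
    "euralCodeDescription": ["concrete", "mixed packaging", "waste oil"],
})

# Two-option string field -> integers (which label maps to 1 is an assumption).
df["pureOrMixed"] = df["pureOrMixed"].map({"pure": 1, "mixed": 0})

# A description counts as underdefined here when it is missing or blank.
underdefined = df["wasteDescription"].isna() | (
    df["wasteDescription"].str.strip() == ""
)

# Prefill underdefined descriptions from the EURAL code description.
df.loc[underdefined, "wasteDescription"] = df.loc[
    underdefined, "euralCodeDescription"
]
```

Rows that already carry a description are left untouched, so the prefill only adds information and never overwrites it.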

3. Creating 3 sets of data: train, validation and test data

  • Loaded into a pandas DataFrame
  • For training the ML model - uses train and validation data with rich fields
  • For testing the ML model - uses test data with missing fields
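
One possible way to produce the three sets is sketched below. It assumes that "rich" rows are those with no missing values and that any row with a missing field goes to the test set; the source does not spell out how the split was made. The helper split_dataset, the 20% validation fraction, the seed and the sample rows are all hypothetical.

```python
import pandas as pd


def split_dataset(df, valid_frac=0.2, seed=42):
    """Split into (train, valid, test): complete rows are divided between
    train and validation; rows with any missing field form the test set."""
    has_missing = df.isna().any(axis=1)
    test = df[has_missing]
    rich = df[~has_missing]
    valid = rich.sample(frac=valid_frac, random_state=seed)
    train = rich.drop(valid.index)
    return train, valid, test


# Hypothetical data: 13 rows, 3 of which have a missing field.
descriptions = [f"item {i}" for i in range(13)]
descriptions[3] = descriptions[8] = None
flags = [float(i % 2) for i in range(13)]
flags[11] = None

df = pd.DataFrame({"wasteDescription": descriptions, "pureOrMixed": flags})
train, valid, test = split_dataset(df)
```

Fixing the seed keeps the train/validation split reproducible between runs.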
