
OxML_ESG_BL_NC

Competition Winner!!

Certificate of Achievement here

[Link to Kaggle Competition](https://www.kaggle.com/competitions/oxml2023mlcases-esg-classifier)

Kaggle Competition Details:

As a Data Scientist in a rating agency, your goal is to build an ESG document classifier that takes a document as input and classifies each page as E, S, or G related. The business wants your approach to beat an existing baseline that achieves a 90% F-score, with a strong 95% on environmental content.

Evaluation

Metric

The evaluation metric for this competition is Mean F1-Score. The F1 score, commonly used in information retrieval, measures accuracy using the statistics precision and recall. The F1 metric weights recall and precision equally, and a good retrieval algorithm will maximize both precision and recall simultaneously. Thus, moderately good performance on both will be favored over extremely good performance on one and poor performance on the other.

Submission Format

For every file x page in the dataset, submission files should contain two columns: id and class. The file should contain a header and have the following format:

    id,class
    report_123.pdf.4,social

Citation

Khemon. (2023). OxML 2023 | ML Cases | ESG Document Classifier. Kaggle. https://kaggle.com/competitions/oxml2023mlcases-esg-classifier
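
To make the Mean F1-Score metric concrete, here is a minimal sketch that computes F1 per class and then averages across classes. It assumes macro averaging, which the competition text does not spell out, and the label names `environmental` and `governance` are assumptions (only `social` appears in the submission example). The toy labels are illustrative, not competition data.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted (macro) mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example with assumed label names
y_true = ["environmental", "social", "governance", "social"]
y_pred = ["environmental", "environmental", "governance", "social"]
print(f1_per_class(y_true, y_pred))
print(round(mean_f1(y_true, y_pred), 4))
```

Reporting the per-class scores alongside the mean makes it easy to compare the environmental score against the 95% target separately from the overall 90% baseline.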

Approach and Plan

Here is some information about the format of the module and the schedule.

Format: We will be using Kaggle as the platform to host the challenge. You are encouraged to team up in groups of 3 people. Working in small teams has several benefits:

  • You will likely be more engaged, and therefore take more out of the program.
  • You will learn faster by leveraging others' experience.
  • You will build up your network.

If you still want to work solo, that is possible.

Case selection: You will choose the case after the Case presentations. A group can only work on a single ML case at a time.

Deliverable: Before June 21st (end of the day), you will need to provide:

  • A submission file that includes model predictions
  • The code for building and training your model
  • A small presentation that explains your approach to solving the problem

There is a ranking based on model performance, but clarity of code and presentation matters greatly for being among the award-winning teams. The best projects will be showcased through the OxML platform.

Communication & Support:

  • Discussions between speakers and candidates will happen on the Slack workspace.
  • We highly value collaborative learning, and encourage groups to support each other through a dedicated "Support" Slack channel.
  • Speakers will only answer the most relevant/common questions that benefit every group.

Schedule:

  • [LIVE] 30/05 (13:00 UK time) - 1h: Case presentation
  • [LIVE] 31/05 (13:00 UK time) - 1h: Case presentation
  • 31/05-02/06 - 1h: Case and group selection
  • 02/06-21/06: Work on the Case / Get support through Slack from other groups and speakers
  • 21/06: Submit your Case for final evaluation
  • [LIVE] 30/06 (13:00 UK time) - 1h: Speakers share the final ranking along with their feedback on what they have seen from groups.

Objective: Aim for accuracy in the high 80% range on the classification task. Once that is reached, move on to objective 2, the table detector project.

Prediction:

  1. One model that predicts the class, with high accuracy on submission.csv
  2. **Extra**: a probability expressing the strength of the prediction for the class
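
For the submission.csv in item 1, the following is a minimal sketch of producing the two-column `id,class` format from the competition description. The function name `format_submission` and the `(file_name, page, label)` tuple layout are hypothetical choices for illustration; the `file.page` id pattern follows the `report_123.pdf.4` example.

```python
import csv
import io


def format_submission(predictions):
    """Build the submission CSV text from (file_name, page, label) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "class"])  # header required by the competition
    for file_name, page, label in predictions:
        # id is the file name followed by the page number, e.g. report_123.pdf.4
        writer.writerow([f"{file_name}.{page}", label])
    return buffer.getvalue()


def write_submission(predictions, path="submission.csv"):
    """Write the formatted submission to disk."""
    with open(path, "w", newline="") as handle:
        handle.write(format_submission(predictions))


example = format_submission([("report_123.pdf", 4, "social")])
print(example)
```

The extra probability from item 2 is deliberately left out of the file here, since the stated format has only two columns; it could be kept in a separate table keyed by the same id.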

Ideas: LDA topic modeling
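
As a rough illustration of the LDA idea, here is a teaching-scale sketch of LDA fitted by collapsed Gibbs sampling in plain Python. The source does not say which implementation was used or planned, so the function `lda_gibbs`, its hyperparameters, and the toy tokenized pages are all assumptions; a real run would more likely use an established library on the actual page text.

```python
import random


def lda_gibbs(docs, n_topics=2, n_iters=100, alpha=0.1, beta=0.01, n_top=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (theta, top_words): theta[d][k] is the estimated share of
    topic k in document d, top_words[k] lists the most frequent words
    assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic per token
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    theta = [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -topic_word[k][v])[:n_top]]
        for k in range(n_topics)
    ]
    return theta, top_words


# Hypothetical tokenized pages, for illustration only
pages = [
    "carbon emissions climate energy emissions".split(),
    "board audit directors board compliance".split(),
    "climate energy carbon water".split(),
    "audit compliance directors risk".split(),
]
theta, top_words = lda_gibbs(pages)
print(top_words)
```

The per-page topic shares in `theta` could then serve as features for a classifier, or simply be inspected to see whether topics line up with the E, S, and G classes.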

UMAP finding: the environmental and social classes appear closely correlated, so prediction models may confuse these two classes.
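
One way to check whether a trained model actually confuses these two classes is to count how often pages from one are predicted as the other. The sketch below is an assumption about how that check could look, not something described in the source; the function names and label strings are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))


def pair_confusion_rate(y_true, y_pred, a, b):
    """Share of pages truly in class a or b that were predicted as the other one."""
    counts = confusion_counts(y_true, y_pred)
    swapped = counts[(a, b)] + counts[(b, a)]
    total = sum(n for (truth, _), n in counts.items() if truth in (a, b))
    return swapped / total if total else 0.0


# Toy labels, for illustration only
truth = ["environmental", "environmental", "social", "social", "governance"]
preds = ["environmental", "social", "environmental", "social", "governance"]
rate = pair_confusion_rate(truth, preds, "environmental", "social")
print(rate)
```

A high rate for the environmental/social pair relative to the other pairs would support the UMAP observation.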

Considerations: how to predict?

  • whole page
  • paragraph
  • sentence
  • word

Do we predict a probability alongside the classification, to express the strength of the prediction for the class? Example: Class Prediction: Governance, Probability: 78% strength to class
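
If predictions are made at a finer granularity than the page, they still have to be rolled up to one class per page. Here is a minimal sketch, assuming some model already yields a class-probability dict per sentence; it averages those probabilities and reports the winning class with its mean probability. The function name `aggregate_page`, the averaging rule, and the label names are all assumptions.

```python
def aggregate_page(sentence_probs):
    """Combine per-sentence class probabilities into one page-level prediction.

    sentence_probs: non-empty list of dicts mapping class label -> probability.
    Returns (label, probability), where probability is the mean over sentences.
    """
    if not sentence_probs:
        raise ValueError("a page needs at least one sentence prediction")
    classes = sentence_probs[0].keys()
    mean = {
        label: sum(probs[label] for probs in sentence_probs) / len(sentence_probs)
        for label in classes
    }
    best = max(mean, key=mean.get)
    return best, mean[best]


# Hypothetical sentence-level outputs for one page
page = [
    {"environmental": 0.1, "social": 0.1, "governance": 0.8},
    {"environmental": 0.2, "social": 0.2, "governance": 0.6},
]
label, prob = aggregate_page(page)
print(f"Class Prediction: {label.capitalize()}, Probability: {prob:.0%}")
```

Averaging is only one option; majority vote or length-weighted averaging are alternatives worth comparing on validation data.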
