
efemeryds / offensive-language-detection


A basic approach to offensive language detection, with CheckList tests

Python 0.36% Jupyter Notebook 99.38% PowerShell 0.22% Batchfile 0.04%

offensive-language-detection's People

Contributors

efemeryds, melisnv, nikita29112


Forkers

melisnv

offensive-language-detection's Issues

BONUS

Develop two new diagnostic tests (you can use CheckList): describe what they test, explain why
they are relevant, and implement them. Run the tests and describe your observations. Provide
examples of difficult cases, that is, cases where the model fails to assign the correct label.
Discuss potential sources of error and propose improvements to the model.
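
As an illustration of what such a diagnostic test could look like, here is a minimal sketch of an invariance-style test written in plain Python rather than with the CheckList library. Everything in it is an assumption made for the example: the `OFF`/`NOT` label strings, the `leetspeak` perturbation, and the toy `keyword_predict` function, which only stands in for a real model's prediction function.

```python
from typing import Callable, List, Tuple

# A predictor maps a list of texts to a list of labels ("OFF" or "NOT").
Predict = Callable[[List[str]], List[str]]


def leetspeak(text: str) -> str:
    """Hypothetical perturbation: obfuscate spelling by replacing 'i' with '1'."""
    return text.replace("i", "1")


def invariance_test(
    predict: Predict, texts: List[str], perturb: Callable[[str], str]
) -> List[Tuple[str, str, str, str]]:
    """Return every case where the label changes after the perturbation.

    Each failure is (original text, perturbed text, original label, new label).
    """
    perturbed_texts = [perturb(t) for t in texts]
    before = predict(texts)
    after = predict(perturbed_texts)
    return [
        (t, pt, b, a)
        for t, pt, b, a in zip(texts, perturbed_texts, before, after)
        if b != a
    ]


def keyword_predict(texts: List[str]) -> List[str]:
    """Toy stand-in for a real model: flags a text if it contains one keyword."""
    return ["OFF" if "idiot" in t.lower() else "NOT" for t in texts]


examples = ["You are an idiot", "Have a nice day"]
failures = invariance_test(keyword_predict, examples, leetspeak)
for original, perturbed, before, after in failures:
    print(f"{original!r} -> {perturbed!r}: {before} -> {after}")
```

With the toy predictor, the obfuscated insult is no longer flagged, which is the kind of difficult case the task asks you to report. To test a real model, pass its prediction function in place of `keyword_predict`.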

PART A - 3. Classification by fine-tuning BERT

Run your notebook on Colab, which offers limited free access to GPUs.
You need to enable GPUs for the notebook:

● navigate to Edit → Notebook Settings
● select GPU from the Hardware Accelerator drop-down

➢ Install the simpletransformers library: !pip install simpletransformers
(you will have to restart your runtime after the installation)

➢ Follow the documentation to load a pre-trained BERT model:
ClassificationModel('bert', 'bert-base-cased')

➢ Fine-tune the model on the OLIDv1 training set and make predictions on the OLIDv1 test
set (you can use the default hyperparameters). Do not forget to save your model, so that
you do not need to fine-tune the model each time you make predictions.
If you cannot fine-tune your own model, contact us to receive a checkpoint.
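
The steps above could be sketched roughly as follows. This is not a reference solution: the `OFF`/`NOT` label strings, the helper names, and the `outputs/` directory are assumptions for illustration, and the heavy imports sit inside the functions so that the file can be loaded without a GPU runtime.

```python
# Assumed label strings for the binary task; adjust to the actual data files.
LABEL_TO_ID = {"NOT": 0, "OFF": 1}


def encode_labels(labels):
    """Map string labels to the integer ids that the classifier expects."""
    return [LABEL_TO_ID[label] for label in labels]


def fine_tune(train_texts, train_labels, output_dir="outputs/"):
    """Fine-tune bert-base-cased with default hyperparameters and save it."""
    import pandas as pd
    from simpletransformers.classification import ClassificationModel

    # simpletransformers expects a DataFrame with 'text' and 'labels' columns.
    train_df = pd.DataFrame(
        {"text": list(train_texts), "labels": encode_labels(train_labels)}
    )
    model = ClassificationModel(
        "bert",
        "bert-base-cased",
        args={"output_dir": output_dir, "overwrite_output_dir": True},
        use_cuda=True,  # requires the GPU runtime enabled above
    )
    model.train_model(train_df)  # the trained model is written to output_dir
    return model


def load_and_predict(texts, model_dir="outputs/"):
    """Reload the saved model and predict integer labels, without retraining."""
    from simpletransformers.classification import ClassificationModel

    model = ClassificationModel("bert", model_dir, use_cuda=True)
    predictions, raw_outputs = model.predict(list(texts))
    return list(predictions)
```

Keeping training and prediction in separate functions means that `load_and_predict` can be rerun against the saved directory after a runtime restart.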
a. Provide the results in terms of precision, recall and F1-score on the test set and provide
a confusion matrix (2 points)

Compare your results to the baselines and to the results described in the paper in 2–4
sentences.
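
For the evaluation step, the following is a minimal sketch of how precision, recall, F1-score, and a confusion matrix can be computed by hand. The toy `gold` and `pred` lists and the `OFF`/`NOT` label strings are made up for the example and are not results from the dataset.

```python
from collections import Counter


def confusion_matrix(gold, pred, labels):
    """Rows are gold labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in labels] for g in labels]


def precision_recall_f1(gold, pred, positive):
    """Precision, recall and F1 for one class, rounded to two decimals."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Toy data for illustration only.
gold = ["OFF", "OFF", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "OFF"]

print(confusion_matrix(gold, pred, ["NOT", "OFF"]))  # [[2, 1], [1, 1]]
print(precision_recall_f1(gold, pred, "OFF"))        # (0.5, 0.5, 0.5)
print(precision_recall_f1(gold, pred, "NOT"))        # (0.67, 0.67, 0.67)
```

If scikit-learn is available, `sklearn.metrics.classification_report` and `sklearn.metrics.confusion_matrix` report the same quantities.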

PART A - 2. Baselines

Calculate two baselines and evaluate their performance on the test set (olid-test.csv):

● The first baseline is a random baseline that randomly assigns one of the 2 classification
labels.

● The second baseline is a majority baseline that always assigns the majority class.

Calculate the results on the test set and fill them into the two tables below. Round the results to
two decimal places.
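
The two baselines can be sketched in a few lines of plain Python. The `OFF`/`NOT` label strings and the toy training labels are assumptions for illustration. The sketch takes the majority class from the training labels, which is one common reading of the task; the assignment text does not say which split to use.

```python
import random
from collections import Counter


def random_baseline(n, labels, seed=0):
    """Assign one of the labels uniformly at random to each of n instances."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    return [rng.choice(labels) for _ in range(n)]


def majority_baseline(train_labels, n):
    """Assign the most frequent training label to each of n instances."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return [majority_label] * n


# Toy labels for illustration; in practice these come from the training data.
toy_train_labels = ["NOT", "NOT", "OFF", "NOT", "OFF"]

random_predictions = random_baseline(4, ["NOT", "OFF"], seed=42)
majority_predictions = majority_baseline(toy_train_labels, 4)
print(random_predictions)
print(majority_predictions)
```

Both functions return a list of predicted labels of length `n`, which can then be scored against the gold labels of olid-test.csv.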
