efemeryds / offensive-language-detection
Basic approach to the offensive language detection and checklist tests
Develop 2 new diagnostic tests (you can use checklist): describe what they test, explain why
they are relevant and implement them. Run the tests and describe your observations. Provide
examples of difficult cases, that is, when the model fails to assign the correct label. Discuss
potential sources of errors and propose improvements to the model.
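To make the idea of a diagnostic test concrete, here is a minimal sketch of two CheckList-style test types written in plain Python rather than with the checklist library itself: an invariance test (swapping a name should not change the label) and a minimum functionality test (simple cases with a known expected label). The helper names, the example sentences, the `NOT`/`OFF` label strings and the `toy_predict` stand-in are all assumptions made for illustration; in practice `predict` would wrap the fine-tuned model.

```python
def invariance_test(predict, template, fillers):
    """Fill `template` with each filler and return the filled texts whose
    prediction differs from the prediction for the first filler."""
    texts = [template.format(name=filler) for filler in fillers]
    predictions = predict(texts)
    reference = predictions[0]
    return [(text, pred) for text, pred in zip(texts, predictions) if pred != reference]


def minimum_functionality_test(predict, cases):
    """`cases` is a list of (text, expected_label) pairs.
    Return (text, expected, predicted) for every mismatch."""
    texts = [text for text, _ in cases]
    predictions = predict(texts)
    return [
        (text, expected, pred)
        for (text, expected), pred in zip(cases, predictions)
        if pred != expected
    ]


def toy_predict(texts):
    """Hypothetical stand-in for the real model: flags a single keyword."""
    return ["OFF" if "idiot" in text.lower() else "NOT" for text in texts]


# Invariance: the name in a neutral sentence should not affect the label.
inv_failures = invariance_test(
    toy_predict, "{name} is my neighbour.", ["Anna", "Mohammed", "Li"]
)

# Minimum functionality: negation should flip an insult into a non-insult.
mft_failures = minimum_functionality_test(
    toy_predict,
    [("You are an idiot.", "OFF"), ("You are not an idiot.", "NOT")],
)

print("invariance failures:", inv_failures)
print("functionality failures:", mft_failures)
```

The keyword-based stand-in passes the invariance test but fails the negated case, which is the kind of difficult example the task asks you to report and discuss.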
Load the training set (olid-train.csv) and analyze the number of instances for each of the two classification labels.
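A minimal sketch of the label analysis, using only the standard library. The column name `label`, the label values `NOT` and `OFF`, and the three sample rows are assumptions for illustration, not taken from olid-train.csv; adjust them to the actual file header.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="label"):
    """Count how many rows carry each value of `label_column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for olid-train.csv (made-up rows, not real data).
sample = io.StringIO(
    "id,text,label\n"
    "1,have a nice day,NOT\n"
    "2,you are an idiot,OFF\n"
    "3,see you soon,NOT\n"
)

counts = count_labels(sample)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} instances ({n / total:.1%})")
```

For the real file, the same function could be called as `count_labels(open("olid-train.csv", newline="", encoding="utf-8"))`, assuming the file is comma-separated and has a header row.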
Run your notebook on Colab, which offers (limited) free access to GPUs.
You need to enable GPUs for the notebook:
● navigate to Edit → Notebook Settings
● select GPU from the Hardware Accelerator drop-down
➢ Install the simpletransformers library: !pip install simpletransformers
(you will have to restart your runtime after the installation)
➢ Follow the documentation to load a pre-trained BERT model:
ClassificationModel('bert', 'bert-base-cased')
➢ Fine-tune the model on the OLIDv1 training set and make predictions on the OLIDv1 test
set (you can use the default hyperparameters). Do not forget to save your model, so that
you do not need to fine-tune the model each time you make predictions.
If you cannot fine-tune your own model, contact us to receive a checkpoint.
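The steps above could be sketched roughly as follows with simpletransformers. This is an outline under assumptions, not a reference solution: the function names, the `outputs` directory, and the `NOT`/`OFF` to 0/1 mapping are hypothetical, and the training call needs a GPU runtime with simpletransformers and pandas installed. simpletransformers expects a DataFrame with `text` and `labels` columns and integer labels.

```python
def encode_labels(labels, mapping=None):
    """Map string labels to the integer ids simpletransformers expects.
    The default NOT/OFF mapping is an assumption about the data."""
    mapping = mapping or {"NOT": 0, "OFF": 1}
    return [mapping[label] for label in labels]


def fine_tune_and_predict(train_texts, train_labels, test_texts, output_dir="outputs"):
    """Fine-tune bert-base-cased with default hyperparameters and predict."""
    # Imported here so the helpers above work without the heavy dependencies.
    import pandas as pd
    from simpletransformers.classification import ClassificationArgs, ClassificationModel

    args = ClassificationArgs(output_dir=output_dir, overwrite_output_dir=True)
    model = ClassificationModel("bert", "bert-base-cased", args=args)

    train_df = pd.DataFrame({"text": list(train_texts), "labels": list(train_labels)})
    model.train_model(train_df)  # the fine-tuned model is written to output_dir

    predictions, _raw_outputs = model.predict(list(test_texts))
    return list(predictions)


def load_saved_model(output_dir="outputs"):
    """Reload the saved model so it does not have to be fine-tuned again."""
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel("bert", output_dir)
```

Reloading from the output directory is what makes the saved checkpoint reusable across notebook sessions; on Colab the directory would have to be copied somewhere persistent, since the runtime storage is temporary.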
a. Provide the results in terms of precision, recall and F1-score on the test set and provide
a confusion matrix (2 points)
Compare your results to the baselines and to the results described in the paper in 2–4
sentences.
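For reference, precision, recall, F1-score and the confusion matrix can be computed by hand as in the sketch below. The function names, the `NOT`/`OFF` label strings and the five example predictions are made up for illustration; a library such as scikit-learn provides equivalent functions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are gold labels, columns are predicted labels, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for gold, pred in zip(y_true, y_pred):
        matrix[index[gold]][index[pred]] += 1
    return matrix


def per_class_scores(matrix, labels):
    """Precision, recall and F1 per class, rounded to two decimals."""
    scores = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        gold = sum(matrix[i])                      # row sum
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / gold if gold else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        scores[label] = {
            "precision": round(precision, 2),
            "recall": round(recall, 2),
            "f1": round(f1, 2),
        }
    return scores


labels = ["NOT", "OFF"]
y_true = ["NOT", "NOT", "NOT", "OFF", "OFF"]  # made-up gold labels
y_pred = ["NOT", "NOT", "OFF", "OFF", "NOT"]  # made-up predictions

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)
macro_f1 = round(sum(s["f1"] for s in scores.values()) / len(scores), 2)

print(matrix)
print(scores)
print("macro F1:", macro_f1)
```

When a class is never predicted, its precision is undefined; this sketch reports it as 0.0, which matters for the majority baseline.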
Calculate two baselines and evaluate their performance on the test set (olid-test.csv):
● The first baseline is a random baseline that randomly assigns one of the 2 classification
labels.
● The second baseline is a majority baseline that always assigns the majority class.
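The two baselines can be sketched in a few lines. The function names, the fixed seed and the `NOT`/`OFF` label strings are assumptions for illustration. Here the majority class is taken from the training labels, which is one reasonable reading of the task; the random baseline is seeded so that the reported numbers are reproducible.

```python
import random
from collections import Counter


def random_baseline(n_instances, labels, seed=0):
    """Assign one of `labels` uniformly at random to each instance."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n_instances)]


def majority_baseline(train_labels, n_instances):
    """Always predict the most frequent label seen in the training data."""
    majority_label, _count = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_instances


train_labels = ["NOT", "NOT", "OFF"]  # made-up stand-in for the training labels
random_predictions = random_baseline(4, ["NOT", "OFF"])
majority_predictions = majority_baseline(train_labels, 4)

print(random_predictions)
print(majority_predictions)
```

Both prediction lists can then be scored against the gold labels of olid-test.csv with the same metrics used for the fine-tuned model.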
Calculate the results on the test set and fill them into the two tables below. Round the results to
two decimals.