clidit's Introduction
Non-clickbait:

![nonclickbait](/images/wc5.png)

## Models & Interpretations

#### Baseline Dummy Classifier

![](/images/dc_cm.png)

#### Naive Bayes

![](/images/nb_cm.png)

![](/images/CB_coefs_nb.png)

#### Logistic Regression

![](/images/lr_cm.png)

![](/images/noncb_lr_coeff.png)

## Conclusion

I was able to use machine learning algorithms such as Naive Bayes, Logistic Regression and SVM to accurately classify clickbait versus non-clickbait headlines. The results were quite good: both accuracy and recall scores fell within the 90-93% range. I slightly prioritized recall because I figured it would be more valuable to minimize false negatives (classifying clickbait as non-clickbait), and by that measure Naive Bayes performed the best.

Because machine learning worked so well here, there is a real-world use case for deploying a machine learning solution that filters out or flags clickbait before readers have to read and judge the headline for themselves!

By analyzing the coefficients of the best-performing models, I was able to get some insight into how the models decide whether a headline is clickbait or not.

## Next Steps/Future Improvement Ideas

- Explore deep NLP and neural network models to see if they make a stronger classifier
- Analyze topics and themes with LDA
- Possibly use LDA topics for modeling
- Test the model on a new dataset

## Repository Navigation

1. Data folder: all relevant CSV files
2. Working Notebooks folder: scraping & API requests, cleaning & EDA, modeling, and front end / Streamlit code
3. Final_mvp.ipynb: final notebook showcasing the end-to-end project
4. README: end-to-end project report, reproduction instructions, repository navigation, link to presentation, and sources
5. app.py, pkls, setup.sh, requirements.txt, procfile: app files

## Reproduction Instructions

1. First, start with the cleaning&eda notebook under the 'workingnotebooks' folder. It compiles all relevant CSVs (found in the data folder) and sets up the data for the project. The feature engineering code and the processing for EDA are also located here.
2. Second, run the modeling notebook (in the working notebooks folder). Its code further processes the data for modeling and then creates and evaluates the classifiers.
3. Third, the final_mvp notebook gives an overview of the whole process. Use it for a clear picture of the end-to-end workflow, but note that some areas, such as data cleaning, are only explained in markdown, so refer to the working notebooks for full details.

## Citations

Kaggle dataset: https://www.kaggle.com/amananandrai/clickbait-dataset

Streamlit reference: https://docs.streamlit.io/en/stable/api.html

# CliDET
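The Conclusion's reason for favoring recall (fewer clickbait headlines slipping through as non-clickbait) can be made concrete with a small worked example. The sketch below is a minimal, standard-library-only illustration and is not code from the project notebooks: the helper names (`confusion_counts`, `accuracy`, `recall`), the label encoding (1 = clickbait, 0 = non-clickbait) and the toy predictions are all hypothetical. It shows two models with identical accuracy where the one with fewer false negatives has the higher recall.

```python
CLICKBAIT = 1  # hypothetical label encoding: 1 = clickbait, 0 = non-clickbait


def confusion_counts(y_true, y_pred, positive=CLICKBAIT):
    """Tally TP, FP, FN and TN, treating `positive` (clickbait) as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            # A clickbait headline predicted as non-clickbait is a false negative.
            counts["tp" if predicted == positive else "fn"] += 1
        else:
            # A non-clickbait headline flagged as clickbait is a false positive.
            counts["fp" if predicted == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of all headlines classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


def recall(counts):
    """Share of actual clickbait headlines the model caught: TP / (TP + FN)."""
    actual_positives = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_positives if actual_positives else 0.0


# Toy labels (hypothetical): five clickbait and five non-clickbait headlines.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # 1 false negative, 1 false positive
model_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 2 false negatives, 0 false positives

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    c = confusion_counts(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(c):.2f}, recall={recall(c):.2f}")
```

Both toy models score 0.80 accuracy, but `model_a` reaches 0.80 recall versus 0.60 for `model_b`, so a recall-first comparison would prefer `model_a`. In a scikit-learn workflow, `sklearn.metrics.accuracy_score` and `sklearn.metrics.recall_score` compute the same quantities.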