
nlp-assignment's Introduction

Authorship Attribution with Machine Learning 📚🤖

Welcome to the GitHub repository for the Authorship Attribution project, where machine learning meets linguistic analysis! This project is all about classifying authors of texts using their unique writing styles. The dataset comprises texts from six different authors, making it a supervised learning challenge with a twist of linguistics.

Project Overview 🌟

Authorship Attribution is the process of identifying the author of a text based on their unique writing style or 'fingerprint'. This project is split into two main parts:

  1. Data Cleaning and Feature Engineering: the text data is prepared and features that capture each author's style are extracted.
  2. Model Training and Evaluation: several machine learning models are trained and evaluated to find the one that best identifies the authors.
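As a rough illustration of the first part, the sketch below cleans a text and computes a few simple stylometric features. The function names and the specific features (average word length, average sentence length, type-token ratio, punctuation rate) are assumptions chosen for illustration; the notebook may use different ones.

```python
import re
import string


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative style features (hypothetical feature set)."""
    text = clean_text(text)
    # Words: lowercase alphabetic tokens, apostrophes kept for contractions.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Sentences: naive split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty input
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words,
        "punct_per_word": sum(ch in string.punctuation for ch in text) / n_words,
    }


sample = "It was the best of times.  It was the worst of times!"
features = stylometric_features(sample)
print(features)
```

Features like these give one numeric row per text, which can then be passed to any scikit-learn classifier in the second part.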

Repository Contents 📁

Part1_Data_Cleaning_and_Feature_Engineering.ipynb
Part2_Model_Training_and_Evaluation.ipynb
cleaned_data.csv
mwe_tokenizer.pkl
Assignment_Data (folder containing dataset)

Technologies Used 💻

Python
Pandas & NumPy for data manipulation
NLTK for natural language processing
scikit-learn for machine learning
Matplotlib & Seaborn for visualization
Jupyter Notebook for interactive development

Performance Highlight: 95% F1 Score on Stratified 5-Fold Cross-Validation 🏅📈

The best result in this project is a 95% F1 score, obtained with Stratified 5-Fold Cross-Validation. Because every fold preserves the proportion of texts per author, the score reflects consistent performance across different subsets of the data rather than a single favorable train/test split.
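A minimal sketch of this kind of evaluation with scikit-learn is shown below. It uses synthetic six-class data from `make_classification` in place of the real features, and the classifier (`LogisticRegression`) and the macro averaging of F1 are assumptions, since neither the model nor the averaging mode is stated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the engineered features: 6 classes, one per author.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=10,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=0,
)

# Stratified folds keep the per-author proportions equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Assumed model; any scikit-learn classifier could be substituted.
model = LogisticRegression(max_iter=1000)

# "f1_macro" averages the per-class F1 scores with equal weight (an assumption).
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")

print("macro F1 per fold:", np.round(scores, 3))
print("mean macro F1:", round(float(scores.mean()), 3))
```

Reporting the per-fold scores alongside the mean makes it easier to see how much the result varies between splits.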

Contributors

gangula-karthik
