fyt3rp4til / lexicon-nlp-lab

Jupyter Notebook 100.00%
bag-of-words gensim-word2vec regex tf-idf spacy-word-embeddings bag-of-words-model gensim lemmatization n-grams named-entity-recognition parts-of-speech stemming stop-words word-embeddings nltk spacy

Lexicon-NLP-Lab

NLP Python License: MIT

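Welcome to the Lexicon! This repository is a collection of Jupyter notebooks and datasets covering Natural Language Processing (NLP) tasks, from regex-based information extraction and tokenization through Bag of Words, TF-IDF, word embeddings, and FastText classification.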

Repository Structure

Data Preprocessing

  • 1_Regex_for_information_extraction.ipynb - Regular expressions for information extraction.
  • 2_Spacy_vs_Nltk.ipynb - Comparison between spaCy and NLTK for tokenization.
  • 3_Spacy_Tokenize.ipynb - Tokenization techniques using spaCy.
  • 4_Spacy_Pipelines.ipynb - Pipelines in spaCy: stemming and lemmatization.
  • 5_Stemming_Lemmatization.ipynb - Stemming and lemmatization methods.
  • 5_Stemming_Lemmatization_2.ipynb - Continuation of stemming, lemmatization, and POS tagging.
  • 6_Parts_of_Speech_2.ipynb - POS tagging, Bag of Words, and NER with spaCy.
  • 6_Parts_of_Speech_in_Spacy.ipynb - Detailed POS tagging with spaCy.
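
As a rough illustration of the kind of regex-based information extraction the first notebook covers, here is a minimal sketch using only Python's standard `re` module. The sample text, the two patterns, and the `extract_contacts` helper are assumptions made for this example; they are not taken from the notebook.

```python
import re

# Hypothetical input; the notebook's actual data is not shown in this README.
text = "You can reach me at (123)-456-7890 or 1234567890, or email jane.doe@example.com."

# Two simple phone formats: "(123)-456-7890" or ten consecutive digits.
PHONE_RE = re.compile(r"\(\d{3}\)-\d{3}-\d{4}|\d{10}")
# A deliberately loose email pattern, good enough for a demo.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contacts(text):
    """Return all phone numbers and email addresses found in text."""
    return {
        "phones": PHONE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


print(extract_contacts(text))
```

Real-world phone and email formats vary far more than these patterns allow, so treat them as a starting point.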

Named Entity Recognition (NER)

  • 7_NER.ipynb - Named entity recognition with spaCy.
  • 7_NER_2.ipynb - Additional NER tasks and implementations.

Bag of Words and N-Grams

  • 8_Bag_of_Words_2_SentimentAnalysis.ipynb - Sentiment analysis using Bag of Words.
  • 8_Bag_of_Words_SpamClassifier.ipynb - Spam classification with Bag of Words.
  • 9_Stop_Words.ipynb - Handling stop words in text preprocessing.
  • 9_Stop_Words_2.ipynb - Further exploration of stop words, Bag of Words, and N-grams.
  • 10_Bag_of_N_Grams_2_Fake_News_Prediction.ipynb - Fake news prediction using N-grams.
  • 10_Bag_of_N_Grams_News_Classification.ipynb - News classification with N-grams.
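
For readers new to the idea, a bag of n-grams counts contiguous word sequences instead of single words. The sketch below is a standard-library illustration, not the notebooks' code; the `ngrams` and `bag_of_ngrams` helpers and the whitespace tokenizer are assumptions for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count every n-gram of length n_min..n_max in a lowercased text."""
    tokens = text.lower().split()  # naive whitespace tokenization
    bag = Counter()
    for n in range(n_min, n_max + 1):
        bag.update(ngrams(tokens, n))
    return bag


bag = bag_of_ngrams("the cat sat on the mat")
print(bag["the"], bag["the cat"])
```

With `n_min=1, n_max=1` this reduces to a plain Bag of Words; widening the range keeps some word-order information at the cost of a larger vocabulary.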

TF-IDF (Term Frequency-Inverse Document Frequency)

  • 11_TF_IDF_2_EmotionDetection.ipynb - Emotion detection using TF-IDF.
  • 11_TF_IDF_TextClassification_Ecommerce_Goods.ipynb - E-commerce goods classification using TF-IDF.
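
To make the weighting concrete, here is a minimal sketch of the textbook formulation, tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t)). This is an assumption chosen for clarity: library implementations usually apply smoothing and normalization, so their numbers will differ, and the `tf_idf` helper and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["cheap phone case", "cheap garden hose"]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that occurs in every document ("cheap" here) gets weight 0, which is the behavior that lets TF-IDF down-weight uninformative words.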

Word Embeddings and Vectors

  • 12_Overview_Spacy_Word_Vectors.ipynb - Overview of word vectors using spaCy and Gensim.
  • 13_Spacy_Word_Embeddings_News_Category_Classification.ipynb - News category classification using spaCy word embeddings.
  • 14_Nlp_Word_Vectors_Gensim_Overview.ipynb - Overview of word vectors using Gensim.
  • 15_Gensim_w2v_Google_Fake_News_Detection.ipynb - Fake news detection with Gensim.

FastText Classifier

  • 16_Fasttext_Indian_Food_Receipe_Classification.ipynb - Classification of Indian food recipes using FastText.
  • 17_Fasttext_Ecommerce_Classification.ipynb - E-commerce classification using FastText.

Miscellaneous

  • cosine_similarity.ipynb - Computing cosine similarity between text vectors.
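
Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The following is a minimal pure-Python sketch; the `cosine_similarity` function and the choice to return 0.0 for zero vectors are assumptions for this example, not necessarily what the notebook does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors, close to 1
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
```

Because the measure ignores vector length, it suits comparing documents of different sizes represented as count or TF-IDF vectors.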

Datasets

  • Cleaned_Indian_Food_Dataset.csv - Dataset for Indian food recipes classification.
  • Fake_Real_Data.csv - Dataset containing fake and real news.
  • news_story.txt - Text file with a sample news story.
  • spam.csv - Spam dataset for classification tasks.
  • students.txt - Additional text file for experimentation.
