Sentiment Analysis For Female Clothing

During my internship with the Microsoft Technology Associate (MTA) Program, I made this as my Major Project.

Gathering Data:

The internship mentors provided the data. It came from an e-commerce website, and all the reviews were for women's clothing items.

Data Cleaning:

The data had the following columns:

  • Age : Age of the reviewer.
  • Title : Title the reviewer gave to the review.
  • Review_Text : Body of the review.
  • Recommended_IND : 1/0. Whether the reviewer would recommend the item to someone else.
  • Pos_Feedback_Count : Number of positive reviews given to different items by a given buyer.
  • Division_Name, Department_Name, Class_name : Category columns that identify the kind of item being reviewed.
  • Rating_Class : Good/Bad. The target column, which I added. It is derived from the "Rating" column, which holds a rating on a scale of 1-5 (Good: 4, 5; Bad: 1, 2, 3).

I dropped every entry with a null Review_Text, since the analysis depends entirely on the review text.
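
A minimal sketch of the cleaning step described above, assuming each review is held as a plain dict keyed by the column names. The notebook itself may well use a DataFrame instead. The helper names `rating_class` and `clean` are hypothetical, and treating a blank Review_Text the same as a null one is an added assumption.

```python
def rating_class(rating):
    """Map a 1-5 rating to the Good/Bad target (Good: 4, 5; Bad: 1, 2, 3)."""
    return "Good" if rating >= 4 else "Bad"


def clean(rows):
    """Drop rows whose Review_Text is missing and add the Rating_Class column.

    Assumption: a whitespace-only Review_Text counts as missing too.
    """
    cleaned = []
    for row in rows:
        text = row.get("Review_Text")
        if text is None or not text.strip():
            continue  # no review text, nothing to analyse
        cleaned.append({**row, "Rating_Class": rating_class(row["Rating"])})
    return cleaned


# Illustrative rows only, not taken from the real data.
sample = [
    {"Review_Text": "Love it", "Rating": 5},
    {"Review_Text": None, "Rating": 4},
    {"Review_Text": "Too tight", "Rating": 3},
]
print(clean(sample))
```

The input rows are copied rather than modified, so the raw data stays available for comparison.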

Text Pre-Processing:

The next step was text pre-processing, which I did in the following steps:

  1. Lowercasing : Otherwise the same word would be treated as different words depending on its case (upper/lower).
  2. Removing Punctuation : Punctuation carried little sentiment signal for this analysis and adds complexity to the model implementation.
  3. Removing Stop Words : Stop words are common English words that add little meaning to a sentence, such as "the", "he", and "have". They can be dropped without losing the meaning of the sentence.
  4. Removing Commonly Occurring Words : Words that appear in most reviews do not help the model tell reviews apart. Removing them also reduces the processing the model has to do, which is especially helpful on large datasets.
  5. Removing Rarely Occurring Words : Very rare words do not help either, since users are unlikely to use them again in future reviews.
  6. Tokenization : Breaking each sentence into small tokens, which makes the model much easier to implement.
  7. Lemmatization : Using a vocabulary and morphological analysis of words to remove inflectional endings only and return the base or dictionary form of a word, known as the lemma.
  8. Removing Words without Meaning : Words like "Soooooo" and "veryyyy" have to be either removed or spell-checked, because even a single extra "o" makes a new word for the model. Spell-checking is resource intensive and slow, so I chose to remove these words.
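
The steps above, except lemmatization, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the notebook's actual code. The stop-word list is a tiny hypothetical stand-in, the cut-offs `n_common` and `min_count` are made-up parameters, and "words without meaning" are approximated by a regex that flags any character repeated three or more times in a row. Lemmatization is left out because it needs an external vocabulary.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "he", "have", "a", "is", "it", "i", "and", "this", "was"}

# Heuristic for elongated words: the same character three or more times in a row,
# e.g. "soooooo" or "veryyyy".
ELONGATED = re.compile(r"(.)\1{2,}")


def tokenize(review):
    """Steps 1, 2, 3 and 6: lowercase, strip punctuation, split, drop stop words."""
    text = review.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def preprocess(reviews, n_common=1, min_count=2):
    """Steps 4, 5 and 8: drop the most common, the rare, and the elongated words."""
    tokenized = [tokenize(r) for r in reviews]
    counts = Counter(tok for toks in tokenized for tok in toks)
    common = {word for word, _ in counts.most_common(n_common)}
    rare = {word for word, count in counts.items() if count < min_count}
    drop = common | rare
    return [
        [t for t in toks if t not in drop and not ELONGATED.search(t)]
        for toks in tokenized
    ]


# Illustrative reviews only, not taken from the real data.
reviews = [
    "The dress is soooooo pretty!",
    "Pretty dress, great fit.",
    "Great fit, dress runs small.",
]
print(preprocess(reviews))
# [['pretty'], ['pretty', 'great', 'fit'], ['great', 'fit']]
```

In this small example "dress" is dropped as the most common word, while "soooooo", "runs", and "small" are dropped as rare. On real data the two cut-offs would need tuning against the vocabulary size.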

Exploratory Data Analysis (EDA):

  1. Category of Items Bought by Age Range : As the heading implies, this shows the variety of items bought by people in each age group.

[Figure: Agewise Category Shopping]

  2. Rating Count by Buyers : Most reviews in the data have a rating of 5, followed by 4, and so on. This helps in identifying the words used in good reviews. Reviews with a rating of 3, however, may mix good and bad comments, which can confuse the model.

[Figure: Rating-Count by Buyers]

  3. Good vs. Bad Rating Count : The number of reviews rated Good versus Bad.

[Figure: Count]

  4. Clothing Id Count : The number of reviews for each clothing item.

[Figure: Clothing Id Count]

  5. Positive vs. Negative Feedback Counts by Age : The feedback counts plotted by age.

[Figure: Count]

  6. Words Mostly Used in Positive Reviews:

[Figure: Words Used in Positive Reviews]

  7. Words Mostly Used in Negative Reviews:

[Figure: Words Used in Negative Reviews]
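
The numbers behind the Good vs. Bad count and the most-used-words views can be approximated with simple counting. The sketch below assumes each review is a dict carrying its Rating_Class and an already pre-processed token list under a hypothetical `tokens` key. The function names are illustrative and it produces counts only, not the plots themselves.

```python
from collections import Counter


def class_counts(rows):
    """Count reviews per Rating_Class (the Good vs. Bad view)."""
    return Counter(row["Rating_Class"] for row in rows)


def top_words(rows, rating_class, n=3):
    """Return the n most frequent tokens among reviews of one class."""
    counts = Counter(
        tok
        for row in rows
        if row["Rating_Class"] == rating_class
        for tok in row["tokens"]  # "tokens" is a hypothetical key
    )
    return counts.most_common(n)


# Illustrative rows only, not taken from the real data.
rows = [
    {"Rating_Class": "Good", "tokens": ["love", "fit", "love"]},
    {"Rating_Class": "Good", "tokens": ["love", "soft"]},
    {"Rating_Class": "Bad", "tokens": ["tight", "return", "tight"]},
]
print(class_counts(rows))
print(top_words(rows, "Good", n=1))
print(top_words(rows, "Bad", n=1))
```

Comparing the two `top_words` results side by side is a quick check that the pre-processing kept the words that separate the classes.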
