sandy4321 / webscraping_twittersentimentanalysis

This project forked from bensooraj/webscraping_twittersentimentanalysis

To Scrape IMDB for Celebrity Data and Analyze Sentiment on Twitter | Edureka Course Work

Python 100.00%

webscraping_twittersentimentanalysis's Introduction

Edureka Python Project Documentation

Problem Statement

IMDB provides a list of celebrities born on the current date. Below is the link: http://m.imdb.com/feature/bornondate

Get the list of these celebrities from this webpage using web scraping (only the ones that are displayed, i.e., the top 10). You have to extract the following information:

  1. Name of the celebrity
  2. Celebrity Image
  3. Profession
  4. Best Work

Once you have this list, run a sentiment analysis on Twitter for each celebrity. The final output should be in the following format:

  1. Name of the celebrity:
  2. Celebrity Image:
  3. Profession:
  4. Best Work:
  5. Overall Sentiment on Twitter: Positive, Negative or Neutral
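
The output format above could be assembled along the following lines. This is a minimal sketch, not the course's reference sentiment code: the word lists, the function names (tweet_sentiment, overall_sentiment, format_report), the dictionary keys, and the majority rule for the overall label are all assumptions made for illustration. It uses str.format so that it stays compatible with Python 3.4.

```python
# Hypothetical word lists, for illustration only. The course's reference
# sentiment code is not shown in this document.
POSITIVE_WORDS = {"good", "great", "love", "amazing", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "terrible", "sad"}


def tweet_sentiment(text):
    """Label one tweet by counting positive and negative words.

    Assumes punctuation has already been stripped from the text.
    """
    words = text.lower().split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"


def overall_sentiment(tweets):
    """Compare the number of positive and negative tweets (assumed rule)."""
    labels = [tweet_sentiment(tweet) for tweet in tweets]
    positives = labels.count("Positive")
    negatives = labels.count("Negative")
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"


def format_report(celebrity, sentiment):
    """Render one celebrity record in the five-line format described above."""
    template = "\n".join([
        "1. Name of the celebrity: {name}",
        "2. Celebrity Image: {image}",
        "3. Profession: {profession}",
        "4. Best Work: {best_work}",
        "5. Overall Sentiment on Twitter: {sentiment}",
    ])
    return template.format(sentiment=sentiment, **celebrity)


# Made-up record; real values would come from the scraped page.
example = {
    "name": "Jane Doe",
    "image": "http://example.com/jane.jpg",
    "profession": "Actress",
    "best_work": "Example Film",
}
print(format_report(example, overall_sentiment(["love it", "hate it", "so good"])))
```

A tie between positive and negative tweets, or an empty list of tweets, is reported as Neutral in this sketch.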

Hint: Use the IMDB scraping sample as a reference for scraping the page mentioned above. For sentiment analysis, use the Twitter sentiment code as a reference.

Please note that I am using Python 3.4.

Tools and Packages Used

• Version: Python 3.4 [VERY IMPORTANT]

• Tweepy  Tweepy is an open-source library, hosted on GitHub, that enables Python to communicate with the Twitter platform and use its API. Here's the documentation.

• Codecs  The codecs module provides stream and file interfaces for transcoding data in your program. In this project I use the module for storing the tweets as Unicode text. Here's the documentation.

• String (punctuation)  Used to strip all punctuation from the tweets.

• BeautifulSoup  Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree using Python parsers like lxml and html5lib. It automatically converts incoming documents to Unicode and outgoing documents to UTF-8. Here's the documentation.

• Selenium  The webdriver kit emulates a web-browser (I chose FireFox) and executes the JS scripts to load the dynamic content.

Challenges Faced during the project

Tweepy has an issue with Python 3

Error message: TypeError: Can't convert 'bytes' object to str implicitly inside: tweepy\streaming.py

Solution:
The fix is described at tweepy/tweepy#615. In streaming.py, I changed line 161 to

self._buffer += self._stream.read(read_len).decode('ascii')

and line 171 to

self._buffer += self._stream.read(self._chunk_size).decode('ascii')

and then reinstalled.
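
The error and the fix can be reproduced outside Tweepy. The sketch below is not Tweepy's code: read_into_buffer and the sample payload are hypothetical and only mirror the pattern of the patched lines, where bytes read from a stream are decoded before being appended to a str buffer.

```python
import io


def concatenation_fails():
    """Show that Python 3 refuses to append bytes to a str."""
    try:
        "" + b"abc"
    except TypeError:
        return True
    return False


def read_into_buffer(stream, chunk_size):
    """Accumulate a byte stream into a str buffer, decoding each chunk first."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Same idea as the patched lines: decode the bytes before appending.
        buffer += chunk.decode("ascii")
    return buffer


# Made-up payload standing in for data read from the streaming connection.
payload = io.BytesIO(b'{"text": "hello"}\r\n')
print(concatenation_fails())
print(repr(read_into_buffer(payload, 4)))
```

One caveat worth keeping in mind: decode('ascii') raises UnicodeDecodeError as soon as a chunk contains a byte outside the ASCII range, so this approach only works while the stream delivers pure ASCII data.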

The IMDB website has dynamic content:

Reference: http://fruchter.co/post/53164489086/python-headless-web-browser-scraping-on-amazon

Description: I had to use Selenium's webdriver to emulate a Firefox browser and execute the JavaScript functions that dynamically fetch the details of celebrities born on the current day.

webscraping_twittersentimentanalysis's People

Contributors

bensooraj
