optionalg / scrapy-flask-imdb-python

This project forked from geogas/scrapy-flask-imdb-python

Python project scraping imdb and web application implemented using Flask.

scrapy-flask-imdb-python's Introduction

Project Description

This is a simple Python project that illustrates the use of the following:

  1. Scrapy (scraping and crawling framework)
  2. Flask (micro web development framework based on Werkzeug)

The project is split into two subprojects, each located in its own folder. The first scrapes the Internet Movie Database (imdb) to collect information about the movies we are interested in. This information is stored persistently in a mongodb database. Since a movie can naturally be represented as a document, mongodb was considered the best match for this use case. The second subproject is a web application responsible for rendering the data gathered from imdb.
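To make the "movie as a document" idea concrete, here is a minimal sketch of what such a document might look like. The field names (`name`, `year`, `rating`, `genre`, `cast`) and the helper `make_movie_document` are assumptions chosen for illustration, and the values are placeholders; the actual schema is whatever the spider emits.

```python
import json


def make_movie_document(name, year, rating, genres, cast):
    """Build a dict shaped like a mongodb document for one movie.

    Hypothetical schema: the field names are illustrative only.
    """
    return {
        "name": name,
        "year": int(year),
        "rating": float(rating),
        "genre": list(genres),  # a movie can belong to several genres
        "cast": list(cast),     # nested lists map naturally onto documents
    }


# Placeholder values, not real imdb data.
doc = make_movie_document(
    "Example Movie", 1999, 8.5, ["Crime", "Drama"], ["Actor One", "Actor Two"]
)
print(json.dumps(doc, indent=2))
```

Because list-valued fields such as genre and cast sit inside the document itself, no join tables are needed, which is the main reason a document store fits this data.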

Installation

If you have Vagrant installed, you can simply run vagrant up to get a running environment.

To install the prerequisites manually on an Ubuntu/Debian system, type the following in your shell.

# install mongo and python 
sudo apt-get install -y mongodb python-dev python-pip python-lxml
# install python packages
sudo pip install -r requirements.txt
# create mongo index for speeding up queries
mongo scripts/create_index.js
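The contents of scripts/create_index.js are not shown here, so the following is only a sketch of how comparable indexes could be created from Python. It assumes the fields worth indexing are the ones the web application searches on (`name`, `rating`, `genre`, `year`, all hypothetical field names) and that `collection` is a pymongo `Collection`, whose `create_index` method accepts a list of `(field, direction)` pairs and returns the index name.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def create_movie_indexes(collection, fields=("name", "rating", "genre", "year")):
    """Create one ascending single-field index per field.

    `collection` is expected to behave like a pymongo Collection.
    The default field names are assumptions, not the project's schema.
    Returns the list of index names reported by the collection.
    """
    return [collection.create_index([(field, ASCENDING)]) for field in fields]


# Possible usage against a local mongodb (requires pymongo and a running server):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   create_movie_indexes(client["imdb"]["movies"])
```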

Components

scrapy_imdb

Location: scrapy_imdb

The goal of the scraping application is to fetch information about movies, for example name, rating, genre, and cast. We specify a URL that corresponds to a list assembled by imdb itself or by a user, e.g. the top-250 movies (http://www.imdb.com/chart/top). The scrapy spider then parses this list and collects information for every movie in it. The implemented pipeline later stores this information in the imdb.movies collection of the mongodb database.
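The storage step described above could look roughly like the following item pipeline. This is a sketch under stated assumptions, not the project's actual pipeline: the class name `MongoMoviePipeline`, the connection URI, and the choice of `name` as the upsert key are all hypothetical. It relies only on Scrapy's convention that a pipeline is a plain class with `open_spider`, `close_spider`, and `process_item` methods, and on pymongo's `Collection.update_one` with `upsert=True`.

```python
class MongoMoviePipeline:
    """Store each scraped movie in the imdb.movies collection (sketch)."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="imdb", collection_name="movies"):
        self.mongo_uri = mongo_uri  # assumed local default
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without pymongo installed.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert keyed on the (assumed) "name" field so that re-running
        # the crawl updates existing movies instead of duplicating them.
        self.collection.update_one(
            {"name": doc["name"]}, {"$set": doc}, upsert=True
        )
        return item
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting of the Scrapy project. Upserting rather than inserting is a design choice that keeps repeated crawls idempotent.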

flask_imdb

Location: flask_imdb

A web application presents the movie information described above in a human-friendly manner. The application is backed by a server provided by the flask framework. The server listens for user requests and dispatches them to the corresponding views. A sidebar offers a set of predefined queries. The user can also issue a request to the server by typing a movie's name (or part of it), a rating (1-10), a desired genre (e.g. crime), or a specific year.
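One way a view could turn these search inputs into a mongodb filter is sketched below. The function `build_movie_query` and the field names are hypothetical, and two behaviours are assumptions rather than facts about the project: the rating is treated as a minimum, and name and genre matching is case-insensitive.

```python
import re


def build_movie_query(name=None, rating=None, genre=None, year=None):
    """Translate optional search inputs into a mongodb filter dict (sketch)."""
    query = {}
    if name:
        # Partial, case-insensitive match on the movie name.
        query["name"] = {"$regex": re.escape(name), "$options": "i"}
    if rating is not None:
        rating = float(rating)
        if not 1 <= rating <= 10:
            raise ValueError("rating must be between 1 and 10")
        # Assumption: the user asks for movies rated at least this high.
        query["rating"] = {"$gte": rating}
    if genre:
        # Whole-word, case-insensitive match; on a list-valued field
        # mongodb matches if any element satisfies the regex.
        query["genre"] = {"$regex": "^" + re.escape(genre) + "$", "$options": "i"}
    if year is not None:
        query["year"] = int(year)
    return query


print(build_movie_query(genre="crime", year="1994"))
```

The resulting dict can be passed to a pymongo `find` call on the movies collection. Escaping the user input with `re.escape` keeps characters such as `.` or `*` from being interpreted as regex syntax.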

Populating the mongodb collection

cd scrappy_flask_imdb/scrapy_imdb
scrapy crawl imdb

This operation takes some time. Once it has finished, the scraped movies will be available in the movies collection of the imdb mongodb database.

Starting the flask server

Once the spider and pipeline have completed, the server can be started and content can be served to the user via the web browser. To start the server, simply type:

cd scrappy_flask_imdb/flask_imdb/
python manage.py runserver

Check web page

Open your preferred browser and type in the location bar: http://localhost:5000/index

Cleanup

Execute the following command to drop the movies collection:

mongo imdb --eval "db.movies.drop()"

To drop the whole imdb database, execute:

mongo imdb --eval "db.dropDatabase()"

Contributors

geogas, fretscha
