A web crawler, built as a Scrapy project, that scrapes Zomato restaurant data for New Delhi, Kolkata, Mumbai, Hyderabad, Chennai, Bangalore and Mysore.

How to run:
- Install scrapy
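- From inside the project directory, run "scrapy crawl SPIDER_NAME -o stuff.json" on the command line, where SPIDER_NAME is the name attribute defined in the spider class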
- stuff.json will then contain the scraped data as JSON.
- The spider code is in /spiders/zomatospider.py. To scrape the other cities, uncomment their URLs in the start_urls list.
- Convert the JSON to CSV if required.

Note: I have written the code so that latitude, longitude, city and reviews are scraped. Much more can be scraped by modifying zomatospider.py.
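The JSON-to-CSV conversion mentioned in the steps above can be done with Python's standard library alone. This is a minimal sketch, assuming stuff.json is a JSON array of objects (which is what Scrapy's feed export writes for a .json output file) and that list or dict values, such as scraped reviews, should be stored as JSON strings inside a single CSV cell. The helper name json_to_csv and the output file name stuff.csv are hypothetical.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON array of objects into a CSV file; return the row count.

    Hypothetical helper: assumes the JSON file holds a list of dicts. The CSV
    columns are the union of all keys in first-seen order, and keys missing
    from an item are written as empty cells.
    """
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)

    # Union of all keys across items, preserving first-seen order.
    fieldnames = list(dict.fromkeys(key for item in items for key in item))

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for item in items:
            # Nested values (lists/dicts) do not fit in one CSV cell,
            # so serialize them as JSON strings.
            row = {
                key: json.dumps(value, ensure_ascii=False)
                if isinstance(value, (list, dict))
                else value
                for key, value in item.items()
            }
            writer.writerow(row)
    return len(items)
```

For example, json_to_csv("stuff.json", "stuff.csv") writes one CSV row per scraped item. Collecting the union of keys means items with missing fields still line up under the right columns.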
The scraped data can be found in \data. It contains three files: restaurants.csv, cuisines.csv and collections.csv. Each restaurant in the dataset is uniquely identified by its r_id.

restaurants.csv contains the following variables:
- r_type: Whether the listing is a casual dining restaurant, a fine-dining restaurant, a cafe, etc.
- r_name
- area
- bookmarks: Number of bookmarks of the restaurant made using the website's bookmark feature
- checkins: Check-ins using the "been here" feature
- city
- cost: Cost for two, in rupees
- r_address
- link
- photos: Number of photos of the restaurant uploaded to the service
- r_id: Unique identifier of the restaurant
- r_latitude: Latitude coordinate of the restaurant's location
- r_longitude: Longitude coordinate of the restaurant's location
- rating: Average rating out of 5
- rating_votes: Number of ratings
- reviews: Number of reviews
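The variable list above can be put to work with a small loader. This is a minimal sketch using only Python's standard library: it reads restaurants.csv, indexes rows by r_id (raising an error if an r_id repeats, since r_id is documented as unique) and parses the numeric columns. It assumes the CSV header uses exactly the variable names listed above and that empty or non-numeric cells mean a missing value. NUMERIC_COLUMNS, load_restaurants and average_cost_by_city are hypothetical names chosen for illustration.

```python
import csv
from collections import defaultdict

# Assumed numeric columns, taken from the variable list; adjust if the
# actual CSV header differs.
NUMERIC_COLUMNS = {
    "bookmarks", "checkins", "cost", "photos", "r_latitude",
    "r_longitude", "rating", "rating_votes", "reviews",
}


def _to_number(text):
    """Parse a cell as int, else float; empty or non-numeric cells become None."""
    text = (text or "").strip()
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


def load_restaurants(lines):
    """Read restaurants.csv rows into a dict keyed by r_id (kept as a string).

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    Raises ValueError on a duplicate r_id.
    """
    restaurants = {}
    for row in csv.DictReader(lines):
        record = {
            key: _to_number(value) if key in NUMERIC_COLUMNS else value
            for key, value in row.items()
        }
        r_id = row["r_id"]
        if r_id in restaurants:
            raise ValueError(f"duplicate r_id: {r_id}")
        restaurants[r_id] = record
    return restaurants


def average_cost_by_city(restaurants):
    """Mean cost for two (rupees) per city, skipping rows with no cost."""
    totals = defaultdict(lambda: [0, 0])  # city -> [sum, count]
    for record in restaurants.values():
        if record.get("cost") is not None:
            totals[record["city"]][0] += record["cost"]
            totals[record["city"]][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}
```

To use it, open the restaurants.csv file in \data with newline="" and pass the file object to load_restaurants. The standard library keeps the sketch dependency-free; with pandas, reading the CSV and calling set_index("r_id", verify_integrity=True) gives a similar uniqueness check.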
This code is a newer, modified version of the Scrapy code from https://github.com/mushimaster/ZomatoData. It is now obsolete: Zomato has since changed its web markup, so the spider can no longer scrape the site reliably.