servin / gmaps_popular_times_scraper

This project forked from philshem/gmaps_popular_times_scraper

Scraper for Google Maps "Popular Times" for place entries

Home Page: https://twitter.com/philshem/status/1232395590144753664?s=20

License: The Unlicense

Python 100.00%

Scraper of Google Maps "Popular Times" for business entries

Turn this:

[Image: screenshot of Google Maps popular times]

into a machine-readable dataset. (This is really unofficial. YMMV.)

to get the code

git clone https://github.com/philshem/gmaps_popular_times_scraper.git
cd gmaps_popular_times_scraper

to configure the code

Install the required packages (selenium and beautifulsoup4):

pip3 install -r requirements.txt

Modify these lines in config.py to point to your Chrome and chromedriver binaries:

CHROME_BINARY_LOCATION = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
CHROMEDRIVER_BINARY_LOCATION = '/usr/local/bin/chromedriver'

Chromedriver can be downloaded from the official ChromeDriver site. Make sure the version you use matches your Chrome version.
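Before running the scraper it can help to confirm that both configured paths point to real executables. The following is a minimal sketch using only the standard library; `missing_binaries` is a hypothetical helper that is not part of this repository, and the two paths are simply the example values shown above.

```python
import os

# Example values copied from the config.py snippet above; adjust for your system.
CHROME_BINARY_LOCATION = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
CHROMEDRIVER_BINARY_LOCATION = '/usr/local/bin/chromedriver'


def missing_binaries(paths):
    """Return the labels whose path is not an existing, executable file.

    `paths` maps a label to a filesystem path.
    Hypothetical helper for illustration, not part of the repository.
    """
    return [
        label
        for label, path in paths.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]


if __name__ == "__main__":
    problems = missing_binaries({
        "CHROME_BINARY_LOCATION": CHROME_BINARY_LOCATION,
        "CHROMEDRIVER_BINARY_LOCATION": CHROMEDRIVER_BINARY_LOCATION,
    })
    for label in problems:
        print(f"{label} does not point to an executable file")
```

This check only verifies that the files exist and are executable; it does not verify that the Chrome and chromedriver versions match.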

to run the code

Run the scraper by passing the URL of a CSV file as the command-line argument:

python3 scrape_gm.py "$URL_TO_CSV"

Or, for a list of URLs stored in a Google Sheet:

python3 scrape_gm.py "https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
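If you prefer to build the Google Sheets CSV URL programmatically, a small sketch like the one below fills in the two placeholders. `sheet_csv_url` is a hypothetical helper, not part of the repository; it only percent-encodes the sheet ID and sheet name and inserts them into the URL pattern shown above.

```python
from urllib.parse import quote


def sheet_csv_url(sheet_id, sheet_name):
    """Fill the {sheet_id} and {sheet_name} placeholders of the CSV export URL.

    Hypothetical helper for illustration. Both values are percent-encoded
    so that sheet names containing spaces still produce a valid URL.
    """
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{quote(sheet_id, safe='')}"
        "/gviz/tq?tqx=out:csv&sheet="
        f"{quote(sheet_name, safe='')}"
    )


if __name__ == "__main__":
    # "abc123" and "My Places" are made-up example values.
    print(sheet_csv_url("abc123", "My Places"))
```

The resulting string can then be passed to scrape_gm.py as its argument.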

The URL should point to a CSV (a local file or an http URL) whose first column contains valid Google Maps URLs. For example, a full Google Maps URL:

https://www.google.com/maps/place/Der+Gr%C3%BCne+Libanon/@47.3809042,8.5325368,17z/data=!3m1!4b1!4m5!3m4!1s0x47900a0e662015b7:0x54fec14b60b7f528!8m2!3d47.3809006!4d8.5347255

A shortened URL is also valid:

https://goo.gl/maps/r2xowUB3UZX7ZL2u6
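To sanity-check an input CSV before scraping, something like the following sketch could be used. It reads the first column and keeps only entries that look like the two URL forms shown above. The check in `looks_like_maps_url` is an assumption made for illustration (other Google Maps domains exist and are not covered), and neither function is part of the repository.

```python
import csv
import io
from urllib.parse import urlparse


def looks_like_maps_url(url):
    """Heuristic check for the two URL forms shown in this README.

    Assumption: only google.com/maps/... and goo.gl/maps/... are accepted.
    """
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.netloc.lower()
    if host in ("www.google.com", "google.com", "goo.gl"):
        return parts.path.startswith("/maps/")
    return False


def first_column_urls(csv_text):
    """Return the first-column values of a CSV that pass the heuristic.

    Header rows and other non-URL values are dropped by the same filter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and looks_like_maps_url(row[0])]


if __name__ == "__main__":
    text = "url,name\nhttps://goo.gl/maps/r2xowUB3UZX7ZL2u6,example\nnot a url,other\n"
    print(first_column_urls(text))
```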

The HTML page source can be saved to the folder html/ by setting the corresponding parameter in config.py. The saved HTML files act as a cache, each with a timestamp for when it was retrieved, and can be cleaned out once in a while. Logs are saved to logs/, which serves as an archive of the URLs retrieved from the CSV input file.
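Cleaning out the cache can be scripted. The sketch below removes cached .html files older than a given number of days. It is an illustration only: `remove_old_cache` is a hypothetical helper, and it uses each file's modification time rather than the timestamp in the file name, because the naming scheme is not described here.

```python
import os
import time


def remove_old_cache(folder, max_age_days, now=None):
    """Delete .html files in `folder` last modified more than `max_age_days` ago.

    Hypothetical helper for illustration. Returns the removed file names,
    sorted alphabetically. `now` can be passed in to make the cutoff explicit.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (
            name.endswith(".html")
            and os.path.isfile(path)
            and os.path.getmtime(path) < cutoff
        ):
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Assumes the cache folder is html/ relative to the working directory.
    if os.path.isdir("html"):
        print(remove_old_cache("html", max_age_days=30))
```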

results

The output data (sample_output.csv) has this structure (abbreviated):

place,url,scrape_time,day_of_week,hour_of_day,popularity_percent_normal,popularity_percent_current
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,13,38,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,14,45,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,15,61,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,16,79,30
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,17,90,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,18,88,

All technical timestamps, for example 20200318_163629, are in the local time of the machine running the scraper. The hour in the column hour_of_day is in the local time of the mapped place.
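As a reading aid, here is a minimal sketch of parsing this output with the standard library. It assumes the column names shown above, reads scrape_time with the format `%Y%m%d_%H%M%S` (inferred from the example value 20200318_163629), and treats an empty popularity value as missing. `parse_rows` is a hypothetical helper, and the two sample rows are taken from the output above.

```python
import csv
import io
from datetime import datetime

SAMPLE = """place,url,scrape_time,day_of_week,hour_of_day,popularity_percent_normal,popularity_percent_current
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,15,61,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,16,79,30
"""


def to_int(value):
    """Convert a CSV field to int, mapping an empty field to None."""
    return int(value) if value else None


def parse_rows(csv_text):
    """Parse scraper output into dictionaries with typed values."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "place": rec["place"],
            # Naive datetime: the scraping machine's local time, no timezone attached.
            "scrape_time": datetime.strptime(rec["scrape_time"], "%Y%m%d_%H%M%S"),
            "day_of_week": rec["day_of_week"],
            "hour_of_day": int(rec["hour_of_day"]),
            "normal": to_int(rec["popularity_percent_normal"]),
            "current": to_int(rec["popularity_percent_current"]),
        })
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row["day_of_week"], row["hour_of_day"], row["normal"], row["current"])
```

Note that scrape_time and hour_of_day can refer to different time zones, so they should not be compared directly without knowing where the place is located.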

Data in CSV format is saved to data/. You can use the included script (csv2sql.py) to convert it to a SQLite3 database, or use this awk command to combine the individual CSVs for each place and time into one big CSV called all.csv:

awk 'FNR==NR||FNR>2' data/*.csv > all.csv
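For readers who want to see what loading this data into SQLite might look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not the repository's csv2sql.py, whose implementation is not shown here. The table name `popular_times` and the function `load_csv` are assumptions for illustration, and the sketch assumes each CSV has a single header line with the columns listed above.

```python
import csv
import io
import sqlite3

COLUMNS = [
    "place", "url", "scrape_time", "day_of_week",
    "hour_of_day", "popularity_percent_normal", "popularity_percent_current",
]


def load_csv(conn, csv_file):
    """Insert all rows of an open CSV file into the popular_times table.

    Hypothetical helper. Empty fields are stored as NULL; numeric text is
    converted to integers by SQLite's INTEGER column affinity.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS popular_times ("
        "place TEXT, url TEXT, scrape_time TEXT, day_of_week TEXT, "
        "hour_of_day INTEGER, popularity_percent_normal INTEGER, "
        "popularity_percent_current INTEGER)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        tuple(rec[col] if rec[col] != "" else None for col in COLUMNS)
        for rec in reader
    ]
    conn.executemany("INSERT INTO popular_times VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = io.StringIO(
        ",".join(COLUMNS) + "\n"
        "AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,16,79,30\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load_csv(conn, sample), "row(s) loaded")
```

To load real files, open each CSV in data/ with `open(path, newline="")` and pass the file object to `load_csv`.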

dataviz

Here is a visualization of one week of data for a kebab shop in Zürich (note that the crowd peaks on Friday at 12!):

[Image: shawarma popularity]
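The chart itself is an image, but the same kind of question (when is the place busiest?) can be answered directly from the data. The sketch below is an illustration rather than the code that produced the chart: `busiest_slot` and `text_chart` are hypothetical helpers, and the sample values are the Wednesday rows from the results section, not the weekly kebab shop data.

```python
# (day_of_week, hour_of_day, popularity_percent_normal) taken from the sample output.
SAMPLE = [
    ("Wednesday", 13, 38),
    ("Wednesday", 14, 45),
    ("Wednesday", 15, 61),
    ("Wednesday", 16, 79),
    ("Wednesday", 17, 90),
    ("Wednesday", 18, 88),
]


def busiest_slot(rows):
    """Return the (day, hour, percent) tuple with the highest popularity."""
    return max(rows, key=lambda row: row[2])


def text_chart(rows, width=20):
    """Render one text bar per row, scaled so that 100 percent is `width` characters."""
    lines = []
    for day, hour, pct in rows:
        bar = "#" * round(pct * width / 100)
        lines.append(f"{day[:3]} {hour:02d} {bar} {pct}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_chart(SAMPLE))
    print("busiest:", busiest_slot(SAMPLE))
```

For a real plot, the same tuples could be passed to a plotting library of your choice.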

Contributors: metebalci, philshem
