HomeBot

A tool for obtaining and analyzing data from the Swedish housing market.

Install

The following should get you started on most systems.

$ pip3 install -r requirements.txt

Optionally also install extra requirements.

$ pip3 install -r requirements-extra.txt

Obtaining data

Data is obtained using spiders with Scrapy. Run

$ scrapy --help

for general help with Scrapy. To fetch data you need to get a spider crawling.

$ scrapy crawl [options] <spider>

You can pass -a url=<url> to change the starting URL for your crawls. See

$ scrapy list

for a list of available spiders.

Also see settings.py for some options that could be useful!
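
As a rough sketch of the kind of options meant here, the excerpt below lists a few standard Scrapy settings that tend to matter when crawling a listing site. The setting names are standard Scrapy settings; the values, and whether this project's settings.py sets any of them, are assumptions.

```python
# settings.py (excerpt): illustrative values, not the project's actual configuration.

# Wait this many seconds between requests to the same site.
DOWNLOAD_DELAY = 1.0

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Cache responses on disk so repeated development runs do not refetch pages.
HTTPCACHE_ENABLED = True

# Write exported feeds as UTF-8, which keeps Swedish characters readable in JSON output.
FEED_EXPORT_ENCODING = "utf-8"
```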

Examples

Fetch residences from Hemnet, filtered by location and residence type, to CSV files.

Villa

$ scrapy crawl \
         -a url="https://www.hemnet.se/bostader?location_ids%5B%5D=17793&location_ids%5B%5D=898740&location_ids%5B%5D=473325&item_types%5B%5D=villa" \
         --output-format=csv \
         --output=sthlm_villa.csv \
         hemnet

Flat

$ scrapy crawl \
         -a url="https://www.hemnet.se/bostader?location_ids%5B%5D=925970&location_ids%5B%5D=898740&item_types%5B%5D=bostadsratt" \
         --output-format=csv \
         --output=sthlm_flat.csv \
         hemnet

Villa-sold

$ scrapy crawl \
         -a url="https://www.hemnet.se/salda/bostader?item_types%5B%5D=villa&location_ids%5B%5D=17793" \
         --output-format=csv \
         --output=sthlm_villa_sold.csv \
         hemnet-sold

Flat-sold

$ scrapy crawl \
         -a url="https://www.hemnet.se/salda/bostader?item_types%5B%5D=bostadsratt&location_ids%5B%5D=17793&location_ids%5B%5D=898740&location_ids%5B%5D=473325" \
         --output-format=csv \
         --output=sthlm_flat_sold.csv \
         hemnet-sold

Fetch all sold residences from Hemnet with at least 10 rooms to a JSON file.

$ scrapy crawl \
         -a url="https://www.hemnet.se/salda/bostader?rooms_min=10.0" \
         --output-format=json \
         --output=hemnet-sold.json \
         hemnet-sold
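
The `%5B%5D` sequences in the URLs above are the percent-encoded form of `[]`, which marks a repeated (array) query parameter. The sketch below shows one way to assemble such a URL instead of editing it by hand. The helper name `build_search_url` is hypothetical; the parameter names and IDs are copied from the examples above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.hemnet.se/bostader"


def build_search_url(location_ids, item_types, base_url=BASE_URL):
    """Return a search URL with repeated, percent-encoded array parameters."""
    # One (key, value) pair per entry, so the key repeats in the query string.
    params = [("location_ids[]", loc) for loc in location_ids]
    params += [("item_types[]", kind) for kind in item_types]
    # urlencode percent-encodes "[" and "]" as %5B and %5D.
    return f"{base_url}?{urlencode(params)}"


url = build_search_url([17793, 898740, 473325], ["villa"])
print(url)
```

The resulting string can be passed to a spider with -a url="...".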

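Debugging with the Scrapy shell

To inspect a response interactively, add the following to a spider callback:

from scrapy.shell import inspect_response
inspect_response(response, self)

Then do the work in the interactive shell that opens: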

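```python
import json
import re

from flatten_json import flatten

# Find the inline script that assigns dataLayer and capture the array's contents.
pat = r"dataLayer = \[([\W\w]*)\];"
a = None
for script in response.xpath("//script"):
    text = script.xpath("text()").extract_first()
    if not text:
        continue
    match = re.search(pat, text)
    if match:
        a = match.group(1)
        break

# Take the second object in the array, which is assumed to hold the "property" attributes.
x = "{" + a.split("},{")[1]
attribs = json.loads(x).get("property")
item = flatten(attribs)
```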
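
The same extraction can be tried outside the Scrapy shell on a plain string. This is a minimal sketch under stated assumptions: the sample text is made up and real pages may be structured differently, the whole array is parsed as JSON rather than split on `},{`, and the small `flatten` function is a stand-in for `flatten_json.flatten` so that the example needs only the standard library.

```python
import json
import re

# Matches the array literal assigned to dataLayer, e.g. "dataLayer = [{...},{...}];"
DATALAYER_RE = re.compile(r"dataLayer\s*=\s*(\[.*?\]);", re.DOTALL)


def extract_property(script_text):
    """Return the first "property" object found in dataLayer, or None."""
    match = DATALAYER_RE.search(script_text)
    if match is None:
        return None
    for entry in json.loads(match.group(1)):
        if "property" in entry:
            return entry["property"]
    return None


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up sample resembling an inline script; real pages may differ.
sample = (
    'dataLayer = [{"page": "listing"},'
    '{"property": {"id": 1, "price": {"amount": 5000000, "currency": "SEK"}}}];'
)
item = flatten(extract_property(sample))
print(item)
```

Parsing the full array avoids depending on the position of the object that carries the "property" key.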

Analyzing data

Once you have some data, it's time to analyze it!

SQL

SQL is a nice language for relational databases. Even though we don't really have any relational data, it's still useful for doing aggregation and filtering.

You can use sqlitebiter to convert, for example, a JSON file to a SQLite database.

$ sqlitebiter file hemnet-sold.json --output-path=hemnet-sold.sql
$ sqlite3 hemnet-sold.sql
sqlite> SELECT AVG(soldprice) FROM hemnet_sold_json1;
6047405.62248996
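
The same kind of aggregation can also be run from Python with the standard-library sqlite3 module. The sketch below is only an illustration: the two records and the `address` field are made up, the table name `sold` is hypothetical, and only the `soldprice` column name comes from the query above.

```python
import json
import sqlite3

# Made-up records for illustration; a real crawl produces many more rows and fields.
records = json.loads(
    '[{"address": "A", "soldprice": 4000000},'
    ' {"address": "B", "soldprice": 6000000}]'
)

# An in-memory database keeps the example free of files on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold (address TEXT, soldprice INTEGER)")
conn.executemany(
    "INSERT INTO sold (address, soldprice) VALUES (:address, :soldprice)",
    records,
)

# Same aggregate as the sqlite3 session above, on the sample rows.
average = conn.execute("SELECT AVG(soldprice) FROM sold").fetchone()[0]
print(average)
conn.close()
```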

Inspiration

This project was inspired by Lauri Vanhala's article Figuring out the best place to live in Helsinki.

Contributors

jonnyhou82, lndmrk
