mevers / allhomes

R package to obtain past sales data from allhomes.com.au

Home Page: https://mevers.github.io/allhomes/

License: Other


allhomes's Introduction

allhomes

Update October 2022

In mid-October 2022, Allhomes/Domain Group made a major, breaking change to how past sales data are made available through allhomes.com.au. As a result, all methods that the allhomes package provides for extracting past sales data no longer work. Until I have had an opportunity to review these changes carefully, I cannot say if or when a fix will be possible. In the meantime, sadly, you will not be able to download Allhomes past sales data through allhomes.


Overview

This is the repository for the allhomes R package. The main function that the package provides is get_past_sales_data(), which extracts past sales data from allhomes.com.au for one or more suburbs and years.

Installation

Install the package from CRAN

install.packages("allhomes")

Or directly from GitHub

remotes::install_github("mevers/allhomes")

Details

The function get_past_sales_data() takes the following two arguments:

  • suburb: A character vector denoting one or more suburbs. Every entry must be of the form "<suburb_name>, <state/territory_abbreviation>", e.g. "Balmain, NSW".
  • year: A numeric or integer vector giving the year(s) of the sales history.
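
The required suburb format can be checked before any lookup is attempted. The following is a minimal sketch in base R; parse_suburb() is a hypothetical helper written for illustration and is not part of the allhomes package. It assumes state/territory abbreviations are two or three letters long.

```r
# Hypothetical helper (not part of allhomes): split entries of the form
# "<suburb_name>, <state/territory_abbreviation>" into their two parts.
parse_suburb <- function(suburb) {
  stopifnot(is.character(suburb))
  # Assumption: abbreviations are 2-3 letters (e.g. NSW, ACT, WA).
  pattern <- "^\\s*(.+?)\\s*,\\s*([A-Za-z]{2,3})\\s*$"
  ok <- grepl(pattern, suburb, perl = TRUE)
  if (!all(ok)) {
    stop("Not of the form '<suburb_name>, <state>': ",
         paste(suburb[!ok], collapse = "; "))
  }
  data.frame(
    name = sub(pattern, "\\1", suburb, perl = TRUE),
    state = toupper(sub(pattern, "\\2", suburb, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_suburb(c("Balmain, NSW", "Swinger Hill, ACT"))
print(parsed)
```

Entries without a comma or state abbreviation, such as "Balmain", raise an error instead of being passed on.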

Example:

library(allhomes)
library(magrittr)  # provides the %>% pipe
get_past_sales_data("Balmain, NSW", 2019) %>% print(width = 100)
#[2022-07-27 14:52:47] Looking up division ID for suburb='Balmain, NSW'...
#[2022-07-27 14:52:47] URL: https://www.allhomes.com.au/svc/locality/searchallbyname?st=NSW&n=balmain
#[2022-07-27 14:52:47] Finding data for ID=7857, year=2019...
#[2022-07-27 14:52:47] URL: https://www.allhomes.com.au/ah/research/_/120785712/sale-history?year=2019
#[2022-07-27 14:52:48] Found 229 entries.
## A tibble: 229 × 27
#   divis…¹ state postc…² value  year address bedro…³ bathr…⁴ ensui…⁵ garages carpo…⁶ contr…⁷ trans…⁸
#   <chr>   <chr> <chr>   <int> <dbl> <chr>     <dbl>   <dbl> <lgl>     <dbl> <lgl>   <chr>   <chr>  
# 1 Balmain NSW   2041     7857  2019 1 Long…      NA      NA NA           NA NA      06/12/… 02/04/…
# 2 Balmain NSW   2041     7857  2019 7 Alex…      NA      NA NA           NA NA      30/08/… 16/10/…
# 3 Balmain NSW   2041     7857  2019 29 Bir…      NA      NA NA           NA NA      25/10/… 06/12/…
# 4 Balmain NSW   2041     7857  2019 2 Well…       6       3 NA            4 NA      25/05/… 26/08/…
# 5 Balmain NSW   2041     7857  2019 109 Mo…       4       2 NA            2 NA      25/02/… 08/04/…
# 6 Balmain NSW   2041     7857  2019 10 Tha…       4       2 NA            4 NA      05/10/… 16/12/…
# 7 Balmain NSW   2041     7857  2019 3/100 …      NA      NA NA           NA NA      18/07/… 06/09/…
# 8 Balmain NSW   2041     7857  2019 160 Be…       5       4 NA            1 NA      18/10/… 13/12/…
# 9 Balmain NSW   2041     7857  2019 25 Isa…      NA      NA NA           NA NA      01/05/… 02/09/…
#10 Balmain NSW   2041     7857  2019 71 Mor…       4       2 NA            2 NA      24/05/… 05/07/…
## … with 219 more rows, 14 more variables: list_date <chr>, price <dbl>, block_size <dbl>,
##   transfer_type <chr>, full_sale_price <dbl>, days_on_market <dbl>, sale_type <lgl>,
##   sale_record_source <chr>, building_size <lgl>, land_type <lgl>, property_type <lgl>,
##   purpose <chr>, unimproved_value <lgl>, unimproved_value_ratio <lgl>, and abbreviated variable
##   names ¹​division, ²​postcode, ³​bedrooms, ⁴​bathrooms, ⁵​ensuites, ⁶​carports, ⁷​contract_date,
##   ⁸​transfer_date
## ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Under the hood, the function get_past_sales_data() first calls a helper function get_ah_division_ids() that determines for every suburb entry the Allhomes "division" name and ID. The division ID is then used to extract past sales data from the Allhomes website using the low-level function extract_past_sales_data().
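
To illustrate the first step, the sketch below rebuilds the division lookup URL that appears in the log output of the example. build_lookup_url() is a hypothetical name, not a function from the package. The endpoint is unofficial and, given the October 2022 change, may no longer respond, so the code only builds the URL and does not request it.

```r
# Hypothetical helper: rebuild the lookup URL seen in the example log output.
# Assumption: the name is passed lower-case and the state upper-case,
# as in "?st=NSW&n=balmain".
build_lookup_url <- function(name, state) {
  sprintf(
    "https://www.allhomes.com.au/svc/locality/searchallbyname?st=%s&n=%s",
    utils::URLencode(toupper(state), reserved = TRUE),
    utils::URLencode(tolower(name), reserved = TRUE)
  )
}

url <- build_lookup_url("Balmain", "NSW")
print(url)
```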

Currently, there are only limited sanity checks in place to verify whether past sales data are available for a particular suburb and year. Allhomes does not have data for all suburbs and years (for example, Allhomes past sales data for Victoria are largely absent).

allhomes also provides two datasets divisions_ACT and divisions_NSW that list division names and IDs for all Allhomes divisions (suburbs) in the ACT and NSW, respectively.
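
A division ID could be looked up offline from such a table. The sketch below uses a small stand-in data frame rather than the packaged datasets; its column names (division, state, postcode, value) are an assumption borrowed from the example output above and are not confirmed for divisions_ACT or divisions_NSW. lookup_division_id() is a hypothetical helper.

```r
# Stand-in for divisions_ACT / divisions_NSW. The column names are an
# assumption; the two rows reuse values that appear elsewhere in this README.
divisions <- data.frame(
  division = c("Balmain", "Swinger Hill"),
  state = c("NSW", "ACT"),
  postcode = c("2041", "2606"),
  value = c(7857L, 18009L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive lookup of a division ID.
# Returns NA when no division matches.
lookup_division_id <- function(divisions, name, state) {
  hit <- tolower(divisions$division) == tolower(name) & divisions$state == state
  if (!any(hit)) return(NA_integer_)
  divisions$value[hit][1]
}

id <- lookup_division_id(divisions, "swinger hill", "ACT")
print(id)
```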

Getting involved

Please report any bugs as GitHub issues. If you would like to get involved, please get in touch and/or submit a PR.

Further comments

Allhomes localities

The (unofficial) Allhomes API distinguishes between different types of "localities"; in increasing level of granularity these are: state > region > district > division > street > address. Divisions (roughly) correspond to suburbs. The allhomes package pulls in past sales data at the division (i.e. suburb) level.

Allhomes past sales data

Allhomes (which is part of Domain Group) receives historical past sales data from relevant state departments. Some details on Allhomes' data retention are given here.

While there seems to be an (unofficial) Allhomes API for querying IDs (which are necessary for looking up past sales data), the past sales data themselves need to be scraped from somewhat awkwardly formatted static HTML tables. The data for every sale are stored within a <tbody> element; within each <tbody> element, individual values (address, price, dates, block size, etc.) are spread across 3 lines, each contained within a <td> element. Unfortunately, the format of these lines is not consistent.
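
The general scraping approach can be sketched with the rvest package (assumed to be installed). The HTML below is invented for illustration and only mimics the one-<tbody>-per-sale layout; the real Allhomes markup is more irregular and is not reproduced here. parse_sale() is a hypothetical helper, not the package's parser.

```r
library(rvest)

# Invented markup: one <tbody> per sale, three <td> "lines" per sale.
html <- '
<table>
  <tbody>
    <tr><td>8 Oldham Court</td></tr>
    <tr><td>3 bed, 2 bath</td></tr>
    <tr><td>$810,000</td></tr>
  </tbody>
  <tbody>
    <tr><td>21 Jewell Close</td></tr>
    <tr><td>2 bed, 1 bath</td></tr>
    <tr><td>$592,000</td></tr>
  </tbody>
</table>'

doc <- read_html(html)
sales <- html_elements(doc, "tbody")  # one node per sale

# Hypothetical parser: read the three <td> lines of a single sale.
parse_sale <- function(node) {
  cells <- html_text2(html_elements(node, "td"))
  data.frame(
    address = cells[1],
    details = cells[2],
    price = as.numeric(gsub("[^0-9]", "", cells[3])),  # keep digits only
    stringsAsFactors = FALSE
  )
}

past_sales <- do.call(rbind, lapply(sales, parse_sale))
print(past_sales)
```

Because the real lines are not consistently formatted, a production parser would need per-line pattern matching rather than the fixed positions used here.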

Disclaimer

This project is neither related to nor endorsed by allhomes.com.au. Because Allhomes (and Domain Group) may change how it manages and formats data, some or all of the functions might break at any time. There is also no guarantee that historical past sales data won't change.

All data provided are subject to the allhomes "Advertising Sales Agreement terms and conditions - All Homes Pty Ltd".

allhomes's People

Contributors

hadley, mevers

Forkers

hadley datalearns

allhomes's Issues

Suburb with whitespace not processed properly

This returns an empty tibble.

get_past_sales_data("Swinger Hill, ACT", 2020)
#[1] "https://www.allhomes.com.au/svc/locality/searchallbyname?st=ACT&n=swinger-hill"
## A tibble: 0 × 6
## … with 6 variables: division <chr>, state <chr>, postcode <chr>, value <int>, year <dbl>, data <lgl>
## ℹ Use `colnames()` to see all variable names

This works

get_ah_division_ids("Swinger Hill, ACT")
#[1] "https://www.allhomes.com.au/svc/locality/searchallbyname?st=ACT&n=swinger-hill"
#      division state postcode value
#1 Swinger Hill   ACT     2606 18009

This returns NULL

extract_past_sales_data("Swinger Hill", 18009, 2020)
# NULL

This returns a tibble

extract_past_sales_data("swinger-hill", 18009, 2020)
#[2022-07-26 17:23:13] Parsing data for swinger-hill, 2020
#            address bedrooms bathrooms ensuites garages carports contract_date transfer_date  list_date  price block_size
#1    8 Oldham Court        3         2       NA       2        0    20/01/2020            NA 19/12/2019 810000          0
#2   21 Jewell Close        2         1       NA       1        1    05/12/2020            NA 13/11/2020 592000          0
#3    7 Sulman Place        2         1       NA       2        0    07/11/2020            NA 15/10/2020 580000        190
#4    4 Hallen Close        2         1       NA       0        1    16/10/2020            NA 01/10/2020 540000        847
#...

Suburbs with whitespace or special characters need to be processed consistently across all functions.
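
One way to make this consistent would be to pass every suburb name through a single normalisation step. The sketch below is an assumption about what that could look like, not the fix used in the package. normalise_suburb() is a hypothetical helper, and turning punctuation into hyphens is a guess that goes beyond the "swinger-hill" form seen in the URLs above.

```r
# Hypothetical helper: convert a suburb name to the hyphenated lower-case
# form seen in the lookup URLs (e.g. "Swinger Hill" -> "swinger-hill").
normalise_suburb <- function(name) {
  slug <- tolower(trimws(name))
  slug <- gsub("[^a-z0-9]+", "-", slug)  # collapse whitespace/punctuation runs
  gsub("^-+|-+$", "", slug)              # drop leading/trailing hyphens
}

slugs <- normalise_suburb(c("Swinger Hill", "  Swinger   Hill ", "Balmain"))
print(slugs)
```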

`get_past_sales_data()` fails on non-existent suburb

The following gives an error

get_past_sales_data("Arboretum, ACT", 2022)
#[2022-08-20 01:18:11] Looking up division ID for suburb='Arboretum, ACT'...
#[2022-08-20 01:18:11] URL: https://www.allhomes.com.au/svc/locality/searchallbyname?st=ACT&n=arboretum
#Error in `dplyr::mutate()`:
#! Problem while computing `data = purrr::pmap(list(.data$value, .data$year, .env$quiet), extract_past_sales_data)`.
#Caused by error in `.data$value`:
#! Column `value` not found in `.data`.
#Run `rlang::last_error()` to see where the error occurred.

Expected behaviour:

A warning/error that the suburb could not be found and/or a zero-row tibble/data.frame.
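
A guard along these lines could deliver that behaviour. It is a sketch only: check_division_lookup() is a hypothetical name, and the zero-row result uses the division, state, postcode and value columns seen in the get_ah_division_ids() output above.

```r
# Hypothetical guard: validate a division lookup result before it is used.
check_division_lookup <- function(ids, suburb) {
  empty <- data.frame(
    division = character(), state = character(),
    postcode = character(), value = integer(),
    stringsAsFactors = FALSE
  )
  # A failed lookup may be NULL, have no rows, or lack the `value` column.
  if (is.null(ids) || !"value" %in% names(ids) || nrow(ids) == 0) {
    warning("Suburb not found: '", suburb, "'; returning zero rows.",
            call. = FALSE)
    return(empty)
  }
  ids
}

# Simulate a lookup that came back without a `value` column.
res <- suppressWarnings(check_division_lookup(data.frame(), "Arboretum, ACT"))
print(nrow(res))
```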
