buds-lab / building-data-genome-project-2

Whole building non-residential hourly energy meter data from the Great Energy Predictor III competition

Home Page: https://www.budslab.org/

License: Other

Jupyter Notebook 100.00%
open-source open-data open-data-science energy-efficiency energy-consumption building-energy building-automation smart-city smart-meter electricity-meter

building-data-genome-project-2's People

Contributors

cmiller8, ponybiam

building-data-genome-project-2's Issues

Switch the UID for Eagle and Peacock and redo the code names

I had the UID/code names switched for Eagle and Peacock, so we need to redo the unique ID code names. I think we should take the opportunity to reorder the code name to "AnimalName_SimplifiedSpaceUse_HumanName", so that when people sort by UID the buildings fall into more logical groups. Also, I think some of the space uses can be simplified so there is no "/" in the names.

Site Moose converted from MJ, not kJ

Just discovered that I messed up one more conversion!

Chilled water and electricity for Moose should be converted from megajoules (MJ), not kilojoules (kJ).
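To illustrate why the source unit matters, here is a minimal sketch of an energy-to-kWh conversion. It relies only on the standard relation 1 kWh = 3.6 MJ = 3600 kJ; the function name `to_kwh` and the sample reading are hypothetical and are not taken from the repository's notebooks.

```python
# Standard energy relations: 1 kWh = 3.6 MJ = 3600 kJ.
MJ_PER_KWH = 3.6
KJ_PER_KWH = 3600.0


def to_kwh(value, unit):
    """Convert an energy reading given in 'kWh', 'MJ', or 'kJ' to kWh."""
    divisors = {"kWh": 1.0, "MJ": MJ_PER_KWH, "kJ": KJ_PER_KWH}
    if unit not in divisors:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value / divisors[unit]


# Hypothetical raw reading: the same number interpreted in two units.
reading = 7200.0
as_mj = to_kwh(reading, "MJ")
as_kj = to_kwh(reading, "kJ")
```

Treating an MJ reading as kJ understates the result by a factor of 1000, which is the size of the error this issue describes.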

Add available data start-date in metadata

Should we add a feature with the date on which data starts being available for each building? Many of them have missing values at the beginning of the period, and a user suggested that this may be useful.
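One possible way to derive such a start date is sketched below. It assumes a meter file loaded as a pandas DataFrame with a timestamp index and one column per building; that layout, the function name `data_start_dates`, and the building names are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def data_start_dates(meter_df):
    """Return, per building column, the first timestamp with a non-missing reading."""
    return meter_df.apply(lambda col: col.first_valid_index())


# Synthetic example with hypothetical building IDs.
idx = pd.date_range("2016-01-01", periods=5, freq=pd.Timedelta(hours=1))
meters = pd.DataFrame(
    {
        "Eagle_office_Ann": [np.nan, np.nan, 1.0, 2.0, 3.0],
        "Moose_education_Cy": [4.0, 5.0, 6.0, 7.0, 8.0],
    },
    index=idx,
)
starts = data_start_dates(meters)
```

The resulting series could then be joined to the metadata on the building ID. Note that this treats only NaN as missing; leading zeros would need to be handled separately.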

Building IDs?

Create a deanonymized version of the meta file for UC Berkeley.

Calculate the number of meters per site

In the meter data analysis portion, calculate the number of meters from each site for Table 1 of the BDG2 paper. Also calculate the total number of meters in the dataset
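A rough sketch of this count follows. It assumes that every column in a meter file is one meter and that each building ID starts with the site name followed by an underscore, as in the "AnimalName_SimplifiedSpaceUse_HumanName" scheme proposed in another issue. The function name and the example IDs are hypothetical.

```python
from collections import Counter


def meters_per_site(columns_by_meter_type):
    """Count meters per site, given {meter_type: [building_id, ...]}."""
    counts = Counter()
    for building_ids in columns_by_meter_type.values():
        for building_id in building_ids:
            # Assumption: the site name is the first underscore-separated token.
            site = building_id.split("_")[0]
            counts[site] += 1
    return counts


# Hypothetical column names for two meter files.
example = {
    "electricity": ["Eagle_office_Ann", "Eagle_lodging_Bob", "Moose_education_Cy"],
    "chilledwater": ["Moose_education_Cy"],
}
per_site = meters_per_site(example)
total_meters = sum(per_site.values())
```

A building with several meter types counts once per meter, so the total here is the number of meters and not the number of buildings.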

Mapping between BDG1 and BDG2

As discussed on Kaggle, it would be very helpful to find out which buildings are present in both BDG1 and BDG2. For those buildings that are present in both data sets, is there also an actual overlap in data, or are the timeframes of measurement different?

Wishing you all the best.

Predictive models

Predictive models with cleaned data:

  • Long-term: 1 year train, 1 year prediction
  • Short-term: 30 days train, 3 days prediction
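The two horizons above could be cut from a time series with a chronological split such as the sketch below. It assumes a pandas Series with a sorted timestamp index; the function name `split_by_duration` is hypothetical, and no particular model is implied.

```python
import pandas as pd


def split_by_duration(series, train_span, test_span):
    """Split a time-indexed series into consecutive train and test windows."""
    start = series.index.min()
    train_end = start + train_span
    test_end = train_end + test_span
    train = series[(series.index >= start) & (series.index < train_end)]
    test = series[(series.index >= train_end) & (series.index < test_end)]
    return train, test


# Synthetic hourly readings covering 40 days.
idx = pd.date_range("2016-01-01", periods=40 * 24, freq=pd.Timedelta(hours=1))
readings = pd.Series(range(len(idx)), index=idx, dtype=float)

# Short-term setup: 30 days of training, 3 days of prediction.
short_train, short_test = split_by_duration(
    readings, pd.Timedelta(days=30), pd.Timedelta(days=3)
)
```

For the long-term setup, `pd.DateOffset(years=1)` can be passed for both spans on a series that covers two years.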

meter reading units

Hello! You mentioned that all meter units were converted to kWh in the cleaned dataset. However, I cannot find where you did this. Can you please confirm this and point me to where you did the conversion? Thank you.

Site ID "Wolf" has incorrect coordinates

According to Miller et al. (2020), site ID "Wolf" should correspond to University College Dublin. The coordinates for "Wolf" in the metadata file correspond to a point near Lauwersoog in the Netherlands.

Accordingly, the latitude and longitude coordinates for Wolf should be updated from (53.3498, 6.2603) to a point near (53.3667, -6.2583), which is the Google Maps location for University College Dublin. Note the negative longitude: Dublin lies west of the prime meridian.

I was concerned that the weather data would also be mismatched. However, Miller et al. (2020) use the weather data from NOAA ISD Station 039690-99999, which I verified is in or near Dublin, so the weather data set does not need to be changed.
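A quick way to sanity-check site coordinates like these is a great-circle distance. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km and compares the metadata point with the same point mirrored to a western longitude. The mirrored point is only an illustration of how much the longitude sign matters, not a confirmed corrected value.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Metadata value for Wolf versus the same latitude with a western longitude.
east = (53.3498, 6.2603)
west = (53.3498, -6.2603)
offset_km = haversine_km(*east, *west)
```

The two points are several hundred kilometres apart, which is consistent with one falling in the Netherlands and the other in Ireland.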

Update figures for paper

Now that raw data has changed, figures for the publication must be updated.

  • Metadata features
  • Weather features
  • Normalized consumption heatmap
  • Data quality heatmap
  • Weather sensitivity screening
  • Breakout heatmap

Created some streamlit apps for BDG2

Hello friends,

I'm starting to create some Streamlit apps for BDG2 for interactive data exploration and model building. These apps can be used to better understand the data set and for my future teaching. My plan is to:

  • Create a metadata exploration app (app link)
    • Explore building attributes (floor area, EUI, etc.) per selected site(s)
    • Explore weather data per selected site
    • Calculate HDD/CDD with customizable base temperatures per selected site(s)
  • Create a site meters exploration tool (WIP app link)
    • Number of meters per site
    • Distribution of meter readings per selected site
    • Time-series meter reading plot per selected site
    • Heatmap showing distances between meters per site, allows selection of different distance metrics (correlation, Euclidean, etc.)
    • Clustering analysis of meters per site with customizable SKLearn parameters
  • Create a building meter analysis/modelling tool (not started)
    • TBD, probably something related to customizable regression models with SKLearn
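As an illustration of the HDD/CDD item in the plan above, here is a minimal sketch of degree-day calculation with adjustable base temperatures. It uses the daily-mean method, which is only one of several conventions, and assumes hourly air temperature in degrees Celsius with a timestamp index. The function name and the default base of 18 °C are assumptions, not taken from the apps.

```python
import pandas as pd


def degree_days(hourly_temp_c, heating_base_c=18.0, cooling_base_c=18.0):
    """Daily heating and cooling degree days from hourly temperatures (deg C)."""
    daily_mean = hourly_temp_c.resample("D").mean()
    # Degree days are the positive part of the gap to the base temperature.
    hdd = (heating_base_c - daily_mean).clip(lower=0)
    cdd = (daily_mean - cooling_base_c).clip(lower=0)
    return pd.DataFrame({"hdd": hdd, "cdd": cdd})


# Synthetic data: one cold day at 10 deg C followed by one warm day at 25 deg C.
idx = pd.date_range("2017-01-01", periods=48, freq=pd.Timedelta(hours=1))
temps = pd.Series([10.0] * 24 + [25.0] * 24, index=idx)
dd = degree_days(temps)
```

In an interactive app, the two base temperatures would be the natural parameters to expose as user inputs.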

For context, Streamlit is a very easy-to-use tool for creating data apps in Python. Compared to traditional dashboards like Plotly Dash or Superset, you can use the full potential of Python, such as creating ML models and performing in-depth analysis. They will also host the apps for free (at least for now).

I noticed there is already a notebook folder in this repository, and I wonder how the app files fit in. They are not really markdown files, as they need to be rendered by Streamlit or run locally. Would it be possible to include the app links in the readme.md once they are complete?

Oh and for those interested I will write a tutorial on how to create these apps. Also please feel free to comment on what kind of analysis you would like to see. Cheers.

zeros in electricity_cleaned.csv

The 10_Cleaned-dataset.ipynb contains code to convert electricity.csv -> electricity_cleaned.csv by replacing the zeros with NaN. But when I checked out electricity_cleaned.csv it contains the original zeros?
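For reference, the zero-to-NaN step and a check on its result might look like the sketch below. This is not the notebook's actual code; it assumes the meter data is a numeric pandas DataFrame with a timestamp index, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def zeros_to_nan(meter_df):
    """Return a copy of the frame with exact zeros replaced by NaN."""
    return meter_df.mask(meter_df == 0)


# Synthetic example with hypothetical building IDs.
idx = pd.date_range("2016-01-01", periods=4, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {"Eagle_office_Ann": [0.0, 1.5, 0.0, 2.0], "Moose_education_Cy": [3.0, 0.0, 4.0, 5.0]},
    index=idx,
)
cleaned = zeros_to_nan(raw)

# Count any zeros that survived the cleaning step.
remaining_zeros = int((cleaned == 0).sum().sum())
```

Running the same zero count on the published electricity_cleaned.csv would show whether the file was regenerated after the cleaning code was added.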

Plus, at first glance many sites don't have data from the earliest dates in the file. Ideally, the earliest date on which data is available for each site would be included in the metadata?

Also maybe it's because I cloned the repo or didn't have git-lfs installed first, but I had to do a bit of googling to work out how to fetch the csv files. It might pay to add some details on LFS in the readme?

misaligned timestamps?

I loaded the weather.csv and electricity.csv files into Pandas. The timestamp fields seem to be inconsistent.

In weather.csv for site_id="Hog" there are two (non-duplicate) entries for 2017-11-05, due, I assume, to the end of daylight saving time (DST). There is also a missing entry for 2017-03-12 02:00, at the start of DST.

But electricity.csv contains 24 records per day: there is no duplicate record on 2017-11-05 and no missing record on 2017-03-12.

So it appears weather.csv is wall clock time in the local timezone, and electricity is perhaps standard time?
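The duplicate and missing hours described above can be found with a check like the following sketch. It assumes hourly data and a list of naive timestamps for a single site; the function name is hypothetical, and the sample timestamps are synthetic, chosen only to mimic a DST changeover.

```python
import pandas as pd


def timestamp_anomalies(timestamps):
    """Return (repeated timestamps, missing hourly timestamps) for one site."""
    ts = pd.Series(pd.to_datetime(timestamps))
    repeated = pd.DatetimeIndex(ts[ts.duplicated()].unique())
    expected = pd.date_range(ts.min(), ts.max(), freq=pd.Timedelta(hours=1))
    missing = expected.difference(pd.DatetimeIndex(ts))
    return repeated, missing


# Synthetic wall-clock series: one hour repeated in autumn, one skipped in spring.
autumn = ["2017-11-05 00:00", "2017-11-05 01:00", "2017-11-05 01:00", "2017-11-05 02:00"]
spring = ["2017-03-12 00:00", "2017-03-12 01:00", "2017-03-12 03:00", "2017-03-12 04:00"]

autumn_repeated, autumn_missing = timestamp_anomalies(autumn)
spring_repeated, spring_missing = timestamp_anomalies(spring)
```

A file in wall-clock local time shows both patterns once a year, while a file in standard time or UTC shows neither, so running this on both files indicates which convention each one uses.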

I noticed in particular in 11_Models.ipynb you join the data and weather datasets on timestamp but you haven't added an index on these columns. If you add an index the join will fail. But probably of more concern, I think the temperature and electricity will be shifted by 1 hour for half the year?

"Mandatory" meta data?

@anjukan -- You mentioned in your part that "For the metadata of the buildings, it was deemed that \texttt{square$_$feet}, \texttt{primary$_$use}, \texttt{year$_$built} and \texttt{floor$_$count} would be the only mandatory attributes."

Does this mean these were the filters for the Kaggle competition? I'm surprised that we required floor count and year built as mandatory.

@ponybiam -- there are tons of buildings in the BDG2 that don't have these two features, so I'm guessing we didn't make those same filtering steps?

Accurate statement about the issues fixed between Kaggle and BDG2

I have the following passage in the paper -- is it correct?

The data contained in this repository has several differences from that found on the Kaggle competition website. These were issues that were detected in the midst of the competition and were fixed in this updated data set. The first difference is that the BDG2.0 data set only has timestamps that are in the local time zone, including the weather data. The weather data released in the Kaggle competition had a timestamp that was set to UTC and the contestants had to come up with ways to find the right alignment for the weather data in order to use it properly. The other issue fixed was a conversion mistake in which Site X was not properly converted to \emph{kWh$_{sum}$} and was instead left in its raw form. This conversion has been fixed in this data set.
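For readers who want to reproduce the alignment that the passage describes, a minimal sketch of converting UTC timestamps to naive local time with pandas follows. The function name is hypothetical, and the time zone used in the example is an assumption for illustration; the actual zone of each site would have to come from the metadata.

```python
import pandas as pd


def utc_to_local_naive(timestamps, tz_name):
    """Convert naive UTC timestamps to naive wall-clock time in tz_name."""
    idx = pd.DatetimeIndex(timestamps)
    return idx.tz_localize("UTC").tz_convert(tz_name).tz_localize(None)


# Example with an assumed zone: noon UTC in summer and in winter.
local = utc_to_local_naive(["2017-07-01 12:00", "2017-01-01 12:00"], "Europe/London")
```

The result is wall-clock time, so it contains one repeated hour and one skipped hour per year wherever daylight saving time applies.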

Add 'kaggle' data set

This is the 2017 data used for the public leaderboard in the competition. A copy of this data set will be part of this repository.
