buds-lab / building-data-genome-project-2
Whole building non-residential hourly energy meter data from the Great Energy Predictor III competition
Home Page: https://www.budslab.org/
License: Other
I had the UID/code names for Eagle and Peacock switched! We therefore need to redo the unique ID code names. I think we should take the opportunity to reorder the code name to "AnimalName_SimplifiedSpaceUse_HumanName", so that when people sort by UID, the buildings are grouped more logically. I also think some of the space uses can be simplified so that there is no "/" in the names.
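A minimal sketch of the proposed naming scheme, assuming a UID is just the three parts joined with underscores. The helper `make_uid` and the example building names are hypothetical, not real BDG2 entries; the sketch only shows that a plain string sort then groups buildings by site first and by space use second.

```python
# Hypothetical helper for the proposed "AnimalName_SimplifiedSpaceUse_HumanName" UID.
# The building names used below are illustrative, not real BDG2 entries.

def make_uid(site: str, space_use: str, human_name: str) -> str:
    """Build a UID so that a plain sort groups buildings by site, then by space use."""
    if "/" in space_use:
        # The proposal asks for simplified space uses without "/" in them.
        raise ValueError(f"simplify the space use first: {space_use!r}")
    return f"{site}_{space_use}_{human_name}"


uids = sorted([
    make_uid("Peacock", "office", "Alice"),
    make_uid("Eagle", "lodging", "Bob"),
    make_uid("Eagle", "education", "Carol"),
])
print(uids)  # ['Eagle_education_Carol', 'Eagle_lodging_Bob', 'Peacock_office_Alice']
```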
@anjukan -- do you have any idea why the Swan data set was never added to the Kaggle competition data set?
Convert all non-energy meters from gallons to liters.
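A minimal sketch of the conversion, assuming the raw readings are in US liquid gallons (1 US gal = 3.785411784 L exactly); if any site reports imperial gallons, the factor would be 4.54609 L instead. The function name is hypothetical, and missing readings (`None`) are passed through unchanged.

```python
# Assumption: readings are in US liquid gallons (1 US gal = 3.785411784 L exactly).
# Imperial gallons (4.54609 L) would need a different factor.
US_GALLON_IN_LITERS = 3.785411784


def gallons_to_liters(readings):
    """Convert gallon readings to liters, leaving missing values (None) as None."""
    return [None if r is None else r * US_GALLON_IN_LITERS for r in readings]


converted = gallons_to_liters([1.0, None, 10.0])
print(converted)
```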
Should we add a feature with the date on which data starts being available for each building? Many of them have missing values at the beginning of the period, and a user suggested that this may be useful.
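One way to compute such a feature, sketched under assumptions: the meter file is "wide" (a `timestamp` column followed by one column per building), rows are in chronological order, and missing readings are empty cells or `nan`. The function name and the building IDs in the sample are hypothetical. Buildings with no readings at all simply do not appear in the result.

```python
import csv
import io


def first_valid_timestamps(csv_file):
    """Return {building_id: first timestamp with a non-missing reading}.

    Assumes a 'timestamp' column plus one column per building, in chronological order.
    """
    first = {}
    for row in csv.DictReader(csv_file):
        ts = row["timestamp"]
        for building, value in row.items():
            if building == "timestamp" or building in first:
                continue
            if value not in ("", "nan", "NaN"):
                first[building] = ts
    return first


sample = io.StringIO(
    "timestamp,Bldg_A,Bldg_B\n"
    "2016-01-01 00:00:00,,5.0\n"
    "2016-01-01 01:00:00,3.2,4.8\n"
)
result = first_valid_timestamps(sample)
print(result)  # {'Bldg_B': '2016-01-01 00:00:00', 'Bldg_A': '2016-01-01 01:00:00'}
```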
Create a de-anonymized version of the metadata file for UC Berkeley.
In the meter data analysis portion, calculate the number of meters from each site for Table 1 of the BDG2 paper. Also calculate the total number of meters in the data set.
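A minimal sketch of the count, assuming each meter file has one column per building and that building IDs start with the site name followed by an underscore (as in the "AnimalName_..." scheme). Each (meter type, building) column is counted as one meter. The helper name and the column lists are hypothetical; in practice they would come from the header row of each meter CSV.

```python
from collections import Counter


def meters_per_site(meter_headers):
    """Count meters per site.

    meter_headers maps a meter type (e.g. "electricity") to the building-ID columns
    of that meter file. The site is taken as the text before the first underscore.
    """
    counts = Counter()
    for building_ids in meter_headers.values():
        counts.update(b.split("_", 1)[0] for b in building_ids)
    return counts


headers = {  # hypothetical column lists
    "electricity": ["Eagle_office_A", "Eagle_lodging_B", "Hog_office_C"],
    "water": ["Eagle_office_A"],
}
per_site = meters_per_site(headers)
print(dict(per_site), sum(per_site.values()))  # {'Eagle': 3, 'Hog': 1} 4
```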
As discussed on Kaggle, it would be very helpful to find out which buildings are present in both BDG1 and BDG2. For buildings present in both data sets, is there also an actual overlap in the data, or were they measured over different timeframes?
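Once a building is matched across the two data sets, the overlap question reduces to intersecting two date ranges. A minimal sketch, assuming each building's measurement period is summarized by an inclusive (start, end) pair; the ranges in the example are illustrative only, not actual BDG1 or BDG2 coverage.

```python
from datetime import date


def overlap(start_a, end_a, start_b, end_b):
    """Return the (start, end) overlap of two inclusive date ranges, or None if disjoint."""
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (start, end) if start <= end else None


# Illustrative ranges only, not real BDG1/BDG2 coverage.
shared = overlap(date(2015, 1, 1), date(2016, 6, 30), date(2016, 1, 1), date(2017, 12, 31))
print(shared)
```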
Wishing you all the best.
Predictive models with cleaned data:
Hello! You mentioned that all meter units were converted to kWh in the cleaned data set. However, I cannot find where you did this. Can you please confirm this and point me to where you did the conversion? Thank you.
@anjukan -- is there a reason that Bear and Hog are left blank? Is it because you didn't know what those units were?
Convert all sqft to sqm where there are gaps in sqm and vice versa
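A minimal sketch of the gap filling, using the exact factor 1 ft² = 0.3048² m² = 0.09290304 m². The function name is hypothetical, and missing values are assumed to be represented as `None`; when both values are present they are returned unchanged.

```python
SQFT_TO_SQM = 0.09290304  # exact: 1 ft = 0.3048 m, so 1 ft^2 = 0.09290304 m^2


def fill_area(sqft, sqm):
    """Fill whichever of (sqft, sqm) is missing (None) from the other one."""
    if sqm is None and sqft is not None:
        sqm = sqft * SQFT_TO_SQM
    elif sqft is None and sqm is not None:
        sqft = sqm / SQFT_TO_SQM
    return sqft, sqm


print(fill_area(1000.0, None))
print(fill_area(None, 92.90304))
```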
(This issue will be updated and worked on once the units are checked)
According to Miller et al. (2020), site ID "Wolf" should correspond to University College Dublin. However, the coordinates for "Wolf" in the metadata file correspond to a point near Lauwersoog in the Netherlands.
Accordingly, the latitude and longitude for Wolf should be updated from (53.3498, 6.2603) to a point near (53.3667, -6.2583), the Google Maps coordinates for University College Dublin. Dublin lies west of Greenwich, so its longitude must be negative; the stored longitude appears to have lost its minus sign.
I was concerned that the weather data might also be mismatched. However, Miller et al. (2020) use the weather data from NOAA ISD Station 039690-99999, which I verified is near or in Dublin, so the weather data set does not need to be changed.
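A simple way to catch coordinate errors like the Wolf one is to compute the great-circle distance between the stored coordinates and the expected ones. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper, and comparing the stored point with its sign-flipped twin only illustrates the suspected error.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# Stored "Wolf" point vs. the same point with a negative (western) longitude.
d = haversine_km(53.3498, 6.2603, 53.3498, -6.2603)
print(f"{d:.0f} km")  # roughly 830 km apart
```

Any site whose metadata coordinates lie hundreds of kilometres from its known campus (or from its weather station) is worth a second look.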
Now that the raw data has changed, the figures for the publication must be updated.
Hello friends,
I'm starting to create some Streamlit apps for BDG2 for interactive data exploration and model building. These apps can be used to better understand the data set and in my future teaching. My plan is to:
For context, Streamlit is a very easy-to-use tool for creating data apps in Python. Compared to traditional dashboards such as Plotly Dash or Superset, it lets you use the full potential of Python, for example to build ML models and perform in-depth analysis. Streamlit will also host the apps for free (at least for now).
I noticed there is already a notebook folder in this repository, and I wonder how the app files would fit in. They are not really markdown files, as they need to be rendered by Streamlit or run locally. Would it be possible to include the app links in the README once they are complete?
Oh, and for those interested, I will write a tutorial on how to create these apps. Also, please feel free to comment on what kind of analysis you would like to see. Cheers.
Add a new column to the metadata file that maps BDG2 IDs to Kaggle building IDs: https://github.com/buds-lab/building-data-genome-project-2/wiki/BDG-Kaggle-mapping
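A minimal sketch of adding the column, assuming the mapping from the wiki page has already been turned into a Python dict and that the metadata file has a `building_id` column. The output column name `kaggle_building_id` and the sample IDs are hypothetical placeholders; buildings without a Kaggle counterpart get an empty cell.

```python
import csv
import io


def add_kaggle_ids(metadata_file, mapping, out_file, column="kaggle_building_id"):
    """Copy a metadata CSV, appending a column with each building's Kaggle ID.

    mapping: {bdg2_building_id: kaggle_building_id}; unmapped buildings get "".
    """
    reader = csv.DictReader(metadata_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + [column])
    writer.writeheader()
    for row in reader:
        row[column] = mapping.get(row["building_id"], "")
        writer.writerow(row)


src = io.StringIO("building_id,site_id\nEagle_office_A,Eagle\nHog_office_C,Hog\n")
dst = io.StringIO()
add_kaggle_ids(src, {"Eagle_office_A": 42}, dst)
print(dst.getvalue())
```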
The 10_Cleaned-dataset.ipynb notebook contains code that converts electricity.csv -> electricity_cleaned.csv by replacing the zeros with NaN. However, when I checked out electricity_cleaned.csv, it still contains the original zeros.
Also, at first glance many sites have no data from the earliest date in the file; ideally, the earliest date on which data is available for each site would be included in the metadata?
Maybe it's because I cloned the repo before installing git-lfs, but I had to do a bit of googling to work out how to fetch the CSV files. It might be worth adding some details on LFS to the README?
I loaded the weather.csv and electricity.csv files into pandas. The timestamp fields seem to be inconsistent.
In weather.csv, for site_id="Hog", there are two (non-duplicate) entries for 2017-11-05, which I assume is due to the end of daylight saving time (DST). There is also a missing entry for 2017-03-12 02:00 at the start of DST.
However, electricity.csv contains 24 records per day: there is no duplicate record on 2017-11-05 and no missing record on 2017-03-12.
So it appears that weather.csv uses wall-clock time in the local time zone, while electricity.csv perhaps uses standard time?
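The symptom described above can be detected automatically by counting hourly records per calendar day: in local wall-clock time, a DST start day typically has 23 hourly records and a DST end day 25, while a standard-time series has 24 every day. A minimal sketch, assuming timestamps are strings in `%Y-%m-%d %H:%M:%S` format; the function name and the synthetic timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def irregular_days(timestamps, expected=24):
    """Return {date: count} for days whose number of hourly records != expected."""
    per_day = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts in timestamps
    )
    return {day: n for day, n in sorted(per_day.items()) if n != expected}


# Synthetic wall-clock data: one hour skipped in spring, one hour repeated in autumn.
spring = [f"2017-03-12 {h:02d}:00:00" for h in range(24) if h != 2]
fall = [f"2017-11-05 {h:02d}:00:00" for h in range(24)] + ["2017-11-05 01:00:00"]
anomalies = irregular_days(spring + fall)
print(anomalies)
```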
In particular, I noticed that in 11_Models.ipynb you join the meter data and weather datasets on timestamp, but you haven't set an index on these columns; if you add an index, the join fails. Probably of more concern: won't the temperature and electricity readings be shifted by one hour for half of the year?
@anjukan -- You mentioned in your part that "For the metadata of the buildings, it was deemed that \texttt{square$_$feet}, \texttt{primary$_$use}, \texttt{year$_$built} and \texttt{floor$_$count} would be the only mandatory attributes."
Does this mean these were the filters for the Kaggle competition? I'm surprised that we treated floor count and year built as mandatory.
@ponybiam -- there are tons of buildings in BDG2 that don't have these two features, so I'm guessing we didn't apply those same filtering steps?
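A quick way to quantify this is to count metadata rows with an empty value in each attribute. A minimal sketch, assuming a CSV metadata file; the default column names follow the Kaggle-style names quoted above (`year_built`, `floor_count`) and may differ in the BDG2 metadata file, so they are parameters. The function name and sample rows are hypothetical.

```python
import csv
import io


def count_missing(csv_file, columns=("year_built", "floor_count")):
    """Return (total rows, {column: rows with an empty or NaN value})."""
    missing = dict.fromkeys(columns, 0)
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        for col in columns:
            # row.get may return None for short rows, hence the "or".
            if (row.get(col) or "").strip() in ("", "nan", "NaN"):
                missing[col] += 1
    return total, missing


sample = io.StringIO("building_id,year_built,floor_count\nA,1990,\nB,,\nC,2005,3\n")
total, missing = count_missing(sample)
print(total, missing)
```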
I have the following passage in the paper -- is it correct?
The data contained in this repository differ in several ways from the data on the Kaggle competition website. These issues were detected during the competition and have been fixed in this updated data set. The first difference is that all timestamps in the BDG2.0 data set, including those of the weather data, are in the local time zone. The weather data released in the Kaggle competition had timestamps in UTC, so contestants had to find their own ways to align the weather data before they could use it properly. The other fix concerns a conversion mistake in which Site X was not properly converted to \emph{kWh$_{sum}$} and was instead left in its raw form. This conversion has been fixed in this data set.
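For readers working with the original Kaggle weather files, the UTC-to-local alignment can be sketched as a timestamp shift. This is a minimal sketch assuming a fixed UTC offset per site (the -5 h used below is a hypothetical example); a fixed offset yields local standard time, and reproducing local wall-clock time with DST would instead need a time-zone database such as Python's `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local_standard(ts_utc, utc_offset_hours):
    """Shift a naive UTC timestamp string to local standard time (fixed offset, no DST)."""
    utc = datetime.strptime(ts_utc, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M:%S")


local = utc_to_local_standard("2016-01-01 05:00:00", -5)  # hypothetical UTC-5 site
print(local)  # 2016-01-01 00:00:00
```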
This is the 2017 data used for the public leaderboard in the competition. A copy of this data set will be part of this repository.