antonroman / smart_meter_data_analysis
This repository contains all the code developed to analyze the smart meter data with HTM and LSTM
S05 should give the meter reading at the end of every day, and it should always be equal to or higher than the reading of the day before. A negative delta can therefore be a meaningful input variable for detecting anomalies.
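A minimal sketch of the negative-delta check described above. The `(date, value)` tuple layout and the sample readings are assumptions for illustration, not the actual S05 schema.

```python
from datetime import date


def negative_deltas(readings):
    """Return (day, delta) pairs where the cumulative reading decreased.

    `readings` is a list of (date, value) tuples; this layout is an
    assumption, not the real S05 record format.
    """
    ordered = sorted(readings)  # sort by date
    flagged = []
    for (_, prev_value), (day, value) in zip(ordered, ordered[1:]):
        delta = value - prev_value
        if delta < 0:
            flagged.append((day, delta))
    return flagged


# Made-up readings, for illustration only.
sample = [
    (date(2019, 1, 1), 1200),
    (date(2019, 1, 2), 1215),
    (date(2019, 1, 3), 1210),  # lower than the day before
    (date(2019, 1, 4), 1230),
]
print(negative_deltas(sample))
```

The flagged days could then be added as a binary feature alongside the load values.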
Thanks!
The field "Bc" is the Control Byte. It must be lower than 0x80 for the data to be considered valid, so we must check it for every record.
If we find values equal to or higher than 0x80, they can be a valuable input for the next experiment.
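The check above could be sketched as follows. How "Bc" is stored in the files (integer or hex string) is not stated here, so the sketch accepts both; the record layout and helper names are hypothetical.

```python
BC_VALID_LIMIT = 0x80  # values at or above this mark the record as invalid


def is_bc_valid(bc):
    """Accept an int or a hex string such as "7F" or "0x80" (format assumed)."""
    value = int(bc, 16) if isinstance(bc, str) else bc
    return value < BC_VALID_LIMIT


def split_by_bc(records):
    """Split records (dicts with a "Bc" key, an assumed layout) into valid and invalid."""
    valid, invalid = [], []
    for record in records:
        (valid if is_bc_valid(record["Bc"]) else invalid).append(record)
    return valid, invalid


# Made-up records, for illustration only.
records = [{"Bc": "00", "value": 10}, {"Bc": "80", "value": 11}, {"Bc": 0x7F, "value": 12}]
valid, invalid = split_by_bc(records)
print(len(valid), len(invalid))
```

Keeping the invalid records in their own list, rather than dropping them, leaves them available as input for a later experiment.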
detect_bc_invalid.py:39: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_invalid_records['meterId'] = meter_id
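One common way to avoid this warning is to take an explicit copy of the filtered rows before adding a column. The sketch below assumes that `df_invalid_records` was produced by boolean filtering on "Bc"; the actual code in `detect_bc_invalid.py` is not shown here, and the DataFrame contents and `meter_id` value are made up.

```python
import pandas as pd

# Made-up data standing in for one meter's records.
df = pd.DataFrame({"Bc": [0x00, 0x80, 0x10], "value": [1, 2, 3]})
meter_id = "meter-001"  # hypothetical identifier

# .copy() makes the filtered frame independent of df, so the assignment
# below is unambiguous and pandas does not emit SettingWithCopyWarning.
df_invalid_records = df[df["Bc"] >= 0x80].copy()
df_invalid_records["meterId"] = meter_id

print(df_invalid_records)
```

The original `df` is left untouched, which is usually what is wanted when the invalid rows are exported to a separate file.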
We should check the integrity of the provided data. It is useful for two reasons:
To complete the N/A values there are different strategies:
First please check if there are many N/A and missing values and then we'll decide what to do.
We could even compare different approaches to filling the missing data if there is a relevant number of corrupt rows. The best approach would be the one that gives the best forecast performance from the model.
On the other hand, it is worth checking this paper (https://www.sciencedirect.com/science/article/pii/S2352467720303003), as it seems to explain how to deal with this problem. I'll try to read it as well before our meeting.
Thanks, great job!!
We need to create 10 CSV files for each of S02 and S05, aggregating 10%, 20%... 100% of the values, so we will have 20 CSV files in total. Only values that correspond to the same sensing time should be aggregated together.
We will use them later with forecasting models to check how the MAPE improves with the aggregation level (at least in theory).
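A rough sketch of the aggregation, under two assumptions that are not stated above: that "10% of the values" means 10% of the meters, and that aggregating means summing. The `series_by_meter` layout (meter id mapped to `{timestamp: value}`) and the sample numbers are hypothetical.

```python
from collections import defaultdict


def aggregate(series_by_meter, percent):
    """Sum the values of the first `percent`% of meters, per sensing time."""
    meter_ids = sorted(series_by_meter)
    count = max(1, len(meter_ids) * percent // 100)  # at least one meter
    totals = defaultdict(int)
    for meter_id in meter_ids[:count]:
        for timestamp, value in series_by_meter[meter_id].items():
            totals[timestamp] += value  # only same-timestamp values are combined
    return dict(sorted(totals.items()))


# Made-up readings for four meters and two sensing times.
sample = {
    "m1": {"2019-01-01T00:00": 1, "2019-01-01T01:00": 2},
    "m2": {"2019-01-01T00:00": 3, "2019-01-01T01:00": 4},
    "m3": {"2019-01-01T00:00": 5, "2019-01-01T01:00": 6},
    "m4": {"2019-01-01T00:00": 7, "2019-01-01T01:00": 8},
}
for percent in (50, 100):
    print(percent, aggregate(sample, percent))
```

Note that a sensing time missing from some meters is summed over fewer meters in this sketch, so gaps should be handled before aggregating.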
The JSON meter files also include the S04 values. We should generate files as we did for S02 and S05.
S04: this report provides monthly power consumption information. It includes both the absolute value (the absolute energy reading the meter is showing at the moment of the report generation) and the incremental value since the last S04 was issued (usually 1 month, but there are exceptions). Values are in kWh.
Thanks!
Missing time points in S02 and S05 may indicate either PLC communication problems or other types of issues in the meters.
We should check whether there are missing timestamps in the series and generate a CSV listing all the files with missing values.
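A minimal sketch of the gap check, assuming the series has a fixed sampling step. The hourly step in the example is an assumption; the real S02 and S05 intervals should be used instead.

```python
from datetime import datetime, timedelta


def missing_timestamps(timestamps, step):
    """Return the expected timestamps absent between the first and last sample."""
    present = set(timestamps)
    if not present:
        return []
    current, end = min(present), max(present)
    missing = []
    while current <= end:
        if current not in present:
            missing.append(current)
        current += step
    return missing


# Made-up hourly series with the 02:00 sample missing.
series = [datetime(2019, 1, 1, hour) for hour in (0, 1, 3, 4)]
print(missing_timestamps(series, timedelta(hours=1)))
```

Running this per meter file and writing the non-empty results to a CSV would give the list of files with gaps.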
We need to find some correlation between the variables identified as possible indicators of damage in the meter.
We have included a non-exhaustive list here:
Appendix1: Input variables of forecasting and anomaly detection problems
Repeat #11 for ARIMA for all the S02 aggregated files.
Firstly univariate just using the previous load, and then using also temperature.
The "tl" object contains objects like this: {"status_code":1,"t":"2019-01-07T02:40:24.000Z","status":"TF","l":"2019-01-08T00:35:27.452Z"}
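An entry of this shape can be parsed with the standard library as sketched below. The meaning of the "t" and "l" fields is not stated here, so the sketch only converts both to UTC datetimes and reports the gap between them; `parse_tl_entry` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_tl_entry(raw):
    """Parse one "tl" entry, converting "t" and "l" to timezone-aware datetimes."""
    entry = json.loads(raw)
    for key in ("t", "l"):
        parsed = datetime.strptime(entry[key], TIMESTAMP_FORMAT)
        entry[key] = parsed.replace(tzinfo=timezone.utc)  # trailing "Z" means UTC
    return entry


raw = '{"status_code":1,"t":"2019-01-07T02:40:24.000Z","status":"TF","l":"2019-01-08T00:35:27.452Z"}'
entry = parse_tl_entry(raw)
print(entry["status"], entry["l"] - entry["t"])
```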
More info here: https://docs.google.com/document/d/115VEvkWMn1ApgOcBWt8-IqQD3E3b-fx_82eHAZd7ujA/edit#heading=h.av483g1628qv
There is a file per subscriber and it includes the coordinates at the end of the file. For each subscriber we should generate two CSV files with the timestamp and the Rx values in different columns.
We need to download and prepare weather data from a station as close as possible to the meters. The most interesting variable for us is temperature which, according to other papers, is the variable with the highest correlation to power consumption after previous load data.
Use aggregated S02 files to forecast the load one-week ahead and calculate MAPE and RMSE for the different levels of aggregation.
Firstly univariate just using the previous load, and then using also temperature.
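For reference, the two error metrics mentioned above can be computed as sketched here, using their standard definitions. The sample numbers are made up; MAPE is undefined when an actual value is zero, which the sketch does not handle.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


def rmse(actual, forecast):
    """Root mean squared error, in the same units as the load."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up load values, for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mape(actual, forecast), rmse(actual, forecast))
```

Computing both for each aggregation level gives the comparison the experiment is after: MAPE is scale-free, so it can be compared across levels, while RMSE grows with the aggregated load.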