antonroman / smart_meter_data_analysis

This repository contains all the code developed to analyze the smart meter data with HTM and LSTM.

Python 74.57% Jupyter Notebook 25.43%

smart_meter_data_analysis's Issues

Check that the S05 time series is monotonic for each meter

S05 should give the meter reading at the end of every day, and it should always be equal to or higher than the reading of the day before. A negative delta could be a meaningful input variable to detect anomalies.
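
A minimal pandas sketch of this check, assuming the S05 data is in long format with hypothetical columns `meterId`, `timestamp` and `value`. It returns the readings with a negative day-over-day delta (candidates for the anomaly input variable) and a per-meter monotonicity flag:

```python
import pandas as pd


def find_negative_deltas(s05: pd.DataFrame) -> pd.DataFrame:
    """Return S05 rows whose reading is lower than the same meter's previous reading.

    Assumed (hypothetical) columns: 'meterId', 'timestamp', 'value'.
    """
    df = s05.sort_values(["meterId", "timestamp"]).copy()
    # Day-over-day change, computed independently for each meter.
    df["delta"] = df.groupby("meterId")["value"].diff()
    # Each meter's first reading has delta NaN, which fails the comparison below.
    return df[df["delta"] < 0]


def is_monotonic_per_meter(s05: pd.DataFrame) -> pd.Series:
    """True for meters whose S05 readings never decrease (equal values are allowed)."""
    df = s05.sort_values(["meterId", "timestamp"])
    return df.groupby("meterId")["value"].apply(lambda s: s.is_monotonic_increasing)
```

The `delta` column of the first function could be kept as-is when building the anomaly detection inputs.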

Thanks!

Verify control bit in S02 files

The field "Bc" is the Control Byte: the data can only be considered valid if Bc is lower than 0x80, so we must check this for every record.
If we find values equal to or higher than 0x80, they could be a valuable input for the next experiment.
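
A minimal pandas sketch of this check. It assumes the S02 data is loaded into a DataFrame with a `Bc` column holding either hex strings (such as "80" or "0x80") or integers; the actual encoding in the S02 files should be confirmed. `parse_bc` and `invalid_bc_records` are hypothetical helpers:

```python
import pandas as pd

CONTROL_BYTE_LIMIT = 0x80  # records with Bc >= 0x80 are treated as invalid


def parse_bc(value) -> int:
    """Convert a Bc value to int; strings are assumed to be hexadecimal.

    N/A values would raise here and should be handled beforehand.
    """
    if isinstance(value, str):
        return int(value, 16)  # accepts both "80" and "0x80"
    return int(value)


def invalid_bc_records(s02: pd.DataFrame) -> pd.DataFrame:
    """Return the S02 rows whose control byte marks the data as invalid."""
    bc = s02["Bc"].map(parse_bc)
    # .copy() returns an independent frame, so callers can add columns safely.
    return s02[bc >= CONTROL_BYTE_LIMIT].copy()
```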

Fix: "A value is trying to be set on a copy of a slice from a DataFrame." in detect_bc_invalid.py

detect_bc_invalid.py:39: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_invalid_records['meterId'] = meter_id
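
One way to avoid the warning, assuming `df_invalid_records` is built by boolean indexing on a larger DataFrame (line 39 of detect_bc_invalid.py is not shown here, so this is a guess at its shape), is to take an explicit copy before adding the column. `tag_invalid_records` is a hypothetical helper and assumes `Bc` is already numeric:

```python
import pandas as pd


def tag_invalid_records(df: pd.DataFrame, meter_id: str) -> pd.DataFrame:
    """Select rows with an invalid control byte and tag them with their meter id."""
    # Boolean indexing may return something pandas treats as a slice of df;
    # .copy() makes it an independent DataFrame, so the assignment is unambiguous.
    df_invalid_records = df[df["Bc"] >= 0x80].copy()
    df_invalid_records["meterId"] = meter_id
    return df_invalid_records
```

`df[df["Bc"] >= 0x80].assign(meterId=meter_id)` would work equally well, since `assign` always returns a new DataFrame.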

Search for N/A or missing values in S02 and S05 time series and find the optimal approach to fill them

We should check the integrity of the provided data. It is useful for two reasons:

it avoids problems when using the data as input for ML algorithms once normalized
the number and frequency of N/A or missing values can be a valuable input variable to detect anomalies. We could generate a file listing these problems for the anomaly detection task; if that makes sense to you, please feel free to create an issue for it.

There are different strategies to fill the N/A values:

drop the row
use the previous value
use the value of the same time of the previous day
use an average value
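
The four strategies could be compared with a small helper like the one below. It is a sketch that assumes an hourly pandas Series with a regular DatetimeIndex (as for S02; daily S05 would need a different "previous day" offset). `fill_missing` and the strategy names are hypothetical, and "average" is interpreted here as the series mean, which is only one of several possible averages:

```python
import numpy as np
import pandas as pd


def fill_missing(series: pd.Series, strategy: str) -> pd.Series:
    """Fill N/A values of an hourly load series with one of the strategies listed above."""
    if strategy == "drop":
        return series.dropna()
    if strategy == "previous":
        # Carry the last valid observation forward.
        return series.ffill()
    if strategy == "previous_day":
        # Value observed at the same time 24 hours earlier (may itself be N/A).
        return series.fillna(series.shift(freq=pd.Timedelta(hours=24)))
    if strategy == "mean":
        return series.fillna(series.mean())
    raise ValueError(f"unknown strategy: {strategy}")


# Example: a 48-hour series with one gap.
idx = pd.date_range("2019-01-01", periods=48, freq=pd.Timedelta(hours=1))
load = pd.Series(np.arange(48, dtype=float), index=idx)
load.iloc[30] = np.nan
```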

First, please check how many N/A and missing values there are, and then we'll decide what to do.
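
A quick way to get those counts, assuming a long-format DataFrame with one row per meter and timestamp and a hypothetical `meterId` column. It only counts N/A cells; timestamps that are absent altogether do not appear as rows and need a separate check against the expected sampling grid:

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame, id_col: str = "meterId") -> pd.DataFrame:
    """Per meter: N/A count for each column, total rows and rows with at least one N/A."""
    value_cols = [c for c in df.columns if c != id_col]
    is_na = df[value_cols].isna()
    counts = is_na.groupby(df[id_col]).sum()
    counts["rows"] = df.groupby(id_col).size()
    counts["rows_with_na"] = is_na.any(axis=1).groupby(df[id_col]).sum()
    return counts
```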

We could even compare different approaches to fill the missing data if there is a significant number of corrupt rows. The best approach would be the one that yields the best forecast performance from the model.

It is also worth checking this paper (https://www.sciencedirect.com/science/article/pii/S2352467720303003), as it seems to explain how to deal with this problem. I'll try to read it as well before our meeting.

Thanks, great job!!

Plot showing hourly meter load for the time-series duration

A plot like this could be useful to identify different consumption patterns:

Could we get this graph for a random sample of 30 meters? I'm open to discussing better approaches.
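
The example plot from the original issue is not preserved, so the sketch below assumes a day by hour-of-day heatmap per meter, which is one common way to expose daily and weekly consumption patterns. It draws a reproducible random sample of 30 meters; the column names `meterId`, `timestamp` and `load` are hypothetical, and matplotlib is imported only inside the plotting function:

```python
import numpy as np
import pandas as pd


def sample_meters(meter_ids, n: int = 30, seed: int = 0) -> list:
    """Reproducible random sample of meter ids, drawn without replacement."""
    ids = np.asarray(sorted(meter_ids))
    rng = np.random.default_rng(seed)
    return list(rng.choice(ids, size=min(n, len(ids)), replace=False))


def hourly_load_matrix(s02: pd.DataFrame, meter_id) -> pd.DataFrame:
    """One meter's load as a table with one row per date and one column per hour of day."""
    m = s02[s02["meterId"] == meter_id]
    ts = pd.to_datetime(m["timestamp"])
    return m.assign(date=ts.dt.date, hour=ts.dt.hour).pivot_table(
        index="date", columns="hour", values="load", aggfunc="mean"
    )


def plot_hourly_load(s02: pd.DataFrame, meter_ids, ncols: int = 5):
    """Grid of day-by-hour heatmaps, one panel per meter."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    nrows = -(-len(meter_ids) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    for ax, meter_id in zip(axes.ravel(), meter_ids):
        matrix = hourly_load_matrix(s02, meter_id)
        # Transpose so days run along the x axis and hours along the y axis.
        ax.imshow(matrix.T.to_numpy(), aspect="auto", origin="lower")
        ax.set_title(str(meter_id))
        ax.set_xlabel("day")
        ax.set_ylabel("hour of day")
    for ax in axes.ravel()[len(meter_ids):]:
        ax.axis("off")  # hide unused panels
    fig.tight_layout()
    return fig
```

For example, `plot_hourly_load(s02, sample_meters(s02["meterId"].unique())).savefig("hourly_load.png")`, where the file name is only illustrative.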

Generate CSV files at different aggregation levels

We need to create 10 CSV files aggregating the S02 values of 10%, 20%, ..., 100% of the meters, and another 10 for the S05 values, so we will have 20 CSV files in total. Only values that correspond to the same sensing time should be aggregated together.
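
A sketch of the aggregation, assuming long-format data with hypothetical columns `meterId`, `timestamp` and `value`. Meters are drawn from one seeded random permutation, so each level contains the meters of the previous one (the 10% set is part of the 20% set, and so on); whether nested or independent samples are preferable is a design choice still to be agreed. Running it once for S02 and once for S05 would give the 20 files:

```python
import numpy as np
import pandas as pd


def aggregate_fraction(df: pd.DataFrame, fraction: float, seed: int = 0) -> pd.DataFrame:
    """Sum the values of a random fraction of the meters at each sensing time."""
    meters = np.asarray(sorted(df["meterId"].unique()))
    # Same seed gives the same permutation, so smaller fractions are subsets of larger ones.
    order = np.random.default_rng(seed).permutation(meters)
    k = max(1, int(round(fraction * len(meters))))
    subset = df[df["meterId"].isin(order[:k])]
    # Only values sharing the same timestamp are added; N/A values are skipped by sum().
    agg = subset.groupby("timestamp", as_index=False)["value"].sum()
    agg["n_meters"] = k
    return agg


def write_aggregation_levels(df: pd.DataFrame, prefix: str, seed: int = 0) -> list:
    """Write one CSV per level: 10%, 20%, ..., 100% of the meters."""
    paths = []
    for pct in range(10, 101, 10):
        path = f"{prefix}_agg_{pct:03d}.csv"
        aggregate_fraction(df, pct / 100, seed).to_csv(path, index=False)
        paths.append(path)
    return paths
```

For example, `write_aggregation_levels(s02, "S02")` and `write_aggregation_levels(s05, "S05")`; the file prefixes are placeholders.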

We will use them later with forecasting models to check how the MAPE improves with the aggregation level (at least in theory).

Generate S04 files from the original JSON files

The JSON meter files also include the S04 values. We should generate files as we did for S02 and S05.

S04: this report provides monthly power consumption information. It includes both the Absolute value (the absolute energy reading the meter is showing at the moment of the report generation) as well as the incremental value since the last S04 was issued (usually 1 month, but there are exceptions). Values are in kWh.
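
Once S04 files exist, the relationship described above could serve as a sanity check: the incremental value should match the change in the absolute reading since the previous S04. A sketch with hypothetical column names and an arbitrary 0.5 kWh tolerance; the first report of each meter has no previous reading and is never flagged:

```python
import pandas as pd


def check_s04_increments(s04: pd.DataFrame, tolerance_kwh: float = 0.5) -> pd.DataFrame:
    """Return S04 rows where the incremental value disagrees with the absolute readings.

    Hypothetical columns: 'meterId', 'timestamp', 'absolute_kwh', 'incremental_kwh'.
    """
    df = s04.sort_values(["meterId", "timestamp"]).copy()
    # Change in the absolute reading since the meter's previous S04 report.
    df["abs_delta_kwh"] = df.groupby("meterId")["absolute_kwh"].diff()
    # NaN deltas (first report per meter) compare as False and are not flagged.
    mismatch = (df["abs_delta_kwh"] - df["incremental_kwh"]).abs() > tolerance_kwh
    return df[mismatch]
```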

Thanks!

Check if there are missing hours and days in timestamps

Missing time points in S02 and S05 may indicate PLC communication problems or other types of issues in the meters.
We should check whether there are missing timestamps in the series and generate a CSV listing all the files with missing values.
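
A sketch of the check, assuming each file's timestamps can be loaded into a pandas Series and that S02 is hourly and S05 daily (the expected periods should be confirmed against the data). `missing_timestamps` and `missing_report` are hypothetical helpers:

```python
import pandas as pd


def missing_timestamps(timestamps: pd.Series, freq) -> pd.DatetimeIndex:
    """Expected timestamps (on a regular grid of period `freq`) absent from the data.

    Only gaps between the first and last observation are detected.
    """
    observed = pd.DatetimeIndex(pd.to_datetime(timestamps)).unique()
    expected = pd.date_range(observed.min(), observed.max(), freq=freq)
    return expected.difference(observed)


def missing_report(series_by_file: dict, freq) -> pd.DataFrame:
    """One row per file that has gaps, with the gap count and the first missing timestamp."""
    rows = []
    for name, timestamps in series_by_file.items():
        gaps = missing_timestamps(timestamps, freq)
        if len(gaps) > 0:
            rows.append({"file": name, "n_missing": len(gaps), "first_missing": gaps[0]})
    return pd.DataFrame(rows, columns=["file", "n_missing", "first_missing"])
```

For example, `missing_report(s02_timestamps_by_file, pd.Timedelta(hours=1)).to_csv("s02_missing_timestamps.csv", index=False)`, where both names are illustrative.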

Find correlation between input variables and detected failures in the meters

We need to find correlations between the variables identified as possible indicators of damage and the failures detected in the meters.
We have included a non-exhaustive list here:
Appendix 1: Input variables of forecasting and anomaly detection problems
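
Once per-meter input variables and a failure label are available, a first screening could rank the variables by their correlation with the label. This sketch assumes one row per meter; with a 0/1 failure flag, Pearson correlation equals the point-biserial correlation. Correlation only captures linear association, so this is a starting point rather than a verdict:

```python
import pandas as pd


def correlation_with_failures(features: pd.DataFrame, failed: pd.Series) -> pd.Series:
    """Correlate each per-meter variable with a binary failure flag, strongest first.

    `features`: one row per meter (index = meter id), one column per input variable.
    `failed`: 1 for meters with a detected failure, 0 otherwise (same index).
    """
    aligned = features.join(failed.rename("failed"), how="inner")
    corr = aligned.corr(method="pearson")["failed"].drop("failed")
    return corr.sort_values(key=abs, ascending=False)
```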

Calculate MAPE and RMSE for one-week-ahead S02 forecasting using ARIMA

Repeat #11 using ARIMA for all the aggregated S02 files.

First a univariate model using only the previous load, then a model that also uses temperature.
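
The two metrics and a one-week hold-out evaluation could look like the sketch below. To keep it self-contained, the demo uses a seasonal-naive baseline (repeat the last observed week) in place of ARIMA; `arima_forecast` shows how statsmodels' ARIMA could be plugged in instead, with an arbitrary `(2, 1, 2)` order that would need tuning. For the temperature experiment, statsmodels accepts exogenous regressors through its `exog` argument. MAPE is undefined when an actual value is zero:

```python
import numpy as np

HOURS_PER_WEEK = 168  # one-week-ahead horizon for hourly S02 data


def mape(actual, forecast) -> float:
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def rmse(actual, forecast) -> float:
    """Root mean squared error, in the units of the load."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def seasonal_naive(history, horizon=HOURS_PER_WEEK):
    """Baseline: repeat the last observed week (horizon must be <= one week)."""
    return np.asarray(history, dtype=float)[-HOURS_PER_WEEK:][:horizon]


def arima_forecast(history, horizon=HOURS_PER_WEEK, order=(2, 1, 2)):
    """Forecast with statsmodels' ARIMA; the order here is an untuned placeholder."""
    from statsmodels.tsa.arima.model import ARIMA  # optional dependency

    return ARIMA(np.asarray(history, dtype=float), order=order).fit().forecast(steps=horizon)


def evaluate_last_week(series, forecaster):
    """Hold out the final week, forecast it from the rest, and score the forecast."""
    series = np.asarray(series, dtype=float)
    train, test = series[:-HOURS_PER_WEEK], series[-HOURS_PER_WEEK:]
    prediction = forecaster(train, HOURS_PER_WEEK)
    return {"MAPE": mape(test, prediction), "RMSE": rmse(test, prediction)}
```

Swapping `seasonal_naive` for `arima_forecast` in `evaluate_last_week` gives the ARIMA scores, and the baseline scores are a useful reference point for both ARIMA and LSTM.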

Extract event timeline from original JSON files

The "tl" object contains objects like this: {"status_code":1,"t":"2019-01-07T02:40:24.000Z","status":"TF","l":"2019-01-08T00:35:27.452Z"}

More info here: https://docs.google.com/document/d/115VEvkWMn1ApgOcBWt8-IqQD3E3b-fx_82eHAZd7ujA/edit#heading=h.av483g1628qv
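
Assuming `tl` is a top-level key holding a list of objects shaped like the example above, the timeline could be flattened into a table with pandas. The meaning of `t` and `l` is defined in the linked document, so this sketch only parses both as UTC timestamps and sorts by `t`; `events_timeline` is a hypothetical helper:

```python
import json

import pandas as pd


def events_timeline(meter_json: str) -> pd.DataFrame:
    """Flatten the "tl" array of a meter JSON document into a table sorted by 't'."""
    doc = json.loads(meter_json)
    events = pd.DataFrame(doc.get("tl", []), columns=["status_code", "t", "status", "l"])
    for col in ("t", "l"):
        # ISO-8601 strings ending in "Z" are parsed as UTC timestamps.
        events[col] = pd.to_datetime(events[col], utc=True)
    return events.sort_values("t").reset_index(drop=True)
```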

Generate script to build S02 and S05 CSV files from JSON files

There is one file per subscriber, and it includes the coordinates at the end of the file. For each subscriber we should generate two CSV files (S02 and S05) with the timestamp and the Rx values in separate columns.
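
The exact layout of the subscriber JSON files is not described here, so this sketch assumes, hypothetically, that each file is a JSON object with `S02` and `S05` keys, each holding a list of records with a `timestamp` field plus the value fields. Other keys (such as the coordinates) are ignored; the key names and the output naming scheme are placeholders to adapt:

```python
import json
from pathlib import Path

import pandas as pd


def report_to_csv(records: list, out_path: Path, time_key: str = "timestamp") -> None:
    """Write one report's records as CSV: timestamp first, each value field in its own column."""
    df = pd.DataFrame(records)
    columns = [time_key] + [c for c in df.columns if c != time_key]
    # ISO-8601 timestamp strings sort chronologically, so no parsing is needed here.
    df[columns].sort_values(time_key).to_csv(out_path, index=False)


def subscriber_json_to_csvs(json_path, out_dir, report_keys=("S02", "S05"), time_key="timestamp") -> list:
    """Create one CSV per report type for a single subscriber file."""
    json_path, out_dir = Path(json_path), Path(out_dir)
    doc = json.loads(json_path.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key in report_keys:
        out_path = out_dir / f"{json_path.stem}_{key}.csv"
        report_to_csv(doc[key], out_path, time_key)
        written.append(out_path)
    return written
```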

Find and prepare meteorological data from the meter location

We need to download and prepare weather data from a station as close as possible to the meters. The most interesting variable for us will be temperature, which, according to other papers, is the variable most highly correlated with power consumption after previous load data.
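
Once temperature observations from the nearest station are available, they could be aligned with the load by taking, for each load timestamp, the latest temperature observation within a tolerance. The sketch uses `pandas.merge_asof`; the column names and the one-hour tolerance are assumptions:

```python
import pandas as pd


def join_temperature(load: pd.DataFrame, weather: pd.DataFrame,
                     max_gap: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Attach to each load row the most recent temperature observed at most `max_gap` earlier.

    Hypothetical columns: load has 'timestamp' and 'load'; weather has 'timestamp'
    and 'temperature'. Load rows without a recent enough observation get NaN.
    """
    load = load.sort_values("timestamp")  # merge_asof requires sorted keys
    weather = weather.sort_values("timestamp")
    return pd.merge_asof(load, weather[["timestamp", "temperature"]], on="timestamp",
                         direction="backward", tolerance=max_gap)
```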

Calculate MAPE and RMSE for one-week-ahead S02 forecasting using LSTM

Use the aggregated S02 files to forecast the load one week ahead and calculate MAPE and RMSE for the different levels of aggregation.
First a univariate model using only the previous load, then a model that also uses temperature.
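
Before training an LSTM, the aggregated series has to be cut into supervised samples. This sketch assumes hourly data, a one-week look-back window (an arbitrary choice) and a one-week horizon, producing arrays in the (samples, timesteps, features) layout that Keras LSTM layers expect. Passing temperature adds it as a second input feature for the multivariate experiment; only past temperature is used as input here:

```python
import numpy as np

HOURS_PER_WEEK = 168


def make_windows(load, temperature=None, lookback=HOURS_PER_WEEK, horizon=HOURS_PER_WEEK):
    """Build inputs X of shape (samples, lookback, features) and targets y of shape (samples, horizon).

    Univariate when temperature is None; otherwise temperature is stacked as a
    second input feature. The targets are always the next `horizon` load values.
    """
    load = np.asarray(load, dtype=float)
    features = [load] if temperature is None else [load, np.asarray(temperature, dtype=float)]
    series = np.stack(features, axis=-1)  # shape (T, n_features)
    n = len(load) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([load[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, y
```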
