
antonroman / smart_meter_data_analysis


This repository contains all the code developed to analyze the smart meter data with HTM and LSTM

Python 74.57% Jupyter Notebook 25.43%

smart_meter_data_analysis's People

Contributors

antonroman, gbarreiro

smart_meter_data_analysis's Issues

Check that the S05 time series is monotonic for each meter

S05 should give the meter reading at the end of each day, and it should always be equal to or higher than the reading of the day before. A negative delta can be a meaningful input variable to detect anomalies.
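
A minimal sketch of this check with pandas is shown here. The column names (`meter_id`, `timestamp`, `value`) and the toy data are assumptions for illustration; the real S05 files may use different names.

```python
import pandas as pd

def find_negative_deltas(df, meter_col="meter_id", time_col="timestamp", value_col="value"):
    """Return the rows whose reading is lower than the previous reading of the same meter.

    Column names are hypothetical; adapt them to the actual S05 files.
    """
    ordered = df.sort_values([meter_col, time_col])
    # Difference with the previous reading, computed independently for each meter
    delta = ordered.groupby(meter_col)[value_col].diff()
    return ordered.assign(delta=delta)[delta < 0]

# Toy data: meter A decreases on the third day, meter B never decreases
readings = pd.DataFrame({
    "meter_id": ["A", "A", "A", "B", "B", "B"],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "value": [100.0, 105.0, 103.0, 50.0, 50.0, 60.0],
})
violations = find_negative_deltas(readings)
print(violations)
```

The returned rows could be saved to a CSV and reused later as an anomaly feature.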

Thanks!

Verify control byte in S02 files

The "Bc" field is the Control Byte; it must be lower than 0x80 for the data to be considered valid. We must check this to make sure the data is valid.
If we find values equal to or higher than 0x80, they can be a valuable input for the next experiment.
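
The check could look roughly like the sketch below. It assumes the S02 data is loaded in a pandas DataFrame with a `Bc` column holding either integers or hexadecimal strings; that storage format is an assumption, not something confirmed by the files.

```python
import pandas as pd

VALID_LIMIT = 0x80  # values below this limit are considered valid

def flag_invalid_control_byte(df, bc_col="Bc"):
    """Add a boolean 'bc_valid' column that is True when Bc < 0x80.

    Assumption: Bc is stored as an int or as a hex string such as '7F' or '0x81'.
    """
    bc = df[bc_col].apply(lambda v: int(v, 16) if isinstance(v, str) else int(v))
    return df.assign(bc_valid=bc < VALID_LIMIT)

# Toy data with two valid and two invalid control bytes
s02 = pd.DataFrame({"Bc": ["00", "7F", "80", "0x81"], "value": [1.2, 0.8, 0.0, 0.0]})
flagged = flag_invalid_control_byte(s02)
print(flagged)
```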

Search for N/A or missing values in S02 and S05 time series and find the optimal approach to fill them

We should check the integrity of the provided data. It is useful for two reasons:

  • it avoids problems when using the data as input for ML algorithms once normalized
  • the number and frequency of N/A or missing values can be a valuable input variable to detect anomalies. We could generate a file listing these problems for the anomaly detection task; if that makes sense to you, please feel free to create an issue for it.

There are different strategies to fill in the N/A values:

  • drop the line
  • use the previous value
  • use the value at the same time on the previous day
  • use an average value
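
The four strategies listed above can be compared side by side with a sketch like the following. It assumes an hourly series with a regular `DatetimeIndex`; the function name and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_strategies(series):
    """Return one filled (or reduced) series per strategy.

    Assumption: 'series' has a regular DatetimeIndex, so shifting the index
    by one day lines up each value with the same time on the following day.
    """
    return {
        "drop": series.dropna(),
        "previous_value": series.ffill(),
        "same_time_previous_day": series.fillna(series.shift(freq=pd.Timedelta(days=1))),
        "mean": series.fillna(series.mean()),
    }

# Toy data: two days of hourly values with one missing reading on day 2 at 06:00
index = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
values = np.arange(48, dtype=float)
values[30] = np.nan
filled = fill_strategies(pd.Series(values, index=index))
print({name: s.get(index[30]) for name, s in filled.items()})
```

Each candidate could then be fed to the forecasting model to see which one gives the best performance.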

First, please check whether there are many N/A and missing values, and then we'll decide what to do.

We could even compare different approaches to fill the missing data if there is a relevant number of corrupt rows. The best approach would be the one that gets the best forecast performance from the model.

On the other hand, it is worth checking this paper (https://www.sciencedirect.com/science/article/pii/S2352467720303003) as it seems to explain how to deal with this problem. I'll try to read it as well before our meeting.

Thanks, great job!!

Generate CSV files at different aggregation levels

We need to create 10 CSV files aggregating 10%, 20%, ..., 100% of the values, for each of S02 and S05, so we will have 20 CSV files in total. Only values that correspond to the same sensing time should be aggregated together.

We will use them later with forecasting models to check how the MAPE improves with the aggregation level (at least in theory).
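
One possible reading of this task is sketched below: take a random 10%, 20%, ..., 100% of the meters and sum their values per sensing time. Interpreting the percentage as a share of meters, using nested subsets, and the column and file names are all assumptions.

```python
import pandas as pd

def aggregate_by_level(df, levels=range(10, 101, 10), seed=0,
                       meter_col="meter_id", time_col="timestamp", value_col="value"):
    """Return {percentage: series of summed values indexed by sensing time}.

    Meters are shuffled once, and each level takes the first n of them,
    so every level contains the meters of the previous one.
    """
    meters = pd.Series(df[meter_col].unique()).sample(frac=1.0, random_state=seed).tolist()
    result = {}
    for pct in levels:
        n = max(1, round(len(meters) * pct / 100))
        subset = df[df[meter_col].isin(meters[:n])]
        result[pct] = subset.groupby(time_col)[value_col].sum()
    return result

# Toy data: 10 meters, 2 sensing times, every reading equal to 1.0
times = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"])
toy = pd.DataFrame([
    {"meter_id": f"m{i}", "timestamp": t, "value": 1.0}
    for i in range(10) for t in times
])
aggregated = aggregate_by_level(toy)
for pct, series in aggregated.items():
    print(pct, series.tolist())
    # series.to_csv(f"S02_agg_{pct}.csv")  # hypothetical output file name
```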

Generate S04 files from the original JSON files

The JSON meter files also include the S04 values. We should generate files as we did for S02 and S05.

S04: this report provides monthly power consumption information. It includes both the Absolute value (the absolute energy reading the meter is showing at the moment of the report generation) as well as the incremental value since the last S04 was issued (usually 1 month, but there are exceptions). Values are in kWh.

Thanks!

Check if there are missing hours and days in timestamps

Missing time points in S02 and S05 may indicate either PLC communication problems or other types of issues in the meters.
We should check whether there are missing timestamps in the series and generate a CSV listing all the files with missing values.
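
A minimal sketch of the gap detection, assuming the timestamps of one meter are available as a pandas-compatible sequence and that the expected sampling period is known (for example hourly or daily; the actual periods of S02 and S05 should be confirmed against the data):

```python
import pandas as pd

def missing_timestamps(timestamps, freq):
    """Return the expected timestamps that are absent between the first and last reading."""
    idx = pd.DatetimeIndex(timestamps)
    expected = pd.date_range(idx.min(), idx.max(), freq=freq)
    return expected.difference(idx)

# Toy data: six hourly readings with the 03:00 reading removed
hourly = pd.date_range("2020-01-01", periods=6, freq=pd.Timedelta(hours=1)).delete(3)
missing = missing_timestamps(hourly, freq=pd.Timedelta(hours=1))
print(missing)
```

Running this per file and collecting the non-empty results would give the CSV listing described above.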

Find and prepare meteorological data for the meter location

We need to download and prepare weather data from a station as close as possible to the meters. The most interesting variable for us will be the temperature which, according to other papers, is the variable with the highest correlation to power consumption after previous load data.
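
Once the weather data is available, the correlation with the load could be checked with a sketch like this one. The column names and the toy numbers are assumptions, and the data source for the weather station is left open.

```python
import pandas as pd

def temperature_load_correlation(load, weather, time_col="timestamp",
                                 load_col="load_kwh", temp_col="temperature_c"):
    """Join load and weather on the timestamp and return their Pearson correlation."""
    merged = load.merge(weather, on=time_col, how="inner")
    return merged[load_col].corr(merged[temp_col])

# Toy data: load falls linearly as temperature rises
times = pd.date_range("2020-01-01", periods=4, freq=pd.Timedelta(hours=1))
load = pd.DataFrame({"timestamp": times, "load_kwh": [5.0, 4.0, 3.0, 2.0]})
weather = pd.DataFrame({"timestamp": times, "temperature_c": [0.0, 2.0, 4.0, 6.0]})
corr = temperature_load_correlation(load, weather)
print(corr)
```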
