
South African
Domestic Electrical Load Study
Data Processing

About this package

This package contains tools to process primary data from the South African Domestic Electrical Load (DEL) database. It requires access to a csv or feather file hierarchy extracted from the original General_LR4 database produced during the NRS Load Research study.

Notes on data access:

Data can be accessed and set up as follows:

  1. From DataFirst at the University of Cape Town (UCT). On-site access to the complete 5-minute data is available through their secure server room.
  2. For those with access to the original SQL database, delretrieve can be used to retrieve the data and create the file hierarchy for further processing.
  3. Several datasets with aggregated views are available online and can be accessed for academic purposes. If you use them, you do not need to install this package.

Package structure

delprocess
    |-- delprocess
        |-- data
            |-- geometa
                |-- 2016_Boundaries_Local
                |-- ...
            |-- specs
                |-- app_broken_00.txt
                |-- appliance_00.txt
                |-- appliance_94.txt	
                |-- behaviour_00.txt
                |-- binned_base_00.txt
                |-- binned_base_94.txt
                |-- dist_base_00.txt
                |-- dist_base_94.txt	
        |-- __init__.py
        |-- command_line.py
        |-- loadprofiles.py
        |-- plotprofiles.py
        |-- support.py
        |-- surveys.py
    |-- MANIFEST.in
    |-- README.md
    |-- setup.py

Setup instructions

Ensure that Python 3 is installed on your computer; a simple way to get it is to install it with Anaconda. Once Python has been installed, the delprocess package can be installed as follows:

  1. Clone this repository from GitHub.
  2. Navigate to the root directory (delprocess) and run python setup.py install (on Windows, run from the Anaconda Prompt or another shell with access to python).
  3. You will be asked to confirm the data directories that contain your data. Paste the full path name when prompted. You can change this setting at a later stage by modifying the file your_home_dir/del_data/usr/store_path.txt.

This package only works if the data is structured exactly like the del_data directory hierarchy created by the package delretrieve:

your_home_dir/del_data
    |-- observations
        |-- profiles
            |-- raw
                |-- unit
                    |-- GroupYear
        |-- tables
            |-- ...
    |-- survey_features
    |-- usr
        |-- specs (automatically copied from delprocess/data/specs during setup)
        |-- store_path.txt (generated during setup)

Data processing

This package runs a processing pipeline from the command line, or can be accessed directly in Python with import delprocess.

Modules: surveys, loadprofiles, plotprofiles

Timeseries data (DEL Metering data)

From the command line

  1. Execute delprocess_profiles -i [interval] from the command line (equivalent to loadprofiles.saveReducedProfiles()).
  2. Optional arguments: -s [data start year] and -e [data end year]; if omitted, you will be prompted for them on the command line. Years must be between 1994 and 2014 inclusive.
  3. Additional command line option: -c or --csv: format and save output as csv files (default: feather). An example invocation is shown below.
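
For example, the following call resamples the 2000 to 2014 profiles and saves csv output (a sketch; the interval value 30T, pandas notation for 30 minutes, is an assumption about the expected format):

delprocess_profiles -i 30T -s 2000 -e 2014 -c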

In python

Run delprocess.loadprofiles.saveReducedProfiles()

Additional profile processing methods:

loadRawProfiles(year, month, unit) 
reduceRawProfiles(year, unit, interval)
loadReducedProfiles(year, unit, interval)
genX(year_range, drop_0=False, **kwargs)
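
A minimal sketch of a Python session using these methods (the unit 'A', interval '30T' and year values are illustrative assumptions, not documented defaults):

from delprocess.loadprofiles import reduceRawProfiles, loadReducedProfiles, genX

# Resample the raw 2012 profiles to a coarser interval, then load the result.
# The unit 'A' and interval '30T' values are illustrative assumptions.
reduceRawProfiles(2012, 'A', '30T')
profiles = loadReducedProfiles(2012, 'A', '30T')

# Build a combined dataset over a year range.
X = genX([2000, 2014], drop_0=False)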

Data output

All files are saved in your_home_dir/del_data/resampled_profiles/[interval].

Feather file format

Feather is the default format for temporary storage of the large metering dataset, as it is a fast and efficient file format for storing and retrieving data frames. It is compatible with both R and python. Feather files should be stored for working purposes only, as the file format is not suitable for archiving. All feather files have been built under feather.__version__ = 0.4.0. If your feather package is of a later version, you may have trouble reading the files and will need to reconstruct them from the raw MSSQL database. Learn more about feather.
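
As a sketch, a reduced profile file can be read back into a data frame with the feather-format package (the file path below is illustrative):

import feather

# Read a resampled profile file back into a pandas DataFrame.
# The file path is an illustrative assumption.
df = feather.read_dataframe('your_home_dir/del_data/resampled_profiles/30T/2012_A.feather')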

Survey data (DEL Survey data)

From the command line

If you know what survey data you want for your analysis, it is easiest to extract it from the command line.

  1. Create a pair of spec files *_94.txt and *_00.txt with your specifications.
  2. Execute delprocess_surveys -f [filename] (equivalent to running genS()); see the example below.
  3. Optional arguments: -s [data start year] and -e [data end year]; if omitted, you will be prompted for them on the command line. Years must be between 1994 and 2014 inclusive.
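
For example, assuming -f takes the spec file name stem (an assumption based on the spec file naming above), appliance features for all years could be extracted with:

delprocess_surveys -f appliance -s 1994 -e 2014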

In python

Import the package to use the following functions:

searchQuestions(searchterm)
searchAnswers(searchterm)
genS(spec_files, year_start, year_end)

The search is not case sensitive and is implemented as a simple str.contains(searchterm, case=False), searching all entries of the Question column in the questions.csv data file. The searchterm must be specified as a single string, but can consist of different words separated by whitespace. The search function removes the whitespace between words and joins them, so the order of words is important. For example, 'hot water' will yield results, but 'water hot' will not!
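
A minimal sketch of a Python session (the spec file argument to genS() is assumed to be the spec file name stem, matching the command line usage above):

from delprocess.surveys import searchQuestions, searchAnswers, genS

# Find survey questions mentioning water heating; word order matters.
searchQuestions('hot water')
searchAnswers('hot water')

# Build a survey feature set for 2000 - 2014 from a pair of spec files.
# The spec file argument format is an assumption based on the naming above.
S = genS('appliance', 2000, 2014)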

Data output

All files are saved in .csv format in your_home_dir/del_data/survey_features/.

Spec file format

Spec file templates are copied to your_home_dir/del_data/usr/specs during setup. These can be used directly to retrieve standard responses for appliance-, behaviour- and demographics-related questions, or adapted to create custom datasets from the household survey data.

The spec file is a dictionary of lists and dictionaries. It is loaded as a json file and all inputs must be strings, with key:value pairs separated by commas. The spec file must contain the following keys:

  • year_range: list; the year range for which the specs are valid. Must be ["1994", "1999"] or ["2000", "2014"].
  • features: list of user-defined variable names, e.g. ["fridge_freezer", "geyser"].
  • searchlist: list of database question search terms, e.g. ["fridgefreezerNumber", "geyserNumber"].
  • transform: dict of simple data transformations such as addition. Keys must be variables from the features list, while the transformation variables must come from searchlist, e.g. {"fridge_freezer": "x['fridgefreezerNumber'] - x['fridgefreezerBroken']"}.
  • bins: dict of lists specifying bin intervals for numerical data. Keys must be variables from the features list, e.g. {"floor_area": ["0", "50", "80"]}.
  • labels: dict of lists specifying bin labels for numerical data. Keys must be variables from the features list, e.g. {"floor_area": ["0-50", "50-80"]}.
  • cut: dict of dicts specifying details of bin segments for numerical data. Keys must be variables from the features list. right indicates whether bins include the rightmost edge; include_lowest indicates whether the first interval should be left-inclusive, e.g. {"monthly_income": {"right": "False", "include_lowest": "True"}}.
  • replace: dict of dicts specifying the coding for replacing feature values. Keys must be variables from the features list, e.g. {"water_access": {"1": "nearby river/dam/borehole"}}.
  • geo: string specifying geographic location detail (can be "Municipality", "District" or "Province").

If no transform, bins, labels, cut, replace or geo is required, the value should be replaced with an empty dict {}.
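
Putting these together, a minimal spec file for the 2000 - 2014 surveys might look as follows (a sketch assembled from the examples above; the search terms and feature names are illustrative):

{
    "year_range": ["2000", "2014"],
    "features": ["fridge_freezer"],
    "searchlist": ["fridgefreezerNumber", "fridgefreezerBroken"],
    "transform": {"fridge_freezer": "x['fridgefreezerNumber'] - x['fridgefreezerBroken']"},
    "bins": {},
    "labels": {},
    "cut": {},
    "replace": {},
    "geo": "Province"
}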

Creating a custom spec file

To create a custom spec file, the following process is recommended:

  1. Copy an existing spec file template and delete all values (but keep the keys and formatting!)
  2. Use the searchQuestions() function to find all the questions that relate to a variable that you are interested in. Use this to construct your searchlist.
  3. Use the searchAnswers() function to get the responses to your search.
  4. Interrogate the responses to decide if any transform, bins and replacements are needed.
  5. If bins are needed, decide whether labels and cut are required.
  6. Decide whether high level geographic information should be added to the responses and update geo accordingly.
  7. Save the file as name_94.txt or name_00.txt.

NB: Surveys were changed in 2000 and questions vary between the years 1994 - 1999 and 2000 - 2014. Survey data is thus extracted in two batches and requires two spec files, with search terms matched to the appropriate questionnaire. For example, the best search term to retrieve household income for the years 1994 - 1999 is 'income', while for 2000 - 2014 it is 'earn per month'.

Acknowledgements

Citation

Toussaint, Wiebke. delprocess: Data Processing of the South African Domestic Electrical Load Study, version 1.01. Zenodo. https://doi.org/10.5281/zenodo.3605422 (2019).

Funding

This code has been developed by the Energy Research Centre at the University of Cape Town with funding from the South African National Energy Development Initiative under the CESAR programme.



delprocess's Issues

Change processing output dirs

  • Instead of loadprofiles.saveReducedProfiles writing to profiles_dir/interval, write to usr_dir/resampled_profiles.
  • Instead of surveys.genS writing to usr_dir/usr/features, write to usr_dir/survey_features.

Improve performance of loadprofiles.reduceRawProfiles

Currently reduceRawProfiles reads a year's data into memory and then uses a resample operation to reduce it. It crashes on the 1999 data on laptop RAM, as the dataset is too large. Things to try (a sketch of option 2 follows below):

  1. try dask and groupby ProfileID only --> may chuck out RecorderID, which will need to be joined afterwards
  2. chunk resample operation with a for loop
  3. figure out dask properly and try again
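
A minimal sketch of option 2, chunking the resample with a for loop (the Datefield timestamp column, the unit 'A' and the 30T interval are assumptions about the raw profile layout):

import pandas as pd
from delprocess.loadprofiles import loadRawProfiles

# Resample one month at a time instead of a whole year, then concatenate
# the reduced frames; assumes loadRawProfiles returns a DataFrame with a
# Datefield timestamp column and a ProfileID column (both assumptions).
reduced = []
for month in range(1, 13):
    chunk = loadRawProfiles(1999, month, 'A')
    chunk = chunk.set_index('Datefield')
    reduced.append(chunk.groupby('ProfileID').resample('30T').mean())
result = pd.concat(reduced)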

README

Fix install instructions:

  • install no longer prompts for the data directory; that only happens the first time the processing is run from the command line

Meaning of missing values

Hi @wiebket,
I hope you're well!
I just wanted to find out whether the missing values in the meter readings for specific hours of the day for households represent power disruptions or outages of some sort?
By missing values, I mean excluding readings with 0's.

I would greatly appreciate your feedback!
Thank you!
