nilmtk / nilm_metadata

A schema for modelling meters, measurements, appliances, buildings etc

Home Page: http://nilm-metadata.readthedocs.org

License: Apache License 2.0

Python 99.10% Shell 0.48% Batchfile 0.42%

nilm_metadata's Introduction

NILM METADATA

NILM Metadata (where 'NILM' stands for 'non-intrusive load monitoring') is a metadata framework for describing appliances, meters, measurements, buildings and datasets.

Please jump in and add to or modify the schema and documentation!

Documentation

The documentation is available online.

If you're new to NILM Metadata then please read this README and then dive into the tutorial for a worked example.

Or, if you are already familiar with NILM Metadata then perhaps you want direct access to the full description of the "Dataset metadata".

There are two sides to NILM Metadata:

1) A schema describing energy datasets

Modelled objects include:

  • electricity meters (whole-home and individual appliance meters)
    • wiring hierarchy of meters
    • a controlled vocabulary for measurement names
    • description of pre-processing applied
    • storage of pre-processed statistics
  • domestic appliances
    • a controlled vocabulary for appliance names
    • each appliance can contain any number of components (e.g. a light fitting can contain multiple lamps and a dimmer)
    • a list of time periods when each appliance was active
    • manufacturer, model, nominal power consumption etc.
  • a mapping of which appliances are connected to which meters
  • buildings
  • datasets

The metadata itself can be either YAML or JSON.
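
For example, a tiny YAML fragment in the spirit of the schema can be loaded with PyYAML (this fragment is hypothetical; see the dataset metadata documentation for the full set of properties):

import yaml

# A minimal, illustrative fragment using property names that appear
# in the examples later on this page.
fragment = """
appliances:
- type: fridge
  instance: 1
  meters: [5]
"""
metadata = yaml.safe_load(fragment)
print(metadata['appliances'][0]['type'])  # prints: fridge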

2) Central metadata

Common info about appliances is stored in NILM Metadata. This includes:

  • Categories for each appliance type
  • prior knowledge about the distribution of variables such as:
    • on power
    • on duration
    • usage in terms of hours per day
    • appliance correlations (e.g. that the TV is usually on if the games console is on)
  • valid additional properties for each appliance
  • mapping from country codes to nominal mains voltage ranges

The common info about appliances uses a simple but powerful inheritance mechanism which allows appliances to inherit from other appliances. For example, laptop computer is a specialisation of computer and the two share several properties (e.g. both are in the ICT category). So laptop computer inherits from computer and modifies and adds whatever properties it needs. In this way, we can embrace the "don't repeat yourself" (DRY) principle by exploiting the relationships between appliances.
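
A minimal sketch of that inheritance idea (this is not the package's actual implementation, and the property values are illustrative):

def inherit(parent, child):
    """Child takes all of parent's properties, then adds or overrides its own."""
    merged = dict(parent)
    merged.update(child)
    return merged

computer = {'categories': ['ICT'], 'on_power_threshold': 10}
laptop_computer = inherit(computer, {'on_power_threshold': 5})
# laptop_computer keeps the ICT category but overrides the power threshold.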

Python utilities

NILM Metadata comes with a Python module which collects all ApplianceTypes in central_metadata/appliance_types/*.yaml, performs inheritance, instantiates components, and returns a dictionary where each key is an ApplianceType name and each value is an ApplianceType dict. Here's how to use it:

from nilm_metadata import get_appliance_types
appliance_types = get_appliance_types()

NILM Metadata also comes with a convert_yaml_to_hdf5() function which will convert a YAML instance of NILM Metadata to the HDF5 file format.
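
Usage looks something like this (the paths are placeholders; the two arguments are the directory of YAML metadata files and the target HDF5 filename, as in the tracebacks later on this page):

from nilm_metadata import convert_yaml_to_hdf5

# Attach all YAML metadata in the given directory to the HDF5 file.
convert_yaml_to_hdf5('/path/to/dataset/metadata', '/path/to/dataset.h5')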

Research paper describing NILM metadata

The following paper describes NILM metadata in detail:

  • Jack Kelly and William Knottenbelt (2014). Metadata for Energy Disaggregation. In The 2nd IEEE International Workshop on Consumer Devices and Systems (CDS 2014) in Västerås, Sweden. arXiv:1403.5946 DOI:10.1109/COMPSACW.2014.97

Bibtex:

@inproceedings{NILM_Metadata,
title = {{Metadata for Energy Disaggregation}},
author = {Kelly, Jack and Knottenbelt, William},
year = {2014},
month = jul,
address = {V{\"a}ster{\aa}s, Sweden},
booktitle = {The 2nd IEEE International Workshop on Consumer Devices and Systems (CDS 2014)},
archivePrefix = {arXiv},
arxivId = {1403.5946},
eprint = {1403.5946},
doi = {10.1109/COMPSACW.2014.97}
}

Please cite this paper if you use NILM metadata in academic research. But please also be aware that the online documentation is more up-to-date than the paper.

JSON Schema has been deprecated

In version 0.1 of the schema, we wrote a very comprehensive (and complex) schema using JSON Schema in order to automate the validation of metadata instances. JSON Schema is a lovely language and can capture everything we need but, because our metadata is quite comprehensive, we found that using JSON Schema was a significant time drain and made it hard to move quickly and add new ideas to the metadata. As such, when we moved from v0.1 to v0.2, we dropped the JSON Schema. Please use the human-readable documentation instead. If there is a real desire for automated validation then we could resurrect the JSON Schema, but it is a fair amount of work to maintain.

However, there are free YAML validators available to check that your files are well-formed YAML. For example: YAMLlint.
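
Or check a file locally with PyYAML (a minimal sketch; the filename is a placeholder):

import yaml

with open('building1.yaml') as fh:
    try:
        yaml.safe_load(fh)
        print('YAML syntax OK')
    except yaml.YAMLError as exc:
        print('YAML syntax error:', exc)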

Installation

If you want to use the Python package in order to concatenate the common appliance metadata then please run:

sudo python setup.py develop

Please do not use python setup.py install until I have updated setup.py to copy the relevant *.yaml files. See issue #6.

Related projects

  • Project Haystack, to quote their website, "is an open source initiative to develop tagging conventions and taxonomies for building equipment and operational data. We define standardized data models for sites, equipment, and points related to energy, HVAC, lighting, and other environmental systems." Haystack is an awesome project but it does not specify a controlled vocabulary for appliances, which is the meat of the nilm_metadata project. Where appropriate, nilm_metadata does use similar properties to Haystack (e.g. the "site_meter" property is borrowed directly from Haystack).
  • WikiEnergy "A Universe of Energy Data, Available Around the World".
  • sMAP metadata tags
    • sMAP is Berkeley's "Simple Measurement and Actuation Profile".

nilm_metadata's People

Contributors: baztastic, gjwo, jackkelly, nipunbatra, pmeira

nilm_metadata's Issues

Writing a data converter - multiple files from 1 meter

I have a collection of data from a whole-house meter in multiple files, with each file containing two columns: a timestamp string and an integer metric (active or reactive power). Each file contains only one type of metric; the files are differentiated by standardised names. Each file contains multiple days' worth of 1-second granularity data, and there are many files covering a wider span of time. The files come in pairs, for example:
4-POWER_REAL_FINE 2013-11-20 Dump.csv <-> 5-POWER_REACTIVE_STANDARD 2013-11-20 Dump.csv
4-POWER_REAL_FINE 2013-11-28 Dump.csv <-> 5-POWER_REACTIVE_STANDARD 2013-11-28 Dump.csv
4-POWER_REAL_FINE 2013-12-01 Dump.csv <-> 5-POWER_REACTIVE_STANDARD 2013-12-01 Dump.csv
I have already written the code to walk the directory tree and find the matching pairs.
I have followed the structure that has datasetname/building1/elec
Question 1: should I add another layer of ../meter1 to allow for the possibility of having more meters in the future, so I can keep the files for each meter separate?
Question 2: is there any scheme for incrementally adding data that should be included in a converter, or would a converter always read in all the data and construct a new HDF5 file?
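
For reference, a minimal sketch of the pairing step described above, assuming the standardised filenames shown:

import os
from glob import glob

# Find each real-power dump and look up its reactive-power partner.
real_files = sorted(glob('4-POWER_REAL_FINE * Dump.csv'))
pairs = []
for real in real_files:
    reactive = real.replace('4-POWER_REAL_FINE', '5-POWER_REACTIVE_STANDARD')
    if os.path.isfile(reactive):
        pairs.append((real, reactive))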

yaml error calling convert_yaml_to_hdf5

Can anybody tell me what is wrong with this section of YAML? It starts at line 549 of building1.yaml; line 559 is the components line. The error traceback from a call to convert_yaml_to_hdf5 is given below. I know the appliance types don't yet contain a cassette deck, but I don't think it got that far.

- original_name: HiFi
  description: Teac compact H500 separates, output 50 watts per channel into 8Ω (stereo),
    Total harmonic distortion 0.03%, Input sensitivity 2.8mV (MM), 180mV (line),
    Signal to noise ratio 67dB (MM), 95dB (line), Channel separation 65dB (line),
    Speaker load impedance 4Ω to 16Ω, Dimensions 285 x 131 x 319mm, Weight 7kg
  manufacturer: Teac
  brand: Reference 500
  type: audio system
  room: lounge
  meters: [1]
  components:
  - {type: CD player, model: PD-H500i, nominal_consumption: 40}
  - {type: audio amplifier, model: A-H500i, nominal_consumption: 500}
  - {type: radio, subtype: analogue, model: T-H500, nominal_consumption: 40}
  - {type: cassette deck, model: T-H500, nominal_consumption: 40}
  year_of_purchase: 1990 #approx
  dates_active: {start:  1990-02-01}
ParserError                               Traceback (most recent call last)
<ipython-input-1-352366a75b30> in <module>()
    132     df = df.sort_index()
    133     return df
--> 134 convert_gjw('C:/Users/GJWood/nilm_gjw_data',None)

<ipython-input-1-352366a75b30> in convert_gjw(gjw_path, output_filename, format)
    109             break # only 1 folder with .csv files at present
    110     store.close()
--> 111     convert_yaml_to_hdf5(join(gjw_path, 'metadata'),output_filename)
    112     print("Done converting gjw to HDF5!")
    113 

c:\users\gjwood\nilm_metadata\nilm_metadata\convert_yaml_to_hdf5.pyc in convert_yaml_to_hdf5(yaml_dir, hdf_filename)
     48         except:
     49             group = store._handle.get_node('/' + building)
---> 50         building_metadata = _load_file(yaml_dir, fname)
     51         elec_meters = building_metadata['elec_meters']
     52         _deep_copy_meters(elec_meters)

c:\users\gjwood\nilm_metadata\nilm_metadata\convert_yaml_to_hdf5.pyc in _load_file(yaml_dir, yaml_filename)
    100     if isfile(yaml_full_filename):
    101         with open(yaml_full_filename) as fh:
--> 102             return yaml.load(fh)
    103     else:
    104         print(yaml_full_filename, "not found.", file=stderr)

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\__init__.pyc in load(stream, Loader)
     69     loader = Loader(stream)
     70     try:
---> 71         return loader.get_single_data()
     72     finally:
     73         loader.dispose()

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\constructor.pyc in get_single_data(self)
     35     def get_single_data(self):
     36         # Ensure that the stream contains a single document and construct it.
---> 37         node = self.get_single_node()
     38         if node is not None:
     39             return self.construct_document(node)

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in get_single_node(self)
     34         document = None
     35         if not self.check_event(StreamEndEvent):
---> 36             document = self.compose_document()
     37 
     38         # Ensure that the stream contains no more documents.

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_document(self)
     53 
     54         # Compose the root node.
---> 55         node = self.compose_node(None, None)
     56 
     57         # Drop the DOCUMENT-END event.

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_node(self, parent, index)
     82             node = self.compose_sequence_node(anchor)
     83         elif self.check_event(MappingStartEvent):
---> 84             node = self.compose_mapping_node(anchor)
     85         self.ascend_resolver()
     86         return node

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_mapping_node(self, anchor)
    131             #    raise ComposerError("while composing a mapping", start_event.start_mark,
    132             #            "found duplicate key", key_event.start_mark)
--> 133             item_value = self.compose_node(node, item_key)
    134             #node.value[item_key] = item_value
    135             node.value.append((item_key, item_value))

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_node(self, parent, index)
     80             node = self.compose_scalar_node(anchor)
     81         elif self.check_event(SequenceStartEvent):
---> 82             node = self.compose_sequence_node(anchor)
     83         elif self.check_event(MappingStartEvent):
     84             node = self.compose_mapping_node(anchor)

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_sequence_node(self, anchor)
    109         index = 0
    110         while not self.check_event(SequenceEndEvent):
--> 111             node.value.append(self.compose_node(node, index))
    112             index += 1
    113         end_event = self.get_event()

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_node(self, parent, index)
     82             node = self.compose_sequence_node(anchor)
     83         elif self.check_event(MappingStartEvent):
---> 84             node = self.compose_mapping_node(anchor)
     85         self.ascend_resolver()
     86         return node

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\composer.pyc in compose_mapping_node(self, anchor)
    125         if anchor is not None:
    126             self.anchors[anchor] = node
--> 127         while not self.check_event(MappingEndEvent):
    128             #key_event = self.peek_event()
    129             item_key = self.compose_node(node, None)

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\parser.pyc in check_event(self, *choices)
     96         if self.current_event is None:
     97             if self.state:
---> 98                 self.current_event = self.state()
     99         if self.current_event is not None:
    100             if not choices:

c:\Users\GJWood\Anaconda\lib\site-packages\yaml\parser.pyc in parse_block_mapping_key(self)
    437             token = self.peek_token()
    438             raise ParserError("while parsing a block mapping", self.marks[-1],
--> 439                     "expected <block end>, but found %r" % token.id, token.start_mark)
    440         token = self.get_token()
    441         event = MappingEndEvent(token.start_mark, token.end_mark)

ParserError: while parsing a block mapping
  in "C:\Users\GJWood\nilm_gjw_data\metadata\building1.yaml", line 549, column 3
expected <block end>, but found '-'
  in "C:\Users\GJWood\nilm_gjw_data\metadata\building1.yaml", line 559, column 3

real power only

Hello,

I'd like to use NILMTK for my own input data, from a smart plug which delivers a pulse for each 1 Wh consumed, from which I calculate the real power consumption.
So I have only the real power consumption. My question: is it possible to use this toolkit, and how? And for the metadata that I have to build for my own data, is there any tool to help build it?
I will be grateful if someone can help me.
Thank you very much.

Documentation inconsistent in modeling multi-phase site meters

What should be assigned to submeter_of if you have a 3-phase main meter and you know to which specific line the submeters are connected?

The 3-phase main meter is modeled as the first three entries in the dict elec_meters, which are all site_meters.

  • In the tutorial (Chapter 1.1.1 Simple example in nilm_metadata/docs/source/tutorial.rst) every submeter is submeter_of one of these main meters, so either 1 or 2.
  • But in the documentation of ElecMeter (Chapter 2.4 in nilm_metadata/docs/source/dataset_metadata.rst) it says (even twice in a paragraph) "All non-site-meters directly downstream of the site meters should set submeter_of=0. Optionally also use phase to describe which phase this meter measures." In the metadata for REDD it's done this way, too, but without specifying the phase field.

So which is the right way to do it?

Make _load_file() public in convert_yaml_to_hdf5

The private function _load_file(), which loads a YAML file into a dict, would be quite useful in the CSVDataStore. Would it make sense to make it public, and maybe call it yaml_file_to_dict() or similar?
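
A sketch of what the public helper might look like, based on the _load_file() shown in the traceback above (the name yaml_file_to_dict is the suggestion here, not an existing API):

from os.path import join, isfile
import yaml

def yaml_file_to_dict(yaml_dir, yaml_filename):
    """Load a single YAML metadata file into a dict."""
    full_filename = join(yaml_dir, yaml_filename)
    if not isfile(full_filename):
        raise IOError(full_filename + ' not found.')
    with open(full_filename) as fh:
        return yaml.safe_load(fh)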

Issues with converting iawe metadata

Stacktrace

/home/nipun/git/nilmtk/nilmtk/dataset_converters/iawe/convert_iawe.pyc in convert_iawe(iawe_path, hdf_filename)
     69     store.close()
     70     convert_yaml_to_hdf5(join(_get_module_directory(), 'metadata'),
---> 71                          hdf_filename)
     72 
     73     print("Done converting iAWE to HDF5!")

/home/nipun/git/nilm_metadata/nilm_metadata/convert_yaml_to_hdf5.pyc in convert_yaml_to_hdf5(yaml_dir, hdf_filename)
     51         elec_meters = building_metadata['elec_meters']
     52         _deep_copy_meters(elec_meters)
---> 53         _set_data_location(elec_meters, building)
     54         _sanity_check_meters(elec_meters, meter_devices)
     55         _sanity_check_appliances(building_metadata)

/home/nipun/git/nilm_metadata/nilm_metadata/convert_yaml_to_hdf5.pyc in _set_data_location(elec_meters, building)
     85     for meter_instance in elec_meters:
     86         data_location = '/{:s}/elec/meter{:d}'.format(building, meter_instance)
---> 87         elec_meters[meter_instance]['data_location'] = data_location
     88 
     89 def _sanity_check_meters(meters, meter_devices):

building1.yaml

instance: 1   # this is the first building in the dataset
original_name: house_1   # original name from REDD dataset
elec_meters:
  1: &EM6400
    site_meter: true
    device_model: EM6400
  2: *EM6400
  3: &jplug
    submeter_of: 0 
    device_model: jplug
  4: *jplug
  5: *jplug
  6: *jplug
  7: *jplug
  8: *jplug
  9: *jplug
  10: *jplug
  11: &current cost

appliances:
- original_name: fridge
  type: fridge
  #floor: 0
  instance: 1
  meters: [3]

- original_name: air conditioner
  type: air conditioner
  instance: 1
  #floor: 1
  meters: [4]

- original_name: air conditioner
  type: air conditioner
  instance: 2
  #floor: 1
  meters: [5]

- original_name: washing machine
  type: washing machine
  instance: 1
  #floor: 1
  meters: [6]

- original_name: laptop computer
  #floor: 1
  type: computer   
  instance: 1
  meters: [7]

- original_name: clothes iron
  #floor: 1
  type: iron
  instance: 1  
  meters: [8]

- original_name: kitchen outlets
  instance: 1  
  meters: [9]

- original_name: television
  type: television
  instance: 1
  #floor: 0
  meters: [10]

- original_name: water filter
  type: water filter
  instance: 1
  meters: [11]

- original_name: water motor
  type: motor
  instance: 1
  meters: [12]

meter_devices.yaml

EM6400:
  model: EM6400
  manufacturer: Schneider Electric
  manufacturer_url: http://www.schneider-electric.com/
  description: >
    Multifunction meter for feeders
  sample_period: 1   # the interval between samples. In seconds.
  measurements:
  - physical_quantity: power   # power, voltage, energy, current?
    type: active   # active (real power), reactive or apparent?
  - physical_quantity: power   # power, voltage, energy, current?
    type: apparent   # active (real power), reactive or apparent?
  - physical_quantity: power   # power, voltage, energy, current?
    type: reactive   # active (real power), reactive or apparent?
  - physical_quantity: frequency   # power, voltage, energy, current?
  - physical_quantity: voltage   # power, voltage, energy, current?
  - physical_quantity: energy   # power, voltage, energy, current?
    type: apparent   # active (real power), reactive or apparent?
  - physical_quantity: power factor   # power, voltage, energy, current?
  - physical_quantity: phase angle   # power, voltage, energy, current?
  wireless: false

jplug:
  description: >
  sample_period: 1
  measurements:
  - physical_quantity: power
    type: apparent
    lower_limit: 0
  wireless: true

current cost:
  description: >
  sample_period: 6
  measurements:
  - physical_quantity: power   # power, voltage, energy, current?
    type: active   # active (real power), reactive or apparent?
  - physical_quantity: power   # power, voltage, energy, current?
    type: apparent   # active (real power), reactive or apparent?
  - physical_quantity: power   # power, voltage, energy, current?
    type: reactive   # active (real power), reactive or apparent?
  - physical_quantity: frequency   # power, voltage, energy, current?
  - physical_quantity: voltage   # power, voltage, energy, current?
  - physical_quantity: energy   # power, voltage, energy, current?
    type: apparent   # active (real power), reactive or apparent?
  - physical_quantity: power factor   # power, voltage, energy, current?
  - physical_quantity: phase angle   # power, voltage, energy, current?
  - physical_quantity: current   # power, voltage, energy, current?
    type: apparent   # active (real power), reactive or apparent?
  wireless: true

dataset.yaml

name: iAWE
long_name: Indian dataset for ambient, water and electricity sensing
creators:
- Batra, Nipun
- Gulati, Manoj
- Singh, Amarjeet
- Srivastava, Mani
publication_date: 2013
institution: Indraprastha Institute of Information Technology Delhi (IIITD)
contact: [email protected]
description: 73 days of ambient, water and electricity data for a home in Delhi
subject: First dataset from a developing country
number_of_buildings: 1
timezone: Asia/Kolkata
geo_location:
  locality: Delhi   # village, town, city or state
  country: IN  # standard two-letter country code defined by ISO 3166-1 alpha-2
  latitude: 28.64 # 
  longitude: 77.11
related_documents:
- http://iawe.github.io
- >
  Nipun Batra and Manoj Gulati and Amarjeet Singh and Mani Srivastava
  It's Different: Insights into home energy consumption in India.
  In proceedings of the 5th ACM Workshop On Embedded Systems 
  For Energy-Efficient Buildings (Buildsys 2013)
  http://nipunbatra.github.io/downloads/files/buildsys_2013.pdf
schema: https://github.com/nilmtk/nilm_metadata/tree/v0.2

CSVDataStore and Metadata Questions

I'd like to use NILMTK for my own input data from a smart plug. I've read some of the documentation but am still confused about the CSVDataStore and metadata.

My input data is in CSV file format and I intend to match it to the REDD dataset format and use the REDD converter to convert it to HDF5 format. Do I have to implement the CSVDataStore before running the REDD converter? And for the metadata that I have to build for my own data, is there any tool to help build it? Thank you very much.

Total Harmonic Distortion in MeterDevice

Hello

I have a site meter which generates samples of the Total Harmonic Distortion every second. I'd like to add it to the dataset but the list of physical_quantity in MeterDevice has only {'power', 'energy', 'cumulative energy', 'voltage', 'current', 'frequency', 'power factor', 'state', 'phase angle'}.

How should I deal with it?

Regards,

Hader

Appliance, Appliance Type and Prior metadata usage questions

See also issue #19, and nilmtk issue #413, "Storing appliance fingerprints/signatures/priors needs definition".
I am trying to work out how to use metadata in conjunction with Hart '85 training and disaggregation. This has led to a closer examination of the metadata definitions, to save reinventing any wheels. I have the following questions:

  1. Why is dominant_appliance required? This seems logical for a sub-meter aimed at measuring a particular appliance, but inappropriate for a whole-house meter, where you may not know what the dominant appliance is, or it may change with time:
dominant_appliance: 
(boolean) (required if multiple appliances attached to one meter). Is this appliance responsible for most of the power demand on this meter?
  2. Usage of "on_power_threshold" and "nominal_consumption" (see definitions below). Is it intended that "on_power_threshold" be used to hold a more accurate (measured) on-power value for the appliance, for use in matching with training results, as opposed to the manufacturer's view?
on_power_threshold:
(number) watts. Not required. Default is taken from the appliance type. The threshold (in watts) used to decide if the appliance is on or off.

nominal_consumption:    
(dict): Specifications reported by the manufacturer.

on_power:   (number) active power in watts when on.
standby_power:  (number) active power in watts when in standby.
energy_per_year:
    (number) kWh per year
energy_per_cycle:
    (number) kWh per cycle
  3. Priors are defined for Appliance Types, rather than for Appliances, which is OK for the way they are defined in terms of statistical relationships between appliance types and between appliance types and general usage patterns. However, if my interpretation of "on_power_threshold" above is correct, then it is in fact being used as a de facto appliance-specific prior, and it would be better to define this in a way that could be used by the disaggregation algorithms, call it a Prior or Signature or Fingerprint, and allow Appliances to link to Priors in a similar way to Appliance Types.
  4. As discussed in nilmtk issue #413, a set of the steady states of power usage of an appliance could constitute a simple Prior or Signature or Fingerprint. To avoid confusion it might be better to call it a Signature rather than a Prior; this would keep the separation between central and dataset metadata. The definition of a Signature might look like this if it were part of the appliance data:
signature: (list of dicts) #steady_states this appliance can have
steady_state: (dict of power_values) #one or more of the power characteristics of this state
active: (number) #active power in Watts
reactive:   (number) #reactive power in VAR
apparent:   (number) #apparent power in VA
#so we might have
- original_name: Kettle
  description: stainless steel 2.5-3kw Hinged locking lid ; 360 degree base; One Cup Indicator
  manufacturer: Philips
  model: HD4671
  type: kettle
  room: kitchen
  meters: [1] 
  nominal_consumption: {on_power: 3000}
  on_power_threshold: 3000
  instance: 1
  signature: {active: 2900, reactive: 0}

A signature may need to specify a tolerance or distribution of values as discussed for Priors, and a duration range for the steady state.

CSVDataStore and Metadata Questions

I'd like to use NILMTK for my own input data from a smart plug. I've read some of the documentation but am still confused about the CSVDataStore and metadata.

My input data is in CSV file format and I intend to match it to the iAWE dataset format and use the iAWE converter to convert it to HDF5 format. Do I have to implement the CSVDataStore before running the iAWE converter? And for the metadata that I have to build for my own data, is there any tool to help build it? I have 2 buildings to process. Thank you very much.

Range of nominal_consumption?

Several of my appliance rating plates now have a range of nominal consumption, for example 1350-1600 watts; I don't think the metadata allows for this. If this represents the limits of the range of on-power consumption possible from the device, it could be quite useful when working without secondary meters.

`python setup.py install` doesn't copy over central metadata

Oli emailed this bug report:


Hi Jack,

I'm trying to write the metadata for wikienergy at the moment, but have hit a problem when I try to read in the building1.yaml metadata for my dataset. When I don't include building1.yaml in the metadata, things seem to work just fine. However, when I do include it, it generates this stacktrace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176             else:
    177                 filename = fname
--> 178             __builtin__.execfile(filename, *where)

/home/oli/nilmtk/nilmtk/dataset_converters/wikienergy/test_dataset_wikienergy.py in <module>()
     10 print 'testing queries over single table'
     11 houses = {26:('2014-05-02','2014-05-03')}
---> 12 download_wikienergy(database_username, database_password, output_directory, periods_to_load=houses)
     13 print('')
     14 '''

/home/oli/nilmtk/nilmtk/dataset_converters/wikienergy/download_wikienergy.py in download_wikienergy(database_username, database_password, hdf_filename, periods_to_load)
    206
    207     convert_yaml_to_hdf5(join(_get_module_directory(), 'metadata'),
--> 208                          hdf_filename)
    209
    210 def _wikienergy_dataframe_to_hdf(wikienergy_dataframe, store, nilmtk_building_id):

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/convert_yaml_to_hdf5.pyc in convert_yaml_to_hdf5(yaml_dir, hdf_filename)
     53         _set_data_location(elec_meters, building)
     54         _sanity_check_meters(elec_meters, meter_devices)
---> 55         _sanity_check_appliances(building_metadata)
     56         group._f_setattr('metadata', building_metadata)
     57

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/convert_yaml_to_hdf5.pyc in _sanity_check_appliances(building_metadata)
    105     """
    106     appliances = building_metadata['appliances']
--> 107     appliance_types = get_appliance_types()
    108     building_instance = building_metadata['instance']
    109

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/object_concatenation.pyc in get_appliance_types()
     12     recursively resolved.
     13     """
---> 14     appliance_types_from_disk = get_appliance_types_from_disk()
     15     appliance_types = _concatenate_all_appliance_types(appliance_types_from_disk)
     16     return appliance_types

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/file_management.pyc in get_appliance_types_from_disk()
      8
      9 def get_appliance_types_from_disk():
---> 10     obj_filenames = _find_all_appliance_type_files()
     11     obj_cache = {}
     12     for filename in obj_filenames:

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/file_management.pyc in _find_all_appliance_type_files()
     20 def _find_all_appliance_type_files():
     21     filenames = _find_all_files_with_suffix('.yaml',
---> 22 _get_appliance_types_directory())
     23     return filenames
     24

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/file_management.pyc in _get_appliance_types_directory()
     25
     26 def _get_appliance_types_directory():
---> 27     return _path_to_directory('..', 'central_metadata', 'appliance_types')
     28
     29

/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/file_management.pyc in _path_to_directory(*args)
     43 def _path_to_directory(*args):
     44     path_to_directory = join(_get_module_directory(), *args)
---> 45     assert isdir(path_to_directory)
     46     return path_to_directory
     47

AssertionError:
> /usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/file_management.py(45)_path_to_directory()
     44     path_to_directory = join(_get_module_directory(), *args)
---> 45     assert isdir(path_to_directory)
     46     return path_to_directory

so it seems like nilm-metadata is breaking while asserting that the provided path is in fact a directory. Debugging a little further:

ipdb> path_to_directory
'/usr/local/lib/python2.7/dist-packages/nilm_metadata-0.2.0-py2.7.egg/nilm_metadata/../central_metadata/appliance_types'

and it turns out that the central_metadata directory doesn't exist beside the nilm_metadata directory. I downloaded central_metadata from GitHub, which seems to have solved the problem. Is this something that should have been copied over automatically when I did 'sudo python setup.py install' for nilm-metadata?

Sorry for the long email. Feel free to call me if it's easier!
Oli

Microgeneration

Schema doesn't cover microgeneration. It needs to ;) Any thoughts?

Schema item ApplianceModel could be confusing

Given that Appliances have models, and ApplianceModel refers to the disaggregation model or method being used, it might be better named DisaggregationModel or DisaggregationMethod to improve the clarity of the schema.

Testing my own dataset!

Hi, I have my own aggregate dataset which I want to test in NILMTK. I'm not quite sure how to convert this data to the appropriate format. My data looks like this:
[screenshot of the data omitted]

Reduce coupling with HDF5

It seems like the metadata project has a close coupling with the HDF5 format. Shouldn't it be agnostic of the data format, and just deal with a NILMTK DataStore? Each DataStore has a save_metadata() function which should deal with writing metadata, rather than requiring a convert_yaml_to_ function for each type of DataStore.
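
A sketch of the decoupled interface this suggests (the method name save_metadata comes from the issue text; everything else here is illustrative):

class DataStore(object):
    """Abstract store; subclasses decide how metadata is persisted."""
    def save_metadata(self, key, metadata):
        raise NotImplementedError()

class CSVDataStore(DataStore):
    """Toy stand-in: a real implementation might write a YAML side-car file
    next to the CSV data rather than keeping the dict in memory."""
    def __init__(self):
        self._metadata = {}

    def save_metadata(self, key, metadata):
        self._metadata[key] = dict(metadata)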

Harmonic Current in MeterDevice

Hello

I have a site meter which generates samples of the 1st, 3rd, 5th, and 7th harmonic current. I would like to add it to the dataset but the list of physical_quantity in MeterDevice has only ['power', 'energy', 'cumulative energy', 'voltage', 'current', 'frequency', 'power factor', 'state', 'phase angle', 'total harmonic distortion of current'].

How should I deal with it?

For now, I have added locally to nilmtk.measurement.PHYSICAL_QUANTITIES

['1 harmonic current', '3 harmonic current', '5 harmonic current', '7 harmonic current']
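
A minimal sketch of that local workaround, assuming nilmtk.measurement.PHYSICAL_QUANTITIES is a plain, mutable list of strings:

from nilmtk import measurement

# Local, unsupported workaround: extend the controlled vocabulary in memory.
measurement.PHYSICAL_QUANTITIES.extend([
    '1 harmonic current',
    '3 harmonic current',
    '5 harmonic current',
    '7 harmonic current',
])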

It is similar to #25. @JackKelly

Regards,

Hader

Think about making metadata flatter

Some problems with the present design: the metadata is hard to navigate manually; what happens if we want to combine objects from multiple houses (e.g. with a group-by)? Also we have this really ugly problem where we repeat meter metadata even though most meters are the same (we have to do this because we can't inherit from objects within the actual metadata).

Each object (meter, appliance, building etc) would have an ID like "/REDD/building1/utility/electric/fridge1" we could also have "/UK-DALE/prototypes/meters/CurrentCostEnviR" which each instance in "/UK-DALE/buildingX/utility/electric/meterX" would reference (not inherit from)

one advantage of the flat setup is that dataset objects no longer need to be aware of the 'type' of the objects they contain. Instead they are completely ignorant of what they 'contain' and instead the contained items just reference the container.

Advantages of splitting metadata into smaller chunks: easier for humans to parse. Don't want to have to scroll up and down a long way; want the entire object to fit on screen. Also means the schema for objects can be totally ignorant of objects down the hierarchy. The problem is that we have two representations: a hierarchical representation in memory and a flattish representation on disk. But maybe that's fine. The representation on disk isn't actually flat (because we use lots of refs).

Get rid of manufacturer data from NILM Metadata; i.e. appliance metadata in the dataset just uses the appliance name as a key and then supplies all model-related info in place.

- type: dataset
  id: /REDD
  name: Reference Energy Disaggregation Dataset

- type: building
  id: /REDD/buildings/1
  geo_location: London

- type: electric
  id: /REDD/buildings/1/electric

- type: appliance
  id: /REDD/buildings/1/electricity/appliances/fridge,1
  manufacturer: Bosch
  meters: [4, 5]

or:

/REDD:
  type: dataset
  name: Reference Energy Disaggregation Dataset

/REDD/buildings/1:
  type: building
  geo_location: London

/REDD/buildings/1/electric
  type: electric

/REDD/buildings/1/electricity/appliances/fridge,1:
  type: appliance # or appliance/fridge? or fridge?
  manufacturer: Bosch
  meters: [4, 5]

or:

# The semantics will say that all objects in the root are dataset objects
/REDD:
  name: Reference Energy Disaggregation Dataset

# The semantics know how to interpret the slashes,
# (i.e. as a hierarchy) and know that 'buildings' contains
# building objects
/REDD/buildings:
  1:
    n_occupants: 5

/REDD/buildings/1/rooms:
  kitchen,1:
  lounge,1:
  bedroom,1:
    description: master bedroom
  bedroom,2:
    description: eldest child's bedroom
  study,1:
    description: also used as a spare bedroom

/REDD/manufacturers:
  CurrentCost:
    url: 
    contact:

/REDD/meters:
  EnviR:
    manufacturer: /REDD/manufacturers/CurrentCost
    sample_rate: 6

/REDD/buildings/1/utilities/electric/meters:
  1: 
    parent: /REDD/meters/EnviR
    site_meter: true

  2: 
    parent: /REDD/meters/EnviR
    submeter_of: 1

/REDD/buildings/1/utilities/electric/appliances:

  fridge,1: # using the appliance name as the key will make it more readable
    meters: [5]
    room: kitchen

  television,1:
    meters: [6]
    room: kitchen
    manufacturer: blah
    model: foo

  television,2:
    parent: /REDD/buildings/1/electric/appliances/fridge,1
    meters: [7]
    room: bedroom,1

priors could be separate objects . e.g.

# priors.yaml
---!Priors
- subject: fridge,1
  variable: on power
  dataset: REDD # optional
  data: foo
  model: blah

- subject: fridge
  variable: on power
  data: bar
  model: foo

or

Pros and cons? One pro is that we can easily keep 'appliance label data' (manufacturer etc) separate from measured data (priors).

Also, perhaps we should use separate files for different parts of the dataset e.g. 'redd.yaml', 'redd_building1.yaml', 'redd_building2.yaml' etc. But the software shouldn't enforce the filenames. Instead it loads all the files in the 'metadata' folder.
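
A minimal sketch of that loading rule (the directory name and layout are illustrative):

import yaml
from glob import glob
from os.path import join

# Load every YAML file in the 'metadata' folder, whatever it is called.
metadata = {}
for path in sorted(glob(join('metadata', '*.yaml'))):
    with open(path) as fh:
        metadata[path] = yaml.safe_load(fh)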

Concatenation is ugly. Let's stop doing that! Instead we look up category later.

Meter metadata: should be shipped with the dataset YAML file. There's little (no?) overlap in meters used in the datasets I'm aware of.

General questions on entering data

Moved from @gjwo's comment on the UK-DALE metadata issue queue: JackKelly/UK-DALE_metadata#2 (comment)

components is confusing: does this refer to any electric device, or to the base components described in https://github.com/nilmtk/nilm_metadata/blob/v0.2/central_metadata/appliance_types/components.yaml?

In order to be consistent, would this be an appropriate appliance template for me to use?

appliances:
- original_name:
  description:
  manufacturer:
  type:
  subtype:
  room:
  meters: [1]
  nominal_consumption:
  on_power_threshold:
  instance: 1
  year_of_purchase:
  dates_active:

Questions:

  • What is dominant_appliance: true for?
  • What about portable appliances?
  • What's the difference between nominal_consumption and on_power_threshold?
  • Why is there no model or model-number tag? It could be significant as signatures become available.
  • I would put the building-related data first, then rooms, then meters, then appliances, i.e. least likely to change first. Any reason not to do this?

Metadata file formats

At present, NILM Metadata uses YAML to store metadata. I've been doing some research on alternative formats. (It's quite likely that we'll stick with YAML, though).

CSV

A lot of NILM Metadata is tabular. For example, for each appliance, we need to know the dataset_id and building_id the appliance belongs in, we need to know the appliance_type (fridge, toaster etc) and appliance_instance. This type of tabular data can be stored in YAML but CSV is considerably more efficient at storing tabular data.

CSV is rather unfashionable at the moment but there is some really interesting work on making CSV a better format. For example, Jeni Tennison at the ODI wrote a blog post on "2014: The Year of CSV". To quote her blog:

There is a sweet spot between developer-friendly,
unexpressive CSV and developer-hostile, expressive Excel.

Formats such as the Simple Data Format (SDF) developed by OKF and the
DataSet Publishing Language (DSPL) developed by Google sit in that sweet spot.
They both define how to package CSV files with a metadata description (in JSON or
in XML) that describes the content of the CSV files, and how they relate to each other.

Formalising and standardising the sweet spot is the role of the CSV on the Web Working Group
which I am co-chairing with Dan Brickley.

Advantages of CSV for NILM Metadata:

  • Simple, well supported format. Lots of software can load CSV (Python, R, MATLAB, Excel, DBMSes etc)
  • More space-efficient than JSON/YAML for tabular data
  • Makes validation very easy: just load it into Pandas with dtypes specified for each column and the loader will complain about wrong types (see the sketch after this list).
  • We're thinking of storing metadata in tables within NILMTK anyway, so we could minimise the amount of code we need to write to import/export metadata if the on-disk format mimics the in-memory format.
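
A minimal sketch of that validation idea, using the hypothetical building1_labels.csv from the "Simple NILM Metadata" proposal further down this page (column names are illustrative):

import pandas as pd

# read_csv raises if a value can't be coerced to the declared dtype,
# which gives cheap type validation for free.
labels = pd.read_csv('building1_labels.csv',
                     dtype={'meter instance': int, 'label': str})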

Disadvantages of CSV for NILM Metadata:

  • No especially elegant way to add extra fields for appliances. Could have an 'extra fields' string column and specify data using YAML/JSON. Or could have additional tables, one for each extra field.
  • No elegant way to represent lists (e.g. list of meters for appliance). Either use YAML/JSON in a field or have a separate table mapping from appliances to meters.
  • CSV is not as human-readable as YAML when using a simple text editor.
  • Lots of CSV 'dialects'. Although we could just standardise on RFC4180
  • Some of our metadata definitely isn't a good fit for CSV. For example, describing the top-level dataset information would be pretty ugly to represent with CSV. So, if we did use CSV for some of our metadata, we'd still probably want to use YAML/JSON for other bits, which perhaps adds complexity. Although there is something to be said for using the most appropriate format for each type of metadata.

HDF5

HDF5 can easily store metadata. We use HDF5 for storing meter data in NILMTK but we've avoided storing metadata in the HDF5.

Advantages of using HDF5 for metadata:

  • It works very nicely with Python. I'm pretty sure we can store any picklable Python object in HDF5. Modifying specific attributes should be easy.
  • If the data are stored in HDF5 then we can store everything (the metadata and data) together in a single file (although there are situations where this might actually be a disadvantage).

Disadvantages of using HDF5 for metadata:

  • HDF5 is not as easy to read/edit as simple text formats like YAML/JSON/CSV, nor is it as well supported by software packages (try loading HDF5 into Excel!).

XML

XML doesn't map well to in-memory data structures. It's also extremely verbose. It does have a mature schema definition language though. I'm not a big fan of XML.

Conclusions

I think the issue of "CSV vs YAML" boils down to one main question: should we prioritise human-readability over machine-readability? I think we definitely should.

CSV, even though it can be loaded by any spreadsheet program, might not be especially human-readable for NILM Metadata. The reason is that CSV doesn't support lists in fields, so we would need to use multiple tables (which makes it harder for humans to parse). Or maybe it doesn't. I'll do some experiments...

Proposal for a 'simple' NILM Metadata schema

NILM Metadata tries to make it possible to capture pretty much any conceivable scenario. But, as more datasets become available, it appears that a large proportion of datasets could be described using a simpler metadata schema. It would be great to discuss the design of a "Simple NILM Metadata" schema which could exist alongside "NILM Metadata". Perhaps CSV is even easier to read than YAML in Matlab, Java etc., so it might be nice if we could use CSV. We'd have a check-list to help people decide whether they require the full expressive power of "NILM Metadata" or if they can get by with "Simple NILM Metadata".

The simple schema could also be used for adding metadata to the output of disaggregation algorithms (hence helping to simplify NILMTK disaggregation algorithm implementation); and for describing the training dataset and the responses for any future NILM competition or validation tool (I'm working with a group of MSc students who aim to produce a proof-of-concept NILM validation tool by the end of this term; here's the project spec.)

So, here's an initial proposal, using REDD as an example:

building1_labels.csv

This looks a little like labels.dat in the REDD format except that:

  • we use a comma as a separator (which is standard for CSV, and also allows us to use spaces in strings without using quotes)
  • we use the file suffix csv not dat (so that spreadsheet applications know how to open the file)
  • we use our NILM Metadata controlled vocabulary for appliance names
  • we give an instance number for each appliance
  • if there are multiple appliances measured by a meter then separate them by a semicolon e.g. 6,television#1;light#1
  • we could optionally use a third column to specify the submeter_of property. If this is not specified then we assume that anything that isn't a site meter is downstream of all site meters, and that all site meters should be summed to get the total whole-house power demand. Or maybe we should keep "Simple NILM Metadata" as simple as possible and say that any non-standard wiring hierarchy simply cannot be expressed using "Simple NILM Metadata"?
meter instance, label

1,site meter
2,site meter
3,electric oven#1
4,electric oven#1
5,fridge#1

meter_devices.csv

We also need to specify what is measured in each data file. In NILM Metadata this is done in meter_devices.yaml. In "Simple NILM Metadata" this could be done in a meter_devices.csv file. The file would contain three columns; each row would be a <meter device name>,<key>,<value> tuple, e.g.:

meter device name,key,value

site meters,sample period,1
site meters,measurements,active power;apparent power
submeters,sample period,3
submeters,measurements,active power
submeters,model,eMonitor
submeters,manufacturer,Powerhouse Dynamics

The assumption would be that all meters with the label site meter would take attributes from site meters and all other meters would take attributes from submeters. If this is not the case (e.g. if there are several types of submeter) then we could do the following (and we'd only have to specify this for the meters for which the default assumption does not hold).

meter_devices_mapping.csv
building instance,meter instance,meter device name

1,1,Current Cost
1,2,SCPM
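
For what it's worth, a minimal sketch of parsing the three-column meter_devices.csv into the nested structure that meter_devices.yaml expresses (the semicolon handling follows the proposal above; everything else is illustrative):

import csv
from collections import defaultdict

meter_devices = defaultdict(dict)
with open('meter_devices.csv') as fh:
    for row in csv.DictReader(fh):
        value = row['value']
        if ';' in value:
            # Semicolon-separated values become lists, e.g. measurements.
            value = value.split(';')
        meter_devices[row['meter device name']][row['key']] = value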

Any thoughts? If you use Matlab / Java / Scala / Julia / C++ etc, would you find it easier to load metadata described using CSV files rather than YAML files? If you maintain a dataset, is there anything in your own dataset that the proposal above cannot express?

Missing control components

When considering components, I think you are missing some components that are incorporated in many appliances and radically affect the fingerprints those appliances produce, namely the control components. These should be in the components module and built into common combinations for use in appliances. Some of these may be built into appliances at a less fundamental level. The main ones are:

  • Thermostat - may be paired with heating elements
  • Speed controller - may be paired with motors
  • Dimmer/Rheostat - may be paired with lights or heating elements
  • Timers

There may also be subclasses of these, such as:

  • discretely variable, i.e. a fixed number of stepped settings (3 fan speeds, for example)
  • continuously variable, i.e. infinite possible settings such as a lamp dimmer

This would then give you

  • variable speed motors - such as a washing machine drum motor
  • fixed speed motors - such as pumps

etc.
