speechbrain / hyperpyyaml

Extensions to YAML syntax for better python interaction

License: Apache License 2.0

Python 100.00%
yaml python speechbrain

Introduction

A crucial element of data-analysis systems is laying out all of the system's hyperparameters so they can be easily examined and modified. We add a few useful extensions to YAML (YAML Ain't Markup Language), a popular human-readable data-serialization language. These extensions support a rather expansive idea of what constitutes a hyperparameter, and reduce the Python files for data analysis to just the bare algorithm.

Security note

Loading HyperPyYAML allows arbitrary code execution. This is a feature: HyperPyYAML allows you to construct anything and everything you need in your experiment. However, take care to verify any untrusted recipes' YAML files just as you would verify the Python code.

YAML basics

YAML is a data-serialization language, similar to JSON, and it supports three basic types of nodes: scalar, sequence, and mapping. PyYAML naturally converts sequence nodes to Python lists and mapping nodes to Python dicts.

Scalar nodes can take one of the following forms:

string: abcd  # No quotes needed
integer: 1
float: 1.3
bool: True
none: null

Note that we've used a simple mapping to demonstrate the scalar nodes. A mapping is a set of key: value pairs, defined so that the key can be used to easily retrieve the corresponding value. In addition to the format above, mappings can also be specified in a similar manner to JSON:

{foo: 1, bar: 2.5, baz: "abc"}

Sequences, or lists of items, can also be specified in two ways:

- foo
- bar
- baz

or

[foo, bar, baz]

Note that when not using the inline version, YAML uses whitespace to denote nested items:

foo:
    a: 1
    b: 2
bar:
    - c
    - d
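
For readers more familiar with Python, the nested example above corresponds to the following Python objects. This only illustrates the mapping-to-dict and sequence-to-list conversion described earlier; the variable name `config` is chosen here for illustration.

```python
# Python equivalent of the nested YAML example:
# mapping nodes become dicts, sequence nodes become lists.
config = {
    "foo": {"a": 1, "b": 2},
    "bar": ["c", "d"],
}

print(config["foo"]["b"])  # 2
print(config["bar"][0])    # c
```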

YAML has a few more advanced features (such as aliases and merge keys) that you may want to explore on your own. We will briefly discuss one here since it is relevant for our extensions: YAML tags.

Tags are added with a ! prefix, and they specify the type of the node. This allows types beyond the simple types listed above to be used. PyYAML supports a few additional types, such as:

!!set                           # set
!!timestamp                     # datetime.datetime
!!python/tuple                  # tuple
!!python/complex                # complex
!!python/name:module.name       # A class or function
!!python/module:package.module  # A module
!!python/object/new:module.cls  # An instance of a class

These can all be quite useful. However, we found this system a bit cumbersome, especially given how frequently we were using these tags. So we decided to implement some shortcuts for these features, which we are calling "HyperPyYAML".

HyperPyYAML

We make several extensions to YAML, including easier object creation, nicer aliases, and tuples.

Objects

Our first extension is to simplify the structure for specifying an instance, module, class, or function. As an example:

model: !new:collections.Counter

This tag, prefixed with !new:, constructs an instance of the specified class. If the node is a mapping node, all the items are passed as keyword arguments to the class when the instance is created. A list can similarly be used to pass positional arguments. See the following examples:

foo: !new:collections.Counter
  - abracadabra
bar: !new:collections.Counter
  a: 2
  b: 1
  c: 5
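
To make the argument passing concrete, here is a minimal sketch of how a tag such as `!new:collections.Counter` could be turned into an instance. It is not HyperPyYAML's actual implementation; the helper name `construct` is hypothetical, and the sketch assumes the node has already been parsed into a Python list or dict.

```python
import importlib


def construct(dotted_name, node=None):
    """Instantiate the class named by dotted_name (hypothetical helper).

    A list node is passed as positional arguments, a dict node as
    keyword arguments, and a missing node means no arguments.
    """
    module_name, _, class_name = dotted_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if node is None:
        return cls()
    if isinstance(node, dict):
        return cls(**node)
    if isinstance(node, list):
        return cls(*node)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


foo = construct("collections.Counter", ["abracadabra"])
bar = construct("collections.Counter", {"a": 2, "b": 1, "c": 5})
print(foo["a"])  # 5
print(bar["c"])  # 5
```

In other words, the two YAML entries above behave like `Counter("abracadabra")` and `Counter(a=2, b=1, c=5)`.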

We also simplify the interface for specifying a function or class or other static Python entity:

add: !name:operator.add

This code stores the add function. It can later be used in the usual way:

>>> from hyperpyyaml import load_hyperpyyaml
>>> loaded_yaml = load_hyperpyyaml("add: !name:operator.add")
>>> loaded_yaml["add"](2, 4)
6

Aliases

Another extension is a nicer alias system that supports things like string interpolation. We've added a tag written !ref that takes keys in angle brackets, and searches for them inside the yaml file itself. As an example:

folder1: abc/def
folder2: ghi/jkl
folder3: !ref <folder1>/<folder2>

foo: 1024
bar: 512
baz: !ref <foo> // <bar> + 1

This allows us to change some values and automatically change the dependent values accordingly. You can also refer to other references, and to sub-nodes using brackets.

block_index: 1
cnn1:
    out_channels: !ref <block_index> * 64
    kernel_size: (3, 3)
cnn2: 
    out_channels: !ref <cnn1[out_channels]>
    kernel_size: (3, 3)
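
The reference examples above can be sketched in plain Python as follows. This is only an illustration of the idea (substitute each `<key>`, then evaluate simple arithmetic), not how HyperPyYAML is implemented; the names `lookup`, `evaluate`, and `resolve` are hypothetical.

```python
import ast
import operator
import re

# Matches <key> and <key[sub][subsub]> style references.
REFERENCE = re.compile(r"<(\w+)((?:\[\w+\])*)>")

ARITHMETIC = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
}


def lookup(config, key, subkeys):
    """Fetch config[key], then follow any [sub] parts."""
    value = config[key]
    for sub in re.findall(r"\[(\w+)\]", subkeys):
        value = value[sub]
    return value


def evaluate(text):
    """Evaluate numbers combined with + - * / //, and nothing else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ARITHMETIC:
            return ARITHMETIC[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"not simple arithmetic: {text!r}")
    return walk(ast.parse(text.strip(), mode="eval").body)


def resolve(expression, config):
    """Substitute references, then try arithmetic; fall back to a string."""
    text = REFERENCE.sub(
        lambda m: str(lookup(config, m.group(1), m.group(2))), expression
    )
    try:
        return evaluate(text)
    except (ValueError, SyntaxError):
        return text


config = {"folder1": "abc/def", "folder2": "ghi/jkl",
          "foo": 1024, "bar": 512, "block_index": 1}
config["cnn1"] = {"out_channels": resolve("<block_index> * 64", config)}

print(resolve("<folder1>/<folder2>", config))    # abc/def/ghi/jkl
print(resolve("<foo> // <bar> + 1", config))     # 3
print(resolve("<cnn1[out_channels]>", config))   # 64
```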

Finally, you can make references to nodes that are objects, not just scalars.

yaml_string = """
foo: !new:collections.Counter
  a: 4
bar: !ref <foo>
baz: !copy <foo>
"""
loaded_yaml = load_hyperpyyaml(yaml_string)
loaded_yaml["foo"].update({"b": 10})
print(loaded_yaml["bar"])
print(loaded_yaml["baz"])

This provides the output:

Counter({'b': 10, 'a': 4})
Counter({'a': 4})

Note that !ref does not copy the object: foo and bar refer to the same Counter, so updating foo also updates bar. If you want an independent deep copy, use the !copy tag.
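
In plain Python terms, the output above is what you would see if bar were bound to the same object as foo and baz were a deep copy of it. The following sketch reproduces that behaviour without HyperPyYAML:

```python
import copy
from collections import Counter

foo = Counter(a=4)
bar = foo                 # like !ref <foo>: same object
baz = copy.deepcopy(foo)  # like !copy <foo>: independent copy

foo.update({"b": 10})
print(bar)  # Counter({'b': 10, 'a': 4})
print(baz)  # Counter({'a': 4})
```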

Some issues (#7, #11) report that !ref cannot refer to the return value of an !apply function. To address this, we provide a separate !applyref tag that works with !ref. It can be used in four ways:

# 1. Pass the positional and keyword arguments at the same time. Like `!!python/object/apply:module.function` in pyyaml
c: !applyref:sorted
    _args: 
        - [3, 4, 1, 2]
    _kwargs:
        reverse: False
d: !ref <c>-<c>

# 2. Only pass the keyword arguments
e: !applyref:random.randint
    a: 1
    b: 3
f: !ref <e><e>

# 3. Only pass the positional arguments
g: !applyref:random.randint
    - 1
    - 3
h: !ref <g><g>

# 4. No arguments
i: !applyref:random.random
j: !ref <i><i>

Note that the function called by !applyref must return a value that can be represented in YAML (such as a number, string, or list); returning an arbitrary object raises a RepresenterError.

Tuples

One last minor extension to the yaml syntax we've made is to implicitly resolve any string starting with ( and ending with ) to a tuple. This makes the use of YAML more intuitive for Python users.
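
As a rough illustration of this rule, the sketch below converts a parenthesized scalar into a tuple. It is an assumption-laden stand-in, not the library's resolver: the function name `resolve_tuple` is hypothetical, and it only handles Python literals inside the parentheses.

```python
import ast


def resolve_tuple(scalar):
    """Turn a "(...)" scalar into a tuple; return other scalars unchanged."""
    if scalar.startswith("(") and scalar.endswith(")"):
        # Re-read the contents as a list literal, then convert to a tuple.
        items = ast.literal_eval("[" + scalar[1:-1] + "]")
        return tuple(items)
    return scalar


print(resolve_tuple("(3, 3)"))  # (3, 3)
print(resolve_tuple("abcd"))    # abcd
```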

How to use HyperPyYAML

All of the listed extensions are available by loading YAML using the load_hyperpyyaml function. This function returns an object in a similar manner to pyyaml and other YAML libraries. load_hyperpyyaml also takes an optional argument, overrides, which allows changes to any of the parameters listed in the YAML. The following example demonstrates changing the out_channels of the CNN layer:

>>> yaml_string = """
... block_index: 1
... cnn1:
...   out_channels: !ref <block_index> * 64
...   kernel_size: (3, 3)
... cnn2: 
...   out_channels: !ref <cnn1[out_channels]>
...   kernel_size: (3, 3)
... """
>>> overrides = {"block_index": 2}
>>> hyperparameters = load_hyperpyyaml(yaml_string, overrides)
>>> hyperparameters["block_index"]
2
>>> hyperparameters["cnn2"]["out_channels"]
128
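
The merge step behind overrides can be pictured as a recursive dictionary update. The sketch below is a simplified illustration, not the library's code: the function name `apply_overrides` and its `must_match` flag are assumptions made for this example. In the example above the overrides take effect before references are resolved, which is why cnn2's out_channels becomes 128; the sketch shows only the merge.

```python
from collections.abc import Mapping


def apply_overrides(params, overrides, must_match=False):
    """Recursively merge overrides into params (hypothetical helper)."""
    for key, value in overrides.items():
        if must_match and key not in params:
            raise KeyError(f"override {key!r} not found in parameters")
        if isinstance(value, Mapping) and isinstance(params.get(key), Mapping):
            # Descend so that sibling keys in the nested mapping survive.
            apply_overrides(params[key], value, must_match)
        else:
            params[key] = value
    return params


params = {"block_index": 1, "cnn1": {"out_channels": 64, "kernel_size": (3, 3)}}
apply_overrides(params, {"block_index": 2, "cnn1": {"kernel_size": (5, 5)}})
print(params["block_index"])          # 2
print(params["cnn1"]["kernel_size"])  # (5, 5)
print(params["cnn1"]["out_channels"]) # 64
```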

Conclusion

We've defined a number of extensions to the YAML syntax, designed to make it easier to use for hyperparameter specification. Feedback is welcome!

hyperpyyaml's People

Contributors

adel-moumen, aqzlpm11, gaetanlepage, gastron, matln, pplantinga, weiwei-ww, xin-w8023

hyperpyyaml's Issues

Unexpected behaviour when using references with the results of !apply

The results of !apply behave differently from ordinary, fixed variables when used inside of references. It appears that what is resolved in the reference is the arguments to !apply and not the result.

Given below is a YAML example to illustrate this.

YAML:

a: "a"
b: !apply:operator.add
  - "b"
  - "c"
c: !apply:lib.test
d: !ref <a>-<b>-<c>

Python:

test.py:

from hyperpyyaml import load_hyperpyyaml

yaml = """
a: "a"
b: !apply:operator.add
  - "b"
  - "c"
c: !apply:lib.test
d: !ref <a>-<b>-<c>
"""

yaml_dict = load_hyperpyyaml(yaml)
print(yaml_dict)

lib.py:

def test():
    return "test"

Shell:

python test.py

Expected Output:

{'a': 'a', 'b': 'bc', 'c': 'test', 'd': "a-bc-test"}

Actual Output:

{'a': 'a', 'b': 'bc', 'c': 'test', 'd': "a-['b', 'c']-"}

Float gets parsed as string

Hi, example

import os
from hyperpyyaml import load_hyperpyyaml

def test(val):
    if os.path.exists('bla'):
        os.remove('bla')
    fh = open('bla', 'a+')
    fh.write('bla: 0.0001\n')
    fh.seek(0)
    dct = {'bla': val}
    params = load_hyperpyyaml(fh, overrides=dct)
    print(type(params['bla']))

def main():
    test(1e-4)
    test(1e-6)

main()

prints

<class 'float'>
<class 'str'>

can't figure out why this is happening myself..?

overrides not working properly for load_hyperpyyaml with !include

Suppose that I have the following two YAML files:

f1.yaml:

k1: v1
k2: !include:f2.yaml

f2.yaml

k3: v3

With no overrides, after calling load_hyperpyyaml, we should get {k1: v1, k2: {k3: v3}}.

When load_hyperpyyaml is called with overrides={k1: new_v1}, the expected result should be {k1: new_v1, k2: {k3: v3}} (i.e. with v1 replaced by new_v1). However, the actual result is {k1: new_v1, k2: {k3: v3, k1: new_v1}}, with an additional unexpected key inserted to k2.

Could not `!ref` a `!apply` value

Hi,
I found that the !ref tag is replaced before the !apply function is evaluated. As a result, the alias system fails when !apply values are used in !ref. For example,

yaml_string = """
apply_date: !apply:time.strftime ["%Y-%m-%d"]
apply_time: !apply:datetime.datetime.now
apply_sum: !apply:sum
    - [1, 2]
 
output: !ref <apply_date>/<apply_sum>/<apply_time>/some_folder
date: !ref <apply_date>/some_folder
time: !ref <apply_time>/some_folder
sum: !ref <apply_sum>/some_folder
"""
load_hyperpyyaml(yaml_string)

will result in

{
 'apply_date': '2021-08-31',
 'apply_sum': 3,
 'apply_time': datetime.datetime(2021, 8, 31, 14, 46, 32, 46037),
 'date': "['%Y-%m-%d']/some_folder",
 'output': "['%Y-%m-%d']/[[1, 2]]//some_folder",
 'sum': '[[1, 2]]/some_folder',
 'time': '/some_folder'
}

The references inside !ref are replaced by the arguments passed to !apply, or by an empty string.

`!applyref` two functions at a time

Hi,

Thanks for creating this config tool. I have a use case like this:

In Python, I can easily chain two function calls, like

import datetime
a = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

How can I do that in YAML file? I have tried

a: !applyref:datetime.datetime.now.strftime
    format: "%Y-%m-%d_%H-%M-%S"

b: !ref <a><a>

but it doesn't work.

Any help is appreciated. Thank you!
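
One possible workaround, sketched here under assumptions, is to wrap the chained call in a single function and point the tag at that function. The module name `my_utils` and the function name `timestamp` are hypothetical, and the sketch assumes `!applyref` can import a function from a user-defined module on the Python path.

```python
import datetime


def timestamp(format):
    """Return the current time formatted with the given strftime pattern."""
    return datetime.datetime.now().strftime(format)


# If this function lived in a (hypothetical) module my_utils.py,
# the YAML could then call it with a single tag:
#
#   a: !applyref:my_utils.timestamp
#       format: "%Y-%m-%d_%H-%M-%S"
#   b: !ref <a><a>

print(timestamp("%Y-%m-%d_%H-%M-%S"))
```

Because the wrapper returns a plain string, the result should be representable in YAML.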

hyperpyyaml sometimes loads scientific e notation as string

When calling hparams=load_hyperpyyaml("a: 1e-6"), hparams["a"] is a string. This can be avoided by calling hparams=load_hyperpyyaml("a: 1.0e-6") instead.

This would not be much of an issue, but when using operations on references in the yaml such as

hparams=load_hyperpyyaml(
"""
a: 0.0001
b: 10.0
c: !ref <a> / <b>
"""
)

which results in hparams["c"] = '1e-5' being a string, the user has no control. HyperPyYAML resolves the references and the division to 1e-5 rather than 1.0e-5, which is parsed as a string and not a float.
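
A plausible explanation, assuming scalars are resolved with the YAML 1.1 float rule (as PyYAML's default resolver does), is that the rule requires a dot in the mantissa and a sign in the exponent. The sketch below uses a simplified version of that pattern, covering only decimal forms, to show which spellings qualify; the function name is hypothetical.

```python
import re

# Simplified YAML 1.1 float pattern for decimal numbers:
# the mantissa must contain a dot and the exponent must carry a sign.
YAML11_FLOAT = re.compile(
    r"[-+]?(?:[0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)(?:[eE][-+][0-9]+)?"
)


def looks_like_yaml11_float(text):
    """Return True if text matches the simplified YAML 1.1 float pattern."""
    return YAML11_FLOAT.fullmatch(text) is not None


for text in ["1.0e-6", "1e-6", str(1e-4), str(1e-6)]:
    print(f"{text!r}: {looks_like_yaml11_float(text)}")
# '1.0e-6': True
# '1e-6': False
# '0.0001': True
# '1e-06': False
```

Note that Python renders 1e-4 as `0.0001` but 1e-6 as `1e-06`, so a float that passes through `str()` before being parsed can change type depending on its magnitude.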

Cannot include documents with object `!ref`s

Right now, the !include tag first resolves references in the included document, turning objects into anchors like *id001. The names of these anchors collide with any references in the original document.

tuple with `!ref` tag

Hi,

I am trying to use !ref tag inside a tuple but it fails.

load_hyperpyyaml(
"""
a: 10
b: (5, !ref <a>)
""")

gives the following error:

ConstructorError: could not determine a constructor for the tag '!ref'
  in "<unicode string>", line 1, column 5:
    [5, !ref <a>]
        ^

Overriding any value get parsed as a string

A boolean, float, int in a YAML file gets parsed as its type. But if it's overridden, it is parsed as a string.

File test.yaml

value_flt: 0.1234
value_int: 123
value_bool: True

File test.py

from argparse import ArgumentParser

from hyperpyyaml import load_hyperpyyaml

parser = ArgumentParser()
parser.add_argument('yaml_file')
parser.add_argument('--overrides', nargs='+', default=[])
parsed_args, unknown = parser.parse_known_args()

# Read values from the parsed namespace, not from the parser itself
params_file = parsed_args.yaml_file
# Pair up consecutive items: key1 value1 key2 value2 ...
o_iter = iter(parsed_args.overrides)
overrides = dict(zip(o_iter, o_iter))

with open(params_file) as test_file:
    params = load_hyperpyyaml(test_file, overrides)
    print(type(params['value_flt']))
    print(type(params['value_int']))
    print(type(params['value_bool']))

When run with $python test.py test.yaml it produces the following output:

<class 'float'>
<class 'int'>
<class 'bool'>

But running it with overrides such as $python test.py test.yaml --overrides value_flt '3.14' value_int 256 value_bool False I get the following:

<class 'str'>
<class 'str'>
<class 'str'>
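
Command-line arguments always arrive as strings, so one possible workaround is to convert them to typed Python values before building the overrides dictionary. The sketch below assumes that load_hyperpyyaml keeps the types of the values it receives in the overrides dictionary, which this report does not confirm; the helper names `parse_value` and `pairs_to_overrides` are hypothetical.

```python
import ast


def parse_value(text):
    """Interpret text as a Python literal if possible, else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def pairs_to_overrides(items):
    """Turn [key1, value1, key2, value2, ...] into a typed dict."""
    it = iter(items)
    return {key: parse_value(value) for key, value in zip(it, it)}


overrides = pairs_to_overrides(
    ["value_flt", "3.14", "value_int", "256", "value_bool", "False"]
)
print(overrides)
# {'value_flt': 3.14, 'value_int': 256, 'value_bool': False}
```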

Support new version of ruamel.yaml

Hello !

I would like to know if you plan to support/accept the latest version of ruamel.yaml (currently 0.17.32).
The motivation for this request is to repair the speechbrain package in nixpkgs. In that repository, ruamel.yaml has been updated, which has broken hyperpyyaml. In this context, it is not possible to use an older version of ruamel.yaml.

Of course, this is a very niche request and only a few users are affected.
If there is a good reason for freezing ruamel.yaml, it is surely better to keep it like this.

Relevant PR: NixOS/nixpkgs#248109

Saving config files with !new tags

Hello,

I have the following code (an MWE, minimal working example), where I want to dump a config with a !new instantiation.

from hyperpyyaml import load_hyperpyyaml

yaml_string = """
foo: !new:collections.Counter
  a: 4
bar: !ref <foo>
baz: !copy <foo>
"""
loaded_yaml = load_hyperpyyaml(yaml_string)
from hyperpyyaml import dump_hyperpyyaml
# Importing the StringIO module.
from io import StringIO

dump_hyperpyyaml(loaded_yaml, StringIO())

This fails with the following

RepresenterError                          Traceback (most recent call last)

<ipython-input-6-c8804319bfb8> in <cell line: 14>()
     12 from io import StringIO
     13 
---> 14 dump_hyperpyyaml(loaded_yaml, StringIO())

9 frames

/usr/local/lib/python3.10/dist-packages/ruamel/yaml/representer.py in represent_undefined(self, data)
    343 
    344     def represent_undefined(self, data: Any) -> None:
--> 345         raise RepresenterError(f'cannot represent an object: {data!s}')
    346 
    347 

RepresenterError: cannot represent an object: Counter({'a': 4})

Neither the docs, nor the given Colab notebook contains an example of dumping with the !new tag. How could this be done?

`!include` statement with certain files not working

I want to implement dependency injection with my hyperpyyaml config files. This means keeping the configs of datasets, experiments, and so on as independent as possible (in different files) and then combining them using include statements. However, the following does not work:

Minimal (not-)working example:

dataset.yaml:

- name: libri_dev
  data: libri
  set: dev

main.py

from hyperpyyaml import load_hyperpyyaml

yaml_string = """
datasets: !include:dataset.yaml #should be the list of datasets in the dataset.yaml file
"""
loaded_yaml = load_hyperpyyaml(yaml_string)

Running main.py results in:

Traceback (most recent call last):
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 316, in resolve_references
    recursive_update(preview, overrides, must_match=overrides_must_match)
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 768, in recursive_update
    raise TypeError(f"Expected to update a mapping, but got: {d}")
TypeError: Expected to update a mapping, but got: [{'name': 'libri_dev', 'data': 'libri', 'set': 'dev'}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    loaded_yaml = load_hyperpyyaml(yaml_string)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 157, in load_hyperpyyaml
    yaml_stream = resolve_references(yaml_stream, overrides, overrides_must_match)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 325, in resolve_references
    _walk_tree_and_resolve("root", preview, preview, file_path)
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 406, in _walk_tree_and_resolve
    included_yaml = resolve_references(f, overrides)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.11/site-packages/hyperpyyaml/core.py", line 318, in resolve_references
    raise ValueError(
ValueError: ("The structure of the overrides doesn't match the structure of the document: ", {'datasets': []})

Tracing the error, it boils down to incomplete handling of the overrides in

if overrides is not None and overrides != "":
    if isinstance(overrides, str):
        overrides = ruamel_yaml.load(overrides)
    try:
        recursive_update(preview, overrides, must_match=overrides_must_match)
    except TypeError:
        raise ValueError(
            "The structure of the overrides doesn't match "
            "the structure of the document: ",
            overrides,
        )

The code is not able to understand the lack of overrides if it comes in the form {}.

I will submit a fix.

Bug with anchors and `!apply`

This minimal example crashes, claiming there is a duplicate anchor:

import hyperpyyaml

yaml_string = """
a: &id010
 - a
 - b

b: &id001 !new:collections.Counter
 - abcd

d: !apply:operator.add
 a: *id010
 b: *id001
"""

print(hyperpyyaml.load_hyperpyyaml(yaml_string))

gives:

yaml.composer.ComposerError: found duplicate anchor 'id001'; first occurrence
  in "<file>", line 1, column 4
second occurrence
  in "<file>", line 5, column 4
