eladrich / pyrallis

Pyrallis is a framework for structured configuration parsing from both cmd and files. Simply define your desired configuration structure as a dataclass and let pyrallis do the rest!

Home Page: https://eladrich.github.io/pyrallis/

License: MIT License

Python 100.00%
argparse dataclasses configuration-management argparse-alternative argument-parsing deep-learning hydra machine-learning python

pyrallis's Introduction

Pyrallis - Simple Configuration with Dataclasses

Pyrausta (also called pyrallis (πυραλλίς), pyragones) is a mythological insect-sized dragon from Cyprus.

Pyrallis is a simple library, derived from simple-parsing and inspired by Hydra, for automagically creating project configuration from a dataclass.

Why pyrallis?

With pyrallis your configuration is linked directly to your pre-defined dataclass, allowing you to easily create different configuration structures, including nested ones, using an object-oriented design. The parsed arguments are used to initialize your dataclass, giving you the typing hints and automatic code completion of a full dataclass object.

My First Pyrallis Example 👶

There are several key features to pyrallis but at its core pyrallis simply allows defining an argument parser using a dataclass.

from dataclasses import dataclass
import pyrallis


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    workers: int = 8 # The number of workers for training
    exp_name: str = 'default_exp' # The experiment name

def main():
    cfg = pyrallis.parse(config_class=TrainConfig)
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')

The arguments can then be specified using command-line arguments, a yaml configuration file, or both.

$ python train_model.py --config_path=some_config.yaml --exp_name=my_first_exp
Training my_first_exp with 42 workers...

This assumes the following configuration file, some_config.yaml:

exp_name: my_yaml_exp
workers: 42

Key Features

Building on that design, pyrallis offers some really enjoyable features, including:

  • Builtin IDE support for autocompletion and linting thanks to the structured config. 🤓
  • Joint reading from command-line and a config file, with support for specifying a default config file. 😍
  • Support for builtin dataclass features, such as __post_init__ and @property 😁
  • Support for nesting and inheritance of dataclasses, nested arguments are automatically created! 😲
  • A magical @pyrallis.wrap() decorator for wrapping your main function 🪄
  • Easy extension to new types using pyrallis.encode.register and pyrallis.decode.register 👽
  • Easy loading and saving of existing configurations using pyrallis.dump and pyrallis.load 💾
  • Magical --help creation from dataclasses, taking into account the comments as well! 😎
  • Support for multiple configuration formats (yaml, json, toml) using pyrallis.set_config_type ⚙️

Getting to Know The pyrallis API in 5 Simple Steps 🐲

The best way to understand the full pyrallis API is through examples, let's get started!

🐲 1/5 pyrallis.parse for dataclass Parsing 🐲

Creating an argparse configuration is really simple: just use pyrallis.parse on your predefined dataclass.

from dataclasses import dataclass, field
import pyrallis


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    # The number of workers for training
    workers: int = field(default=8)
    # The experiment name
    exp_name: str = field(default='default_exp')


def main():
    cfg = pyrallis.parse(config_class=TrainConfig)
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')


if __name__ == '__main__':
    main()

Not familiar with dataclasses? You should probably check the Python Tutorial and come back here.

The config can then be parsed directly from command-line

$ python train_model.py --exp_name=my_first_model
Training my_first_model with 8 workers...

Oh, and pyrallis also generates a --help string automatically using the comments in your dataclass 🪄

$ python train_model.py --help
usage: train_model.py [-h] [--config_path str] [--workers int] [--exp_name str]

optional arguments:
  -h, --help      show this help message and exit
  --config_path str    Path for a config file to parse with pyrallis (default:
                  None)

TrainConfig:
   Training config for Machine Learning

  --workers int   The number of workers for training (default: 8)
  --exp_name str  The experiment name (default: default_exp)

🐲 2/5 The pyrallis.wrap Decorator 🐲

Don't like the pyrallis.parse syntax?

def main():
    cfg = pyrallis.parse(config_class=TrainConfig)
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')

One can equivalently use the pyrallis.wrap syntax 😎

@pyrallis.wrap()
def main(cfg: TrainConfig):
    # The decorator automagically uses the type hint to parse arguments into TrainConfig
    print(f'Training {cfg.exp_name} with {cfg.workers} workers...')

We will use this syntax for the rest of our tutorial.
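
To make the decorator feel less magical, here is a minimal sketch of how a wrap-style decorator could work using only the standard library. The names wrap_sketch and parse_flat are hypothetical and this is not pyrallis's implementation. The sketch only handles flat dataclasses whose field types can be called on a string (int, str, float).

```python
import inspect
from dataclasses import dataclass, fields
from functools import wraps


@dataclass
class TrainConfig:
    workers: int = 8
    exp_name: str = 'default_exp'


def parse_flat(config_class, args):
    """Hypothetical helper: turn ['--workers=4'] into config_class(workers=4)."""
    types = {f.name: f.type for f in fields(config_class)}
    kwargs = {}
    for arg in args:
        name, _, raw = arg.lstrip('-').partition('=')
        kwargs[name] = types[name](raw)  # e.g. int('4')
    return config_class(**kwargs)


def wrap_sketch(args):
    """Hypothetical decorator factory, loosely mimicking the idea behind pyrallis.wrap."""
    def decorator(fn):
        # The config class is read from the annotation of the first parameter.
        first_param = next(iter(inspect.signature(fn).parameters.values()))
        config_class = first_param.annotation

        @wraps(fn)
        def wrapper():
            return fn(parse_flat(config_class, args))
        return wrapper
    return decorator


@wrap_sketch(args=['--workers=4', '--exp_name=demo'])
def main(cfg: TrainConfig):
    return f'Training {cfg.exp_name} with {cfg.workers} workers...'
```

Calling main() with no arguments builds the TrainConfig from the given argument list and passes it in, which is the same calling convention the decorated main above relies on.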

🐲 3/5 Better Configs Using Inherent dataclass Features 🐲

When using a dataclass we can add additional functionality using existing dataclass features, such as the __post_init__ mechanism or @property 😁

from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import pyrallis


@dataclass
class TrainConfig:
    """ Training config for Machine Learning """
    # The number of workers for training
    workers: int = field(default=8)
    # The number of workers for evaluation
    eval_workers: Optional[int] = field(default=None)
    # The experiment name
    exp_name: str = field(default='default_exp')
    # The experiment root folder path
    exp_root: Path = field(default=Path('/share/experiments'))

    def __post_init__(self):
        # A builtin method of dataclasses, used for post-processing our configuration.
        self.eval_workers = self.eval_workers or self.workers

    @property
    def exp_dir(self) -> Path:
        # Properties are great for arguments that can be derived from existing ones
        return self.exp_root / self.exp_name


@pyrallis.wrap()
def main(cfg: TrainConfig):
    print(f'Training {cfg.exp_name}...')
    print(f'\tUsing {cfg.workers} workers and {cfg.eval_workers} evaluation workers')
    print(f'\tSaving to {cfg.exp_dir}')

$ python train_model.py --exp_name=my_second_exp --workers=42
Training my_second_exp...
    Using 42 workers and 42 evaluation workers
    Saving to /share/experiments/my_second_exp
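
Because the config is an ordinary dataclass, the __post_init__ logic and the property can be checked without any command-line parsing by instantiating the class directly. One caveat with the `or` idiom used above is that it treats 0 as "not set", so eval_workers=0 would also fall back to workers. The sketch below is a variation on the example (not part of pyrallis) that uses an explicit None check instead.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class TrainConfig:
    workers: int = field(default=8)
    eval_workers: Optional[int] = field(default=None)
    exp_name: str = field(default='default_exp')
    exp_root: Path = field(default=Path('/share/experiments'))

    def __post_init__(self):
        # Only fall back when the value was really left unset.
        if self.eval_workers is None:
            self.eval_workers = self.workers

    @property
    def exp_dir(self) -> Path:
        return self.exp_root / self.exp_name


cfg = TrainConfig(workers=42, exp_name='my_second_exp')
no_eval = TrainConfig(eval_workers=0)  # an explicit 0 is preserved
```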

Notice that in all examples we use the explicit dataclasses.field syntax. This isn't a requirement of pyrallis but rather a style choice. As some of your arguments will probably require dataclasses.field (mutable types, for example), we find it cleaner to always use the same notation.

🐲 4/5 Building Hierarchical Configurations 🐲

Sometimes configs get too complex for a flat hierarchy 😕, luckily pyrallis supports nested dataclasses 💥

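from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import pyrallis


@dataclass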
class ComputeConfig:
    """ Config for training resources """
    # The number of workers for training
    workers: int = field(default=8)
    # The number of workers for evaluation
    eval_workers: Optional[int] = field(default=None)

    def __post_init__(self):
        # A builtin method of dataclasses, used for post-processing our configuration.
        self.eval_workers = self.eval_workers or self.workers


@dataclass
class LogConfig:
    """ Config for logging arguments """
    # The experiment name
    exp_name: str = field(default='default_exp')
    # The experiment root folder path
    exp_root: Path = field(default=Path('/share/experiments'))

    @property
    def exp_dir(self) -> Path:
        # Properties are great for arguments that can be derived from existing ones
        return self.exp_root / self.exp_name

# TrainConfig will be our main configuration class.
# Notice that default_factory is the standard way to initialize a class argument in dataclasses

@dataclass
class TrainConfig:
    log: LogConfig = field(default_factory=LogConfig)
    compute: ComputeConfig = field(default_factory=ComputeConfig)

@pyrallis.wrap()
def main(cfg: TrainConfig):
    print(f'Training {cfg.log.exp_name}...')
    print(f'\tUsing {cfg.compute.workers} workers and {cfg.compute.eval_workers} evaluation workers')
    print(f'\tSaving to {cfg.log.exp_dir}')

The argument parser will be updated accordingly

$ python train_model.py --log.exp_name=my_third_exp --compute.eval_workers=2
Training my_third_exp...
    Using 8 workers and 2 evaluation workers
    Saving to /share/experiments/my_third_exp
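
The dotted names map directly onto attribute access on the nested dataclasses. As a rough illustration of that mapping, here is a standard-library sketch. The helper set_by_dotted_path is hypothetical and is not how pyrallis implements parsing; it assumes every leaf type can be called on a string.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class ComputeConfig:
    workers: int = 8
    eval_workers: int = 8


@dataclass
class LogConfig:
    exp_name: str = 'default_exp'


@dataclass
class TrainConfig:
    log: LogConfig = field(default_factory=LogConfig)
    compute: ComputeConfig = field(default_factory=ComputeConfig)


def set_by_dotted_path(cfg, dotted_name, raw_value):
    """Hypothetical helper: apply ('compute.eval_workers', '2') to a nested config."""
    *parents, leaf = dotted_name.split('.')
    target = cfg
    for name in parents:
        target = getattr(target, name)  # walk down into the nested dataclass
    if not is_dataclass(target):
        raise TypeError(f'{dotted_name} does not point into a dataclass')
    field_type = {f.name: f.type for f in fields(target)}[leaf]
    setattr(target, leaf, field_type(raw_value))  # e.g. int('2')


cfg = TrainConfig()
for arg in ['--log.exp_name=my_third_exp', '--compute.eval_workers=2']:
    name, _, raw = arg.lstrip('-').partition('=')
    set_by_dotted_path(cfg, name, raw)
```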

🐲 5/5 Easy Serialization with pyrallis.dump 🐲

As your config gets longer you will probably want to start working with configuration files. Pyrallis supports encoding a dataclass configuration into a yaml file 💾

The command pyrallis.dump(cfg, open('run_config.yaml','w')) will result in the following yaml file

compute:
  eval_workers: 2
  workers: 8
log:
  exp_name: my_third_exp
  exp_root: /share/experiments

pyrallis.dump extends yaml.dump and uses the same syntax.

Configuration files can also be loaded back into a dataclass, and can even be used together with the command-line arguments.

cfg = pyrallis.parse(config_class=TrainConfig,
                     config_path='/share/configs/config.yaml')

# or the decorator syntax
@pyrallis.wrap(config_path='/share/configs/config.yaml')
def main(cfg: TrainConfig):
    ...

# or with the --config_path command-line argument
# $ python my_script.py --log.exp_name=readme_exp --config_path=/share/configs/config.yaml

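# Or if you just want to load from a .yaml file without cmd parsing.
# An open file is passed here rather than the path string: the
# "'str' object has no attribute 'copy'" report in the issues below suggests
# that a plain string is parsed as yaml text instead of being opened.
with open('/share/configs/config.yaml') as f:
    cfg = pyrallis.load(TrainConfig, f)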

Command-line arguments have a higher priority and will override the configuration file
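
The priority order can be pictured as a layered merge in which later sources win: dataclass defaults first, then the file, then the command line. The sketch below uses plain dictionaries to stand in for the parsed sources. The merge_sources helper is hypothetical and only illustrates the ordering, not pyrallis internals.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    workers: int = 8
    exp_name: str = 'default_exp'


def merge_sources(config_class, file_values, cli_values):
    """Hypothetical helper: later sources win, so the CLI overrides the file."""
    merged = {**file_values, **cli_values}
    return config_class(**merged)  # anything still missing keeps its default


file_values = {'exp_name': 'my_yaml_exp', 'workers': 42}  # as if read from yaml
cli_values = {'exp_name': 'my_first_exp'}                 # as if parsed from argv
cfg = merge_sources(TrainConfig, file_values, cli_values)
```

This reproduces the first example in this README: workers comes from the file (42) while exp_name comes from the command line.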

Finally, one can easily extend the serialization to support new types 🔥

import numpy as np

# For decoding from cmd/yaml
pyrallis.decode.register(np.ndarray, np.asarray)

# For encoding to yaml 
pyrallis.encode.register(np.ndarray, lambda x: x.tolist())

# Or with the wrapper version instead 
@pyrallis.encode.register
def encode_array(arr: np.ndarray) -> list:
    return arr.tolist()
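
The two registration styles above follow the same pattern as functools.singledispatch from the standard library (the traceback in the issues section shows a functools dispatch frame, which suggests a similar mechanism, though that is an inference). Here is a self-contained sketch of type-based encoder registration with standard-library types. The encode function defined here is a local stand-in, not pyrallis.encode.

```python
from datetime import date
from functools import singledispatch
from pathlib import Path


@singledispatch
def encode(obj):
    """Fallback: values that are already yaml/json friendly pass through."""
    return obj


@encode.register
def _(obj: Path) -> str:
    # Decorator style: the handled type is taken from the annotation.
    return str(obj)


# Direct-call style: the type and the encoding function are passed explicitly.
encode.register(date, lambda d: d.isoformat())

encoded = [encode(Path('/share/experiments')), encode(date(2022, 1, 31)), encode(8)]
```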

🐲 That's it, you are now a pyrallis expert! 🐲

Why Another Parsing Library?

XKCD 927 - Standards

The builtin argparse has many great features but is somewhat outdated 👴, with one of its greatest weaknesses being the lack of typing. This has led to the development of many great libraries tackling different weaknesses of argparse (shout out to all the great projects out there! You rock! 🤘).

In our case, we were looking for a library that would support the vanilla dataclass without requiring dedicated classes, and would have a loading interface from both command-line and files. The closest candidates were hydra and simple-parsing, but they weren't exactly what we were looking for. Below are the pros and cons from our perspective:

Hydra

A framework for elegantly configuring complex applications from Facebook Research.

  • Pro: Supports complex configuration from multiple files and allows for overriding them from command-line.
  • Con: Does not support non-standard types, does not play nicely with dataclass.__post_init__, and requires a ConfigStore registration.

SimpleParsing

A framework for simple, elegant and typed Argument Parsing by Fabrice Normandin.

  • Pro: Strong integration with argparse, support for nested configurations together with standard arguments.
  • Con: No support for joint loading from command-line and files, dataclasses are still wrapped by a Namespace, requires dedicated classes for serialization.

We decided to create a simple hybrid of the two approaches, building from SimpleParsing with some hydra features in mind. The result, pyrallis, is a simple library that is relatively low on features, but hopefully excels at what it does.

If pyrallis isn't what you're looking for, we strongly advise you to give hydra and SimpleParsing a try (other interesting options include click, ext_argparse, jsonargparse, datargs and tap). If you do ❤️ pyrallis then welcome aboard! We're gonna have a great journey together! 🐲

Tips and Design Choices

Beware of Mutable Types (or use pyrallis.field)

Dataclasses are great (really!) but using mutable fields can sometimes be confusing. For example, say we try to code the following dataclass

@dataclass
class OptimConfig:
    worker_inds: List[int] = []
    # Or the more explicit version
    worker_inds: List[int] = field(default=[])

As [] is mutable, every instance of this dataclass would share the same list instance, so dataclasses does not allow it. Instead, dataclasses directs you to default_factory, which calls a factory function to generate the field for every new instance of your dataclass.

worker_inds: List[int] = field(default_factory=list)

Now, this works great for empty collections, but what would be the alternative for

worker_inds: List[int] = field(default=[1,2,3])

Well, you would have to create a dedicated factory function that regenerates the object, for example

worker_inds: List[int] = field(default_factory=lambda : [1,2,3])
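
Both behaviours can be verified with the standard library alone. The sketch below first shows that dataclasses rejects the list default when the class is created, then that the lambda factory gives each instance its own list. BadConfig is a throwaway name used only for the demonstration.

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadConfig:
        worker_inds: List[int] = field(default=[1, 2, 3])
    error = None
except ValueError as exc:
    error = str(exc)  # dataclasses rejects the mutable default at class creation


@dataclass
class OptimConfig:
    worker_inds: List[int] = field(default_factory=lambda: [1, 2, 3])


a, b = OptimConfig(), OptimConfig()
a.worker_inds.append(4)  # only touches a's own list
```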

Kind of annoying, and it could be confusing for a newcomer reading your code 😕 Now, while this isn't really related to parsing/configuration, we decided it could be nice to offer some syntactic sugar for such cases as part of pyrallis:

from pyrallis import field
worker_inds: List[int] = field(default=[1,2,3], is_mutable=True)

The pyrallis.field behaves like the regular dataclasses.field with an additional is_mutable flag. When toggled, the default_factory is created automatically, offering the same functionality with a more reader-friendly syntax.
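
For readers curious how such a helper can be built, here is one possible standard-library sketch. The mutable_field function is hypothetical and is not the pyrallis.field implementation; it simply wraps the default in a factory that hands out a deep copy per instance.

```python
import copy
from dataclasses import dataclass, field
from typing import List


def mutable_field(default, **kwargs):
    """Hypothetical helper: each instance receives its own deep copy of `default`."""
    return field(default_factory=lambda: copy.deepcopy(default), **kwargs)


@dataclass
class OptimConfig:
    worker_inds: List[int] = mutable_field([1, 2, 3])


first, second = OptimConfig(), OptimConfig()
first.worker_inds.append(4)  # second keeps the original [1, 2, 3]
```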

Uniform Parsing Syntax

For parsing files we opted for yaml as our format of choice, following hydra, due to its concise format. Now, let us assume we have the following .yaml file which yaml successfully handles:

compute:
  worker_inds: [0,2,3]

Intuitively we would also want users to be able to use the same syntax

python my_app.py --compute.worker_inds=[0,2,3]

However, the more standard syntax for an argparse application would be

python my_app.py --compute.worker_inds 0 2 3

We decided to use the same syntax as in the yaml files to avoid confusion when loading from multiple sources.
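
As a rough picture of what "same syntax" means, the value after the = sign is parsed the way it would be inside the file. The sketch below uses json.loads as a stand-in for a yaml parser, since json is in the standard library and agrees with yaml on a list such as [0,2,3]. The parse_override helper is hypothetical. Note also that some shells (zsh, for example) treat unquoted square brackets as glob patterns, so quoting the argument may be needed.

```python
import json


def parse_override(arg):
    """Hypothetical helper: split '--a.b=[0,2,3]' into ('a.b', [0, 2, 3])."""
    name, _, raw = arg.lstrip('-').partition('=')
    try:
        value = json.loads(raw)   # stand-in for a yaml parser
    except json.JSONDecodeError:
        value = raw               # bare words such as my_exp stay strings
    return name, value


parsed_list = parse_override('--compute.worker_inds=[0,2,3]')
parsed_str = parse_override('--log.exp_name=my_exp')
```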

Not a yaml fan? pyrallis also supports json and toml formats, using either pyrallis.set_config_type('json') or the context manager form with pyrallis.config_type('json'):

TODOs:

  • Fix error with default Dict and List
    Underlying error: No decoding function for type ~KT, consider using pyrallis.decode.register
  • Refine the --help command
    For example, the options argument is confusing there
  • Add a test to omit_defaults

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Ido Weiss

🎨 🤔

Yair Feldman

🎨 🤔

This project follows the all-contributors specification. Contributions of any kind welcome!

pyrallis's Issues

Running on a notebook (such as Colab)?

First of all, thanks for this amazing library.

I try to run on Jupyter notebook with a config file, but I get the following error message:

AttributeError: 'str' object has no attribute 'copy'

I use this kind of code: args = pyrallis.load(TrainConfig, "a.yaml")

Do you have any idea of using pyrallis on notebooks?

Configurable Help Text

This is an awesome project! One small enhancement - it would be really nice to support "configurable" population orders for help text beyond the default (look for docstring above argument, then below, then inline).

A common pattern for many codebases I'm working with ends up looking like:

@dataclass
class Config:
    # fmt: off
    mode: str = "train"                         # Mode in < train | evaluate | visualize >

    # Data / Preprocessing Parameters
    data: Path = Path("data/")                  # Path to data directory with MNIST images / labels
    download: bool = True                       # Whether to download MNIST data (if doesn't already exist)

    # Model Parameters
    hidden_dim: int = 256                       # Hidden Layer Dimensionality for 2-Layer MLP
    # fmt: on


@pyrallis.wrap()
def main(cfg: Config) -> None:
    print(cfg)

Because of style directives (fmt: off) and "factored" configuration arguments, the help text ends up populated as:

usage: main.py [-h] [--config_path str] [--mode str] [--data str] [--download str] [--hidden_dim str]

optional arguments:
  -h, --help         show this help message and exit
  --config_path str  Path for a config file to parse with pyrallis

Config:

  --mode str         fmt: off
  --data str         Data / Preprocessing Parameters
  --download str     Whether to download MNIST data (if doesn't already exist)
  --hidden_dim str   Model Parameters

Notably, the mode, data and hidden_dim parameters are all populated incorrectly, when ideally they'd be populated from the inline comments (e.g. mode --> "Mode in < train | evaluate | visualize >").

Happy to PR with a fix if that would be easy! I think we just need to pass an additional argument to the initializer in dataclass_wrapper.py!

Handling of typing.Literal

Hey there! I've been putting pyrallis to action lately and can't over emphasize how much cleaner it's been to integrate and apply configurations.

That said, there are a couple use cases I'm interested to hear your thoughts on. First being how pyrallis handles typing.Literal. I see in decoding/decode_field that the Literal field will decode to 'Any' and that Literal[arg1, arg2, ..] will decode into arg1. In the case of Literal["constant"] this decodes to "constant", which is not a type.

More succinctly, does the pattern Literal[object] make good sense to use, and if so, what might be the way for pyrallis to handle this type appropriately?

Traceback (most recent call last):
  File "/anaconda3/envs/spec/lib/python3.9/site-packages/pyrallis/parsers/decoding.py", line 65, in decode_dataclass
    field_value = decode_field(field, raw_value)
  File "/anaconda3/envs/spec/lib/python3.9/site-packages/pyrallis/parsers/decoding.py", line 102, in decode_field
    return decode(field_type, raw_value)
  File "/anaconda3/envs/spec/lib/python3.9/functools.py", line 877, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/anaconda3/envs/spec/lib/python3.9/site-packages/pyrallis/parsers/decoding.py", line 33, in decode
    return get_decoding_fn(t)(raw_value)
  File "/anaconda3/envs/spec/lib/python3.9/site-packages/pyrallis/parsers/decoding.py", line 176, in get_decoding_fn
    raise Exception(f"No decoding function for type {t}, consider using pyrallis.decode.register")
Exception: No decoding function for type typing.Literal['onecycle', 'constant'], consider using pyrallis.decode.register
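
One possible workaround, sketched with the standard library only: read the allowed options out of the Literal with typing.get_args and validate the raw string against them. Whether pyrallis.decode.register accepts a Literal type is not stated on this page, so decode_literal below is a hypothetical standalone function; it could, for instance, be called from __post_init__ on a field annotated as str.

```python
from typing import Literal, get_args

Schedule = Literal['onecycle', 'constant']


def decode_literal(literal_type, raw_value):
    """Hypothetical decoder: accept raw_value only if it is one of the Literal options."""
    options = get_args(literal_type)
    if raw_value not in options:
        raise ValueError(f'{raw_value!r} is not one of {options}')
    return raw_value


ok = decode_literal(Schedule, 'constant')
try:
    decode_literal(Schedule, 'cosine')
    rejected = False
except ValueError:
    rejected = True
```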

Configure functions and classes

Hey,

Thanks for this library, I will try it in my next project.

One problem I often face is configuring which class or function my code should use, e.g. use ResNet50 or VGG as backbone. This often ends up in if-clauses, and in gin-config one could use the @-Syntax in the config file to bind to functions and classes.

Do you think something similar would be possible with this library too, since we can simply type custom classes like nn.Module/tf.Module/Callable/etc.?

Dumping without defaults

Hi,

When serializing a dataclass, there are cases in which we would like the output to be as concise as possible, and omit config entries if their value is the default one.

What do you think about adding a parameter to the encode/dump functions so that this behavior could be toggled on/off by the user?
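
A minimal sketch of what such a toggle could compute, written against plain dataclasses. The asdict_without_defaults helper is hypothetical and is not a pyrallis function; it compares each field with its default (or the result of its default_factory) and recurses into nested dataclasses.

```python
from dataclasses import MISSING, dataclass, field, fields, is_dataclass


@dataclass
class ComputeConfig:
    workers: int = 8
    eval_workers: int = 8


@dataclass
class TrainConfig:
    exp_name: str = 'default_exp'
    compute: ComputeConfig = field(default_factory=ComputeConfig)


def asdict_without_defaults(cfg):
    """Hypothetical helper: keep only entries that differ from their defaults."""
    out = {}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if is_dataclass(value):
            nested = asdict_without_defaults(value)
            if nested:  # drop nested configs that are entirely default
                out[f.name] = nested
            continue
        default = f.default_factory() if f.default_factory is not MISSING else f.default
        if value != default:
            out[f.name] = value
    return out


cfg = TrainConfig(compute=ComputeConfig(eval_workers=2))
compact = asdict_without_defaults(cfg)
```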

Default value for field with no value specified / required field

Hi!

Is there any way to specify that some field is required (see, e.g., click's required option)?

Also, by default one cannot use field from dataclasses with no default value (with every other parameter left also by default) while with pyrallis it is replaced with None value instead. Although, personally I am not having any problem with that (even more, with that behaviour, I can somewhat manually check for required fields via assert some_value is not None for example) but it would be much cleaner if such required options are available.
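
The manual check described above can live inside the dataclass itself, so that it runs whenever the config is constructed. This is a plain-dataclass sketch; the field name data_path is made up for the example, and how pyrallis reports an exception raised from __post_init__ is not covered here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainConfig:
    # No meaningful default exists, so None marks "not provided".
    data_path: Optional[str] = field(default=None)
    workers: int = field(default=8)

    def __post_init__(self):
        if self.data_path is None:
            raise ValueError('data_path is required, e.g. --data_path=/some/dir')


try:
    TrainConfig()
    missing_detected = False
except ValueError:
    missing_detected = True

cfg = TrainConfig(data_path='/data/mnist')
```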

Question/feature request: drop-in replace pydantic dataclass

Thanks a lot for an amazing package! It has most of the stuff I've been looking for in (many) other packages.
Is it possible to replace Python's built-in dataclass with the pydantic dataclass to allow run-time data validation with pyrallis?
It would be a great feature improvement. I haven't found the argparse+pydantic combination anywhere else.

idiomatic way to choose and init sub-config during parsing

I would like to be able to choose a model config if there are several different models. Example:

# main.py
@dataclass
class Model0Config:
    input_dim: int = 32
    output_dim: int = 32
    hidden_dim: int = 32

@dataclass
class Model1Config:
    input_dim: int = 32
    some_other_arg: str = "test_arg"

class ModelConfigs(Enum):
    model0 = Model0Config
    model1 = Model1Config

@dataclass
class TrainConfig:
    wandb_name: str = "some default name"
    model_config: ModelConfigs = field(default=ModelConfigs.model0)
   

The problem is that I cannot set up a sub-config for the model this way, as the following raises an error:

python main.py --model_config=model1 --model_config.input_dim=128

How would you implement something like this? Perhaps there is some other way? Thanks!

Is it possible to stop accepting arguments from the command line?

Hi, thank you for developing this powerful toolbox. I have a problem when importing a package that relies on pyrallis into my own code. Because I use argparse in my script, pyrallis in that package also seems to pick up my args from the command line, which leads to errors.
I wonder if it is possible to stop pyrallis from reading args from the command line (and only get them from the config file)? Thank you!
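
For context, this is how plain argparse behaves: when an explicit list is passed to parse_args, sys.argv is never consulted. The "-h" issue further down this page calls pyrallis.parse with an explicit args= list, which suggests that passing an empty list there may have the same effect, but that is an assumption rather than documented behaviour. The sketch below demonstrates the argparse side only.

```python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--workers', type=int, default=8)

# Simulate a host script that was launched with its own, unrelated flags.
original_argv = sys.argv
sys.argv = ['host_script.py', '--unrelated_flag=1']
try:
    # An explicit (empty) list means sys.argv is ignored entirely.
    ns = parser.parse_args([])
finally:
    sys.argv = original_argv  # restore the real command line
```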

Dead?

Hey,

I really like Pyrallis, and I've been using it happily since it came out. I did run into a place where I needed a new feature (subclass registries), that spiraled into a bunch of other issues. Some of it's kinda big.

Are you open to patches/new features at this point or are you doing other things? (No shame, I run out of time/energy/context to work on things too). I might fork the project if you're done with it...

TOML support

Hello,

Is there a chance to support different configuration formats, e.g. TOML or JSON? I wonder how deeply yaml is integrated.

Best,
-Justin

PS: Nice project. :)

support for -h raises an error

my code is like this.

@dataclass
class Person:
    name: str = 'John'
    age: int = 18

if __name__ == '__main__':
    import sys
    print(sys.argv)
    #main()
    parser = argparse.ArgumentParser()

    parser.add_argument('--config_path', type=str)
    parser.add_argument('hps', nargs=argparse.REMAINDER)
    args = parser.parse_args()
    print(args)
    cfg = pyrallis.parse(Person, config_path = args.config_path, args = args.hps[1:])
    print(cfg)

When I run python simple_test.py hps --help, it gives the right help message.
But when I run python simple_test.py hps -h, it raises an exception.

Is this a bug?
