mit-ll-responsible-ai / hydra-zen

Create powerful Hydra applications without the yaml files and boilerplate code.

Home Page: https://mit-ll-responsible-ai.github.io/hydra-zen/

License: MIT License

Language: Python (100%)

Topics: configuration, machine-learning, dataclasses, dynamic-configuration, yaml-configuration, pytorch, pytorch-lightning, reproducibility, reproducible-science, scalable

hydra-zen's Introduction

hydra-zen


A library that facilitates configurable, reproducible, and scalable workflows using Hydra.

Check out our documentation for more information.

Interested in machine learning? Check out our guide for using PyTorch Lightning with hydra-zen.⚡

hydra-zen is a Python library that simplifies the process of writing code (research-grade or production-grade) that is:

  • Configurable: you can configure all aspects of your code from a single interface (the command line or a single Python function).
  • Repeatable: each run of your code will be self-documenting; the full configuration of your software is saved alongside your results.
  • Scalable: launch multiple runs of your software, be it on your local machine or across multiple nodes on a cluster.

hydra-zen eliminates all hand-written yaml configs from your Hydra project. It does so by providing functions that dynamically and automatically generate dataclass-based configs for your code. It also provides a custom config-store API and task-function wrapper, which help to eliminate most of the Hydra-specific boilerplate from your project.
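
For a flavor of the API, here is a minimal sketch using the library's core, documented functions (the `train` function is just a stand-in):

from hydra_zen import builds, instantiate, to_yaml

def train(lr=0.01, epochs=10):
    return lr, epochs

# auto-generate a dataclass-based config for `train`
Conf = builds(train, populate_full_signature=True)

print(to_yaml(Conf))       # the yaml that would otherwise be written by hand
instantiate(Conf, lr=0.1)  # calls train(lr=0.1, epochs=10)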

Learn about hydra-zen at a glance.

Installation

hydra-zen is lightweight: its only dependencies are hydra-core and typing-extensions.

pip install hydra-zen

Contributing

Before opening a PR to this repo, consider posting an issue or a discussion topic to share your ideas with us. We will work with you to ensure your feature is well-scoped and that your hard work goes to good use.

(See an obvious bug or typo? Go ahead and just open a PR :) )

For further details refer to these docs.

Join the Discussion

Share ideas, ask questions, and chat with us over at hydra-zen's discussion board.

Citation

Using hydra-zen for your research? Please cite the following publication:

@article{soklaski2022tools,
  title={Tools and Practices for Responsible AI Engineering},
  author={Soklaski, Ryan and Goodwin, Justin and Brown, Olivia and Yee, Michael and Matterer, Jason},
  journal={arXiv preprint arXiv:2201.05647},
  year={2022}
}

Disclaimer

DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.

© 2024 MASSACHUSETTS INSTITUTE OF TECHNOLOGY

Subject to FAR 52.227-11 – Patent Rights – Ownership by the Contractor (May 2014)
SPDX-License-Identifier: MIT

This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

A portion of this research was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

The software/firmware is provided to you on an As-Is basis

hydra-zen's People

Contributors

cameronraysmith, dependabot[bot], jasha10, jgbos, nukularrr, rlbellaire, rsokl


hydra-zen's Issues

Complex Numbers

So builds supports instantiating complex numbers:

builds(list, [1 + 1j])
# or
builds(torch.tensor, 1 + 1j)

but omegaconf does not support serializing to yaml:

to_yaml(builds(list, [1+1j]))

leads to the following error

UnsupportedValueType: Value 'complex' is not a supported primitive type
    full_key: _args_[0][0]
    object_type=list

I'm not sure if we can add support for this, but it should at least be documented.
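
In the meantime, one possible workaround (a sketch, not a committed fix) is to represent the complex value with a nested targeted config, which serializes cleanly and reconstructs the value upon instantiation:

from hydra_zen import builds, instantiate, to_yaml

# a targeted config standing in for the raw complex value
ConfComplex = builds(complex, real=1.0, imag=1.0)

print(to_yaml(ConfComplex))  # serializes without UnsupportedValueType
assert instantiate(ConfComplex) == 1 + 1j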

hydra-zen Gotcha List

A list of "gotchas":

Configuring "Interactive" Functions

Be careful configuring functions defined in an interactive session; the resulting config records a __main__.<name> target, which cannot be imported outside of that session, undermining reproducibility.

>>> from hydra_zen import builds, to_yaml
>>> def f(p=2):
...     return p
>>> config = builds(f)
>>> print(to_yaml(config))
_target_: __main__.f
_recursive_: true
_convert_: none

OmegaConf Interpolations

Be careful in setting OmegaConf interpolations as defaults.

>>> from hydra_zen import builds, instantiate
>>> def f(p=2):
...     return p
>>> config = builds(f, p="${experiment.p}")
>>> instantiate(config)
InterpolationKeyError: Interpolation key 'experiment.p' not found

Config Defaults for Objects

If you need to override "objects" in a config, you must set those objects to None in your configuration.

>>> from hydra_zen import builds, instantiate, to_yaml
>>> from hydra_zen.experimental import hydra_run
>>> from hydra.core.config_store import ConfigStore
>>> cs = ConfigStore.instance()
>>> def f(p=2):
...     return p
>>> cs.store(group="myfun", name="f", node=builds(f))
>>> def g(q=3):
...     return q
>>> cs.store(group="myfun", name="g", node=builds(g))
>>> config = dict(myfun=builds(f))
>>> hydra_run(config, instantiate, overrides=["+myfun=g"])
Merge error : Builds_g is not a subclass of Builds_f. value: {'_target_': '__main__.g', '_recursive_': True, '_convert_': 'none'}
    full_key: 
    object_type=Config

Submitit-Launcher and Environment

If you try to submit jobs from an interactive node on Slurm, you should first remove the Slurm environment variables, since submitit copies the current environment.

import os

for k in list(os.environ):
    if "SLURM" in k:
        del os.environ[k]

Providing a class method as a target for builds

Error:

ImportError: Error instantiating 'hydra_zen.funcs.zen_processing' : Encountered error: `No module named 'pytorch_lightning.core.datamodule.from_datasets'; 'pytorch_lightning.core.datamodule' is not a package` when loading module 'pytorch_lightning.core.datamodule.from_datasets'

To reproduce:

from hydra_zen import builds, instantiate, MISSING
from pytorch_lightning import LightningDataModule
from torchvision import datasets

CIFAR10 = builds(datasets.CIFAR10,  root=MISSING,  download=True)

CIFAR10DataModule = builds(
    LightningDataModule.from_datasets,
    num_workers=4,
    batch_size=256,
    train_dataset=CIFAR10(root="${..data_dir}"),
    zen_meta=dict(data_dir=MISSING)
)

instantiate(CIFAR10DataModule, data_dir="/path/to/cifar10")

Edit:

This does not appear to be a zen_processing issue, but either a Hydra or hydra-zen issue. Removing zen_meta above provides the following error:

Traceback (most recent call last):
  File "/home/justin_goodwin/projects/raiden/hydra-zen-example/image_classifier/image_classifier/main.py", line 38, in main
    data = instantiate(cfg.experiment.data)
  File "/home/justin_goodwin/.conda/envs/raiden/lib/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 180, in instantiate
    return instantiate_node(config, *args, recursive=_recursive_, convert=_convert_)
  File "/home/justin_goodwin/.conda/envs/raiden/lib/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 240, in instantiate_node
    _target_ = _resolve_target(node.get(_Keys.TARGET))
  File "/home/justin_goodwin/.conda/envs/raiden/lib/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 104, in _resolve_target
    return _locate(target)
  File "/home/justin_goodwin/.conda/envs/raiden/lib/python3.8/site-packages/hydra/_internal/utils.py", line 587, in _locate
    raise ImportError(
ImportError: Encountered error: `No module named 'pytorch_lightning.core.datamodule.from_datasets'; 'pytorch_lightning.core.datamodule' is not a package` when loading module 'pytorch_lightning.core.datamodule.from_datasets'

Edit 2:

An issue with classmethods:

>>> print(to_yaml(builds(LightningDataModule)))
_target_: pytorch_lightning.core.datamodule.LightningDataModule

>>> print(to_yaml(builds(LightningDataModule.from_datasets)))
_target_: pytorch_lightning.core.datamodule.from_datasets

`_partial_` is now supported by Hydra's instantiate API

_partial_ = True can now be specified in structured configs to pipe instantiation through functools.partial. Added in facebookresearch/hydra#1905; it will be included in Hydra 1.2.1.

This effectively removes the need for

if _zen_partial is True:
    return _functools.partial(obj, *args, **kwargs)

other than for backwards compatibility. Instead, specifying builds(..., zen_partial=True) will simply set the _partial_ field, and will not require the zen_processing indirection at all.
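
Assuming the change lands as described, a config built with zen_partial=True could then serialize to yaml as simply as:

>>> print(to_yaml(builds(dict, a=1, zen_partial=True)))
_target_: builtins.dict
_partial_: true
a: 1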

How to apply partial when zen_processing is needed for instantiation

Given that

if _zen_partial is True:
    return _functools.partial(obj, *args, **kwargs)

occurs at the end of zen_processing, we could have _partial_ be responsible for describing all partial'd configs, even those involving indirection through zen_processing.

That being said, there are cons to having Hydra be solely responsible for applying partial when zen_processing is involved: the resulting partial'd object will have a .func attribute that points to zen_processing, not the actual target. Similarly, .keywords will include zen-specific fields. These issues do not arise when zen_processing is actually responsible for applying partial.

The con of having zen_processing be responsible for administering partial is that it complicates our PartialBuilds protocol -- we would effectively need two such protocols, if we want to be totally accurate. It would also slightly complicate the implementation of is_partial_builds.

I think that the former con, of solely using _partial_ = True to handle all partial'd configs, is the worst of the two. Having the partial'd object produced by instantiation reflect the actual target and its arguments is important to the user experience; the zen_processing function should never be exposed to users, and it is possible that some users will actually need the .func attribute to reflect the appropriate target.

Other changes:

Note get_target will not need to be modified.

Run pyright as part of CI to verify static annotations

Should be able to validate:

from hydra_zen import builds, instantiate, just

class A:
    x = 11
    def __init__(self) -> None:
        pass

def f(x: int) -> int:
    return x

# instantiating a partial'd class should yield a partial that returns an A
conf_a_partial = builds(A, hydra_partial=True)
partial_out = instantiate(conf_a_partial)
should_be_A = partial_out()

# instantiating a partial'd function should yield a partial of f
conf_f_partial = builds(f, hydra_partial=True)
partial_out_f = instantiate(conf_f_partial)
should_be_int = partial_out_f(1)

# instantiating a plain builds should yield the instance itself
conf_A = builds(A)
should_be_A_again = instantiate(conf_A)

conf_f = builds(f, x=1)
should_be_int_again = instantiate(conf_f)

# `just` should round-trip the function object itself
conf_just_f = just(f)
should_be_f = instantiate(conf_just_f)

New feature: `hydra_zen.get_target`

hydra_zen.get_target would enable users to access (and import) the target of a config.

>>> Conf = builds(dict)
>>> get_target(Conf)
<class 'dict'>

This is useful for things like loading a lightning module from a checkpoint, via a hydra yaml.

>>> ExpConfig = load_from_yaml("my_exp.yaml")
>>> LitModule = get_target(ExpConfig.lit_module)  # get lightning module class
>>> module_kwargs = instantiate(builds(dict, builds_bases=(ExpConfig.lit_module,))) 
>>> LitModule.load_from_checkpoint(checkpoint_path="./chk.ckpt", **module_kwargs)

Without get_target, one would need to do

>>> LitModule = type(instantiate(ExpConfig.lit_module))

which is a bit wasteful, since you have to initialize the whole module just to get the type.

I can imagine that get_target could be handy in simpler cases as well.

Add warning to PL example about issues of keeping models in-memory

Regarding: Run Boilerplate-Free ML Experiments with PyTorch Lightning & hydra-zen

While the following

    # return the trained model instance and the final fit
    return (
        lit_module,
        final_fit.detach().numpy().ravel(),
    )

is very convenient and appropriate for our basic how-to, it is inadvisable to return the model in-memory at the end of the job. A multirun will result in multiple models being held in-memory, which could easily lead to memory issues at-scale.

We should add a warning.

Optimizer Example

Imports

import matplotlib.pyplot as plt

import numpy as np
from torch.optim import SGD, Adadelta, Adagrad, Adam
import torch
from torch import nn

from hydra.core.config_store import ConfigStore
from hydra_zen import builds, instantiate
from hydra_zen.experimental import hydra_launch

Configs

SGDConf = builds(SGD, lr=0.3, momentum=0.5, hydra_partial=True)
AdadeltaConf = builds(Adadelta, lr=40, hydra_partial=True)
AdagradConf = builds(Adagrad, lr=0.3, hydra_partial=True)
AdamConf = builds(Adam, lr=0.1, hydra_partial=True)

cs = ConfigStore.instance()
cs.store(group="optim", name="sgd", node=SGDConf)
cs.store(group="optim", name="adadelta", node=AdadeltaConf)
cs.store(group="optim", name="adagrad", node=AdagradConf)
cs.store(group="optim", name="adam", node=AdamConf)

Model

class Model:    
    def __call__(self, z):
        if z.ndim == 2:
            x = z[:, 0]
            y = z[:, 1]
        else:
            x = z[0]
            y = z[1]
        return .1*x**2 + .2*y**2

x, y = np.arange(-2, 2, 0.1), np.arange(-2, 2, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.stack((X.flatten(), Y.flatten()), 1)
V = Model()(Z)
plt.contour(X, Y, V.reshape(len(x), len(y)), levels=[0.01, 0.12, .3, 0.5], colors="black")

Optimization Loop

def task_function(cfg):
    f = instantiate(cfg.model)
    x0 = torch.tensor([-1.5, 0.5]).requires_grad_(True)
    optim = instantiate(cfg.optim)([x0])
    trajectory = [x0.detach().clone().numpy()]
    for i in range(20):
        l = f(x0)
        optim.zero_grad()
        l.backward()
        optim.step()
        trajectory.append(x0.detach().clone().numpy())
    return np.stack(trajectory)

Execute Hydra Run

cfg = dict(model=builds(Model), optim=SGDConf(lr=0.5))
job = hydra_launch(cfg, task_function)

fig, ax = plt.subplots()
ax.contour(X, Y, V.reshape(len(x), len(y)), levels=[0.01, 0.12, .3, 0.5], colors="black")
ax.plot(job.return_value[:, 0], job.return_value[:, 1])

Execute Hydra Multirun

# NOTE: You have to set optim=None for multirun to work (omegaconf complains about merging with different types)
cfg = dict(model=builds(Model), optim=None)
jobs = hydra_launch(cfg, task_function, multirun_overrides=["+optim=sgd,adam,adagrad,adadelta"])

fig, ax = plt.subplots()
ax.contour(X, Y, V.reshape(len(x), len(y)), levels=[0.01, 0.12, .3, 0.5], colors="black")
for j in jobs[0]:
    ax.plot(j.return_value[:, 0], j.return_value[:, 1], label=j.hydra_cfg.hydra.overrides.task[0].split("=")[1])
    
plt.legend()

New feature: `hydra_zen.builds_decorator`

We had a discussion on Teams about possibly adding a builds_decorator of the form:

def builds_decorator(f):
    Conf = builds(f, populate_full_signature=True)
    f.config = Conf
    return f

Then we can automatically build configs for a function or class. For example, to get the config for the following function,

@builds_decorator
def foo(x, val=1):
  return x, val

simply access the config attribute,

FooConfig = foo.config(val=2)

Typing Error for `builds` using ART

I was hoping to make a simple example but I'm not sure what is causing the error:

from art.estimators.classification import PyTorchClassifier

builds(PyTorchClassifier)  # the config is actually incomplete but it never gets to that error

error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-81-0ae74d8b49b2> in <module>
----> 1 builds(PyTorchClassifier)

~/projects/raiden/hydra_zen/src/hydra_zen/structured_configs/_implementations.py in builds(target, populate_full_signature, hydra_partial, hydra_recursive, hydra_convert, frozen, dataclass_name, builds_bases, *pos_args, **kwargs_for_target)
    645             # target is class object...
    646             # calling `get_type_hints(target)` returns empty dict
--> 647             type_hints = get_type_hints(target.__init__)
    648         else:
    649             type_hints = get_type_hints(target)

~/.conda/envs/trusted_analytics/lib/python3.8/typing.py in get_type_hints(obj, globalns, localns)
   1262         if isinstance(value, str):
   1263             value = ForwardRef(value)
-> 1264         value = _eval_type(value, globalns, localns)
   1265         if name in defaults and defaults[name] is None:
   1266             value = Optional[value]

~/.conda/envs/trusted_analytics/lib/python3.8/typing.py in _eval_type(t, globalns, localns)
    268     """
    269     if isinstance(t, ForwardRef):
--> 270         return t._evaluate(globalns, localns)
    271     if isinstance(t, _GenericAlias):
    272         ev_args = tuple(_eval_type(a, globalns, localns) for a in t.__args__)

~/.conda/envs/trusted_analytics/lib/python3.8/typing.py in _evaluate(self, globalns, localns)
    516                 localns = globalns
    517             self.__forward_value__ = _type_check(
--> 518                 eval(self.__forward_code__, globalns, localns),
    519                 "Forward references must evaluate to types.",
    520                 is_argument=self.__forward_is_argument__)

~/software/adversarial-robustness-toolbox/art/estimators/classification/pytorch.py in <module>

NameError: name 'torch' is not defined

No issues when trying to load a PyTorchClassifier normally.

New feature: `hydra_zen.from_yaml`

hydra_zen.from_yaml would take in a yaml-string and return a corresponding structured config (dataclass object).

Why is this useful? The case that I encountered is wanting to load and inherit from a config:

# just an example

# not a dataclass object (not inheritable)
LoadedConf = OmegaConf.load(exp_dir / ".hydra/config.yaml")

# now a dataclass object (all annotations will be `Any`)
LoadedConf = hydra_zen.from_yaml(LoadedConf)

ModifiedConf = builds(Trainer, gpus=1, accelerator=None, batch_size=25, builds_bases=(LoadedConf,))

I realize that LoadedConf can have its attributes be updated directly, and that there are probably mechanisms provided by hydra/omegaconf to merge configs, but thus far we do not expect our users to use any variety of config other than a structured config dataclass. Nor do we expect them to be well-versed in omegaconf. Thus it seems like enabling this inheritance route is natural / elegant.

I would be interested to get feedback on this.

  • Is the above workflow something we want to recommend or is it simpler/cleaner to update the attributes directly (or, to leverage hydra/omegaconf)?
  • Are there any downsides/limitations to the above workflow?
  • Unanticipated hurdles to going from yaml to dataclass?

`hydra_multirun` and losing ability to create configs

hydra_multirun with a launcher messes with OmegaConf's ability to create a configuration from a builds (i.e., dataclass) object:

from hydra_zen import builds, instantiate
from hydra_zen.experimental import hydra_multirun
import pytorch_lightning as pl

conf = dict(raiden=dict(trainer=builds(pl.Trainer, accelerator="dp", num_nodes=1, gpus=1)))
job = hydra_multirun(conf, instantiate, overrides=["hydra/launcher=submitit_local"])

After running it, no matter what I try the following happens:

from omegaconf import OmegaConf
OmegaConf.create(conf)

results in

{'raiden': {'trainer': {}}}

The only way to recover is to restart the notebook.

This does not happen with hydra/launcher=basic....

`populate_full_signature` with `torch.tensor`

The following builds

from hydra_zen import make_custom_builds_fn

fbuilds = make_custom_builds_fn(populate_full_signature=True)
fbuilds(torch.tensor, 1.0)

raises this ValueError

ValueError: Building: tensor ..
<built-in method tensor of type object at 0x7fcfb3e27ec0> does not have an inspectable signature. `builds(tensor, populate_full_signature=True)` is not supported

Sweepers and Launchers

Here are some thoughts regarding the limitation of Hydra to allow custom launchers and sweepers without installing them in the hydra_plugins package.

hydra_launch and instantiating sweepers

Currently hydra_launch instantiates sweepers the same way Hydra does:

sweeper = Plugins.instance().instantiate_sweeper(
    config=task_cfg,
    config_loader=hydra.config_loader,
    task_function=task_function,
)

Unfortunately this requires the user to install the package in hydra_plugins for Hydra to use a custom sweeper. I believe we do not need the "plugin discovery" part of Hydra and therefore we can change this line to:

sweeper = instantiate(task_cfg.hydra.sweeper)
assert isinstance(sweeper, Sweeper)
sweeper.setup(
    config=task_cfg,
    config_loader=hydra.config_loader,
    task_function=task_function,
)

Instantiating custom launchers in a sweeper

For a user to configure a custom launcher they would have to modify the sweeper setup method from the following (seems to be the same for all sweepers):

class MySweeper(Sweeper):
    ...
    def setup(
        self,
        config: DictConfig,
        config_loader: ConfigLoader,
        task_function: TaskFunction,
    ) -> None:
        self.config = config
        self.config_loader = config_loader
        self.launcher = Plugins.instance().instantiate_launcher(
            config=config, config_loader=config_loader, task_function=task_function
        )
        self.sweep_dir = config.hydra.sweep.dir

to

class MySweeper(Sweeper):
    ...
    def setup(
        self,
        config: DictConfig,
        config_loader: ConfigLoader,
        task_function: TaskFunction,
    ) -> None:
        self.config = config
        self.config_loader = config_loader
        self.launcher = instantiate(config.hydra.launcher)
        assert isinstance(self.launcher, Launcher)
        self.launcher.setup(
            config=config,
            config_loader=config_loader,
            task_function=task_function,
        )
        self.sweep_dir = config.hydra.sweep.dir

Not sure of a solution for hydra_zen other than documenting and showing a user how to do this; we might also consider shipping a SweeperZen with the above setup method.

Return object for `make_config` with `hydra_convert="all"` is a `dict`

Issue

When setting a parameter as a list in make_config, the instantiated object remains an omegaconf.ListConfig object. Yet when using hydra_convert="all", the whole config object becomes a dict, while the parameter becomes a list.

Example

from hydra_zen import make_config, instantiate

test = instantiate(make_config(val=[1, 2]))
# issue: type(test.val) == ListConfig

test_all = instantiate(make_config(val=[1,2], hydra_convert="all"))
# issue: type(test_all) == dict
# type(test_all["val"]) == list

Revamping our docs

Despite having put quite a bit of work into them... I don't like our docs!

It doesn't feel like I can confidently point a brand-new, has-never-used-Hydra, user to them and expect them to walk away feeling comfortable or happy with hydra-zen. In fact, I have had people tell me "I have wanted to use hydra-zen, but it's really intimidating".

I would like to redesign our docs to follow Divio's Grand Theory of Documentation, which breaks down docs into four quadrants:

[Diagram: Divio's four documentation quadrants: tutorials, how-to guides, explanation, reference]

Some preliminary ideas for this...

Tutorials

  • Making our first Python-launchable app with hydra-zen
  • Making our first CLI-launchable app with hydra-zen
  • Designing modular configurations with swappable groups
  • Replicating our results
  • Launching multiple runs of our app

How-To Guide

(I need to read / think more about these)

  • Design a project to be configurable, repeatable, and scalable
  • Customize configuration interfaces and behaviors with builds
  • Use yamls and python configs together
  • Use hydra-zen and PyTorch-Lightning for boilerplate-free ML
  • Submit jobs on an interactive node on Slurm
  • Write and run your own sweeper

Explanation

  • Discussion of relevant Hydra concepts; relationship between builds and dataclasses
  • Why use hydra-zen? (DRY principle)

Reference

This is pretty straightforward.

Implement Partial generic to be leveraged by instantiate

class partial(Generic[_T]):
    func: Callable[..., _T]
    args: Tuple[Any, ...]
    keywords: Dict[str, Any]
    def __init__(self, func: Callable[..., _T], *args: Any, **kwargs: Any) -> None: ...
    def __call__(self, *args: Any, **kwargs: Any) -> _T: ...
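
A quick illustration of the intended payoff (a sketch; reveal_type is the static checker's introspection hook):

def make_int() -> int:
    return 1

p = partial(make_int)  # inferred as: partial[int]
reveal_type(p())       # a checker such as pyright reveals: int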

Path to `v0.3.0`

  • The bugfix caught in 481a0a2
  • hydra_meta and hydra_partial -> zen_meta and zen_partial
  • Implementation for zen_wrappers including support for specifying wrappers via interpolated strings
  • Support for pydantic
  • Support for beartype, which requires some heavy machinery on our end to do sequence-coercion
  • A nice interface for customizing the defaults on builds
  • The beginner-friendly make_config
  • #127

After these, I think this will mark a feature-freeze on 0.3.0, which has become much more ambitious than I had expected!
Then... docs!

Inconsistent static analysis via pyright

x = [1, 2, 3]
make_config(a=[1, 2, 3])
make_config(a=x)  # pyright marks this as invalid based on our annotations

This seems like a pyright issue, but we might consider revising our annotations to accommodate it.

Hydra Launch Functionality

How hydra_launch Works

I plan to edit this; I just wanted to start getting this documented. Feel free to comment or edit.

I'll try to document how hydra_launch is meant to be used and we can discuss how to make it more user friendly.

Configurations

There are two accepted approaches for providing configurations to hydra_launch:

  1. Provide a configuration object with a "hydra" attribute
  2. Provide a configuration object without a "hydra" attribute

There are also two methods for running an experiment:

  • A. Regular Hydra run
  • B. A Hydra multirun experiment

1A

Can the user update config before running? Yes

cfg = load_config_with_hydra(config_name="config")
cfg.experiment.param = 1
hydra_launch(cfg, task_function)

Can the user use the overrides keyword argument? No
This will raise an exception.

cfg = load_config_with_hydra(config_name="config")
hydra_launch(cfg, task_function, overrides=["experiment.param=1"])

1B

multirun is only executed when multirun_overrides is set.

Can the user update config before running? No

Hydra will not run any experiments with experiment.param=1 below:

cfg = load_config_with_hydra(config_name="config")
cfg.experiment.param = 1
hydra_launch(cfg, task_function, multirun_overrides=["experiment.foo=1,2,3"])

Can the user use the overrides keyword argument? No
This will raise an exception.

cfg = load_config_with_hydra(config_name="config")
hydra_launch(cfg, task_function, overrides=["experiment.param=1"], multirun_overrides=["experiment.foo=1,2,3"])

2A

Can the user update config before running? Yes

cfg = load_config_without_hydra(config_name="config")
cfg.experiment.param = 1
hydra_launch(cfg, task_function)

Can the user use the overrides keyword argument? Yes

cfg = load_config_without_hydra(config_name="config")
hydra_launch(cfg, task_function, overrides=["experiment.param=1"])

2B

multirun is only executed when multirun_overrides is set.

Can the user update config before running? Yes

cfg = load_config_without_hydra(config_name="config")
cfg.experiment.param = 1
hydra_launch(cfg, task_function, multirun_overrides=["experiment.foo=1,2,3"])

Can the user use the overrides keyword argument? Yes

cfg = load_config_without_hydra(config_name="config")
hydra_launch(cfg, task_function, overrides=["experiment.param=1"], multirun_overrides=["experiment.foo=1,2,3"])

Mirroring hydra / omegaconf

Bringing functions under one roof (namespace), making documentation more explicit, and adding type hints (see the sketch after this list):

  • OmegaConf.to_yaml
  • OmegaConf.create
  • OmegaConf.to_container
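
A minimal sketch of what such a pass-through could look like (the docstring and annotation details are assumptions):

from typing import Any

from omegaconf import OmegaConf

def to_yaml(cfg: Any, *, sort_keys: bool = False) -> str:
    """Serialize a config to a yaml-string (a thin wrapper around OmegaConf.to_yaml)."""
    return OmegaConf.to_yaml(cfg, sort_keys=sort_keys)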

Hydra Launch and Multirun

Right now we have the signature:

def hydra_launch(
    config: Union[DataClass, DictConfig, Mapping],
    task_function: Callable[[DictConfig], Any],
    multirun_overrides: Optional[List[str]] = None,
    overrides: Optional[List[str]] = None,
    config_dir: Optional[Union[str, Path]] = None,
    config_name: str = "hydra_launch",
    job_name: str = "hydra_launch",
) -> JobReturn:

I do not believe there is a need after #32 to distinguish between overrides and multirun_overrides. The only difference is how Hydra is launched (and that Hydra's run mode will raise an exception if multirun parameters are provided). We should either add a Boolean flag is_multirun, or separate the run modes into hydra_run and hydra_multirun instead of hydra_launch.

meta fields don't get set properly when populate_full_signature=True

def foo(a, b):
    return a, b

Here, c is appropriately included in the config.

>>> print(to_yaml(builds(foo, a="${.c}", zen_meta=dict(c=2), populate_full_signature=False)))
_target_: hydra_zen.funcs.zen_processing
_zen_target: __main__.foo
_zen_exclude:
- c
a: ${.c}
c: 2

With populate_full_signature=True, c is no longer included in the config.

>>> print(to_yaml(builds(foo, a="${.c}", zen_meta=dict(c=2), populate_full_signature=True)))
_target_: hydra_zen.funcs.zen_processing
_zen_target: __main__.foo
_zen_exclude:
- c
b: ???
a: ${.c}

disclaimer for reference

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

© 2021 Massachusetts Institute of Technology.

Subject to FAR52.227-11 Patent Rights - Ownership by the contractor (May 2014)

The software/firmware is provided to you on an As-Is basis

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

Make a note about `defaults` in our config-groups tutorial

Working through this tutorial has the user configure a defaults field:

Config = make_config("player", defaults=["_self_", {"player": "base"}])

a reader might naturally look to make_config's docs for explanation...but there is none since we don't give that field any special treatment in make_config. We should at least add a callout box in the tutorial that directs them, loud and clear, to the appropriate Hydra docs.

We also might consider:

  • Mentioning it in the Notes section of make_config
  • Actually including defaults or maybe hydra_defaults as a reserved field in make_config and thus document it... but I don't really see utility in that.

Validation error specific to resolving `dict`

Update: potential edge case / bug in Hydra

def f(x): return x

instantiate(builds(f, list))  # ok
instantiate(builds(f, str))  # ok
instantiate(builds(f, int))  # ok
instantiate(builds(f, len))  # ok
instantiate(builds(f, f))  # ok

instantiate(builds(f, x=dict))  # ok
instantiate(builds(f, dict))  # ValidationError: Unsupported value type: <class 'dict'>

This all gets handled by the same code:

if _pos_args:
    base_fields.append(
        (
            _POS_ARG_FIELD_NAME,
            Tuple[Any, ...],
            _utils.field(
                default=tuple(create_just_if_needed(x) for x in _pos_args),
                init=False,
            ),
        )
    )

Providing first class support for user-specified dataclasses.field instances

There is a sizeable demand for Hydra to support custom help messages through the CLI: facebookresearch/hydra#633

It appears that the leading candidate for supporting this, at least for structured configs, is via dataclasses.field(..., metadata=<...>): facebookresearch/hydra#633 (comment) and omry/omegaconf#131 (comment).

Presently, we provide limited (and primarily undocumented) support for users to directly pass field objects to builds and make_config and do not make any assurances that we preserve any of the associated metadata. It also appears that we do not perform any value-validation on fields.

Thus I plan for us to:

  • Provide explicit support for passing field instances directly to our config-creation functions
    • ensuring metadata is preserved
    • ensuring parity with all validation that we perform
    • adding test cases that explicitly exercise fields-as-config-values

An edge case that I anticipate is that we might not permit fields as positional arguments.

Additionally ZenField should expose a metadata field. Ultimately ZenField is the same as dataclasses.field, except it can specify name and type as well; so perhaps we should expose all of the field options here, other than default_factory, which we handle automatically.

Lastly, if OmegaConf/Hydra does end up going this metadata route for supporting help messages, then we could potentially add a new feature to builds, where we can parse different styles of docstrings and auto-populate docs fields in metadata. This way, users need not duplicate any docs - the help messages will be derived directly from the source documentation!
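
For concreteness, a sketch of the field-passing usage described above (the metadata contents are illustrative, and the preserved-metadata assertion is the intended behavior, not the current one):

from dataclasses import field

from hydra_zen import builds

# intended: the metadata survives onto the generated dataclass's field
Conf = builds(dict, a=field(default=1, metadata={"help": "help text for `a`"}))
assert Conf.__dataclass_fields__["a"].metadata["help"] == "help text for `a`"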

TypeError with package `random`

Executing

import random

from hydra_zen import builds, instantiate

instantiate(builds(random.uniform, 2, 4))

Results in

TypeError: Error instantiating 'random.Random.uniform' : uniform() missing 1 required positional argument: 'b'

Presumably this is because random.uniform is a bound method of the hidden module-level random.Random instance; the import path resolves to the unbound random.Random.uniform, so 2 is consumed as self, 4 as a, and b goes unfilled.

Bypassing omegaconf issue that leads to unexpected results/bug

See omry/omegaconf#830 for the description of the bug.

In short, instantiation "ignores" child class' default-factory-field in favor of the parent's value (unless that value is specifically MISSING)

>>> A = make_config(x=1)
>>> B = make_config(x=[1, 2], bases=(A,))  # must be a mutable value to trigger the bug
>>> instantiate(B)
{'x': 1}

Given that hydra-zen users are particularly heavy users of dataclasses who do not have to deal with the cumbersome nature of manually creating default-factories (since we create the default-factories for them), I suspect they are much more likely to be affected by this bug than the average Hydra user.

Thus it is pressing that we address this on the hydra-zen end, either by hard-pinning the minimum version of omegaconf to require the patched version, or by "detecting" this scenario ourselves and using a workaround. E.g., @jgbos pointed out that we can attempt to detect and remedy this issue for users by effectively doing the following "under the hood":

B = make_config(x=builds(list, [1, 2]), bases=(A,))

this remedies the issue:

>>> A = make_config(x=1)
>>> B = make_config(x=builds(list, [1, 2]), bases=(A,))  # <- make_config(x=[1, 2], bases=(A,)) will be replaced by this
>>> instantiate(B)
{'x': [1, 2]}

Merging and moving hydra_run and hydra_multirun

@jgbos I agree that we should get rid of hydra_zen.experimental. A couple of thoughts.

  1. Is there any reason why we can't simply merge hydra_run and hydra_multirun into one function: hydra_run? They have identical interfaces. We can simply expose the option hydra_run(..., multirun=True); this would have a similar feel to using the --multirun CLI option.
  2. Regardless of where we actually put this in the code base, I propose that we expose this at the top-level of hydra-zen. E.g. imports become from hydra_zen import hydra_run

We could pretty easily make this change and deprecate all imports from hydra_zen.experimental.
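
For reference, a sketch of what the merged signature could look like (the parameter set is assumed from the current pair of functions):

def hydra_run(
    config,
    task_function,
    overrides=None,
    multirun=False,  # mirrors the --multirun CLI flag
    config_dir=None,
    config_name="hydra_run",
    job_name="hydra_run",
):
    ...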

Part of me wonders if we are making too many now-this-is-deprecated changes in one release. That being said, if I am going to go update my old hydra-zen code, I don't want to have to go back a month later and make more changes.

Thoughts?

Change signature of builds to not occupy name "target"

Our signature for builds, builds(target, ...), occupies the name target such that the following function is impossible to use with builds:

def f(*, target): return target

builds(f, target=1)  # TypeError: builds() got multiple values for argument 'target'
builds(f, 1)  # TypeError: f takes 0 positional args, but 1 were specified via `builds`

I...definitely should have seen this coming. The ideal fix would be to make target a positional-only argument, but we have to wait until we require Python 3.8 before we can do that.

In the meantime, we can opt to rename this in the fashion of the other names that are reserved by builds. I.e. call it hydra_target.

I have never encountered anyone using builds(target=...) explicitly, but I would still like to handle this as responsibly/gracefully as is reasonable. I'll try to come up with a plan for raising a deprecation warning for cases where people used target as a named argument.
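
One possible shape for that shim (a sketch; the warning text and control flow are assumptions):

import warnings

def builds(*pos_args, **kwargs_for_target):
    if not pos_args and "target" in kwargs_for_target:
        # hypothetical: builds(target=<obj>) was the old, named spelling
        warnings.warn(
            "Specifying the target via `builds(target=<target>)` is deprecated; "
            "provide it as the first positional argument instead.",
            FutureWarning,
        )
        target = kwargs_for_target.pop("target")
    else:
        target, *pos_args = pos_args
    ...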

Add support for *args in `builds`

It looks like omegaconf supports positional args in structured configs now:

from dataclasses import dataclass
from typing import Any

from hydra_zen import instantiate

def f(*args): return list(args)

@dataclass
class C:
    _target_: str = '__main__.f'
    _args_: Any = (1, 2, 3)

>>> instantiate(C)
[1, 2, 3]

Time to add *args to builds! 🙌
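
Once builds grows this support, usage could look like the following (anticipated behavior, not yet implemented):

>>> Conf = builds(f, 1, 2, 3)  # positional args populate _args_
>>> instantiate(Conf)
[1, 2, 3]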

Design Question: Omit default hydra-fields from `builds`?

Currently, builds populates all hydra-relevant fields in the dataclass even if those fields are the default values that hydra would infer if they were missing.

Potential Change / Pros of this

Our current behavior

>>> print(to_yaml(builds(dict)))
_target_: builtins.dict
_recursive_: true
_convert_: none

We could omit _recursive_ and _convert_ when these values are the defaults (and thus implicitly rely on hydra's default behavior), and thus end up with a cleaner yaml.

>>> print(to_yaml(builds(dict)))
_target_: builtins.dict

This matters more at-scale.

ExpConfig = builds(
    gradient_descent,
    optim=builds(SGD, lr=0.3, momentum=0.0, hydra_partial=True),
    landscape_fn=just(parabaloid),
    starting_xy=(-1.5, 0.5),
    num_steps=20,
)
print(to_yaml(ExpConfig))
_target_: __main__.gradient_descent
_recursive_: true
_convert_: none
optim:
  _target_: hydra_zen.funcs.partial
  _partial_target_:
    _target_: hydra_zen.funcs.get_obj
    path: torch.optim.sgd.SGD
  _recursive_: true
  _convert_: none
  lr: 0.3
  momentum: 0.0
landscape_fn:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.parabaloid
starting_xy:
- -1.5
- 0.5
num_steps: 20

could become..

_target_: __main__.gradient_descent
optim:
  _target_: hydra_zen.funcs.partial
  _partial_target_:
    _target_: hydra_zen.funcs.get_obj
    path: torch.optim.sgd.SGD
  lr: 0.3
  momentum: 0.0
landscape_fn:
  _target_: hydra_zen.funcs.get_obj
  path: __main__.parabaloid
starting_xy:
- -1.5
- 0.5
num_steps: 20

which is tidier.

We would potentially implement this by making None the default for these options: builds(..., hydra_recursive=None, hydra_convert=None). Here, fields are only omitted if they are None; any user-specified value, even if it matches the default, would be included. This enables users to force these fields to appear in the dataclass (although it would be somewhat cumbersome to do so).
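
A sketch of the emission rule described above (the field-construction details are assumed, mirroring the snippet from the dict-resolution issue):

# only emit hydra-fields that the user explicitly set
if hydra_recursive is not None:
    base_fields.append(("_recursive_", bool, _utils.field(default=hydra_recursive)))
if hydra_convert is not None:
    base_fields.append(("_convert_", str, _utils.field(default=hydra_convert)))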

Cons

  • The yamls are less explicit. Users need to know what hydra's default behaviors are. Although I do wonder how much users actually need to know about / adjust these at the yaml level
  • The aforementioned discussion of using None as the default values makes our documentation and behavior more complicated. We have to make None ultimately default to whatever the default hydra behavior is... which presumably is only available via "word of mouth" (i.e. docs). And we have to explain to the user why hydra_recursive=None exhibits the same behavior of hydra_recursive=True - i.e. answer the questions "well why isn't the default just True then?"
  • Omitting these fields makes inheritance via builds_bases a bit more complicated; we have to fill in these fields, even if they are the default values, whenever any of the parents have these fields set. Otherwise we would mistakenly inherit the parent's values.

Ultimately what this all comes down to: how much do we care about making yamls more succinct?

To-do: add support for jax's obfuscated paths

For example: jax.numpy.add has the "discoverable" path jax._src.numpy.lax_numpy._maybe_bool_binop.<locals>.fn(x1, x2), which means that it is not compatible with just et al. We can patch a fix for this like we do for NumPy

Improving our internal code organization for handling cross-version compatibility

Issues #172, #181, and #182 all pertain to fixes/capabilities being introduced to omegaconf/Hydra that we already support. Thus we need to adaptively defer to upstream support for these capabilities when users have a sufficiently new version of omegaconf/Hydra installed.

Currently, we don't really have any tooling for making this process particularly centralized or simple. It would be good to have any fields, functions, and other objects that need be set dynamically based on omegaconf/Hydra versions all live in a common place.

Additionally, we should improve our CI to test against master branches of Hydra and omegaconf, since those projects don't seem to be pushing dev versions to pypi lately.
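
One way to centralize this (a sketch; the module and flag names are assumptions, and the threshold is whichever upstream release ships the feature):

# e.g. in a single _compatibility.py module
import hydra
from packaging.version import Version

# gate deferral to Hydra's native `_partial_` support in one place
HYDRA_SUPPORTS_PARTIAL: bool = Version(hydra.__version__) >= Version("1.2.1")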

New feature: builds for structured conf without target

Pitch

There have been times where I want to use builds to make a structured config that doesn't have a target. There may be a relatively clean way to enable this, namely by providing a NOTHING (would like ideas for potentially better names) object, so that:

from hydra_zen import NOTHING, builds
Conf = builds(NOTHING, a=1, b="hi")

is equivalent to

@dataclass
class Conf:
    a : Any = 1
    b : Any = "hi"
>>> print(to_yaml(Conf))  # look, Ma, no _target_!
a: 1
b: hi

Note: Definitely check out the "Alternatives" section; now that I have written it out, it feels like it might be preferable

Motivation

Brevity / Leveraging Power of builds

Not only is this more succinct (it is really nice to not have to specify annotations when you don't care about them), but it enables us to leverage the nice automation built into builds, e.g. it takes care of wrapping mutable values with default-factories. Thus things get to be even more succinct!

@dataclass
class Config:
    defaults: List[Any] = field(default_factory=lambda: list([dict(db="mysql")]))
    db : Any = MISSING

becomes

Conf = builds(NOTHING, defaults=[dict(db="mysql")], db=MISSING)

It's Newbie-friendly / Top-level annotations are often a lost-cause

This doesn't work, and it will likely never work

def returns_int() -> int:
    return 1

@dataclass
class Config:
    x:  int = builds(returns_int)  # raises ValidationError: Builds_returnsint is not a subclass of int

the advice from omry here (understandably) is "just use Any", which is precisely what builds(NOTHING, x=builds(returns_int)) will do. Thus new users who go this route would not even need to know that they are avoiding this type-validation situation, whereas writing a dataclass explicitly will naturally have them thinking "well what is the right annotation here, anyway"?

Seeing this in action

from hydra.core.config_store import ConfigStore
from hydra_zen import MISSING, NOTHING, builds, instantiate
from hydra_zen.experimental import hydra_run

Conf = builds(NOTHING, defaults=[dict(db="mysql")], db=MISSING)

cs = ConfigStore.instance()
cs.store(group="db", name="mysql", node=builds(dict, a=1))
cs.store(group="db", name="postgresql", node=builds(dict, a=2))
>>> hydra_run(Conf, instantiate).return_value
{'db': {'a': 1}}
>>> hydra_run(Conf, instantiate, overrides=["db=postgresql"]).return_value
{'db': {'a': 2}}

Implementation details

Invoking zen-processing on NOTHING will always raise. E.g.

>>> builds(NOTHING, a=2, hydra_partial=True)  # raises
>>> builds(NOTHING, a=2, hydra_meta=dict(b=2))  # raises

Specifying positional arguments is not allowed

>>> builds(NOTHING, "hi")  # raises

In terms of type-annotations, this special case will be overloaded as:

builds(NOTHING)  # type: Type[Builds[Type[DictConfig]]]

since

>>> type(instantiate(builds(NOTHING)))
omegaconf.dictconfig.DictConfig

Alternatives

Another option would be providing a new function,

def make_config(
    *,
    hydra_recursive=None,
    hydra_convert=None,
    frozen=False,
    dataclass_name=None,
    config_bases=(),
    **fields
):
    ...

which would basically call builds(NOTHING, ...) under the hood. The benefits of this are

  • People don't need to worry about passing around NOTHING; make_config(a=1, b=2) is cleaner than builds(NOTHING, a=1, b=2)
  • We can exclusively expose the hydra options that are valid (e.g. they don't even have the option of using hydra_partial)
  • builds's docstring doesn't suffer the burden of having to document the NOTHING use case.
  • This does give us the potential for exposing non-targeted-conf-specific options, e.g. controlling default-list composition order without cluttering builds

Feedback

  • Is the motivation for this substantial enough that this feature is warranted?
  • If we go with make_config, is that a good name? Is it distinct enough from builds?
  • Are there any opportunities or downsides that I am missing here?

PEP 561 compatibility

Hi,

Would it be possible to make hydra-zen compliant with PEP 561 by distributing a py.typed file with the package?

Currently I'm getting Skipping analyzing "hydra_zen": found module but no type hints or library stubs when I run mypy on a test file. Here are steps to reproduce this error:

$ pip install hydra-zen mypy
...
Successfully installed PyYAML-5.4.1 antlr4-python3-runtime-4.8 hydra-core-1.1.1 hydra-zen-0.2.0 mypy-0.910 mypy-extensions-0.4.3 omegaconf-2.1.1 toml-0.10.2 typing-extensions-3.10.0.2
...
$ echo "from hydra_zen import builds" > tmp.py
$ mypy tmp.py
tmp.py:1: error: Skipping analyzing "hydra_zen": found module but no type hints or library stubs
tmp.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)

I believe that adding an empty py.typed file to the src/hydra_zen directory (and modifying setup.py so that the py.typed file is distributed with the hydra-zen package) would make it possible for type checkers following PEP 561 to discover the type hints in src.
(I'd be happy to submit a PR to this effect.)
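
For reference, the standard recipe is just the marker file plus a packaging hint; a sketch for setup.py, assuming the src/ layout described above:

# setup.py (excerpt)
from setuptools import find_packages, setup

setup(
    name="hydra-zen",
    package_dir={"": "src"},
    packages=find_packages(where="src"),
    # ship the (empty) py.typed marker so PEP 561 checkers pick up the inline hints
    package_data={"hydra_zen": ["py.typed"]},
)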
