gfnorg / torchgfn Goto Github PK


GFlowNet library

Home Page: https://torchgfn.readthedocs.io/en/latest/

License: Other

Python 11.23% Jupyter Notebook 88.77%

gflownets pytorch

torchgfn's People

Contributors

Stargazers

Watchers

Forkers

tristandeleu pablo-lemos tianyu-z vincent-quirion jiafengttang josephrrb jcathalina lbn187 aanjaa ermiaetemadi hubayirp marpaia saleml thematrixmaster michaelrizvi cameronraysmith kiristern

torchgfn's Issues

Reproduce results on Box environment

Utility for seamless optimizer param groups creation

In train_hypergrid.py, for example

    # 3. Create the optimizer
    params = [
        {
            "params": [
                val
                for key, val in parametrization.parameters.items()
                if "logZ" not in key
            ],
            "lr": args.lr,
        }
    ]
    if "logZ.logZ" in parametrization.parameters:
        params.append(
            {
                "params": [parametrization.parameters["logZ.logZ"]],
                "lr": args.lr_Z,
            }
        )

should ideally be a one-liner, using a utility function.
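A minimal sketch of what such a utility could look like. The name `make_param_groups` and its `z_key` argument are hypothetical, not part of the library; the function only assumes a name-to-parameter mapping like `parametrization.parameters` in the snippet above.

```python
from typing import Any, Dict, List, Mapping


def make_param_groups(
    named_params: Mapping[str, Any], lr: float, lr_Z: float, z_key: str = "logZ"
) -> List[Dict[str, Any]]:
    """Split parameters into policy and logZ groups with separate learning rates."""
    policy = [p for name, p in named_params.items() if z_key not in name]
    log_z = [p for name, p in named_params.items() if z_key in name]
    groups = [{"params": policy, "lr": lr}]
    if log_z:  # Only add the logZ group when such a parameter exists.
        groups.append({"params": log_z, "lr": lr_Z})
    return groups


# Stand-in values; in practice these would be torch tensors.
example = {"pf.weight": "w_pf", "pb.weight": "w_pb", "logZ.logZ": "z"}
groups = make_param_groups(example, lr=1e-3, lr_Z=0.1)
```

With real parameters, the returned list could be passed directly to an optimizer, for example `torch.optim.Adam(groups)`.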

More abstract classes for a broader class of tasks

The current design of the library is too specific to environments with discrete actions that can be masked out in a straightforward way.

Following a discussion with @EricElmoznino, the library would be much more flexible if there were a more abstract set of classes and methods. While this might require significant changes in how the states and environments are linked, and in how that link is used for sampling trajectories and evaluating losses, the core components of the library (sampling, loss evaluation) are already written.

Suggestions made by @EricElmoznino:

The current ActionsSampler's are too restrictive: for instance, in environments with continuous action spaces, the action could correspond to a vector of floats, and the distribution over actions might be something like a tuple of the mean and standard deviation over actions predicted by the forward policy. Another example is when your states are graphs and you want to perform some kind of graph surgery. Your actions might be something like "remove the node at a particular location" or "join a set of nodes into a clique", where you would want more complex data structures to represent these actions and the distributions over them. Hence:

We should have a more abstract Action class, where:
The Env.step() method could then take Action instance as argument.
The ActionSampler.sample() method could output (log_prob, action) tuples, where log_prob is the log probability of the sampled action. This would be very similar to the current behaviour, except that the actions are no longer constrained to be integer tensors (they would be instances of the Action class).
The FunctionEstimator.__call__() method could output arbitrary datatype(s) rather than just tensors. If we wanted to be very verbose, we could also have the forward and backward policy subclasses output something like an ActionDistribution subclass, which different ActionSampler subclasses would take as input in their .sample() methods.
For instance, we could have a CategoricalActionDistribution class. A DiscreteLogitPFEstimator subclass that outputs a CategoricalActionDistribution, and a DiscreteActionSampler that requires an instance of DiscreteLogitPFEstimator at initialization. As another example, we could have a GaussianActionDistribution class (essentially consisting of mean and variance), a GaussianLogitPFEstimator subclass that outputs a GaussianActionDistribution, and a GaussianActionSampler that requires an instance of GaussianLogitPFEstimator at initialization.
You could imagine extending this for your particular project that uses an idiosyncratic action space and distribution.
For now, masks of the type actions != -1 or actions == env.n_actions - 1 are used in multiple places (notably during trajectory sampling). These would obviously need to be changed to something like actions.is_dummy() and actions.is_exit() where the abstract Actions class somehow defines what’s the dummy action (that’s appended to short trajectories), and what’s the exit action (i.e. actions corresponding to s -> s_f transitions)
Having a more abstract PFEstimator class with an abstract get_actions_distribution, which should be used for ActionsSamplers and TrajectoriesSamplers. The current estimators can be subclasses of PFEstimator that would work well for simple discrete environment like HyperGrid, but the user would have the ability to write their own PFEstimator that has access to the user-defined environment.
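To make the `is_dummy()` / `is_exit()` idea concrete, here is a rough sketch. The `Actions` and `DiscreteActions` classes are hypothetical and use plain lists instead of tensors; the only things taken from the discussion above are the conventions that `-1` marks the dummy action and `n_actions - 1` the exit action.

```python
from abc import ABC, abstractmethod
from typing import List


class Actions(ABC):
    """Hypothetical base class for a batch of actions."""

    @abstractmethod
    def is_dummy(self) -> List[bool]:
        """True where the action is padding appended to short trajectories."""

    @abstractmethod
    def is_exit(self) -> List[bool]:
        """True where the action is the s -> s_f transition."""


class DiscreteActions(Actions):
    """Integer actions: -1 is the dummy action, n_actions - 1 is the exit action."""

    def __init__(self, values: List[int], n_actions: int) -> None:
        self.values = values
        self.n_actions = n_actions

    def is_dummy(self) -> List[bool]:
        return [v == -1 for v in self.values]

    def is_exit(self) -> List[bool]:
        return [v == self.n_actions - 1 for v in self.values]


actions = DiscreteActions([0, 2, -1], n_actions=3)
```

A continuous or graph-based environment would subclass `Actions` with its own encoding, and the sampling code would only ever call the two mask methods.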

Currently, FunctionEstimator.__call__(states: States) -> OutputTensor passes the states through an instance of Preprocessor in order to transform the abstract state into something that can be processed by a GFNModule (often a neural net). However, the output of Preprocessor and the input to GFNModule must be a single, fixed-sized tensor.
This is too restrictive. In general, a model might take in more complex data structures. For instance, if the state represents a graph, the GFNModule might want to take in a tuple of (node_attributes: Tensor, adjacency_matrix: Tensor). The user should be able to use arguments that are not tensors. This would also be more consistent with how generic nn.Modules work, where they can take in arbitrary arguments. Hence:

The Preprocessor should output a tuple of arbitrary datatypes and then we pass *the_tuple to the GFNModule.
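The tuple-unpacking idea could look roughly like this. All names here (`graph_preprocessor`, `degree_weighted_sum`, `estimate`) are made up for illustration, and plain lists stand in for tensors.

```python
from typing import Any, Dict, List, Tuple

Graph = Dict[str, Any]


def graph_preprocessor(state: Graph) -> Tuple[List[float], List[List[int]]]:
    """Return (node_attributes, adjacency_matrix) instead of one flat tensor."""
    return state["node_attributes"], state["adjacency_matrix"]


def degree_weighted_sum(
    node_attributes: List[float], adjacency: List[List[int]]
) -> float:
    """Toy stand-in for a module that takes several inputs."""
    return sum(a * sum(row) for a, row in zip(node_attributes, adjacency))


def estimate(state, preprocessor, module):
    """Unpack the preprocessor's tuple into the module's positional arguments."""
    return module(*preprocessor(state))


state = {"node_attributes": [1.0, 2.0], "adjacency_matrix": [[0, 1], [1, 0]]}
output = estimate(state, graph_preprocessor, degree_weighted_sum)
```

A preprocessor for a simple environment would return a one-element tuple, so existing single-tensor modules would keep working under this convention.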

Going forward, we could have a separate branch on which such features are implemented and tested (on the current environments at least), and this issue would be updated as we discover what other significant changes are needed.

Add a Molecule Environment

Goal is to ensure that the level of specification needed to implement it is satisfying (i.e. we only need to code what's really specific to the environment, rather than boilerplate code that could have been abstracted away in the changes above).

Better tests

So far, I have just put under pytest some code I used to test the features as I was implementing them. It's filled with prints. Better tests would include assert statements, for example (or errors...).

Do a second test to ensure CI is functioning properly.

trajectories_to_training_samples as part of the parametrization

Another way would be to have

class Parametrization(...):
    ...

    @abstractmethod
    def trajectories_to_training_samples(self, trajectories):
        pass

class FMParametrization(...):
    ...

    def trajectories_to_training_samples(self, trajectories):
        return trajectories.to_non_initial_intermediary_and_terminating_states()

That's more OOP but lacks centralization. Food for thought.

Originally posted by @vict0rsch in #56 (comment)

Function to revert backward trajectories

In previous versions of the code, when actions were integers, we had this function that reverts backward trajectories. It's not used as part of the codebase, but I remember using it for another project (probably GFN vs HVI). I just removed it (in an upcoming PR), and it would be nice to fix it and bring it back:

    @staticmethod
    def revert_backward_trajectories(trajectories: Trajectories) -> Trajectories:
        """Reverses a trajectory, but not compatible with continuous GFN. Remove."""
        # TODO: this isn't used anywhere - it doesn't work as it assumes that the
        # actions are ints. Do we need it?
        assert trajectories.is_backward
        new_actions = torch.full_like(trajectories.actions, -1)
        new_actions = torch.cat(
            [new_actions, torch.full((1, len(trajectories)), -1)], dim=0
        )

        # env.sf should never be None unless something went wrong during class
        # instantiation.
        if trajectories.env.sf is None:
            raise AttributeError(
                "Something went wrong during the instantiation of environment {}".format(
                    trajectories.env
                )
            )

        new_states = trajectories.env.sf.repeat(
            trajectories.when_is_done.max() + 1, len(trajectories), 1
        )
        new_when_is_done = trajectories.when_is_done + 1

        for i in range(len(trajectories)):
            new_actions[trajectories.when_is_done[i], i] = (
                trajectories.env.n_actions - 1
            )

            new_actions[: trajectories.when_is_done[i], i] = trajectories.actions[
                : trajectories.when_is_done[i], i
            ].flip(0)

            new_states[
                : trajectories.when_is_done[i] + 1, i
            ] = trajectories.states.tensor[: trajectories.when_is_done[i] + 1, i].flip(
                0
            )

        new_states = trajectories.env.States(new_states)

        return Trajectories(
            env=trajectories.env,
            states=new_states,
            actions=new_actions,
            log_probs=trajectories.log_probs,
            when_is_done=new_when_is_done,
            is_backward=False,
        )
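For reference, the core of the reversal for a single trajectory of integer actions can be sketched without tensors. This is an illustration of the logic in the removed function, not a replacement for it: `revert_backward_actions` is a hypothetical helper, and it keeps the integer-action assumption that made the original incompatible with continuous environments.

```python
from typing import List

DUMMY = -1  # padding action, as in the function above


def revert_backward_actions(
    actions: List[int], n_steps: int, exit_action: int, length: int
) -> List[int]:
    """Reverse the first n_steps actions, append the exit action, then pad."""
    forward = actions[:n_steps][::-1] + [exit_action]
    return forward + [DUMMY] * (length - len(forward))


# A backward trajectory of 2 steps, padded to length 3; the forward version
# needs one extra slot for the exit action.
reverted = revert_backward_actions([1, 0, DUMMY], n_steps=2, exit_action=2, length=4)
```

The extra slot is why the forward trajectory is one step longer than the backward one (`new_when_is_done = trajectories.when_is_done + 1` above).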

Get rid of the need to write `PFEstimator` and `PBEstimator` for each continuous environment

Following the discussion #36 (comment), remove estimators, and just have nn.Modules that have to_probability_distribution function, or even remove both parametrization and estimators, and use one class as in the following pseudocode (to be completed):

class GFN:
    pf: nn.Module
    pb: nn.Module
    z: nn.Module

    def loss(self, trajectories):
        return blah  # placeholder: loss computation to be completed

    def to_dist(self, states):
        tensor = self.pf(states)
        a = torch.sigmoid(tensor[:, 0])
        b = torch.tanh(tensor[:, 1])
        return self.distribution(a, b)

Add baseline - Dynamic Programming algorithm

With tabular representations used for LogEdgeFlow Parametrization, a dynamic programming algorithm exists, and can be used as a baseline to compare the edge flows obtained by other algorithms to. Implement that !
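One possible shape for such a baseline, as a sketch only. It assumes a 2-D grid where each action increments one coordinate, a uniform backward policy, and an exit edge from every state carrying flow R(s); none of these choices come from the library. States are visited in reverse topological order, so each state's flow is R(s) plus the flow it sends to its children.

```python
from itertools import product


def dp_edge_flows(height, reward):
    """Edge flows on a 2-D grid DAG, assuming a uniform backward policy."""
    # Children always have a larger coordinate sum, so visit large sums first.
    states = sorted(product(range(height), repeat=2), key=sum, reverse=True)
    state_flow = {}
    edge_flow = {}
    for s in states:
        children = [
            c for c in ((s[0] + 1, s[1]), (s[0], s[1] + 1)) if max(c) < height
        ]
        total = reward(s)
        edge_flow[(s, "exit")] = reward(s)  # flow on the s -> s_f edge
        for c in children:
            # A uniform backward policy splits the child's flow over its parents.
            n_parents = sum(1 for v in c if v > 0)
            flow = state_flow[c] / n_parents
            edge_flow[(s, c)] = flow
            total += flow
        state_flow[s] = total
    return state_flow, edge_flow


state_flow, edge_flow = dp_edge_flows(3, lambda s: 1.0 + s[0] + s[1])
```

The flow at the initial state `(0, 0)` equals the sum of all rewards, which gives a quick sanity check before comparing against learned edge flows.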

simplify `check_output_dim`

After the fix for #77, write the boilerplate code for check_output_dim in GFNModule, and the abstract method would just be def required_output_dim(self) -> int ?
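A sketch of how the split could work, using a stand-in base class rather than the real GFNModule. `OutputDimChecked` and `ForwardPolicy` are hypothetical names, and nested lists stand in for output tensors.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class OutputDimChecked(ABC):
    """Hypothetical base class holding the shared check_output_dim logic."""

    @abstractmethod
    def required_output_dim(self) -> int:
        """Size each output row must have; subclasses only implement this."""

    def check_output_dim(self, output: Sequence[Sequence[float]]) -> None:
        expected = self.required_output_dim()
        for row in output:
            if len(row) != expected:
                raise ValueError(f"Expected output dim {expected}, got {len(row)}")


class ForwardPolicy(OutputDimChecked):
    def __init__(self, n_actions: int) -> None:
        self.n_actions = n_actions

    def required_output_dim(self) -> int:
        return self.n_actions
```

Each subclass then states its expected size in one line, and the check itself lives in a single place.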

Add molecules environment

Implement the molecules environment following the current API. More functions might be needed !

Clear out TODOs in codebase.

Remove configs

Make train_discreteebm.py a v0 script, without many options.
Remove train.py and configs/

Simplifying parameterizations

OK so I think we can de-complexify dramatically the Parameterization - Estimator framework.

Right now, Parameterizations (which are tied to specific losses) accept multiple Estimators, which in turn might contain nn.Modules or torch.Tensors.

This is tricky for a few reasons. First, we need to do a tonne of bookkeeping when we are doing things like saving/loading parameters, or calling Parameterization.named_parameters(): in order to actually return the correct dict of parameters, the Parameterization and Estimator classes need to handle everything correctly and store the correct dict manually. This should not be required.

This is also tricky when using optimizers, because you need to feed the right subset of parameters to the optimizer. Sometimes there's a trainable logz parameter, but it won't be in named_parameters(), and you need to point the optimizer at it explicitly. If you don't do this the model won't learn well but you will never know why without serious digging.

I would propose that I rewrite these classes using inheritance and mix in the nn.Module class directly, such that PyTorch features like parameters() and saving via state_dict() just work. The reason I'm concerned here is that this is an excellent vector through which to build in a silent bug. In fact, I think it's highly likely this will happen when we start getting external users.

The expected behavior is that feeding Parameterization.parameters() to Adam would just work. Saving would also just work: everything would get saved and loaded without the user having to handle edge cases or other logic manually.

Thoughts?

Improve how the parameters of a parametrization are obtained

Each parametrization has its own parameters, which are eventually passed to the optimizer. The base parametrization class defines what counts as parameters. This is super ugly and depends on the class of self. A more elegant and less error-prone alternative is to define the parameters as the union of the parameters of the estimators that define the parametrization. Each estimator contains either a module or a tensor.
An idea would be to make the logZ estimator's tensor a module (that inherits from GFNModule), and make an abstract property (called gfn_parameters) that each module needs to implement. Those of Uniform would be empty for example. The nn.Modules can call self.named_parameters() to implement gfn_parameters. Inheritance there should be revisited. Tabular shouldn't need to inherit from nn.Module.

Beautiful Website Etc

We should take inspiration from these folks

https://torchprotein.ai/

I can touch base to see how they built all of this stuff. Very pro.

Avoid the double forward pass for online learning (TB and DB)

A design choice was to separate the sampling of the learning objects (trajectories or transitions) and the actual learning, to allow off-policy learning. This means that two forward passes are required when doing online learning: the first pass to sample the actions to get to the next states, and the second pass to get the logits of the chosen actions. If a training object is discarded as soon as it is used for learning (i.e. online, i.e. no replay buffer), then this is inefficient, as the logits can be evaluated during the first pass. What do you think is the best way to store the "online logits" in the containers to avoid a second pass when these online logits are available ?
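One option, sketched below with a hypothetical `Transitions` container and plain floats: keep an optional `log_probs` field that the sampler fills in when sampling on-policy, and have the loss recompute only when that field is empty.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Transitions:
    """Hypothetical container; log_probs is filled only when sampled on-policy."""

    states: List[int]
    actions: List[int]
    log_probs: Optional[List[float]] = None


def action_log_probs(
    batch: Transitions, policy: Callable[[int, int], float]
) -> List[float]:
    """Reuse log-probs stored at sampling time, else run a second pass."""
    if batch.log_probs is not None:
        return batch.log_probs
    return [policy(s, a) for s, a in zip(batch.states, batch.actions)]


calls = []  # records every policy evaluation, to show when a pass happens


def counting_policy(state: int, action: int) -> float:
    calls.append((state, action))
    return -0.5


online = Transitions([0, 1], [1, 0], log_probs=[-0.1, -0.2])
offline = Transitions([0, 1], [1, 0])
online_lp = action_log_probs(online, counting_policy)
offline_lp = action_log_probs(offline, counting_policy)
```

With real tensors, the stored values would need to keep their autograd graph (no detaching) for gradients to flow, and a replay buffer would have to drop them before storing, since they go stale once the policy is updated.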

Add helper methods `policy_parameters()` and `logz_parameters()`.

module.parameters() will by default return both of these as a single list / dict, but it is common to give a unique learning rate to the logZ parameter (for example).

So we should provide the user with policy_parameters() and logz_parameters() (at least).

Incorporate BoxDistWrapper in `test_parametrizations_and_losses.py`

Add a test for Trajectories extend function

As was done in #79 for the master branch:

def test_extend_trajectories_on_cuda():
    import os
    import sys

    sys.path.insert(0, os.path.abspath("__file__" + "/../"))

    from src.gfn.containers.trajectories import Trajectories as Traj

    torch.manual_seed(0)

    env = HyperGrid(ndim=4, height=8, R0=0.01, device_str="cuda")
    sampler = TrajectoriesSampler(
        env=env,
        actions_sampler=DiscreteActionsSampler(
            estimator=LogitPFEstimator(env=env, module_name="NeuralNet"),
        ),
    )

    trajectories_1 = sampler.sample(n_trajectories=10)
    trajectories_2 = sampler.sample(n_trajectories=10)

    trajectories_1 = Traj(
        env=sampler.env,
        states=trajectories_1.states,
        actions=trajectories_1.actions,
        when_is_done=trajectories_1.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_1.log_rewards,
        log_probs=trajectories_1.log_probs,
    )
    trajectories_2 = Traj(
        env=sampler.env,
        states=trajectories_2.states,
        actions=trajectories_2.actions,
        when_is_done=trajectories_2.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_2.log_rewards,
        log_probs=trajectories_2.log_probs,
    )

    trajectories_1.extend(trajectories_2)

improve box_utils.py file

Copied from #66

Simplify the code. I don't think we need the BoxPFEstimator and BoxPFNeuralNet to be two distinct classes. It would be easier if all the logic lived in the BoxPFEstimator. But we can debate this.
Ensure the Tabular case works as intended.
Put the tests in a proper place. But where?

update README according to new structure

Review and complete all docstrings.

Move the environments outside of the source code

Add tests for Modified DB loss

Now that ModifiedDB is its own class, we might want to add tests for it, as the other losses, and even incorporate it into the scripts.

Use coherent imports, decide what should be a package, what should be a module, etc...

For now, from gfn.xxx import zzz, from gfn.xxx.yyy import zzz, and from ..xxx.yyy import zzz are used carelessly. There should ideally be some coherency here.

Why is the torso necessary to learn anything?

Hey,

I was running the example code to solve for the 2-dimensional Hypergrid, and I noticed that if I don't share parameters between the forward and backward model:
logit_PB = LogitPBEstimator(env=env, module_name='NeuralNet', torso=None)
the agent doesn't learn anything. Is this expected?

Think of a better way to include environment utilities

There is a utils.py file in envs that implements HyperGrid specific stuff. In gfn/utils.py, there are again HyperGrid specific functions (validation functions mainly). This won't scale to an arbitrary amount of environments, and ideally the validation stuff should be environment agnostic. What's the best way to go about this ?

Improve how parameters of a parametrization are defined

This stores redundant parameters twice (e.g. when P_F and P_B share parameters), and torch optimizers do not like that (throwing warnings).
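A simple way to avoid the duplicates is to filter by object identity before building the optimizer. The helper below is a sketch with a hypothetical name, and lists stand in for parameter tensors.

```python
from typing import Any, Iterable, List


def unique_parameters(params: Iterable[Any]) -> List[Any]:
    """Drop repeated objects (compared by identity), keeping first-seen order."""
    seen = set()
    unique = []
    for p in params:
        if id(p) not in seen:
            seen.add(id(p))
            unique.append(p)
    return unique


shared_torso = [0.1, 0.2]  # stand-in for a parameter shared by P_F and P_B
pf_head, pb_head = [0.3], [0.4]
params = unique_parameters([shared_torso, pf_head, shared_torso, pb_head])
```

Identity is used rather than equality so that two distinct parameters that happen to hold equal values are both kept.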

Let's find a way to improve it !?

v1 release

After #57 #58 #59 #60 #33 #68

Remove TrajectoriesSampler class

and make sample_trajectories a method of Sampler, which would be the new ActionsSampler, requiring only a GFNModule that implements the to_probability_distribution method.
After #77 ! Should be close enough to the example provided in the README of #84.

Thoughts @vict0rsch @josephdviviano ?

Implement the Flow Matching loss

Then try it on the HyperGrid environment (make the loss accessible via the configs for train.py). Make sure to reproduce the results reported in Trajectory Balance: Improved Credit Assignment in GFlowNets.

Make a separate preprocessor for Tabular representations

For now, Tabular representations require the Identity preprocessor. The preprocessed states (which are raw tensors, given that they are the output of the Identity preprocessor) are passed back to the environment to create States objects, on which get_states_indices is called. This is highly inefficient, and the code could benefit from having a TabularPreprocessor that calls get_states_indices under the hood. The output of TabularPreprocessor would thus be indices that can readily be used for any lookup table.
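A rough sketch of such a preprocessor for grid states. The class below is hypothetical, and the base-`height` indexing is an assumption for illustration; the library's get_states_indices may order states differently.

```python
from typing import List, Sequence


class TabularPreprocessor:
    """Hypothetical preprocessor mapping grid states straight to table indices."""

    def __init__(self, ndim: int, height: int) -> None:
        self.ndim = ndim
        self.height = height
        self.n_states = height ** ndim

    def __call__(self, states: Sequence[Sequence[int]]) -> List[int]:
        # Read each state's coordinates as digits of a base-`height` number.
        indices = []
        for state in states:
            index = 0
            for coord in state:
                index = index * self.height + coord
            indices.append(index)
        return indices


preprocessor = TabularPreprocessor(ndim=2, height=3)
table = [0.0] * preprocessor.n_states  # one entry per state
indices = preprocessor([(0, 0), (1, 2), (2, 2)])
```

The indices can be used directly for lookups such as `table[indices[1]]`, with no round trip through States objects.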

Need easy way to install example-only dependencies

A fresh install of torchgfn will not let me run python train_box.py because I'm missing at least one dependency. I'll add it to the poetry config.

Finish BoxDistWrapper

Ensure that external users who are relying on this library retain all functionality

For example - sampling with temperature.

[Help Wanted for Bug] NonValidActionsError when training HyperGrid environment

Hello everyone,

Thank you for making this repo. My colleagues and I are applying GFlowNets on a hypergrid environment using our own reward distribution. However upon training, we get the NonValidActionsError. I was able to reproduce the error using the built-in hypergrid environment but I can't isolate it. I can also provide the jupyter notebook where I got the error, if needed. The code to reproduce it is below:

import torch

from gfn.envs import HyperGrid
from gfn import LogitPBEstimator, LogitPFEstimator, LogZEstimator
from gfn.losses import TBParametrization, TrajectoryBalance
from gfn.samplers import DiscreteActionsSampler, TrajectoriesSampler


torch.manual_seed(0)

exploration_rate = 0.5
learning_rate=0.0005

env = HyperGrid(ndim=5, height=2)

logit_PF = LogitPFEstimator(
    env=env,
    module_name="NeuralNet",
)
logit_PB = LogitPBEstimator(
    env=env,
    module_name="NeuralNet",
    torso=logit_PF.module.torso,
)
logZ = LogZEstimator(torch.tensor(0.0))

training_sampler = TrajectoriesSampler(
    env=env,
    actions_sampler=DiscreteActionsSampler(
        estimator=logit_PF,
        epsilon=exploration_rate
    )
)

parametrization = TBParametrization(logit_PF, logit_PB, logZ)
loss_fn = TrajectoryBalance(
    parametrization=parametrization,
)

params = [
    {
        "params": [
            val for key, val in parametrization.parameters.items() if "logZ" not in key
        ],
        "lr": learning_rate,
    },
    {"params": [val for key, val in parametrization.parameters.items() if "logZ" in key], "lr": 0.1},
]
optimizer = torch.optim.Adam(params=params)

from tqdm import tqdm


n_iterations = int(1e4)
batch_size = int(1e5)

for i in (pbar := tqdm(range(n_iterations))):
    trajectories = training_sampler.sample(
        n_trajectories=batch_size
    )

    optimizer.zero_grad()
    loss = loss_fn(trajectories)
    loss.backward()
    optimizer.step()

    if i % int(1e4) == 0:
        pbar.set_postfix({"loss": loss.item()})

I followed the instructions to install torchgfn and I got the torch version 2.0.1+cu117.

The DiscreteActionsSampler seems to produce an invalid action at a random time when running the whole code, but I can't reproduce the error when I use a debugger to run the code line by line at the appropriate iteration.

Please let us know what can be done to resolve this.

Thank you very much and have a nice day!

Explore renaming `maskless_step` into `step` and `step` into `maskless_step`.

[Help Wanted] Incompatible log_probs shape when combining trajectories

Hello again everyone,

We are trying to augment our training with backward trajectories starting from states sampled from a reward-prioritized replay buffer, as described here: https://arxiv.org/abs/2305.07170. I used Trajectories.revert_backward_trajectories() to transform the backward trajectories into forward ones. But attempting to combine them with the forward sampled trajectories causes an error. Specifically, the code below reproduces the error:

torch.manual_seed(0)

env = HyperGrid(ndim=2, height=3, R0=0.01)
logit_PF = LogitPFEstimator(env=env, module_name="NeuralNet")
logit_PB = LogitPBEstimator(
    env=env, module_name="NeuralNet", torso=logit_PF.module.torso,
)
forward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=DiscreteActionsSampler(estimator=logit_PF)
)
backward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=BackwardDiscreteActionsSampler(estimator=logit_PB)
)

trajectories = forward_sampler.sample(n_trajectories=4)
states = env.reset(batch_shape=4, random=True) # Would come from replay buffer
backward_trajectories = backward_sampler.sample_trajectories(states)
offline_trajectories = Trajectories.revert_backward_trajectories(backward_trajectories)

trajectories.extend(offline_trajectories)

The error is:

    def extend(self, other: Trajectories) -> None:
        """Extend the trajectories with another set of trajectories."""
        self.extend_actions(required_first_dim=max(self.max_length, other.max_length))
        other.extend_actions(required_first_dim=max(self.max_length, other.max_length))
    
        self.states.extend(other.states)
        self.actions = torch.cat((self.actions, other.actions), dim=1)
        self.when_is_done = torch.cat((self.when_is_done, other.when_is_done), dim=0)
>       self.log_probs = torch.cat((self.log_probs, other.log_probs), dim=1)
E       RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 5 but got size 4 for tensor number 1 in the list.

Inserting the code

offline_trajectories.log_probs = torch.cat([
    offline_trajectories.log_probs,
    torch.full(
        size=(
            1,
            offline_trajectories.n_trajectories,
        ),
        fill_value=0,
        dtype=torch.float,
    )
], dim=0)

before trajectories.extend(offline_trajectories) seems to work but I don't know if there will be unexpected behavior downstream. It seems that log_probs needs to be padded after Trajectories.revert_backward_trajectories(). I would appreciate your insight.

Possibly a bit off topic, would it be better to sample forward trajectories stored in ReplayBuffer instead? I would just need to sort the trajectories according to the rewards of the terminating states?

Thank you very much for your time!

Use `torch.nested.nested_tensor` when it's not a prototype anymore

https://pytorch.org/docs/stable/nested.html#module-torch.nested

log_probs in trajectories getitem

Direct fix, don't overthink it:
Replace
log_probs = self.log_probs[:, index][:new_max_length]
with

if self.log_probs.shape != (0, 0):
    log_probs = self.log_probs[:, index]
    log_probs = log_probs[:new_max_length]
else:
    log_probs = self.log_probs

improve `setup.py`

and get rid of requirements.txt

Simplify configs

One YAML file per namespace, not many small files in folders (to master).

logZ_logZ

Let's fix this, and think of better ways of saving/loading a parameterization's state dict.
In the latest branch: from https://github.com/saleml/torchgfn/blob/master/src/gfn/losses/base.py#L50 to the end of the class

Add docstrings where needed

Refactor of the loss into a model class

Basically, the current way of interacting with the parameterization is a bit confusing to explain to others. We have parameters, which produce predictions, in conjunction with trajectories which are sampled, and this produces a loss. But when we call the loss function, we never see the parameters. We propose here to define a model with parameterizations, which has a model.loss() method, which accepts trajectories, to produce a loss.

Simply:

# Define a model and a sampler.
tb_model = TBParametrization(logit_PF, logit_PB, logZ)
trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)

# Get some trajectories and produce a loss.
trajectories = trajectories_sampler.sample(n_trajectories=16)
loss = tb_model.loss(trajectories)

More concretely,

    env = HyperGrid(ndim=4, height=8, R0=0.01)  # Grid of size 8x8x8x8

    module_PF = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions)
    module_PB = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions - 1, torso=module_PF.torso)

    logit_PF = DiscretePFEstimator(env=env, module=module_PF)
    logit_PB = DiscretePBEstimator(env=env, module=module_PB)
    logZ = LogZEstimator(torch.tensor(0.0))

    # In another PR let's consider merging these (`trajectory_sampler` becomes a 
    # method of `actions_sampler`).
    actions_sampler = ActionsSampler(estimator=logit_PF)
    trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)

    # To remove.
    #parametrization = TBParametrization(logit_PF, logit_PB, logZ)
    #loss_fn = TrajectoryBalance(parametrization=parametrization)
    
    # Replaced with.
    tb_model = TBParametrization(logit_PF, logit_PB, logZ)

    # `parametrization` no longer exists here, so read parameters from `tb_model`.
    params = [
        {
            "params": [
                val for key, val in tb_model.parameters.items() if "logZ" not in key
            ],
            "lr": 0.001,
        },
        {"params": [val for key, val in tb_model.parameters.items() if "logZ" in key], "lr": 0.1},
    ]
    optimizer = torch.optim.Adam(params)

    for i in (pbar := tqdm(range(1000))):
        trajectories = trajectories_sampler.sample(n_trajectories=16)
        optimizer.zero_grad()

        # loss = loss_fn(trajectories)  # replaced with
        loss = tb_model.loss(trajectories)

        loss.backward()
        optimizer.step()
        if i % 25 == 0:
            pbar.set_postfix({"loss": loss.item()})

This is largely a renaming of the currently available abstraction, made so that the code is easier for newcomers to grok.
