gfnorg / torchgfn

GFlowNet library

Home Page: https://torchgfn.readthedocs.io/en/latest/

License: Other

Python 11.23% Jupyter Notebook 88.77%
gflownets pytorch

torchgfn's People

Contributors

aanjaa, josephdviviano, josephrrb, marpaia, saleml, vict0rsch

torchgfn's Issues

Utility for seamless optimizer param groups creation

In train_hypergrid.py, for example

    # 3. Create the optimizer
    params = [
        {
            "params": [
                val
                for key, val in parametrization.parameters.items()
                if "logZ" not in key
            ],
            "lr": args.lr,
        }
    ]
    if "logZ.logZ" in parametrization.parameters:
        params.append(
            {
                "params": [parametrization.parameters["logZ.logZ"]],
                "lr": args.lr_Z,
            }
        )

should ideally be a one liner, using a utility function.
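
A minimal sketch of what such a utility could look like, assuming the parameters are available as a name-to-parameter mapping (as `parametrization.parameters` is above). The function name `make_param_groups` and the `logZ_marker` argument are hypothetical, not part of the library:

```python
from typing import Any, Dict, List, Mapping


def make_param_groups(
    named_params: Mapping[str, Any],
    lr: float,
    lr_Z: float,
    logZ_marker: str = "logZ",
) -> List[Dict[str, Any]]:
    """Split named parameters into a main group and an optional logZ group."""
    main = [p for name, p in named_params.items() if logZ_marker not in name]
    logz = [p for name, p in named_params.items() if logZ_marker in name]

    groups: List[Dict[str, Any]] = [{"params": main, "lr": lr}]
    # Only add the second group when a logZ parameter actually exists.
    if logz:
        groups.append({"params": logz, "lr": lr_Z})
    return groups
```

With this, the optimizer creation would reduce to something like `torch.optim.Adam(make_param_groups(parametrization.parameters, args.lr, args.lr_Z))`.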

More abstract classes for a broader class of tasks

The current design of the library is too specific to environments with discrete actions that can be masked out in a straightforward way.

Following a discussion with @EricElmoznino, the library would be much more flexible if there were a more abstract set of classes and methods. While this might require significant changes in how the states and environments are linked, and how that is used for sampling trajectories and evaluating losses, the core components of the library (sampling, loss evaluation) are already written.

Suggestions made by @EricElmoznino:

The current ActionsSampler's are too restrictive: for instance, in environments with continuous action spaces, the action could correspond to a vector of floats, and the distribution over actions might be something like a tuple of the mean and standard deviation over actions predicted by the forward policy. Another example is when your states are graphs and you want to perform some kind of graph surgery. Your actions might be something like "remove the node at a particular location" or "join a set of nodes into a clique", where you would want more complex data structures to represent these actions and the distributions over them. Hence:

  • We should have a more abstract Action class, where:

  • The Env.step() method could then take Action instance as argument.

  • The ActionSampler.sample() method could output (log_prob, action) tuples, where log_prob is the log probability of the sampled action. This would be very similar to the current behaviour, except that the actions are no longer constrained to be integer tensors (they would be instances of the Action class).

  • The FunctionEstimator.__call__() method could output arbitrary datatype(s) rather than just tensors. If we wanted to be very verbose, we could also have the forward and backward policy subclasses output something like an ActionDistribution subclass, which different ActionSampler subclasses would take as input in their .sample() methods.
    For instance, we could have a CategoricalActionDistribution class. A DiscreteLogitPFEstimator subclass that outputs a CategoricalActionDistribution, and a DiscreteActionSampler that requires an instance of DiscreteLogitPFEstimator at initialization. As another example, we could have a GaussianActionDistribution class (essentially consisting of mean and variance), a GaussianLogitPFEstimator subclass that outputs a GaussianActionDistribution, and a GaussianActionSampler that requires an instance of GaussianLogitPFEstimator at initialization.
    You could imagine extending this for your particular project that uses an idiosyncratic action space and distribution.

  • For now, masks of the type actions != -1 or actions == env.n_actions - 1 are used in multiple places (notably during trajectory sampling). These would obviously need to be changed to something like actions.is_dummy() and actions.is_exit() where the abstract Actions class somehow defines what’s the dummy action (that’s appended to short trajectories), and what’s the exit action (i.e. actions corresponding to s -> s_f transitions)

  • Having a more abstract PFEstimator class with an abstract get_actions_distribution, which should be used for ActionsSamplers and TrajectoriesSamplers. The current estimators can be subclasses of PFEstimator that would work well for simple discrete environment like HyperGrid, but the user would have the ability to write their own PFEstimator that has access to the user-defined environment.
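
To make the `is_dummy()` / `is_exit()` suggestion concrete, here is a torch-free sketch. The class names `Actions` and `DiscreteActions`, the list-based storage, and the `DUMMY` constant are assumptions made for illustration; a real implementation would presumably wrap tensors:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import ClassVar, List


class Actions(ABC):
    """Hypothetical abstract batch of actions."""

    @abstractmethod
    def is_dummy(self) -> List[bool]:
        """True where the action is padding appended to short trajectories."""

    @abstractmethod
    def is_exit(self) -> List[bool]:
        """True where the action corresponds to an s -> s_f transition."""


@dataclass
class DiscreteActions(Actions):
    values: List[int]
    n_actions: int
    DUMMY: ClassVar[int] = -1  # same convention as the current `actions != -1` masks

    def is_dummy(self) -> List[bool]:
        return [v == self.DUMMY for v in self.values]

    def is_exit(self) -> List[bool]:
        # Mirrors the current `actions == env.n_actions - 1` mask.
        return [v == self.n_actions - 1 for v in self.values]
```

A continuous or graph-surgery environment would subclass `Actions` with its own storage and its own definition of the dummy and exit actions, leaving the sampling code unchanged.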

Currently, FunctionEstimator.__call__(states: States) -> OutputTensor passes the states through an instance of Preprocessor in order to transform the abstract state into something that can be processed by a GFNModule (often a neural net). However, the output of Preprocessor and the input to GFNModule must be a single, fixed-sized tensor.
This is too restrictive. In general, a model might take in more complex data structures. For instance, if the state represents a graph, the GFNModule might want to take in a tuple of (node_attributes: Tensor, adjacency_matrix: Tensor). The user should be able to use arguments that are not tensors. This would also be more consistent with how generic nn.Modules work, where they can take in arbitrary arguments. Hence:

  • The Preprocessor should output a tuple of arbitrary datatypes and then we pass *the_tuple to the GFNModule.
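
As a rough illustration of the tuple-unpacking idea, the sketch below uses plain Python lists in place of tensors. `GraphPreprocessor`, the dictionary-based state, and the toy `degree_weighted_sum` module are all hypothetical stand-ins:

```python
from typing import Any, Callable, Dict, List, Tuple


class GraphPreprocessor:
    """Hypothetical preprocessor returning a tuple instead of one fixed-size tensor."""

    def __call__(self, state: Dict[str, Any]) -> Tuple[List[List[float]], List[List[int]]]:
        return state["node_attributes"], state["adjacency_matrix"]


class FunctionEstimator:
    """Stand-in estimator: unpacks the preprocessor output into the module."""

    def __init__(self, preprocessor: Callable[..., tuple], module: Callable[..., Any]):
        self.preprocessor = preprocessor
        self.module = module

    def __call__(self, states: Any) -> Any:
        return self.module(*self.preprocessor(states))


def degree_weighted_sum(node_attributes, adjacency_matrix) -> float:
    """Toy module: sums each node's first attribute weighted by its degree."""
    return sum(attrs[0] * sum(row) for attrs, row in zip(node_attributes, adjacency_matrix))


estimator = FunctionEstimator(GraphPreprocessor(), degree_weighted_sum)
state = {"node_attributes": [[1.0], [2.0]], "adjacency_matrix": [[0, 1], [1, 0]]}
output = estimator(state)
```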

Going forward, we could have a separate branch on which such features are implemented and tested (on the current environments at least), and this issue would be updated as we discover what other significant changes are needed.

Add a Molecule Environment

Goal is to ensure that the level of specification needed to implement it is satisfying (i.e. we only need to code what's really specific to the environment, rather than boilerplate code that could have been abstracted away in the changes above).

Better tests

So far, I have just put under pytest some code I used to test the features as I was implementing them. It's filled with prints. Better tests would include assert statements, for example (or errors...).

trajectories_to_training_samples as part of the parametrization

Another way would be to have

class Parametrization(...):
    ...

    @abstractmethod
    def trajectories_to_training_samples(self, trajectories):
        pass

class FMParametrization(...):
    ...

    def trajectories_to_training_samples(self, trajectories):
        return trajectories.to_non_initial_intermediary_and_terminating_states()

That's more OOP but lacks centralization. Food for thought.

Originally posted by @vict0rsch in #56 (comment)

Function to revert backward trajectories

In previous versions of the code, when actions were integers, we had this function that reverts backward trajectories. It's not used as part of the codebase, but I remember using it for another project (probably GFN vs HVI). I just removed it (in an upcoming PR), and it would be nice to fix it and have it back

    @staticmethod
    def revert_backward_trajectories(trajectories: Trajectories) -> Trajectories:
        """Reverses a trajectory, but not compatible with continuous GFN. Remove."""
        # TODO: this isn't used anywhere - it doesn't work as it assumes that the
        # actions are ints. Do we need it?
        assert trajectories.is_backward
        new_actions = torch.full_like(trajectories.actions, -1)
        new_actions = torch.cat(
            [new_actions, torch.full((1, len(trajectories)), -1)], dim=0
        )

        # env.sf should never be None unless something went wrong during class
        # instantiation.
        if trajectories.env.sf is None:
            raise AttributeError(
                "Something went wrong during the instantiation of environment {}".format(
                    trajectories.env
                )
            )

        new_states = trajectories.env.sf.repeat(
            trajectories.when_is_done.max() + 1, len(trajectories), 1
        )
        new_when_is_done = trajectories.when_is_done + 1

        for i in range(len(trajectories)):
            new_actions[trajectories.when_is_done[i], i] = (
                trajectories.env.n_actions - 1
            )

            new_actions[: trajectories.when_is_done[i], i] = trajectories.actions[
                : trajectories.when_is_done[i], i
            ].flip(0)

            new_states[
                : trajectories.when_is_done[i] + 1, i
            ] = trajectories.states.tensor[: trajectories.when_is_done[i] + 1, i].flip(
                0
            )

        new_states = trajectories.env.States(new_states)

        return Trajectories(
            env=trajectories.env,
            states=new_states,
            actions=new_actions,
            log_probs=trajectories.log_probs,
            when_is_done=new_when_is_done,
            is_backward=False,
        )

Get rid of the need to write `PFEstimator` and `PBEstimator` for each continuous environment

Following the discussion #36 (comment), remove estimators, and just have nn.Modules that have to_probability_distribution function, or even remove both parametrization and estimators, and use one class as in the following pseudocode (to be completed):

class GFN:
    pf: nn.Module
    pb: nn.Module
    z: nn.Module

    def loss(self, trajectories):
        return blah

    def to_dist(self, states):
        tensor = self.pf(states)
        a = torch.sigmoid(tensor[:, 0])
        b = torch.tanh(tensor[:, 1])
        return self.distribution(a, b)

Add baseline - Dynamic Programming algorithm

With the tabular representations used for the LogEdgeFlow parametrization, a dynamic programming algorithm exists and can serve as a baseline against which to compare the edge flows obtained by other algorithms. Implement that!
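
The issue does not spell out the algorithm. One possible sketch, assuming a uniform backward policy and a small hand-written DAG (both assumptions), computes state flows bottom-up via F(s) = R(s) + Σ F(s') · P_B(s | s') over children s', with edge flows F(s → s') = F(s') · P_B(s | s'). The function name `dp_edge_flows` is hypothetical:

```python
from typing import Dict, Hashable, List, Tuple


def dp_edge_flows(
    children: Dict[Hashable, List[Hashable]],
    rewards: Dict[Hashable, float],
) -> Tuple[Dict[Hashable, float], Dict[Tuple[Hashable, Hashable], float]]:
    """Exact state and edge flows on a DAG, assuming a uniform backward policy."""
    parents: Dict[Hashable, List[Hashable]] = {s: [] for s in children}
    for s, kids in children.items():
        for k in kids:
            parents[k].append(s)

    state_flow: Dict[Hashable, float] = {}
    edge_flow: Dict[Tuple[Hashable, Hashable], float] = {}

    def flow(s: Hashable) -> float:
        if s not in state_flow:
            total = rewards.get(s, 0.0)  # flow through the terminating edge s -> s_f
            for k in children[s]:
                f = flow(k) / len(parents[k])  # uniform P_B(s | k)
                edge_flow[(s, k)] = f
                total += f
            state_flow[s] = total
        return state_flow[s]

    for s in children:
        flow(s)
    return state_flow, edge_flow


# Toy diamond-shaped DAG: s0 -> a -> c and s0 -> b -> c.
children = {"s0": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
rewards = {"a": 1.0, "b": 2.0, "c": 3.0}
state_flow, edge_flow = dp_edge_flows(children, rewards)
```

Here the flow at `s0` equals the sum of all rewards, as expected for the partition function. Taking `math.log` of the edge flows would give values directly comparable to a log-edge-flow table.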

simplify `check_output_dim`

After the fix for #77, write the boilerplate code for check_output_dim in GFNModule, and the abstract method would just be def required_output_dim(self) -> int ?
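
A sketch of how that split could look, with the check written once in the base class and subclasses only declaring the required dimension. The classes below are simplified stand-ins (the real GFNModule has a different constructor), and `DiscretePF` / `DiscretePB` are hypothetical names:

```python
from abc import ABC, abstractmethod


class GFNModule(ABC):
    """Simplified stand-in: the boilerplate check lives in the base class."""

    def __init__(self, output_dim: int):
        self.output_dim = output_dim
        self.check_output_dim()

    @abstractmethod
    def required_output_dim(self) -> int:
        """The only thing a subclass has to provide."""

    def check_output_dim(self) -> None:
        required = self.required_output_dim()
        if self.output_dim != required:
            raise ValueError(
                f"{type(self).__name__} expects output_dim={required}, got {self.output_dim}"
            )


class DiscretePF(GFNModule):
    def __init__(self, n_actions: int, output_dim: int):
        self.n_actions = n_actions
        super().__init__(output_dim)

    def required_output_dim(self) -> int:
        return self.n_actions


class DiscretePB(DiscretePF):
    def required_output_dim(self) -> int:
        return self.n_actions - 1  # the backward policy has no exit action
```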

Remove configs

  • Make train_discreteebm.py a v0 script, without many options.
  • Remove train.py and configs/

Simplifying parameterizations

OK so I think we can de-complexify dramatically the Parameterization - Estimator framework.

Right now, Parameterizations (which are tied to specific losses) accept multiple Estimators, which in turn might contain nn.Modules or torch.Tensors.

This is tricky for a few reasons. First, we need to do a tonne of bookkeeping when saving/loading parameters or calling Parameterization.named_parameters(): to return the correct dict of parameters, the Parameterization and Estimator classes must handle everything correctly and store the correct dict manually. This should not be required.

This is also tricky when using optimizers, because you need to feed the right subset of parameters to the optimizer. Sometimes there's a trainable logz parameter, but it won't be in named_parameters(), and you need to point the optimizer at it explicitly. If you don't do this the model won't learn well but you will never know why without serious digging.

I would propose that I re-write these classes using inheritance and mixin the nn.Module class directly such that pytorch features like parameters() and save() just work. The reason I'm concerned here is this is an excellent vector through which to build in a silent bug. In fact I think it's highly likely this will happen when we start getting external users.

The expected behavior is that feeding Parameterization.parameters() to Adam would just work. Saving would also just work: everything would get saved and loaded without the user having to handle edge cases or other logic manually.
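
As a rough sketch of the proposed behavior (not the library's actual classes), a parametrization that subclasses nn.Module and registers logZ as an nn.Parameter gets parameter discovery and state_dict handling for free. `TBModel` and the Linear layers standing in for the policies are assumptions:

```python
import torch
from torch import nn


class TBModel(nn.Module):
    """Hypothetical parametrization built directly on nn.Module."""

    def __init__(self, pf: nn.Module, pb: nn.Module):
        super().__init__()
        self.pf = pf  # submodules are registered automatically
        self.pb = pb
        self.logZ = nn.Parameter(torch.tensor(0.0))  # trainable scalar, also registered


model = TBModel(nn.Linear(4, 3), nn.Linear(4, 2))
names = {name for name, _ in model.named_parameters()}

# No manual bookkeeping: every trainable tensor, including logZ, reaches the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Saving and loading goes through the standard state_dict machinery.
clone = TBModel(nn.Linear(4, 3), nn.Linear(4, 2))
clone.load_state_dict(model.state_dict())
```

A separate learning rate for logZ can still be set by passing two parameter groups to the optimizer, but forgetting to do so no longer silently drops logZ from training.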

Thoughts?

Improve how the parameters of a parametrization are obtained

Each parametrization has its own parameters. The parameters are eventually passed to the optimizer. The base parametrization class defines what counts as parameters. This is super ugly and depends on the class of self. A more elegant and less error-prone alternative is to define the parameters as the union of the estimators (that define the parametrization)'s parameters. Each estimator contains either a module or a tensor.
An idea would be to make the logZ estimator's tensor a module (that inherits from GFNModule), and make an abstract property (called gfn_parameters) that each module needs to implement. Those of Uniform would be empty for example. The nn.Modules can call self.named_parameters() to implement gfn_parameters. Inheritance there should be revisited. Tabular shouldn't need to inherit from nn.Module.

Avoid the double forward pass for online learning (TB and DB)

A design choice was to separate the sampling of the learning objects (trajectories or transitions) and the actual learning, to allow off-policy learning. This means that two forward passes are required when doing online learning: the first pass to sample the actions to get to the next states, and the second pass to get the logits of the chosen actions. If a training object is discarded as soon as it is used for learning (i.e. online, i.e. no replay buffer), then this is inefficient, as the logits can be evaluated during the first pass. What do you think is the best way to store the "online logits" in the containers to avoid a second pass when these online logits are available ?
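
One possible answer, sketched with plain Python rather than tensors: give the container an optional `log_probs` field that the sampler fills during the first pass, and have the loss recompute only when it is missing. The names `CountingPolicy`, `Transitions`, `sample`, and `action_log_probs` are hypothetical:

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


class CountingPolicy:
    """Toy two-action policy that counts its forward passes."""

    def __init__(self) -> None:
        self.n_forward = 0

    def log_probs(self, state: int) -> List[float]:
        self.n_forward += 1
        p = 0.25 if state % 2 == 0 else 0.75
        return [math.log(p), math.log(1 - p)]


@dataclass
class Transitions:
    states: List[int]
    actions: List[int]
    log_probs: Optional[List[float]] = None  # set only when sampled on-policy


def sample(policy: CountingPolicy, states: List[int], rng: random.Random) -> Transitions:
    actions, cached = [], []
    for s in states:
        lp = policy.log_probs(s)  # the single forward pass
        a = 0 if rng.random() < math.exp(lp[0]) else 1
        actions.append(a)
        cached.append(lp[a])  # keep the log-prob of the chosen action
    return Transitions(list(states), actions, cached)


def action_log_probs(policy: CountingPolicy, batch: Transitions) -> List[float]:
    if batch.log_probs is not None:
        return batch.log_probs  # online case: reuse the first pass
    # Off-policy case (e.g. replay buffer): a second pass is unavoidable.
    return [policy.log_probs(s)[a] for s, a in zip(batch.states, batch.actions)]
```

Two caveats for a tensor-based version: the cached values are only valid while the policy parameters are unchanged, and they must stay attached to the autograd graph for the loss to produce gradients.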

Add a test for Trajectories extend function

As was done in #79 for the master branch:

def test_extend_trajectories_on_cuda():
    import os
    import sys

    sys.path.insert(0, os.path.abspath("__file__" + "/../"))

    from src.gfn.containers.trajectories import Trajectories as Traj

    torch.manual_seed(0)

    env = HyperGrid(ndim=4, height=8, R0=0.01, device_str="cuda")
    sampler = TrajectoriesSampler(
        env=env,
        actions_sampler=DiscreteActionsSampler(
            estimator=LogitPFEstimator(env=env, module_name="NeuralNet"),
        ),
    )

    trajectories_1 = sampler.sample(n_trajectories=10)
    trajectories_2 = sampler.sample(n_trajectories=10)

    trajectories_1 = Traj(
        env=sampler.env,
        states=trajectories_1.states,
        actions=trajectories_1.actions,
        when_is_done=trajectories_1.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_1.log_rewards,
        log_probs=trajectories_1.log_probs,
    )
    trajectories_2 = Traj(
        env=sampler.env,
        states=trajectories_2.states,
        actions=trajectories_2.actions,
        when_is_done=trajectories_2.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_2.log_rewards,
        log_probs=trajectories_2.log_probs,
    )

    trajectories_1.extend(trajectories_2)

improve box_utils.py file

Copied from #66

  • Simplify the code. I don't think we need the BoxPFEstimator and BoxPFNeuralNet to be two distinct classes. It would be easier if all the logic lived in the BoxPFEstimator. But we can debate this.
  • Ensure the Tabular case works as intended.
  • Put the tests in a proper place. But where?

Add tests for Modified DB loss

Now that ModifiedDB is its own class, we might want to add tests for it, as the other losses, and even incorporate it into the scripts.

Why is the torso necessary to learn anything

Hey,

I was running the example code to solve for the 2-dimensional Hypergrid, and I noticed that if I don't share parameters between the forward and backward model:
logit_PB = LogitPBEstimator(env=env, module_name='NeuralNet', torso=None)
the agent doesn't learn anything. Is this expected?

Think of a better way to include environment utilities

There is a utils.py file in envs that implements HyperGrid-specific stuff. In gfn/utils.py, there are again HyperGrid-specific functions (validation functions mainly). This won't scale to an arbitrary number of environments, and ideally the validation stuff should be environment-agnostic. What's the best way to go about this?

Remove TrajectoriesSampler class

and make sample_trajectories a method of Sampler, which would be the new ActionsSampler, requiring only a GFNModule that implements the to_probability_distribution method.
After #77 ! Should be close enough to the example provided in the README of #84.

Thoughts @vict0rsch @josephdviviano ?

Make a separate preprocessor for Tabular representations

For now, Tabular representations require the Identity preprocessor. The preprocessed states (which are raw tensors, given that they are the output of the Identity preprocessor) are passed back to the environment to create States objects, on which get_states_indices is called. This is highly inefficient, and the code could benefit from having a TabularPreprocessor that calls get_states_indices under the hood. The output of TabularPreprocessor would thus be indices that can readily be used for any lookup table.
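
A torch-free sketch of what such a preprocessor could look like for a HyperGrid-like environment, where each state is a vector of `ndim` coordinates in `[0, height)`. The row-major index formula is an assumption; the real get_states_indices may order states differently:

```python
from typing import List, Sequence


class TabularPreprocessor:
    """Hypothetical preprocessor mapping grid states straight to table indices."""

    def __init__(self, ndim: int, height: int):
        self.ndim = ndim
        self.height = height

    def __call__(self, states: Sequence[Sequence[int]]) -> List[int]:
        indices = []
        for coords in states:
            index = 0
            # Row-major encoding: treat the coordinates as digits in base `height`.
            for c in coords:
                index = index * self.height + c
            indices.append(index)
        return indices


preprocessor = TabularPreprocessor(ndim=2, height=3)
indices = preprocessor([[0, 0], [1, 2], [2, 2]])
```

The indices can then be used directly for lookups in a table of size `height ** ndim`, without a round trip through the environment.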

[Help Wanted for Bug] NonValidActionsError when training HyperGrid environment

Hello everyone,

Thank you for making this repo. My colleagues and I are applying GFlowNets on a hypergrid environment using our own reward distribution. However upon training, we get the NonValidActionsError. I was able to reproduce the error using the built-in hypergrid environment but I can't isolate it. I can also provide the jupyter notebook where I got the error, if needed. The code to reproduce it is below:

import torch

from gfn.envs import HyperGrid
from gfn import LogitPBEstimator, LogitPFEstimator, LogZEstimator
from gfn.losses import TBParametrization, TrajectoryBalance
from gfn.samplers import DiscreteActionsSampler, TrajectoriesSampler


torch.manual_seed(0)

exploration_rate = 0.5
learning_rate=0.0005

env = HyperGrid(ndim=5, height=2)

logit_PF = LogitPFEstimator(
    env=env,
    module_name="NeuralNet",
)
logit_PB = LogitPBEstimator(
    env=env,
    module_name="NeuralNet",
    torso=logit_PF.module.torso,
)
logZ = LogZEstimator(torch.tensor(0.0))

training_sampler = TrajectoriesSampler(
    env=env,
    actions_sampler=DiscreteActionsSampler(
        estimator=logit_PF,
        epsilon=exploration_rate
    )
)

parametrization = TBParametrization(logit_PF, logit_PB, logZ)
loss_fn = TrajectoryBalance(
    parametrization=parametrization,
)

params = [
    {
        "params": [
            val for key, val in parametrization.parameters.items() if "logZ" not in key
        ],
        "lr": learning_rate,
    },
    {"params": [val for key, val in parametrization.parameters.items() if "logZ" in key], "lr": 0.1},
]
optimizer = torch.optim.Adam(params=params)
from tqdm import tqdm


n_iterations = int(1e4)
batch_size = int(1e5)

for i in (pbar := tqdm(range(n_iterations))):
    trajectories = training_sampler.sample(
        n_trajectories=batch_size
    )

    optimizer.zero_grad()
    loss = loss_fn(trajectories)
    loss.backward()
    optimizer.step()

    if i % int(1e4) == 0:
        pbar.set_postfix({"loss": loss.item()})

I followed the instructions to install torchgfn and I got the torch version 2.0.1+cu117.

The DiscreteActionsSampler seems to produce an invalid action at a random time when running the whole code, but I can't reproduce the error when I use a debugger to run the code line by line at the relevant iteration.

Please let us know what can be done to resolve this.

Thank you very much and have a nice day!

[Help Wanted] Incompatible log_probs shape when combining trajectories

Hello again everyone,

We are trying to augment our training with backward trajectories starting from states sampled from a reward-prioritized replay buffer, as described here: https://arxiv.org/abs/2305.07170. I used Trajectories.revert_backward_trajectories() to transform the backward trajectories into forward ones. But attempting to combine them with the forward sampled trajectories causes an error. Specifically, the code below reproduces the error:

torch.manual_seed(0)

env = HyperGrid(ndim=2, height=3, R0=0.01)
logit_PF = LogitPFEstimator(env=env, module_name="NeuralNet")
logit_PB = LogitPBEstimator(
    env=env, module_name="NeuralNet", torso=logit_PF.module.torso,
)
forward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=DiscreteActionsSampler(estimator=logit_PF)
)
backward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=BackwardDiscreteActionsSampler(estimator=logit_PB)
)

trajectories = forward_sampler.sample(n_trajectories=4)
states = env.reset(batch_shape=4, random=True) # Would come from replay buffer
backward_trajectories = backward_sampler.sample_trajectories(states)
offline_trajectories = Trajectories.revert_backward_trajectories(backward_trajectories)

trajectories.extend(offline_trajectories)

The error is:

    def extend(self, other: Trajectories) -> None:
        """Extend the trajectories with another set of trajectories."""
        self.extend_actions(required_first_dim=max(self.max_length, other.max_length))
        other.extend_actions(required_first_dim=max(self.max_length, other.max_length))
    
        self.states.extend(other.states)
        self.actions = torch.cat((self.actions, other.actions), dim=1)
        self.when_is_done = torch.cat((self.when_is_done, other.when_is_done), dim=0)
>       self.log_probs = torch.cat((self.log_probs, other.log_probs), dim=1)
E       RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 5 but got size 4 for tensor number 1 in the list.

Inserting the code

offline_trajectories.log_probs = torch.cat([
    offline_trajectories.log_probs,
    torch.full(
        size=(
            1,
            offline_trajectories.n_trajectories,
        ),
        fill_value=0,
        dtype=torch.float,
    )
], dim=0)

before trajectories.extend(offline_trajectories) seems to work but I don't know if there will be unexpected behavior downstream. It seems that log_probs needs to be padded after Trajectories.revert_backward_trajectories(). I would appreciate your insight.

Possibly a bit off topic, would it be better to sample forward trajectories stored in ReplayBuffer instead? I would just need to sort the trajectories according to the rewards of the terminating states?

Thank you very much for your time!

log_probs in trajectories getitem

Direct fix, don't overthink it. Replace
log_probs = self.log_probs[:, index][:new_max_length]
with

if self.log_probs.shape != (0, 0):
    log_probs = self.log_probs[:, index]
    log_probs = log_probs[:new_max_length]
else:
    log_probs = self.log_probs

Simplify configs

One YAML file per namespace, not many small files in folders (to master).

Refactor of the loss into a model class

Basically, the current way of interacting with the parameterization is a bit confusing to explain to others. We have parameters, which produce predictions, in conjunction with trajectories which are sampled, and this produces a loss. But when we call the loss function, we never see the parameters. We propose here to define a model with parameterizations, which has a model.loss() method, which accepts trajectories, to produce a loss.

Simply:

# Define a model and a sampler.
tb_model = TBParametrization(logit_PF, logit_PB, logZ)
trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)

# Get some trajectories and produce a loss.
trajectories = trajectories_sampler.sample(n_trajectories=16)
loss = tb_model.loss(trajectories)

More concretely,

    env = HyperGrid(ndim=4, height=8, R0=0.01)  # Grid of size 8x8x8x8

    module_PF = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions)
    module_PB = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions - 1, torso=module_PF.torso)

    logit_PF = DiscretePFEstimator(env=env, module=module_PF)
    logit_PB = DiscretePBEstimator(env=env, module=module_PB)
    logZ = LogZEstimator(torch.tensor(0.0))

    # In another PR let's consider merging these (`trajectory_sampler` becomes a 
    # method of `actions_sampler`).
    actions_sampler = ActionsSampler(estimator=logit_PF)
    trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)

    # To remove.
    #parametrization = TBParametrization(logit_PF, logit_PB, logZ)
    #loss_fn = TrajectoryBalance(parametrization=parametrization)
    
    # Replaced with.
    tb_model = TBParametrization(logit_PF, logit_PB, logZ)

    # `parametrization` no longer exists once it is replaced by `tb_model`.
    params = [
        {
            "params": [
                val for key, val in tb_model.parameters.items() if "logZ" not in key
            ],
            "lr": 0.001,
        },
        {"params": [val for key, val in tb_model.parameters.items() if "logZ" in key], "lr": 0.1},
    ]
    optimizer = torch.optim.Adam(params)

    for i in (pbar := tqdm(range(1000))):
        trajectories = trajectories_sampler.sample(n_trajectories=16)
        optimizer.zero_grad()

        # loss = loss_fn(trajectories)  # replaced with
        loss = tb_model.loss(trajectories)

        loss.backward()
        optimizer.step()
        if i % 25 == 0:
            pbar.set_postfix({"loss": loss.item()})

This is largely a renaming of the currently available abstraction, meant to make the code easier for newcomers to grok.
