gfnorg / torchgfn
GFlowNet library
Home Page: https://torchgfn.readthedocs.io/en/latest/
License: Other
In train_hypergrid.py, for example,
# 3. Create the optimizer
params = [
    {
        "params": [
            val
            for key, val in parametrization.parameters.items()
            if "logZ" not in key
        ],
        "lr": args.lr,
    }
]
if "logZ.logZ" in parametrization.parameters:
    params.append(
        {
            "params": [parametrization.parameters["logZ.logZ"]],
            "lr": args.lr_Z,
        }
    )
should ideally be a one-liner, using a utility function.
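One possible shape for such a utility, as a minimal sketch. `make_param_groups` is a hypothetical name, not an existing torchgfn function, and it assumes `parametrization.parameters` is a dict mapping names to tensors, as in the snippet above. Unlike the snippet, it treats every key containing `"logZ"` as a logZ parameter.

```python
from typing import Any, Dict, List, Mapping


def make_param_groups(
    parameters: Mapping[str, Any], lr: float, lr_Z: float, z_key: str = "logZ"
) -> List[Dict[str, Any]]:
    """Split named parameters into a policy group and an optional logZ group."""
    policy = [val for key, val in parameters.items() if z_key not in key]
    log_z = [val for key, val in parameters.items() if z_key in key]
    groups: List[Dict[str, Any]] = [{"params": policy, "lr": lr}]
    if log_z:  # only add the second group when a logZ parameter exists
        groups.append({"params": log_z, "lr": lr_Z})
    return groups
```

The returned list has the per-group format that `torch.optim` optimizers accept, so the optimizer creation would reduce to something like `torch.optim.Adam(make_param_groups(parametrization.parameters, args.lr, args.lr_Z))`.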
The current design of the library is too specific to environments with discrete actions that can be masked out in a straightforward way.
Following a discussion with @EricElmoznino, the library would be much more flexible with a more abstract set of classes and methods. While this might require significant changes in how states and environments are linked, and in how that link is used for sampling trajectories and evaluating losses, the core components of the library (sampling, loss evaluation) are already written.
The current ActionsSamplers are too restrictive: for instance, in environments with continuous action spaces, the action could correspond to a vector of floats, and the distribution over actions might be something like a tuple of the mean and standard deviation over actions predicted by the forward policy. Another example is when your states are graphs and you want to perform some kind of graph surgery. Your actions might be something like "remove the node at a particular location" or "join a set of nodes into a clique", where you would want more complex data structures to represent these actions and the distributions over them. Hence:
We should have a more abstract Action class, where:
- The Env.step() method could then take an Action instance as argument.
- The ActionSampler.sample() method could output (log_prob, action) tuples, where log_prob is the log probability of the sampled action. This would be very similar to the current behaviour, except that the actions are no longer constrained to be integer tensors (they would be instances of the Action class).
- The FunctionEstimator.__call__() method could output arbitrary datatype(s) rather than just tensors. If we wanted to be very verbose, we could also have the forward and backward policy subclasses output something like an ActionDistribution subclass, which different ActionSampler subclasses would take as input in their .sample() methods.
For instance, we could have a CategoricalActionDistribution class, a DiscreteLogitPFEstimator subclass that outputs a CategoricalActionDistribution, and a DiscreteActionSampler that requires an instance of DiscreteLogitPFEstimator at initialization. As another example, we could have a GaussianActionDistribution class (essentially consisting of mean and variance), a GaussianLogitPFEstimator subclass that outputs a GaussianActionDistribution, and a GaussianActionSampler that requires an instance of GaussianLogitPFEstimator at initialization.
You could imagine extending this for your particular project that uses an idiosyncratic action space and distribution.
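The class hierarchy described above could look roughly like the following sketch. It is dependency-free on purpose: it handles a single state with plain Python floats instead of batched tensors, parametrizes the Gaussian by its standard deviation rather than its variance, and every method name (`sample`, `log_prob`, `sample_action`) is an assumption rather than an existing torchgfn API.

```python
import math
import random
from abc import ABC, abstractmethod
from typing import Sequence


class ActionDistribution(ABC):
    """Hypothetical base class: a distribution over actions for ONE state."""

    @abstractmethod
    def sample(self, rng: random.Random):
        """Draw one action."""

    @abstractmethod
    def log_prob(self, action) -> float:
        """Log probability (or log density) of the given action."""


class CategoricalActionDistribution(ActionDistribution):
    def __init__(self, logits: Sequence[float]):
        # Numerically stable log-softmax; -inf logits act as masked actions.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        self.log_probs = [l - log_norm for l in logits]

    def sample(self, rng: random.Random) -> int:
        weights = [math.exp(lp) for lp in self.log_probs]
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]

    def log_prob(self, action: int) -> float:
        return self.log_probs[action]


class GaussianActionDistribution(ActionDistribution):
    def __init__(self, mean: float, std: float):
        self.mean, self.std = mean, std

    def sample(self, rng: random.Random) -> float:
        return rng.gauss(self.mean, self.std)

    def log_prob(self, action: float) -> float:
        z = (action - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)


def sample_action(dist: ActionDistribution, rng: random.Random):
    """Generic sampler: returns a (log_prob, action) tuple for any distribution."""
    action = dist.sample(rng)
    return dist.log_prob(action), action
```

With this split, the sampler never needs to know whether actions are integers or floats; only the distribution subclass does.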
For now, masks of the type actions != -1 or actions == env.n_actions - 1 are used in multiple places (notably during trajectory sampling). These would obviously need to be changed to something like actions.is_dummy() and actions.is_exit(), where the abstract Actions class somehow defines what the dummy action is (the one appended to short trajectories) and what the exit action is (i.e. actions corresponding to s -> s_f transitions).
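As a rough illustration of that idea, the sketch below moves the two hard-coded masks behind an abstract interface. Lists of booleans stand in for boolean tensors to keep the example dependency-free, and `DiscreteActions` is a hypothetical subclass that simply reproduces the current integer conventions (-1 for dummy, n_actions - 1 for exit).

```python
from abc import ABC, abstractmethod
from typing import List


class Actions(ABC):
    """Hypothetical abstract batch of actions."""

    @abstractmethod
    def is_dummy(self) -> List[bool]:
        """True where the action is padding appended to short trajectories."""

    @abstractmethod
    def is_exit(self) -> List[bool]:
        """True where the action is an s -> s_f transition."""


class DiscreteActions(Actions):
    """Integer actions using the conventions currently hard-coded in the masks."""

    def __init__(self, values: List[int], n_actions: int):
        self.values = values
        self.n_actions = n_actions

    def is_dummy(self) -> List[bool]:
        return [v == -1 for v in self.values]

    def is_exit(self) -> List[bool]:
        return [v == self.n_actions - 1 for v in self.values]
```

A continuous or graph-based environment would provide its own subclass with whatever sentinel representation suits it, and the sampling code would only ever call `is_dummy()` and `is_exit()`.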
Having a more abstract PFEstimator class with an abstract get_actions_distribution, which should be used for ActionsSamplers and TrajectoriesSamplers. The current estimators can be subclasses of PFEstimator that would work well for simple discrete environments like HyperGrid, but the user would have the ability to write their own PFEstimator that has access to the user-defined environment.
Currently, FunctionEstimator.__call__(states: States) -> OutputTensor passes the states through an instance of Preprocessor in order to transform the abstract state into something that can be processed by a GFNModule (often a neural net). However, the output of Preprocessor and the input to GFNModule must be a single, fixed-sized tensor.
This is too restrictive. In general, a model might take in more complex data structures. For instance, if the state represents a graph, the GFNModule might want to take in a tuple of (node_attributes: Tensor, adjacency_matrix: Tensor). The user should be able to use arguments that are not tensors. This would also be more consistent with how generic nn.Modules work, where they can take in arbitrary arguments. Hence:
Preprocessor should output a tuple of arbitrary datatypes and then we pass *the_tuple to the GFNModule.
Going forward, we could have a separate branch on which such features are implemented and tested (on the current environments at least), and this issue updated as we discover what other significant changes would be needed.
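The tuple-unpacking idea for Preprocessor and GFNModule can be sketched as follows. All names here (`GraphPreprocessor`, `degree_weighted_sum`, `estimate`) are hypothetical, states are plain dicts, and nested lists stand in for tensors.

```python
from typing import Any, Callable, Dict, List, Tuple


class GraphPreprocessor:
    """Hypothetical preprocessor returning a tuple instead of a single tensor."""

    def __call__(self, state: Dict[str, Any]) -> Tuple[List[float], List[List[int]]]:
        return state["node_attributes"], state["adjacency_matrix"]


def degree_weighted_sum(
    node_attributes: List[float], adjacency_matrix: List[List[int]]
) -> float:
    """Stand-in for a GFNModule that takes two positional inputs."""
    degrees = [sum(row) for row in adjacency_matrix]
    return sum(a * d for a, d in zip(node_attributes, degrees))


def estimate(preprocessor: Callable, module: Callable, state: Any) -> Any:
    """What FunctionEstimator.__call__ could do: unpack the tuple into the module."""
    return module(*preprocessor(state))
```

A preprocessor for a simple environment would then return a one-element tuple, so the existing single-tensor modules keep working unchanged.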
Goal is to ensure that the level of specification needed to implement it is satisfying (i.e. we only need to code what's really specific to the environment, rather than boilerplate code that could have been abstracted away in the changes above).
So far, I have just put under pytest some code I used to test the features as I was implementing them. It's filled with prints. Better tests would include assert statements, for example (or expected errors).
Another way would be to have
class Parametrization(...):
    ...
    @abstractmethod
    def trajectories_to_training_samples(self, trajectories):
        pass

class FMParametrization(...):
    ...
    def trajectories_to_training_samples(self, trajectories):
        return trajectories.to_non_initial_intermediary_and_terminating_states()
That's more OOP but lacks centralization. Food for thought.
Originally posted by @vict0rsch in #56 (comment)
In previous versions of the code, when actions were integers, we had this function that reverts backward trajectories. It's not used as part of the codebase, but I remember using it for another project (probably GFN vs HVI). I just removed it (in an upcoming PR), and it would be nice to fix it and have it back
@staticmethod
def revert_backward_trajectories(trajectories: Trajectories) -> Trajectories:
    """Reverses a trajectory, but not compatible with continuous GFN. Remove."""
    # TODO: this isn't used anywhere - it doesn't work as it assumes that the
    # actions are ints. Do we need it?
    assert trajectories.is_backward
    new_actions = torch.full_like(trajectories.actions, -1)
    new_actions = torch.cat(
        [new_actions, torch.full((1, len(trajectories)), -1)], dim=0
    )
    # env.sf should never be None unless something went wrong during class
    # instantiation.
    if trajectories.env.sf is None:
        raise AttributeError(
            "Something went wrong during the instantiation of environment {}".format(
                trajectories.env
            )
        )
    new_states = trajectories.env.sf.repeat(
        trajectories.when_is_done.max() + 1, len(trajectories), 1
    )
    new_when_is_done = trajectories.when_is_done + 1
    for i in range(len(trajectories)):
        new_actions[trajectories.when_is_done[i], i] = (
            trajectories.env.n_actions - 1
        )
        new_actions[: trajectories.when_is_done[i], i] = trajectories.actions[
            : trajectories.when_is_done[i], i
        ].flip(0)
        new_states[
            : trajectories.when_is_done[i] + 1, i
        ] = trajectories.states.tensor[: trajectories.when_is_done[i] + 1, i].flip(
            0
        )
    new_states = trajectories.env.States(new_states)
    return Trajectories(
        env=trajectories.env,
        states=new_states,
        actions=new_actions,
        log_probs=trajectories.log_probs,
        when_is_done=new_when_is_done,
        is_backward=False,
    )
Following the discussion #36 (comment), remove estimators, and just have nn.Modules that have a to_probability_distribution function, or even remove both parametrization and estimators, and use one class as in the following pseudocode (to be completed):
class GFN:
    pf: nn.Module
    pb: nn.Module
    z: nn.Module

    def loss(self, trajectories):
        return blah  # placeholder, to be completed

    def to_dist(self, states):
        tensor = self.pf(states)
        a = torch.sigmoid(tensor[:, 0])
        b = torch.tanh(tensor[:, 1])
        return self.distribution(a, b)
With the tabular representations used for the LogEdgeFlow parametrization, a dynamic programming algorithm exists and can be used as a baseline against which to compare the edge flows obtained by other algorithms. Implement that!
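One way such a dynamic program could look, as a sketch and not necessarily the algorithm the issue has in mind: it assumes a fixed uniform backward policy, a small DAG given explicitly as a dict of children listed in topological order, and a reward attached to each terminating state. The function name and data layout are illustrative only. States are visited from the leaves up, using F(s) = R(s) + sum of outgoing edge flows, and each state's flow is split equally among its incoming edges.

```python
from typing import Dict, List, Tuple


def edge_flows_by_dp(
    children: Dict[str, List[str]], rewards: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Compute state and edge flows for a DAG under a uniform backward policy.

    `children` maps every state to its children and must list the states in
    topological order (parents before children). `rewards[s]` is the flow of
    the terminating edge s -> s_f (missing or 0 if s is not terminating).
    """
    parents: Dict[str, List[str]] = {s: [] for s in children}
    for s, kids in children.items():
        for c in kids:
            parents[c].append(s)

    state_flow: Dict[str, float] = {}
    edge_flow: Dict[Tuple[str, str], float] = {}
    for s in reversed(list(children)):  # children are handled before parents
        # F(s) = R(s) + sum of outgoing edge flows, which are already known.
        state_flow[s] = rewards.get(s, 0.0) + sum(
            edge_flow[(s, c)] for c in children[s]
        )
        # Uniform P_B: split F(s) equally among the incoming edges.
        for p in parents[s]:
            edge_flow[(p, s)] = state_flow[s] / len(parents[s])
    return state_flow, edge_flow
```

On a 2x2 grid with states s0, a, b, c, edges s0->a, s0->b, a->c, b->c, and rewards 1, 2, 4 for a, b, c, the flow through s0 equals the total reward 7, which is a quick sanity check for any learned edge flows.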
After the fix for #77, write the boilerplate code for check_output_dim in GFNModule, and the abstract method would just be def required_output_dim(self) -> int?
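A minimal sketch of that split between shared boilerplate and a single abstract method. `GFNModuleSketch` and `ForwardPolicySketch` are hypothetical stand-ins for the real classes, and the check is assumed to look only at the last dimension of the output shape.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class GFNModuleSketch(ABC):
    """Hypothetical stand-in for GFNModule; only the dimension check is shown."""

    @abstractmethod
    def required_output_dim(self) -> int:
        """Size the last output dimension must have."""

    def check_output_dim(self, output_shape: Sequence[int]) -> None:
        """Shared boilerplate: compare the last dimension to the requirement."""
        expected = self.required_output_dim()
        if output_shape[-1] != expected:
            raise ValueError(
                f"{type(self).__name__} expected output dim {expected}, "
                f"got {output_shape[-1]}"
            )


class ForwardPolicySketch(GFNModuleSketch):
    """Example subclass: a forward policy needs one output per action."""

    def __init__(self, n_actions: int):
        self.n_actions = n_actions

    def required_output_dim(self) -> int:
        return self.n_actions
```

Subclasses then only state the dimension they need, and the error message is produced in one place.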
Implement the molecules environment following the current API. More functions might be needed !
- train_discreteebm.py as a v0 script, without many options.
- train.py and configs/
OK, so I think we can dramatically simplify the Parameterization - Estimator framework.
Right now, Parameterizations (which are tied to specific losses) accept multiple Estimators, which in turn might contain nn.Modules or torch.Tensors.
This is tricky for a few reasons. First, we need to do a tonne of bookkeeping when we are doing things like saving/loading parameters, or calling Parameterization.named_parameters(): in order to actually return the correct dict of parameters, the Parameterization and Estimator classes need to handle everything correctly and store the correct dict manually. This should not be required.
This is also tricky when using optimizers, because you need to feed the right subset of parameters to the optimizer. Sometimes there's a trainable logZ parameter, but it won't be in named_parameters(), and you need to point the optimizer at it explicitly. If you don't do this, the model won't learn well, but you will never know why without serious digging.
I would propose that I rewrite these classes using inheritance and mix in the nn.Module class directly, such that PyTorch features like parameters() and save() just work. The reason I'm concerned here is that this is an excellent vector through which to build in a silent bug. In fact, I think it's highly likely this will happen when we start getting external users.
The expected behavior is that feeding Parameterization.parameters() to Adam would just work. Or saving would just work: everything would get saved and loaded without any concern that the user must handle edge cases or other logic manually.
Thoughts?
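To make the expected behavior concrete, here is a small sketch assuming the parametrization subclasses nn.Module directly. `TBModel` is a hypothetical name, and the Linear layers are placeholders for the real forward and backward policy networks. Saving and loading goes through the standard `state_dict()` / `load_state_dict()` pair.

```python
import torch
from torch import nn


class TBModel(nn.Module):
    """Hypothetical trajectory-balance parametrization built on nn.Module."""

    def __init__(self, pf: nn.Module, pb: nn.Module):
        super().__init__()
        self.pf = pf  # submodules are registered automatically
        self.pb = pb
        self.logZ = nn.Parameter(torch.tensor(0.0))  # so is this parameter


model = TBModel(nn.Linear(4, 3), nn.Linear(4, 2))

# logZ shows up next to the policy weights without any manual bookkeeping.
names = [name for name, _ in model.named_parameters()]

# Feeding everything to Adam "just works".
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# So does saving and restoring the full state, logZ included.
restored = TBModel(nn.Linear(4, 3), nn.Linear(4, 2))
restored.load_state_dict(model.state_dict())
```

Because registration happens in `__setattr__`, forgetting to expose a parameter becomes much harder than with a hand-maintained dict.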
Each parametrization has its own parameters. The parameters are eventually passed to the optimizer. The base parametrization class defines what counts as parameters. This is super ugly and depends on the class of self. A more elegant and less error-prone alternative is to define the parameters as the union of the parameters of the estimators that define the parametrization. Each estimator contains either a module or a tensor.
An idea would be to make the logZ estimator's tensor a module (that inherits from GFNModule), and make an abstract property (called gfn_parameters) that each module needs to implement. Those of Uniform would be empty, for example. The nn.Modules can call self.named_parameters() to implement gfn_parameters. Inheritance there should be revisited. Tabular shouldn't need to inherit from nn.Module.
We should take inspiration from these folks
I can touch base to see how they built all of this stuff. Very pro.
A design choice was to separate the sampling of the learning objects (trajectories or transitions) and the actual learning, to allow off-policy learning. This means that two forward passes are required when doing online learning: the first pass to sample the actions to get to the next states, and the second pass to get the logits of the chosen actions. If a training object is discarded as soon as it is used for learning (i.e. online, i.e. no replay buffer), then this is inefficient, as the logits can be evaluated during the first pass. What do you think is the best way to store the "online logits" in the containers to avoid a second pass when these online logits are available ?
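One possible answer, sketched under assumptions: the container carries an optional field filled at sampling time, and the loss falls back to a second forward pass only when that field is empty. `TrajectoryBatch` and `log_probs_for_loss` are hypothetical names, log-probabilities are stored rather than raw logits, and plain lists stand in for tensors. In a real implementation the stored values would have to stay attached to the autograd graph for the first pass to be reusable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrajectoryBatch:
    """Hypothetical container: actions plus optional log-probs from sampling."""

    actions: List[int]
    log_probs: Optional[List[float]] = None  # None for replayed / off-policy data


def log_probs_for_loss(
    batch: TrajectoryBatch, recompute: Callable[[List[int]], List[float]]
) -> List[float]:
    """Reuse the log-probs stored at sampling time, else run a second pass."""
    if batch.log_probs is not None:
        return batch.log_probs
    return recompute(batch.actions)
```

Objects coming out of a replay buffer would simply leave the field empty, so off-policy training keeps its current behavior.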
module.parameters() will by default return both of these as a single list / dict, but it is common to give a unique learning rate to the logZ parameter (for example).
So we should provide the user with policy_parameters() and logz_parameters() (at least).
As was done in #79 for the master branch:
def test_extend_trajectories_on_cuda():
    import os
    import sys

    sys.path.insert(0, os.path.abspath("__file__" + "/../"))
    from src.gfn.containers.trajectories import Trajectories as Traj

    torch.manual_seed(0)
    env = HyperGrid(ndim=4, height=8, R0=0.01, device_str="cuda")
    sampler = TrajectoriesSampler(
        env=env,
        actions_sampler=DiscreteActionsSampler(
            estimator=LogitPFEstimator(env=env, module_name="NeuralNet"),
        ),
    )
    trajectories_1 = sampler.sample(n_trajectories=10)
    trajectories_2 = sampler.sample(n_trajectories=10)
    trajectories_1 = Traj(
        env=sampler.env,
        states=trajectories_1.states,
        actions=trajectories_1.actions,
        when_is_done=trajectories_1.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_1.log_rewards,
        log_probs=trajectories_1.log_probs,
    )
    trajectories_2 = Traj(
        env=sampler.env,
        states=trajectories_2.states,
        actions=trajectories_2.actions,
        when_is_done=trajectories_2.when_is_done,
        is_backward=sampler.is_backward,
        log_rewards=trajectories_2.log_rewards,
        log_probs=trajectories_2.log_probs,
    )
    trajectories_1.extend(trajectories_2)
Copied from #66
Now that ModifiedDB is its own class, we might want to add tests for it, as the other losses, and even incorporate it into the scripts.
For now, from gfn.xxx import zzz, from gfn.xxx.yyy import zzz, and from ..xxx.yyy import zzz are used carelessly. There should ideally be some consistency here.
Hey,
I was running the example code to solve for the 2-dimensional Hypergrid, and I noticed that if I don't share parameters between the forward and backward model:
logit_PB = LogitPBEstimator(env=env, module_name='NeuralNet', torso=None)
the agent doesn't learn anything. Is this expected?
There is a utils.py file in envs that implements HyperGrid-specific stuff. In gfn/utils.py, there are again HyperGrid-specific functions (validation functions mainly). This won't scale to an arbitrary number of environments, and ideally the validation stuff should be environment-agnostic. What's the best way to go about this?
This stores redundant parameters twice (e.g. when P_F and P_B share parameters), and torch optimizers do not like that (throwing warnings).
Let's find a way to improve it !?
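A simple way to avoid handing the same tensor to the optimizer twice is to deduplicate by object identity before building the parameter list. This is only a sketch: `unique_by_identity` is a hypothetical helper, and identity is used instead of `==` because comparing tensors with `==` is elementwise.

```python
from typing import Any, Iterable, List


def unique_by_identity(params: Iterable[Any]) -> List[Any]:
    """Keep the first occurrence of each object, preserving order."""
    seen = set()
    unique = []
    for p in params:
        if id(p) not in seen:  # shared parameters are the same Python object
            seen.add(id(p))
            unique.append(p)
    return unique
```

When P_F and P_B share a torso, concatenating their parameter lists and passing the result through this helper yields each shared weight exactly once.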
and make sample_trajectories a method of Sampler, which would be the new ActionsSampler, requiring only a GFNModule that implements the to_probability_distribution method.
After #77 ! Should be close enough to the example provided in the README of #84.
Thoughts @vict0rsch @josephdviviano ?
Then try it on the HyperGrid environment (make the loss accessible via the configs for train.py). Make sure to reproduce the results reported in Trajectory Balance: Improved Credit Assignment in GFlowNets.
For now, Tabular representations require the Identity preprocessor. The preprocessed states (which are raw tensors, given that they are the output of the Identity preprocessor) are passed back to the environment to create States objects, on which get_states_indices is called. This is highly inefficient, and the code could benefit from having a TabularPreprocessor that calls get_states_indices under the hood. The output of TabularPreprocessor would thus be indices that can readily be used for any lookup table.
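A sketch of what such a preprocessor could do for a HyperGrid-like environment. The base-`height` encoding of coordinates below is an assumption about how states are ordered, not necessarily what get_states_indices does in the library, and tuples of ints stand in for state tensors.

```python
from typing import List, Sequence


class TabularPreprocessor:
    """Hypothetical preprocessor mapping grid states directly to table indices."""

    def __init__(self, ndim: int, height: int):
        self.ndim = ndim
        self.height = height

    def __call__(self, states: Sequence[Sequence[int]]) -> List[int]:
        indices = []
        for coords in states:
            index = 0
            for c in coords:  # read the coordinates as digits in base `height`
                index = index * self.height + c
            indices.append(index)
        return indices


# The indices can be used directly on a lookup table of size height ** ndim.
preprocessor = TabularPreprocessor(ndim=2, height=3)
table = [0.0] * (3 ** 2)
indices = preprocessor([(0, 0), (0, 2), (1, 0), (2, 2)])
values = [table[i] for i in indices]
```

This removes the round trip through States objects, since the module receives indices straight from the preprocessor.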
A fresh install of torchgfn will not let me run python train_box.py because I'm missing at least one dependency. I'll add it to the poetry config.
For example - sampling with temperature.
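For reference, sampling with temperature usually means dividing the logits by a positive temperature before the softmax. The sketch below shows that transformation on a plain list of logits; `tempered_probs` is an illustrative name, not a torchgfn function.

```python
import math
from typing import List, Sequence


def tempered_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Temperatures above 1 flatten the distribution (more exploration), and temperatures below 1 sharpen it.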
Hello everyone,
Thank you for making this repo. My colleagues and I are applying GFlowNets on a hypergrid environment using our own reward distribution. However upon training, we get the NonValidActionsError. I was able to reproduce the error using the built-in hypergrid environment but I can't isolate it. I can also provide the jupyter notebook where I got the error, if needed. The code to reproduce it is below:
import torch
from gfn.envs import HyperGrid
from gfn import LogitPBEstimator, LogitPFEstimator, LogZEstimator
from gfn.losses import TBParametrization, TrajectoryBalance
from gfn.samplers import DiscreteActionsSampler, TrajectoriesSampler

torch.manual_seed(0)
exploration_rate = 0.5
learning_rate = 0.0005
env = HyperGrid(ndim=5, height=2)
logit_PF = LogitPFEstimator(
    env=env,
    module_name="NeuralNet",
)
logit_PB = LogitPBEstimator(
    env=env,
    module_name="NeuralNet",
    torso=logit_PF.module.torso,
)
logZ = LogZEstimator(torch.tensor(0.0))
training_sampler = TrajectoriesSampler(
    env=env,
    actions_sampler=DiscreteActionsSampler(
        estimator=logit_PF,
        epsilon=exploration_rate,
    ),
)
parametrization = TBParametrization(logit_PF, logit_PB, logZ)
loss_fn = TrajectoryBalance(
    parametrization=parametrization,
)
params = [
    {
        "params": [
            val for key, val in parametrization.parameters.items() if "logZ" not in key
        ],
        "lr": learning_rate,
    },
    {"params": [val for key, val in parametrization.parameters.items() if "logZ" in key], "lr": 0.1},
]
optimizer = torch.optim.Adam(params=params)

from tqdm import tqdm

n_iterations = int(1e4)
batch_size = int(1e5)
for i in (pbar := tqdm(range(n_iterations))):
    trajectories = training_sampler.sample(
        n_trajectories=batch_size
    )
    optimizer.zero_grad()
    loss = loss_fn(trajectories)
    loss.backward()
    optimizer.step()
    if i % int(1e4) == 0:
        pbar.set_postfix({"loss": loss.item()})
I followed the instructions to install torchgfn and I got the torch version 2.0.1+cu117.
The DiscreteActionsSampler seems to produce an invalid action at a random time when running the whole code, but I can't see the error when I'm using a debugger to run the code line by line at the appropriate iteration.
Please let us know what can be done to resolve this.
Thank you very much and have a nice day!
Hello again everyone,
We are trying to augment our training with backward trajectories starting from states sampled from a reward-prioritized replay buffer, as described here: https://arxiv.org/abs/2305.07170. I used Trajectories.revert_backward_trajectories() to transform the backward trajectories into forward ones. But attempting to combine them with the forward sampled trajectories causes an error. Specifically, the code below reproduces the error:
torch.manual_seed(0)
env = HyperGrid(ndim=2, height=3, R0=0.01)
logit_PF = LogitPFEstimator(env=env, module_name="NeuralNet")
logit_PB = LogitPBEstimator(
    env=env, module_name="NeuralNet", torso=logit_PF.module.torso,
)
forward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=DiscreteActionsSampler(estimator=logit_PF)
)
backward_sampler = TrajectoriesSampler(
    env=env, actions_sampler=BackwardDiscreteActionsSampler(estimator=logit_PB)
)
trajectories = forward_sampler.sample(n_trajectories=4)
states = env.reset(batch_shape=4, random=True) # Would come from replay buffer
backward_trajectories = backward_sampler.sample_trajectories(states)
offline_trajectories = Trajectories.revert_backward_trajectories(backward_trajectories)
trajectories.extend(offline_trajectories)
The error is:
    def extend(self, other: Trajectories) -> None:
        """Extend the trajectories with another set of trajectories."""
        self.extend_actions(required_first_dim=max(self.max_length, other.max_length))
        other.extend_actions(required_first_dim=max(self.max_length, other.max_length))
        self.states.extend(other.states)
        self.actions = torch.cat((self.actions, other.actions), dim=1)
        self.when_is_done = torch.cat((self.when_is_done, other.when_is_done), dim=0)
>       self.log_probs = torch.cat((self.log_probs, other.log_probs), dim=1)
E       RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 5 but got size 4 for tensor number 1 in the list.
Inserting the code
offline_trajectories.log_probs = torch.cat([
    offline_trajectories.log_probs,
    torch.full(
        size=(
            1,
            offline_trajectories.n_trajectories,
        ),
        fill_value=0,
        dtype=torch.float,
    )
], dim=0)
before trajectories.extend(offline_trajectories) seems to work, but I don't know if there will be unexpected behavior downstream. It seems that log_probs needs to be padded after Trajectories.revert_backward_trajectories(). I would appreciate your insight.
Possibly a bit off topic: would it be better to sample forward trajectories stored in ReplayBuffer instead? I would just need to sort the trajectories according to the rewards of the terminating states?
Thank you very much for your time!
Direct fix, don't overthink it:
Replace
log_probs = self.log_probs[:, index][:new_max_length]
with
if self.log_probs.shape != (0, 0):
    log_probs = self.log_probs[:, index]
    log_probs = log_probs[:new_max_length]
else:
    log_probs = self.log_probs
and get rid of requirements.txt
1 yaml per namespace, not many small files in folders (to master)
let's fix this, and think of better ways of saving/loading parameterization's state dicts
In the latest branch: from https://github.com/saleml/torchgfn/blob/master/src/gfn/losses/base.py#L50 to the end of the class
Basically, the current way of interacting with the parameterization is a bit confusing to explain to others. We have parameters, which produce predictions, in conjunction with trajectories which are sampled, and this produces a loss. But when we call the loss function, we never see the parameters. We propose here to define a model with parameterizations, which has a model.loss() method that accepts trajectories and produces a loss.
Simply:
# Define a model and a sampler.
tb_model = TBParametrization(logit_PF, logit_PB, logZ)
trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)
# Get some trajectories and produce a loss.
trajectories = trajectories_sampler.sample(n_trajectories=16)
loss = tb_model.loss(trajectories)
More concretely,
import torch
from tqdm import tqdm

env = HyperGrid(ndim=4, height=8, R0=0.01)  # Grid of size 8x8x8x8
module_PF = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions)
module_PB = NeuralNet(input_dim=env.preprocessor.output_shape[0], output_dim=env.n_actions - 1, torso=module_PF.torso)
logit_PF = DiscretePFEstimator(env=env, module=module_PF)
logit_PB = DiscretePBEstimator(env=env, module=module_PB)
logZ = LogZEstimator(torch.tensor(0.0))
# In another PR let's consider merging these (`trajectory_sampler` becomes a
# method of `actions_sampler`).
actions_sampler = ActionsSampler(estimator=logit_PF)
trajectories_sampler = TrajectoriesSampler(actions_sampler=actions_sampler)
# To remove.
# parametrization = TBParametrization(logit_PF, logit_PB, logZ)
# loss_fn = TrajectoryBalance(parametrization=parametrization)
# Replaced with.
tb_model = TBParametrization(logit_PF, logit_PB, logZ)
# The parameters now come from tb_model, since `parametrization` no longer exists.
params = [
    {
        "params": [
            val for key, val in tb_model.parameters.items() if "logZ" not in key
        ],
        "lr": 0.001,
    },
    {"params": [val for key, val in tb_model.parameters.items() if "logZ" in key], "lr": 0.1},
]
optimizer = torch.optim.Adam(params)
for i in (pbar := tqdm(range(1000))):
    trajectories = trajectories_sampler.sample(n_trajectories=16)
    optimizer.zero_grad()
    # loss = loss_fn(trajectories)  # replaced with
    loss = tb_model.loss(trajectories)
    loss.backward()
    optimizer.step()
    if i % 25 == 0:
        pbar.set_postfix({"loss": loss.item()})
This is largely a renaming of the currently available abstraction, intended to make the code easier for newcomers to grok.
The goal is to be able to replicate the results of Trajectory Balance: Improved Credit Assignment in GFlowNets using the Detailed Balance loss
https://cirun.io/ have a free Open Source plan for instance