google / learned_optimization

License: Apache License 2.0

Python 89.09% Jupyter Notebook 10.91%

learned_optimization's Issues

colab demo error

The demo Colab pt1.introduction fails with an error in the second code cell:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.9.0 requires jedi>=0.10, which is not installed.

Understanding the differences compared to your earlier work & library

How does this work compare to your earlier opt_list work? Is opt_list now obsolete? What are the practical differences?

Also, any plans to add AdaBound and Adam-HD to your benchmarks? Both have fairly high citations/year.

TF uses GPU in tf.data datasets.

Ensure that tf.data does not try to use the GPU for data processing.

Related to: #50

PyTorch port?

Are there any plans to do this? If not, I might be interested in working on it.

typo in the tutorial

learned_optimization/docs/notebooks/Part1_Introduction.ipynb

Line 779 in a49615f

" momentums=jax.tree_util.tree_unflatten(struct, output_params),\n",

output_params used instead of output_momentums

Keras integration

Are there any plans for integration into Keras? Thanks

jnp.sign(mean_rms) is always 1

In the list of features for nn_adam, you are using this feature, but if I understand correctly, this feature is always 1.

learned_optimization/learned_optimization/learned_optimizers/nn_adam.py

Line 305 in 242e218

inputs["mean_sign"] = jnp.sign(mean_rms)
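
The observation can be checked with a small pure-Python sketch. It assumes that mean_rms is derived from a root-mean-square quantity and is therefore never negative; the helper names rms and sign are illustrative and are not taken from nn_adam.py.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of floats (always >= 0)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sign(x):
    """Same convention as jnp.sign for real inputs: -1, 0, or 1."""
    return (x > 0) - (x < 0)


grads = [-3.0, 0.5, -0.25, 4.0]
print(sign(rms(grads)))       # 1, although half of the entries are negative
print(sign(rms([0.0, 0.0])))  # 0, the only other value a non-negative input allows
```

Under that assumption the feature carries information only when the RMS is exactly zero, so "always 1" holds for any tensor with at least one non-zero entry.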

Error while running image_test.py

Hi!
I was having a look at the test functions of the datasets module to try to create my own task & dataset, and I get an error while running image_test.py
ValueError: Dataset imagenet2012_16 with split train doesn't appear to be preprocessed? Please run dataset creation.

I have checked the ase.py but it is not clear to me how to run the dataset creation.
Thanks

Wrong implementation of hyper_v2 mix_layers

Hi,
In VeLO (https://arxiv.org/pdf/2211.09760.pdf) Section B.3, it states that mixing is done by F0(x) + max(σ(F1(σ(F2(x)))), axis = 0, keep_dims = True).
However, the implementation of hyper_v2 (https://github.com/google/learned_optimization/blob/main/learned_optimization/research/general_lopt/hyper_v2.py#L330-L335) essentially uses only one linear layer instead of two, because the input to the second linear layer is x instead of mix_layer (L332).
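
The difference between the two variants can be sketched in plain Python. This is an illustration of the pattern described in the report, not a copy of hyper_v2.py: σ is assumed to be ReLU, the linear layers have no bias, and every name and weight below is hypothetical.

```python
def relu(rows):
    return [[max(0.0, v) for v in row] for row in rows]


def linear(weights, rows):
    """Compute x @ W for every row x; weights[i][j] maps input i to output j."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
            for row in rows]


def add_pooled(base, hidden):
    """base + max(hidden, axis=0, keepdims=True), broadcast over the rows."""
    pooled = [max(col) for col in zip(*hidden)]
    return [[b + p for b, p in zip(row, pooled)] for row in base]


def mix_two_layer(x, w0, w1, w2):
    hidden = relu(linear(w2, x))       # sigma(F2(x))
    hidden = relu(linear(w1, hidden))  # sigma(F1(sigma(F2(x))))
    return add_pooled(linear(w0, x), hidden)


def mix_as_reported(x, w0, w1, w2):
    hidden = relu(linear(w2, x))       # computed, then overwritten
    hidden = relu(linear(w1, x))       # second layer reads x, not hidden
    return add_pooled(linear(w0, x), hidden)


x = [[1.0, -2.0], [0.5, 3.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]
w1 = [[1.0, -1.0], [0.0, 1.0]]
w2 = [[0.0, 1.0], [1.0, 0.0]]
print(mix_two_layer(x, w0, w1, w2))
print(mix_as_reported(x, w0, w1, w2))
```

In the second function the output no longer depends on w2, which is what "only one linear layer" means here.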

Very Large Memory Consumption for Even A Small Dataset

Dataset: fashion_mnist
Dataset Size: 36.42MB (https://www.tensorflow.org/datasets/catalog/fashion_mnist)

Reproduce the Issue:

from learned_optimization.tasks import fixed_mlp
task = fixed_mlp.FashionMnistRelu32_8()

from learned_optimization.tasks.datasets import base

batch_size=128
image_size=(8, 8)
splits = ("train[0:80%]", "train[80%:90%]", "train[90%:]", "test")
stack_channels = 1

dataset = base.preload_tfds_image_classification_datasets(
      "fashion_mnist",
      splits,
      batch_size=batch_size,
      image_size=image_size,
      stack_channels=stack_channels)

Issue Description:
As you can see, the original FashionMnist dataset is very small. However, when I run the above code, memory usage becomes extremely high (10 GB or more).
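
For scale, here is a back-of-the-envelope estimate of how large the decoded arrays should be. It is a sketch, not a measurement: it assumes the four splits together cover the 60,000 training and 10,000 test images, and that images are stored as float32 after preprocessing. The helper name array_megabytes is made up for this example.

```python
def array_megabytes(num_images, height, width, channels, bytes_per_value):
    """Size of a dense image array in megabytes (1 MB = 1e6 bytes)."""
    return num_images * height * width * channels * bytes_per_value / 1e6


NUM_IMAGES = 60_000 + 10_000  # Fashion-MNIST train + test

raw_uint8 = array_megabytes(NUM_IMAGES, 28, 28, 1, 1)       # decoded, 1 byte per pixel
raw_float32 = array_megabytes(NUM_IMAGES, 28, 28, 1, 4)     # before resizing, assumed float32
resized_float32 = array_megabytes(NUM_IMAGES, 8, 8, 1, 4)   # after resizing to 8x8

print(f"28x28 uint8:   {raw_uint8:.2f} MB")
print(f"28x28 float32: {raw_float32:.2f} MB")
print(f"8x8 float32:   {resized_float32:.2f} MB")
```

Under these assumptions none of the figures comes close to 10 GB, so the final arrays alone do not account for the reported usage.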

In my case, the issue occurs when the program reaches this line in the function preload_tfds_image_classification_datasets:

  return Datasets(
      *[make_python_iter(split) for split in splits],
      extra_info={"num_classes": num_classes})

Here is the code of make_python_iter:

  def make_python_iter(split: str) -> Iterator[Batch]:
    # load the entire dataset into memory
    dataset = tfds.load(datasetname, split=split, batch_size=-1)
    data = tfds.as_numpy(_image_map_fn(cfg, dataset))

    # use a python iterator as this is faster than TFDS.
    def generator_fn():

      def iter_fn():
        batches = data["image"].shape[0] // batch_size
        idx = onp.arange(data["image"].shape[0])
        while True:
          # every epoch shuffle indices
          onp.random.shuffle(idx)
          for bi in range(0, batches):
            idxs = idx[bi * batch_size:(bi + 1) * batch_size]

            def index_into(idxs, x):
              return x[idxs]

            yield jax.tree_map(functools.partial(index_into, idxs), data)

      return prefetch_iterator.PrefetchIterator(iter_fn(), prefetch_batches)

    return ThreadSafeIterator(LazyIterator(generator_fn))
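
The batching logic inside iter_fn can be shown on its own with a standard-library sketch. It is a simplified stand-in rather than the library code: plain lists replace NumPy arrays, a seeded random.Random replaces onp.random, and the prefetching and thread-safety wrappers are left out. The function name shuffled_batches is hypothetical.

```python
import itertools
import random


def shuffled_batches(data, batch_size, seed=0):
    """Yield batches forever, reshuffling the index order at every epoch."""
    rng = random.Random(seed)
    num_examples = len(data["image"])
    num_batches = num_examples // batch_size  # leftover examples are dropped
    idx = list(range(num_examples))
    while True:
        rng.shuffle(idx)
        for bi in range(num_batches):
            idxs = idx[bi * batch_size:(bi + 1) * batch_size]
            # Index every field with the same positions so rows stay aligned.
            yield {key: [values[i] for i in idxs] for key, values in data.items()}


data = {"image": list(range(10)), "label": [i % 2 for i in range(10)]}
first_epoch = list(itertools.islice(shuffled_batches(data, batch_size=4), 2))
for batch in first_epoch:
    print(batch)
```

Note that the whole dataset stays resident in memory for the lifetime of the iterator; only the index array is shuffled.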

Could you suggest a way to reduce this memory usage? Do you have any idea why it needs so much memory, and has anyone else run into this issue?

Thank you very much and looking forward to your comments.

pytorch implementation?

It would be very helpful if you could provide an implementation in PyTorch.

Notebook Not Found

The notebooks listed in the README don't seem to exist. I'm not sure if this is intended, but the links don't work.

Colab link not working

Link in the following section doesn't work:

Build a learned optimizer from scratch

Simple, self-contained, learned optimizer example that does not depend on the learned_optimization library: Open In Colab

The link goes like this [https://colab.research.google.com/github/google/learned_optimization/blob/main/docs/notebooks/no_dependency_learned_optimizer.ipynb.ipynb]. There's a trailing .ipynb that should not be there.
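
A small helper shows the fix being asked for. The function name is hypothetical, and the corrected URL is derived only from the description above; it has not been checked against the repository.

```python
def strip_duplicate_extension(url, extension=".ipynb"):
    """Remove one trailing copy of `extension` when it appears twice in a row."""
    if url.endswith(extension * 2):
        return url[:-len(extension)]
    return url


broken = ("https://colab.research.google.com/github/google/learned_optimization/"
          "blob/main/docs/notebooks/no_dependency_learned_optimizer.ipynb.ipynb")
fixed = strip_duplicate_extension(broken)
print(fixed)
```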

License of checkpoints

What is the license of the checkpoints listed in https://github.com/google/learned_optimization/blob/main/learned_optimization/research/general_lopt/pretrained_optimizers.py ?
Is it an Apache 2 license, too?

Use with Tensorflow JS?

How would I integrate VeLO with Tensorflow JS?

Issue with the Demo_for_training_a_model_with_a_learned_optimizer.ipynb

When I run the notebook as is, I get AttributeError: 'tuple' object has no attribute 'dtype' in the "Training Resnets with VeLO" section. The error occurs on the 114th line of the 2nd cell under that heading, i.e.

state = solver.init_state(params, L2REG, next(test_ds), batch_stats)

The "init_state" method of OptaxSolver returns an OptaxState, which it constructs as

OptaxState(iter_num=jnp.asarray(0),
value=jnp.asarray(jnp.inf, value.dtype),
error=jnp.asarray(jnp.inf, dtype=params_dtype),
aux=aux,
internal_state=opt_state)

I cannot tell whether value is a tuple or just a scalar, or why this error keeps occurring. Please help me solve it.
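
The message itself only says that something read .dtype from a Python tuple. One way this can happen, offered as an assumption and not as a diagnosis of the notebook, is a loss function that returns a (loss, aux) pair where a bare scalar is expected. All names below are hypothetical stand-ins.

```python
def loss_with_aux(params):
    """Toy loss that returns auxiliary data alongside the scalar loss."""
    loss = sum(p * p for p in params)
    aux = {"num_params": len(params)}
    return loss, aux  # a tuple, not a bare scalar


def read_dtype(value):
    """Stand-in for code that expects an array-like value with a .dtype."""
    return value.dtype


result = loss_with_aux([1.0, 2.0])
message = None
try:
    read_dtype(result)
except AttributeError as err:
    message = str(err)

print(message)  # 'tuple' object has no attribute 'dtype'
```

If that is the cause, it is worth checking whether the solver was told that the function returns auxiliary outputs, and whether the extra arguments passed to init_state match what the loss function expects.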
