captain-pool / gsoc
Repository for Google Summer of Code 2019 https://summerofcode.withgoogle.com/projects/#4662790671826944
License: MIT License
The quality of the image keeps degrading with each prediction call (...)
This happens for both the SavedModel exports of ESRGAN and Compressed ESRGAN.
Create unit tests for model, custom layers and loss functions
Set up checkpointing for saving intermediate steps during training.
Implement network interpolation for producing the final result.
Interpolation is to be done on the parameters between the Relativistic Average Generator and the PSNR-based generator.
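The interpolation step described above can be sketched as a per-parameter linear blend of the two trained checkpoints. This is a minimal sketch with plain NumPy dicts standing in for the two sets of generator weights; the value alpha=0.8 is illustrative, not necessarily what this repo uses:

```python
import numpy as np

def interpolate_weights(psnr_weights, gan_weights, alpha=0.8):
    """theta_interp = (1 - alpha) * theta_PSNR + alpha * theta_GAN, per parameter."""
    return {
        name: (1.0 - alpha) * psnr_weights[name] + alpha * gan_weights[name]
        for name in psnr_weights
    }

# Toy "checkpoints": one conv kernel each.
psnr_ckpt = {"conv1/kernel": np.zeros((3, 3))}
gan_ckpt = {"conv1/kernel": np.ones((3, 3))}
blended = interpolate_weights(psnr_ckpt, gan_ckpt, alpha=0.8)
```

Sliding alpha toward 0 favors the PSNR-oriented weights (sharper metrics), toward 1 the adversarially trained weights (better perceptual quality), without any retraining.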
Are you scaling the loss when using distribute strategy, or are you using the same setup as training using estimators?
Documentation Needed for:
Create Discriminator for the Model
Set up training for the PSNR-oriented model using L1 loss.
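The pixel-wise objective of the PSNR-oriented phase is a plain L1 (mean absolute error) between the ground-truth high-resolution image and the generator output. A minimal NumPy sketch of the loss itself:

```python
import numpy as np

def l1_loss(hr, sr):
    """Mean absolute error between the HR target and the super-resolved output."""
    return np.mean(np.abs(hr - sr))

hr = np.array([[1.0, 2.0], [3.0, 4.0]])
sr = np.array([[1.5, 2.0], [2.0, 4.0]])
loss = l1_loss(hr, sr)  # (0.5 + 0.0 + 1.0 + 0.0) / 4 = 0.375
```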
Currently the system uses two training files for the two phases of training. This can cause unwanted issues in the future and makes the model difficult to track.
The following updates are requested:
Is this code structure of having a different file per training phase followed elsewhere? Instead you could split your original model file into one module each for G, D, and the overall model, and add each training phase as a helper on the main model class - this way it's easier to track the behavior of the model across its entire lifetime in the pipeline. Originally posted by @srjoglekar246 in #28
The idea is to do the following: combine phase_1 and phase_2, convert the class-based approach to trainer functions, and add docstrings for functions, classes, and .py files.
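A hypothetical sketch of the requested structure (all names here are illustrative, not the repo's actual API): one shared model definition, with one trainer function per phase, so the second phase starts from the first phase's weights and the model's lifetime is visible in a single place:

```python
class Generator:
    """Shared model definition (stands in for the RRDB generator)."""
    def __init__(self):
        self.weights = [0.0]

def train_phase_1(model, steps):
    """PSNR-oriented warm-up (L1 loss) - toy update rule for illustration."""
    for _ in range(steps):
        model.weights = [w + 0.1 for w in model.weights]
    return model

def train_phase_2(model, steps):
    """Adversarial fine-tuning, starting from the phase-1 weights."""
    for _ in range(steps):
        model.weights = [w * 0.99 for w in model.weights]
    return model

# One pipeline, two phases, a single model object throughout.
model = train_phase_2(train_phase_1(Generator(), steps=2), steps=1)
```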
Build a residual model of lesser number of layers compared to ESRGAN to act as the student model.
Set up a loader for checkpoints from GCS to initialize the teacher Generator and Discriminator from ESRGAN.
Create the final exporter for the SavedModel from the trained model.
When running bash train.sh, the following error occurs. How can it be solved?
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 168, in __call__
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 643, in load
dbuilder = _fetch_builder(
^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 498, in _fetch_builder
return builder(name, data_dir=data_dir, try_gcs=try_gcs, **builder_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 168, in __call__
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 225, in builder
raise not_found_error
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 202, in builder
cls = builder_cls(str(name))
^^^^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 124, in builder_cls
cls = registered.imported_builder_cls(str(ds_name))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cld/.conda/envs/ml-tf/lib/python3.11/site-packages/tensorflow_datasets/core/registered.py", line 296, in imported_builder_cls
raise DatasetNotFoundError(f'Dataset {name} not found.')
tensorflow_datasets.core.registered.DatasetNotFoundError: Dataset image_label_folder not found.
Set up an automation script for training the model.
Create Bazel BUILD files for the trainer.
Dear captain-pool,
Thank you for releasing this!
I've been trying to train my own copy of E2_ESRGAN and have run into a few problems. These made me suspect I'm on a newer version of TensorFlow with a different API, so I hoped you could update the README to list which version of TensorFlow you used to make this work.
In particular, I'm running tf.__version__ '2.2.0'. When I set both steps to false in stats.yaml
and run python3 main.py --data_dir data_dir/ --log_dir log_dir/ --model_dir model_dir/ --phases "phase1_phase2",
I first got the error TypeError: tf__experimental_run_v2() missing 1 required positional argument: 'kwargs';
after I resolved that, I got the error TypeError: Variable is unhashable. Instead, use tensor.ref() as the key.;
and after I resolved that, I got the error TypeError: tf__reduce() got multiple values for argument 'axis'.
If you agree these errors arose because I'm on a different version of TF to you, please could you update the README to say the version you used? Thanks!
When the call function of the generator is decorated with tf.function, it raises an issue saying the model is trying to create variables on first call.
Write Bazel BUILD file.
Tensorflow version: tensorflow==2.0.0b0
Tensorflow Datasets Version: tfds-nightly==1.0.2.dev201906090105
Tensorflow Hub Version: tf-hub-nightly==0.5.0.dev201905270046
Code Raises
End of sequence [[node input_pipeline_task0/while/IteratorGetNext (defined at image_retraining_tpu.py:139) ]]
for all values of max_steps in TPUEstimator.train(...)
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=8
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=4
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=100
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=500
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=1000
GSOC/E1_TPU_Sample/image_retraining_tpu.py
Lines 135 to 139 in 513a0ec
Error starts from Line 230 of output.log
output.log
Add README.md for the project.
Set up checkpointing for the student network.
Set up training of the model (initialized with weights from the first phase) using the combined loss.
Set up the trainer for the MSE loss.
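The combined objective mentioned above is, in spirit, a weighted sum of the pixel-wise (MSE) term and the adversarial term. A toy sketch; the weight lam is illustrative, not the value used in this repo:

```python
def combined_loss(mse, adversarial, lam=1e-3):
    """Joint objective: pixel term plus a small adversarial term.

    `lam` balances fidelity (MSE) against the adversarial signal; its
    value here is a placeholder, not taken from this codebase.
    """
    return mse + lam * adversarial

loss = combined_loss(mse=0.5, adversarial=2.0, lam=1e-3)  # 0.5 + 0.002
```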
model_trainable_variables doesn't exist.
Tensorflow Version: 1.14
OS Version: Elementary OS Loki
Built from Source: No
$ python3 export.py
import tensorflow as tf
import tensorflow_hub as hub

module = hub.Module("onnx/shufflenet/1")
preds = module(tf.random_normal(shape=[1, 3, 224, 224], dtype=tf.float32))
with tf.Session() as sess:
    # hub.Module variables and tables must be initialized in TF1 sessions
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(preds))
Error Thrown
PyFunc:0 not Found
No Error
CC: @srjoglekar246
Automatically adjust the depth parameter d of the student network to match the accuracy of the teacher network.
For a huge model like this, multi-GPU training is the way to go.
Reference:
https://www.tensorflow.org/beta/tutorials/distribute/training_loops
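The linked tutorial covers custom training loops with tf.distribute.MirroredStrategy. Conceptually, synchronous data parallelism means each replica computes gradients on its own shard and the gradients are averaged (all-reduced) before the shared weights are updated; a NumPy sketch of that arithmetic, not of the TF API itself:

```python
import numpy as np

def replica_grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)^2 with respect to w, for one replica's shard."""
    return (w * x - y) * x

w = 1.0
shards = [(2.0, 4.0), (1.0, 3.0)]  # one (x, y) example per replica

# Each replica computes its local gradient; MirroredStrategy would all-reduce these.
grads = [replica_grad(w, x, y) for x, y in shards]
avg_grad = np.mean(grads)

# A single synchronized update of the shared weights.
w -= 0.1 * avg_grad
```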
Setup Trainer for Joint MSE and Adversarial Loss
Augment images using a function that can be mapped onto the dataset every iteration, using tf.data.Dataset.map(...), to produce new images.
Augmentation steps should include, but not be limited to:
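The shape of such a mapped augmentation function can be sketched as below. In the actual pipeline this would use tf.image ops and be applied via tf.data.Dataset.map; here a NumPy stand-in (random left-right flip only, as one illustrative step) keeps the sketch self-contained:

```python
import numpy as np

def augment(image, rng):
    """Randomly flip the image left-right; one of several augmentations
    that could be applied per-iteration via Dataset.map."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # reverse the width axis (H, W, C layout)
    return image

rng = np.random.default_rng(0)
img = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
out = augment(img, rng)  # same shape and pixel values, possibly mirrored
```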
Hello, I found a performance issue in the definition of call in E3_Distill_ESRGAN/libs/models/student_rrdb.py: tf.math.add_n will be created repeatedly during program execution, resulting in reduced efficiency. I think it should be created before the loop in train_model_random.
Even after employing a video cache in a parallel setting, the player is too slow and has a lot of freeze frames.
Setup file and Directory structure for Initial Commit
Player crashes suddenly when TFLite inference is requested instead of SavedModel inference.
$ python3 player.py --file video.mp4 --tflite compressed_esrgan.tflite
Traceback (most recent call last):
File "player.py", line 208, in <module>
player.run()
File "player.py", line 172, in run
self.fetch_video()
File "player.py", line 125, in fetch_video
video = self.video_second()
File "player.py", line 115, in video_second
frames = pool.map(resolution_fn, frames)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "player.py", line 70, in tflite_super_resolve
self.interpreter.invoke()
File "/home/rick/tf2.0/env/lib/python3.5/site-packages/tensorflow/lite/python/interpreter.py", line 303, in invoke
self._ensure_safe()
File "/home/rick/tf2.0/env/lib/python3.5/site-packages/tensorflow/lite/python/interpreter.py", line 123, in _ensure_safe
data access.""")
RuntimeError: There is at least 1 reference to internal data
in the interpreter in the form of a numpy array or slice. Be sure to
only hold the function returned from tensor() if you are using raw
data access.