inferno-pytorch / inferno
A utility library around PyTorch
License: Other
I just checked the example we have in the README and I think it does not make sense.
We are adding a Softmax layer and then use CrossEntropyLoss (which already combines LogSoftmax and NLLLoss).
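To make the problem concrete, here is a minimal pure-Python sketch of what happens when a softmax is applied before a loss that already applies log-softmax internally. This is not inferno or PyTorch code; the helpers log_softmax, softmax and cross_entropy are written out by hand for illustration, and the logits are made up.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of floats."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scores))
    return [x - log_sum for x in scores]


def softmax(scores):
    return [math.exp(v) for v in log_softmax(scores)]


def cross_entropy(scores, target):
    """Per-sample cross entropy: negative log-likelihood of log-softmax."""
    return -log_softmax(scores)[target]


logits = [4.0, 1.0, 0.0]  # made-up network output, class 0 is clearly favoured
target = 0

loss_on_logits = cross_entropy(logits, target)          # intended usage
loss_on_probs = cross_entropy(softmax(logits), target)  # softmax applied twice

print(loss_on_logits, loss_on_probs)
```

With the extra softmax the inputs to the loss are squashed into [0, 1], so the loss stays large even for a confident, correct prediction. Either dropping the Softmax layer or switching to LogSoftmax with NLLLoss avoids this.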
@nasimrahaman I think the unit tests are too aggressive.
They tend to fail too often on travis.
I guess one should relax these tests a bit
self.assertLess(trainer.get_state('validation_error_averaged'), (1 - 1/self.NUM_CLASSES))
E AssertionError: 0.9244186046511628 not less than 0.9
Right now the version information is present in two different files:
https://github.com/inferno-pytorch/inferno/blob/master/setup.py#L41
https://github.com/inferno-pytorch/inferno/blob/master/inferno/__init__.py#L14
Would be better to store this in a single place. I am not sure about the best practices though.
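One common approach is to keep the version only in the package's __init__.py and let setup.py parse it from there. The sketch below assumes the version is stored as a plain `__version__ = "..."` assignment; read_version is a hypothetical helper and the example file contents (including the version value) are made up.

```python
import re


def read_version(init_source):
    """Extract the __version__ string from the text of a package __init__.py."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_source, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("Unable to find a __version__ string.")
    return match.group(1)


# Stand-in for the contents of inferno/__init__.py (values are placeholders).
example_init = '"""Package docstring."""\n__author__ = "someone"\n__version__ = "0.0.0"\n'

version = read_version(example_init)
print(version)
```

In setup.py this would be called on the text read from inferno/__init__.py, so the number only has to be bumped in one file.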
You guys seem to be doing something very similar to the torchsample project and it seems to me that there's a ton of room for collaboration / code merging. Please consider at least pulling in some of the functionality from that library. Also, I really like the way the callbacks are structured in torchsample. I didn't see anything similar in inferno but I think it would be a good idea.
Thanks
Several parts of the Trainer class require a location to save to but don't complain until it is too late.
Examples are (of course) when a save point is specified via save_every, but the trainer also defaults to saving after a validation run even without a necessary directory.
unittest.skip-ped.
Trainer, perhaps with a dummy model on a dummy dataset with a dummy criterion and a dummy metric.
create conda recipe
I tried to use the inferno trainers and got the following output:
cannot import name 'is_image_file'
$ python alderley_patchcwganp.py
Traceback (most recent call last):
File "alderley_patchcwganp.py", line 9, in <module>
from inferno.trainers.basic import Trainer
File "/home/jiangsht/anaconda2/envs/chi/inferno/__init__.py", line 6, in <module>
from . import io
File "/home/jiangsht/anaconda2/envs/chi/inferno/io/__init__.py", line 1, in <module>
from . import box
File "/home/jiangsht/anaconda2/envs/chi/inferno/io/box/__init__.py", line 3, in <module>
from .camvid import CamVid, get_camvid_loaders
File "/home/jiangsht/anaconda2/envs/chi/inferno/io/box/camvid.py", line 9, in <module>
from torchvision.datasets.folder import is_image_file, default_loader
ImportError: cannot import name 'is_image_file'
This requires:
Trainer class to specify the number of expected inputs and outputs,
Is there a way to register a callback for gradient clipping?
scipy.misc.toimage is deprecated since scipy 1.0.0 and removed since 1.3.0.
We use it here:
Trainer.print may stay, but the print statements must be optional (especially with Tensorboard logging). This can be easily done with a Verbosity callback.
I tried to run the script from the README after setting the three directories that must be set and disabling CUDA. Running the script with python3 hello_world.py gave two errors. I made the first one disappear (see below), but the second is still present. The expected behavior is to get no error.
The full code is reported below, in a file called hello_world.py.
import torch.nn as nn
from inferno.io.box.cifar import get_cifar10_loaders
from inferno.trainers.basic import Trainer
from inferno.trainers.callbacks.logging.tensorboard import TensorboardLogger
from inferno.extensions.layers.convolutional import ConvELU2D
from inferno.extensions.layers.reshape import Flatten
# Fill these in:
LOG_DIRECTORY = 'log'
SAVE_DIRECTORY = 'save'
DATASET_DIRECTORY = 'data'
DOWNLOAD_CIFAR = True
USE_CUDA = False
# Build torch model
model = nn.Sequential(
    ConvELU2D(in_channels=3, out_channels=256, kernel_size=3),
    nn.MaxPool2d(kernel_size=2, stride=2),
    ConvELU2D(in_channels=256, out_channels=256, kernel_size=3),
    nn.MaxPool2d(kernel_size=2, stride=2),
    ConvELU2D(in_channels=256, out_channels=256, kernel_size=3),
    nn.MaxPool2d(kernel_size=2, stride=2),
    Flatten(),
    nn.Linear(in_features=(256 * 4 * 4), out_features=10),
    nn.LogSoftmax(dim=1)
)
# Load loaders
train_loader, validate_loader = get_cifar10_loaders(DATASET_DIRECTORY,
                                                    download=DOWNLOAD_CIFAR)
# Build trainer
trainer = Trainer(model) \
    .build_criterion('NLLLoss') \
    .build_metric('CategoricalError') \
    .build_optimizer('Adam') \
    .validate_every((2, 'epochs')) \
    .save_every((5, 'epochs')) \
    .save_to_directory(SAVE_DIRECTORY) \
    .set_max_num_epochs(10) \
    .build_logger(TensorboardLogger(log_scalars_every=(1, 'iteration'),
                                    log_images_every='never'),
                  log_directory=LOG_DIRECTORY)
# Bind loaders
trainer \
    .bind_loader('train', train_loader) \
    .bind_loader('validate', validate_loader)
if USE_CUDA:
    trainer.cuda()
# Go!
trainer.fit()
I first created the three folders specified in the script with mkdir log, mkdir save, mkdir data. I then ran the script with python3 hello_world.py. I first got the error:
File "hello_world.py", line 2, in <module>
from inferno.io.box.cifar import get_cifar10_loaders
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/__init__.py", line 6, in <module>
from . import io
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/io/__init__.py", line 4, in <module>
from . import volumetric
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/io/volumetric/__init__.py", line 1, in <module>
from .volume import VolumeLoader, HDF5VolumeLoader, TIFVolumeLoader
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/io/volumetric/volume.py", line 8, in <module>
from ...utils import io_utils as iou
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/utils/io_utils.py", line 5, in <module>
from scipy.misc import imsave
ImportError: cannot import name 'imsave'
which I could solve by running conda install -c anaconda scipy. I was not expecting this error: since I installed inferno with conda, I expected all the dependencies to be installed already.
The second error, which I still get, is the following:
Traceback (most recent call last):
File "hello_world.py", line 2, in <module>
from inferno.io.box.cifar import get_cifar10_loaders
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/__init__.py", line 7, in <module>
from . import trainers
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/trainers/__init__.py", line 1, in <module>
from . import basic
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/trainers/basic.py", line 20, in <module>
from .callbacks.logging.base import Logger
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/trainers/callbacks/logging/__init__.py", line 4, in <module>
from .tensorboard import TensorboardLogger
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/inferno/trainers/callbacks/logging/tensorboard.py", line 1, in <module>
import tensorboardX as tX
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/__init__.py", line 5, in <module>
from .torchvis import TorchVis
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/torchvis.py", line 11, in <module>
from .writer import SummaryWriter
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/writer.py", line 15, in <module>
from .event_file_writer import EventFileWriter
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 28, in <module>
from .proto import event_pb2
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/proto/event_pb2.py", line 15, in <module>
from tensorboardX.proto import summary_pb2 as tensorboardX_dot_proto_dot_summary__pb2
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/proto/summary_pb2.py", line 15, in <module>
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/proto/tensor_pb2.py", line 15, in <module>
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/miniconda3/envs/my_test/lib/python3.6/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 22, in <module>
serialized_pb=_b('\n(tensorboardX/proto/resource_handle.proto\x12\x0ctensorboardX\"r\n\x13ResourceHandleProto\x12\x0e\n\x06\x64\x65vice\x18\x01 \x01(\t\x12\x11\n\tcontainer\x18\x02 \x01(\t\x12\x0c\n\x04name\x18\x03 \x01(\t\x12\x11\n\thash_code\x18\x04 \x01(\x04\x12\x17\n\x0fmaybe_type_name\x18\x05 \x01(\tB/\n\x18org.tensorflow.frameworkB\x0eResourceHandleP\x01\xf8\x01\x01\x62\x06proto3')
How to fix it?
conda list | grep inferno
inferno v0.4.0 py_0 conda-forge
inferno-pytorch 0.4.0 pypi_0 pypi
3.7.4
centOS
I get this error (I show only the last line of the backtrace)
File "/home/my_username/anaconda3/envs/my_project/lib/python3.7/site-packages/inferno/trainers/callbacks/logging/tensorboard.py", line 292, in extract_images_from_batch
batch = batch.float().numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
Produced by running the following code (I post only the part that should be relevant)
# ...
trainer = Trainer(vae)
trainer.save_to_directory(folder)
trainer.cuda()
trainer.build_criterion(vae.loss_function())
trainer.build_optimizer('Adam', lr=0.001)
trainer.save_every((1, 'epochs'))
trainer.set_max_num_epochs(100)
trainer.build_logger(TensorboardLogger(log_scalars_every=(1, 'iteration'),
                                       log_images_every=(1, 'iteration'),
                                       log_directory=folder))
# ...
It would be great to have the neuroglancer viewer available for 3D volumetric data during inference.
This would make data inspection much easier especially for data with multiple channels.
create conda forge recipe
Maintaining a building documentation on readthedocs is a pain in the ass:
Building the docs ourselves and hosting them via https://pages.github.com/ is not very hard; it just means we need to do this on a regular basis. But we get a super nice auto example gallery and it is not fragile at all.
@nasimrahaman what do you think?
I think there is a logic bug in save_now. Please correct me if I am wrong.
https://github.com/inferno-pytorch/inferno/blob/master/inferno/trainers/basic.py#L484
The second condition is currently:
elif self._is_iteration_with_best_validation_score:
    return self._save_at_best_validation_score
Shouldn't that be:
elif self._save_at_best_validation_score:
    return self._is_iteration_with_best_validation_score
If you are only saving at the best score, then only save_now if you are the best score.
However, if you are currently at the best score and save at best is off, it will not save. Should be an easy fix just swapping those two variables.
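To compare the two variants, here is a small sketch that isolates just this elif branch. The function names are hypothetical, and the remaining conditions of save_now are not reproduced: None stands for "this branch is not taken, fall through to the later conditions".

```python
from itertools import product


def current_branch(is_best, save_at_best):
    """Branch as it is written now."""
    if is_best:
        return save_at_best
    return None  # fall through to the remaining save_now conditions


def proposed_branch(is_best, save_at_best):
    """Branch with the two variables swapped."""
    if save_at_best:
        return is_best
    return None  # fall through to the remaining save_now conditions


for is_best, save_at_best in product([True, False], repeat=2):
    print(
        f"is_best={is_best!s:5} save_at_best={save_at_best!s:5} "
        f"current={current_branch(is_best, save_at_best)!s:5} "
        f"proposed={proposed_branch(is_best, save_at_best)}"
    )
```

The two variants agree when both flags are true or both are false. They differ in the mixed cases: the current code returns False on a best-score iteration when saving at best is off (blocking any later condition), while the proposed code returns False on a non-best iteration when saving at best is on.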
It looks like the multi-processing is not really stable in pytorch 1.0 yet.
This leads to a few non-terminating tests (not an issue in 0.4.1 !).
I have disabled the 1.0 tests for now:
https://github.com/inferno-pytorch/inferno/blob/master/.travis.yml#L11
We should check again if this works once there is a new torch release.
There are some issues with the TensorboardLogger default arguments.
All log_X_every get the default argument None, which will be mapped to once every iteration.
This is problematic:
log_histograms_every set to once every iteration will lead to calling log_histogram and raise a NotImplementedError
log_images_every set to once every iteration can result in huge log-files, because it stores a lot of images.
Probably the best solution is to change the handling of None for log_images and log_histogram
The latest version in pypi right now is 0.1.7.
In the current implementation of validation smoothing, we use momentum.
This puts a very high importance on the first validation score.
E.g. for 3 validation scores [.75, .2, .1] the smoothed value would be something like 0.7.
I think using a sliding window with some decay would be more appropriate.
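A quick sketch of the two options on the scores from above. The momentum value of 0.95 and the window and decay parameters are assumptions chosen for illustration, not values taken from the inferno code, and both functions are hypothetical.

```python
def momentum_smooth(scores, momentum=0.95):
    """Exponential smoothing that is initialised with the first score."""
    smoothed = scores[0]
    for score in scores[1:]:
        smoothed = momentum * smoothed + (1 - momentum) * score
    return smoothed


def decayed_window(scores, window=3, decay=0.5):
    """Weighted mean over the last `window` scores; older scores count less."""
    recent = scores[-window:]
    # The newest score gets weight 1, the one before it `decay`, and so on.
    weights = [decay ** age for age in range(len(recent) - 1, -1, -1)]
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)


scores = [0.75, 0.2, 0.1]
print(momentum_smooth(scores))  # stays close to the first score
print(decayed_window(scores))   # follows the recent scores
```

With these assumed parameters the momentum version is still around 0.69 after three validations, while the decayed window gives roughly 0.22.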
As of 77fd5e7, they're tagged training_prediction/0, training_prediction/1 and so on. We should have something like training_prediction/batch_0/channel_0, training_prediction/batch_2/channel_1/z_0.
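The proposed naming scheme could be produced by a small helper like the one below. image_tag is a hypothetical function, not part of the TensorboardLogger; it only shows how the tag parts would be joined.

```python
def image_tag(prefix, batch_index, channel_index, z_index=None):
    """Build a hierarchical tag; the z part is only added for volumes."""
    parts = [prefix, f"batch_{batch_index}", f"channel_{channel_index}"]
    if z_index is not None:
        parts.append(f"z_{z_index}")
    return "/".join(parts)


print(image_tag("training_prediction", 0, 0))
print(image_tag("training_prediction", 2, 1, z_index=0))
```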
conda install -c pytorch -c conda-forge inferno)
Initialising a model raised a recursion error.
Specifically, in inferno/extensions/initializers/presets.py line 23
if isinstance(tensor, Variable):
    self.call_on_tensor(tensor.data)
the if clause is always true and one gets stuck in infinite recursion.
Delete the lines
if isinstance(tensor, Variable):
    self.call_on_tensor(tensor.data)
    return tensor
Hey team. I was just wondering what the best way to get the trainer to print training accuracies to the console was? Thanks
The project page is not up to date; the version numbers referenced there as goals have already been reached.
inferno/examples/plot_unet_tutorial.py
Line 71 in 5493e9e
Am I missing something or shouldn't this import be from inferno.extensions.model instead?
(Btw. thanks for the Unets and the tutorial Thorsten)
The code for class RandomScaleSegmentation is duplicated.
It's identical except for the padding mode (and empty lines):
inferno/inferno/io/transform/image.py
Line 86 in c83f2ec
inferno/inferno/io/transform/image.py
Line 644 in c83f2ec
This project looks like a good replacement for the manual tensorboard business we currently have going. It makes it much easier to integrate histograms, distributions, and even audio.
As of this commit, the problem can be reproduced as follows:
import torch
from torch.autograd import Variable
import torch.nn as nn
from torch.nn.parallel.data_parallel import data_parallel
from inferno.extensions.containers.graph import Graph
input_shape = [8, 1, 3, 128, 128]
model = Graph()\
.add_input_node('input')\
.add_node('conv0', nn.Conv3d(1, 10, 3, padding=1), previous='input')\
.add_node('conv1', nn.Conv3d(10, 1, 3, padding=1), previous='conv0')\
.add_output_node('output', previous='conv1')
model.cuda()
input = Variable(torch.rand(*input_shape).cuda())
output = data_parallel(model, input, device_ids=[0, 1, 2, 3])
This raises:
RuntimeError: tensors are on different GPUs
Could this be due to this add_module?
We should make the behavior of the GarbageCollection callback the default and make this callback obsolete.
We should provide an API like the following and have reasonable defaults:
trainer.garbage_collect(collect_every=(1, 'iteration'))
Power users can disable gc via
trainer.garbage_collect(collect_every='never')
I think it would be useful to have a nicer interface for input and output batches with multiple elements.
Currently it can be cumbersome to keep track of what is where in the batch, especially when using multiple transforms that add or remove elements from the batch, or when using multiple loss functions that act on different ground truth/predictions.
This would be a lot easier if elements in the batch could have tags (such as 'raw', 'segmentation', 'affinities'). Transforms and loss functions could use these tags to select what they act on, and also label their outputs.
I will probably implement this at least for myself, but doing so in a nice way while keeping the current functionality will be harder. So I am interested in whether this feature would be useful to others, and if someone has ideas on how to implement it.
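As a starting point for discussion, here is a minimal sketch of the idea with a batch represented as a plain dict from tag to data. apply_to_tags is a hypothetical helper and the lists stand in for arrays; none of this is existing inferno API.

```python
def apply_to_tags(batch, function, tags):
    """Apply `function` to the batch elements whose tag is listed in `tags`."""
    missing = set(tags) - set(batch)
    if missing:
        raise KeyError(f"batch has no elements tagged {sorted(missing)}")
    return {
        tag: function(value) if tag in tags else value
        for tag, value in batch.items()
    }


# Lists stand in for arrays; the values are made up.
batch = {"raw": [0.0, 2.0, 4.0], "segmentation": [0, 1, 1]}

# A transform that only touches the element tagged 'raw'.
scaled = apply_to_tags(batch, lambda xs: [x / 4.0 for x in xs], tags=["raw"])
print(scaled)
```

A loss function could select its prediction and target the same way, by tag instead of by position in the batch.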
This line crashes with networkx version 2.2
As of now, Trainer does not work if max_num_epochs or max_num_iterations is not specified. Not providing either should result in the trainer training till interrupted (via Ctrl+C or SIGINT).
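The desired behavior could look roughly like the sketch below. It is not the Trainer implementation: fit and iteration_counter are hypothetical stand-ins, and train_step represents whatever runs a single training iteration. It relies on Python's default SIGINT handling, which raises KeyboardInterrupt in the main thread.

```python
import itertools


def iteration_counter(max_num_iterations=None):
    """Yield iteration indices; endless when no limit is given."""
    if max_num_iterations is None:
        return itertools.count()
    return iter(range(max_num_iterations))


def fit(train_step, max_num_iterations=None):
    """Run train_step until the limit is hit or the user interrupts."""
    completed = 0
    try:
        for _ in iteration_counter(max_num_iterations):
            train_step()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C / SIGINT: stop training cleanly instead of crashing.
        pass
    return completed


print(fit(lambda: None, max_num_iterations=3))
```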
conda install -c pytorch -c conda-forge inferno)
When importing inferno, I got the error:
File "<stdin>", line 1, in <module>
File "/home/sdamrich/anaconda3/envs/condaenv/inferno/__init__.py", line 6, in <module>
from . import io
File "/home/sdamrich/anaconda3/envs/condaenv/inferno/io/__init__.py", line 1, in <module>
from . import box
File "/home/sdamrich/anaconda3/envs/condaenv/inferno/io/box/__init__.py", line 3, in <module>
from .camvid import CamVid, get_camvid_loaders
File "/home/sdamrich/anaconda3/envs/condaenv/inferno/io/box/camvid.py", line 9, in <module>
from torchvision.datasets.folder import is_image_file, default_loader
ImportError: cannot import name 'is_image_file'
I use torchvision 0.2.1 and pytorch 1.0.1
Commenting out the first two lines in /home/sdamrich/anaconda3/envs/condaenv/inferno/io/box/__init__.py solved the issue for me.
inferno/io/transform/generic.py
class Normalize(Transform):
    def tensor_function(self, tensor):
        mean = np.asarray(tensor.mean()) if self.mean is None else self.mean
        std = np.asarray(tensor.std()) if self.std is None else self.std
        # Figure out how to reshape mean and std
        reshape_as = [-1] + [1] * (tensor.ndim - 1)
        # Normalize
        tensor = (tensor - mean.reshape(*reshape_as))/(std.reshape(*reshape_as) + self.eps)
        return tensor
I am not sure I'm getting the intentions here, but I guess this reshaping the mean and std part is meant to apply separate means and stds for channels, right?
In this case it looks like it wouldn't work if the mean and std were not supplied as arguments (tensor.mean() would return the mean of a flattened array by default?)
Was it meant like this?
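A small NumPy check of the two cases, assuming a channels-first array of shape (channels, height, width). The example array is made up and the per-channel variant is only one possible reading of the intended behavior.

```python
import numpy as np

# Two channels with very different statistics, shape (2, 4, 4).
tensor = np.stack([np.zeros((4, 4)), np.full((4, 4), 10.0)])
reshape_as = [-1] + [1] * (tensor.ndim - 1)

# What happens when no mean is supplied: one global mean over all channels.
global_mean = np.asarray(tensor.mean()).reshape(*reshape_as)

# A per-channel mean, reducing over every axis except the first.
per_channel_mean = tensor.mean(axis=(1, 2)).reshape(*reshape_as)

print(global_mean.shape, per_channel_mean.shape)
```

So the reshape itself works in both cases, but with the default the same single mean is subtracted from every channel.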
With the current pytorch (1.0), the IOU metric fails with
File "/home/pape/Work/software/conda/miniconda3/envs/torch10/lib/python3.7/site-packages/inferno/extensions/metrics/categorical.py", line 104, in forward
numerator = (flattened_prediction * onehot_targets).sum(-1)
RuntimeError: expected type torch.cuda.FloatTensor but got torch.cuda.LongTensor
I could fix this by casting onehot_targets to float before this line:
https://github.com/inferno-pytorch/inferno/blob/master/inferno/extensions/metrics/categorical.py#L104
But we should probably double check that this is the right thing to do.
We currently have a bunch of files in extensions/layers that implement somewhat redundant functionality:
building_blocks: Implements residual block in ResBlockBase and ResBlock
prefab: Implements residual block in ResidualBlock
res_unet: Implements residual u-net.
unet_base: Implements u-net base class.
I would vote to merge building_blocks and prefab and, if possible, also merge the residual block implementations in there. I like @DerThorsten's suggestion to name the new file conv_blocks, because this makes clear what's in there.
Regarding the unet:
Maybe put everything into a single unet file?
We should add more examples for the following things:
num_input>1 and num_output > 1)
badges/shields seem to be down / 'gray'
I guess travis, readthedocs and other urls need to be updated due to repo ownership transfer.
inferno/inferno/trainers/basic.py
Line 1050 in 17e7262
UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
Something along the lines of TensorboardLogger(log_images_every='never') is what we're after. The cleanest way of getting that going is by making inferno.utils.train_utils.Frequency understand what 'never' means.
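A rough sketch of what "understanding 'never'" could mean. SimpleFrequency is a hypothetical stand-in written for this issue, not the actual inferno.utils.train_utils.Frequency class, and its match method is an assumed interface.

```python
class SimpleFrequency:
    """Hypothetical frequency object that accepts (n, unit) or 'never'."""

    VALID_UNITS = ("iteration", "iterations", "epoch", "epochs")

    def __init__(self, value):
        if value == "never":
            self.every, self.units = None, None
            return
        self.every, self.units = value
        if self.units not in self.VALID_UNITS:
            raise ValueError(f"unknown unit: {self.units!r}")

    def match(self, iteration_count=None, epoch_count=None):
        """Return True if something should happen at the given counts."""
        if self.every is None:
            return False  # 'never' does not match anything
        if self.units.startswith("iteration"):
            count = iteration_count
        else:
            count = epoch_count
        return count is not None and count % self.every == 0


print(SimpleFrequency("never").match(iteration_count=10))
print(SimpleFrequency((5, "epochs")).match(epoch_count=10))
```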
This is an issue to track the next inferno release built around PyTorch 0.4. Below is a list of what is to come, feel free to populate it and/or suggest changes.
Variables,
variable.data[0] in the codebase),
tensors.to(...) or model.to(...)),
reduce=False in all inferno-managed loss functions,
Trainer class to smaller classes to facilitate future support for multi-model trainers.
To fully implement all 0.4+ features without bloating the codebase, we'd need to deprecate v0.3 and below, potentially invalidating a lot of code. I guess this can wait till v1.0.
The TensorboardLogger fails when logging images and using tensorboardX 1.4 with the stack trace below.
Note that this error does not occur in tensorboardX 1.2.
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/inferno/trainers/callbacks/logging/tensorboard.py", line 354, in log_image_or_volume_batch
self.log_images(tag, image_list, step)
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/inferno/trainers/callbacks/logging/tensorboard.py", line 395, in log_images
self.writer.add_image(tag, img_tensor=image, global_step=step)
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/tensorboardX/writer.py", line 412, in add_image
self.file_writer.add_summary(image(tag, img_tensor), global_step, walltime)
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/tensorboardX/summary.py", line 205, in image
image = make_image(tensor, rescale=rescale)
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/tensorboardX/summary.py", line 243, in make_image
image = Image.fromarray(tensor)
File "/home/pape/Work/software/conda/miniconda3/envs/torch41/lib/python3.6/site-packages/PIL/Image.py", line 2463, in fromarray
raise TypeError("Cannot handle this data type")
Already mentioned in #103, we should finally remove torch.autograd.Variable
If one uses SaveAtBestValidationScore, the first computed validation score is wrongly never considered the best and is therefore not saved.
[INFO ] Breaking to validate.
[INFO ] Validating.
[INFO ] validate generator exhausted, breaking.
[INFO ] Done validating. Logging results...
[INFO ] Validation loss: 3.2442782860133543; validation error: None
[INFO ] Current smoothed validation score 3.2442782860133543 is not better than the best smoothed validation score 3.2442782860133543.
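One plausible reading of the log is that the best score is already set to the first score before the strict comparison runs. I have not verified this in the code. The sketch below shows a tracker that avoids this by starting without a best score; BestScoreTracker is a hypothetical class, not the inferno callback, and it assumes that lower scores are better.

```python
class BestScoreTracker:
    """Keeps the best score seen so far; lower is assumed to be better."""

    def __init__(self):
        self.best = None  # no score seen yet

    def update(self, score):
        """Return True if `score` is a new best, and remember it."""
        is_best = self.best is None or score < self.best
        if is_best:
            self.best = score
        return is_best


tracker = BestScoreTracker()
print(tracker.update(3.2442782860133543))  # first score counts as best
print(tracker.update(3.2442782860133543))  # an equal score does not
```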
I built a model and saved it using
trainer.save_every((1, 'epochs'))
trainer.save_to_directory(folder)
When I rerun my Python script to load and continue training the previous model I get an error.
This is my code.
def train(load=False, folder='out'):
    print('starting training')
    os.makedirs(folder, exist_ok=True)
    # setup logger
    Logger.instance().setup('log')
    vae = Vae()
    ds = MyDataset(root_folder=root_folder, training=True)
    train_loader = torch.utils.data.DataLoader(ds, batch_size=512, num_workers=16)
    # build trainer
    trainer = Trainer(vae)
    trainer.cuda()
    trainer.build_criterion(vae.loss_function())
    trainer.build_optimizer('Adam', lr=0.001)
    # trainer.validate_every((2, 'epochs'))
    trainer.save_every((1, 'epochs'))
    trainer.save_to_directory(folder)
    trainer.set_max_num_epochs(100)
    # bind loaders
    trainer.bind_loader('train', train_loader, num_inputs=1, num_targets=1)
    # bind callbacks
    trainer.register_callback(GarbageCollection())
    # trainer.register_callback(ShowMinimalConsoleInfo())
    if load:
        trainer.load()
    trainer.fit()
When calling train(load=True) I get the following error:
File "main.py", line 104, in my_train
trainer.fit()
File "/data/l989o/anaconda3/envs/hemo/lib/python3.7/site-packages/inferno/trainers/basic.py", line 1336, in fit
self.train_for(break_callback=lambda *args: self.stop_fitting(max_num_iterations,
File "/data/l989o/anaconda3/envs/hemo/lib/python3.7/site-packages/inferno/trainers/basic.py", line 1410, in train_for
batch = self.fetch_next_batch('train')
File "/data/l989o/anaconda3/envs/hemo/lib/python3.7/site-packages/inferno/trainers/basic.py", line 1092, in fetch_next_batch
self._loader_iters.update({from_loader: self._loaders[from_loader].__iter__()})
KeyError: 'train'
Any ideas how to fix it? Thanks.
This is the page we show most potential users, it shouldn't look shabby.
The TensorboardLogger needs an end_of_validation_iteration method. Also, the _trainer_states_being_observed attribute needs to be split into two or more subsets (e.g. one for training and one for validation).