
mic-dkfz / trixi

219 stars, 14 watchers, 19 forks, 15.9 MB

Manage your machine learning experiments with trixi - modular, reproducible, high fashion. An experiment infrastructure optimized for PyTorch, but flexible enough to work for your framework and your tastes.

Home Page: https://trixi.readthedocs.io

License: MIT License

Makefile 0.04% Python 87.47% CSS 1.29% JavaScript 0.70% HTML 10.43% Shell 0.07%
deep-learning deeplearning pytorch visdom experiment-infrastructure visdom-logger deep-neural-networks deep-reinforcement-learning machine-learning pytorch-cnn

trixi's Introduction


Finally get some structure into your machine learning experiments. trixi (Training & Retrospective Insights eXperiment Infrastructure) is a tool that helps you configure, log and visualize your experiments in a reproducible fashion.

Contribute

We're always grateful for contributions, even small ones! We're PhD students and this is just a side project, so there will always be something to improve.

The best way is to create pull requests on GitHub. Fork the repository and work either directly on develop or on a feature branch, whichever you like best. Then go to "Pull requests" on our GitHub page, select "New pull request" and "compare across forks". Select our develop as base and your work as head/compare.

We currently don't support the full GitHub workflow, because we have to mirror from our working repository to GitHub, but don't worry: we can export the pull requests and apply them so that your contribution will still appear on GitHub :)

Features

trixi consists of three parts:

  • Logging API
    Log whatever data you like in whatever way you like to whatever backend you like.

  • Experiment Infrastructure
    Standardize your experiment, let the framework do all the inconvenient stuff, and simply start, resume, change and finetune all your experiments.

  • Experiment Browser
    Compare, combine and visually inspect the results of your experiments.

An implementation diagram is given here.

Logging API

The Logging API provides a standardized way for logging results to different backends. The Logging API supports (among others):

  • Values
  • Text
  • Plots (Bar, Line, Scatter, Piechart, ...)
  • Images (Single, Grid)

It also offers different backends, e.g. Visdom, local files and Telegram.

There is also an experiment logger for logging your experiments. It uses a file logger to automatically create a structured directory and allows storing configs, results, plots, dicts, arrays, images, etc. That way your experiments always have the same structure on disk.

Here are some examples:

  • Visdom: (screenshot: visdom-logger)
  • Files: (screenshot: file-logger)
  • Telegram: (screenshot: telegram-logger)
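To make the "same structure on disk" idea concrete, here is a minimal sketch in plain Python. It is not trixi's ExperimentLogger: the subfolder names, the timestamped folder name and the helpers create_experiment_dir and save_result are assumptions chosen for illustration.

```python
import json
import os
import time

# Hypothetical subfolder layout; trixi's ExperimentLogger may use other names.
SUBDIRS = ("checkpoint", "config", "log", "plot", "result")


def create_experiment_dir(base_dir, name, config):
    """Create <base_dir>/<timestamp>_<name>/ with a fixed set of subfolders
    and store the config as config/config.json."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    exp_dir = os.path.join(base_dir, f"{stamp}_{name}")
    for sub in SUBDIRS:
        os.makedirs(os.path.join(exp_dir, sub), exist_ok=True)
    with open(os.path.join(exp_dir, "config", "config.json"), "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
    return exp_dir


def save_result(exp_dir, name, value):
    """Append one scalar result to result/<name>.jsonl (one JSON object per line)."""
    path = os.path.join(exp_dir, "result", f"{name}.jsonl")
    with open(path, "a") as f:
        f.write(json.dumps({"value": value, "time": time.time()}) + "\n")
```

Note that exist_ok=True means two runs started within the same second under the same name would share a folder; a real implementation would need a uniqueness check.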

Experiment Infrastructure

The Experiment Infrastructure provides a unified way to configure, run, store and evaluate your results. It gives you an experiment interface for which you implement the training, validation and testing. Furthermore, it automatically provides easy access to the Logging API and stores your config as well as the results for easy evaluation and reproduction. There is an abstract Experiment class and a PytorchExperiment class with many convenience features.

(Screenshots: exp-train, exp-test)

For more info, visit the Documentation.
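The experiment interface can be pictured roughly like the following sketch: subclasses fill in hooks, and a run() method fixes the order in which they are called. This is a simplified, hypothetical skeleton, not trixi's Experiment class; only the setup-then-prepare order is taken from the run() traceback quoted in the issues, and the remaining hook names (train, validate, end) are assumptions.

```python
from abc import ABC, abstractmethod


class Experiment(ABC):
    """Hypothetical experiment skeleton: subclasses implement the hooks,
    run() calls them in a fixed order."""

    def __init__(self, n_epochs=1):
        self.n_epochs = n_epochs
        self.epoch = 0  # next epoch to run; a resumed experiment would restore this

    def setup(self):
        """Build model, data and optimizer (optional)."""

    def prepare(self):
        """Called once right before training starts (optional)."""

    @abstractmethod
    def train(self, epoch):
        """One training epoch (required)."""

    def validate(self, epoch):
        """Optional per-epoch validation."""

    def end(self):
        """Called once after the last epoch (optional)."""

    def run(self):
        self.setup()
        self.prepare()
        for epoch in range(self.epoch, self.n_epochs):
            self.train(epoch)
            self.validate(epoch)
            self.epoch = epoch + 1
        self.end()


class ToyExperiment(Experiment):
    """Example subclass that just records which hooks ran."""

    def setup(self):
        self.history = []

    def train(self, epoch):
        self.history.append(("train", epoch))

    def validate(self, epoch):
        self.history.append(("val", epoch))
```

Keeping the loop in the base class is what lets a framework add checkpointing or logging around each epoch without the subclass having to change.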

Experiment Browser

(We're currently remaking this from scratch, expect major improvements :))

The Experiment Browser offers a complete overview of experiments along with all config parameters and results. It also lets you combine and/or compare different experiments, giving you an interactive comparison that highlights differences in the configs, and a detailed view of all images, plots, results and logs of each experiment, with live plots and more.

Installation

Install trixi:

pip install trixi

Or, to always get the newest version, you can install trixi directly via git:

git clone https://github.com/MIC-DKFZ/trixi.git
cd trixi
pip install -e .

Documentation

The docs can be found here: trixi.rtfd.io

Or you can build your own docs using Sphinx.

Sphinx Setup

Install Sphinx (pinned to 1.7.0 for now because of issues with Read the Docs):
pip install sphinx==1.7.0

Generate HTML:
path/to/PROJECT/doc$ make html

index.html will be at:
path/to/PROJECT/doc/_build/html/index.html

Notes

  • Rerun make html each time existing modules are updated (this will automatically call sphinx-apidoc)
  • Do not forget indentation or blank lines
  • Code with no classes or functions is not automatically captured using apidoc

Example Documentation

We use Google style docstrings:

def show_image(self, image, name, file_format=".png", **kwargs):
    """
    This function shows an image.

    Args:
        image (np.ndarray): image to be shown
        name (str): image title
    """

Examples

Examples can be found here for:

How to Cite

If you use trixi in your project, we'd appreciate a citation, for example like this:

@misc{trixi2017,
  author = {Zimmerer, David and Petersen, Jens and Köhler, Gregor and Wasserthal, Jakob and Adler, Tim and Wirkert, Sebastian and Ross, Tobias},
  title = {trixi - Training and Retrospective Insight eXperiment Infrastructure},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/MIC-DKFZ/trixi}},
  doi = {10.5281/zenodo.1345136}
}

trixi's People

Contributors

dzimmerer, dzimmm, elpequeno, emrys-merlin, gregorkoehler, justusschock, mzenk, orippler, schnobi1990, swirkert, wasserth


trixi's Issues

Dependencies breaking

Hi,

I just noticed that your utils use scipy.misc.imsave. This function was deprecated in scipy 1.0 and removed in newer releases, which breaks imports under your requirement scipy>=0.19.1.

You'd have to adapt either your requirement or the import (the function was moved to imageio).

Best,
Justus

Missing Dependency warnings

If dependency modules are not installed, a warning is raised.

E.g. if torch is missing, the raised Warning is:

Could not import Pytorch related modules.
No module named 'torch'

I would like to make this an ImportWarning so that it becomes filterable (e.g. the same warning is shown only once or ignored completely).

Is it okay, if I open a PR on this? On which branch should I start working?
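A sketch of what the proposed change could look like, using only the standard library. The helper optional_import is hypothetical; the message text mirrors the warning quoted above.

```python
import importlib
import warnings


def optional_import(module_name):
    """Import an optional dependency. On failure, emit an ImportWarning
    (a filterable category) and return None instead of raising."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        warnings.warn(
            f"Could not import {module_name} related modules.\n{exc}",
            ImportWarning,
            stacklevel=2,
        )
        return None


# Usage (hypothetical): torch = optional_import("torch")  # None if PyTorch is missing
```

One thing to keep in mind: Python's default warning filters ignore ImportWarning, so after such a change the message would be hidden unless users opt in, e.g. with python -W always::ImportWarning or warnings.simplefilter("always", ImportWarning).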

Can't see data (scalars/images) in TensorBoard

I use the develop branch of trixi and install it by pip install -e TRIXI_DIR.

I use the PytorchExperiment infrastructure and use self.add_result to add scalars into tensorboard logger.

The tfevents files do exist in the save/tensorboard folder, but they are not shown in the TensorBoard web page.

How can I inspect this problem?

Install Requirements

Hi,

Since trixi does not necessarily depend on PyTorch, would it be possible to leave it out of the dependencies inside the setup.py and on pip?

I'd like to use trixi in a project but don't want a hard PyTorch dependency. From what I've seen, an optional dependency should be sufficient.

Greetings,
Justus
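One common way to do this with setuptools is an extras_require entry. The dependency lists below are illustrative assumptions, not trixi's actual requirements, and the extra names pytorch and full are hypothetical.

```python
# Hypothetical dependency split for setup.py; the package lists are illustrative.
INSTALL_REQUIRES = [
    "numpy",
    "matplotlib",
    "visdom",
]

EXTRAS_REQUIRE = {
    # Only installed with: pip install "trixi[pytorch]"
    "pytorch": ["torch", "torchvision"],
}
# Convenience extra that pulls in every optional dependency.
EXTRAS_REQUIRE["full"] = sorted({dep for deps in EXTRAS_REQUIRE.values() for dep in deps})

# In setup.py these lists would be handed to setuptools, e.g.:
# setuptools.setup(name="trixi", ..., install_requires=INSTALL_REQUIRES,
#                  extras_require=EXTRAS_REQUIRE)
```

With such a split, pip install trixi would skip PyTorch, while pip install "trixi[pytorch]" would pull it in; the code that uses torch would then need a guarded import.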

Corrupted Installation

Hi,

the installation via PyPI fails because requirements_full.txt cannot be found.

Just to let you know.

EDIT:
Installation from git was successful, but importing failed due to missing subpackages (git installation fixed in #5).

urls must start with a leading slash

What I did:

System : Ubuntu 18.04 , Python 3.6.5

cd trixi/examples
python3 train_net_pytorchexperiment.py
python3 train_net_pytorchexperiment.py

This produced several subdirs and files in: MNIST_experiment

Then, to visualize the experiments, I started the trixi browser as:

python3 -m trixi.browser MNIST_experiment

which raised the exception

 File "/home/fix_jer/GIT/trixi/trixi/experiment_browser/browser.py", line 77, in start_browser
    app = create_flask_app(base_dir)
  File "/home/fix_jer/GIT/trixi/trixi/experiment_browser/browser.py", line 59, in create_flask_app
    app.register_blueprint(blueprint)
[......]
  File "/home/fix_jer/.local/lib/python3.6/site-packages/werkzeug/routing.py", line 606, in __init__
    raise ValueError('urls must start with a leading slash')
ValueError: urls must start with a leading slash

Trying to investigate blindly (I do not know Flask and the related stuff) what is going on, I looked at the line where the blueprint is created:

blueprint = Blueprint("data", __name__, static_url_path=base_dir, static_folder=base_dir)

base_dir is the string "MNIST_experiment". Following the exception down to werkzeug/routing.py (https://github.com/pallets/werkzeug/blob/master/werkzeug/routing.py, l606), it is raised because the rule string does not begin with a leading '/': it is actually "MNIST_experiment/<path:filename>".

If I just add a leading '/' in the constructor call of Blueprint ,

blueprint = Blueprint("data", __name__, static_url_path='/' + base_dir, static_folder=base_dir)

It "works" in the sense that the leading '/' is passed down to routing.py and the exception is no longer raised... but I just do not know whether this does what it is supposed to do.

Best,
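Following the reporter's workaround, a small helper could normalize base_dir before it is used as static_url_path. This is a sketch under the assumption that prefixing '/' is the intended behavior (absolute POSIX paths such as /home/... already start with '/', which may be why the error only shows up for relative paths); static_url_path_for is a hypothetical name.

```python
import posixpath


def static_url_path_for(base_dir):
    """Turn an experiment directory into a URL prefix that werkzeug accepts.

    werkzeug rejects rules without a leading slash ("urls must start with a
    leading slash"), which happens for relative paths like "MNIST_experiment".
    """
    # Use forward slashes and drop "./" or trailing "/" noise.
    url = posixpath.normpath(base_dir.replace("\\", "/"))
    return url if url.startswith("/") else "/" + url


# Hypothetical use in the browser code:
# blueprint = Blueprint("data", __name__,
#                       static_url_path=static_url_path_for(base_dir),
#                       static_folder=base_dir)
```

An alternative with a similar effect on Linux would be to pass os.path.abspath(base_dir) when starting the browser.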

Improve consistency of `counter` use for tensorboard logging

Hi all,

I think the consistency of TensorBoard logging with regard to the counter argument could be improved. I would suggest using self.val_dict for all logging operations and using counter only to override the internal state of self.val_dict[key], similar to how it is used in show_value.

In show_image_grid, counter is directly used as the global_step, forcing the user to keep an external counter for every image_grid that is provided at every logging call.
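The suggestion can be sketched as per-key step bookkeeping: each tag keeps its own auto-incrementing step, and an explicit counter only overrides that state. StepCounter and next_step are hypothetical names, not part of trixi.

```python
from collections import defaultdict


class StepCounter:
    """Hypothetical per-key step state, in the spirit of self.val_dict:
    every key gets its own auto-incrementing step."""

    def __init__(self):
        self._steps = defaultdict(int)

    def next_step(self, key, counter=None):
        """Return the step to log `key` at. An explicit counter overrides
        (and resets) the internal state for that key only."""
        if counter is not None:
            self._steps[key] = counter
        step = self._steps[key]
        self._steps[key] = step + 1
        return step
```

The returned step could then be passed as global_step, e.g. to SummaryWriter.add_image(tag, img_tensor, global_step=...) from torch.utils.tensorboard, so callers would no longer need an external counter for every image grid.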

Error of get_vars_from_sys_argv() in pytorchexperiment.py

In the get_vars_from_sys_argv() function within pytorchexperiment.py, you use

return param.get("config_path"), param.get("resume_path")

to get the config/resume path.

However, param here is an argparse.Namespace object, which does not have a get method. If one specifies parse_sys_argv=True when initializing a PytorchExperiment, one gets

AttributeError: 'Namespace' object has no attribute 'get'
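A possible fix is to read the attributes with getattr (or vars(param).get(...)) instead of .get. The sketch below is a standalone, hypothetical stand-in for get_vars_from_sys_argv(); the -c/--config_path and -r/--resume_path flags are assumptions, not necessarily the ones trixi defines.

```python
import argparse


def get_vars_from_argv(argv=None):
    """Hypothetical stand-in for get_vars_from_sys_argv().

    argparse.Namespace has no .get() method, so the optional values are
    read with getattr() and a default instead.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config_path", type=str, default=None)
    parser.add_argument("-r", "--resume_path", type=str, default=None)
    # parse_known_args ignores unrelated arguments instead of erroring out.
    param, _unknown = parser.parse_known_args(argv)
    # Equivalent alternative: vars(param).get("config_path")
    return getattr(param, "config_path", None), getattr(param, "resume_path", None)
```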

a combi logger method failed

Hi, thanks for your excellent experiment framework!

When I began using this framework, everything worked fine. But then my $HOME directory accidentally ran out of disk space (which caused one of my Jupyter kernels running a trixi experiment to fail). After freeing up disk space, I ran a trixi experiment from the command line, but got the following error when adding a numerical result:

a combi logger method failed: <bound method ExperimentLogger.show_value of <trixi.logger.experiment.pytorchexperimentlogger.PytorchExperimentLogger object at 0x7f29552adba8>>

The same experiment worked fine yesterday, but when I re-run it, I get this weird result.

I'm struggling to understand elog and clog

Can I ask some more about clog and elog? I looked into them as much as I could, and I can see that clog is a 'combined' logger, but I don't think I understand the concept of what a logger is supposed to be. Also, what is the difference between self.elog.print() and self.clog.show_text()? They seem to produce similar results.
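For context, the general idea of a combined logger can be sketched as an object that forwards each call to several underlying loggers. This is a hypothetical simplification, not trixi's CombinedLogger; it only borrows the "a combi logger method failed" wording from the error above to show how a failure in one wrapped logger (for instance after a full disk) could be reported without stopping the others.

```python
class CombinedLogger:
    """Hypothetical combined logger: calling clog.some_method(...) calls
    some_method(...) on every wrapped logger that has it."""

    def __init__(self, *loggers):
        self.loggers = list(loggers)

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. logging methods.
        targets = [getattr(lg, name) for lg in self.loggers if hasattr(lg, name)]
        if not targets:
            raise AttributeError(f"no wrapped logger has a method {name!r}")

        def call_all(*args, **kwargs):
            for method in targets:
                try:
                    method(*args, **kwargs)
                except Exception as exc:
                    # Report the failing logger but keep going with the others.
                    print(f"a combi logger method failed: {method!r} ({exc})")

        return call_all
```

In this picture, a single backend logger writes to one place, while the combined logger is just a fan-out, which is why calls on either can end up producing similar output.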

support for tensorflow

hey guys,

interesting project! I was wondering whether you are considering adding TensorFlow support at some point in the future?

cheers

Max

setting port

Is it possible to set the port for visdom in trixi? if so how? I can manually do "python -m visdom.server -port=8890" but if I use the "PytorchExperiment" class to automatically start visdom (like in your UNet Example) I do not know where to specify the port....

pytorch_experiment.py example not working

Following up on #14:

There may be more updates needed, but it would be good to have pytorch_experiment.py updated as well.

Use loggers={"visdom": "visdom"} instead of loggers={"vlog": "visdom"}; this is required by the pytorchexperiment.vlog(self) implementation.

 exp = MNIST_experiment(config=c, name='experiment', n_epochs=c.n_epochs,
                           seed=42, base_dir='./experiment_dir',
                           loggers={"visdom": "visdom"})

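pytorchplotfilelogger.save_image_grid_static calls torchvision's save_image with a filename keyword argument, which does not match the current torchvision.utils.save_image signature (shown below), where the second parameter is named fp: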

# pytorchplotfilelogger.save_image_grid_static
pytorchplotfilelogger.save_image_grid_static(...):
  tv_save_image(tensor=tensor, filename=img_file, **image_args) # this does not match save_image signature below.

# torchvision.utils.save_image method signature below
def save_image(
    tensor: Union[torch.Tensor, List[torch.Tensor]],
    fp: Union[Text, pathlib.Path, BinaryIO],
    format: Optional[str] = None,
    **kwargs
) -> None:

result in the error below:

Unhandled exception in thread started by <function PytorchPlotFileLogger.save_image_grid_static at 0x000002456711CAF8>
Traceback (most recent call last):
  File "C:\Users\...\lib\site-packages\trixi\logger\file\pytorchplotfilelogger.py", line 183, in save_image_grid_static
    tv_save_image(tensor=tensor, filename=img_file, **image_args)
  File "C:\Users\...t\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
TypeError: save_image() missing 1 required positional argument: 'fp'
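Since the failing call passes filename= as a keyword, one version-tolerant fix is to pass the destination positionally. save_image_grid below is a hypothetical wrapper that illustrates this; the demo uses a stand-in function with the newer fp signature so it runs without torch installed.

```python
def save_image_grid(save_fn, tensor, img_file, **image_args):
    """Hypothetical fix for the tv_save_image call in save_image_grid_static.

    torchvision.utils.save_image named its second parameter `filename` in
    older releases and `fp` in newer ones, so passing the destination
    positionally works with both signatures.
    """
    return save_fn(tensor, img_file, **image_args)


if __name__ == "__main__":
    # Stand-in with the newer signature, so the demo runs without torch.
    def fake_save_image(tensor, fp, format=None, **kwargs):
        print("saved to", fp)

    try:
        fake_save_image(tensor=None, filename="grid.png")  # the old keyword call
    except TypeError as exc:
        print("keyword call fails:", exc)

    save_image_grid(fake_save_image, None, "grid.png")  # prints: saved to grid.png
```

In trixi itself, the equivalent change would presumably be tv_save_image(tensor, img_file, **image_args).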

Are you still working on the project?

Hi!

I found your library while working on the code from a paper about anomaly detection with VAEs. I was just wondering if you are still working on trixi? Or is the 2019 release the last release?

Thank you,
Valentina

No module named 'trixi.logger'

ubuntu 16.04 python 3.6
I installed trixi successfully, but when I try your "pytorch_example.ipynb", it fails with No module named 'trixi.logger'. I don't know why.
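One way to narrow this down is to check which files Python would actually import. locate below is a hypothetical helper built on importlib.util.find_spec; if locate("trixi") points somewhere unexpected (e.g. an old install or a local trixi folder) or locate("trixi.logger") returns None, the installed package is probably incomplete, similar to the missing-subpackages problem mentioned above.

```python
import importlib.util


def locate(module_name):
    """Return the file a module would be loaded from, or None if it
    cannot be found. Note: for dotted names, find_spec imports the parent
    package first."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:  # the parent package itself is missing
        return None
    return None if spec is None else spec.origin


# Hypothetical usage inside the notebook:
# print(locate("trixi"), locate("trixi.logger"))
```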

multiprocessing crashing under windows

There seems to be an issue with spawning processes on Windows. The PyTorch documentation mentions this problem:

https://pytorch.org/docs/stable/notes/windows.html#usage-multiprocessing

Any chance there is a quick fix for this?

c:\dev\libs\trixi\trixi\__init__.py:3: UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

The backend was originally set to 'Qt5Agg' by the following code:
  File "giana_dataloader.py", line 2, in <module>
    import matplotlib.pyplot as plt
  File "C:\dev\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 71, in <module>
    from matplotlib.backends import pylab_setup
  File "C:\dev\anaconda3\lib\site-packages\matplotlib\backends\__init__.py", line 16, in <module>
    line for line in traceback.format_stack()

  if use_agg: matplotlib.use("Agg")
Traceback (most recent call last):
  File "giana_dataloader.py", line 218, in <module>
    train(image_gt_file_list_all)
  File "giana_dataloader.py", line 164, in train
    mylog = pvl(name='GIANA')
  File "c:\dev\libs\trixi\trixi\logger\visdom\pytorchvisdomlogger.py", line 23, in __init__
    super(PytorchVisdomLogger, self).__init__(*args, **kwargs)
  File "c:\dev\libs\trixi\trixi\logger\visdom\numpyvisdomlogger.py", line 64, in __init__
    self._process.start()
  File "C:\dev\anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\dev\anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\dev\anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\dev\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\dev\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Visdom.setup_socket.<locals>.run_socket'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "c:\dev\libs\trixi\trixi\logger\visdom\numpyvisdomlogger.py", line 852, in exit
    self._process.terminate()
  File "C:\dev\anaconda3\lib\multiprocessing\process.py", line 116, in terminate
    self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'

c:\dev\libs\trixi\trixi\__init__.py:3: UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

The backend was originally set to 'Qt5Agg' by the following code:
  File "<string>", line 1, in <module>
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\dev\anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\dev\anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\dev\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\dev\endovis2018-challenge\giana\dataloader\giana_dataloader.py", line 2, in <module>
    import matplotlib.pyplot as plt
  File "C:\dev\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 71, in <module>
    from matplotlib.backends import pylab_setup
  File "C:\dev\anaconda3\lib\site-packages\matplotlib\backends\__init__.py", line 16, in <module>
    line for line in traceback.format_stack()

  if use_agg: matplotlib.use("Agg")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\dev\anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
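The PyTorch note linked above comes down to two rules of the spawn start method, the only one available on Windows: the child process re-imports the main script, so code that starts processes must sit behind if __name__ == "__main__":, and everything handed to the child must be picklable. The traceback shows both problems (train() is called at module level in giana_dataloader.py, and a local function inside Visdom cannot be pickled). Below is a generic, hypothetical sketch of both rules, not a trixi patch; the pickling error most likely needs a change inside the logger rather than in user code.

```python
import multiprocessing as mp
import pickle


def log_worker(message):
    # Defined at module level: pickled by reference, so a spawned child can find it.
    print("logged:", message)


def make_local_worker():
    # A function defined inside another function (like
    # Visdom.setup_socket.<locals>.run_socket) cannot be pickled.
    def run_socket():
        pass

    return run_socket


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def main():
    ctx = mp.get_context("spawn")  # spawn is what Windows always uses
    proc = ctx.Process(target=log_worker, args=("epoch 1",))
    proc.start()
    proc.join()


if __name__ == "__main__":
    # Without this guard, the child's re-import of this script would try to
    # start a process again (and fail).
    main()
```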

Unable to build because of dependency breaking

  File "run_train_pipeline.py", line 28, in <module>
    from experiments.UNetExperiment import UNetExperiment
  File "/data/Ramesh/Implementation/basic_unet_example/experiments/UNetExperiment.py", line 29, in <module>
    from trixi.experiment.pytorchexperiment import PytorchExperiment
  File "/data/Ramesh/Implementation/segExp/lib/python3.6/site-packages/trixi/experiment/pytorchexperiment.py", line 18, in <module>
    from trixi.logger import CombinedLogger, PytorchExperimentLogger, PytorchVisdomLogger, TelegramMessageLogger
ImportError: cannot import name 'PytorchExperimentLogger'

Logging Frequency

Hi,
Is there a possibility to introduce a logging frequency to trixi's logging?

The problem is: Visdom caches everything that has been logged in RAM, which causes a memory overflow when training several networks on the same machine.

A logging frequency in combination with a logging_behavior might be a good solution to this:
the logging frequency would define how often values are logged, and the logging behavior would define which values are logged (e.g. all values between the logged ones are skipped, or the mean of all values since the last logged value is logged).

This could also mitigate another problem: when logging many values, the background processes used for logging are really slow, so the logging falls far behind the actual training state.

Do you plan to integrate such a feature, or would you be open to a PR addressing this?

Best,
Justus
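The proposed frequency/behavior combination could be sketched as a thin wrapper around a log function. ThrottledValueLogger and its parameters are hypothetical; behavior="skip" forwards the last value of each window, and behavior="mean" forwards the window mean.

```python
class ThrottledValueLogger:
    """Hypothetical wrapper that forwards at most one value per `frequency`
    calls for each name to log_fn(name, value)."""

    def __init__(self, log_fn, frequency=10, behavior="skip"):
        if frequency < 1:
            raise ValueError("frequency must be >= 1")
        if behavior not in ("skip", "mean"):
            raise ValueError(f"unknown behavior: {behavior!r}")
        self.log_fn = log_fn
        self.frequency = frequency
        self.behavior = behavior
        self._buffers = {}  # name -> values seen since the last forwarded one

    def show_value(self, value, name):
        buf = self._buffers.setdefault(name, [])
        buf.append(value)
        if len(buf) < self.frequency:
            return  # keep buffering until the window is full
        if self.behavior == "mean":
            out = sum(buf) / len(buf)
        else:  # "skip": drop the values in between, forward the latest one
            out = buf[-1]
        buf.clear()
        self.log_fn(name, out)
```

Values left in a partially filled window are not forwarded; a real implementation might flush them, e.g. at the end of an epoch.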

slackclient 2.X breaks trixi

slackclient released version 2.0.1 with a new package structure, which breaks every new install of trixi. Running

pip install -e .

To work around #15:

pip install scipy==1.1.*

And then:

pytest

everything fails because pytorchexperiment can't be imported.

To fix this you need to:

pip install slackclient==1.*

vlog in pytorch_experiment.py

Hi,

I ran into a problem when trying to run pytorch_experiment.py from the examples dir: self.vlog seems to be None. I use trixi 0.1.2.1 (installed via pip) with visdom 0.1.8.8. This problem is not present in trixi 0.1.1.6. Here is the error message:

  File "~/.local/lib/python3.6/site-packages/trixi/experiment/experiment.py", line 71, in run
    self.setup()

  File "<ipython-input-8-fec987966c35>", line 34, in setup
    self.vlog.plot_model_structure(self.model,

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-a33583383f44> in <module>()
----> 1 exp.run()

~/.local/lib/python3.6/site-packages/trixi/experiment/experiment.py in run(self, setup)
    101             self._time_end = time.strftime("%y-%m-%d_%H:%M:%S", time.localtime(time.time()))
    102 
--> 103             raise e
    104 
    105     def run_test(self, setup=True):

~/.local/lib/python3.6/site-packages/trixi/experiment/experiment.py in run(self, setup)
     69 
     70             if setup:
---> 71                 self.setup()
     72                 self._setup_internal()
     73             self.prepare()

<ipython-input-8-fec987966c35> in setup(self)
     32 
     33         self.save_checkpoint(name="checkpoint_start")
---> 34         self.vlog.plot_model_structure(self.model,
     35                                        [self.config.batch_size, 1, 28, 28],
     36                                        name='Model Structure')

AttributeError: 'NoneType' object has no attribute 'plot_model_structure'

Since I did not change anything I guess this is a bug. Please let me know if you need more information.

Otherwise, this is a really cool project, thank you!

Best regards

Removing experiments from trixi browser does not work

I was trying to remove some experiments from the trixi browser. If I select them and click remove, they disappear from the list as expected. If I then reload the page, they reappear. This is kind of inconvenient.

I tested it with trixi versions 0.1.2.1 and 0.1.2.2.

Easy way to store predictions?

Hi! Thanks for making such a great tool! I've had the pleasure of exploring it over the weekend. My question is whether there is an easy way to store the predictions made by the model, or whether I should write my own code for this and create a new folder etc. I just want to check whether you've already built a framework for this.

Regarding the Model Structure plot

Hi, how can I see the output created by this code: "self.vlog.plot_model_structure(self.model, [self.config.batch_size, 1, 28, 28], name='Model Structure')"?
