labforcomputationalvision / plenoptic

Visualize/test models for visual representation by synthesizing images.

Home Page: https://plenoptic.readthedocs.io/en/latest/

License: MIT License

Python 99.95% Dockerfile 0.05%

plenoptic's Issues

Add `normalize_coefficients` method to Steerable Pyramid

This method would analytically normalize the steerable pyramid coefficients, which vary in their magnitude across scales. There are two reasons for this variation:

  1. Down-sampling between scales (when downsample=True). Because we downsample at each scale by a factor of two, the magnitude increases by a factor of four (two squared, two in each of the two dimensions). A sketch of this correction follows below.

  2. Natural images have more power in the lower frequencies than the higher frequencies, because they have 1/f power spectra. Therefore, we could up-weight the higher frequencies proportionally. Should we correct for this? Make it an argument? When optimizing, you want all coefficients to be approximately the same magnitude, but are there cases where you would want to correct for the first issue but not this one?

Z-scoring the coefficients appears to be key for good metamer synthesis both for the Portilla-Simoncelli texture statistics and the PooledV1 model. If we do the above, that extra step might not be necessary.
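A minimal sketch of the correction from point 1, assuming the coefficients come back as a dict keyed by (scale, orientation) tuples plus residual bands; the method name and keying are assumptions, not the current API:

def normalize_coefficients(coeffs):
    # Divide out the factor-of-four magnitude growth per scale (sketch).
    normalized = {}
    for key, coeff in coeffs.items():
        if isinstance(key, tuple):  # oriented band keyed by (scale, orientation)
            scale = key[0]
            normalized[key] = coeff / (4 ** scale)
        else:  # e.g., residual high- and low-pass bands
            normalized[key] = coeff
    return normalized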

Fix notebook tqdm

Our progress bar is created by tqdm, which has a separate version for working in notebooks: from tqdm.notebook import tqdm instead of from tqdm import tqdm. We would need to be able to tell whether the library is being imported from a notebook or not and change which tqdm we import.

It looks like tqdm also has an auto option, which should figure this out for us.
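A minimal check of that option (tqdm.auto ships with tqdm and dispatches to the notebook-aware bar when running under a Jupyter kernel):

import time

# tqdm.auto picks the right progress bar for the environment: the widget
# version in notebooks, the console version everywhere else.
from tqdm.auto import tqdm

for _ in tqdm(range(100)):
    time.sleep(0.01)  # stand-in for a synthesis iteration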

tools.rectangular_to_polar / tools.polar_to_rectangular fails tests

pytest for test_plenoptic.TestNonlinearities::test_coordinate_transform fails for an unknown reason, likely to do with po.rescale.

Reproducible failing test (note the torch.manual_seed(1) placed on the line before the second assignment to a):

def test_coordinate_transform(self):
    a = torch.randn(10, 5, 256, 256)
    b = torch.randn(10, 5, 256, 256)

    # rectangular -> polar -> rectangular should be the identity
    A, B = po.polar_to_rectangular(*po.rectangular_to_polar(a, b))

    assert torch.norm(a - A) < 1e-3
    assert torch.norm(b - B) < 1e-3

    torch.manual_seed(1)
    a = torch.rand(10, 5, 256, 256)
    b = po.rescale(torch.randn(10, 5, 256, 256), -np.pi / 2, np.pi / 2)

    # polar -> rectangular -> polar should also be the identity
    A, B = po.rectangular_to_polar(*po.polar_to_rectangular(a, b))

    assert torch.norm(a - A) < 1e-3
    assert torch.norm(b - B) < 1e-3

The last assert statement, assert torch.norm(b - B) < 1e-3, which compares the angles before and after what should be an identity transformation, is the one that fails.

Add color/channel support

A variety of things:

  • look for SSIM and color references
  • make sure synthesis methods can take and return multi-channel images (though what they do should depend on the model)
  • Integration with colour?

Fix RNG state when resuming synthesis

If you run synthesis twice in a row, you'll pick up more or less where you left off (assuming you set initial_image=None and learning_rate=None on the second call; this is pending the merger of the current ventral_model branch), with one major caveat: the state of the random number generator. We require a seed and always set it at the beginning of the synthesis call. If you resume synthesis with the same object in the same session, we can simply allow the user to set seed=None and, if seed is None, skip setting it.

However, if we save a metamer object, load it, and then resume synthesis (which is not uncommon when synthesis takes a long time), we currently have no good way to resume the RNG state. Something like torch.random.fork_rng, or what it does internally (I can't find example code showing how to use it), is probably what we want. But I'm not sure how to handle devices with this.

Grabbing the CPU state would be easy; my preference would be to do something like the following (sketched below): at the end of synthesis, do self.cpu_rng_state = torch.get_rng_state(), make sure to save the cpu_rng_state attribute by adding it to the list of attributes in the save function, and then, during load, call torch.set_rng_state(metamer.cpu_rng_state).

However, grabbing the GPU states apparently takes a long time (see the warning in the function linked above), and we would only want to do it for the relevant devices. Currently the metamer object is not explicitly aware of which devices are relevant, which I prefer because it keeps the code completely device-agnostic. However, it presents a problem here, and I see three solutions:

  1. Don't try to resume the GPU RNG state (the current situation).
  2. Grab the RNG state from all available GPUs (as the fork_rng function linked above does if devices isn't specified), and set them all.
  3. Figure out which devices are being used. I think this is the best solution, and my preference for how to handle it would be to check initial_image.device and model.device. Currently, we do not require the model's device to be set, so it's very possible that there is no device attribute (my ventral stream models have a device attribute). We could start encouraging it and default to option 2 if it's not present.

If we do something like option 2 (or use it as the default in option 3), then we should probably require this option to be enabled, rather than always doing it, since it apparently takes time. And regardless of whether we do 2 or 3, it should happen at the end of the synthesis call.
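A minimal sketch of the CPU-only bookkeeping (option 1 plus the save/load plumbing described above; the class shown is a stand-in, not the real Metamer):

import torch

class Metamer:
    # Sketch showing only the RNG-state bookkeeping discussed above.

    def synthesize(self, seed=0):
        if seed is not None:
            torch.manual_seed(seed)
        # ... optimization loop ...
        # At the end of synthesis, stash the CPU generator state so a
        # saved-and-loaded object can continue the same random sequence.
        self.cpu_rng_state = torch.get_rng_state()

    def resume(self):
        # After load(), restore the state instead of re-seeding.
        torch.set_rng_state(self.cpu_rng_state)
        self.synthesize(seed=None)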

Test not-downsampled pyramid

We currently only test the downsampled version of the pyramid against earlier implementations. Can we add a test of the not-downsampled version as well? We should be able to up- or down-sample the coefficients, respectively, in order to check them against each other, and that should hopefully (if we do it in the same manner) account for the difference in magnitudes.

Overload batch dimension?

Geodesic and eigendistortion only work on inputs with a single element in the batch dimension and then overload it: eigendistortion makes use of it for the different eigenvectors, geodesic for different steps in the path between the two anchor images. Should Synthesis, MADCompetition, and Metamer do something similar?

Reorganize Documentation structure

In addition to the docstrings, examples, and tutorials that we need, we need some good basic documentation that explains the idea behind this package, points to the associated papers, and lays out the basic ideas. It should also include material on the basic API and how to use the various abstractions / more general functionality (coarse-to-fine optimization, plotting, etc.). Those might not be necessary for end users, but they are necessary for us while we work on the core.

Some potentially helpful info: the open source guides from Github and the Mozilla Science Working Open Workshop.

Add support for relative threshold

For Synthesis (and its subclasses), we have two stopping criteria: either you reach max_iter or your (absolute) loss decreases by less than loss_thresh over the past loss_change_iter iterations. But this is an absolute number, which is going to differ wildly depending on the magnitude of your loss.

Would like to add support for a relative threshold, rel_loss_thresh, which checks whether the loss has decreased by rel_loss_thresh * loss_prev over the past loss_change_iter iterations. On each iteration, check whether loss < loss_prev - rel_loss_thresh * loss_prev (equivalently, loss < (1 - rel_loss_thresh) * loss_prev) and, if so, update loss_prev = loss. Keep going until there have been loss_change_iter iterations without such a decrease, and then break.

This would go into Synthesis._check_for_stabilization, and a similar check would be needed for coarse_to_fine.
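A minimal sketch of that criterion as a standalone helper (the names follow this issue, the defaults are placeholders, and how it would plug into _check_for_stabilization is left open):

class RelativeStopper:
    # Stop after loss_change_iter iterations without a rel_loss_thresh
    # fractional decrease in loss (sketch of the proposal above).

    def __init__(self, rel_loss_thresh=1e-2, loss_change_iter=50):
        self.rel_loss_thresh = rel_loss_thresh
        self.loss_change_iter = loss_change_iter
        self.loss_prev = None
        self.iters_without_decrease = 0

    def step(self, loss):
        # Call once per iteration; returns True when synthesis should stop.
        if self.loss_prev is None or loss < (1 - self.rel_loss_thresh) * self.loss_prev:
            self.loss_prev = loss
            self.iters_without_decrease = 0
        else:
            self.iters_without_decrease += 1
        return self.iters_without_decrease >= self.loss_change_iter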

Add preprocess function

Similar to po.load_images (currently only in the #38 branch) but also:

  • accepts any of paths, arrays, or tensors
  • can make differentiable or not (default yes)
  • can use full dynamic range or not (divide by, e.g., np.iinfo(np.uint16).max)
  • set output range (default [0, 1]; other standard case [-1, 1])
  • like load_images, should make sure we return a 4d tensor, optionally (and by default) convert to grayscale, and convert to torch.float32 (make the end dtype an option? not sure if we also need torch.float16 and torch.float64)

One of the difficulties of accepting arrays or tensors (rather than paths) is that they are unlikely to still be in their original dtype. For example, the Einstein image is stored as an 8-bit image on disk but, depending on how it's loaded, could easily end up as a np.float32 array that still has a max of 255. That's mainly an issue when it comes to determining what the max value is, though this probably isn't a huge problem: we're likely to receive either something that has been re-ranged or something that still has its original values (it seems unlikely that someone would, e.g., load an 8-bit image, multiply its values by 5, and then pass it to this function). So we might be able to do a simple check: don't change anything whose pixel values all lie within the output range, treat anything with all positive values and a max between 1 and 255 as an 8-bit image, treat anything with all positive values and a max between 255 and 65535 as a 16-bit image, and raise an exception for anything else. A sketch of that check follows.
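A minimal sketch of that heuristic (the function name and exact cutoffs are hypothetical):

import numpy as np

def infer_max_value(img, output_range=(0, 1)):
    # Guess the original dynamic range of an image whose dtype may have
    # changed since it was loaded (sketch of the check described above).
    img = np.asarray(img)
    if img.min() >= output_range[0] and img.max() <= output_range[1]:
        return None  # already in the output range; leave unchanged
    if img.min() >= 0 and img.max() <= np.iinfo(np.uint8).max:
        return float(np.iinfo(np.uint8).max)  # treat as 8-bit
    if img.min() >= 0 and img.max() <= np.iinfo(np.uint16).max:
        return float(np.iinfo(np.uint16).max)  # treat as 16-bit
    raise ValueError("Cannot infer the image's original dynamic range; "
                     "please rescale it to the output range yourself.")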

Replace our autodiff jvp/vjp functions with PyTorch 1.5 built-in jvp/vjp?

Issue: The new stable release of PyTorch 1.5 includes autograd methods to compute vector-Jacobian products (VJPs) and Jacobian-vector products (JVPs). We rolled our own methods to compute these products to synthesize eigendistortions. Should we replace our methods in favour of PyTorch's built-in functions to reduce redundant code?

Short answer: No.

Long answer: We use the power method (and the Lanczos algorithm, which is a form of power method) to synthesize eigendistortions. This requires calling VJP and JVP thousands of times. The way we do this now is to compute one forward pass of the model, maintain its graph, then iteratively use our functions to perform N backward passes on that graph to compute N VJPs/JVPs (i.e., N+1 operations to compute N products). This contrasts with PyTorch's implementation of VJP/JVP, whose methods perform a forward pass each time, thus requiring 2N operations to compute N products.
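A minimal sketch of the retained-graph pattern described above (the model and shapes are placeholders):

import torch

model = torch.nn.Linear(10, 10)
x = torch.randn(1, 10, requires_grad=True)
y = model(x)  # one forward pass; its graph is reused below

for _ in range(1000):  # e.g., power-method iterations
    v = torch.randn_like(y)
    # vector-Jacobian product v^T (dy/dx) without re-running the forward
    # pass; retain_graph=True keeps the graph alive for the next iteration
    vjp, = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)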

Add ability to move foveation location

The ventral_stream.py models (and pooling.py windows) can currently only fixate at the center of the image; it would be relatively simple to make the fixation location a parameter.

Look into pytest-notebook

pytest-notebook looks like a good way to re-run notebooks and check that their output hasn't changed. It could be useful to add to our tests, since we want to make sure that we don't break the tutorial notebooks with any changes we make. (See if we can mark some cells as allowed to have different outputs, maybe? Would we want to know any time the output changes, or just when the notebooks no longer run?)

Abstract classes for model and synthesis objects?

In pyrtools, we had a pyramid class that we never wanted anyone to use, but that all the pyramid objects inherited. This made it easy to share relevant methods between them and make sure they had comparable attributes.

Should we do a similar thing for model and synthesis objects? I've written a whole bunch of code for the ventral stream models and for metamer that I feel could be relevant for other models and synthesis objects, and it would make standardization of the API easier. Using an abstract parent class would make it easy to share these methods and keep the attributes consistent without requiring too much overhead (once the initial creation of the parent classes is finished...).

I've been meaning to abstract some of the stuff I've written for metamer and the ventral stream models regardless: for example, the save and load methods (as well as the "reduced" version for the ventral stream model) and the display code. I've done a bit of work making the display code abstract already, but if we put it in a parent class, you'd get access to it for free.
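A minimal sketch of the pattern (attribute names are hypothetical; abc keeps anyone from instantiating the parent directly):

import abc
import torch

class Synthesis(abc.ABC):
    # Sketch of an abstract parent class shared by all synthesis objects.

    @abc.abstractmethod
    def synthesize(self):
        # Each synthesis method implements its own optimization loop.
        ...

    def save(self, file_path, attrs=None):
        # Shared save: write the listed tensor attributes to disk.
        attrs = attrs if attrs is not None else ['synthesized_signal']
        torch.save({k: getattr(self, k) for k in attrs}, file_path)

    def load(self, file_path):
        # Shared load: restore the saved attributes onto this object.
        for k, v in torch.load(file_path).items():
            setattr(self, k, v)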

Docs are broken

Maybe not surprising, but the docs are broken right now. Following the instructions outlined in CONTRIBUTING.md (with a fresh install of the plenoptic_docs environment), the make html command fails. The attached docs.log shows its output.

It's a bunch of errors, but probably the same thing over and over again.

OMP: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.

Describe the bug
Error:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.

To Reproduce
[Note: this will only be reproducible on Mac OSX and only sometimes.]

import matplotlib.pyplot as plt
import plenoptic as po
import torch

model = po.simul.Texture_Statistics([256, 256])
image = plt.imread('../data/nuts.pgm').astype(float) / 255.
im0 = torch.tensor(image, requires_grad=True, dtype=torch.float32).squeeze().unsqueeze(0).unsqueeze(0)
c = po.RangeClamper([image.min(), image.max()])
M = po.synth.Metamer(im0, model)

producing the following message:

OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Note: If this error occurs during the use of a Jupyter notebook, the kernel dies, producing the error message above in the terminal and the following message in the Jupyter notebook:

[screenshot: Jupyter's "kernel appears to have died" dialog]

System (please complete the following information):

  • OS: MacOSX 10.15.6
  • Python version 3.7
  • Pytorch version 1.6
  • Plenoptic version 0.1

Utilize symmetry of fft for real steerable pyramid

In the real steerable pyramid, exploit the conjugate symmetry of the Fourier transform of a real signal by using rfft and irfft to make the computation more efficient. This will require using onesided=True in these cases and then adjusting the mask sizes, etc., to make the rest of the code compatible.
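A minimal sketch of the one-sided round trip (this uses the pre-1.8 torch.rfft/torch.irfft API that onesided=True belongs to; newer PyTorch replaces it with torch.fft.rfft2/irfft2):

import torch

x = torch.randn(256, 256)  # real-valued image

# Only roughly half the spectrum is stored, since the spectrum of a real
# signal is conjugate-symmetric; the last dimension holds (real, imag).
X = torch.rfft(x, 2, onesided=True)  # shape (256, 129, 2)

# The pyramid's frequency-domain masks would have to be cropped to this
# half-spectrum size before multiplying.

x_rec = torch.irfft(X, 2, onesided=True, signal_sizes=x.shape)
assert torch.allclose(x, x_rec, atol=1e-5)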

Look into napari

Seems like a useful way to visualize nD arrays: https://ilovesymposia.com/2019/10/24/introducing-napari-a-fast-n-dimensional-image-viewer-in-python/. Also has explicit support for image pyramids: http://napari.org/tutorials/image.html#image-pyramids

Want to check whether we can subclass or extend it like we did with pyrtools.imshow, in order to make sure:

  1. There's no interpolation or smoothing in the display of an array
  2. Arrays are displayed as either zoomed out by a power of two or zoomed in by an integer, no intermediate values (which would lead to interpolation)

It looks like it's built on top of VisPy rather than matplotlib; I don't know anything about the differences between them.

FourMomentsClamper failures

The FourMomentsClamper for metamer synthesis doesn't seem to be working. In order to get it working on GPUs, I needed to add a bunch of device=ch.device arguments throughout it (see the ventral_stream branch), but now I'm running into CUDA error: an illegal memory access was encountered. I'm not sure whether this happens every time.

I sometimes have gotten even stranger errors: MAGMA geev : Argument 4 : illegal value at /opt/conda/cond-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathMagma.cu:220

So I'm not sure what to make of that. I don't understand enough of what's happening in that function (specifically, the modkurt function), but it would probably be worth cleaning it up so there isn't so much creation of new tensors throughout.

Get set up with Software Heritage ID

For citing, we'll ideally have a paper in JOSS (which will help publicize the project and get us some feedback). But we'd also like something that allows people to specify exactly what version they used. Software Heritage IDs seem like a good way to do that (requires putting together a codemeta.json), so look into it.

The idea comes from this blog post, which recommends it.

Add support for coarse_to_fine for steerable pyramid

Want the steerable pyramid to support coarse-to-fine optimization, which means it should accept scales the way the ventral stream models do. This will improve efficiency for those models and will help Portilla-Simoncelli coarse-to-fine as well.

Create display tutorial

Create a tutorial showing how to use all the Synthesis display code. Show basic usage, how to customize the size of the plot and its contents, and the fine-grained control allowed by axes_idx. Also show how easy it is to use animate.

For advanced usage, discuss update_plot?

FrontEnd model produces artefactual eigendistortions near boundaries

In its current form, the FrontEnd model produces eigendistortions near the edges for several input images of varying crop size. We are currently using ReflectionPad2d boundary handling. This issue could possibly be resolved with a frequency-domain implementation, as suggested in existing issue #23, which would in effect implement circular boundary handling. Alternatively, we could leave our convolutions in the spatial domain and try various other boundary handling options.

I tried applying a circular disk mask to the image during the forward() call. In this case the eigendistortions just ended up at the edges of the circular mask.

The 31x31 conv2d weights we're using come from a model that was trained on images of dimension 384x512. I tried using images of this size as well to synthesize eigendistortions and still got eigendistortions near the edge.

Check test coverage

Probably worth using something to check how complete our test coverage is (that is, whether we're missing any tests): see here for general discussion, and pytest-cov for a library we could use.

Refactoring autodiff.py

Right now, these functions are only used in eigendistortion and are fine as written, but they're helpful in other contexts: e.g., if you want to call output.backward() and output is not a scalar, you need to pass a gradient vector, computing a vector-Jacobian product (assuming I'm understanding the documentation). So we should make these functions easier to use in other contexts, such as our standard way of interacting with models: input is 4d and output is 3d or 4d (with possibly multiple batches and channels).
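A quick illustration of that non-scalar backward case (the linear model is just a placeholder):

import torch

model = torch.nn.Linear(10, 3)
x = torch.randn(1, 10, requires_grad=True)
y = model(x)  # non-scalar output

# backward() on a non-scalar output needs a gradient vector v; this
# computes the vector-Jacobian product v^T (dy/dx) into x.grad.
v = torch.randn_like(y)
y.backward(gradient=v)
print(x.grad.shape)  # torch.Size([1, 10]), same shape as x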

Add plenoptic.imshow

When using our pyrtools.imshow, it's annoying to convert the tensors to arrays all the time (and call squeeze and all that), so let's create a wrapper around it that handles it automatically.

It should probably live in tools/display.py and have the same call signature as pyrtools.imshow; it should call plenoptic/tools/data.to_numpy on each image and .squeeze() them. Not sure if it would need anything else.
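A minimal sketch of such a wrapper (assuming to_numpy is importable from plenoptic.tools.data, as above):

import pyrtools as pt
from plenoptic.tools.data import to_numpy

def imshow(images, **kwargs):
    # Convert tensors to squeezed numpy arrays, then hand everything to
    # pyrtools.imshow unchanged.
    if not isinstance(images, list):
        images = [images]
    images = [to_numpy(im).squeeze() for im in images]
    return pt.imshow(images, **kwargs)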

Add MS-SSIM

We had an implementation of this, but removed it because of difficulties getting it working. The function is pasted below as a starting point.

This repo contains the MATLAB MS-SSIM code from Zhou Wang's website that's referenced in the code below; it should be used to generate values to match.

Things to watch out for:

  • When comparing the images curie.pgm and einstein.pgm from this repo, some of the mcs values are negative, which leads to issues because pytorch doesn't support complex values right now (we have workarounds in our steerable pyramid, where we put the real and imaginary parts in a 5th dimension at the end). In MATLAB, the returned value is complex (.0289+.0666i), and I don't know enough about MS-SSIM to know whether this is reasonable.
  • mcs and mssims will be (5, b, c) tensors, where b and c are the numbers of batches and channels we get when comparing img1 and img2. weights is a 1d tensor with 5 elements, so mcs ** weights (or, equivalently, torch.pow(mcs, weights)) will be a (5, b, 5) tensor (not sure what happens with channels), from which we want the diagonal of each batch. This is a little clunky and there's probably a better way to do it.
  • Would we want the ability to change the number of levels? Where would weights come from then?
import numpy as np
import torch
import torch.nn.functional as F
# _ssim_parts is assumed to be the helper from our SSIM implementation that
# returns per-pixel ssim and contrast maps.


def msssim(img1, img2, dynamic_range=1, normalize=False):
    device = img1.device
    weights = torch.FloatTensor([0.0448, 0.2856, 0.3001, 0.2363, 0.1333]).to(device)
    levels = weights.size()[0]
    mssims = []
    mcs = []
    for _ in range(levels):
        ssim_map, contrast_map, _ = _ssim_parts(img1, img2, dynamic_range=dynamic_range)
        mssims.append(ssim_map.mean((-1, -2)))
        mcs.append(contrast_map.mean((-1, -2)))

        img1 = F.avg_pool2d(img1, (2, 2))
        img2 = F.avg_pool2d(img2, (2, 2))

    mssims = torch.stack(mssims)  # (levels, batch, channel)
    mcs = torch.stack(mcs)

    # Normalize (to avoid NaNs when training unstable models; not compliant
    # with the original definition)
    if normalize:
        mssims = (mssims + 1) / 2
        mcs = (mcs + 1) / 2

    # Reshape weights to (levels, 1, 1) so the exponentiation broadcasts over
    # batch and channel, instead of producing the (5, b, 5) tensor described
    # above. Negative mcs values will still yield NaNs here (a negative base
    # raised to a fractional power); see the first bullet point.
    pow1 = mcs ** weights.view(-1, 1, 1)
    pow2 = mssims ** weights.view(-1, 1, 1)
    # From the MATLAB implementation https://ece.uwaterloo.ca/~z70wang/research/iwssim/:
    # product over the contrast terms at all but the last scale, times the
    # SSIM term at the last scale, reduced over the scale dimension only.
    return torch.prod(pow1[:-1] * pow2[-1], dim=0)

Reproduce SSIM/MSE MAD Competition example from paper

MAD Competition is working now, but in order to be certain about it, we want to synthesize some images that match examples from the MATLAB code (they won't be identical, but they should be in the ballpark).

Additionally, I found a weird issue with the example in the Simple_MAD notebook: when using po.add_noise to generate the initial image, the generated image would always lie along the forward or reverse diagonal (e.g., from base image [.5, .5] to [.6, .4]), which gives you L1 and L2 loss contours such that the circle is completely inscribed within the square:

[screenshot: L1 and L2 loss contours, with the L2 circle inscribed in the L1 square]

In this case, MAD Competition (with the parameters set up in that notebook) completely failed to find any solution. We need to think about both why there appear to be such limited possible values and why MAD Competition has trouble here, but for now that means going back to the earlier way of adding noise. Should we maybe add an option to specify the initial image?

Feature request: user-defined num_orientations in steerable_pyramid_freq

In simulate.canonical_computations.steerable_pyramid_freq.Steerable_Pyramid_Freq, we should allow the user to define the number of filter orientations. This would obviate the steer_coeffs method, and the responses of each oriented filter would be explicitly returned in the response tensor as additional channels.

Make public

We want to make the repo public by July at the latest so Nikhil can share stuff related to his NeurIPS project.

Before that, we want to:

  • merge the open PR #19

(#38 may include breaking changes, #24 unnecessary)

Then we'll:

  • make a Github release (e.g., v0.1-neurips)
  • add alpha or WiP badge and language?
  • get a doi for that release (probably from zenodo)

Make FrontEnd more efficient

The FrontEnd model is a very useful one and would be great to have in some examples, but it's currently so inefficient that synthesizing with it is very slow. How can we make it more efficient?

Discussed a bit with @pehf; my understanding is that the main issue is that it's convolving with 31x31 kernels in the signal domain (I haven't profiled it to investigate). If that's so, it will get slower as a function of image size. Could we not just take the Fourier transform of the kernel and the image, multiply them together, and take the inverse Fourier transform (the way our steerable pyramid implementation works)?
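A minimal sketch of the FFT-domain alternative (this uses the torch.fft module from PyTorch 1.8+; note that pointwise multiplication in frequency corresponds to circular convolution, so boundary handling differs from a padded conv2d):

import torch

def fft_conv2d(image, kernel):
    # Zero-pad the kernel to the image size, multiply the spectra, and
    # transform back; cost grows as O(N log N) rather than with kernel size.
    H, W = image.shape[-2:]
    image_f = torch.fft.rfft2(image)
    kernel_f = torch.fft.rfft2(kernel, s=(H, W))
    return torch.fft.irfft2(image_f * kernel_f, s=(H, W))

image = torch.randn(1, 1, 256, 256)
kernel = torch.randn(1, 1, 31, 31)
out = fft_conv2d(image, kernel)  # same spatial size as the input image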

replace instances with torch.tensor when applicable

Instantiating a tensor via torch.tensor should be avoided when torch.from_numpy or torch.as_tensor can be used instead. This is because torch.tensor always copies data, whereas the other two do not (or at least avoid copying when possible).
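A quick demonstration of the difference:

import numpy as np
import torch

arr = np.arange(4.)

a = torch.tensor(arr)      # always copies the data
b = torch.from_numpy(arr)  # shares memory with arr
c = torch.as_tensor(arr)   # shares memory when dtype/device allow it

arr[0] = -1.
print(a[0].item(), b[0].item(), c[0].item())  # 0.0 -1.0 -1.0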

Add geodesics

The basics of this have been completed, but it needs more work to finalize.

Make eigendistortion accept batch/channel images

We want all of our synthesis methods to expect 4d images, (batch, channel, height, width), and to expect the model outputs to be 3d or 4d: (batch, channel, y_1) or (batch, channel, y_1, y_2). Eigendistortion currently does not.

Right now, the synthesis methods are probably too memory-intensive to make this way of doing things reasonable. It would require some more thinking about how to parallelize across batches / channels, which none of us need right now.

Add MAD competition

  • add support for metrics
  • add tests
  • move more initialization to Synthesis superclass
  • check with coarse-to-fine, plot, and animate code

This is linked to #17 and will be in the same PR.

Add GPU tests

We want to make sure that our code runs on GPUs with very little overhead.

Currently, there are two steps for that:

  1. Make sure everything runs on the GPU in the same manner. See metamer.py, steerable_pyramid_freq.py, pooling.py, and ventral_stream.py for my preferred way; basically, none of our synthesis methods nor models should set the device anywhere:

    • Models and synthesis methods should have a .to method, which moves all tensor attributes over to the given device/dtype, and then all of their methods should work regardless of which device they're on. This can be done by using things like torch.ones_like; if a new tensor needs to be created (and torch.ones_like or something like it can't be used), its device should be explicitly set to that of the method's input. If the method has no input, check one of the tensor attributes. (A sketch of this pattern follows the list.)
  2. Figure out how to make Travis CI work with CUDA. There's an open issue on this, so it might not be trivial, but it links to an existing project with a .travis.yml file we could try modifying.
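A minimal sketch of the device-agnostic pattern from step 1 (the class and attribute names are hypothetical):

import torch

class PooledModel(torch.nn.Module):
    def __init__(self, img_size):
        super().__init__()
        # Buffers are tensor attributes that nn.Module.to() moves for us.
        self.register_buffer('windows', torch.ones(img_size))

    def forward(self, x):
        # Any new tensor is created on the input's device, never on a
        # hard-coded one, so the model runs on CPU or GPU unchanged.
        bias = torch.zeros(1, device=x.device, dtype=x.dtype)
        return x * self.windows + bias

model = PooledModel((256, 256))
model.to('cuda' if torch.cuda.is_available() else 'cpu')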

Pytorch-ify PooledVentralStream

The PooledVentralStream models are not quite PyTorch-idiomatic: they should have the different computations as layers, each of which is a torch.nn.Module, allowing for hooks (see here), and they shouldn't store memory-intensive attributes. Attributes should only be metadata; there should be methods for converting the tensor output into the more structured representational form for visualization / understanding (but the models should not store it as an attribute).
