labforcomputationalvision / plenoptic
Visualize/test models for visual representation by synthesizing images.
Home Page: https://plenoptic.readthedocs.io/en/latest/
License: MIT License
This method would analytically normalize the steerable pyramid coefficients, which vary in their magnitude across scales. There are two reasons for this:
1. Down-sampling between scales (when downsample=True). Because we downsample at each scale by a factor of two, the magnitude increases by four (two squared, two in each of the two dimensions).
2. Natural images have more power in the lower frequencies than the higher frequencies, because they have 1/f power spectra. Therefore, we could up-weight the higher frequencies proportional to that. Should we correct for this? Make it an argument? When optimizing, you want them all to be approximately the same magnitude, but are there cases where you would want to correct for the first issue but not this?
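A rough sketch of what such a correction could look like, assuming coefficients come as a dict keyed by (scale, orientation) with scale 0 the finest (whether and how to weight for the 1/f issue is exactly the open question above):

def normalize_pyr_coeffs(coeffs, correct_downsampling=True, spectrum_exponent=0.0):
    """Hypothetical helper: rescale coefficients so scales have comparable magnitude."""
    out = {}
    for key, coeff in coeffs.items():
        factor = 1.0
        if isinstance(key, tuple):          # skip the residual bands
            scale = key[0]
            if correct_downsampling:
                factor /= 4 ** scale        # undo the factor-of-four growth per scale
            # down-weight coarser (lower-frequency) scales to counter 1/f power spectra;
            # spectrum_exponent=0 leaves this correction off
            factor /= 2 ** (spectrum_exponent * scale)
        out[key] = coeff * factor
    return out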
Z-scoring the coefficients appears to be key for good metamer synthesis both for the Portilla-Simoncelli texture statistics and the PooledV1 model. If we do the above, that extra step might not be necessary.
Our progress bar is created by tqdm, which has a separate version for working in notebooks: from tqdm.notebook import tqdm instead of from tqdm import tqdm. We would need to be able to tell whether the library is being imported from a notebook or not and change which tqdm we import.
It looks like tqdm also has an auto option, which should figure this out for us.
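For reference, the auto variant dispatches to the notebook widget when it's available and falls back to the plain console bar otherwise:

# tqdm.auto picks the notebook widget inside Jupyter and the console bar elsewhere
from tqdm.auto import tqdm

for _ in tqdm(range(100)):
    pass  # a synthesis iteration would go here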
pytest for test_plenoptic.TestNonlinearities::test_coordinate_transform fails for an unknown reason, likely to do with po.rescale.
Reproducible failed test (note torch.manual_seed(1) placed on the line before the second a = [...]):
def test_coordinate_transform(self):
    a = torch.randn(10, 5, 256, 256)
    b = torch.randn(10, 5, 256, 256)
    A, B = po.polar_to_rectangular(*po.rectangular_to_polar(a, b))
    assert torch.norm(a - A) < 1e-3
    assert torch.norm(b - B) < 1e-3

    torch.manual_seed(1)
    a = torch.rand(10, 5, 256, 256)
    b = po.rescale(torch.randn(10, 5, 256, 256), -np.pi / 2, np.pi / 2)
    A, B = po.rectangular_to_polar(*po.polar_to_rectangular(a, b))
    assert torch.norm(a - A) < 1e-3
    assert torch.norm(b - B) < 1e-3
The last assert statement, assert torch.norm(b - B) < 1e-3, which compares the angles before and after what should be an identity transformation, is the one that fails.
Variety of stuff:
If you run synthesis twice in a row, you'll pick up more or less where you left off (assuming you set initial_image=None and learning_rate=None on the second call; this is pending the merger of the current ventral_model branch), with one major caveat: the state of the random number generator. We require a seed and always set it at the beginning of the synthesis call. If you resume synthesis with the same object in the same session, we can just allow the user to set seed=None and, if seed is None, not set it.
However, if we save a metamer object, load it, and then resume synthesis (which is not uncommon when synthesis takes a long time), currently we have no good way to resume the RNG state. Something like torch.random.fork_rng, or what it does (I can't find example code showing how to use it), is probably what we want. But I'm not sure how to handle devices with this.
Grabbing the CPU state would be easy; my preference would be to do something like the following: at the end of synthesis, do self.cpu_rng_state = torch.get_rng_state(), make sure to save the cpu_rng_state attribute by adding it to the list of attributes in the save function, and then, during load, call torch.set_rng_state(metamer.cpu_rng_state).
However, grabbing the GPU states apparently takes a long amount of time (see the warning in the function linked above) and we would only want to do it for the relevant devices. Currently the metamer object is not explicitly aware of what devices are relevant, which I prefer because it makes the code completely device-agnostic. However, it presents a problem here and I see three solutions:
1. Grab the states of all GPUs (as the fork_rng function linked above does if devices isn't specified), and set them all.
2. Use initial_image.device.
3. Use model.device. Currently, we do not require the model's device to be set and so it's very possible that there is no device attribute (my ventral stream models have a device attribute). We could start encouraging it and default to 2 if it's not present.
If we do something like 2 (or do that as the default in 3), then we should probably require this option to be enabled, rather than always doing it, since it apparently takes time. And regardless of whether we do 2 or 3, it should happen at the end of the synthesis call.
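A rough sketch of the CPU part plus option 2 above (all names here are illustrative, not the final API):

import torch

def stash_rng_state(metamer):
    """At the end of synthesize(): record the RNG state (attribute names illustrative)."""
    metamer.cpu_rng_state = torch.get_rng_state()
    device = metamer.initial_image.device
    if device.type == 'cuda':
        # option 2 above: only grab the state of the device we actually used
        metamer.gpu_rng_state = torch.cuda.get_rng_state(device)

def restore_rng_state(metamer):
    """After load(): restore the state so resumed synthesis continues the same stream."""
    torch.set_rng_state(metamer.cpu_rng_state)
    if getattr(metamer, 'gpu_rng_state', None) is not None:
        torch.cuda.set_rng_state(metamer.gpu_rng_state, metamer.initial_image.device)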
pytorch 1.6 adds SWA, which may help us find a greater diversity of synthesized metamers.
We currently only test the downsampled version of the pyramid against earlier implementations. Can we add a test of the not-downsampled version as well? We should be able to either up or down-sample, respectively, the coefficients in order to check against each other, and that should hopefully (if we do it in the same manner) account for the difference in magnitudes.
Breaks w/ reshaping or with handling of multiple titles.
This is a placeholder issue. Will edit and fill this out later when I get a chance.
It is nested within the method that computes the exact eigendecomposition currently. Instead it would be useful to pull it out of that and have it be independently callable, returning the Jacobian.
Using the implementations in torchvision, figure out how to work with, e.g., AlexNet and VGG16.
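For reference, pulling a pretrained model out of torchvision looks like this (pretrained=True was the argument at the time of writing; how to wrap the output into our expected 4d shape is the open question):

import torch
import torchvision.models as models

# pretrained torchvision models (weights are downloaded on first use)
vgg16 = models.vgg16(pretrained=True).eval()
alexnet = models.alexnet(pretrained=True).eval()

img = torch.rand(1, 3, 224, 224)    # our usual (batch, channel, height, width)
features = vgg16.features(img)      # 4d feature tensor, here (1, 512, 7, 7)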
Geodesic and eigendistortion only work on inputs with a single element in the batch dimension and then overload it: eigendistortion makes use of it for the different eigenvectors, geodesic for different steps in the path between the two anchor images. Should Synthesis, MADCompetition, and Metamer do something similar?
In addition to docstrings, examples, and tutorials that we need, we need some good basic documentation that explains the idea behind this package, points to the associated papers, and lays out the basic ideas. Also should include stuff about basic API, how to use the various abstractions / more general functionality (coarse-to-fine optimization, plotting, etc). Those might not be necessary for final users, but are necessary for us while we work on the core.
Some potentially helpful info: open source guides from Github, Mozilla Science Working Open Workshop.
For Synthesis (and its subclasses), we have two stopping criteria: either you reach max_iter or your (absolute) loss decreases by less than loss_thresh over the past loss_change_iter iterations. But this is an absolute number, which is going to differ wildly depending on the magnitude of your loss.
Would like to add support for a relative threshold, rel_loss_thresh, which checks whether (absolute) loss has decreased by rel_loss_thresh * loss_prev over the past loss_change_iter iterations. On each iteration, check if loss < loss_prev - rel_loss_thresh * loss_prev and, if so, update loss_prev = loss. Keep going until there have been loss_change_iter iterations without that, and then break.
This would go into Synthesis._check_for_stabilization, and would need to do a similar check with coarse_to_fine.
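A sketch of that check (rel_loss_thresh and loss_change_iter as above; the state bookkeeping is illustrative, not the planned implementation):

def check_relative_stabilization(loss, state, rel_loss_thresh=1e-2, loss_change_iter=50):
    """Return True once loss has gone loss_change_iter iterations without a big enough
    relative drop. `state` is a dict that persists across iterations, e.g.
    {'loss_prev': None, 'iters_without_improvement': 0}."""
    if state['loss_prev'] is None or loss < state['loss_prev'] * (1 - rel_loss_thresh):
        # loss dropped by more than rel_loss_thresh * loss_prev: reset the counter
        state['loss_prev'] = loss
        state['iters_without_improvement'] = 0
    else:
        state['iters_without_improvement'] += 1
    return state['iters_without_improvement'] >= loss_change_iter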
There's a lot of repetition in our docstrings. One way to avoid that is to do what seaborn does: define a big dict of strings with shared docs and then format the docstrings as necessary.
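A minimal sketch of that pattern (the parameter descriptions here are just placeholders):

# shared parameter descriptions, formatted into each docstring
_doc_params = dict(
    max_iter="max_iter : int\n        Maximum number of iterations to run synthesis for.",
    loss_thresh="loss_thresh : float\n        Stop if loss changes by less than this over loss_change_iter iterations.",
)

def synthesize(max_iter=100, loss_thresh=1e-4):
    pass

synthesize.__doc__ = """Synthesize an image.

    Parameters
    ----------
    {max_iter}
    {loss_thresh}
    """.format(**_doc_params)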
Similar to po.load_images (currently only in #38 branch), but also:
- accept arrays or tensors as well as paths
- handle 16-bit images, whose max value is np.iinfo(np.uint16).max
- output range [0, 1] by default, other standard case [-1, 1]
- like load_images, should make sure we return a 4d tensor, optionally (and by default) convert to grayscale, and convert to torch.float32 (make the end dtype an option? not sure if we also need torch.float16 and torch.float64)
One of the difficulties of accepting arrays or tensors (rather than paths) is that they are unlikely to still be in their original dtype (e.g., the Einstein image is stored as an 8-bit image on disk but, depending on how it's loaded, could easily end up as an np.float32 array that still has a max of 255). That's mainly an issue when it comes to determining what the max value is, though this probably isn't a huge issue: we're likely to receive either something that has been re-ranged or something that still has its original values (it seems unlikely that someone would, e.g., load an 8-bit image, multiply its values by 5, and then pass it to this function), so we might be able to just do a simple check: don't change anything with all pixel values within the output range, treat anything with all positive values and max between 1 and 255 as an 8-bit image, treat anything with all positive values and max between 255 and 65535 as a 16-bit image, and raise an exception for anything else.
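A sketch of that check, assuming the target output range is [0, 1]:

import numpy as np

def rescale_to_float(img):
    """Rescale an array to [0, 1] using the heuristic described above (sketch)."""
    img = np.asarray(img, dtype=np.float32)
    if img.min() < 0:
        raise ValueError("Negative pixel values: can't infer the original bit depth")
    mx = img.max()
    if mx <= 1:
        return img                              # already within the output range
    elif mx <= np.iinfo(np.uint8).max:
        return img / np.iinfo(np.uint8).max     # treat as an 8-bit image
    elif mx <= np.iinfo(np.uint16).max:
        return img / np.iinfo(np.uint16).max    # treat as a 16-bit image
    raise ValueError(f"Can't infer the original range for max value {mx}")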
Issue: The new stable release of PyTorch 1.5 includes autograd methods to compute vector-jacobian products (VJP) and jacobian-vector products (JVP). We rolled our own methods to compute these products to synthesize Eigendistortions. Should we replace our methods in favour of PyTorch's built-in functions to possibly reduce redundant code?
Short answer: No.
Long answer: We use the power method (and the Lanczos algorithm, which is a form of power method) to synthesize Eigendistortions. This requires calling VJP and JVP thousands of times. The way we do this now is to compute 1 forward pass of the model, maintain its graph, then iteratively use our functions to perform N backwards passes on that graph to compute N VJP/JVPs (i.e. N+1 operations to compute N products). This contrasts with PyTorch's implementation of VJP/JVP in that their methods perform a forward pass each time, thus requiring 2N operations to compute N products.
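Schematically, the cheap version keeps one graph alive and reuses it (the model, shapes, and loop here are placeholders):

import torch

model = torch.nn.Conv2d(3, 8, 3, padding=1)   # stand-in for the model being distorted
x = torch.rand(1, 3, 32, 32, requires_grad=True)
y = model(x).flatten()                         # 1 forward pass; keep its graph around

for _ in range(10):                            # e.g., power-method iterations
    v = torch.randn_like(y)                    # in practice, the current eigenvector estimate
    # one backward pass on the retained graph per VJP, with no extra forward pass
    vjp, = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)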
ventral_stream.py models (and pooling.py windows) currently can only fixate at the center of the image; it would be relatively simple to allow the fixation location to be a parameter.
pytest-notebook looks like a good way to re-run notebooks and check that their output hasn't changed. Could be useful to add that to our tests, since we want to make sure that we don't break the tutorial notebooks with any changes we make (see if we can select some cells to have different outputs maybe? would we want to know anytime the output changes or just if you can't run the notebooks anymore?)
In pyrtools, we had a pyramid class that we never wanted anyone to use, but that all the pyramid objects inherited. This made it easy to share relevant methods between them and make sure they had comparable attributes.
Should we do a similar thing for model and synthesis objects? I've written a whole bunch of code for the ventral stream models and for metamer that I feel could be relevant for other models and synthesis objects, and it would make standardization of the API easier. Using an abstract master class would make it easy to share these methods and make sure the attributes are consistent without requiring too much overhead (once the initial creation of the parent classes is finished...)
I've been meaning to abstract some of the stuff I've written for metamer and ventral stream models regardless. For example, the save and load methods (as well as the "reduced" version for the ventral stream model), and the display code. I've done a bit of work making the display code abstract already, but if we put it in a parent class, you'd have access to it for free.
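As a rough illustration of the kind of parent class this would be (names and attributes hypothetical, not the planned API):

import abc
import torch

class Synthesis(metaclass=abc.ABCMeta):
    """Hypothetical abstract parent for Metamer, MADCompetition, etc."""

    _save_attrs = ['synthesized_image', 'loss', 'seed']   # subclasses extend as needed

    @abc.abstractmethod
    def synthesize(self, *args, **kwargs):
        """Each synthesis method implements its own optimization loop."""
        ...

    def save(self, file_path):
        # shared save: stash the listed attributes in one file
        torch.save({k: getattr(self, k, None) for k in self._save_attrs}, file_path)

    def load(self, file_path):
        # shared load: restore whatever save() wrote
        for k, v in torch.load(file_path).items():
            setattr(self, k, v)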
Maybe not surprising, but the docs are broken right now. Following the instructions outlined in CONTRIBUTING.md (with a fresh install of the plenoptic_docs environment), the make html command fails. The attached docs.log shows its output.
It's a bunch of errors, but probably the same thing over and over again.
Given that we have different authors with different styles, our formatting/conventions are not consistent. We should look into using something like yapf or black to make our code consistent and maybe pydocstyle for our documentation.
Tutorials here: https://packaging.python.org/
Describe the bug
Error:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
To Reproduce
[Note: this will only be reproducible on Mac OSX and only sometimes.]
import matplotlib.pyplot as plt
import plenoptic as po
import torch
model = po.simul.Texture_Statistics([256,256])
image = plt.imread('../data/nuts.pgm').astype(float)/255.
im0 = torch.tensor(image, requires_grad=True, dtype = torch.float32).squeeze().unsqueeze(0).unsqueeze(0)
c = po.RangeClamper([image.min(), image.max()])
M = po.synth.Metamer(im0, model)
producing the following message:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Note: If this error occurs during the use of a Jupyter notebook, then the kernel dies, producing the error message above in the Terminal and a corresponding message in the Jupyter notebook.
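As the hint in the message says, the unsafe workaround is to set the environment variable before the second OpenMP runtime gets loaded, e.g.:

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"   # unsafe workaround quoted in the message above

import matplotlib.pyplot as plt
import plenoptic as po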
In the real steerable pyramid, utilize the benefits of the symmetric FFT in rfft and irfft to make the computation more efficient. This will require using onesided=True for these cases and then adjusting the mask sizes etc. to make the rest of the code compatible.
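For reference, the one-sided transform of a real 2d signal only stores about half the frequency coefficients; with the current torch.fft module (the text above references the older torch.rfft/irfft onesided interface) this looks like:

import torch

x = torch.rand(1, 1, 256, 256)
x_f = torch.fft.rfft2(x)                        # shape (1, 1, 256, 129): last dimension halved
x_rec = torch.fft.irfft2(x_f, s=x.shape[-2:])   # back to the full 256x256 signal
# the pyramid's frequency-domain masks would need to be cropped to this one-sided shape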
Seems like a useful way to visualize nD arrays: https://ilovesymposia.com/2019/10/24/introducing-napari-a-fast-n-dimensional-image-viewer-in-python/. Also has explicit support for image pyramids: http://napari.org/tutorials/image.html#image-pyramids
Want to check whether we can subclass it / extend it like we did with pyrtools.imshow, in order to make sure:
It looks like it's built on top of VisPy rather than matplotlib; I don't know anything about the differences between them.
The FourMomentsClamper for metamer synthesis doesn't seem to be working. In order to get it working on GPUs, I need to add a bunch of device=ch.device throughout it (see the ventral_stream branch), but now I'm running into CUDA error: an illegal memory access was encountered. I'm not sure this happens every time?
I sometimes have gotten even stranger errors: MAGMA geev : Argument 4 : illegal value at /opt/conda/cond-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathMagma.cu:220
So I'm not sure what to make of that. I don't understand enough of what's happening in that function (specifically, it's the modkurt function), but it would probably be worth cleaning it up so there's not so much creation of new tensors throughout.
For citing, we'll ideally have a paper in JOSS (which will help publicize the project and get us some feedback). But we'd also like something to allow people to specify exactly what version they used. Software Heritage IDs seem like a good way to do that (requires putting together a codemeta.json), so look into it.
Idea from this blog post, which recommends it.
Look into this, looks like it might be useful: https://github.com/syrusakbary/snapshottest
Inserting matlab code here that used to be in the perceptual distance file, just in case; it can be translated to Python at a later time if we need this.
spectral_resid.txt
Want the steerable pyramid to have support for coarse to fine optimization, which means that it should accept scales the way that the ventral stream models do. This will improve efficiency for those models and will help Portilla-Simoncelli coarse-to-fine as well.
Create a tutorial showing how to use all the Synthesis display code. Show basic usage, how you can customize the size of the plot and its contents, and the fine-grained control allowed by axes_idx. Also, how animate works pretty easily.
For advanced usage, discuss update_plot?
In its current form, the FrontEnd model produces eigendistortions near the edges for several input images of varying crop size. We are currently using ReflectionPad2d boundary handling. This issue could possibly be resolved with a frequency-domain implementation, as suggested in existing issue #23, which would in effect implement circular boundary handling. Alternatively, we could leave our convs in the spatial domain and try various other boundary-handling options.
I tried applying a circular disk mask to the image during the forward() call. In this case the eigendistortions just ended up at the edges of the circular mask.
The 31x31 conv2d weights we're using are pre-trained using a model that was trained on images of dim 384x512. I tried using images of this size as well to synthesize eigendistortions and still got eigendistortions near the edge.
Since that's the operating system that the LCV machines and the NYU cluster use.
Probably worth using something to check how complete our test coverage is (that is, are we not missing a test): see here for general discussion and pytest-cov for a library we could use.
For each of the synthesis methods:
Also for models?
Right now, these functions are just used in eigendistortion and are fine as written, but they're helpful in other contexts: e.g., if you want to call output.backward() and output is not a scalar, you need to pass a gradient vector, and a vector-Jacobian product is what gets computed (assuming I'm understanding the documentation). So we should make these functions easier to use in other contexts, such as our standard way of interacting with models: when input is 4d and output is 3d or 4d (with possibly multiple batches and channels).
Currently it fails because saved_image is not a list, and if you don't understand the code, it's not clear how to fix it.
Make that message clearer
When using our pyrtools.imshow, it's annoying to convert the tensors to arrays all the time (and call squeeze and all that), so let's create a wrapper around it that handles that automatically.
It should probably live in tools/display.py and have the same call signature as pyrtools.imshow; it should call plenoptic/tools/data.to_numpy on each image and .squeeze() them. Not sure if it would need anything else.
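A minimal sketch of that wrapper (assuming the to_numpy helper lives where described; everything else passes straight through):

import pyrtools as pt
from plenoptic.tools.data import to_numpy

def imshow(images, **kwargs):
    """Thin tensor-aware wrapper around pyrtools.imshow (sketch)."""
    if not isinstance(images, list):
        images = [images]
    # convert each tensor to a squeezed numpy array before handing off to pyrtools
    return pt.imshow([to_numpy(im).squeeze() for im in images], **kwargs)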
We had an implementation of this, but removed it because of difficulties getting it implemented. The function is pasted below as a starting point.
This repo contains the MATLAB MS-SSIM code from Zhou Wang's website that's referenced in the code below, and that should be used for generating values to match.
Things to watch out for:
- When comparing curie.pgm and einstein.pgm from this repo, some of the mcs values are negative, which leads to issues because pytorch doesn't support complex values right now (we have workarounds in our steerable pyramid, where we put the real and imaginary parts in a 5th dimension, at the end). In matlab, the returned value is complex (.0289+.0666i) and I don't know enough about MS-SSIM to know whether this is reasonable.
- mcs and mssims will be (5, b, c) tensors, where b and c are the numbers of batches and channels that we get when comparing img1 and img2. weights is a 1d tensor with 5 elements, and so mcs**weights (or, equivalently, torch.pow(mcs, weights)) will be a (5, b, 5) tensor (not sure what happens with channels), from which we want the diagonal of each batch. This is a little clunky and there's probably a better way to do it.

import torch
import torch.nn.functional as F
# assumes _ssim_parts (the helper that returns the per-pixel ssim and contrast maps) is importable here

def msssim(img1, img2, dynamic_range=1, normalize=False):
    device = img1.device
    weights = torch.FloatTensor([0.0448, 0.2856, 0.3001, 0.2363, 0.1333]).to(device)
    levels = weights.size()[0]
    mssims = []
    mcs = []
    for _ in range(levels):
        ssim_map, contrast_map, _ = _ssim_parts(img1, img2, dynamic_range=dynamic_range)
        mssims.append(ssim_map.mean((-1, -2)))
        mcs.append(contrast_map.mean((-1, -2)))
        img1 = F.avg_pool2d(img1, (2, 2))
        img2 = F.avg_pool2d(img2, (2, 2))
    mssims = torch.stack(mssims)
    mcs = torch.stack(mcs)
    # Normalize (to avoid NaNs during training unstable models, not compliant with original definition)
    if normalize:
        mssims = (mssims + 1) / 2
        mcs = (mcs + 1) / 2
    # This does not work as written -- a tensor with 5 elements raised to
    # another tensor with five elements returns a 5x5 tensor, from which we
    # want the diagonals. And some values in mcs can be negative, which leads
    # to difficulties
    pow1 = mcs ** weights
    pow2 = mssims ** weights
    # From Matlab implementation https://ece.uwaterloo.ca/~z70wang/research/iwssim/
    output = torch.prod(pow1[:-1] * pow2[-1])
    return output
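One way around the broadcasting problem flagged in the comments above is to reshape weights so each level is raised to its own weight; this sketch would replace the pow1/pow2/output lines (the negative-mcs issue is separate and still needs thought):

# weights has shape (5,), mcs and mssims have shape (5, b, c); viewing weights as
# (5, 1, 1) applies one weight per level instead of producing a (5, b, 5) tensor
w = weights.view(-1, 1, 1)
pow1 = mcs ** w
pow2 = mssims ** w
# product over levels, keeping the batch/channel dimensions
output = torch.prod(pow1[:-1], dim=0) * pow2[-1]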
MAD Competition is working now, but in order to be certain about it, we want to synthesize some images that match examples from the MATLAB code (they won't be identical, but they should be in the ballpark).
Additionally, I found a weird issue with the example in the Simple_MAD notebook: when using po.add_noise to generate the initial image, the generated image would always lie along the forward or reverse diagonal (e.g., from base image [.5, .5] to [.6, .4]), which gives you L1 and L2 loss contours such that the circle is completely inscribed within the square.
In this case, MAD Competition (with the parameters set up in that notebook) completely failed to find any solution for it. Need to think about both why there appear to be such limited possible values and why MAD Competition has trouble here, but for now that means going back to the earlier way of adding noise. Should maybe add option to specify initial image?
In simulate.canonical_computations.steerable_pyramid_freq.Steerable_Pyramid_Freq, we should allow the user to define the number of filter orientations. This would obviate using the steer_coeffs method, and would explicitly return the responses of each oriented filter in the response tensor as additional channels.
The FrontEnd model is a very useful one, and would be great to have in some examples, but it's right now so inefficient that synthesizing with it is very slow. How can we make it more efficient?
Discussed a bit with @pehf and my understanding is the main issue is that it's convolving with 31x31 kernels in the signal domain (I haven't profiled it to investigate). If that's so, will get slower as a function of image size. Could we not just take the Fourier transform of the kernel and the image, multiply together, and take the inverse Fourier transform (like the way our steerable pyramid implementation works)?
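This isn't how FrontEnd is implemented; just a rough sketch of the frequency-domain route, assuming a PyTorch version with the torch.fft module (circular boundary handling, as noted; whether it actually wins for 31x31 kernels would need profiling):

import torch

def fft_conv2d(image, kernel):
    """Circular 2d convolution via pointwise multiplication in the frequency domain (sketch)."""
    H, W = image.shape[-2:]
    kh, kw = kernel.shape[-2:]
    # zero-pad the kernel to the image size and transform both (real-input FFTs)
    kernel_f = torch.fft.rfft2(kernel, s=(H, W))
    image_f = torch.fft.rfft2(image)
    out = torch.fft.irfft2(image_f * kernel_f, s=(H, W))
    # shift so the kernel is centered rather than anchored at the top-left corner
    return torch.roll(out, shifts=(-(kh // 2), -(kw // 2)), dims=(-2, -1))

x = torch.rand(1, 1, 256, 256)
k = torch.rand(31, 31)           # stand-in for the FrontEnd model's 31x31 kernel
y = fft_conv2d(x, k)             # same shape as x, circular boundary handling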
Instantiating a tensor via torch.tensor should be avoided (or at the very least avoided when possible) when torch.from_numpy or torch.as_tensor can be used instead. This is because torch.tensor always copies data, whereas the other two do not.
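i.e.:

import numpy as np
import torch

arr = np.random.rand(256, 256).astype(np.float32)

a = torch.tensor(arr)       # always copies the data
b = torch.from_numpy(arr)   # shares memory with the numpy array, no copy
c = torch.as_tensor(arr)    # no copy when the dtype and device already match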
The basics of this has been completed, but needs more work to finalize.
We want all of our synthesize methods to expect 4d images, (batch, channel, height, width), and expect the model outputs to be 3d or 4d: (batch, channel, y_1) or (batch, channel, y_1, y_2). Eigendistortions right now does not.
Right now, the synthesis methods are probably too memory-intensive to make this way of doing things reasonable. It would require some more thinking about how to parallelize across batches / channels that none of us need right now.
Synthesis superclass
This is linked to #17, will be in the same PR.
We want to make sure that our code runs on GPUs with very little overhead.
Currently, there are two steps for that:
1. Make sure everything runs on the GPU in the same manner. See metamer.py, steerable_pyramid_freq.py, pooling.py, and ventral_stream.py for my preferred way, but basically: none of our synthesis methods nor models should set the device anywhere. Each should have a .to method, which moves all tensor attributes over to the given device/dtype, and then all of its methods should work regardless of which device they're on. This can be done by using things like torch.ones_like; if a new tensor needs to be created (and you can't use torch.ones_like or something like it), its device should be explicitly set to that of that method's input. If the method has no input, check one of the tensor attributes. (A sketch of this pattern follows below.)
2. Figure out how to make Travis CI work with CUDA. There's an open issue on this, so it might not be trivial, but they link an existing project which has a .travis.yml file we could try modifying.
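An illustrative sketch of the device-agnostic pattern in step 1 (not any particular model's actual code):

import torch

class ExampleModel:
    """Illustrative fragment showing the device-agnostic pattern described above."""

    def __init__(self):
        self.windows = torch.rand(10, 64, 64)   # example tensor attribute

    def to(self, *args, **kwargs):
        # mirror torch.nn.Module.to: move every tensor attribute to the requested device/dtype
        for name, value in vars(self).items():
            if torch.is_tensor(value):
                setattr(self, name, value.to(*args, **kwargs))
        return self

    def forward(self, x):
        # create new tensors from the input (torch.ones_like) or on the input's device,
        # rather than hard-coding a device anywhere
        gain = torch.ones_like(x)
        bias = torch.zeros(x.shape[0], device=x.device)
        return x * gain + bias.view(-1, 1, 1, 1)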
PooledVentralStream models are not quite pytorch-ic: they should have the different computations as layers, each of which is a torch.nn.Module, allowing for hooks (see here), and they shouldn't store memory-intensive attributes. Attributes should only be metadata, with methods for converting the tensor output into the more structured representational form for visualization / understanding (but not storing it as an attribute).
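For reference, once each computation is its own nn.Module layer, hooks work like this (toy stand-in model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1))
activations = {}

def save_activation(module, inputs, output):
    # grab the intermediate output without storing it as a model attribute
    activations['relu'] = output.detach()

model[1].register_forward_hook(save_activation)
_ = model(torch.rand(1, 1, 64, 64))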