
distillpub / post--differentiable-parameterizations


A powerful, under-explored tool for neural network visualizations and art.

Home Page: https://distill.pub/2018/differentiable-parameterizations

License: Creative Commons Attribution 4.0 International

Languages: HTML 16.87%, JavaScript 82.20%, TeX 0.77%, CSS 0.11%, Python 0.05%

post--differentiable-parameterizations's Introduction

Post -- Exploring Bayesian Optimization

Breaking Bayesian Optimization into small, sizable chunks.

To view the rendered version of the post, visit: https://distill.pub/2020/bayesian-optimization/

Authors

Apoorv Agnihotri and Nipun Batra (both IIT Gandhinagar)

Offline viewing

Open public/index.html in your browser.

NB - the citations may not appear correctly in the offline render

post--differentiable-parameterizations's People

Contributors

cberner, colah, darabos, ludwigschubert, zanarmstrong, znah


post--differentiable-parameterizations's Issues

Revise the end of 3d style transfer section

[ ] I propose dropping this:

The resulting textures combine elements of the desired style while preserving the characteristics of the original texture.
Take as an example the model created by imposing Van Gogh's Starry Night as the style image.
The resulting texture contains the repetitive and vigorous brush strokes that characterize Van Gogh's work.
However, although the style image contains only cold tones, the resulting fur keeps the warm orange undertone of the original texture.
Even more interesting is how the eyes of the bunny are preserved when different styles are transferred.
For example, when the style is taken from Van Gogh's painting, the eyes are transformed into a star-like swirl, while if Kandinsky's work is used, they become abstract patterns that still resemble the original eyes.

because:

  • Observations described here can change from run to run
  • I tend to attribute the preservation of the content colors to our use of a rather low layer of GoogLeNet for the content loss

[ ] Need to say more about weighted Gram matrices over iterations, and their implementation with tf.stop_gradient (see the sketch after this list)

[ ] Footnote 13 should either link to shapeways material description, or be removed.
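
For the tf.stop_gradient point above, here is a minimal sketch of one way a weighted Gram matrix over iterations could look. The function names, the decay constant, and the exact blending rule are my own illustrative choices, not the article's implementation:

```python
import tensorflow as tf

def gram_matrix(features):
    # features: [height, width, channels] activations from one layer.
    h, w, c = features.shape
    flat = tf.reshape(features, [h * w, c])
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def blended_gram(current_features, previous_gram, decay=0.5):
    # Blend the current view's Gram matrix with the one accumulated over previous
    # iterations. tf.stop_gradient turns the accumulated part into a fixed target,
    # so gradients only flow through the current view.
    g_now = gram_matrix(current_features)
    if previous_gram is None:
        return g_now
    return decay * tf.stop_gradient(previous_gram) + (1.0 - decay) * g_now
```

The style loss would then compare against blended_gram(...) instead of the single-view Gram, and the blended matrix would be carried over as previous_gram for the next iteration.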

Improve Threejs lifecycle/performance issues

We're currently not reusing renderers, which results in the browser dropping contexts (see attached screenshot).
I believe using only one renderer, probably stored in the shared Svelte store, should help with this.

Review #2

The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to write the review.


This article presents an interesting perspective on differentiable image transformations and their underappreciated usefulness for neural network visualization and neural art. Several neural art generators, like DeepDream, use the fact that the neural network is a differentiable function of the image, and therefore one can backprop into pixel space to maximize some desired objective. However, the authors point out that an RGB description is not the only way to parametrize an image, and using alternative parametrizations can often have unexpected benefits!

The first area that the authors study that can benefit from alternative parametrization is neural network visualization. One way to get a deeper understanding of the inner workings of a network is to optimize convex combinations of neurons. This was not completely clear to me, and the reference that the authors give here just points to a Wikipedia article about what a convex combination means. Perhaps a better reference here might be to a network visualization paper. The authors point out that using a shared parametrization between different frames can help the features remain aligned between interpolations, thereby helping visualization.

The other area benefiting from alternate parametrizations is neural art. The authors mention an interesting observation that I was not aware of: that style transfer works mostly with the VGG network, even though other classification networks perform on par with or better than VGG in classification. The authors then show that by parametrizing the image in the Fourier basis, one can get similar results on style transfer using GoogLeNet. This is a very interesting result and one that I would love to see analyzed more (perhaps in future work). The authors then study Compositional Pattern Producing Networks (CPPNs), where the image itself is parametrized by a differentiable neural network. Several beautiful pictures generated using this method are presented. The authors then move on to 3D visual art using rendering, where the textures are given a Fourier parametrization. The authors also describe the UV mapping, where every vertex of the triangulation gets a coordinate in the texture.

The gradients are applied in a two-stage process: once to the rendered image to get the desired style transfer, and then propagated back to the texture description. Using this, the authors are able to generate nice 3D style transfers which look very cool!

Typos and suggestions etc:

  • The transition from the introduction to the first section is a little abrupt. Also, the first section is a little confusing (maybe because I am not familiar with convex combination feature visualization).
  • "adjustign" should be "adjusting"

Rotation on scroll

Instead of unfolding the mesh during scroll, it would be cool to do a small rotation, just to show that you can interact with the model

Diagram font issue

I think the SVGs for the diagrams use a font that not all users have. In the fallback font, bold and normal weights have different sizes, leaving a small gap (see attached screenshot).

Positioning relative to prior work

For a couple of the parameterizations we present, similar things have been done before. While we're already citing all of this, I think it might be worth thinking a bit more about positioning our article.

  • One possibility would be to talk in the introduction about how this direction has started to be explored, but that we think it's powerful to think about this as a space rather than as individual techniques.

    • There may be something to say about how this is broader than just images and art. It's a more general version of preconditioning and can be used in all kinds of places!
  • For some examples, depending on timing, we may be able to describe our work as contemporaneous by citing the workshop talk. (I'm not sure that this is a route worth going down, but I wanted to note it.)

Additional Acknowledgments

  • Justin Gilmer - for reviewing the article
  • Anonymous reviewers
  • redblobgames - for pointing out a bug (did we fix it? Yes!)
  • Others?

Should we tell more about original motivation behind Weighted Fourier param?

The original motivation came from the fact that InceptionV1 gradients look similar to white noise for some reason, while natural images have a 1/f amplitude spectrum decay. [google: natural image spectrum] At first I was using Laplacian pyramid normalization to fix the gradient spectrum, and then together with Chris we formulated it as optimization in a different parameter space (or, as Chris states, optimization under a different vector norm).
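
If we do expand on this, a small sketch might help. The code below is only my illustrative take on "optimization in a different parameter space"; the names, the decay_power knob, and the sigmoid squashing are assumptions, not the article's exact implementation:

```python
import numpy as np
import tensorflow as tf

def fourier_image(h=128, w=128, channels=3, decay_power=1.0):
    # Frequencies of a real 2D FFT: all row frequencies x non-redundant column frequencies.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx ** 2 + fy ** 2)
    # Scale each coefficient roughly as 1/f so random parameters already have a
    # natural-image-like amplitude spectrum (clamp to avoid dividing by zero at DC).
    scale = (1.0 / np.maximum(freqs, 1.0 / max(h, w)) ** decay_power).astype(np.float32)

    # The parameters being optimized: real and imaginary parts, per channel.
    spectrum = tf.Variable(
        np.random.normal(scale=0.01, size=(2, channels) + freqs.shape).astype(np.float32))

    def image():
        coeffs = tf.complex(spectrum[0] * scale, spectrum[1] * scale)
        img = tf.signal.irfft2d(coeffs, fft_length=[h, w])  # [channels, h, w]
        img = tf.transpose(img, [1, 2, 0])                  # [h, w, channels]
        return tf.nn.sigmoid(img)                           # squash into [0, 1]

    return spectrum, image
```

Any pixel-space objective computed on image() can then be optimized by gradient descent on spectrum, which amounts to descending under a rescaled (roughly 1/f) norm rather than in raw pixel space.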

cmd-f "parameterization"

Let's try to rephrase some sentences to reduce "parameterization" word density. I'd aim for <4 per paragraph, <2 per line.

Sub Title / Lead

Presently, we have the following lead-in:

Six examples of differentiable image parameterizations that allow solving previously intractable optimization tasks:

I worry that:

  • "intractable" isn't quite right -- except for style transfer, these are largely things we didn't have a way to try, not things we were trying that didn't work
  • It doesn't really capture the feeling of unexplored terrain and untapped potential that feels emotionally salient to me about this article.

Some alternatives:

  • Six examples of the artistic potential of alternate image parameterizations.
  • A neglected dimension of neural art.
  • A powerful set of tools for neural network visualizations and art.

A family of alternatives: "A [1] [2] for [3]" where:

  • [1a] neglected / underexplored / untapped
  • [1b] powerful
  • [2] set of tools / dimension
  • [3] neural art / neural network visualizations and art

Synchronize 3D model rotation

In figure #BunnyModelTextureSpaceOptimization, the two models would ideally be controlled simultaneously.

The best implementation strategy I can come up with would expose the state of the OrbitControls object, so we could bind to it from a parent Svelte component. That seems more general-purpose than, say, creating a special 3D scene with both models in it, linked to the same transform.

Review #3

The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to write such a thorough review.


Overall:

I love the subject material in the article. I wish it educated me more.

Currently the article advocates a viewpoint: that image generation algorithms should often work in non-pixel spaces. However, the article feels like it would be stronger and more useful if it were written from the point of view of teaching me how to do it rather than just convincing me that it should be done.

In particular, most of the examples in the article omit key details that I would want to understand if I were to want to try to apply the ideas. In general, the simpler the example, the more explicit I wish the details were, because then I could try them out more quickly.

I think the article would be better if, for each algorithm, it:

  • Writes the specific transformation (as math, or as pseudocode) instead of only describing it in words.
  • Writes what the loss would have been before the transformation (if applicable), and writes down the new loss on the transformed space (showing the sort of adjustments needed).

Even though this might add a few formulas, I suspect that with the right notation, it would actually make the article more readable.

Feedback and questions on each section:

(1) The aligned neuron visualization example describes the parameterization as N(P[shared] + P[unique]), where N is the sigmoid and P[unique] is "high resolution".
A few extra details might make it much easier to understand what is happening:

  • What's the loss? In particular, how is P[unique] constrained or regularized to be a "high resolution" component? Is an extra term needed in the loss when optimizing P[unique], or does it happen naturally without any extra terms?
  • How is P[shared] chosen? Is the choice important? The example illustrated looks like a smoothed version of a single random result.
  • The text says that usually N is a sigmoid. If there are other good choices, what are they? If the sigmoid is the best choice, just say N is a sigmoid.
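
To make these questions concrete, here is one literal reading of N(P[shared] + P[unique]); the shapes, the upsampling factor, and the absence of any extra regularizer are guesses on my part, not details confirmed by the article:

```python
import tensorflow as tf

n_frames, h, w, ch = 5, 128, 128, 3

# A low-resolution component shared by every frame, plus a full-resolution
# component unique to each frame. Only these two tensors are optimized.
p_shared = tf.Variable(tf.random.normal([1, h // 8, w // 8, ch], stddev=0.01))
p_unique = tf.Variable(tf.random.normal([n_frames, h, w, ch], stddev=0.01))

def frames():
    shared_up = tf.image.resize(p_shared, [h, w])  # upsample the shared component
    return tf.nn.sigmoid(shared_up + p_unique)     # N(P_shared + P_unique), values in (0, 1)
```

In this reading, P[unique] carries the high-resolution detail simply because P[shared] is low-resolution by construction; whether the article additionally regularizes P[unique] is exactly what the first question asks.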

(2) On style transfer, it is asserted that optimizing the learned image in Fourier space yields better-looking results on non-VGG architectures, but again it would be easier to read if you were more explicit on exactly how the process is different. Here the puzzle is how the loss is affected.

  • Rather than just citing Olaf, go ahead and write down their loss function as a starting point.
  • What's the loss in your new setup? In your switch to Fourier space, is the content loss still computed in pixel space, or is the content loss now in Fourier space? Are there any tricky considerations when doing the loss in the frequency domain?
  • In your switch to GoogLeNet, which layers go into your style loss (not necessarily obvious, considering the nontrivial topology)? Are the results very sensitive to the per-layer weights?
  • As a non-expert in style transfer who doesn't have experience with the difficulty of getting non-VGG networks to work, I wasn't super-convinced by the negative examples, because I wondered if other factors might have caused them to not work well. Low-level details seem to be overemphasized in the negative examples, but it leaves me wondering whether that could have been repaired in other ways, e.g. by changing the weighting of high-level vs. low-level layers in the style loss. Is there other evidence that the improvement is due to the switch to frequency space?
  • The interesting assertion is made that the difficulty of style transfer on other architectures is due to checkerboard patterns in gradients. Can that be visualized? Why would we expect the move to Fourier space to fix the problem, and is there any illustration that can show how or why that particular problem is fixed?
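
For reference, the loss being discussed is, as far as I can tell, the standard style-transfer loss of Gatys et al.; here is a sketch, with layer choices and weights left as free parameters since those are exactly what is being asked about:

```python
import tensorflow as tf

def gram(features):
    # features: [h, w, c] activations of one layer.
    h, w, c = features.shape
    flat = tf.reshape(features, [h * w, c])
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def style_transfer_loss(gen_acts, content_acts, style_acts,
                        content_layers, style_layers,
                        content_weight=1.0, style_weight=1.0):
    # *_acts: dicts mapping layer name -> [h, w, c] activations of the generated,
    # content, and style images respectively, from whatever backbone is used.
    content_loss = tf.add_n([tf.reduce_mean((gen_acts[l] - content_acts[l]) ** 2)
                             for l in content_layers])
    style_loss = tf.add_n([tf.reduce_mean((gram(gen_acts[l]) - gram(style_acts[l])) ** 2)
                           for l in style_layers])
    return content_weight * content_loss + style_weight * style_loss
```

As I read the article, this loss is still evaluated on pixel-space activations even with the Fourier parameterization; only the mapping from the optimized parameters to the pixels fed into the network changes.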

(3) On transparent neuron visualizations.

The simplicity of this example+idea is really nice.

  • For this idea I think it would be particularly instructive to explicitly write down the loss and transformation, showing where the random term comes in, and if and how it interacts with the learned alpha channel. For example, I imagine that (1-alpha) multiplies the random noise, and that you add a regularizer based on either (1-alpha) or (1-alpha)^2.
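
Roughly the structure being guessed at could look like the sketch below; this is my guess as well, so the blending and especially the form of the regularizer are placeholders rather than what the article actually does:

```python
import tensorflow as tf

def composite_with_random_background(rgba, alpha_penalty=0.1):
    # rgba: [h, w, 4] parameterized image whose last channel is a learned alpha in [0, 1].
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    background = tf.random.uniform(tf.shape(rgb))          # a fresh random backdrop each step
    composite = alpha * rgb + (1.0 - alpha) * background   # what the network actually sees
    # One possible regularizer: pay a small cost for opacity, so pixels that the
    # objective doesn't need are pushed toward alpha = 0.
    regularizer = alpha_penalty * tf.reduce_mean(alpha)
    return composite, regularizer
```

The composite (not the raw RGB) is what gets fed to the network, so the optimization can only rely on a pixel if it keeps it opaque.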

(4) CPPN example.
I like the CPPN visualizations, but they left me with a number of questions and unsure how to achieve good results with CPPNs.

  • To make your CPPN differentiable, I assume you're fixing the CPPN network architecture and fixing the activation functions?
  • Can the CPPN network you use be described in more detail? What is the set of activation functions?
  • Can you give a sense of how many parameters are needed to reach the level of complexity in the visualizations you show?
  • Are the results very sensitive to these choices (CPPN network architecture, activation functions, and number of free parameters)?
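
Since these questions are about what the network actually looks like, here is a guessed-at minimal CPPN in the spirit of the figures; the depth, width, and atan activation are illustrative choices (with these defaults the network has on the order of a few thousand weights), not the article's exact configuration:

```python
import numpy as np
import tensorflow as tf

def make_cppn(width=224, num_layers=8, num_hidden=24, channels=3):
    # The only input a CPPN sees: the (x, y) coordinate of each pixel.
    coords = np.linspace(-3.0, 3.0, width, dtype=np.float32)
    x, y = np.meshgrid(coords, coords)
    inputs = tf.constant(np.stack([x, y], axis=-1).reshape(-1, 2))

    # Fixed architecture and fixed activation functions; only the weights
    # (net.trainable_variables) are free parameters of the image.
    net = tf.keras.Sequential(
        [tf.keras.layers.Dense(num_hidden, activation=tf.atan) for _ in range(num_layers)]
        + [tf.keras.layers.Dense(channels, activation='sigmoid')])

    def image():
        # Evaluate the CPPN at every pixel coordinate to render the image.
        return tf.reshape(net(inputs), [width, width, channels])

    return net, image
```

A feature-visualization objective computed on image() is then optimized with respect to net.trainable_variables rather than pixels.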

(5) Bunny examples #1

  • The transformation was pretty clearly described. It sounds like the actual transformation you use is a 2D projection lit head-on, and the loss is calculated in this 2D projection space.

(6) Bunny example #2

  • The rendered example didn't work in the draft I saw. Maybe still under construction.
  • The following detail seemed like it was important, but what it actually meant for an implementor seemed unclear to me: "To this end, the style loss is computed as the weighted average of the loss at the current an[d] previous iteration." Does this mean that you are actually optimizing the texture based on two views at once? Or, if the previous view's texture is fixed, why isn't the weighted average of the two losses just equivalent to adding a constant to the loss (i.e., it wouldn't change anything being optimized)? Or perhaps this means "the loss is computed based on the weighted average of the feature vectors of the current and previous views."
  • Again, maybe writing down the loss more explicitly would clarify this.

Link used 3d models and style images

models

bunny http://alice.loria.fr/index.php/software/7-data/37-unwrapped-meshes.html
skull https://sketchfab.com/models/1a9db900738d44298b0bc59f68123393
horse https://sketchfab.com/models/864497a206024c8e832b5127e9e23f2f
david https://sketchfab.com/models/3a8f65d7db8e4ba7a0ea886e2b636128

images

starry https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg/606px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
onwhite https://upload.wikimedia.org/wikipedia/commons/c/c4/Vassily_Kandinsky%2C_1923_-_On_White_II.jpg
mosaic https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Fernand_L%C3%A9ger_-_Grand_parade_with_red_background_%28mosaic%29_1958_made.jpg/637px-Fernand_L%C3%A9ger_-_Grand_parade_with_red_background_%28mosaic%29_1958_made.jpg
points https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Robert_Delaunay%2C_1906%2C_Portrait_de_Metzinger%2C_oil_on_canvas%2C_55_x_43_cm%2C_DSC08255.jpg/449px-Robert_Delaunay%2C_1906%2C_Portrait_de_Metzinger%2C_oil_on_canvas%2C_55_x_43_cm%2C_DSC08255.jpg
scream https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/The_Scream.jpg/471px-The_Scream.jpg
noodles https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Noodles_and_eggs20170520_1035.jpg/526px-Noodles_and_eggs20170520_1035.jpg
newspaper https://upload.wikimedia.org/wikipedia/commons/d/db/RIAN_archive_409362_Literaturnaya_Gazeta_article_about_YuriGagarin%2C_first_man_in_space.jpg
birds https://canyouseedotca.files.wordpress.com/2016/01/mce-birds.jpg
cross https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Cross_stitch_detail.jpg/640px-Cross_stitch_detail.jpg
galaxy https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/NGC_4414_%28NASA-med%29.jpg/582px-NGC_4414_%28NASA-med%29.jpg
cd https://upload.wikimedia.org/wikipedia/commons/thumb/d/d5/CD_autolev_crop.jpg/480px-CD_autolev_crop.jpg

Figure/section numbering and linking

I often miss the ability to link to individual sections or figures in Distill articles, or to refer to them in conversation by number. What does Distill think about this?

Review #1

The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.

The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer, Pang Wei Koh, for taking the time to write such a thorough review.


High-level

I found the article interesting and thought-provoking, and the visualizations were eye-catching and very helpful. Thanks to the authors for putting in the effort to write this article and make all of the associated notebooks and visualizations!

There are two main ways that I think the article could be improved:

  1. Providing more overall context and motivation beyond "let's use different parameterizations", and
  2. Taking care to explain all of the concepts invoked (especially since this is a pedagogical article).

Here are more details on these.

1)

I think the biggest missing thing is a big-picture view about why different parameterizations might lead to different results, and why we might prefer one type of parameterization over another. For example, after reading the intro, I was still not sure about the motivation for the work. The argument went something like, we should use different parameterizations because we can. But what are examples of different parameterizations and why would we expect them to work better/differently?

The most persuasive argument (to me) was the one advanced in the CPPN section: that parameterizations impose constraints upon the optimized image that fit better with the kinds of pictures we'd like to see. This could be: pictures that are more realistic (CPPN); pictures that obey some sort of 3D smoothness (style transfer 3D section); etc. A variant of this argument can also be applied to the shared parameterization section. So perhaps this intuition could be given at the start of the article, together with more signposting of the kinds of parameterizations that the rest of the article would consider.

2)

I found it hard to follow some parts of the article. The argument roughly makes sense, but it was difficult for me to precisely understand what the authors were trying to convey. For example, take the first paragraph of the second section (Aligned Neuron Interpolation):

Sometimes we’d like to visualize how two neurons interact.
This is the first sentence after the intro, and I didn't understand how it was related to what we'd just read in the intro. For example, in the intro, the goal seems to be to "describe the properties we want an image to have (e.g. style), and then optimize the input image to have those properties." Where did neurons come from and why do we care about how they interact? What does it even mean to visualize how they interact?

We can do this by optimizing convex combinations of two neurons.
Why does optimizing convex combinations of two neurons allow us to visualize how two neurons interact? The link wasn't obvious to me.

If we do this naively, different frames of the resulting visualization will be unaligned — visual landmarks such as eyes appear in different locations in each frame.
At this point I was quite confused: What's a frame? Where did frames come from?

This is because the optimization process that creates the visualization is stochastic: even optimizing for the same objective will lead to the visualization being laid out differently each time.
This is the first mention of stochasticity, and it's not clear how that's related to parameterizations, since different parameterizations would presumably have equally stochastic optimizations. Why is this a problem for RGB parameterizations and not others? At this point, I thought that the article was going to be mainly about how the default parameterization is non-convex, and perhaps that different parameterizations could lead to convexity.

Unfortunately, this randomness can make it harder to compare slightly different objectives.
What different objectives are we considering? I'm guessing it's different convex combinations? Why would I want to compare them?

Similarly, the article talks about a "decorrelated parameterization" that somehow works better, but doesn't explain why (except by a brief reference to checkerboard gradients, which I'm guessing a decorrelated parameterization doesn't suffer from, but I'm not sure why that would be the case).

I'd suggest going through the article carefully and making sure that every sentence clearly follows from the previous one, especially for someone with only the minimum level of background knowledge.

Comments on figures

First figure: I was initially a bit confused by why the RGB representation was in the middle of the figure, instead of on the left. (I realized later that you're using neural networks that still operate on the RGB representation; so perhaps it's worth clarifying that you're only considering different parameterizations for the visualization, instead of the training.)

Second/third figures: These were broken for me (see attached screenshot). I only saw grey blocks.

Fourth figure: For some choices of style/content, including the first/default one, the decorrelated space picture looked exactly the same as the image space picture (and both looked bad; see attached screenshot). Is this a bug?

CPPN figure: I can't see the last figure of this section (there's just a big blank space). I'm also not sure what objective you're optimizing for in this section -- how are the pictures being generated?

Typos

"to fuel a small artistic movement based neural art." -> "to fuel a small artistic movement based on neural art."
"each frame is parameterized as a combination of it’s own unique parameterization, and" -> "each frame is parameterized as a combination of its own unique parameterization and"
"despite it’s remarkable success" -> "despite its remarkable success"
"By iteratively adjustign the weights, our immaginary prisms" -> "By iteratively adjusting the weights, our imaginary prisms"
"as a mean to generate" -> "as a means to generate"
"But it’s certain possible to" -> "But it’s certainly possible to"
"This kind more general use" -> "This kind of more general use"
