distillpub / post--circuits--frequency-edges

High/Low Frequency Detectors for Circuits article

License: Creative Commons Attribution 4.0 International

HTML 87.74% JavaScript 0.14% TeX 1.20% CSS 0.46% Python 0.02% Makefile 0.01% Svelte 0.40% EJS 10.03%

post--circuits--frequency-edges's Introduction

Post -- Exploring Bayesian Optimization

Breaking Bayesian Optimization into small, sizable chunks.

To view the rendered version of the post, visit: https://distill.pub/2020/bayesian-optimization/

Authors

Apoorv Agnihotri and Nipun Batra (both IIT Gandhinagar)

Offline viewing

Open public/index.html in your browser.

NB - the citations may not appear correctly in the offline render

post--circuits--frequency-edges's People

Contributors

colah, csvoss, ludwigschubert, ncammarata


post--circuits--frequency-edges's Issues

Names for Hypothesis 1 vs Hypothesis 2?

I'd like to go into a bit more detail on the hypotheses, provide the examples from other circuit work as images, and maybe give the hypotheses catchier names so we can refer back to them with a little more signal than asking readers to remember numbers.

Here's the current state for reference:

  • Hypothesis 1. The detectors combine together low frequency detection and high frequency detection in a specific spatial arrangement in order to produce high-low frequency detection. This would be like how InceptionV1 constructs a car detector by looking for windows on top, a car body in the middle, and wheels on the bottom.
  • Hypothesis 2. These detectors are built up from lower-level features which are roughly similar to high-low frequency detectors, but more primitive and less reliable. This would be like how edge detection is built up from simple Gabor filters.

Maybe Hypothesis 1 could be called something like "functional specialization + spatially meaningful weights", and Hypothesis 2 something like "purely hierarchical features". In short: "spatially meaningful" vs "purely hierarchical". We could then refer to them as hypothesis 1 ("spatially meaningful") vs hypothesis 2 ("purely hierarchical").

WDYT? Did I even understand these correctly? 🙃 @csvoss

Diagram todos

  • Unify all line widths and colors
  • Unify all spacing, maybe with a central CSS variable, or by measuring in rems.
  • Test all three viewports in Chrome and Safari!

HF Factor / LF Factor synthetic stimuli – reconstitute the code

Artifacts that may help

Overall thrust of the approach

  1. Acquire directions for the HF Factor and the LF Factor. I paired with Chris on this, and we used some code that we extracted from the Weight Visualization Demo (a rough sketch of this step is given just below this list).
  2. Create synthetic stimuli. I took some of the code from the High/Low Frequency Detectors Colab and modified it.
  3. Measure the responses of the HF factor and the LF factor to the synthetic stimuli, and plot them as a line plot averaged over angle (sketched after the parameter notes below).
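For step 1, here is a minimal sketch of how the factor directions might be re-derived from the HiLo helper in lib.py (pasted further down). This is not the lost notebook code: the layer names, weight-node name, and channel indices below are placeholders for illustration and need to be filled back in.

from lucid.modelzoo import vision_models

from lib import HiLo  # the helper file pasted below

model = vision_models.InceptionV1()
model.load_graphdef()

# All concrete values are placeholders; the real layer / weight-node names and
# candidate channel indices lived in the lost notebook.
hilo = HiLo(
    model=model,
    layer_name="mixed3a_3x3_pre_relu",   # placeholder: layer containing the candidate detectors
    channel_indices=[0, 1, 2, 3],        # placeholder unit indices within that weight tensor
    weight_node_name="mixed3a_3x3_w",    # placeholder: graph node holding the incoming 3x3 weights
    previous_layer_name="conv2d2",       # placeholder: layer whose channels the directions index
)

# NMF over the (positive/negative-split) incoming weights yields two components;
# these are the HF-factor and LF-factor directions over the previous layer's channels.
factors, components = hilo.factorize_weights(return_components=True)
hf_direction, lf_direction = components[0], components[1]
# (Which component is HF vs LF isn't guaranteed -- cf. reorder_as_hf_lf in lib.py.)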

Screenshots generated in the interim

[Attached screenshots: 2-upstream-synthetic-stimuli (1), 2-upstream-synthetic-stimuli-responses-hf, 2-upstream-synthetic-stimuli-responses-lf, Screen Shot 2020-08-07 at 1 12 27 PM]

According to my notes, I applied a multiplicative factor of 0.5 to the frequency range that the existing sinusoidal checkerboard code produced out of the box, in order to shift the frequencies to exactly the ones shown above.

Also according to my notes, I decided to rotate the stimulus images through only 0.125 of tau, down from the original 0.5 of tau, to reduce some symmetries that seemed to distract from the main point.
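A correspondingly rough sketch of steps 2 and 3 using the stimuli / response helpers from lib.py, folding in the parameter notes above (rotation limited to 0.125 of tau; the extra 0.5 frequency factor would live inside the stimulus function and isn't reproduced here). model, hf_direction, and lf_direction are assumed to come from the step-1 sketch, and the layer name is again a placeholder.

import numpy as np
import matplotlib.pyplot as plt

from lib import get_responses, sinusoidal_grating, stimuli_array

# Step 2: synthetic stimuli, rotated through only 0.125 of tau.
num_frequencies, num_angles = 20, 16
stimuli = stimuli_array(
    iterator=sinusoidal_grating,
    num_frequencies=num_frequencies,
    num_angles=num_angles,
    w=60,
    max_angle=0.125,
)  # shape: (num_frequencies, num_angles, 60, 60, 3)

# Step 3: responses of the HF and LF factors, averaged over angle.
flat = stimuli.reshape((-1,) + stimuli.shape[2:])
acts = get_responses(model, "conv2d2", iter(flat))  # placeholder layer; shape (N, channels)
hf_resp = (acts @ hf_direction).reshape(num_frequencies, num_angles).mean(axis=1)
lf_resp = (acts @ lf_direction).reshape(num_frequencies, num_angles).mean(axis=1)

freqs = np.logspace(0, 1, num=num_frequencies)  # same range stimuli_iter sweeps internally
plt.plot(freqs, hf_resp, label="HF factor")
plt.plot(freqs, lf_resp, label="LF factor")
plt.xlabel("stimulus frequency")
plt.ylabel("mean response over angle")
plt.legend()
plt.show()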

Progress I'd made towards extracting out helper functions from the Colab

This is just a big, long Python file that I had been referring to as lib.py and using as a common place to import from in my Jupyter notebook. The first half is code I extracted from the Colab, and the last few functions came from pairing with Chris to get the HF-factor and LF-factor synthetic stimuli going. What was lost in the devbox crash was the code in the Jupyter notebook itself: the code that actually invoked these functions, created the synthetic stimuli, and ran the analysis.

# -*- coding: utf-8 -*-
"""Circuits -- High-Low Frequency Detectors.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1N2sK3cOaKoTHqXLOirYDUKMC-2iNS84A

# Install / Import / Load

This code depends on [Lucid](https://github.com/tensorflow/lucid) (our visualization library).
The following cell will install it plus dependencies such as TensorFlow, and import them as appropriate.
"""

import functools
from collections import namedtuple
from dataclasses import dataclass, field
from functools import lru_cache
from typing import List

import numpy as np
import scipy.ndimage as nd
import tensorflow as tf
from more_itertools import chunked
from scipy.ndimage import zoom

import lucid.optvis.objectives as objectives
import lucid.optvis.param as param
import lucid.optvis.render as render
import lucid.optvis.transform as transform
import matplotlib as mpl
import matplotlib.pyplot as plt
from cycler import cycler
from lucid.misc.channel_reducer import ChannelReducer
from lucid.misc.gradient_override import gradient_override_map
from lucid.misc.io import collapse_channels, save, scoping, show
from lucid.misc.io.showing import _display_html, _image_url
from lucid.modelzoo import vision_models
from lucid.optvis.overrides import linearization_overrides

scoping.set_io_scopes(["gs://fls/schubert/circuits--hilo"])

should_save = False

assert tf.test.is_gpu_available()

# "Connections" between adjacent layers are simply their weights.
# When we want to inspect connections across multiple layers, we compute
# the gradient between them on a linearized version of the model:
# ReLU gradients are replaced with the identity, and MaxPool layers are
# treated as average pooling layers.


@lru_cache()
def linearized_connection(model, layer1, layer2, W=5, offset=2):
    with tf.Graph().as_default(), tf.Session(), linearization_overrides():
        t_input = tf.placeholder_with_default(
            tf.zeros(model.image_shape), model.image_shape
        )
        T = render.import_model(model, t_input, t_input)

        # Compute activations
        acts1 = T(layer1).eval()
        acts2 = T(layer2).eval()
        print(acts1.shape, acts2.shape)
        n_chan2 = tf.placeholder("int32", [])

        gradients = tf.gradients(T(layer2), [T(layer1)])
        t_grad = gradients[0]
        # print( T(layer2).eval({T(layer1): acts1[:,0:W,0:W]}).shape)

        return np.stack(
            [
                t_grad.eval({n_chan2: i, T(layer1): acts1[:, 0:W, 0:W]})[0]
                for i in range(acts2.shape[-1])
            ],
            -1,
        )


def positive_negative_image(array):
    """Splits a single-channel array into positive and negative components.
    Used for visualization, mapping:
      positive -> orange (= red + .5* green)
      negative -> cyan (= blue + .5* green)
      zero -> white
    """
    p, n = np.maximum(0, array), np.maximum(0, -array)
    return np.stack([1 - 2 * n, 1 - (p + n), 1 - 2 * p], axis=-1)

    # Alternative: map 0 to medium green if 0 is important to see
    # return np.stack([p, .5*np.ones_like(p), n], axis=-1)


# Colab matplotlib styles 2020-01-27 schubert@


mpl.rcParams["lines.linewidth"] = 1.5
# Viridis. Not really meant as a categorical scale.
# colors = reversed(("#440154","#46327e","#365c8d","#277f8e","#1fa187","#4ac16d","#a0da39","#fde725"))
mpl.rcParams["axes.prop_cycle"] = cycler(color=mpl.colors.TABLEAU_COLORS)


mpl.rcParams["figure.facecolor"] = "383838"  # background around the plot (figure area)
mpl.rcParams["axes.facecolor"] = "383838"  # background inside the plot (axes area)
mpl.rcParams["axes.edgecolor"] = "EEEEEEAA"  # axis itself
mpl.rcParams["xtick.color"] = "EEEEEEAA"
mpl.rcParams["ytick.color"] = "EEEEEEAA"
mpl.rcParams["text.color"] = "EEEEEEAA"
mpl.rcParams["axes.labelcolor"] = "EEEEEEAA"
mpl.rcParams["legend.edgecolor"] = "383838"

mpl.rcParams["axes.spines.right"] = False
mpl.rcParams["axes.spines.top"] = False

mpl.rcParams["axes.grid"] = True
mpl.rcParams["grid.alpha"] = 0.5
mpl.rcParams["grid.color"] = "666666"
mpl.rcParams["grid.linestyle"] = "solid"
mpl.rcParams["grid.linewidth"] = 1


def render_units(
    model,
    layer,
    units,
    w=96,
    objective_f=objectives.neuron,
    regularize_l1=False,
    transforms=None,
    num_steps=1024,
    alpha=False,
):
    objective = sum(
        [
            objective_f(layer, index, batch=batch_index)
            for batch_index, index in enumerate(units)
        ]
    )
    param_f = lambda: param.image(w, batch=len(units), alpha=alpha)
    trans = transforms or transform.standard_transforms.copy()
    if regularize_l1:
        objective -= 1e-3 * objectives.L1(constant=0.5)
    if alpha:
        objective += 1e-3 * objectives.blur_alpha_each_step()
        trans += [
            transform.crop_or_pad_to(w, w),
            transform.collapse_alpha_random(sd=0.015),
        ]
    return render.render_vis(
        model,
        objective,
        param_f,
        transforms=trans,
        thresholds=[num_steps],
        verbose=False,
    )[0]


"""# Analysis"""


@dataclass(frozen=True)
class HiLo:
    """Class for keeping track of a model's High-Low Frequency Detectors"""

    model: vision_models.Model
    layer_name: str
    channel_indices: List[int] = field(hash=False)
    weight_node_name: str
    previous_layer_name: str

    @property
    @lru_cache()
    def incoming_weights(self):
        (weight_node,) = [
            node
            for node in self.model.graph_def.node
            if node.name == self.weight_node_name
        ]
        weights = tf.make_ndarray(weight_node.attr["value"].tensor)
        if self.model.__class__ == vision_models.AlexNet:
            result = weights[..., -128:]  # weights are split...
        elif self.model.__class__ == vision_models.InceptionV1:
            # weights are multiplied by a 1x1 conv...
            (bn_weight_node,) = [
                node
                for node in self.model.graph_def.node
                if node.name == "mixed3a_3x3_bottleneck_w"
            ]
            bn_weights = tf.make_ndarray(bn_weight_node.attr["value"].tensor)
            result = np.einsum("abij,cdjk->cdik", bn_weights, weights)
            return result
        else:
            result = weights
        return result

    @lru_cache()
    def factorize_weights(self, return_components=False, subset_to_candidates=True):

        # factorize all weights at once, or:
        # subset to the channel_indices we believe to be hl frequency detectors
        if subset_to_candidates:
            connection = self.incoming_weights[..., self.channel_indices].transpose(
                -1, 0, 1, 2
            )
        else:
            connection = self.incoming_weights.transpose(-1, 0, 1, 2)

        # split into positive and negative components
        pn_components = [np.maximum(0, connection), np.maximum(0, -connection)]
        pn_connection = np.concatenate(pn_components, axis=0)

        # NMF dimensionality reduction
        reducer = ChannelReducer(2)
        pn_nmf_factors = reducer.fit_transform(pn_connection)

        # Recombine positive and negative components
        nmf_factors_split = np.split(pn_nmf_factors, 2, axis=0)
        nmf_factors = nmf_factors_split[0] - nmf_factors_split[1]
        # print(f"Factors: {nmf_factors.shape}, Components: {reducer.components.shape}")
        if return_components:
            return nmf_factors, reducer.components
        else:
            return nmf_factors

    @lru_cache()
    def render_components(self):
        _, components = self.factorize_weights(return_components=True)
        if self.model.__class__ == vision_models.AlexNet:  # weights are split...
            components = np.concatenate(
                [np.zeros_like(components), components], axis=-1
            )
        param_f = lambda: param.image(112, batch=len(components))
        objective = sum(
            objectives.direction(self.previous_layer_name, component, batch=i)
            for i, component in enumerate(components)
        )
        return render.render_vis(
            self.model, objective, param_f=param_f, verbose=False, thresholds=[128]
        )[0]

    @lru_cache()
    def render_units(self, channel=False):
        obj_f = objectives.channel if channel else objectives.neuron
        return render_units(
            self.model,
            self.layer_name,
            self.channel_indices,
            num_steps=512,
            objective_f=obj_f,
        )

    def render_factors(self):
        factors = self.factorize_weights()
        results = []
        for factor_index in range(factors.shape[-1]):
            factor = factors[..., factor_index]
            factor = (factor - np.min(factor)) / (np.max(factor) - np.min(factor))
            factor = factor - 0.5
            image = positive_negative_image(factor)
            padded = np.pad(
                image,
                ((0, 0), (1, 1), (1, 1), (0, 0)),
                mode="constant",
                constant_values=1,
            )
            results.append(padded)
        return results

    def save_assets(self):
        results = {}
        with scoping.io_scope(f"4.1-universality/{self.model.name}"):
            results["components"] = [
                save(component, f"component-{index}.png")
                for index, component in enumerate(self.render_components())
            ]
            results["units"] = [
                save(fv, f"unit-{index}.png")
                for index, fv in enumerate(self.render_units())
            ]
            results["unit_channels"] = [
                save(fv, f"unit-channel-{index}.png")
                for index, fv in enumerate(self.render_units(channel=True))
            ]
            results["factors"] = [
                save(unit, f"factor-{index}-unit-{unit_index}.png")
                for index, factors in enumerate(self.render_factors())
                for unit_index, unit in enumerate(factors)
            ]
            save(results, "manifest.json")


def show_factors(factors):
    for factor_index in range(factors.shape[-1]):
        factor = factors[..., factor_index]
        factor = (factor - np.min(factor)) / (np.max(factor) - np.min(factor))
        factor = factor - 0.5
        print(factor.min(), factor.max())
        factor = np.pad(factor, ((0, 0), (1, 1), (1, 1), (0, 0)), mode="constant")
        show(positive_negative_image(factor), w=80)


def fv_url(
    layer_name, channel_index, objective_name="channel", model_name="InceptionV1"
):
    if model_name == "InceptionV1":
        return f"https://openai-clarity.storage.googleapis.com/model-visualizer%2F1556758232%2F{model_name}%2Ffeature_visualization%2Falpha%3DFalse%26layer_name%3D{layer_name}%26negative%3DFalse%26objective_name%3D{objective_name}%2Fchannel_index={channel_index}.png"
    if model_name == "AlexNet":
        return f"https://openai-clarity.storage.googleapis.com/model-visualizer/{model_name}/feature_visualization/alpha%3DFalse%26layer%3D{layer_name}%26negative%3DFalse%26objective_name%3D{objective_name}/channel-{channel_index}.png"
    else:
        ...


def reorder_as_hf_lf(component):
    # NOTE: known_hf, known_lf, and layer_name are globals that were defined in the
    # notebook rather than in this file (presumably per-layer indices of known HF / LF units).
    assert component.shape[0] == 2
    coeffs_hf = component[0][known_hf[layer_name]]
    coeffs_lf = component[0][known_lf[layer_name]]
    is_lf_factor = sum(coeffs_lf) > sum(coeffs_hf)
    if is_lf_factor:
        print(
            f"{layer_name}'s factor 0 seems to be {'LF' if is_lf_factor else 'HF'} factor. (HF: {sum(coeffs_hf)}, LF: {sum(coeffs_lf)})"
        )
        print(
            f"Swapping order of {layer_name}'s component 0 and 1 so HF is component 0."
        )
        return np.flip(component, axis=0)
    else:
        return component


def cos_aliased(X, eps):
    return (np.sin(X + eps) - np.sin(X - eps)) / (2 * eps)


def sinusoidal_grating(ang, freq, N=60, phase=0):
    X = np.linspace(-1, 1, num=N, endpoint=False)[None]
    eps = 1 / (N)  # 2/4N
    Y = X.T
    ang *= 6.283
    phase *= 6.283
    #   our_cos = lambda X: cos_aliased(X, eps) # np.cos
    our_cos = (
        lambda X: sum(np.cos(X + eps * rand) for rand in np.random.randn(10)) / 10.0
    )
    X_, Y_ = np.cos(ang) * X + np.sin(ang) * Y, -np.sin(ang) * X + np.cos(ang) * Y

    a = 0.5 + 0.3 * (our_cos(6.28 * X_ * freq + phase))  # +np.cos(6.28*Y_*freq))
    b = 0.5 + 0.3 * (our_cos(6.28 * X_ * freq + phase))  # +np.cos(6.28*Y_*freq))

    cut = np.clip(0.5 + 5 * X_, 0, 1)
    return cut * a + (1 - cut) * b
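

# NOTE: `axes` is used by high_low_freq / high_low_freq2 below but was defined in the
# Colab rather than in this file; this stand-in (an assumption on my part) mirrors how
# sinusoidal_grating builds its coordinate grid.
def axes(N):
    X = np.linspace(-1, 1, num=N, endpoint=False)[None]
    return X, X.T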


def high_low_freq(ang, r=2, N=60):
    X, Y = axes(N)
    ang *= 6.283
    X_, Y_ = np.cos(ang) * X + np.sin(ang) * Y, -np.sin(ang) * X + np.cos(ang) * Y

    ka, kb = 2 * r, 2 / r
    a = 0.5 + 0.3 * (np.cos(6.28 * X_ * ka) + np.cos(6.28 * Y_ * ka))
    b = 0.5 + 0.3 * (np.cos(6.28 * X_ * kb) + np.cos(6.28 * Y_ * kb))

    cut = np.clip(0.5 + 5 * X_, 0, 1)
    return cut * a + (1 - cut) * b


def stimuli_iter(
    iterator=high_low_freq,
    num_frequencies=60,
    num_angles=360,
    w=60,
    oversample=1,
    max_angle=1.0,
    phase=0,
    **kwargs,
):
    for freq in np.logspace(0, 1, num=num_frequencies):
        for angle in np.linspace(0, max_angle, num=num_angles):
            # NOTE: this positional call matches sinusoidal_grating(ang, freq, N, phase);
            # high_low_freq, the default iterator, takes only (ang, r, N) and no phase argument.
            oversampled = iterator(angle, freq, oversample * w, phase, **kwargs)[
                ..., None
            ] + np.zeros([1, 1, 3])
            if oversample == 1:
                yield oversampled
            else:
                yield zoom(oversampled, (1 / oversample, 1 / oversample, 1), order=3)


def stimuli_array(
    iterator=high_low_freq,
    num_frequencies=60,
    num_angles=360,
    w=60,
    oversample=1,
    max_angle=1.0,
    phase=0,
    **kwargs,
):
    array = np.array(
        list(
            stimuli_iter(
                iterator,
                num_frequencies,
                num_angles,
                w,
                oversample,
                max_angle,
                phase,
                **kwargs,
            )
        )
    )
    return array.reshape((num_frequencies, num_angles, w, w, 3))


def high_low_freq2(ang, f1, f2, N=60):
    X, Y = axes(N)
    ang *= 6.283
    X_, Y_ = np.cos(ang) * X + np.sin(ang) * Y, -np.sin(ang) * X + np.cos(ang) * Y

    ka, kb = f1, 1 / f2
    a = 0.5 + 0.3 * np.cos(6.28 * X_ * ka) * np.cos(6.28 * Y_ * ka)
    b = 0.5 + 0.3 * np.cos(6.28 * X_ * kb) * np.cos(6.28 * Y_ * kb)

    cut = np.clip(0.5 + 5 * X_, 0, 1)
    return cut * a + (1 - cut) * b


def hilo_stimuli_two_frequencies(num_freq1=60, num_freq2=60, num_angles=360, w=60):
    for freq1 in np.logspace(0, 1.0, num=num_freq1):
        for freq2 in np.logspace(0, 1.0, num=num_freq2):
            for angle in np.linspace(0, 1.0, num=num_angles):
                yield high_low_freq2(angle, freq1, freq2, N=w)[..., None] + np.zeros(
                    [1, 1, 3]
                )


def get_responses(model, layer, sample_iterator):
    with tf.Graph().as_default(), tf.Session() as sess:
        t_img = tf.placeholder("float32", [None, None, None, 3])

        T = render.import_model(model, t_img, t_img)
        t_layer = T(layer)

        responses = []
        for batch in chunked(sample_iterator, 64):
            for acts_ in t_layer.eval({t_img: batch}):
                D = (acts_.shape[0] + 1) // 2 - 1
                if D > 0:
                    acts_ = acts_[D:-D, D:-D]
                resp = acts_.max(0).max(0)
                responses.append(resp)

    return np.asarray(responses)


def rotated_stimuli(stimulus, N=360, W=96):
    for angle in range(-N // 2, N // 2):
        rotated = nd.rotate(stimulus, angle)
        rotated_width = rotated.shape[0]
        D = int((rotated_width - W) / 2)
        yield rotated[D : D + W, D : D + W]


def vis_html(layer_name, n, W=120):
    a_url = "https://storage.googleapis.com/inceptionv1-weight-explorer/%s_%s.html" % (
        layer_name,
        n,
    )
    img_url = (
        "https://storage.googleapis.com/inceptionv1-weight-explorer/images/neuron/%s_%s.jpg"
        % (layer_name, n)
    )
    img = "<img style='width: %spx;' src='%s'>" % (W, img_url)
    return "<a href='%s'>%s</a>" % (a_url, img)


def inline_block(html, margin=0):
    return (
        "<div style='display: inline-block; margin-right: %spx; margin-bottom: -20px;'>%s</div>"
        % (margin, html)
    )


H = lambda S: int(S, 16) / 255.0
C = lambda X: np.asarray([H(X[0:2]), H(X[2:4]), H(X[4:6])])


def weight_color_scale(x):
    if x < 0:
        x = -x
        if x < 0.5:
            x = x * 2
            return (1 - x) * C("f7f7f7") + x * C("92c5de")
        else:
            x = (x - 0.5) * 2
            return (1 - x) * C("92c5de") + x * C("0571b0")
    else:
        if x < 0.5:
            x = x * 2
            return (1 - x) * C("f7f7f7") + x * C("f4a582")
        else:
            x = (x - 0.5) * 2
            return (1 - x) * C("f4a582") + x * C("ca0020")


weight_heatmap = lambda X: np.asarray([[weight_color_scale(x) for x in X_] for X_ in X])


def W_image(x, width=60):
    W_img = weight_heatmap(x)
    img2 = (
        "<img src='%s' style='width: %spx; height: %spx; image-rendering: pixelated;'>"
        % (_image_url(W_img, domain=[0, 1]), width, width)
    )
    return img2


### For Synthetic Stimuli
# (The H / C / weight_color_scale / weight_heatmap helpers defined above are reused here.)


def precomputed_featurevis_html(layer_name, n, W=None):

    W_dict = {
        "mixed3a": 60,
        "mixed3b": 60,
        "mixed4a": 100,
        "mixed4b": 110,
        "mixed4c": 120,
        "mixed4d": 130,
    }
    if W is None:
        W = W_dict.get(layer_name, 60)  # fall back to the per-layer default width

    img_url = (
        "https://storage.googleapis.com/inceptionv1-weight-explorer/images/neuron/%s_%s.jpg"
        % (layer_name, n)
    )
    img = "<img style='width: %spx;' src='%s'>" % (W, img_url)
    a_url = "https://storage.googleapis.com/inceptionv1-weight-explorer/%s_%s.html" % (
        layer_name,
        n,
    )
    img = "<a href='%s'>%s</a>" % (a_url, img)

    return img


def neuron_with_weight(layer, unit, W):
    width_dict = {
        "mixed3a": 60,
        "mixed3b": 60,
        "mixed4a": 100,
        "mixed4b": 110,
        "mixed4c": 120,
        "mixed4d": 130,
        "mixed4e": 130,
        "mixed5a": 150,
        "mixed5b": 150,
    }
    if layer in width_dict:
        width = width_dict[layer]
    else:
        width = 60
    if width < 70:
        width = 70

    assert W.min() >= -1.01 and W.max() <= 1.01
    if len(W.shape) == 2:
        W_img = weight_heatmap(W)
    else:
        W_img = W
    url = _image_url(W_img, domain=(0, 1))

    return f"""
      <div style="display: inline-block; margin-right: 2px; margin-bottom: 4px;">
        <div style="image-rendering:pixelated; display: flex; flex-direction: column;">
        {precomputed_featurevis_html(layer, unit, W=width)}
        <img src="{url}" style="width:{width-2}px; height: {width-2}px;" class="weight">
        </div>
      </div>
    """


def neuron_with_weight_row(layer, units, W):
    inner_html = " ".join(neuron_with_weight(layer, n, W[..., n]) for n in units)
    return f"""<div style="width: 5000px; margin-top: 2px;">{inner_html}</div>"""


def neuron_with_weight_row_preselected(layer, units, W):
    inner_html = " ".join(
        neuron_with_weight(layer, n, W[i]) for i, n in enumerate(units)
    )
    return f"""<div style="width: 5000px; margin-top: 2px;">{inner_html}</div>"""


def ForceAvgPoolGrad(op, grad):
    inp = op.inputs[0]

    op_args = [op.get_attr("ksize"), op.get_attr("strides"), op.get_attr("padding")]
    smooth_out = tf.nn.avg_pool(inp, *op_args)
    inp_smooth_grad = tf.gradients(smooth_out, [inp], grad)[0]

    return inp_smooth_grad


def MaxAsAvgPoolGrad(op, grad):
    inp = op.inputs[0]

    op_args = [op.get_attr("ksize"), op.get_attr("strides"), op.get_attr("padding")]
    smooth_out = tf.nn.avg_pool(inp, *op_args)
    inp_smooth_grad = tf.gradients(smooth_out, [inp], grad)[0]

    return inp_smooth_grad


@functools.lru_cache(128)
def get_expanded_weights(model, layer1, layer2, W=5):

    """Get the "expanded weights" between two layers.

  Arguments:
    model: model to get expanded weights from
    layer1: earlier layer to expand weights between
    layer2: later layer to expand weights between
    W: spatial width of expanded weights

  Returns:
    Expanded weights as numpy array of shape
    [W, W, layer1 channels, layer2 channels]


  Discussion:

  Sometimes the meaningful weight interactions are between neurons which aren’t
  literally adjacent in a neural network, or where the weights aren’t directly
  represented in a single weight tensor. A few examples:

  * In a residual network, the output of one neuron can pass through the
    additive residual stream and linearly interact with a neuron much later
    in the network.
  * In a separable convolution, weights are stored as two or more factors,
    and need to be expanded to link neurons.
  * In a bottleneck architecture, neurons in the bottleneck may primarily be
    a low-rank projection of neurons from the previous layer.
  * The behavior of an intermediate layer simply doesn’t introduce much
    non-linear behavior, leaving two neurons in non-adjacent layers with a
    significant linear interaction.

  As a result, we often work with “expanded weights” -- that is, the result
  of multiplying adjacent weight matrices, potentially ignoring non-linearities.
  We generally implement expanded weights by taking gradients through our model,
  ignoring or replacing all non-linear operations with the closest linear one.

  These expanded weights have the following properties:

  * If two layers interact linearly, the expanded weights will give the true
    linear map, even if the model doesn’t explicitly represent the weights in a
    single weight matrix.
  * If two layers interact non-linearly, the expanded weights can be seen as
    the expected value of the gradient up to a constant factor, under the
    assumption that all neurons have an equal (and independent) probability of
    firing.

  They also have one additional benefit, which is more of an implementation
  detail: because they’re implemented in terms of gradients, you don’t need to
  know how the weights are represented. For example, in TensorFlow, you don’t
  need to know which variable object represents the weights. This can be a
  significant convenience when you’re working with unfamiliar models!

  """

    # Set up a graph for doing attribution...
    with tf.Graph().as_default(), tf.Session(), gradient_override_map(
        {"Relu": lambda op, grad: grad, "MaxPool": MaxAsAvgPoolGrad}
    ):
        t_input = tf.placeholder_with_default(
            tf.zeros([1, 224, 224, 3]), [None, None, None, 3]
        )
        T = render.import_model(model, t_input, t_input)

        # Compute activations; this gives us numpy arrays with the right number of channels
        acts1 = T(layer1).eval()
        acts2 = T(layer2).eval()

        # Compute gradient from center; due to overrides this just multiplies out the weights
        t_offset = (tf.shape(T(layer2))[1] - 1) // 2
        t_center = T(layer2)[0, t_offset, t_offset]
        n_chan2 = tf.placeholder("int32", [])
        t_grad = tf.gradients(t_center[n_chan2], [T(layer1)])[0]
        arr = np.stack(
            [
                t_grad.eval({n_chan2: i, T(layer1): acts1[:, 0:W, 0:W]})[0]
                for i in range(acts2.shape[-1])
            ],
            -1,
        )

        return arr

New hero image / title section

Our current hero:

[current hero image]

I wonder if we can re-use a different diagram that shows a bit more about the detectors. E.g. @csvoss's factor diagram would be really nice:

[factor diagram image]

But I'm still experimenting with similar approaches that show both weights and feature vis. :)
