Machine Learning at Berkeley partnered with Unity Technologies to apply ML methods to de-lighting surface textures.
Realistic in-game objects are often captured from the real world through a process called photogrammetry, which involves photographing an object (e.g. a rock) from all angles and then reconstructing it with 3D reconstruction techniques. De-lighting is necessary to remove the effects of non-uniform real-world lighting and shadow on the object, so that it can be re-lit by the lighting within the game environment. Currently, de-lighting is done manually by artists.
We aimed to build models that operate on surface texture maps (i.e. the unwrapped surface of a 3D object) instead of operating on meshes directly. Our model takes in a lit texture map and seeks to generate a de-lit texture map. Unity Technologies already has several de-lit texture maps (produced by artists), and these serve as our desired outputs. To generate the lit inputs to our models, Unity Technologies placed the de-lit meshes in various lighting conditions and rendered the corresponding lit texture maps. Our dataset consists entirely of rock textures.
Below: left is the texture de-lit by an artist (our ground truth); right is the lit texture.
Our core model consists of a 4-layer encoder followed by a 4-layer decoder, with residual connections between corresponding layers. The model is fully convolutional, so during training it takes in 32x32 randomly cropped, rotated, and flipped patches (this speeds up training dramatically and produces better output), while at test time it takes in the entire texture map.
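A minimal PyTorch sketch of this kind of architecture (the channel widths, kernel sizes, and activation choices below are illustrative assumptions, not our exact configuration):

```python
import torch
import torch.nn as nn

class DelightNet(nn.Module):
    """Fully convolutional encoder-decoder with residual (skip) connections
    between corresponding encoder and decoder layers. Widths are illustrative."""
    def __init__(self, channels=(3, 32, 64, 128, 256)):
        super().__init__()
        self.encoders = nn.ModuleList()
        self.decoders = nn.ModuleList()
        # 4 encoder layers: stride-2 convs halve the spatial size each step
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            self.encoders.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU()))
        # 4 decoder layers: transposed convs double the spatial size back
        rev = channels[::-1]
        for c_in, c_out in zip(rev[:-1], rev[1:]):
            self.decoders.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU()))

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            skips.append(x)          # save pre-downsampling activations
            x = enc(x)
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = dec(x) + skip        # residual connection from matching encoder layer
        return x

net = DelightNet()
patch = torch.randn(8, 3, 32, 32)    # a batch of random 32x32 training crops
full = torch.randn(1, 3, 256, 256)   # a full texture map at test time
out_patch = net(patch)               # same spatial size as the input
out_full = net(full)
```

Because the network contains only convolutions, the same weights apply unchanged to 32x32 training crops and to full-resolution texture maps at test time.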
We experimented with several loss functions on top of our core model.
This is the pixelwise L2 loss between the predicted texture map and the desired texture map.
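As a concrete sketch in PyTorch (the tensors here are random stand-ins for real texture maps):

```python
import torch
import torch.nn.functional as F

pred = torch.rand(1, 3, 32, 32)    # model output (random stand-in)
target = torch.rand(1, 3, 32, 32)  # artist-de-lit ground truth (random stand-in)

# Pixelwise L2 loss: mean squared difference over every pixel and channel
l2_loss = F.mse_loss(pred, target)
```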
Let I(0, 0) be an image, let I(1, 0) be the image shifted one pixel to the right, and let I(0, 1) be the image shifted one pixel up. We call I(0, 0) - I(1, 0) the horizontal gradient and I(0, 0) - I(0, 1) the vertical gradient. We compute the horizontal and vertical gradients of both the model output and the desired output, then take the L2 loss between them as our gradient difference loss.
This particular loss penalizes differences in the relative change from pixel to pixel, which allows the output to keep more of the fine details from the input.
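A sketch of this loss in PyTorch, following the gradient definitions above (the implementation details are our assumptions):

```python
import torch
import torch.nn.functional as F

def gradient_difference_loss(pred, target):
    """L2 loss between the horizontal/vertical gradients of the prediction
    and the target, using one-pixel finite differences."""
    def grads(img):
        gh = img[..., :, :-1] - img[..., :, 1:]   # horizontal: I minus I shifted right
        gv = img[..., :-1, :] - img[..., 1:, :]   # vertical: I minus I shifted up
        return gh, gv
    ph, pv = grads(pred)
    th, tv = grads(target)
    return F.mse_loss(ph, th) + F.mse_loss(pv, tv)

pred = torch.rand(1, 3, 32, 32)
target = torch.rand(1, 3, 32, 32)
loss = gradient_difference_loss(pred, target)
```

Note that adding a constant brightness offset to the prediction leaves this loss unchanged, which is why it constrains local detail rather than absolute intensity.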
We train a convolutional discriminator to predict whether a texture map is a de-lit ground truth (as opposed to one produced by our generator).
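A minimal sketch of such a discriminator in PyTorch (the layer counts, widths, and activations are illustrative assumptions, not our exact architecture):

```python
import torch
import torch.nn as nn

# Convolutional discriminator: stride-2 convs downsample the texture map,
# then global average pooling reduces it to a single real/fake logit.
discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 1),  # logit: higher means "ground-truth de-lit texture"
)

texture = torch.rand(4, 3, 32, 32)  # random stand-in batch
logits = discriminator(texture)     # one logit per texture map
```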
Since our inputs have regions where alpha = 0 (see the examples above), we tried applying a mask that takes only the alpha > 0 regions into account when computing the losses above.
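A masked L2 loss along these lines might look like the following (a sketch; the exact masking scheme we used may differ):

```python
import torch

def masked_l2(pred, target, alpha):
    """L2 loss computed only over pixels where alpha > 0.
    alpha has shape (N, 1, H, W); pred/target have shape (N, C, H, W)."""
    mask = (alpha > 0).float()
    diff = (pred - target) ** 2 * mask  # mask broadcasts across channels
    # Normalize by the number of valid pixels (times channels), not the full image
    return diff.sum() / (mask.sum() * pred.shape[1]).clamp(min=1)

pred = torch.rand(1, 3, 8, 8)
target = torch.rand(1, 3, 8, 8)
alpha = torch.ones(1, 1, 8, 8)
loss = masked_l2(pred, target, alpha)
```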
We show results on the test set (the ground truth has never been seen by the model before). All result series show, in order: lit texture (model input), model output, ground truth. Each title indicates the loss function used.
This model combines the three losses in the title, and we directly add a scaled copy of the input (the scale factor is trainable) to the output. We hoped this would transfer the fine details directly to the output, but it transferred too much of the lighting as well. In addition, we use a mask to ignore loss contributions from regions where alpha = 0.
The lighting and shadow effects are removed, and the resolution is better than in all the other models we tried. Some mid-level details (e.g. dark regions ~3-5 pixels in radius) are lost.
The output is much blurrier than our best, training took significantly longer, not all lighting effects are removed, and the color of the red rim is off.
The output resolution is better and the color is closer (especially around the rim), but still not all lighting effects are removed, and the red rim's color remains slightly off.
The gradient difference loss greatly improves the amount of fine details kept.
The adversarial loss didn't seem to help much beyond L2 + gradient difference alone, but we didn't get to do more tuning.
The adversarial component of the generator loss never plateaued, so we probably should have spent more time tuning hyperparameters.
This has the same loss as above, but we directly add a scaled copy of the input (the scale factor is trainable) to the output. We hoped this would transfer the fine details directly to the output, but it transferred too much of the lighting as well.
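The scaled-input skip described here can be sketched as a thin wrapper around the de-lighting model (the initial scale value is an illustrative assumption):

```python
import torch
import torch.nn as nn

class ScaledInputSkip(nn.Module):
    """Wraps a de-lighting model and adds alpha * input to its output,
    where alpha is a single trainable scalar (illustrative sketch)."""
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.alpha = nn.Parameter(torch.tensor(0.1))  # assumed initial value

    def forward(self, x):
        return self.model(x) + self.alpha * x

base = nn.Identity()  # stand-in for the encoder-decoder de-lighting model
wrapped = ScaledInputSkip(base)
x = torch.rand(2, 3, 4, 4)
out = wrapped(x)  # model output plus alpha * input
```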