lengstrom / fast-style-transfer

TensorFlow CNN for fast style transfer ⚡🖥🎨🖼

Python 99.26% Shell 0.74%
style-transfer neural-style neural-networks deep-learning

fast-style-transfer's Introduction

Fast Style Transfer in TensorFlow

Add styles from famous paintings to any photo in a fraction of a second! You can even style videos!

It takes 100ms on a 2015 Titan X to style the MIT Stata Center (1024×680) like Udnie, by Francis Picabia.

Our implementation is based on a combination of Gatys' A Neural Algorithm of Artistic Style, Johnson's Perceptual Losses for Real-Time Style Transfer and Super-Resolution, and Ulyanov's Instance Normalization.

Sponsorship

Please consider sponsoring my work on this project!

License

Copyright (c) 2016 Logan Engstrom. Contact me for commercial use (or rather any use that is not academic research) (email: engstrom at my university's domain dot edu). Free for research use, as long as proper attribution is given and this copyright notice is retained.

Video Stylization

Here we transformed every frame in a video, then combined the results. Click to go to the full demo on YouTube! The style here is Udnie, as above.

See how to generate these videos here!

Image Stylization

We added styles from various paintings to a photo of Chicago. Click on thumbnails to see full applied style images.



Implementation Details

Our implementation uses TensorFlow to train a fast style transfer network. We use roughly the same transformation network as described in Johnson, except that batch normalization is replaced with Ulyanov's instance normalization, and the scaling/offset of the output tanh layer is slightly different. We use a loss function close to the one described in Gatys, using VGG19 instead of VGG16 and typically using "shallower" layers than in Johnson's implementation (e.g. we use relu1_1 rather than relu1_2). Empirically, this results in larger scale style features in transformations.
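For illustration, the instance normalization step can be sketched as follows. This is a minimal sketch against a recent TensorFlow, not the code in src/transform.py; older releases spell keepdims as keep_dims, and the epsilon value here is an assumption.

import tensorflow as tf

def instance_norm(x, epsilon=1e-3):
    # Instance normalization (Ulyanov et al.): statistics are computed over the
    # spatial dimensions only, per sample and per channel, unlike batch norm,
    # which also pools over the batch. x is a [batch, height, width, channels] tensor.
    mu = tf.reduce_mean(x, axis=[1, 2], keepdims=True)
    var = tf.reduce_mean(tf.square(x - mu), axis=[1, 2], keepdims=True)
    normalized = (x - mu) / tf.sqrt(var + epsilon)
    # Learnable per-channel scale and shift, as in the paper.
    scale = tf.Variable(tf.ones([x.shape[-1]]))
    shift = tf.Variable(tf.zeros([x.shape[-1]]))
    return scale * normalized + shift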

Virtual Environment Setup (Anaconda) - Windows/Linux

Tested on:

  • Operating System: Windows 10 Home
  • GPU: Nvidia RTX 2080 Ti
  • CUDA Version: 11.0
  • Driver Version: 445.75

Step 1: Install Anaconda

https://docs.anaconda.com/anaconda/install/

Step 2: Build a virtual environment

Run the following commands in sequence in Anaconda Prompt:

conda create -n tf-gpu tensorflow-gpu=2.1.0
conda activate tf-gpu
conda install jupyterlab
jupyter lab

Run the following command in the notebook, or just install the package with conda:

!pip install moviepy==1.0.2
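Optionally, a quick sanity check that the tf-gpu environment actually sees the card (a minimal sketch; on TensorFlow 2.1 the experimental alias is the safe one to call):

import tensorflow as tf

# Print the installed version and the GPUs TensorFlow can see; an empty list
# means the CUDA/driver combination above is not being picked up.
print(tf.__version__)
print(tf.config.experimental.list_physical_devices('GPU'))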

Then follow the documentation below to use fast-style-transfer.

Documentation

Training Style Transfer Networks

Use style.py to train a new style transfer network. Run python style.py to view all the possible parameters. Training takes 4-6 hours on a Maxwell Titan X. More detailed documentation here. Before you run this, you should run setup.sh. Example usage:

python style.py --style path/to/style/img.jpg \
  --checkpoint-dir checkpoint/path \
  --test path/to/test/img.jpg \
  --test-dir path/to/test/dir \
  --content-weight 1.5e1 \
  --checkpoint-iterations 1000 \
  --batch-size 20

Evaluating Style Transfer Networks

Use evaluate.py to evaluate a style transfer network. Run python evaluate.py to view all the possible parameters. Evaluation takes 100 ms per frame (when batch size is 1) on a Maxwell Titan X. More detailed documentation here. Takes several seconds per frame on a CPU. Models for evaluation are located here. Example usage:

python evaluate.py --checkpoint path/to/style/model.ckpt \
  --in-path dir/of/test/imgs/ \
  --out-path dir/for/results/

Stylizing Video

Use transform_video.py to transfer style into a video. Run python transform_video.py to view all the possible parameters. Requires ffmpeg. More detailed documentation here. Example usage:

python transform_video.py --in-path path/to/input/vid.mp4 \
  --checkpoint path/to/style/model.ckpt \
  --out-path out/video.mp4 \
  --device /gpu:0 \
  --batch-size 4

Requirements

You will need the following to run the above:

  • TensorFlow 0.11.0
  • Python 2.7.9, Pillow 3.4.2, scipy 0.18.1, numpy 1.11.2
  • If you want to train (and don't want to wait for 4 months):
    • A decent GPU
    • All the required NVIDIA software to run TF on a GPU (CUDA, etc.)
  • ffmpeg 3.1.3 if you want to stylize video

Citation

  @misc{engstrom2016faststyletransfer,
    author = {Logan Engstrom},
    title = {Fast Style Transfer},
    year = {2016},
    howpublished = {\url{https://github.com/lengstrom/fast-style-transfer/}},
    note = {commit xxxxxxx}
  }

Attributions/Thanks

  • This project could not have happened without the advice (and GPU access) given by Anish Athalye.
    • The project also borrowed some code from Anish's Neural Style
  • Some readme/docs formatting was borrowed from Justin Johnson's Fast Neural Style
  • The image of the Stata Center at the very beginning of the README was taken by Juan Paulo

Related Work

fast-style-transfer's People

Contributors

andyzhu1991, btwael, cbonitz, divisiondeariza, drakerehfeld, fbcotter, feber, lengstrom, lupino, minimaxir, naveen-annam, neutrino3316, radarhere, schwusch, tibbon


fast-style-transfer's Issues

OOM on GTX 1080

Similar to issue #9, I'm hitting a ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[20,65536,64]

This is with a GTX 1080 card (8 GB, 7.4 GB available to TF), CUDA 8, cuDNN 5. I tried training with a smaller image (100k, and a small 50k style); I will add logs below.

More than one input image error(?)

You can change the title to something more fitting if this isn't what's going on. I get the following whenever there's more than one jpg in the given input dir, but everything works fine if there's only one.

Salient output dump, Ubuntu 16.10, 64bit:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: GeForce GTX 950
major: 5 minor: 2 memoryClockRate (GHz) 1.3165
pciBusID 0000:03:00.0
Total memory: 1.95GiB
Free memory: 1.64GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950, pci bus id: 0000:03:00.0)
Traceback (most recent call last):
  File "evaluate.py", line 125, in <module>
    main()
  File "evaluate.py", line 122, in main
    batch_size=opts.batch_size)
  File "evaluate.py", line 54, in ffwd
    assert img.shape == img_shape
AssertionError

How are the VGG weights fixed?

Hi, lengstrom, thank you for your code.
In my opinion, the VGG weights are fixed during the optimization process.
Where is the code that controls this?
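For context: one common way to keep pretrained weights fixed is to build the VGG graph from plain tf.constant tensors instead of tf.Variable, so they never appear in the trainable-variable collection. A rough sketch of that idea (not the repo's exact code, which lives in src/vgg.py):

import numpy as np
import tensorflow as tf

def fixed_conv_layer(x, kernel_values, bias_values):
    # Wrapping the pretrained weights in tf.constant keeps them out of
    # tf.trainable_variables(), so the optimizer only updates the transform
    # network's variables and VGG stays frozen.
    kernel = tf.constant(np.asarray(kernel_values, dtype=np.float32))
    bias = tf.constant(np.asarray(bias_values, dtype=np.float32))
    conv = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + bias)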

Training seems to be very slow on Tesla K40

I am trying to generate a model file using the following,

python style.py --style examples/style/la_muse.jpg --checkpoint-dir checkpoint/check2 --content-weight 1.5e1 --checkpoint-iterations 1 --batch-size 20

At very beginning of the training process, it took about 20 seconds to finish one iteration. However, as the training process goes on, the time of one iteration becomes much longer. Currently the training process has been going on for more than 14 hours and only 621 iterations have finished.

The following is the screen output during the training process.

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Train set has been trimmed slightly..
(1, 512, 512, 3)

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:01:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x2a55840
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:02:00.0)
UID: 11
Epoch 0, Iteration: 1, Loss: 1.60785e+08
style: 1.4523e+08, content:8.83568e+06, tv: 6.71974e+06
Epoch 0, Iteration: 2, Loss: 1.08762e+08
style: 9.43464e+07, content:7.88181e+06, tv: 6.5336e+06
.............................................
Epoch 0, Iteration: 620, Loss: 1.08479e+07
style: 3.26121e+06, content:6.84982e+06, tv: 736917.0
Epoch 0, Iteration: 621, Loss: 1.12474e+07
style: 3.23448e+06, content:7.27868e+06, tv: 734276.0

Only 621 iterations have finished after more than 14 hours. Currently, one iteration takes a couple of minutes.

Any idea why the training process is so slow on the Tesla K40?
In addition, how can I adjust the parameters (e.g., batch-size) in order to speed up the training process without compromising the quality of the model file too much? Thanks.

Handle Transparency in evaluate.py

I've got fast-style-transfer running and it's so awesome to see the results! I've made my own checkpoint and love tweaking the values. I was wondering, is there a way to handle transparent PNGs or GIFs so that when alpha = 0 the style transfer isn't applied?
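There is no built-in alpha handling, but one workaround is to stylize the flattened RGB image and then composite the result against the original using the alpha channel as a mask, so fully transparent pixels keep their original (invisible) values. A rough Pillow/NumPy sketch (the function and paths are hypothetical):

import numpy as np
from PIL import Image

def composite_with_alpha(original_path, stylized_path, out_path):
    # Blend the stylized RGB back into the original wherever alpha > 0 and
    # restore the original alpha channel, so transparency survives styling.
    original = Image.open(original_path).convert('RGBA')
    stylized = Image.open(stylized_path).convert('RGB').resize(original.size)
    alpha = np.asarray(original, dtype=np.float64)[:, :, 3:4] / 255.0
    blended = (alpha * np.asarray(stylized, dtype=np.float64)
               + (1.0 - alpha) * np.asarray(original.convert('RGB'), dtype=np.float64))
    out = Image.fromarray(blended.astype(np.uint8)).convert('RGBA')
    out.putalpha(original.getchannel('A'))
    out.save(out_path)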

Not an issue - Typo in docs.md

In docs.md the first flag for style.py should be '--checkpoint-dir', not '--checkpoint'. Also getting a warning that line 91 of src/optimize.py needs tf.initialize_all_variables updated to tf.global_variables_initializer. Haven't tried it myself. My slow system is working on stylizing a Van Gogh painting. About 18 hours into it on epoch 0, 10000 iterations. Not much stylizing on the test image as yet. I am using the defaults in style.py.

error on training images

The setup.sh script did not work, so I manually created the directories and downloaded the VGG network and training images into the default file paths. However, when I run style.py I get this error:

Traceback (most recent call last):
  File "style.py", line 166, in <module>
    main()
  File "style.py", line 146, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src\optimize.py", line 18, in optimize
    mod = len(content_targets) % batch_size
TypeError: object of type 'map' has no len()

I believe content_targets is supposed to take the individual images into an xrange, but it is not happening. Do you know why it's becoming a 'map' object type instead?
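For reference, this is the Python 2 vs 3 difference: under Python 3 the file list arrives as a lazy map object, which has no len(). Materializing it into a list first is the usual fix; a sketch of the logic around the failing line (not a verbatim patch):

def trim_to_batches(content_targets, batch_size):
    # Under Python 3, map() returns an iterator with no len(); turn it into a
    # list before computing how many images to trim off the end.
    content_targets = list(content_targets)
    mod = len(content_targets) % batch_size
    if mod > 0:
        content_targets = content_targets[:-mod]  # "Train set has been trimmed slightly.."
    return content_targets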

iOS App

I have developed an iOS app based on fast neural style. It can process both images and video, and all computation runs 100% on your iPhone. The app is called "Artly" on the App Store.

Memory runs out when training a network under python 2.7, cuda 8 on a GTX 1070 with 8GB of memory.


tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                  6510195508
InUse:                  6470140160
MaxInUse:               6501474560
NumAllocs:                    1156
MaxAllocSize:           4132770816

W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 20.00MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[20,512,512]
Traceback (most recent call last):
  File "style.py", line 167, in <module>
    main()
  File "style.py", line 147, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src/optimize.py", line 114, in optimize
    train_step.run(feed_dict=feed_dict)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1550, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3764, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[20,32,32,512]
	 [[Node: sub_18 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Relu_35, Relu_9)]]

Caused by op u'sub_18', defined at:
  File "style.py", line 167, in <module>
    main()
  File "style.py", line 147, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src/optimize.py", line 65, in optimize
    net[CONTENT_LAYER] - content_features[CONTENT_LAYER]) / content_size
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 794, in binary_op_wrapper
    return func(x, y, name=name)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2775, in _sub
    result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/opt/anaconda3/envs/style-transfer/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[20,32,32,512]
	 [[Node: sub_18 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Relu_35, Relu_9)]]

The default configuration used by the examples

The trained models (ckpt) used by the examples perform very well. I'm wondering what configuration was used; could you please list one of the detailed commands you used?

Feature: Allow style transfer to images of different dimensions

The current implementation creates one TF session with fixed input shape, and thus requires the input folder to only contain images of the same input size.

It would be useful to batch-process a folder with different input dimensions, using the minimum number of TF sessions.
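(evaluate.py's --allow-different-dimensions flag, visible in a log further down this page, appears aimed at this.) One way to keep the session count low is to bucket the inputs by shape and run one fixed-shape graph per bucket; a rough sketch (the helper is hypothetical):

from collections import defaultdict
from PIL import Image

def group_by_shape(paths):
    # Bucket image paths by (height, width) so each bucket can be pushed through
    # a single fixed-shape graph: one session per distinct shape, not per image.
    groups = defaultdict(list)
    for p in paths:
        with Image.open(p) as im:
            groups[(im.height, im.width)].append(p)
    return groups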

KeyError: 'normalization'

Traceback (most recent call last):
  File "style.py", line 166, in <module>
    main()
  File "style.py", line 146, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src/optimize.py", line 33, in optimize
    net = vgg.net(vgg_path, style_image_pre)
  File "src/vgg.py", line 27, in net
    mean = data['normalization'][0][0][0]
KeyError: 'normalization'

'module' object has no attribute 'stack' on latest TensorFlow

I followed the advice on this problem and updated my TensorFlow to the one listed here: https://www.tensorflow.org/install/install_linux#the_url_of_the_tensorflow_python_package
with GPU support. If I install TensorFlow via "pip install tensorflow/tensorflow-gpu" it is unable to find the CUDA libs, while installing the latest mentioned version via a wheel package finds the CUDA files properly; but I still get the AttributeError: 'module' object has no attribute 'stack' error.

Yes, I've tried setting environment variables for CUDA to make Tensorflow detect it but it just does not work. Here's the log: https://pastebin.com/Pqca0QzM

Any ideas?
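This error is typically raised by TensorFlow builds that predate the tf.pack → tf.stack rename (around 0.12/1.0), which suggests the environment actually running the script has an older build than the one installed. Upgrading TensorFlow in that environment is the clean fix; as a stopgap, a small compatibility shim at the top of the script also works (a hedged sketch):

import tensorflow as tf

# tf.stack replaced tf.pack around TensorFlow 0.12/1.0; on older builds, alias
# it so code written for the newer name (src/transform.py) keeps running.
if not hasattr(tf, 'stack'):
    tf.stack = tf.pack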

Generate grey output after training the style image with epochs=1

I trained the following style image using the below command on Nvidia Tesla K40.
python style.py --style examples/style/the_scream.jpg --epochs 1 --checkpoint-dir checkpoint/ --batch-size 20

image

Then I used the generated model file to stylize the image of the Chicago landscape. The output is a grey image, as follows.

image

When training the style image with epochs=1, is the output image always grey? Do we have to train the style image for epochs=2?

After 5000 iterations, the test image turns black

When I train a new style I use a test image. At first the style transfer works well, but after 5000 iterations all my stylized test images turn black. After training, when I use the checkpoint to style new images, they turn black too. Am I doing something wrong?

Question

Can't we create our own model file? The checkpoint folder is empty after running the style script.

Reflection Padding?

In Johnson's paper, the input is padded via reflection padding.

In this repo, reflection padding is not implemented, and padding is also used in the residual blocks.

Is there a reason the implementation is different from Johnson's paper? Or am I looking at the code in the wrong way?

Thanks.
Gyuri.
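For reference, reflection padding can be expressed with tf.pad followed by a VALID-padded convolution; a minimal sketch of what that would look like (an illustration, not a statement about why the repo chose otherwise):

import tensorflow as tf

def reflect_pad_conv(x, kernel, stride):
    # Pad the spatial dimensions by half the kernel size with reflected values,
    # then convolve with VALID padding, as described in Johnson's paper.
    pad = int(kernel.shape[0]) // 2
    x = tf.pad(x, [[0, 0], [pad, pad], [pad, pad], [0, 0]], mode='REFLECT')
    return tf.nn.conv2d(x, kernel, strides=[1, stride, stride, 1], padding='VALID')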

Training is slow on Tesla K80

Thanks for sharing the code.
As the documentation says, "Training takes 4-6 hours on a Maxwell Titan X." I use a Tesla K80 for training; however, it needs about 11 hours to complete the training process. I wonder if I am missing some important tricks for training. Can you give me some suggestions? Thank you.

My command is:
python style.py --style examples/style/rain_princess.jpg --checkpoint-dir checkpoint3/ --content-weight 1.5e1 --checkpoint-iterations 1000 --batch-size 20

After training checkpoint can't be used

I trained a new style and get this error during evaluation:

Unable to open table file /home/peter/checkpoints/pertsstyle.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

The file is 20 MB, the same as the others.

What are the good choices for those parameters in the style.py?

Because the training process by running style.py may take 4-6 hours on even a decent GPU, say Tesla K40, I want to get some ideas regarding the good choices for those parameters in the style.py.

  1. lengstrom defines the default values of the parameters in style.py. Are the default values generally good for general cases? In other words, if using the default values, will the generated model file be good enough to stylize images using evaluate.py?

I also have a couple of additional questions regarding some specific parameters.

  1. What is the good choice of the "checkpoint-iterations" parameter? The default value is 2000. In the example given by lengstrom in the README.md, it is 1000. Are 1000 iterations good enough to generate a good model file, or does the training have to go through 2000 iterations to generate a good model file?

  2. The "batch-size" parameter. The default value is 20. The value used in the example in the README.md is 4. For the 12 GB memory available on K40, what is a good value for batch-size?

  3. The "content-weight" parameter. Similarly, the default value (7.5e0) is different from the value (1.5e1) in the example in the README.md. Which value is better?

Thanks in advance for any help!

OOM on Titan X Pascal

Any tips for making this fit on 12G? I get the following oom using evaluate.py:

I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 9.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                 11990125773
InUse:                  9853348352
MaxInUse:              11713353216
NumAllocs:                     456
MaxAllocSize:           4294967296

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *_********************************xxxx**********************xxxxxxxxxxxxxx***********_______________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 2.34GiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1,32,3728,5264]
Traceback (most recent call last):
  File "evaluate.py", line 125, in <module>
    main()
  File "evaluate.py", line 122, in main
    batch_size=opts.batch_size)
  File "evaluate.py", line 59, in ffwd
    _preds = sess.run(preds, feed_dict={img_placeholder:X})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[1,32,3728,5264]
         [[Node: conv2d_transpose_1 = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pack_1, Variable_42/read, Relu_8)]]

Can't run evaluate.py on gpu

I run: python evaluate.py --batch-size 1 --device /gpu:0 --checkpoint models/wave.ckpt --in-path <img_path> --out-path <img_out_path>

It takes more than 3 seconds per image; I monitor the load and only the processor is busy.
It looks like it falls back to using the CPU.
I have an Intel i7 CPU and a Titan X GPU.

How can I deal with 'AttributeError: 'module' object has no attribute 'stack''

Hi, I am really a newbie with Linux, so I need your help.

Now I am working on Ubuntu 14.04 and I have already installed tensorflow, numpy, scipy, pillow, and anaconda2. I was trying to test how evaluate.py works but got a message like

/home/sj/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py:44:       DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.

"This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "evaluate.py", line 266, in <module>
    main()
  File "evaluate.py", line 253, in main
    device=opts.device)
  File "evaluate.py", line 186, in ffwd_to_img
    ffwd(paths_in, paths_out, checkpoint_dir, batch_size=1, device_t=device)
  File "evaluate.py", line 147, in ffwd
    preds = transform.net(img_placeholder)
  File "src/transform.py", line 14, in net
    conv_t1 = _conv_tranpose_layer(resid5, 64, 3, 2)
  File "src/transform.py", line 38, in _conv_tranpose_layer
    tf_shape = tf.stack(new_shape)
AttributeError: 'module' object has no attribute 'stack'

Please let me know how to deal with this error.
Any help, comments, or advice is appreciated.

Thanks in advance

Does the training run on multiple GPUs in the same computer system?

style.py does not provide a "--device" argument to specify which device the training will run on.

If there are multiple GPUs in the same computer system,
(1) Will a single training task run on multiple GPUs distributively?
(2) If a single training task runs on a single GPU and I run multiple training tasks in parallel, are they all going to run on the same GPU device?
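To the second point: by default TensorFlow places the graph on the first visible GPU (while reserving memory on all of them), so parallel runs would collide unless each process is restricted to its own card. Setting CUDA_VISIBLE_DEVICES per process is the usual approach; a sketch (the variable must be set before TensorFlow is imported, e.g. at the top of style.py or on the command line):

import os

# Expose only the second GPU to this training run; TensorFlow will see it as
# /gpu:0 and leave the other card free for a parallel run.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import tensorflow as tf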

No shuffling after each epoch

Am I wrong, or should the optimizer code shuffle the whole dataset after each epoch? At the moment this is not done, and it could be quite bad when training with big batches.
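If the loop really does walk content_targets in a fixed order, shuffling at the top of each epoch is a small change; a sketch of the idea (not a patch against src/optimize.py):

import random

def epoch_batches(content_targets, batch_size):
    # Shuffle the content image paths at the start of every epoch so batches
    # differ in order and composition from one epoch to the next.
    targets = list(content_targets)
    random.shuffle(targets)
    for i in range(0, len(targets) - batch_size + 1, batch_size):
        yield targets[i:i + batch_size]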

The program is automatically killed

After I ran "python style.py --style path/to/style/img.jpg ......
--batch-size 20", it shows "Train set has been trimmed slightly.. (1, 514, 928, 3)" and then "Killed".

Memory management/slicing

I have a budget GTX 950 with ~750 CUDA cores, it's pretty snappy and has enough CUDA support for some tensorflow experiments, but it only has 2GB of memory stock, and only about 1.7 free. This is gone pretty much instantly with any image over a certain size.

I do have 8 cores I can max out with 16 GB of system RAM, and this has worked wonders using neural-style-af for anything that needs more RAM, though it does take ~12 hours to generate a 1024px-wide image. What I want to do is create an image at a high enough resolution to actually be printed (175-300 DPI, > 5in), but unless the problem can actually be split up into enough pieces, this is going to take impossibly long, or impossibly many resources.

Can we get a command line switch to use all available CPU cores instead of CUDA when I know I don't have enough memory? What would be even better is some kind of solution to the memory problem itself, but my guess is that if you were to split an image into small slices, each iteration would fail to match on the edges since the slices are not contiguous. I'd love any input on this :)

This is a good example image that fails with ResourceExhaustedError, it's about 8" @ 300DPI
taj_mahal

TypeError: unsupported operand type(s) for -: 'JpegImageFile' and 'JpegImageFile'

I have been trying to train a new model using python style.py --checkpoint-dir checkpoint --style examples/style/test.jpg --train-path train2014 --content-weight 1.5e1 --checkpoint-iterations 1000 --batch-size 15 and encountered this issue.

Latest version of tensorflow 1.0.0
Python 2.7.12
Pillow 4.0.0

Traceback (most recent call last):
  File "style.py", line 167, in <module>
    main()
  File "style.py", line 147, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src/optimize.py", line 105, in optimize
    X_batch[j] = get_img(img_p, (256,256,3)).astype(np.float32)
  File "src/utils.py", line 20, in get_img
    img = scipy.misc.imresize(img, img_size)
  File "/home/malic/ml/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 478, in imresize
    im = toimage(arr, mode=mode)
  File "/home/malic/ml/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 341, in toimage
    bytedata = bytescale(data, high=high, low=low, cmin=cmin, cmax=cmax)
  File "/home/malic/ml/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 90, in bytescale
    cscale = cmax - cmin
TypeError: unsupported operand type(s) for -: 'JpegImageFile' and 'JpegImageFile'

It seems like Pillow is not working correctly when doing imresize. Which version should I be using, or should I try the original version of PIL?
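The traceback suggests imresize was handed a PIL image object rather than a NumPy array. Converting explicitly before resizing, or bypassing the deprecated scipy.misc helpers with Pillow, usually resolves it; a hedged sketch of a Pillow-only loader (the name mirrors src/utils.py but this is not the repo's code):

import numpy as np
from PIL import Image

def get_img(src, img_size=None):
    # Load as RGB and resize with Pillow directly, avoiding scipy.misc.imresize,
    # which fails when given a PIL image instead of an ndarray.
    img = Image.open(src).convert('RGB')
    if img_size is not None:
        img = img.resize((img_size[1], img_size[0]))  # PIL expects (width, height)
    return np.asarray(img, dtype=np.float32)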

got map has no len error when training

D:\Python35\fast-style-transfer>python style.py --style style\4154001_72-48.jpg
--checkpoint-dir check --test in/IMG_1132_s.JPG --test-dir test
Traceback (most recent call last):
  File "style.py", line 166, in <module>
    main()
  File "style.py", line 146, in main
    for preds, losses, i, epoch in optimize(*args, **kwargs):
  File "src\optimize.py", line 18, in optimize
    mod = len(content_targets) % batch_size
TypeError: object of type 'map' has no len()

Input image preprocessing

In optimize.py the input content image is divided by 255 before passing it through the transform net. We don't do anything similar in evaluate.py. Also, we never un-process the output images. Is this by design, or does the scaling and shifting after tanh take care of these two things?
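On the tanh point: if the transform net's last layer already scales and shifts the tanh output into roughly the 0-255 range, the prediction is in image space and needs no separate un-processing at evaluation time. A minimal sketch of such an output layer (an illustration; the exact constants in src/transform.py may differ):

import tensorflow as tf

def output_layer(x):
    # tanh gives values in [-1, 1]; scaling by 150 and shifting by 127.5 maps
    # them into (approximately) the 0-255 pixel range, so the network output
    # can be clipped and saved directly as an image.
    return tf.nn.tanh(x) * 150 + 255.0 / 2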

Ran out of memory trying to allocate 3.27GiB.

I'm trying to run a sample styling and have a problem when running on the GPU. I have a GTX 760 with 2 GB of RAM. Here are my logs:

python evaluate.py --checkpoint data/models/udnie.ckpt --in-path examples/content/ --out-path examples/ --allow-different-dimensions --device /gpu:0 --batch-size 1

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
in ['examples/content/chicago.jpg']
out ['examples/chicago.jpg']
Processing images of shape 474x712x3
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 760
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\bfc_allocator.cc:217] Ran out of memory trying to allocate 3.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

Is there anything I can do except buying a new graphics card?
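Short of new hardware, the usual mitigations are letting TensorFlow allocate GPU memory on demand instead of reserving it all up front, downscaling the content image, or running on the CPU (--device /cpu:0). A sketch of the on-demand option for a TF 1.x-era session (where exactly evaluate.py builds its session is an assumption here):

import tensorflow as tf

# Grow GPU memory as needed rather than pre-allocating it, and let ops fall
# back to the CPU if they cannot be placed on the GPU.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

On a 2 GB card this only goes so far; for large inputs, resizing the content image down before stylizing is likely unavoidable.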

Are there ways to speed up the training process?

It seems that the models that are generated using "epochs=2" can produce very good output images. I tried to train the models using "epochs=1" and found the quality of the output images (produced by evaluate.py) is not as good as with the models using "epochs=2".

With "epochs=2", Titan-X (Maxwell) can finish the training process in about 5 hours. Tesla K40/K80 will need almost 2 times the period, i.e., 10 hours. This is due to two reasons. (1) Only single-precision floating point operations are used in the training process. The good double-precision floating point performance on K40/K80 does not help. (2) The clock frequency of GPU cores on K40/K80 is only half of the clock frequency of GPU cores on the Maxwell architecture.

5 hours on Titan X (or 10 hours on K40/K80) are very long periods. Are there ways to optimize the algorithm so that good model files can be generated in short periods?

debugging loss function error

It seems some code is missing in "style.py".
When I tried to run it with the "--slow" flag, I got this error:

Traceback (most recent call last):
  File "style.py", line 170, in <module>
    main()
  File "style.py", line 164, in main
    save_img(preds_path, img)
NameError: global name 'img' is not defined

No checkpoint model

Hello, thank you for your amazing work.
After I run style.py, it shows "Training complete."
But I can't find the .ckpt file in the directory.

Any suggestions ?
Thank you.
