When I ran train.py with a GPU, it seems that RAM ran out. My computer has 46G of memory in total: 16G physical RAM plus 30G virtual memory (swap).
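For scale, here is a quick back-of-envelope I did on the dataset footprint. The 128×128 RGB layout and the dtypes are my assumptions, not taken from train.py:

```python
# Rough estimate of the in-memory dataset size (assuming 128x128 RGB images;
# the actual storage dtype used by train.py is a guess).
n_images = 200_000
h, w, c = 128, 128, 3

bytes_uint8 = n_images * h * w * c   # 1 byte per channel value
bytes_float32 = bytes_uint8 * 4      # 4 bytes per channel value

print(f"uint8:   {bytes_uint8 / 2**30:.1f} GiB")    # ~9.2 GiB
print(f"float32: {bytes_float32 / 2**30:.1f} GiB")  # ~36.6 GiB
```

If the loader converts everything to float32 up front, the images alone would already exceed my 16G of physical RAM.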
$ python3 baseline/train.py --max_step=200 --debug --batch_size=96
mkdir: cannot create directory ‘./model’: File exists
loaded 10000 images
loaded 20000 images
loaded 30000 images
loaded 40000 images
loaded 50000 images
loaded 60000 images
loaded 70000 images
loaded 80000 images
loaded 90000 images
loaded 100000 images
loaded 110000 images
loaded 120000 images
loaded 130000 images
loaded 140000 images
loaded 150000 images
loaded 160000 images
loaded 170000 images
loaded 180000 images
loaded 190000 images
loaded 200000 images
finish loading data, 197999 training images, 2001 testing images
observation_space (96, 128, 128, 7) action_space 13
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:157: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
s0 =torch.tensor(self.state, device='cpu')
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:163: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
s1 =torch.tensor(state, device='cpu')
#0: steps:200 interval_time:9.08 train_time:0.00
#1: steps:400 interval_time:22.40 train_time:0.00
#2: steps:600 interval_time:19.66 train_time:6.90
#3: steps:800 interval_time:20.01 train_time:5.28
#4: steps:1000 interval_time:20.89 train_time:6.01
#5: steps:1200 interval_time:20.52 train_time:6.34
#6: steps:1400 interval_time:18.20 train_time:7.01
Killed
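As a side note, the two UserWarnings above come from re-wrapping an existing tensor with torch.tensor(...). A minimal sketch of the form PyTorch recommends instead (the variable name is from the traceback; the shape is made up):

```python
import torch

# Stand-in for self.state, which is already a tensor when the warning fires.
state = torch.zeros(96, 7, 128, 128)

# torch.tensor(state) copy-constructs and triggers the UserWarning;
# the recommended equivalent makes an explicit detached copy:
s0 = state.clone().detach().to('cpu')
```

This silences the warning but should not affect the memory issue.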
              total        used        free      shared  buff/cache   available
Mem:          15892       15627         139          11         125          81
Swap:         30273       30273           0
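One workaround I am considering is memory-mapping the image array from disk instead of loading everything up front, so the OS only pages in what a batch actually touches. This is my own sketch, not code from this repo:

```python
import os
import tempfile
import numpy as np

# One-time conversion: dump the images to a single .npy file on disk.
# (A small zero array stands in for the real 200,000-image dataset here.)
path = os.path.join(tempfile.mkdtemp(), "images.npy")
np.save(path, np.zeros((100, 128, 128, 3), dtype=np.uint8))

# At training time, memory-map the file instead of loading it:
# no bulk copy into RAM, data is paged in on demand.
images = np.load(path, mmap_mode="r")

batch = np.asarray(images[:8])  # only this slice is actually read from disk
print(batch.shape)  # (8, 128, 128, 3)
```

Would something like this be compatible with how train.py feeds batches, or is there a supported flag to limit how many images get loaded?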