melodyguan / enas
TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"
Home Page: https://arxiv.org/abs/1802.03268
License: Apache License 2.0
Thank you for providing us with the code.
I'm running cifar10_macro_search.sh.
The code, data, and hyperparameters were taken as is and not modified.
The tail of cifar10_macro_search.sh (in the stdout file) looks like:
Now I think I'm supposed to take the architecture with the best validation accuracy, which is:
[2]
[1 0]
[1 0 0]
[5 0 0 1]
[0 0 1 0 0]
[2 0 1 0 0 0]
[2 1 1 0 0 0 1]
[3 0 0 0 1 0 1 1]
[3 0 1 0 0 0 1 0 1]
[5 1 0 0 0 0 0 0 0 0]
[0 1 0 1 1 0 0 0 1 0 1]
[1 1 0 0 0 1 1 1 0 1 1 1]
val_acc=0.8672
Is that right? Is this result the final discovered (optimal) network from macro search?
The validation accuracy of the selected architecture is 86.72%, which is lower than 96.1%.
What's the reason? Do I have to modify parameters (num_epochs, batch_size, ...)?
Hi,
I was able to reproduce most of your paper's experiments with minor modifications to the code such as changing the hyper-parameters and implementing pooling functions in the macro_search_final.
So thank you for open-sourcing this code!
For the macro search I can run both search and final_search with any architecture. For the micro search, the final version with your given architecture also seems to work.
However, if I try to conduct the micro search on CIFAR10, I get the following errors after graph construction when attempting to run the session with more than 4 branches (i.e. including pooling operations):
[[Node: child/layer_0/cell_0/x/strided_slice_3 = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=30, ellipsis_mask=0, end_mask=30, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child/layer_0/cell_0/x/stack, child/layer_7/cell_0/x/strided_slice_3/stack, child/layer_7/cell_0/x/strided_slice_3/stack_1, child/layer_7/cell_4/x/strided_slice_2/stack_2)]]
2018-06-20 14:21:34.518579: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: slice index 5 of dimension 0 out of bounds.
[[Node: child/layer_0/cell_0/x/strided_slice_3 = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=30, ellipsis_mask=0, end_mask=30, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child/layer_0/cell_0/x/stack, child/layer_7/cell_0/x/strided_slice_3/stack, child/layer_7/cell_0/x/strided_slice_3/stack_1, child/layer_7/cell_4/x/strided_slice_2/stack_2)]]
followed by the error message from the TF graph crashing.
I downgraded my TF version to 1.4 and am using Python 2.7 on Ubuntu 16.04, as suggested in one of the closed issues.
I think I roughly see where this points in the code: the enas_cell of the micro child, where there seems to be an indexing mismatch. However, since none of the TensorFlow variable scopes are named "strided_slice", I am unable to pinpoint and fix the problem. (If I understand correctly, the strided_slice could also just be the graph's version of extended slicing like [:, 0:5]?)
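As a sanity check on that reading: the message itself is ordinary out-of-bounds indexing. Stacking N branch outputs and then indexing with a branch id of N reproduces the same failure mode in miniature (hypothetical shapes, not the repo's code):

```python
import numpy as np

# Stack five "branch outputs" along a new leading dimension, then index one
# past the end -- mirroring "slice index 5 of dimension 0 out of bounds".
branches = np.stack([np.full((2, 2), i, dtype=np.float32) for i in range(5)])

try:
    _ = branches[5]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

This is consistent with an architecture string that selects more branches than the graph was built with.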
I would greatly appreciate any help in fixing this. If you wish I can send a PR with all the other changes once fixed as well.
Hi --
Thanks again for releasing the code. Are you able to explain a little more what _apply_drop_path is doing in cifar10/micro_child.py, and what the motivation for this regularization is?
Thanks
Ben
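For context on the question above, drop-path regularization (as in NASNet's ScheduledDropPath) typically zeroes out an entire path per example with some probability and rescales the survivors. A minimal numpy sketch under that assumption, not the repository's actual implementation:

```python
import numpy as np

def drop_path(x, keep_prob, rng):
    """Zero out an entire path per example with probability (1 - keep_prob).

    Survivors are rescaled by 1/keep_prob so the expected value of the
    output matches the input. `x` has shape (batch, ...).
    """
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = (rng.random(mask_shape) < keep_prob).astype(x.dtype)
    return x * mask / keep_prob

x = np.ones((4, 3), dtype=np.float32)
# With keep_prob=1.0 every path survives and the input passes through unchanged.
y = drop_path(x, keep_prob=1.0, rng=np.random.default_rng(0))
```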
We got the following error while running cifar10_micro_final.sh with TensorFlow 1.6.
bash scripts/cifar10_micro_final.sh
......
Build train graph
Layer 0: Tensor("child/layer_0/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer 1: Tensor("child/layer_1/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer 2: Tensor("child/layer_2/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer 3: Tensor("child/layer_3/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer 4: Tensor("child/layer_4/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Traceback (most recent call last):
File "src/cifar10/main.py", line 359, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "src/cifar10/main.py", line 355, in main
train()
File "src/cifar10/main.py", line 223, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 190, in get_ops
child_model.connect_controller(None)
File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 819, in connect_controller
self._build_train()
File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 702, in _build_train
logits = self._model(self.x_train, is_training=True)
File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 283, in _model
normal_or_reduction_cell="reduction")
File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 430, in _fixed_layer
x_id = arc[4 * cell_id]
IndexError: index 0 is out of bounds for axis 0 with size 0
In the following line, why use layers = [layers[-1], x] instead of layers = [layers[0], x]?
enas/src/cifar10/micro_child.py
Line 277 in d1a90ac
In my opinion, with your implementation the model in the search stage is very different from that in the final stage.
Thank you for your answer.
I'm running the cifar10_macro_final.sh.
The tail of cifar10_macro_final.sh (in the stdout file) looks like:
epoch=306 ch_step=153050 loss=0.001949 lr=0.0011 |g|=0.2311 tr_acc=100/100 mins=2119.42
epoch=306 ch_step=153100 loss=0.001135 lr=0.0011 |g|=0.0480 tr_acc=100/100 mins=2120.06
epoch=306 ch_step=153150 loss=0.001102 lr=0.0011 |g|=0.0725 tr_acc=100/100 mins=2120.71
epoch=306 ch_step=153200 loss=0.001061 lr=0.0011 |g|=0.0932 tr_acc=100/100 mins=2121.35
epoch=306 ch_step=153250 loss=0.001357 lr=0.0011 |g|=0.0557 tr_acc=100/100 mins=2122.00
epoch=306 ch_step=153300 loss=0.001437 lr=0.0011 |g|=0.1014 tr_acc=100/100 mins=2122.64
epoch=306 ch_step=153350 loss=0.001134 lr=0.0011 |g|=0.0374 tr_acc=100/100 mins=2123.29
epoch=306 ch_step=153400 loss=0.001238 lr=0.0011 |g|=0.0612 tr_acc=100/100 mins=2123.93
epoch=306 ch_step=153450 loss=0.001358 lr=0.0011 |g|=0.0592 tr_acc=100/100 mins=2124.58
epoch=307 ch_step=153500 loss=0.000936 lr=0.0011 |g|=0.0476 tr_acc=100/100 mins=2125.22
Epoch 307: Eval
Eval at 153500
test_accuracy: 0.9601
epoch=307 ch_step=153550 loss=0.001563 lr=0.0010 |g|=0.1156 tr_acc=100/100 mins=2126.32
epoch=307 ch_step=153600 loss=0.002924 lr=0.0010 |g|=0.4933 tr_acc=100/100 mins=2126.97
epoch=307 ch_step=153650 loss=0.001787 lr=0.0010 |g|=0.1428 tr_acc=100/100 mins=2127.61
epoch=307 ch_step=153700 loss=0.001629 lr=0.0010 |g|=0.1128 tr_acc=100/100 mins=2128.26
epoch=307 ch_step=153750 loss=0.001239 lr=0.0010 |g|=0.1101 tr_acc=100/100 mins=2128.90
epoch=307 ch_step=153800 loss=0.001421 lr=0.0010 |g|=0.0812 tr_acc=100/100 mins=2129.55
epoch=307 ch_step=153850 loss=0.001244 lr=0.0010 |g|=0.0784 tr_acc=100/100 mins=2130.19
epoch=307 ch_step=153900 loss=0.001778 lr=0.0010 |g|=0.1270 tr_acc=100/100 mins=2130.84
epoch=307 ch_step=153950 loss=0.001900 lr=0.0010 |g|=0.1715 tr_acc=100/100 mins=2131.48
epoch=308 ch_step=154000 loss=0.001303 lr=0.0010 |g|=0.0811 tr_acc=100/100 mins=2132.13
Epoch 308: Eval
Eval at 154000
test_accuracy: 0.9605
epoch=308 ch_step=154050 loss=0.008866 lr=0.0010 |g|=1.3467 tr_acc=100/100 mins=2133.23
epoch=308 ch_step=154100 loss=0.001046 lr=0.0010 |g|=0.0328 tr_acc=100/100 mins=2133.88
epoch=308 ch_step=154150 loss=0.001344 lr=0.0010 |g|=0.0558 tr_acc=100/100 mins=2134.52
epoch=308 ch_step=154200 loss=0.001324 lr=0.0010 |g|=0.0485 tr_acc=100/100 mins=2135.17
epoch=308 ch_step=154250 loss=0.001197 lr=0.0010 |g|=0.0587 tr_acc=100/100 mins=2135.81
epoch=308 ch_step=154300 loss=0.001323 lr=0.0010 |g|=0.0478 tr_acc=100/100 mins=2136.46
epoch=308 ch_step=154350 loss=0.000928 lr=0.0010 |g|=0.0559 tr_acc=100/100 mins=2137.10
epoch=308 ch_step=154400 loss=0.000729 lr=0.0010 |g|=0.0184 tr_acc=100/100 mins=2137.75
epoch=308 ch_step=154450 loss=0.001168 lr=0.0010 |g|=0.0731 tr_acc=100/100 mins=2138.39
epoch=309 ch_step=154500 loss=0.000755 lr=0.0010 |g|=0.0169 tr_acc=100/100 mins=2139.04
Epoch 309: Eval
Eval at 154500
test_accuracy: 0.9589
epoch=309 ch_step=154550 loss=0.000859 lr=0.0010 |g|=0.0329 tr_acc=100/100 mins=2140.14
epoch=309 ch_step=154600 loss=0.003031 lr=0.0010 |g|=1.0015 tr_acc=100/100 mins=2140.79
epoch=309 ch_step=154650 loss=0.001678 lr=0.0010 |g|=0.2013 tr_acc=100/100 mins=2141.44
epoch=309 ch_step=154700 loss=0.000810 lr=0.0010 |g|=0.0335 tr_acc=100/100 mins=2142.08
epoch=309 ch_step=154750 loss=0.001312 lr=0.0010 |g|=0.1542 tr_acc=100/100 mins=2142.73
epoch=309 ch_step=154800 loss=0.001046 lr=0.0010 |g|=0.0383 tr_acc=100/100 mins=2143.37
epoch=309 ch_step=154850 loss=0.001397 lr=0.0010 |g|=0.1267 tr_acc=100/100 mins=2144.02
epoch=309 ch_step=154900 loss=0.001565 lr=0.0010 |g|=0.0715 tr_acc=100/100 mins=2144.66
epoch=309 ch_step=154950 loss=0.001706 lr=0.0010 |g|=0.1621 tr_acc=100/100 mins=2145.31
epoch=310 ch_step=155000 loss=0.001614 lr=0.0010 |g|=0.0683 tr_acc=100/100 mins=2145.95
Epoch 310: Eval
Eval at 155000
test_accuracy: 0.9602
Now I think I'm supposed to take the output with the highest test accuracy, at epoch 308, which is:
Epoch 308: Eval
Eval at 154000
test_accuracy: 0.9605
Is this the test accuracy obtained by applying the test data to the optimal parameters (set in macro_final.sh) and the architecture you found?
Where are the classification results for the test data stored?
Thanks to your kind answers, I'm trying to apply the source code to other images.
I am trying to apply 256 * 256 images with two classes by modifying the cifar10 micro-search source code. After center-cropping the images to size 128, I set batch_size = 10, num_layers = 8, and reduction cells at 2, 4, 6, 8. However, learning does not seem to go well because the loss does not fall. Is there any other solution I could apply?
Should I adjust my batch size or learning rate?
epoch=4 ch_step=4850 loss=0.584814 lr=0.0329 |g|=1.3465 tr_acc=8 / 10 mins=260.01
epoch=4 ch_step=4900 loss=0.382023 lr=0.0329 |g|=0.7714 tr_acc=9 / 10 mins=262.52
epoch=4 ch_step=4950 loss=0.266939 lr=0.0329 |g|=1.2139 tr_acc=10 / 10 mins=265.03
epoch=4 ch_step=5000 loss=0.685643 lr=0.0329 |g|=1.0374 tr_acc=7 / 10 mins=267.54
epoch=4 ch_step=5050 loss=0.754228 lr=0.0329 |g|=2.7833 tr_acc=5 / 10 mins=270.06
epoch=4 ch_step=5100 loss=0.688855 lr=0.0329 |g|=3.4854 tr_acc=5 / 10 mins=272.57
epoch=4 ch_step=5150 loss=0.648872 lr=0.0329 |g|=2.3838 tr_acc=7 / 10 mins=275.07
epoch=4 ch_step=5200 loss=0.744257 lr=0.0329 |g|=1.2733 tr_acc=3 / 10 mins=277.59
epoch=4 ch_step=5250 loss=0.706121 lr=0.0329 |g|=0.9046 tr_acc=4 / 10 mins=280.10
epoch=4 ch_step=5300 loss=0.472144 lr=0.0329 |g|=0.9058 tr_acc=10 / 10 mins=282.60
epoch=4 ch_step=5350 loss=0.726746 lr=0.0329 |g|=10.5671 tr_acc=5 / 10 mins=285.10
epoch=4 ch_step=5400 loss=0.563478 lr=0.0329 |g|=1.7984 tr_acc=7 / 10 mins=287.61
epoch=4 ch_step=5450 loss=0.496238 lr=0.0329 |g|=3.2853 tr_acc=9 / 10 mins=290.12
epoch=4 ch_step=5500 loss=0.651356 lr=0.0329 |g|=1.9665 tr_acc=7 / 10 mins=292.62
epoch=4 ch_step=5550 loss=0.666979 lr=0.0329 |g|=1.9109 tr_acc=7 / 10 mins=295.13
epoch=4 ch_step=5600 loss=0.634621 lr=0.0329 |g|=2.5509 tr_acc=7 / 10 mins=297.63
epoch=4 ch_step=5650 loss=0.769902 lr=0.0329 |g|=1.3555 tr_acc=6 / 10 mins=300.15
epoch=4 ch_step=5700 loss=0.387144 lr=0.0329 |g|=1.3194 tr_acc=9 / 10 mins=302.69
epoch=4 ch_step=5750 loss=0.436951 lr=0.0329 |g|=1.1916 tr_acc=9 / 10 mins=305.22
epoch=4 ch_step=5800 loss=0.544716 lr=0.0329 |g|=3.0641 tr_acc=8 / 10 mins=307.76
epoch=4 ch_step=5850 loss=0.489396 lr=0.0329 |g|=0.8843 tr_acc=8 / 10 mins=310.29
epoch=4 ch_step=5900 loss=0.389174 lr=0.0329 |g|=3.2995 tr_acc=9 / 10 mins=312.81
epoch=4 ch_step=5950 loss=0.372266 lr=0.0329 |g|=1.1757 tr_acc=9 / 10 mins=315.35
epoch=5 ch_step=6000 loss=0.338371 lr=0.0329 |g|=14.8412 tr_acc=9 / 10 mins=317.88
Epoch 5: Training controller
ctrl_step=200 loss=18.138 ent=57.21 lr=0.0035 |g|=0.4447 acc=0.7000 bl=0.38 mins=317.88
ctrl_step=205 loss=-0.033 ent=57.21 lr=0.0035 |g|=0.0006 acc=0.5000 bl=0.51 mins=318.07
ctrl_step=210 loss=8.164 ent=57.20 lr=0.0035 |g|=0.1327 acc=0.7000 bl=0.56 mins=318.26
ctrl_step=215 loss=-3.504 ent=57.20 lr=0.0035 |g|=0.0650 acc=0.5000 bl=0.57 mins=318.45
ctrl_step=220 loss=19.004 ent=57.20 lr=0.0035 |g|=0.3491 acc=0.9000 bl=0.59 mins=318.64
ctrl_step=225 loss=17.364 ent=57.21 lr=0.0035 |g|=0.3164 acc=0.9000 bl=0.59 mins=318.83
ctrl_step=230 loss=-0.830 ent=57.21 lr=0.0035 |g|=0.0080 acc=0.6000 bl=0.62 mins=319.02
ctrl_step=235 loss=9.372 ent=57.21 lr=0.0035 |g|=0.2280 acc=0.8000 bl=0.64 mins=319.21
ctrl_step=240 loss=-12.632 ent=57.21 lr=0.0035 |g|=0.3495 acc=0.4000 bl=0.63 mins=319.40
ctrl_step=245 loss=17.016 ent=57.22 lr=0.0035 |g|=0.2418 acc=0.9000 bl=0.60 mins=319.59
Here are 10 architectures
[0 3 0 0 0 1 0 0 2 0 2 3 2 0 3 1 0 1 0 1]
[1 0 0 2 2 4 0 2 2 1 0 0 0 3 3 3 0 0 3 4]
val_acc=0.6000
--------------------------------------------------------------------------------
[1 3 1 0 0 1 2 3 0 1 2 3 4 4 3 3 5 0 1 0]
[0 1 1 0 0 4 1 2 2 3 3 2 1 1 4 0 2 0 3 1]
val_acc=0.3000
--------------------------------------------------------------------------------
[1 1 1 1 2 0 1 0 2 1 3 1 0 0 0 0 2 1 5 0]
[1 0 0 4 0 2 0 0 1 4 0 0 4 3 1 1 5 0 2 0]
val_acc=0.6000
--------------------------------------------------------------------------------
[0 0 1 4 1 4 2 4 2 0 3 0 1 0 3 2 1 1 2 0]
[0 2 0 1 2 0 1 1 3 2 3 2 4 0 2 4 0 4 1 2]
val_acc=0.5000
--------------------------------------------------------------------------------
[0 1 1 1 1 1 1 1 3 1 2 0 1 0 3 0 3 1 3 0]
[0 3 0 0 2 1 1 0 0 0 1 1 2 4 1 1 2 2 0 1]
val_acc=0.6000
--------------------------------------------------------------------------------
[0 2 0 1 1 2 1 0 3 0 0 1 2 3 0 1 2 0 4 2]
[0 1 0 2 2 1 0 1 1 0 0 0 0 2 4 3 2 4 5 2]
val_acc=0.6000
--------------------------------------------------------------------------------
[1 1 1 3 0 3 2 2 1 2 3 1 1 0 0 1 1 0 3 2]
[1 2 1 0 1 0 1 4 1 1 0 0 4 1 3 0 3 1 2 2]
val_acc=0.5000
--------------------------------------------------------------------------------
[1 2 1 3 0 1 2 0 2 1 0 2 0 0 4 0 3 4 4 0]
[1 3 1 0 0 1 1 3 2 4 3 4 4 4 4 0 3 2 2 1]
val_acc=0.5000
--------------------------------------------------------------------------------
[0 0 0 1 2 3 0 0 1 0 0 0 3 1 1 0 1 1 5 1]
[1 4 1 0 0 1 2 4 3 3 0 1 2 3 4 3 3 1 3 4]
val_acc=0.4000
--------------------------------------------------------------------------------
[0 0 0 0 1 4 0 3 0 1 1 0 4 1 4 0 0 4 3 1]
[0 0 0 1 2 3 1 3 2 3 2 0 0 4 1 4 1 4 5 4]
val_acc=0.5000
--------------------------------------------------------------------------------
Epoch 5: Eval
Eval at 6000
[[1589 0]
[1411 0]]
Total Accuracy: 52.97 %
There are a number of differences between _fixed_layer and _enas_layer in cifar10/micro_child.py.
Are you able to give some insight on why the code works like this? It seems that when a fixed architecture is specified, the resulting model is not necessarily exactly the same as during the RL training. It seems to me like the easiest way to fix the child architecture is to have an alternate "dummy controller" that just keeps normal_arc and reduce_arc fixed at the desired architecture.
Thanks
Ben
Directory: src/utils.py
Error location: lines 141 to 151.
When the code performs clip_mode == "norm", the original variable "g" is appended to "grads", not the intended clipped variable "c_g". This may be a bug. Please check it.
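The suspected pattern, reduced to a numpy sketch (hypothetical names; the point is only that appending g instead of c_g silently skips the clipping):

```python
import numpy as np

def clip_by_norm(grads, bound):
    """Clip each gradient to the given L2-norm bound."""
    clipped = []
    for g in grads:
        norm = np.linalg.norm(g)
        c_g = g * (bound / norm) if norm > bound else g
        # The reported bug is the equivalent of `clipped.append(g)` here,
        # which would discard the clipped value c_g.
        clipped.append(c_g)
    return clipped

grads = [np.array([3.0, 4.0])]            # norm 5.0
clipped = clip_by_norm(grads, bound=1.0)  # norm is now 1.0
```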
In this part of the code:
logits = tf.matmul(next_h[-1], self.w_soft) + self.b_soft
if self.temperature is not None:
  logits /= self.temperature
if self.tanh_constant is not None:
  op_tanh = self.tanh_constant / self.op_tanh_reduce
  logits = op_tanh * tf.tanh(logits)
if use_bias:
  logits += self.b_soft_no_learn
b_soft and b_soft_no_learn are initialised as:
with tf.variable_scope("softmax"):
  self.w_soft = tf.get_variable("w", [self.lstm_size, self.num_branches])
  b_init = np.array([10.0, 10.0] + [0] * (self.num_branches - 2),
                    dtype=np.float32)
  self.b_soft = tf.get_variable(
      "b", [1, self.num_branches],
      initializer=tf.constant_initializer(b_init))
  b_soft_no_learn = np.array(
      [0.25, 0.25] + [-0.25] * (self.num_branches - 2), dtype=np.float32)
  b_soft_no_learn = np.reshape(b_soft_no_learn, [1, self.num_branches])
  self.b_soft_no_learn = tf.constant(b_soft_no_learn, dtype=tf.float32)
So the controller is biased (both at initialization and permanently) toward choosing depthwise-separable convolution.
Is that correct? Was this an important addition for the results? (This isn't mentioned in the paper.)
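The effect of that initialization is easy to check numerically: with a +10 bias on the first two logits, the softmax puts almost all of the initial probability mass on the first two branches. A quick sketch, assuming num_branches = 6 and a freshly initialized controller whose w_soft output is zero:

```python
import numpy as np

num_branches = 6
b_init = np.array([10.0, 10.0] + [0.0] * (num_branches - 2))
logits = np.zeros(num_branches) + b_init  # fresh controller: zero w_soft output

probs = np.exp(logits) / np.exp(logits).sum()
# probs[0] and probs[1] are each ~0.5; every other branch gets ~2e-5.
```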
I notice that layer base is used in the final stage but not in the search stage. Can you explain why you use this? Thanks very much.
enas/src/cifar10/micro_child.py
Line 416 in d1a90ac
Hi, could you briefly explain the meaning of:
"child_out_filters", "child_num_branches", "child_num_cell_layers", and "child_keep_prob"
I am fairly new to ML and am trying to figure out how to properly train the outputted fixed architectures.
Thank you so much! This really is a wonderful tool.
I am attempting to run the macro search on the cifar10 dataset and am getting the error:
"Executor failed to create kernel. Invalid argument: Default AvgPoolingOp only supports NHWC."
...
...
...
"InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.
[[Node: child/layer_3/pool_at_3/from_0/AvgPool = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task$"
Any ideas? The cifar10 data is in its original format. I am running this on GCP currently, but got the error on my local VM as well. I am using tensorflow 1.4.
Hi @melodyguan, thanks for the great paper!
But I still can't reproduce the results from your paper for finding the RNN cell on the PTB dataset.
After approx. 24 hours of training (~22 epochs), the best validation ppl is still 400 (and it's also very unstable, ranging from 1000 to 400 between epochs) and training ppl is around 250, which isn't even close to the 55.8 in your paper. The code and data were taken as is and not modified.
Prior to that, I also had similar problems reproducing these results with https://github.com/carpedm20/ENAS-pytorch.
Are there some problems with hyperparameter selection, or maybe some bugs in the code?
The docstring for cifar10.micro_child._factorized_reduction says
"""Reduces the shape of x without information loss due to striding."""
Could you explain what that means?
When stride=2,
path1 = tf.nn.avg_pool(x, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)
and
path2 = tf.nn.avg_pool(path2, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)
each select 1/4 of the spatial locations, so you end up ignoring half of the spatial locations (specifically, any (i, j) where i % 2 != j % 2). Is that right?
~ Ben
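The coverage claim above can be checked directly: if path1 samples positions (2i, 2j) and path2, shifted by one pixel in both dimensions, samples (2i+1, 2j+1), then together they cover exactly the positions where the row and column parities agree. A numpy-free sketch on a 4x4 grid:

```python
h = w = 4
covered = set()
for i in range(0, h, 2):          # path1: even rows and even cols
    for j in range(0, w, 2):
        covered.add((i, j))
for i in range(1, h, 2):          # path2: shifted by (1, 1)
    for j in range(1, w, 2):
        covered.add((i, j))

ignored = {(i, j) for i in range(h) for j in range(w)} - covered
# Every covered position has i % 2 == j % 2; exactly half the grid is ignored.
```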
Now I know that the model result is just a sequence of numbers. If I want to use the model to classify, or to visualize the network architecture, do you have a tool for that?
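Assuming the macro format shown elsewhere in these issues (each layer contributes one op id followed by one skip-connection bit per previous layer), a flat architecture sequence can be unpacked for inspection with a short helper (hypothetical, not part of the repo):

```python
def parse_macro_arc(seq, num_layers):
    """Split a flat macro architecture into (op_id, skip_bits) per layer."""
    arcs, idx = [], 0
    for layer_id in range(num_layers):
        op = seq[idx]
        skips = list(seq[idx + 1: idx + 1 + layer_id])
        arcs.append((op, skips))
        idx += 1 + layer_id
    return arcs

# Layer 0: op 2; layer 1: op 1, no skip; layer 2: op 3, skips from layers 0 and 1.
arcs = parse_macro_arc([2, 1, 0, 3, 1, 1], num_layers=3)
```

The per-layer tuples are then straightforward to feed into a graph-drawing library of your choice.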
Build train graph
Tensor("child/layer_0/case/cond/Merge:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_1/skip/bn/Identity:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_2/skip/bn/Identity:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_3/pool_at_3/from_4/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_4/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_5/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_6/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_7/pool_at_7/from_8/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_8/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_9/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_10/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_11/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Model has 697860 params
Traceback (most recent call last):
File "src/cifar10/main.py", line 359, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "src/cifar10/main.py", line 355, in main
train()
File "src/cifar10/main.py", line 223, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 171, in get_ops
child_model.connect_controller(controller_model)
File "/home/nikhil/google_enas/src/cifar10/general_child.py", line 705, in connect_controller
self._build_train()
File "/home/nikhil/google_enas/src/cifar10/general_child.py", line 633, in _build_train
num_replicas=self.num_replicas)
File "/home/nikhil/google_enas/src/utils.py", line 125, in get_train_ops
grads = tf.gradients(loss, tf_variables)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 641, in gradients
(op.name, i, t_in.shape, in_grad.shape))
ValueError: Incompatible shapes between op input and calculated input gradient. Forward operation: child/layer_11/case/cond/cond/cond/cond/cond/cond/Merge. Input index: 0. Original input shape: (). Calculated input gradient shape: (?, 36, 8, 8)
Have you tried testing the performance of the normal/reduction cells sampled on cifar10 on ImageNet?
In your code, once the controller has been trained, you sample 10 architectures and they are all different. In my understanding, at this phase the controller is fixed and the inputs are all the same, so why do we get different sample_arcs?
I noticed that in your controller, prev_c and prev_h never change; they are zeros all the time. For the LSTM, only the chosen action's embedding is fed into the next step as input.
Is this intended, or a bug?
If it is intended, is it necessary to use a 2-layer LSTM rather than a simple dense layer with tanh activation? The output of the cell is only related to the chosen embedding, not to the whole sequence of decisions.
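On the first question, one plausible reading: if each action is drawn stochastically from the controller's softmax (standard for REINFORCE-style controllers), then fixed weights and fixed inputs still yield different architectures across samples. A sketch with made-up logits:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, 0.5])            # fixed controller output
probs = np.exp(logits) / np.exp(logits).sum()

# Each call samples a fresh sequence of actions from the same distribution,
# so repeated sampling generally gives different architecture strings.
arcs = [rng.choice(len(probs), size=8, p=probs).tolist() for _ in range(10)]
```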
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./src/utils.py:34:11: E999 SyntaxError: invalid syntax
print "-" * 80
^
./src/cifar10/controller.py:39:13: E999 SyntaxError: invalid syntax
print "-" * 80
^
./src/cifar10/data_utils.py:17:19: E999 SyntaxError: invalid syntax
print file_name
^
./src/cifar10/general_child.py:510:74: F821 undefined name 'ch_mul'
"w_depth", [self.filter_size, self.filter_size, out_filters, ch_mul])
^
./src/cifar10/general_child.py:511:67: F821 undefined name 'ch_mul'
w_point = create_weight("w_point", [1, 1, out_filters * ch_mul, count])
^
./src/cifar10/general_child.py:582:51: F821 undefined name 'avg_or_pool'
raise ValueError("Unknown pool {}".format(avg_or_pool))
^
./src/cifar10/general_controller.py:46:13: E999 SyntaxError: invalid syntax
print "-" * 80
^
./src/cifar10/models.py:45:13: E999 SyntaxError: invalid syntax
print "-" * 80
^
./src/ptb/main.py:310:30: F821 undefined name 'xrange'
for ct_step in xrange(FLAGS.controller_train_steps *
^
./src/ptb/main.py:339:24: F821 undefined name 'xrange'
for _ in xrange(10):
^
./src/ptb/ptb_enas_child.py:170:44: F821 undefined name 'eval_set'
print("{}_total_loss: {:<6.2f}".format(eval_set, total_loss))
^
./src/ptb/ptb_enas_child.py:171:41: F821 undefined name 'eval_set'
print("{}_log_ppl: {:<6.2f}".format(eval_set, log_ppl))
^
./src/ptb/ptb_enas_child.py:172:37: F821 undefined name 'eval_set'
print("{}_ppl: {:<6.2f}".format(eval_set, ppl))
^
./src/ptb/ptb_enas_controller.py:74:25: F821 undefined name 'xrange'
for layer_id in xrange(self.lstm_num_layers):
^
./src/ptb/ptb_enas_controller.py:104:14: F821 undefined name 'xrange'
for _ in xrange(self.lstm_num_layers):
^
./src/ptb/ptb_enas_controller.py:109:21: F821 undefined name 'xrange'
for layer_id in xrange(self.rhn_depth):
^
./src/ptb/ptb_enas_controller.py:217:47: F821 undefined name 'critic_train_op'
self.train_op = tf.group(self.train_op, critic_train_op)
^
5 E999 SyntaxError: invalid syntax
12 F821 undefined name 'ch_mul'
17
How is child_fixed_arc created in the final scripts?
Is the controller trained in the search phase used to output child_fixed_arc?
I am attempting to run the cifar10 macro search on a set of images that I have converted into the same format as cifar10. During the child training phase, the loss and |g| are always "nan", because the scripts expect only 10 classes while I am using many more. Does anyone know where in the cifar10 scripts I can specify the number of classes?
When I used an architecture different from the fixed_arc in cifar10_macro_final.sh, I got the same error as @ShenghaiRong and @zeus7777777 mentioned in "Question about output value" (perhaps because the pooling operation is not implemented). Furthermore, it seems operations 1 and 3 (corresponding to separable_conv_3x3 and separable_conv_5x5) are implemented as normal convolutions in the _fixed_layer function, not as separable_conv2d as in the _enas_layer function.
Is the _fixed_layer function still incomplete? Or do I misunderstand?
Hello,
Thanks for open sourcing the code.
After your commit 2734eb2, I get 63.26 ppl and not the 55.6 stated in the paper. However, before this commit I get 55.6. Is there something I am missing?
Thanks
Hello,
I've been experiencing difficulties trying to get tensorflow-gpu 1.4 to work with my setup. Which OS, CUDA toolkit version, and CUDNN version are you using? I've attempted to do this on GCP as well as my local machines.
If we run the three experiments from the README:
# Exp. 1
./scripts/ptb_search.sh
./scripts/ptb_final.sh
# Exp. 2
./scripts/cifar10_macro_search.sh
./scripts/cifar10_macro_final.sh
# Exp 3.
./scripts/cifar10_micro_search.sh
./scripts/cifar10_micro_final.sh
what should we expect the final performance metrics to be? Are you able to post the expected results either here or in the README?
Thanks
Hi,
When I try to run /scripts/cifar10_macro_search.sh, I get the exception below:
Traceback (most recent call last):
File "src/cifar10/main.py", line 359, in
tf.app.run()
File "/home//.envs/py27/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/cifar10/main.py", line 355, in main
train()
File "src/cifar10/main.py", line 223, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 169, in get_ops
num_replicas=FLAGS.controller_num_replicas)
File "/home//Downloads/enas/src/cifar10/general_controller.py", line 81, in init
self._build_sampler()
File "/home/***/Downloads/enas/src/cifar10/general_controller.py", line 163, in _build_sampler
raise ValueError("Unknown search_for {}".format(self.search_for))
ValueError: Unknown search_for macro
I am with Ubuntu 14.04, Python 2.7 and Tensorflow 1.3.0.
I am attempting to use an architecture from cifar10_macro_search and am getting this error. Please advise.
Build train graph
Traceback (most recent call last):
File "src/cifar10/main.py", line 359, in
tf.app.run()
File "/home/zachary_swartz/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/cifar10/main.py", line 355, in main
train()
File "src/cifar10/main.py", line 223, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 190, in get_ops
child_model.connect_controller(None)
File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 705, in connect_controller
self._build_train()
File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 595, in _build_train
logits = self._model(self.x_train, is_training=True)
File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 212, in _model
x = self._fixed_layer(layer_id, layers, start_idx, out_filters, is_training)
File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 481, in _fixed_layer
return out
UnboundLocalError: local variable 'out' referenced before assignment
Hi,
I am wondering what the training procedure is; in your code, once the controller is connected to child_model, it confuses me.
May I interpret it like the pseudocode below?
for epoch in range(epoches):
    arc_seq = controller.sample_arc_seq()
    child_model.use_arc(arc_seq)
    for train_data in train_data_set:
        loss = train(child_model, train_data)
    for step in range(ctr_steps):
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        accuracy = eval(child_model, eval_data_set)
        controller.update(accuracy, arc_seq)
Or should it be something like this:
for epoch in range(epoches):
    for train_data in train_data_set:
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        loss = train(child_model, train_data)
    for step in range(ctr_steps):
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        accuracy = eval(child_model, eval_data_set)
        controller.update(accuracy, arc_seq)
May I know which one is correct?
Best,
zmonoid
Thank you for your work.
After I tried some experiments, I have some questions.
For cifar10, there are two search spaces and corresponding experiments:
----./scripts/cifar10_macro_search.sh ./scripts/cifar10_macro_final.sh
----./scripts/cifar10_micro_search.sh ./scripts/cifar10_micro_final.sh
1> ----I want to confirm the difference between the architectures produced by cifar10_macro_search.sh and cifar10_micro_search.sh.
2> I found the architectures produced by cifar10_macro_search.sh look like
2> I found the architectures produced by cifar10_macro_search.sh like
[1]
[1 1]
[5 0 0]
[5 0 0 0]
[0 0 1 1 0]
[1 1 0 0 0 0]
[1 1 0 1 1 1 0]
[3 0 0 1 0 1 1 1]
[5 0 0 1 0 0 1 0 0]
[1 1 1 0 0 0 0 1 0 0]
[0 1 1 0 0 0 0 1 1 1 1]
[0 0 1 1 1 1 0 1 0 0 1 1], which has 12 cells, while in cifar10_macro_final.sh the architecture is
fixed_arc="0"
fixed_arc="$fixed_arc 3 0"
fixed_arc="$fixed_arc 0 1 0"
fixed_arc="$fixed_arc 2 0 0 1"
fixed_arc="$fixed_arc 2 0 0 0 0"
fixed_arc="$fixed_arc 3 1 1 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1"
fixed_arc="$fixed_arc 2 0 1 1 0 1 1 1"
fixed_arc="$fixed_arc 1 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 0 0 0 0 0 0 0 0 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1 0 0 0 0"
fixed_arc="$fixed_arc 0 1 0 0 1 1 0 0 0 0 1 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 0 0 1 0 1 1 0"
fixed_arc="$fixed_arc 1 0 0 1 0 0 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0"
fixed_arc="$fixed_arc 2 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 1 0 0 1 1 1 1 0 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1"
fixed_arc="$fixed_arc 3 0 1 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0"
fixed_arc="$fixed_arc 3 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1"
fixed_arc="$fixed_arc 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0"
fixed_arc="$fixed_arc 3 0 1 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0"
fixed_arc="$fixed_arc 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0"
fixed_arc="$fixed_arc 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 0 0", which has 24 cells.
So I want to know how to get an architecture like the one in cifar10_macro_final.sh.
Thanks for your response.
@hyhieu Hi, thank you for your work. I worked out how to get 24 cells with your help last time. Now I have another two questions. First, how do I run this code with 2 or more GPUs on one computer? Second, which file are the architectures from ./scripts/cifar10_micro_search.sh saved in, and could you provide a method to visualize them?
The information after I ran ./scripts/cifar10_micro_search.sh is as follows:
output:
checkpoint
model.ckpt-108769.data-00000-of-00001 model.ckpt-109120.data-00000-of-00001
stdout
events.out.tfevents.1523973322.sugon-W580-G20 model.ckpt-108769.index
model.ckpt-109120.index
graph.pbtxt
model.ckpt-108769.meta
model.ckpt-109120.meta
@hyhieu Hi, thank you for all your work.
Now I want to replace the cifar10 dataset with cifar100 in this code to find and tune CNN cells and architectures, but I can't find the interface between the dataset and the network. Would you please give some advice? Thank you very much.
I tried running micro search on TF 1.7 and it made quite a bit of progress, up to 150 epochs, but then it failed as follows:
[1 2 1 1 1 3 0 2 2 0 1 1 1 1 1 4 1 4 1 4]
val_acc=0.7750
--------------------------------------------------------------------------------
[0 0 1 0 0 4 0 1 0 4 1 1 1 4 0 1 0 1 5 2]
[0 1 1 0 1 1 1 0 1 2 1 3 1 0 3 3 1 0 2 4]
val_acc=0.6813
--------------------------------------------------------------------------------
[0 1 1 0 0 0 0 0 0 0 1 1 4 0 0 0 0 0 1 1]
[1 0 1 2 1 1 1 1 1 0 1 3 3 0 2 0 1 0 1 1]
val_acc=0.7312
--------------------------------------------------------------------------------
[0 1 0 4 0 0 0 2 1 0 1 3 1 0 3 0 1 1 1 1]
[1 0 1 0 1 1 1 1 1 4 1 1 1 1 1 0 3 4 1 4]
val_acc=0.7188
--------------------------------------------------------------------------------
[0 0 0 2 1 0 1 0 1 4 0 3 0 1 1 0 0 1 4 2]
[0 4 1 1 1 4 1 1 1 1 1 0 1 0 1 2 1 1 1 2]
val_acc=0.7250
--------------------------------------------------------------------------------
Epoch 150: Eval
Eval at 42300
valid_accuracy: 0.6946
Eval at 42300
test_accuracy: 0.6842
Exception in thread QueueRunnerThread-dummy_queue-sync_token_q_EnqueueMany:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 268, in _run
coord.request_stop(e)
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 213, in request_stop
six.reraise(*sys.exc_info())
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
CancelledError: TakeGrad operation was cancelled
[[Node: sync_replicas/AccumulatorTakeGradient = AccumulatorTakeGradient[_class=["loc:@sync_replicas/conditional_accumulator"], dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](sync_replicas/conditional_accumulator, sync_replicas/AccumulatorTakeGradient/num_required)]]
[[Node: sync_replicas/AccumulatorTakeGradient_2/_16859 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_93_sync_replicas/AccumulatorTakeGradient_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
I didn't take any steps to cancel it, such as hitting Ctrl+C, so I'm not sure why this is occurring.
Hi, thanks for your work. I noticed that in your code the controller samples a normal_arc and reduce_arc for every child step. Is that right? Can you explain why? Did you try using one sampled normal_arc and reduce_arc to train the child model for a few steps before sampling new ones?
After running cifar10_micro_search.sh, I got a bunch of architectures and their accuracies. I then selected an architecture with relatively high accuracy and retrained it from scratch to get a high final accuracy. That is what we should do, right?
The question is that even if I choose a low-accuracy architecture, or a random one, I can still get a high accuracy after retraining it from scratch; they all land around 96%. There seems to be little or no difference between different architectures. Did anyone else run into this, or did I do something wrong?
In the final phase, where you choose the best architecture based on its reward, the reward is set to c/ppl + (entropy term) for PTB and to accuracy for CIFAR-10. Why did you use the entropy term for architecture selection on PTB but not on CIFAR-10?
Hi --
Are you able to explain a little how you selected the various hyperparameters? I see that they are a) different from those mentioned in the paper, and b) different between the *_search and *_final stages.
~ Ben
if self.fixed_arc is None:
  x = self._factorized_reduction(x, out_filters, 2, is_training)
  layers = [layers[-1], x]  # <<<<<<< HERE
  x = self._enas_layer(
    layer_id, layers, self.reduce_arc, out_filters)
else:
  x = self._fixed_layer(
    layer_id, layers, self.reduce_arc, out_filters, 2, is_training,
    normal_or_reduction_cell="reduction")
On line 3, this gives layers = [layers[-1], _factorized_reduction(layers[-1])]. Doesn't this make it inconsistent with the fixed cell?
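To restate the concern with a toy example (plain Python with a stand-in function; not the actual TF code):

```python
def factorized_reduction(t):
    """Stand-in for self._factorized_reduction (halves spatial size)."""
    return ("reduced", t)

# Before the reduction layer, `layers` holds the two most recent outputs:
layers = ["layer_k_minus_1", "layer_k"]

# The quoted search-mode code reduces the last output and rebuilds the
# list as [last_output, reduced_last_output]:
x = factorized_reduction(layers[-1])
layers = [layers[-1], x]

# The two "previous layers" handed to _enas_layer are now the same layer
# at two resolutions, rather than two distinct layers -- the apparent
# mismatch with the fixed-cell path, where _fixed_layer receives the
# original `layers` list unchanged.
```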
start_idx = self.num_branches
is not correct. When searching partial channels, start_idx should also be 0, since in the controller no sequences are inserted into arc_seq / self.sample_arc. Thank you for your code.
enas/src/cifar10/micro_controller.py
Line 178 in 2734eb2
This line of code is supposed to decide the type of op (e.g. conv 3x3 or 5x5), but the shapes of self.w_soft and self.b_soft are [self.lstm_size, self.num_branches] and [1, self.num_branches].
Shouldn't they be [self.lstm_size, self.num_type_ops] and [1, self.num_type_ops]?
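A quick numpy sketch of the shape concern (illustrative only; num_type_ops is the hypothetical attribute proposed above, and none of this is the repo's code):

```python
import numpy as np

lstm_size, num_branches, num_type_ops = 64, 6, 2

h = np.zeros((1, lstm_size))  # controller LSTM output for one step

# Shapes as they currently are in the code quoted above:
w_soft = np.zeros((lstm_size, num_branches))
b_soft = np.zeros((1, num_branches))
logits = h @ w_soft + b_soft  # (1, num_branches): one logit per branch

# Hypothetical shapes, if this step should only choose among op types:
w_soft2 = np.zeros((lstm_size, num_type_ops))
b_soft2 = np.zeros((1, num_type_ops))
logits2 = h @ w_soft2 + b_soft2  # (1, num_type_ops)

# Either way, a softmax over the last axis turns the logits into the
# categorical distribution the controller samples from.
```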
Why do you use auxiliary heads? Do they affect the performance much?
Thank you for open-sourcing your work.
I have tried some experiments: I ran the search phase for an RNN cell, then retrained the cell it found.
However, I can't reach your paper's result; the performance is much lower.
My results are below:
cell : 0 0 0 0 2 0 0 0 0 0 0 3 2 4 0 0 1 7 0 5 0 10 0 -> test ppl : 64.71
cell : 0 0 3 0 1 0 3 0 3 0 1 0 3 0 3 7 0 6 0 9 0 10 0 -> test ppl : 69.66
I checked your fixed architecture; it reaches 55.8 test ppl:
cell : 0 0 0 1 1 2 1 2 0 2 0 5 1 1 0 6 1 8 1 8 1 8 1 -> test ppl : 55.8
Why do your searched cell and my searched cell give such different results?
How should the hyperparameters be adjusted for the final training?
Please briefly describe the dependencies of this project.
Sorry that this isn't actually an issue with the code, but just a question which I'm unable to figure out by reading the paper and code.
I'm trying to understand how the parameter sharing works in ENAS. The first two questions are there partially to answer the third main question.
These aren't issues with the code here as much as questions I had after reading your paper; feel free to close this issue if you don't want to discuss such things here.
In section 2.1 you say
for each pair of nodes
$j < \ell$, there is an independent parameter matrix $W^{(h)}_{\ell, j}$.
but then in section 2.3 it says
As for recurrent cells, each operation at each layer in our ENAS convolutional network has a distinct set of parameters.
(emphasis added). Just to be clear - from the RNN section, it seemed that there was only one weight matrix per pair of nodes, but there's actually one per activation function per pair of nodes?
In the RNNs, is node 1 the only node that can access
In the CNN section, from Figure 3, node 4 (the conv 5x5 layer) should take as input the outputs of nodes 1 and 3. However, it seems to take the concatenation of nodes 1, 2, and 3. Am I misunderstanding how selecting the nodes works? For ease, here's the relevant figure:
A clarification about the training procedure to see if I understand correctly: you sample just one model from the controller (e.g. one RNN), which you then train on one pass through the training data, and finally you do some number (e.g. 2000) of update steps on the controller using REINFORCE? You're just updating based on the performance of the single model trained, so wouldn't each of these steps be the same (so taking 2000 of them is like taking one step that's 2000 times as large)? Am I missing something about the controller update part?
If you could help clear some confusion about any or all of these points, I'd appreciate it! Overall, certainly a good paper, and thanks for providing the code as well!
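To make the controller-update part of my question concrete, here is a toy REINFORCE loop (numpy, entirely my own illustration, not the repo's controller). If each step re-samples a candidate and re-scores it, the 2000 steps are genuinely different updates; if the reward stayed fixed, they would indeed collapse into one large step:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_and_reward(logits):
    """Sample one 'architecture' (a single categorical choice here) and
    return it with its noisy reward. Arm 2 is best; the noise stands in
    for evaluating a freshly sampled child on a validation minibatch."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    a = rng.choice(len(p), p=p)
    reward = (1.0 if a == 2 else 0.2) + rng.normal(0.0, 0.05)
    return a, p, reward

logits = np.zeros(4)
baseline = 0.0
for _ in range(2000):
    a, p, r = sample_and_reward(logits)    # fresh sample each step
    baseline = 0.99 * baseline + 0.01 * r  # moving-average baseline
    grad = -p
    grad[a] += 1.0                         # d log p(a) / d logits
    logits += 0.05 * (r - baseline) * grad # REINFORCE ascent step
```

Because the sampled choice and its reward differ at every step, so does each gradient; after training, the logits concentrate on the best arm.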
Hello,
Does this code support generating RNN, LSTM, or CNN-LSTM (LRCN) architectures?
When the fixed_arc count is larger than 3, I get the following error:
File "src/cifar10/main.py", line 361, in
tf.app.run()
File "/data1/winterhuang/huangweidong/tools/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "src/cifar10/main.py", line 357, in main
train()
File "src/cifar10/main.py", line 225, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 190, in get_ops
child_model.connect_controller(None)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 708, in connect_controller
self._build_train()
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 598, in _build_train
logits = self._model(self.x_train, is_training=True)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 212, in _model
x = self._fixed_layer(layer_id, layers, start_idx, out_filters, is_training)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 468, in _fixed_layer
prev = res_layers + [out]
UnboundLocalError: local variable 'out' referenced before assignment
Caused by op u'child/layer_3/pool_at_3/from_0/AvgPool_1', defined at:
File "main.py", line 359, in
tf.app.run()
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 355, in main
train()
File "main.py", line 223, in train
ops = get_ops(images, labels)
File "main.py", line 171, in get_ops
child_model.connect_controller(controller_model)
File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 705, in connect_controller
self._build_train()
File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 595, in _build_train
logits = self._model(self.x_train, is_training=True)
File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 222, in _model
layer, out_filters, 2, is_training)
File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 165, in _factorized_reduction
path2, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1930, in avg_pool
name=name)
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 68, in _avg_pool
data_format=data_format, name=name)
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.
[[Node: child/layer_3/pool_at_3/from_0/AvgPool_1 = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]