melodyguan / enas

1.6K 1.6K 391.0 3.89 MB

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"

Home Page: https://arxiv.org/abs/1802.03268

License: Apache License 2.0

Shell 4.10% Python 95.90%

enas's People

Contributors

hyhieu, melodyguan

enas's Issues

Question about output value

Thank you for providing us with the code.
I'm running cifar10_macro_search.sh. The code, data, and hyperparameters were taken as-is and not modified.
The tail of cifar10_macro_search.sh (in the stdout file) looks like:


[2]
[3 0]
[0 1 0]
[3 1 0 1]
[0 0 0 0 0]
[0 1 0 1 0 1]
[2 1 1 0 1 0 0]
[1 1 0 0 1 0 0 0]
[1 0 0 0 0 1 1 0 0]
[2 0 0 1 0 0 0 0 0 0]
[1 0 0 0 0 1 0 0 1 1 0]
[5 0 1 0 0 0 1 0 0 0 1 0]
val_acc=0.7734

[2]
[1 0]
[1 0 0]
[5 0 0 1]
[0 0 1 0 0]
[2 0 1 0 0 0]
[2 1 1 0 0 0 1]
[3 0 0 0 1 0 1 1]
[3 0 1 0 0 0 1 0 1]
[5 1 0 0 0 0 0 0 0 0]
[0 1 0 1 1 0 0 0 1 0 1]
[1 1 0 0 0 1 1 1 0 1 1 1]
val_acc=0.8672

[4]
[4 0]
[2 0 1]
[0 0 0 1]
[0 0 0 0 0]
[0 0 0 0 1 1]
[0 0 0 0 0 0 0]
[5 1 0 0 0 0 1 0]
[5 1 0 0 0 0 0 0 1]
[5 0 0 1 0 1 1 0 0 0]
[1 1 1 1 0 1 0 0 0 0 0]
[0 0 0 1 1 0 0 0 1 0 0 1]
val_acc=0.7422

[0]
[4 1]
[1 1 0]
[1 0 0 0]
[5 0 1 1 0]
[2 1 0 0 1 0]
[0 0 0 1 1 1 1]
[5 0 1 1 1 0 0 0]
[4 0 0 0 0 0 0 1 0]
[4 0 0 0 1 0 0 0 0 1]
[5 1 0 0 0 1 0 0 1 0 0]
[0 0 0 0 0 0 1 0 0 0 0 0]
val_acc=0.8516

Epoch 310: Eval
Eval at 109120
valid_accuracy: 0.8068
Eval at 109120
test_accuracy: 0.7946

Now I think I'm supposed to take the architecture with the best validation accuracy, which is:
[2]
[1 0]
[1 0 0]
[5 0 0 1]
[0 0 1 0 0]
[2 0 1 0 0 0]
[2 1 1 0 0 0 1]
[3 0 0 0 1 0 1 1]
[3 0 1 0 0 0 1 0 1]
[5 1 0 0 0 0 0 0 0 0]
[0 1 0 1 1 0 0 0 1 0 1]
[1 1 0 0 0 1 1 1 0 1 1 1]
val_acc=0.8672

Is that right? Is this the final discovered (optimal) network from the macro search?
The validation accuracy of the selected architecture is 86.72%, which is lower than 96.1%.
What's the reason? Do I have to modify parameters (num_epochs, batch_size, ...)?
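For reference, here is a minimal sketch of how one might scan the stdout file for the sampled architecture with the highest reported val_acc (my own helper, assuming the log format shown above: a block of bracketed rows followed by a val_acc line):

def best_architecture(stdout_path):
  # Keep the bracketed block that precedes the highest "val_acc=..." line.
  best_acc, best_arc, current = -1.0, None, []
  with open(stdout_path) as f:
    for line in f:
      line = line.strip()
      if line.startswith("["):
        current.append(line)
      elif line.startswith("val_acc="):
        acc = float(line.split("=")[1])
        if acc > best_acc:
          best_acc, best_arc = acc, list(current)
        current = []
      else:
        current = []
  return best_acc, best_arc

acc, arc = best_architecture("outputs/stdout")
print("best val_acc: %.4f" % acc)
print("\n".join(arc))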

micro_search (non-final) throwing errors in slicing when using pooling branches

Hi,
I was able to reproduce most of your paper's experiments with minor modifications to the code, such as changing the hyperparameters and implementing pooling functions in macro_search_final.

So thank you for open-sourcing this code!

In the case of the macro search, I can run both search and final_search with any architecture. For the micro search, the final version with your given architecture also seems to work.
However, if I try to conduct the micro search on CIFAR-10, I get the following errors after graph construction, when attempting to run the session with more than 4 branches, i.e. including the pooling operations:

[[Node: child/layer_0/cell_0/x/strided_slice_3 = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=30, ellipsis_mask=0, end_mask=30, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child/layer_0/cell_0/x/stack, child/layer_7/cell_0/x/strided_slice_3/stack, child/layer_7/cell_0/x/strided_slice_3/stack_1, child/layer_7/cell_4/x/strided_slice_2/stack_2)]]
2018-06-20 14:21:34.518579: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: slice index 5 of dimension 0 out of bounds.
	 [[Node: child/layer_0/cell_0/x/strided_slice_3 = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=30, ellipsis_mask=0, end_mask=30, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child/layer_0/cell_0/x/stack, child/layer_7/cell_0/x/strided_slice_3/stack, child/layer_7/cell_0/x/strided_slice_3/stack_1, child/layer_7/cell_4/x/strided_slice_2/stack_2)]]

followed by the error message of the TF graph crashing.
I downgraded my TF version to 1.4 and am using Python 2.7 on Ubuntu 16.04, as suggested in one of the closed issues.

I think I roughly get where this is pointing to in the code: the enas_cell of the micro child, where there seems to be some indexing mismatch. Given that none of the TensorFlow variable scopes use a name of "strided_slice", I am unable to pinpoint and thus fix the problem, though. (If I understand correctly, the strided_slice could also just be the graph's version of extended slicing like [:, 0:5], etc.?)

I would greatly appreciate any help in fixing this. If you wish I can send a PR with all the other changes once fixed as well.
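For what it's worth, the strided_slice nodes are what TensorFlow emits for ordinary Python-style indexing of a tensor, so the message can be reproduced with a minimal sketch (my own guess at the failure mode, not the actual ENAS code): stack the branch outputs and index them with a sampled branch id that exceeds the number of branches actually built, and the session fails at run time with exactly this error:

import tensorflow as tf

branches = [tf.zeros([1, 4]) for _ in range(4)]     # only 4 branch outputs were built
stacked = tf.stack(branches, axis=0)                # shape [4, 1, 4]
branch_id = tf.placeholder(tf.int32, [])            # e.g. a branch id sampled by the controller
picked = stacked[branch_id]                         # compiles to a StridedSlice node

with tf.Session() as sess:
  # Feeding 5 is out of bounds for dimension 0 (size 4) and raises:
  # "Invalid argument: slice index 5 of dimension 0 out of bounds."
  sess.run(picked, feed_dict={branch_id: 5})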

drop path explanation

Hi --

Thanks again for releasing the code. Are you able to explain a little more what _apply_drop_path is doing in cifar10/micro_child.py, and what the motivation for this regularization is?

Thanks
Ben
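In case a rough summary helps while waiting for an answer: drop path (as in FractalNet/NASNet-style models) randomly zeroes out whole paths inside a cell during training and rescales the survivors by 1/keep_prob, so the child cannot rely on any single path. Below is a minimal sketch of the idea only; the repo's _apply_drop_path may differ (for instance by annealing keep_prob with layer depth and training step), so treat this as an assumption, not a description of the actual code.

import tensorflow as tf

def drop_path(x, keep_prob, is_training):
  # Drop the entire tensor (one path of a cell) per example with probability
  # 1 - keep_prob, and rescale survivors so the expected value is unchanged.
  if not is_training or keep_prob >= 1.0:
    return x
  batch_size = tf.shape(x)[0]
  noise_shape = [batch_size, 1, 1, 1]               # one binary mask per example
  mask = tf.floor(keep_prob + tf.random_uniform(noise_shape, dtype=x.dtype))
  return x / keep_prob * mask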

Index error while running cifar10 script

We got the following error while running cifar10_micro_final.sh with TensorFlow 1.6.

bash scripts/cifar10_micro_final.sh

......


Build train graph
Layer  0: Tensor("child/layer_0/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer  1: Tensor("child/layer_1/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer  2: Tensor("child/layer_2/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer  3: Tensor("child/layer_3/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Layer  4: Tensor("child/layer_4/final_combine/concat:0", shape=(?, 180, 32, 32), dtype=float32)
Traceback (most recent call last):
  File "src/cifar10/main.py", line 359, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "src/cifar10/main.py", line 355, in main
    train()
  File "src/cifar10/main.py", line 223, in train
    ops = get_ops(images, labels)
  File "src/cifar10/main.py", line 190, in get_ops
    child_model.connect_controller(None)
  File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 819, in connect_controller
    self._build_train()
  File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 702, in _build_train
    logits = self._model(self.x_train, is_training=True)
  File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 283, in _model
    normal_or_reduction_cell="reduction")
  File "/Users/tobe/code/enas/src/cifar10/micro_child.py", line 430, in _fixed_layer
    x_id = arc[4 * cell_id]
IndexError: index 0 is out of bounds for axis 0 with size 0

Question about output value 2

Thank you for your answer.
I'm running cifar10_macro_final.sh.
The tail of cifar10_macro_final.sh (in the stdout file) looks like:

epoch=306 ch_step=153050 loss=0.001949 lr=0.0011 |g|=0.2311 tr_acc=100/100 mins=2119.42
epoch=306 ch_step=153100 loss=0.001135 lr=0.0011 |g|=0.0480 tr_acc=100/100 mins=2120.06
epoch=306 ch_step=153150 loss=0.001102 lr=0.0011 |g|=0.0725 tr_acc=100/100 mins=2120.71
epoch=306 ch_step=153200 loss=0.001061 lr=0.0011 |g|=0.0932 tr_acc=100/100 mins=2121.35
epoch=306 ch_step=153250 loss=0.001357 lr=0.0011 |g|=0.0557 tr_acc=100/100 mins=2122.00
epoch=306 ch_step=153300 loss=0.001437 lr=0.0011 |g|=0.1014 tr_acc=100/100 mins=2122.64
epoch=306 ch_step=153350 loss=0.001134 lr=0.0011 |g|=0.0374 tr_acc=100/100 mins=2123.29
epoch=306 ch_step=153400 loss=0.001238 lr=0.0011 |g|=0.0612 tr_acc=100/100 mins=2123.93
epoch=306 ch_step=153450 loss=0.001358 lr=0.0011 |g|=0.0592 tr_acc=100/100 mins=2124.58
epoch=307 ch_step=153500 loss=0.000936 lr=0.0011 |g|=0.0476 tr_acc=100/100 mins=2125.22
Epoch 307: Eval
Eval at 153500
test_accuracy: 0.9601
epoch=307 ch_step=153550 loss=0.001563 lr=0.0010 |g|=0.1156 tr_acc=100/100 mins=2126.32
epoch=307 ch_step=153600 loss=0.002924 lr=0.0010 |g|=0.4933 tr_acc=100/100 mins=2126.97
epoch=307 ch_step=153650 loss=0.001787 lr=0.0010 |g|=0.1428 tr_acc=100/100 mins=2127.61
epoch=307 ch_step=153700 loss=0.001629 lr=0.0010 |g|=0.1128 tr_acc=100/100 mins=2128.26
epoch=307 ch_step=153750 loss=0.001239 lr=0.0010 |g|=0.1101 tr_acc=100/100 mins=2128.90
epoch=307 ch_step=153800 loss=0.001421 lr=0.0010 |g|=0.0812 tr_acc=100/100 mins=2129.55
epoch=307 ch_step=153850 loss=0.001244 lr=0.0010 |g|=0.0784 tr_acc=100/100 mins=2130.19
epoch=307 ch_step=153900 loss=0.001778 lr=0.0010 |g|=0.1270 tr_acc=100/100 mins=2130.84
epoch=307 ch_step=153950 loss=0.001900 lr=0.0010 |g|=0.1715 tr_acc=100/100 mins=2131.48
epoch=308 ch_step=154000 loss=0.001303 lr=0.0010 |g|=0.0811 tr_acc=100/100 mins=2132.13
Epoch 308: Eval
Eval at 154000
test_accuracy: 0.9605
epoch=308 ch_step=154050 loss=0.008866 lr=0.0010 |g|=1.3467 tr_acc=100/100 mins=2133.23
epoch=308 ch_step=154100 loss=0.001046 lr=0.0010 |g|=0.0328 tr_acc=100/100 mins=2133.88
epoch=308 ch_step=154150 loss=0.001344 lr=0.0010 |g|=0.0558 tr_acc=100/100 mins=2134.52
epoch=308 ch_step=154200 loss=0.001324 lr=0.0010 |g|=0.0485 tr_acc=100/100 mins=2135.17
epoch=308 ch_step=154250 loss=0.001197 lr=0.0010 |g|=0.0587 tr_acc=100/100 mins=2135.81
epoch=308 ch_step=154300 loss=0.001323 lr=0.0010 |g|=0.0478 tr_acc=100/100 mins=2136.46
epoch=308 ch_step=154350 loss=0.000928 lr=0.0010 |g|=0.0559 tr_acc=100/100 mins=2137.10
epoch=308 ch_step=154400 loss=0.000729 lr=0.0010 |g|=0.0184 tr_acc=100/100 mins=2137.75
epoch=308 ch_step=154450 loss=0.001168 lr=0.0010 |g|=0.0731 tr_acc=100/100 mins=2138.39
epoch=309 ch_step=154500 loss=0.000755 lr=0.0010 |g|=0.0169 tr_acc=100/100 mins=2139.04
Epoch 309: Eval
Eval at 154500
test_accuracy: 0.9589
epoch=309 ch_step=154550 loss=0.000859 lr=0.0010 |g|=0.0329 tr_acc=100/100 mins=2140.14
epoch=309 ch_step=154600 loss=0.003031 lr=0.0010 |g|=1.0015 tr_acc=100/100 mins=2140.79
epoch=309 ch_step=154650 loss=0.001678 lr=0.0010 |g|=0.2013 tr_acc=100/100 mins=2141.44
epoch=309 ch_step=154700 loss=0.000810 lr=0.0010 |g|=0.0335 tr_acc=100/100 mins=2142.08
epoch=309 ch_step=154750 loss=0.001312 lr=0.0010 |g|=0.1542 tr_acc=100/100 mins=2142.73
epoch=309 ch_step=154800 loss=0.001046 lr=0.0010 |g|=0.0383 tr_acc=100/100 mins=2143.37
epoch=309 ch_step=154850 loss=0.001397 lr=0.0010 |g|=0.1267 tr_acc=100/100 mins=2144.02
epoch=309 ch_step=154900 loss=0.001565 lr=0.0010 |g|=0.0715 tr_acc=100/100 mins=2144.66
epoch=309 ch_step=154950 loss=0.001706 lr=0.0010 |g|=0.1621 tr_acc=100/100 mins=2145.31
epoch=310 ch_step=155000 loss=0.001614 lr=0.0010 |g|=0.0683 tr_acc=100/100 mins=2145.95
Epoch 310: Eval
Eval at 155000
test_accuracy: 0.9602


Now I think I'm supposed to take the output with the highest test accuracy, which is at epoch 308:

Epoch 308: Eval
Eval at 154000
test_accuracy: 0.9605

Is this the test accuracy obtained by applying the test data to the optimal parameters (set in macro_final.sh) and the architecture you found?

Where are the classification results for the test data stored?

Applying Other Images (256*256)

Thanks to your kind answers, I'm trying to apply the source code to other images.
I am trying to apply 256x256 images with two classes by modifying the micro-search source code for cifar10. After center-cropping the images to size 128, I set batch_size=10, num_layers=8, and reduction cells at layers 2, 4, 6, 8. However, learning does not seem to go well because the loss does not fall. Is there any other solution?
Should I adjust my batch size or learning rate?


 epoch=4     ch_step=4850   loss=0.584814 lr=0.0329   |g|=1.3465   tr_acc=8  / 10 mins=260.01
epoch=4     ch_step=4900   loss=0.382023 lr=0.0329   |g|=0.7714   tr_acc=9  / 10 mins=262.52
epoch=4     ch_step=4950   loss=0.266939 lr=0.0329   |g|=1.2139   tr_acc=10 / 10 mins=265.03
epoch=4     ch_step=5000   loss=0.685643 lr=0.0329   |g|=1.0374   tr_acc=7  / 10 mins=267.54
epoch=4     ch_step=5050   loss=0.754228 lr=0.0329   |g|=2.7833   tr_acc=5  / 10 mins=270.06
epoch=4     ch_step=5100   loss=0.688855 lr=0.0329   |g|=3.4854   tr_acc=5  / 10 mins=272.57
epoch=4     ch_step=5150   loss=0.648872 lr=0.0329   |g|=2.3838   tr_acc=7  / 10 mins=275.07
epoch=4     ch_step=5200   loss=0.744257 lr=0.0329   |g|=1.2733   tr_acc=3  / 10 mins=277.59
epoch=4     ch_step=5250   loss=0.706121 lr=0.0329   |g|=0.9046   tr_acc=4  / 10 mins=280.10
epoch=4     ch_step=5300   loss=0.472144 lr=0.0329   |g|=0.9058   tr_acc=10 / 10 mins=282.60
epoch=4     ch_step=5350   loss=0.726746 lr=0.0329   |g|=10.5671  tr_acc=5  / 10 mins=285.10
epoch=4     ch_step=5400   loss=0.563478 lr=0.0329   |g|=1.7984   tr_acc=7  / 10 mins=287.61
epoch=4     ch_step=5450   loss=0.496238 lr=0.0329   |g|=3.2853   tr_acc=9  / 10 mins=290.12
epoch=4     ch_step=5500   loss=0.651356 lr=0.0329   |g|=1.9665   tr_acc=7  / 10 mins=292.62
epoch=4     ch_step=5550   loss=0.666979 lr=0.0329   |g|=1.9109   tr_acc=7  / 10 mins=295.13
epoch=4     ch_step=5600   loss=0.634621 lr=0.0329   |g|=2.5509   tr_acc=7  / 10 mins=297.63
epoch=4     ch_step=5650   loss=0.769902 lr=0.0329   |g|=1.3555   tr_acc=6  / 10 mins=300.15
epoch=4     ch_step=5700   loss=0.387144 lr=0.0329   |g|=1.3194   tr_acc=9  / 10 mins=302.69
epoch=4     ch_step=5750   loss=0.436951 lr=0.0329   |g|=1.1916   tr_acc=9  / 10 mins=305.22
epoch=4     ch_step=5800   loss=0.544716 lr=0.0329   |g|=3.0641   tr_acc=8  / 10 mins=307.76
epoch=4     ch_step=5850   loss=0.489396 lr=0.0329   |g|=0.8843   tr_acc=8  / 10 mins=310.29
epoch=4     ch_step=5900   loss=0.389174 lr=0.0329   |g|=3.2995   tr_acc=9  / 10 mins=312.81
epoch=4     ch_step=5950   loss=0.372266 lr=0.0329   |g|=1.1757   tr_acc=9  / 10 mins=315.35
epoch=5     ch_step=6000   loss=0.338371 lr=0.0329   |g|=14.8412  tr_acc=9  / 10 mins=317.88
Epoch 5: Training controller
ctrl_step=200    loss=18.138  ent=57.21 lr=0.0035 |g|=0.4447   acc=0.7000 bl=0.38  mins=317.88
ctrl_step=205    loss=-0.033  ent=57.21 lr=0.0035 |g|=0.0006   acc=0.5000 bl=0.51  mins=318.07
ctrl_step=210    loss=8.164   ent=57.20 lr=0.0035 |g|=0.1327   acc=0.7000 bl=0.56  mins=318.26
ctrl_step=215    loss=-3.504  ent=57.20 lr=0.0035 |g|=0.0650   acc=0.5000 bl=0.57  mins=318.45
ctrl_step=220    loss=19.004  ent=57.20 lr=0.0035 |g|=0.3491   acc=0.9000 bl=0.59  mins=318.64
ctrl_step=225    loss=17.364  ent=57.21 lr=0.0035 |g|=0.3164   acc=0.9000 bl=0.59  mins=318.83
ctrl_step=230    loss=-0.830  ent=57.21 lr=0.0035 |g|=0.0080   acc=0.6000 bl=0.62  mins=319.02
ctrl_step=235    loss=9.372   ent=57.21 lr=0.0035 |g|=0.2280   acc=0.8000 bl=0.64  mins=319.21
ctrl_step=240    loss=-12.632 ent=57.21 lr=0.0035 |g|=0.3495   acc=0.4000 bl=0.63  mins=319.40
ctrl_step=245    loss=17.016  ent=57.22 lr=0.0035 |g|=0.2418   acc=0.9000 bl=0.60  mins=319.59
Here are 10 architectures
[0 3 0 0 0 1 0 0 2 0 2 3 2 0 3 1 0 1 0 1]
[1 0 0 2 2 4 0 2 2 1 0 0 0 3 3 3 0 0 3 4]
val_acc=0.6000
--------------------------------------------------------------------------------
[1 3 1 0 0 1 2 3 0 1 2 3 4 4 3 3 5 0 1 0]
[0 1 1 0 0 4 1 2 2 3 3 2 1 1 4 0 2 0 3 1]
val_acc=0.3000
--------------------------------------------------------------------------------
[1 1 1 1 2 0 1 0 2 1 3 1 0 0 0 0 2 1 5 0]
[1 0 0 4 0 2 0 0 1 4 0 0 4 3 1 1 5 0 2 0]
val_acc=0.6000
--------------------------------------------------------------------------------
[0 0 1 4 1 4 2 4 2 0 3 0 1 0 3 2 1 1 2 0]
[0 2 0 1 2 0 1 1 3 2 3 2 4 0 2 4 0 4 1 2]
val_acc=0.5000
--------------------------------------------------------------------------------
[0 1 1 1 1 1 1 1 3 1 2 0 1 0 3 0 3 1 3 0]
[0 3 0 0 2 1 1 0 0 0 1 1 2 4 1 1 2 2 0 1]
val_acc=0.6000
--------------------------------------------------------------------------------
[0 2 0 1 1 2 1 0 3 0 0 1 2 3 0 1 2 0 4 2]
[0 1 0 2 2 1 0 1 1 0 0 0 0 2 4 3 2 4 5 2]
val_acc=0.6000
--------------------------------------------------------------------------------
[1 1 1 3 0 3 2 2 1 2 3 1 1 0 0 1 1 0 3 2]
[1 2 1 0 1 0 1 4 1 1 0 0 4 1 3 0 3 1 2 2]
val_acc=0.5000
--------------------------------------------------------------------------------
[1 2 1 3 0 1 2 0 2 1 0 2 0 0 4 0 3 4 4 0]
[1 3 1 0 0 1 1 3 2 4 3 4 4 4 4 0 3 2 2 1]
val_acc=0.5000
--------------------------------------------------------------------------------
[0 0 0 1 2 3 0 0 1 0 0 0 3 1 1 0 1 1 5 1]
[1 4 1 0 0 1 2 4 3 3 0 1 2 3 4 3 3 1 3 4]
val_acc=0.4000
--------------------------------------------------------------------------------
[0 0 0 0 1 4 0 3 0 1 1 0 4 1 4 0 0 4 3 1]
[0 0 0 1 2 3 1 3 2 3 2 0 0 4 1 4 1 4 5 4]
val_acc=0.5000
--------------------------------------------------------------------------------
Epoch 5: Eval
Eval at 6000
[[1589    0]
 [1411    0]]
Total Accuracy: 52.97 %

Difference between `_fixed_layer` and `_enas_layer` in `cifar10/micro_child.py`

There are a number of differences between _fixed_layer and _enas_layer in cifar10/micro_child.py.

  1. layer_base variable scope
  2. strided pooling layers and convolutions
  3. possible _factorized_reduction for output

Are you able to give some insight on why the code works like this? It seems that when a fixed architecture is specified, the resulting model is not necessarily exactly the same as during the RL training. It seems to me like the easiest way to fix the child architecture is to have an alternate "dummy controller" that just keeps normal_arc and reduce_arc fixed at the desired architecture.
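A minimal sketch of that dummy-controller idea (hypothetical; the sample_arc attribute name and the tuple layout are assumptions about the child's interface, not something taken from the repo):

import numpy as np
import tensorflow as tf

class DummyController(object):
  """Stand-in for the RL controller that always returns one fixed architecture."""

  def __init__(self, normal_arc, reduce_arc):
    # The child model would read these constant arcs instead of sampled ones.
    self.sample_arc = (
      tf.constant(np.asarray(normal_arc, dtype=np.int32)),
      tf.constant(np.asarray(reduce_arc, dtype=np.int32)),
    )

# Made-up arc values, purely for illustration.
controller = DummyController(
  normal_arc=[0, 2, 0, 0, 0, 4],
  reduce_arc=[1, 0, 1, 0, 0, 2],
)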

Thanks
Ben

Why does MicroController bias for branches 0 and 1?

In this part of the code:

        logits = tf.matmul(next_h[-1], self.w_soft) + self.b_soft
        if self.temperature is not None:
          logits /= self.temperature
        if self.tanh_constant is not None:
          op_tanh = self.tanh_constant / self.op_tanh_reduce
          logits = op_tanh * tf.tanh(logits)
        if use_bias:
          logits += self.b_soft_no_learn

b_soft and b_soft_no_learn are initialised as:

      with tf.variable_scope("softmax"):
        self.w_soft = tf.get_variable("w", [self.lstm_size, self.num_branches])
        b_init = np.array([10.0, 10.0] + [0] * (self.num_branches - 2),
                          dtype=np.float32)
        self.b_soft = tf.get_variable(
          "b", [1, self.num_branches],
          initializer=tf.constant_initializer(b_init))

        b_soft_no_learn = np.array(
          [0.25, 0.25] + [-0.25] * (self.num_branches - 2), dtype=np.float32)
        b_soft_no_learn = np.reshape(b_soft_no_learn, [1, self.num_branches])
        self.b_soft_no_learn = tf.constant(b_soft_no_learn, dtype=tf.float32)

So the controller is biased (both at initialization and permanently) to choose the depthwise separable convolutions.

Is that correct? Was this an important addition for the results? (It isn't mentioned in the paper.)
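To make the size of the effect concrete, here is a small numeric sketch of my own (not code from the repo), plugging the initial b_soft, the non-learned bias, and the default tanh_constant=1.5 / op_tanh_reduce=2.5 into a softmax while pretending the LSTM output is zero; branches 0 and 1 start with roughly three times the probability of each other branch:

import numpy as np

def softmax(z):
  e = np.exp(z - z.max())
  return e / e.sum()

num_branches = 5                          # e.g. 5 candidate ops
b_soft = np.array([10.0, 10.0] + [0.0] * (num_branches - 2))
b_soft_no_learn = np.array([0.25, 0.25] + [-0.25] * (num_branches - 2))
op_tanh = 1.5 / 2.5                       # tanh_constant / op_tanh_reduce

logits = np.zeros(num_branches) + b_soft  # pretend the LSTM contributes nothing
logits = op_tanh * np.tanh(logits)
logits = logits + b_soft_no_learn         # the use_bias case
print(softmax(logits))                    # ~[0.33, 0.33, 0.11, 0.11, 0.11]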

Child Hyper Parameters

Hi, could you briefly explain the meaning of:
"child_out_filters", "child_num_branches", "child_num_cell_layers", and "child_keep_prob"

I am fairly new to ML and am trying to figure out how to properly train the outputted fixed architectures.

Thank you so much! This really is a wonderful tool.

Executor failed to create kernel

I am attempting to run the macro search on the cifar10 dataset and am getting the error:
"Executor failed to create kernel. Invalid argument: Default AvgPoolingOp only supports NHWC."
...
...
...
"InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.
[[Node: child/layer_3/pool_at_3/from_0/AvgPool = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task$"

Any ideas? The cifar10 data is in its original format. I am running this on GCP currently, but got the error on my local VM as well. I am using TensorFlow 1.4.

Reproducibility of the results from the paper (RNN)

Hi @melodyguan, thanks for the great paper!

But I still can't reproduce results from your paper in finding the RNN cell on PTB dataset.

After approx. 24 hours of training (~22 epochs), the best validation ppl is still 400 (and it's also very unstable, ranging from 1000 to 400 between epochs) and training ppl is around 250, which isn't even close to 55.8 as in your paper. The code and data were taken as is and not modified.

Prior to that, I also had similar problems with reproducing these results with https://github.com/carpedm20/ENAS-pytorch code.

Are there problems with the hyperparameter selection, or maybe some bugs in the code?

Explanation of `cifar10.micro_child._factorized_reduction`

The docstring for cifar10.micro_child._factorized_reduction says

"""Reduces the shape of x without information loss due to striding."""

Could you explain what that means?

When stride=2,

path1 = tf.nn.avg_pool(x, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)

and

path2 = tf.nn.avg_pool(path2, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)

each select 1/4 of the spatial locations, so you end up ignoring half of the spatial locations (specifically, any (i,j) where i % 2 != j % 2). Is that right?
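A small numeric sketch of that reading (mine, not from the repo), on a 4x4 single-channel input with stride 2: the plain pooled path keeps the even (i, j) positions, and the pad-and-shift path keeps the odd ones, so together the two paths cover exactly the locations where i % 2 == j % 2 and skip the rest, which matches the reading above:

import numpy as np

x = np.arange(16).reshape(4, 4)                    # toy 4x4 single-channel image

path1 = x[0::2, 0::2]                              # 1x1 avg_pool, stride 2
shifted = np.pad(x, ((0, 1), (0, 1)), mode="constant")[1:, 1:]
path2 = shifted[0::2, 0::2]                        # pad, shift by one pixel, pool

print(path1)   # values at positions with i and j both even
print(path2)   # values at positions with i and j both odd
# Positions where i % 2 != j % 2 appear in neither path.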

~ Ben

how to get the result and visualize

Now I know that the model result is just a sequence of numbers. If I want to use the model to classify, or to visualize the network architecture, do you have a tool for that?
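In the meantime, here is a small sketch of how one might pretty-print a macro-search sequence (my own helper, not a tool from the repo). It assumes, as the logs in the issues above suggest, that row i holds an operation id followed by i skip-connection flags; the operation names are a guess at the macro search space (a later issue notes that ids 1 and 3 are the separable convolutions), so the ordering may not match the code exactly:

# Hypothetical op names; ids 1 and 3 being the separable convolutions matches a
# comment in another issue, the remaining names and order are my assumption.
OPS = ["conv3x3", "sep_conv3x3", "conv5x5", "sep_conv5x5", "avg_pool", "max_pool"]

def describe_macro_arc(rows):
  """rows: one list per layer, [op_id, skip_0, skip_1, ..., skip_{i-1}]."""
  for layer_id, row in enumerate(rows):
    skips = [j for j, flag in enumerate(row[1:]) if flag == 1]
    print("layer %2d: %-12s skip connections from layers %s"
          % (layer_id, OPS[row[0]], skips))

describe_macro_arc([
  [2],
  [1, 0],
  [1, 0, 0],
  [5, 0, 0, 1],
])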

ValueError: Incompatible shapes between op input and calculated input gradient.

While running the "cifar10_macro_search.sh" script, I get the following error. Is it related to the tensorflow version? I am using 1.6.0.

Build train graph
Tensor("child/layer_0/case/cond/Merge:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_1/skip/bn/Identity:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_2/skip/bn/Identity:0", shape=(?, 36, 32, 32), dtype=float32)
Tensor("child/layer_3/pool_at_3/from_4/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_4/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_5/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_6/skip/bn/Identity:0", shape=(?, 36, 16, 16), dtype=float32)
Tensor("child/layer_7/pool_at_7/from_8/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_8/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_9/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_10/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Tensor("child/layer_11/skip/bn/Identity:0", shape=(?, 36, 8, 8), dtype=float32)
Model has 697860 params
Traceback (most recent call last):
  File "src/cifar10/main.py", line 359, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "src/cifar10/main.py", line 355, in main
    train()
  File "src/cifar10/main.py", line 223, in train
    ops = get_ops(images, labels)
  File "src/cifar10/main.py", line 171, in get_ops
    child_model.connect_controller(controller_model)
  File "/home/nikhil/google_enas/src/cifar10/general_child.py", line 705, in connect_controller
    self._build_train()
  File "/home/nikhil/google_enas/src/cifar10/general_child.py", line 633, in _build_train
    num_replicas=self.num_replicas)
  File "/home/nikhil/google_enas/src/utils.py", line 125, in get_train_ops
    grads = tf.gradients(loss, tf_variables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 641, in gradients
    (op.name, i, t_in.shape, in_grad.shape))
ValueError: Incompatible shapes between op input and calculated input gradient. Forward operation: child/layer_11/case/cond/cond/cond/cond/cond/cond/Merge. Input index: 0. Original input shape: (). Calculated input gradient shape: (?, 36, 8, 8)

enas for ImageNet

Have you tried testing on ImageNet the performance of the normal/reduction cell sampled from cifar10?

Why can we get different sample_arcs with a fixed controller?

In your code, after the controller has been trained, you sample 10 architectures and they are all different. In my understanding, at this phase the controller is fixed and the inputs are all the same, so why do we get different sample_arcs?
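My guess (an assumption, not confirmed by the authors): the controller draws from its softmax with tf.multinomial rather than taking the argmax, so even with fixed weights and identical inputs, repeated sampling yields different sequences. A minimal sketch of that behaviour:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.5, 0.1]])    # fixed "controller" logits
branch = tf.multinomial(logits, num_samples=1)  # a stochastic draw, not argmax

with tf.Session() as sess:
  # The graph and logits never change, yet repeated runs return different
  # branch ids, so repeated sample_arcs differ as well.
  print([int(sess.run(branch)[0, 0]) for _ in range(10)])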

Question about controller detail

I noticed that in your controller, prev_c and prev_h never change; they are zeros all the time. For the LSTM, only the chosen action's embedding is fed into the next step as the input.
Is this intended or a bug?
If it is intended, is it necessary to use a 2-layer LSTM rather than a simple dense layer with tanh activation? The output of the cell is only related to the chosen embedding, not to the whole sequence of decisions.

undefined name 'ch_mul'

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./src/utils.py:34:11: E999 SyntaxError: invalid syntax
  print "-" * 80
          ^
./src/cifar10/controller.py:39:13: E999 SyntaxError: invalid syntax
    print "-" * 80
            ^
./src/cifar10/data_utils.py:17:19: E999 SyntaxError: invalid syntax
    print file_name
                  ^
./src/cifar10/general_child.py:510:74: F821 undefined name 'ch_mul'
            "w_depth", [self.filter_size, self.filter_size, out_filters, ch_mul])
                                                                         ^
./src/cifar10/general_child.py:511:67: F821 undefined name 'ch_mul'
          w_point = create_weight("w_point", [1, 1, out_filters * ch_mul, count])
                                                                  ^
./src/cifar10/general_child.py:582:51: F821 undefined name 'avg_or_pool'
        raise ValueError("Unknown pool {}".format(avg_or_pool))
                                                  ^
./src/cifar10/general_controller.py:46:13: E999 SyntaxError: invalid syntax
    print "-" * 80
            ^
./src/cifar10/models.py:45:13: E999 SyntaxError: invalid syntax
    print "-" * 80
            ^
./src/ptb/main.py:310:30: F821 undefined name 'xrange'
              for ct_step in xrange(FLAGS.controller_train_steps *
                             ^
./src/ptb/main.py:339:24: F821 undefined name 'xrange'
              for _ in xrange(10):
                       ^
./src/ptb/ptb_enas_child.py:170:44: F821 undefined name 'eval_set'
    print("{}_total_loss: {:<6.2f}".format(eval_set, total_loss))
                                           ^
./src/ptb/ptb_enas_child.py:171:41: F821 undefined name 'eval_set'
    print("{}_log_ppl: {:<6.2f}".format(eval_set, log_ppl))
                                        ^
./src/ptb/ptb_enas_child.py:172:37: F821 undefined name 'eval_set'
    print("{}_ppl: {:<6.2f}".format(eval_set, ppl))
                                    ^
./src/ptb/ptb_enas_controller.py:74:25: F821 undefined name 'xrange'
        for layer_id in xrange(self.lstm_num_layers):
                        ^
./src/ptb/ptb_enas_controller.py:104:14: F821 undefined name 'xrange'
    for _ in xrange(self.lstm_num_layers):
             ^
./src/ptb/ptb_enas_controller.py:109:21: F821 undefined name 'xrange'
    for layer_id in xrange(self.rhn_depth):
                    ^
./src/ptb/ptb_enas_controller.py:217:47: F821 undefined name 'critic_train_op'
      self.train_op = tf.group(self.train_op, critic_train_op)
                                              ^
5     E999 SyntaxError: invalid syntax
12    F821 undefined name 'ch_mul'
17

loss = nan |g| = nan

I am attempting to run the cifar10 macro search on a set of images that I have converted into the same format as cifar10. During the child training phase, the loss and |g| are always "nan", because the scripts expect only 10 classes while I am using many more. Does anyone know where in the cifar10 scripts I can specify the number of classes?

Question about _fixed_layer function in general_child.py

When I used an architecture different from the fixed_arc in cifar10_macro_final.sh, I got the same error as @ShenghaiRong and @zeus7777777 mentioned in "Question about output value" (perhaps because the pooling operation is not implemented). Furthermore, it seems that numbers 1 and 3 (corresponding to separable_conv_3x3 and separable_conv_5x5) are implemented as normal convolution operations in the _fixed_layer function, not as separable_conv2d as in the _enas_layer function.
Is the _fixed_layer function still incomplete? Or do I misunderstand?

RNN results not reproducible

Hello,
Thanks for open sourcing the code.
After your commit:
2734eb2

I get a ppl of 63.26, not the 55.6 stated in the paper. However, before this commit I get 55.6. Is there something I am missing?

Thanks

Compatibility Issues

Hello,

I've been experiencing difficulties trying to get tensorflow-gpu 1.4 to work with my setup. Which OS, CUDA toolkit version, and CUDNN version are you using? I've attempted to do this on GCP as well as my local machines.

Expected performance

If we run the three experiments from the README:

# Exp. 1
./scripts/ptb_search.sh
./scripts/ptb_final.sh

# Exp. 2
./scripts/cifar10_macro_search.sh
./scripts/cifar10_macro_final.sh

# Exp 3.
./scripts/cifar10_micro_search.sh
./scripts/cifar10_micro_final.sh

what should we expect the final performance metrics to be? Are you able to post the expected results either here or in the README?

Thanks

ValueError: Unknown search_for macro

Hi,

When I try to run ./scripts/cifar10_macro_search.sh, I get the exception below:

Traceback (most recent call last):
  File "src/cifar10/main.py", line 359, in <module>
    tf.app.run()
  File "/home//.envs/py27/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "src/cifar10/main.py", line 355, in main
    train()
  File "src/cifar10/main.py", line 223, in train
    ops = get_ops(images, labels)
  File "src/cifar10/main.py", line 169, in get_ops
    num_replicas=FLAGS.controller_num_replicas)
  File "/home/***/Downloads/enas/src/cifar10/general_controller.py", line 81, in __init__
    self._build_sampler()
  File "/home/***/Downloads/enas/src/cifar10/general_controller.py", line 163, in _build_sampler
    raise ValueError("Unknown search_for {}".format(self.search_for))
ValueError: Unknown search_for macro

I am with Ubuntu 14.04, Python 2.7 and Tensorflow 1.3.0.

UnboundLocalError

I am attempting to use an architecture from cifar10_macro_search and am getting this error. Please advise.


Path outputs exists. Remove and remake.

Logging to outputs/stdout

batch_size...................................................................100
child_block_size...............................................................3
child_cutout_size...........................................................None
child_drop_path_keep_prob....................................................1.0
child_filter_size..............................................................5
child_fixed_arc..............5 2 1 0 0 1 0 0 0 1 3 1 0 1 1 4 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 5 0 0 1 0 0 1 0 0 4 0 0 0 1 0 1 0 1 0 4 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 1
child_grad_bound.............................................................5.0
child_keep_prob..............................................................0.5
child_l2_reg..............................................................0.0002
child_lr.....................................................................0.1
child_lr_T_0..................................................................10
child_lr_T_mul.................................................................2
child_lr_cosine.............................................................True
child_lr_dec_every...........................................................100
child_lr_dec_rate............................................................0.1
child_lr_max................................................................0.05
child_lr_min...............................................................0.001
child_num_aggregate.........................................................None
child_num_branches.............................................................4
child_num_cells................................................................5
child_num_layers..............................................................12
child_num_replicas.............................................................1
child_out_filters.............................................................48
child_out_filters_scale........................................................1
child_skip_pattern..........................................................None
child_sync_replicas........................................................False
child_use_aux_heads.........................................................True
controller_bl_dec...........................................................0.99
controller_entropy_weight.................................................0.0001
controller_forwards_limit......................................................2
controller_keep_prob.........................................................0.5
controller_l2_reg............................................................0.0
controller_lr..............................................................0.001
controller_lr_dec_rate.......................................................1.0
controller_num_aggregate......................................................20
controller_num_replicas........................................................1
controller_op_tanh_reduce....................................................2.5
controller_search_whole_channels............................................True
controller_skip_target.......................................................0.4
controller_skip_weight.......................................................0.8
controller_sync_replicas....................................................True
controller_tanh_constant.....................................................1.5
controller_temperature......................................................None
controller_train_every.........................................................1
controller_train_steps........................................................50
controller_training........................................................False
controller_use_critic......................................................False
data_format.................................................................NCHW
data_path...........................................................data/cifar10
eval_every_epochs..............................................................1
log_every.....................................................................50
num_epochs...................................................................310
output_dir...............................................................outputs
reset_output_dir............................................................True
search_for.................................................................macro

Reading data
data_batch_1
data_batch_2
data_batch_3
data_batch_4
data_batch_5
test_batch
Prepropcess: [subtract mean], [divide std]
mean: [125.30723 122.95053 113.86535]
std: [62.993362 62.088478 66.70482 ]

Build model child
Build data ops

Build train graph
Traceback (most recent call last):
  File "src/cifar10/main.py", line 359, in <module>
    tf.app.run()
  File "/home/zachary_swartz/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "src/cifar10/main.py", line 355, in main
    train()
  File "src/cifar10/main.py", line 223, in train
    ops = get_ops(images, labels)
  File "src/cifar10/main.py", line 190, in get_ops
    child_model.connect_controller(None)
  File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 705, in connect_controller
    self._build_train()
  File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 595, in _build_train
    logits = self._model(self.x_train, is_training=True)
  File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 212, in _model
    x = self._fixed_layer(layer_id, layers, start_idx, out_filters, is_training)
  File "/home/zachary_swartz/enas/src/cifar10/general_child.py", line 481, in _fixed_layer
    return out
UnboundLocalError: local variable 'out' referenced before assignment
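The traceback has the classic shape of an if/elif chain that only assigns out for the operation ids it implements; a generic illustration of the mechanism (not the repo's code) is below. If the architecture contains an id with no matching branch, for example a pooling op that _fixed_layer does not handle, the function reaches the return with out never assigned:

def fixed_op(op_id, x):
  # Only op ids 0-3 are handled; an unhandled id (say a pooling op, 4 or 5)
  # falls through every branch, `out` is never assigned, and the return line
  # raises: UnboundLocalError: local variable 'out' referenced before assignment
  if op_id == 0:
    out = x + 1
  elif op_id == 1:
    out = x * 2
  elif op_id == 2:
    out = x - 1
  elif op_id == 3:
    out = x * 3
  return out

fixed_op(5, 1.0)   # reproduces the UnboundLocalError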

About training procedure

Hi,

I am wondering what the training procedure is; in your code, once the controller is connected to the child_model, it confuses me.

May I interpret it like the pseudocode below?

for epoch in range(epoches):
    arc_seq = controller.sample_arc_seq()
    child_model.use_arc(arc_seq)
    for train_data in train_data_set:
        loss = train(child_model, train_data)
    for step in range(ctr_steps):
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        accuracy = eval(child_model, eval_data_set)
        controller.update(accuracy, arc_seq)

Or it should be something like this:

for epoch in range(epoches):
    for train_data in train_data_set:
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        loss = train(child_model, train_data)
    for step in range(ctr_steps):
        arc_seq = controller.sample_arc_seq()
        child_model.use_arc(arc_seq)
        accuracy = eval(child_model, eval_data_set)
        controller.update(accuracy, arc_seq)

May I know which one is correct?

Best,
zmonoid

Expected performance-2

Thank you for your work.
After I tried some experiments, I have some questions.
For cifar10, there are two search spaces and corresponding experiments:
----./scripts/cifar10_macro_search.sh ./scripts/cifar10_macro_final.sh
----./scripts/cifar10_micro_search.sh ./scripts/cifar10_micro_final.sh
1> I want to confirm the difference between the architectures produced by cifar10_macro_search.sh and cifar10_micro_search.sh. (one is )
2> I found the architectures produced by cifar10_macro_search.sh like
[1]
[1 1]
[5 0 0]
[5 0 0 0]
[0 0 1 1 0]
[1 1 0 0 0 0]
[1 1 0 1 1 1 0]
[3 0 0 1 0 1 1 1]
[5 0 0 1 0 0 1 0 0]
[1 1 1 0 0 0 0 1 0 0]
[0 1 1 0 0 0 0 1 1 1 1]
[0 0 1 1 1 1 0 1 0 0 1 1], which has 12 cells, while in cifar10_macro_final.sh the architecture is
fixed_arc="0"
fixed_arc="$fixed_arc 3 0"
fixed_arc="$fixed_arc 0 1 0"
fixed_arc="$fixed_arc 2 0 0 1"
fixed_arc="$fixed_arc 2 0 0 0 0"
fixed_arc="$fixed_arc 3 1 1 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1"
fixed_arc="$fixed_arc 2 0 1 1 0 1 1 1"
fixed_arc="$fixed_arc 1 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 0 0 0 0 0 0 0 0 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1 0 0 0 0"
fixed_arc="$fixed_arc 0 1 0 0 1 1 0 0 0 0 1 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 0 0 1 0 1 1 0"
fixed_arc="$fixed_arc 1 0 0 1 0 0 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0"
fixed_arc="$fixed_arc 2 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 1 0 0 1 1 1 1 0 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1"
fixed_arc="$fixed_arc 3 0 1 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0"
fixed_arc="$fixed_arc 3 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1"
fixed_arc="$fixed_arc 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0"
fixed_arc="$fixed_arc 3 0 1 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0"
fixed_arc="$fixed_arc 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0"
fixed_arc="$fixed_arc 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 0 0", which has 24 cells.
So I want to know where to get an architecture like the one in cifar10_macro_final.sh.
Thanks for your response.

more GPUs and visualization

@hyhieu Hi, thank you for your work. I worked out how to get 24 cells with your help last time. Now I have another two questions. First, how can I run this code with 2 or more GPUs on one computer? Second, which file are the architectures from ./scripts/cifar10_micro_search.sh saved in, and could you provide a method to visualize them?

The output directory after I run ./scripts/cifar10_micro_search.sh contains the following:
output:
checkpoint
events.out.tfevents.1523973322.sugon-W580-G20
graph.pbtxt
model.ckpt-108769.data-00000-of-00001
model.ckpt-108769.index
model.ckpt-108769.meta
model.ckpt-109120.data-00000-of-00001
model.ckpt-109120.index
model.ckpt-109120.meta
stdout

change dataset to cifar100

@hyhieu Hi, thank you for all your work.
Now I want to replace the cifar10 dataset with cifar100 in order to use this code to find and tune CNN cells and architectures. But I can't find the interface between the dataset and the network. Would you please give some advice? Thank you very much.

Errors in attempt at reproducing micro search

I tried running micro search on TF 1.7 and it made quite a bit of progress, up to 150 epochs, but then it failed out as follows:

[1 2 1 1 1 3 0 2 2 0 1 1 1 1 1 4 1 4 1 4]
val_acc=0.7750
--------------------------------------------------------------------------------
[0 0 1 0 0 4 0 1 0 4 1 1 1 4 0 1 0 1 5 2]
[0 1 1 0 1 1 1 0 1 2 1 3 1 0 3 3 1 0 2 4]
val_acc=0.6813
--------------------------------------------------------------------------------
[0 1 1 0 0 0 0 0 0 0 1 1 4 0 0 0 0 0 1 1]
[1 0 1 2 1 1 1 1 1 0 1 3 3 0 2 0 1 0 1 1]
val_acc=0.7312
--------------------------------------------------------------------------------
[0 1 0 4 0 0 0 2 1 0 1 3 1 0 3 0 1 1 1 1]
[1 0 1 0 1 1 1 1 1 4 1 1 1 1 1 0 3 4 1 4]
val_acc=0.7188
--------------------------------------------------------------------------------
[0 0 0 2 1 0 1 0 1 4 0 3 0 1 1 0 0 1 4 2]
[0 4 1 1 1 4 1 1 1 1 1 0 1 0 1 2 1 1 1 2]
val_acc=0.7250
--------------------------------------------------------------------------------
Epoch 150: Eval
Eval at 42300
valid_accuracy: 0.6946
Eval at 42300
test_accuracy: 0.6842
Exception in thread QueueRunnerThread-dummy_queue-sync_token_q_EnqueueMany:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 268, in _run
    coord.request_stop(e)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 213, in request_stop
    six.reraise(*sys.exc_info())
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
CancelledError: TakeGrad operation was cancelled
         [[Node: sync_replicas/AccumulatorTakeGradient = AccumulatorTakeGradient[_class=["loc:@sync_replicas/conditional_accumulator"], dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](sync_replicas/conditional_accumulator, sync_replicas/AccumulatorTakeGradient/num_required)]]
         [[Node: sync_replicas/AccumulatorTakeGradient_2/_16859 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_93_sync_replicas/AccumulatorTakeGradient_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I didn't take any steps to cancel it, such as hitting Ctrl+C, so I'm not sure why this is occurring.

When does the Controller sample normal_arc and reduce_arc?

Hi, thanks for your work. I find that the controller samples a new normal_arc and reduce_arc for every child step in your code; is that right? Can you explain why? Did you try using a sampled normal_arc and reduce_arc to train the child model for a few steps, and only then sampling a new normal_arc and reduce_arc?

Question about selecting architectures

After running cifar10_micro_search.sh, I got a bunch of architectures and their accuracies, then I selected an architecture with a relatively higher accuracy and retrained it from scratch to get a high accuracy. That is what we should do, right?
The question is that even if I choose a low-accuracy architecture, or a random architecture, I can still get a high accuracy after retraining it from scratch; they are all about 96%. There seems to be little or even no difference between different architectures. Did anyone meet the same issue? Or is that because there is something I did wrong?

Hyperparameter choices

Hi --

Are you able to explain a little how you selected the various hyperparameters? I see that they are a) different from those mentioned in the paper and b) different between the *_search and *_final stages.

~ Ben

Difference between fixed and searched path in reduction cells?

In this code:

            if self.fixed_arc is None:
              x = self._factorized_reduction(x, out_filters, 2, is_training)
              layers = [layers[-1], x]      # <<<<<<< HERE
              x = self._enas_layer(
                layer_id, layers, self.reduce_arc, out_filters)
            else:
              x = self._fixed_layer(
                layer_id, layers, self.reduce_arc, out_filters, 2, is_training,
                normal_or_reduction_cell="reduction")

On line 3 (marked HERE above), this gives layers = [layers[-1], _factorized_reduction(layers[-1])]. Doesn't this make it inconsistent with the fixed cell?

2 bugs for macro_search

  1. In cifar10_macro_search.sh, the "child_num_cell_layers" flag should be "child_num_cells".
  2. In the _model function of general_child.py,
    start_idx = self.num_branches is not correct. When searching partial channels, start_idx should also be 0, since the controller inserts no corresponding sequence into arc_seq/self.sample_arc.

Question about new searched rnn cell....

Thank you for opening up your work.
Now I have tried some experiments.
I tried the search part for the RNN cell, then retrained the searched RNN cell.
However, I can't reach your paper's result; the performance is very low.
My results are below:
cell : 0 0 0 0 2 0 0 0 0 0 0 3 2 4 0 0 1 7 0 5 0 10 0 -> test ppl : 64.71
cell : 0 0 3 0 1 0 3 0 3 0 1 0 3 0 3 7 0 6 0 9 0 10 0 -> test ppl : 69.66
I checked your fixed architecture; it reaches 55.8 ppl.
cell : 0 0 0 1 1 2 1 2 0 2 0 5 1 1 0 6 1 8 1 8 1 8 1 -> test ppl : 55.8
Why is the result different between your searched cell and my searched cell?
In the final training, how should the hyperparameters be adjusted?

How does parameter sharing work

Sorry that this isn't actually an issue with the code, but just a question which I'm unable to figure out by reading the paper and code.

I'm trying to understand how the parameter sharing works in ENAS. The first two questions are there partially to answer the third main question.

  1. Are all nodes only used ONCE during macro search?
  2. For macro search, will all the nodes definitely link to its previous node? (seems so with inputs=prev_layers[-1])
  3. How are the parameters shared? Does each operation have its own weights, which are always loaded when called (e.g. Conv2D 3x3 has a weight that is applied each time it's called)? If this is the case, then which weight is updated and memorized during training, assuming multiple instances of the same operation are used?
    Or are there weights for each unique connection, e.g. Node1 to Node3 (W13) has one weight set and Node2 to Node3 (W23) has another? If so, then how does it handle cases where there are skip connections (e.g. Node1 and Node2 are concatenated, which are then passed to Node3; will it have W12-3?)? (A sketch of the first reading follows below.)
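For what it's worth, my reading of question 3 (an assumption on my part, not something confirmed by the authors) is the first one: each operation at each position owns a single set of weights created under a fixed variable scope, so every sampled child that picks that operation at that position reuses, and keeps updating, the same variables. A toy sketch of that mechanism:

import tensorflow as tf

def conv3x3(x, scope):
  # Every sampled child that places "conv3x3" at this position reuses the same
  # variable, so training any one child updates the shared weights.
  with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
    w = tf.get_variable("w", [3, 3, 16, 16])       # assumes 16 input channels
  return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

x = tf.zeros([1, 8, 8, 16])
a = conv3x3(x, "layer_0_conv3x3")   # one sampled child picks conv3x3 at layer 0
b = conv3x3(x, "layer_0_conv3x3")   # a different child picking the same op there
print(tf.trainable_variables())     # only one layer_0_conv3x3/w variable exists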

Questions about your paper

These aren't issues with the code here as much as questions I had after reading your paper; feel free to close this issue if you don't want to discuss such things here.

  1. In section 2.1 you say

    for each pair of nodes $j < \ell$, there is an independent parameter matrix $W^{(h)}_{\ell, j}$.

    but then in section 2.3 it says

    As for recurrent cells, each operation at each layer in our ENAS convolutional network has a distinct set of parameters.

    (emphasis added). Just to be clear - from the RNN section, it seemed that there was only one weight matrix per pair of nodes, but there's actually one per activation function per pair of nodes?

  2. In the RNNs, is node 1 the only node that can access $x_t, h_{t - 1}$? Or could later nodes output a certain index that corresponds to either of this? (this doesn't happen in any of the shown examples, but I wanted to make sure that this isn't possible)

    • EDIT - just looked through the appendix which affirms that only the first node in an RNN cell has access to $x_t, h_{t - 1}$.
  3. In the CNN section, from Figure 3, node 4 (the conv 5x5 layer) should take as input the outputs of nodes 1 and 3. However, it seems to take the concatenation of nodes 1, 2, and 3. Am I misunderstanding how selecting the nodes works? For ease, here's the relevant figure:

  4. A clarification about the training procedure to see if I understand correctly: you sample just one model from the controller (e.g. one RNN), which you then train on one pass through the training data, and finally you do some number (e.g. 2000) of update steps on the controller using REINFORCE? You're just updating based on the performance of the single model trained, so wouldn't each of these steps be the same (so taking 2000 of them is like taking one step that's 2000 times as large)? Am I missing something about the controller update part?

If you could help clear some confusion about any or all of these points, I'd appreciate it! Overall, certainly a good paper, and thanks for providing the code as well!

Error when running the cifar10_macro_final code

When the fixed_arc count is larger than 3:
File "src/cifar10/main.py", line 361, in
tf.app.run()
File "/data1/winterhuang/huangweidong/tools/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "src/cifar10/main.py", line 357, in main
train()
File "src/cifar10/main.py", line 225, in train
ops = get_ops(images, labels)
File "src/cifar10/main.py", line 190, in get_ops
child_model.connect_controller(None)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 708, in connect_controller
self._build_train()
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 598, in _build_train
logits = self._model(self.x_train, is_training=True)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 212, in _model
x = self._fixed_layer(layer_id, layers, start_idx, out_filters, is_training)
File "/data1/winterhuang/automl/enas/src/cifar10/general_child.py", line 468, in _fixed_layer
prev = res_layers + [out]
UnboundLocalError: local variable 'out' referenced before assignment

InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC. How can I solve this issue?

Caused by op u'child/layer_3/pool_at_3/from_0/AvgPool_1', defined at:
  File "main.py", line 359, in <module>
    tf.app.run()
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 355, in main
    train()
  File "main.py", line 223, in train
    ops = get_ops(images, labels)
  File "main.py", line 171, in get_ops
    child_model.connect_controller(controller_model)
  File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 705, in connect_controller
    self._build_train()
  File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 595, in _build_train
    logits = self._model(self.x_train, is_training=True)
  File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 222, in _model
    layer, out_filters, 2, is_training)
  File "/home/sileihe/enas-master/src/cifar10/general_child.py", line 165, in _factorized_reduction
    path2, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1930, in avg_pool
    name=name)
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 68, in _avg_pool
    data_format=data_format, name=name)
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/sileihe/anaconda3/envs/test_py2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Default AvgPoolingOp only supports NHWC.
[[Node: child/layer_3/pool_at_3/from_0/AvgPool_1 = AvgPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
