yuhuixu1993 / pc-darts
PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search
I see the code sets 0.2 at test time.
@yuhuixu1993 hi, I just noticed that you dropped the '--unrolled' flag from the 'python train_search.py' command in the README. I want to know whether this was a careless omission or intentional. After all, the derivation of the architecture parameters will be incomplete without '--unrolled'.
Thank you for releasing the code! I noticed that you replaced "input_search, target_search = next(iter(valid_queue))" — why is the new version much faster? And why is the code in the "try" block "next(valid_queue_iter)" instead of "next(valid_queue)"? Hoping for your reply!
try:
    input_search, target_search = next(valid_queue_iter)
except:
    valid_queue_iter = iter(valid_queue)
    input_search, target_search = next(valid_queue_iter)
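For context, a minimal sketch of the two patterns (the dataset and batch size here are made up): recreating the iterator on every step is what makes next(iter(valid_queue)) slow, and the try/except simply restarts a persistent iterator once it is exhausted.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the validation split used during search.
data = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))
valid_queue = DataLoader(data, batch_size=4)

# Slow pattern: iter(valid_queue) builds a brand-new iterator on every
# training step (and, with num_workers > 0, respawns the worker processes).
input_search, target_search = next(iter(valid_queue))

# Faster pattern: keep one iterator alive and rebuild it only when exhausted.
# The try-block must call next(valid_queue_iter), not next(valid_queue),
# because a DataLoader itself is not an iterator; next() on it raises TypeError.
valid_queue_iter = iter(valid_queue)

def next_valid_batch():
    global valid_queue_iter
    try:
        return next(valid_queue_iter)
    except StopIteration:
        valid_queue_iter = iter(valid_queue)
        return next(valid_queue_iter)
```

With 10 samples and batch size 4, the persistent iterator yields batches of 4, 4, 2 and then transparently restarts.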
"To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters." I have two questions:
Is the ImageNet dataset you used ILSVRC2012 (the compressed archive is roughly 150 GB)? What hardware configuration did you use to search for the best model on ImageNet, and how long did it take in total?
When I run train_search.py, it creates many folders whose names start with "search" (logged as "experiment dir: search..."). Only one folder is used to save the log; the logs in the other folders are empty. I can't find the command that creates these folders. How can I disable this?
Hi, thanks for your great work. When I search on ImageNet with 4 GPUs, the GPU utilization is really low (<20%). How can I fix this?
Thanks in advance.
Hello @yuhuixu1993,
Could you release a script for preparing the "10% and 2.5%" data subsets, or the filename lists of those data?
Thank you so much.
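The paper does not ship the file lists, but a per-class sampler along these lines is a plausible reconstruction; the train_dir/<class>/<image> layout, the fractions, and the seed are all assumptions, not the authors' actual script.

```python
import os
import random

def sample_subsets(train_dir, frac_train=0.10, frac_val=0.025, seed=0):
    """Per-class random split of an ImageNet-style tree (train_dir/<class>/<img>)
    into a weight-training subset (10%) and a disjoint architecture-update
    subset (2.5%), roughly matching the paper's description."""
    rng = random.Random(seed)
    train_list, val_list = [], []
    for cls in sorted(os.listdir(train_dir)):
        images = sorted(os.listdir(os.path.join(train_dir, cls)))
        rng.shuffle(images)
        n_train = int(len(images) * frac_train)
        n_val = int(len(images) * frac_val)
        train_list += [(cls, f) for f in images[:n_train]]
        val_list += [(cls, f) for f in images[n_train:n_train + n_val]]
    return train_list, val_list
```

Sampling per class keeps both subsets class-balanced, which matters when the validation subset is as small as 2.5%.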
"PC-DARTS allows a direct search on ImageNet (while DARTS failed due to low
stability), and achieves a state-of-the-art top-1 error of 24.2% (under the mobile setting) with only
3.8 GPU-days (11.5 hours) on 8 GPUs for search". What does this sentence mean?
Does it mean the total cost is 3.8 GPU-days, i.e. about 11.5 hours of wall-clock time when 8 GPUs are used?
The following description is extracted from Section 4.2:
we freeze network hyper-parameters and
only allow network parameters to be tuned in the first 15 epochs
I guess the authors meant that the architecture hyper-parameters are allowed to be tuned only after the first 15 epochs, which is consistent with the code.
Has anybody run the code on CIFAR-10?
My valid_acc on CIFAR-10 is only 97.06.
I just ran
python train.py --auxiliary --cutout
and set the batch_size to 128 (default 96).
So, what is the problem with my experiment? Thanks.
@yuhuixu1993 Hi, thanks for your paper and project. I have studied the DARTS series for a while and was confused about the skip_connect issue (skip_connect appears more and more often during the search process). I found that in your paper (PC-DARTS), you use 'consistent' to explain the advantage of weight-free ops. How should I understand the 'consistent' you mention; does it mean that the outputs across iterations are consistent? And how does this consistency bring advantages; does it have some impact on the direction of gradient descent during backpropagation?
Looking forward to your explanation. Sincerely
@yuhuixu1993 thanks for your last timely reply. I am now running the search on ImageNet, using 100% of ImageNet (batch size = 1024 on 8 V100s). Two days have passed, and at epoch 5 the log shows train_acc 3.305580. Is that right? I have another question: I saw in your paper "Still, a total of 50 epochs are trained and architecture hyper-parameters are frozen during the first 35 epochs." I am a little confused about this step.
Hi, I want to know how many epochs of training are needed to reach the 2.5% error on CIFAR-10?
Thank you very much!
Hi:
I have 8 GPUs in one machine. If I use model = nn.DataParallel(model) in train_search.py, it doesn't work. How can I train PC-DARTS using 8 GPUs?
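One common reason naive wrapping fails (a generic sketch, not the repo's actual multi-GPU fix): after nn.DataParallel, custom methods such as the arch_parameters() accessor that train_search.py relies on live on model.module, not on the wrapper. SearchNet below is an illustrative stand-in for the search model.

```python
import torch
import torch.nn as nn

class SearchNet(nn.Module):
    """Illustrative stand-in for the search network: it carries architecture
    parameters and exposes them via a custom accessor, as model_search.py does."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self._alphas = nn.Parameter(1e-3 * torch.randn(3))
    def forward(self, x):
        return self.fc(x)
    def arch_parameters(self):
        return [self._alphas]

model = nn.DataParallel(SearchNet())
# model.arch_parameters() would raise AttributeError: DataParallel only
# forwards forward(); custom methods must go through model.module.
alphas = model.module.arch_parameters()
```

So any call site that touches architecture parameters, genotype export, etc. has to be rewritten to go through model.module once the model is wrapped.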
hi, @yuhuixu1993, thank you for sharing your amazing work.
I have tried your code to search for a network on CIFAR-10, and I followed the instruction 'python train_search.py' to run the code; however, the search procedure ended with a CUDA out-of-memory error, i.e.,
09/01 01:55:07 PM train 000 2.426532e-01 90.234375 100.000000
09/01 01:57:19 PM train 050 2.439179e-01 91.590074 99.862132
09/01 01:59:23 PM train_acc 91.776000
09/01 01:59:23 PM epoch 49 lr 1.000000e-03
09/01 01:59:23 PM genotype = Genotype(normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], normal_concat=range(2, 6), reduce=[('skip_connect', 1), ('max_pool_3x3', 0), ('sep_conv_5x5', 1), ('dil_conv_5x5', 0), ('sep_conv_5x5', 1), ('sep_conv_5x5', 2), ('sep_conv_5x5', 2), ('sep_conv_5x5', 1)], reduce_concat=range(2, 6))
tensor([[0.1646, 0.1040, 0.1085, 0.1185, 0.1612, 0.1110, 0.1120, 0.1202],
[0.1498, 0.1149, 0.1231, 0.1378, 0.0881, 0.1234, 0.1322, 0.1307],
[0.1124, 0.1017, 0.1039, 0.1098, 0.1700, 0.1085, 0.1513, 0.1423],
[0.1205, 0.1086, 0.1074, 0.1125, 0.1670, 0.1348, 0.1133, 0.1358],
[0.1754, 0.0947, 0.0893, 0.1235, 0.1288, 0.1079, 0.1647, 0.1156],
[0.1414, 0.0955, 0.0984, 0.1059, 0.1967, 0.1468, 0.0984, 0.1169],
[0.1193, 0.1157, 0.1068, 0.1123, 0.2055, 0.1308, 0.1107, 0.0988],
[0.1533, 0.1183, 0.1268, 0.1451, 0.1239, 0.1101, 0.1156, 0.1068],
[0.1498, 0.1063, 0.1073, 0.1328, 0.1361, 0.1421, 0.1167, 0.1089],
[0.1132, 0.1281, 0.1156, 0.1162, 0.1608, 0.1445, 0.1159, 0.1056],
[0.1024, 0.1174, 0.1393, 0.1294, 0.1359, 0.1527, 0.1220, 0.1010],
[0.1306, 0.1090, 0.1147, 0.1288, 0.1520, 0.1055, 0.1442, 0.1152],
[0.1462, 0.0942, 0.0924, 0.1288, 0.1084, 0.1127, 0.1749, 0.1425],
[0.1416, 0.0991, 0.0974, 0.1282, 0.1396, 0.1392, 0.1244, 0.1306]],
device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([[0.1148, 0.1495, 0.1392, 0.1084, 0.1359, 0.1406, 0.1165, 0.0951],
[0.1327, 0.1076, 0.0993, 0.1602, 0.1379, 0.1272, 0.1087, 0.1264],
[0.1164, 0.1317, 0.1378, 0.1009, 0.1293, 0.1330, 0.1105, 0.1405],
[0.1259, 0.1005, 0.1057, 0.1288, 0.1212, 0.1532, 0.1372, 0.1273],
[0.1387, 0.0891, 0.0919, 0.1215, 0.1347, 0.1005, 0.1860, 0.1375],
[0.1181, 0.1241, 0.1199, 0.1367, 0.1470, 0.1228, 0.1066, 0.1249],
[0.1344, 0.1070, 0.1094, 0.1165, 0.1187, 0.1590, 0.1319, 0.1233],
[0.1372, 0.1035, 0.1067, 0.1308, 0.1076, 0.1623, 0.1355, 0.1164],
[0.1400, 0.1138, 0.1198, 0.1409, 0.1042, 0.1349, 0.1158, 0.1307],
[0.1309, 0.1175, 0.1291, 0.1270, 0.1265, 0.1102, 0.1504, 0.1083],
[0.1337, 0.1022, 0.1076, 0.1221, 0.1369, 0.1550, 0.1146, 0.1279],
[0.1339, 0.0890, 0.0939, 0.1212, 0.1117, 0.1830, 0.1362, 0.1313],
[0.1431, 0.1034, 0.1148, 0.1350, 0.1092, 0.1270, 0.1375, 0.1300],
[0.1396, 0.0920, 0.0998, 0.1267, 0.1404, 0.0974, 0.1373, 0.1669]],
device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([0.3777, 0.3653, 0.2571], device='cuda:1', grad_fn=<SoftmaxBackward>)
09/01 01:59:27 PM train 000 2.647865e-01 89.843750 100.000000
09/01 02:01:39 PM train 050 2.318434e-01 91.980699 99.892770
09/01 02:03:43 PM train_acc 91.660000
09/01 02:03:43 PM valid 000 5.367675e-01 83.593750 99.218750
Traceback (most recent call last):
File "train_search.py", line 206, in <module>
main()
File "train_search.py", line 130, in main
valid_acc, valid_obj = infer(valid_queue, model, criterion)
File "train_search.py", line 190, in infer
logits = model(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 159, in forward
s0, s1 = s1, cell(s0, s1, weights,weights2)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in forward
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in <genexpr>
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 44, in forward
temp1 = sum(w * op(xtemp) for w, op in zip(weights, self._ops))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 15.29 GiB already allocated; 15.56 MiB free; 7.17 MiB cached)
Then I continued by running the command 'python train.py --auxiliary --cutout', and training the searched model also raised an OOM error, i.e.,
➜ pc-darts python train.py --auxiliary --cutout --gpu 1
Experiment dir : eval-EXP-20190901-143341
09/01 02:33:41 PM gpu device = 1
09/01 02:33:41 PM args = Namespace(arch='PCDARTS', auxiliary=True, auxiliary_weight=0.4, batch_size=96, cutout=True, cutout_length=16, data='../data', drop_path_prob=0.3, epochs=600, gpu=1, grad_clip=5, init_channels=36, layers=20, learning_rate=0.025, model_path='saved_models', momentum=0.9, report_freq=50, save='eval-EXP-20190901-143341', seed=0, set='cifar10', weight_decay=0.0003)
108 108 36
108 144 36
144 144 36
144 144 36
144 144 36
144 144 36
144 144 72
144 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 144
288 576 144
576 576 144
576 576 144
576 576 144
576 576 144
576 576 144
09/01 02:33:44 PM param size = 3.634678MB
Files already downloaded and verified
Files already downloaded and verified
/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
09/01 02:33:46 PM epoch 0 lr 2.499983e-02
train.py:136: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
nn.utils.clip_grad_norm(model.parameters(), args.grad_clip)
09/01 02:33:48 PM train 000 3.258214e+00 8.333333 50.000000
09/01 02:34:18 PM train 050 3.215875e+00 13.623365 56.638069
09/01 02:34:49 PM train 100 3.148910e+00 15.459983 61.984321
09/01 02:35:19 PM train 150 3.054110e+00 18.329194 67.335814
09/01 02:35:49 PM train 200 2.970589e+00 20.677860 71.035445
09/01 02:36:19 PM train 250 2.899201e+00 22.705012 73.725927
09/01 02:36:50 PM train 300 2.842999e+00 24.228266 75.633303
09/01 02:37:20 PM train 350 2.789153e+00 25.741927 77.148620
09/01 02:37:50 PM train 400 2.736966e+00 27.153469 78.514648
09/01 02:38:20 PM train 450 2.694277e+00 28.466832 79.557000
09/01 02:38:50 PM train 500 2.656195e+00 29.663588 80.561790
09/01 02:39:03 PM train_acc 30.111999
09/01 02:39:03 PM valid 000 1.350200e+00 52.083332 91.666664
Traceback (most recent call last):
File "train.py", line 177, in <module>
main()
File "train.py", line 113, in main
valid_acc, valid_obj = infer(valid_queue, model, criterion)
File "train.py", line 161, in infer
logits, _ = model(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model.py", line 150, in forward
s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model.py", line 51, in forward
h1 = op1(h1)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/operations.py", line 66, in forward
return self.op(x)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 343, in forward
return self.conv2d_forward(input, self.weight)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 340, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 14.26 GiB already allocated; 1.56 MiB free; 1.05 GiB cached)
In addition, my environment is ``Ubuntu 16.04 + CUDA 10.0 + Python 3.7 + PyTorch 1.2''.
Since I am new to NAS, I cannot figure out what causes the OOM error. Could you help fix this error or give some suggestions? Thanks.
This is a very effective method. After reading through the whole framework, I have two questions:
1. How is the Genotype updated? In model_search.py, a new Genotype is generated, but how is this new Genotype used in the next iteration?
2. Each time, only 1/4 of the channels of X are operated on, and the remaining 3/4 are passed through directly. Can this be regarded as a skip_connection?
Thanks for your work.
I was wondering how to save the network that has been found by PC-DARTS. I see that at the beginning of training, utils.create_exp_dir creates a directory with all the .py scripts. However, it seems to me that the genotype of the new model is not saved.
Hi @yuhuixu1993, could you please make the pretrained ImageNet model public?
Hi @yuhuixu1993,
It seems the channel_shuffle function evenly redistributes, in a deterministic pattern, the channels that pass through the mixed operations (the first 1/K) and those that do not (the remaining 1-1/K). Although it can produce much more complex mixes in later nodes as more channel_shuffle calls contribute, there is no randomness involved. Is it possible for the network to get used to the pattern and eventually lose the regularization effect? Why didn't you apply random sampling? Thank you!
def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.data.size()
    channels_per_group = num_channels // groups
    # reshape
    x = x.view(batchsize, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten
    x = x.view(batchsize, -1, height, width)
    return x
In my toy example:
x = torch.Tensor([ [ [ [1,1,1], [1,1,1], [1,1,1] ],
[ [2,2,2], [2,2,2], [2,2,2] ],
[ [3,3,3], [3,3,3], [3,3,3] ],
[ [4,4,4], [4,4,4], [4,4,4] ],
[ [5,5,5], [5,5,5], [5,5,5] ],
[ [6,6,6], [6,6,6], [6,6,6] ],
[ [7,7,7], [7,7,7], [7,7,7] ],
[ [8,8,8], [8,8,8], [8,8,8] ] ] ])
the output of channel_shuffle(x, 4) is always
tensor([[[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]],
[[5., 5., 5.],
[5., 5., 5.],
[5., 5., 5.]],
[[7., 7., 7.],
[7., 7., 7.],
[7., 7., 7.]],
[[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]],
[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]],
[[6., 6., 6.],
[6., 6., 6.],
[6., 6., 6.]],
[[8., 8., 8.],
[8., 8., 8.],
[8., 8., 8.]]]])
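For comparison, random sampling along the lines the question suggests might look like this hypothetical variant. This is not what the released code does — model_search.py always takes the first num_channels // k channels and then applies the deterministic channel_shuffle quoted above.

```python
import torch

def random_channel_split(x, k=4):
    """Hypothetical alternative to the fixed first-1/K split: sample the
    1/K 'active' channels at random on each forward pass, returning the
    active and bypassed channel groups."""
    num_channels = x.size(1)
    perm = torch.randperm(num_channels)
    active = perm[: num_channels // k]
    inactive = perm[num_channels // k :]
    return x[:, active], x[:, inactive]
```

Unlike the fixed split, each call draws a fresh random mask, so no node can adapt to one deterministic channel pattern.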
When searching on ImageNet with seed 0, the network has 1 skip_connect and 645M FLOPs. With seed 1, the network has 0 skip_connect and 712M FLOPs. I also tested PC_DARTS_image; its FLOPs are 595M.
Testing on different datasets, it seems the FLOPs can be approximated by 700 - 50 * skip_connect (in M). Am I right? Sometimes max_pool and avg_pool show up in the normal cell; they also reduce FLOPs by about 50M each. If that's right, then since all works based on DARTS report FLOPs below 600M, most of them must have 2 or more skip_connects.
But when I search, I often get 0 or 1 skip_connect. I wonder if args.epochs should be increased? What number of epochs is suitable?
hi @yuhuixu1993, thanks for sharing your excellent work. I just wonder about the search cost when you search on ImageNet.
PC-DARTS uses 12.5% of the ImageNet images for searching in Section 4.3, so would the search cost be 3.8*8 = 30.4 GPU-days if 100% of the ImageNet images were used?
thx
Hi Yuhui,
After I search on the CIFAR-10 dataset, I get one type of genotype, which is similar to the one reported in your paper.
However, when I derive the final architecture and calculate the FLOPs and latency, something seems a little strange.
For example, I run
import torch
import genotypes
from model import NetworkCIFAR as Network
from ptflops import get_model_complexity_info  # assuming the ptflops package

genotype = eval("genotypes.%s" % "PCDARTS")
with torch.cuda.device(0):
    model = Network(36, 1000, 14, True, genotype)
    model.drop_path_prob = 0.3
    model.eval()
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print("{:<30} {:<8}".format("Computational complexity: ", flops))
    print("{:<30} {:<8}".format("Number of parameters: ", params))
The reported model complexity and number of parameters for the searched genotype (with 14 layers) are as follows:
Computational complexity: 20.11 GMac
Number of parameters: 4.3 M
But when I run the resnet50 for comparison:
import torch
from torchvision.models import resnet50
from ptflops import get_model_complexity_info

with torch.cuda.device(0):
    model = resnet50(pretrained=False)
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print('{:<30} {:<8}'.format('Computational complexity: ', flops))
    print('{:<30} {:<8}'.format('Number of parameters: ', params))
The reported model complexity and number of parameters for resnet50 are as follows:
Computational complexity: 4.12 GMac
Number of parameters: 25.56 M
The FLOPs reported in your paper for the ImageNet setting are 597M, so it seems there is something wrong with my derived final architecture. At your convenience, could you help clarify how to derive the final architecture? I am considering deploying the searched model on some hardware devices and trying to add some hardware-aware constraints to the overall design.
Additionally, the latency of the searched genotype (with 14 layers) is nearly ten times that of resnet50, which is unacceptable.
I am also an undergraduate from SJTU. Really thanks for your help. hahahaha
Thank you for this wonderful code. I have a question: in the paper's ImageNet experiments, you start with three conv layers to reduce the resolution. Why not use reduction cells to reduce the size?
In sec 4.1, you said:
instead choose the first K channels of xi for operation mixture directly. To compensate, after xj is obtained, we shuffle its channels before using it for further computations.
but I think it is not a random-sampling implementation, because choosing the first K channels and the channel shuffle are both deterministic operations.
I would be glad if you could tell me whether I have misunderstood the implementation.
Hello, thank you for sharing your code.
When looking into your code, I have a question about the implementation of your partial channel connection idea.
In your code (model_search.py), it seems that the "channel_shuffle" function, together with the "forward" function of the MixedOp class, only chooses the first quarter of the channels.
Does this mean that the channel sampling mask S_i,j defined in your paper is a fixed mask?
Please answer my question.
Thank you!
Tesla V100 which requires CUDA_VERSION >= 9000 for optimal performance and fast startup time, but your PyTorch(0.3.1) was compiled with CUDA_VERSION 8000.
Hello, we have sent you several emails before asking about this problem. We first tried using 10% of the training set for training and 2.5% for validation with a linear learning rate scheduler; with SGDM we trained for 250 epochs, but the result was horrible. Then we tried the whole ImageNet and got an even more horrible result. Later we changed the optimizer and used the whole ImageNet, but we still see a clear accuracy gap. Could you kindly give us more details about how you obtained the results in your paper? We really need your response, or we may have to follow our "wrong" configuration, do fair comparisons, and report our real observations to the conferences. Thank you so much!
Hi @yuhuixu1993,
Appreciate if you may reply to the following questions:
Best,
Bolian
In the paper, each ImageNet cell consists of N = 6 nodes. Why is the default node number in the code 4?
I executed 'python train.py --auxiliary --cutout' and got an error; the log shows:
File "train.py", line 72, in main
File "<string>", line 1, in <module>
AttributeError: module 'genotypes' has no attribute 'DARTS'
My PyTorch version is 1.1.0 and my Python version is Python 3.
Hello,
I would like to begin by thanking you for this work. As an intern, I am investigating PC-DARTS for my breast-cancer diagnosis task (classification of different cancers), and I am trying to find the best possible architecture for it.
Since all your tests were done on natural-image datasets, do you have any comments about using PC-DARTS on such medical datasets? Are there hyper-parameters we should choose carefully?
I ran some tests and am saturating at 58% on the validation set; would an end-to-end training tell me whether the cell configuration I got is bad?
Thanks in advance for answering.
Hello,
I want to load the found architecture in a different script. Is there a way to load the architecture structure (class) and the weight file externally, just by importing and using torch.load()?
Thanks
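Roughly, yes — but torch.load() alone is not enough: the repo saves a state_dict (tensors only), so the loading script must import the model class and rebuild it from the genotype before calling load_state_dict. A generic sketch with a stand-in module (TinyNet plays the role of model.NetworkCIFAR built from your genotype; the file path is made up):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for the repo's NetworkCIFAR(...): torch.load() on a weights file
# restores only tensors, so the class itself must be importable and
# re-instantiated before load_state_dict().
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
    def forward(self, x):
        return self.fc(x)

path = os.path.join(tempfile.mkdtemp(), "weights.pt")

# "Script A": the training side saves the state_dict.
model = TinyNet()
torch.save(model.state_dict(), path)

# "Script B": rebuild the architecture, then load the weights into it.
restored = TinyNet()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```

In the real repo, "Script B" would construct NetworkCIFAR from the genotype recorded in genotypes.py (or in the search log) and then load the trained weights the same way.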
Hello,
I am trying to find an architecture for the 3x320x320 input images in my dataset. I keep getting out-of-memory errors from CUDA. Is it possible to run this NAS method at this image resolution using only one Nvidia 1080 Ti (11 GB)?
Hi, thank you for your great contribution.
Could you please release the search code for ImageNet? I can hardly reproduce the results reported in the paper.
regards
hi, @yuhuixu1993, thanks for your good work.
You mentioned: "We use eight Tesla V100 GPUs for search, and the total batch size is 1,024. The entire search process takes around 11.5 hours."
I just want to know whether you have any plan to release your parallel train_search code for ImageNet. Thank you.
The model.pt file referenced in test.py cannot be found. I tried changing it to weights.pt, but that gives an out-of-memory error. What is the problem here? model.pt should be a file generated during training, right? But I cannot find it.
I ran the code as suggested but failed to get it running; I get an "out of memory" error.
Envs:
I use one GPU, a 1080 Ti with 11178 MiB.
I use Python 2.7 and PyTorch 1.0.
I simply downloaded the code and ran python train_search.py.
Could you please tell me where I went wrong?
I am getting 31% validation accuracy after searching directly on ImageNet. Is that the same as what you got? If not, what validation accuracy should I expect at the end of train_search_imagenet?
In train_search.py, you train the architect only after epoch 15, so is it necessary to fetch the validation input before epoch 15?
It would save time if the validation input were not fetched before epoch 15.
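The suggestion can be sketched as follows; the 15-epoch threshold matches train_search.py's warm-up, while the fetch counter and the dummy validation stream are illustrative stand-ins.

```python
import itertools

fetches = 0  # counts how many validation batches are actually drawn

def maybe_arch_step(epoch, valid_iter, warmup_epochs=15):
    """Skip fetching the validation batch during warm-up epochs, since the
    architecture update that consumes it does not run yet."""
    global fetches
    if epoch < warmup_epochs:
        return None            # no architecture update: no valid batch needed
    batch = next(valid_iter)   # fetch only when it will actually be used
    fetches += 1
    return batch               # would be passed on to the architect step

# Simulate 20 epochs with one step each over a dummy validation stream.
valid_iter = itertools.cycle([("input_search", "target_search")])
for epoch in range(20):
    maybe_arch_step(epoch, valid_iter)
```

Over 20 simulated epochs, only the last 5 draw a validation batch — the loading work for the first 15 is skipped entirely.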
Hello, @yuhuixu1993
Should line 130 of model_search_imagenet.py be changed to:
reduction_prev = False
Thanks,
Hi @yuhuixu1993 ,
I am trying to run train_search_imagenet,
but I find that the learning rate decay is weird.
If the initial_lr is 0.5 and there are 5 warm-up epochs,
the lr should be 0.1, 0.2, 0.3, 0.4, 0.5, 0.49xxx, 0.49xxxx, ...
However, the lr becomes 0.1, 0.04, 0.024, 0.018, ...
It seems that you multiply 1/5, 2/5, 3/5, ... onto the previous lr rather than onto the initial_lr of 0.5.
Am I right?
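The two readings of the warm-up can be checked numerically; this sketch only reproduces the arithmetic (initial_lr = 0.5 and 5 warm-up epochs, as in the issue), not the repo's actual scheduler code.

```python
def compounding_warmup(initial_lr=0.5, warmup=5):
    """Scale the *current* lr by (epoch+1)/warmup each epoch: the factors
    compound, reproducing the reported 0.1, 0.04, 0.024, ... sequence."""
    lr, out = initial_lr, []
    for epoch in range(warmup):
        lr = lr * (epoch + 1) / warmup   # multiplies the previous lr
        out.append(round(lr, 4))
    return out

def linear_warmup(initial_lr=0.5, warmup=5):
    """Scale the *initial* lr instead: the expected 0.1, 0.2, ..., 0.5 ramp."""
    return [round(initial_lr * (epoch + 1) / warmup, 4) for epoch in range(warmup)]
```

So the symptom in the issue is exactly what compounding against the previous lr produces, while scaling against initial_lr gives the linear ramp the questioner expected.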
When train_search.py starts, it keeps creating log.txt files, but in fact only the first one created is valid. Afterwards it keeps creating log files, and they are all empty.
Thanks for your awesome work.
I notice that some code is used to keep the number of iterations the same between training and search:
assert train_iters == valid_iters
I can't understand it. Is it a convention in NAS?
Hi,
I am wondering why, in the search stage, we begin to update the architecture only after 15 epochs. What is the principal reason?
If not working with CIFAR-10 or ImageNet, can this particular epoch number change? If yes, what does it depend on?
Thanks in advance
Hello, when trying to reproduce your result on CIFAR-10 with your code, I searched 4 times and trained each result with your code for 600 epochs; the best accuracies on the validation set were 96.77, 97.32, 97.35, and 97.21 respectively. But in the paper you claim the accuracy on the test set is 97.43±0.07. Obviously there is a significant gap here. Why does this happen? Hope to get your response, thank you!
First of all, thank you for your great work.
I have a gender-recognition project. The dataset I am using is CelebA. I divide the dataset into female and male according to the labels, then follow the parameters you used to search on ImageNet. As mentioned in the paper, during the first 35 of 50 epochs only the weights of the network are updated while the architecture is not changed; in the last 15 epochs, the architecture search is performed.
However, the results show that after 35 epochs, the training/validation/test accuracies decrease dramatically. The architecture search does not bring any improvement but is harmful to the accuracy. The figure of the accuracy is shown below.
The parameters I used are shown below; any parameters not shown are the same as the defaults in train_search_imagenet.py.