yuhuixu1993 / pc-darts
PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search
I see the code sets 0.2 at test time.
@yuhuixu1993 hi, I just noticed that you dropped the '--unrolled' flag from the 'python train_search.py' command in the README. I want to know whether this was a careless omission or intentional. After all, the derivation of the architecture parameters will be incomplete without '--unrolled'.
Thank you for releasing the code! I noticed that you replaced "input_search, target_search = next(iter(valid_queue))" — why is the new version much faster? And why is the code in the "try" block "next(valid_queue_iter)" instead of "next(valid_queue)"? Hoping for your reply!
try:
    input_search, target_search = next(valid_queue_iter)
except:
    valid_queue_iter = iter(valid_queue)
    input_search, target_search = next(valid_queue_iter)
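For context, a minimal sketch of the two patterns (the dataset and batch size here are made up): recreating the iterator on every step is what makes next(iter(valid_queue)) slow, and the try/except simply restarts a persistent iterator once it is exhausted.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the validation split used during search.
data = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))
valid_queue = DataLoader(data, batch_size=4)

# Slow pattern: iter(valid_queue) builds a brand-new iterator on every
# training step (and, with num_workers > 0, respawns the worker processes).
input_search, target_search = next(iter(valid_queue))

# Faster pattern: keep one iterator alive and rebuild it only when exhausted.
# The try-block must call next(valid_queue_iter), not next(valid_queue),
# because a DataLoader itself is not an iterator; next() on it raises TypeError.
valid_queue_iter = iter(valid_queue)

def next_valid_batch():
    global valid_queue_iter
    try:
        return next(valid_queue_iter)
    except StopIteration:
        valid_queue_iter = iter(valid_queue)
        return next(valid_queue_iter)
```

With 10 samples and batch size 4, the persistent iterator yields batches of 4, 4, 2 and then transparently restarts.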
"To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters." I have two questions:
Is the ImageNet dataset you used ILSVRC2012 (the compressed archive is roughly 150 GB)? What hardware configuration did you use to search for the best model on ImageNet, and how long did it take in total?
When I run train_search.py, it creates many folders whose names start with "search" (logged as "experiment dir: search..."). Only one folder is used to save the log; the logs in the other folders are empty. I can't find the command that creates these folders. How can I disable this?
Hi, thanks for your great work. When I search on ImageNet with 4 GPUs, the GPU utilization is really low (<20%). How can I fix this?
Thanks in advance.
Hello @yuhuixu1993,
Could you release a script for preparing the "10% and 2.5%" data subsets, or the filename lists of those data?
Thank you so much.
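The paper does not ship the file lists, but a per-class sampler along these lines is a plausible reconstruction; the train_dir/<class>/<image> layout, the fractions, and the seed are all assumptions, not the authors' actual script.

```python
import os
import random

def sample_subsets(train_dir, frac_train=0.10, frac_val=0.025, seed=0):
    """Per-class random split of an ImageNet-style tree (train_dir/<class>/<img>)
    into a weight-training subset (10%) and a disjoint architecture-update
    subset (2.5%), roughly matching the paper's description."""
    rng = random.Random(seed)
    train_list, val_list = [], []
    for cls in sorted(os.listdir(train_dir)):
        images = sorted(os.listdir(os.path.join(train_dir, cls)))
        rng.shuffle(images)
        n_train = int(len(images) * frac_train)
        n_val = int(len(images) * frac_val)
        train_list += [(cls, f) for f in images[:n_train]]
        val_list += [(cls, f) for f in images[n_train:n_train + n_val]]
    return train_list, val_list
```

Sampling per class keeps both subsets class-balanced, which matters when the validation subset is as small as 2.5%.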
"PC-DARTS allows a direct search on ImageNet (while DARTS failed due to low
stability), and achieves a state-of-the-art top-1 error of 24.2% (under the mobile setting) with only
3.8 GPU-days (11.5 hours) on 8 GPUs for search". What does this sentence mean?
Does it mean the total cost is 3.8 GPU-days, i.e. about 11.5 hours of wall-clock time when 8 GPUs are used?
The following description is extracted from Section 4.2:
we freeze network hyper-parameters and
only allow network parameters to be tuned in the first 15 epochs
I guess the authors meant that the architecture hyper-parameters are allowed to be tuned only after the first 15 epochs, which is consistent with the code.
Has anybody run the code on CIFAR-10?
My valid_acc on CIFAR-10 is only 97.06.
I just ran
python train.py --auxiliary --cutout
and set the batch_size to 128 (default 96).
So, what is the problem with my experiment? Thanks.
@yuhuixu1993 Hi, thanks for your paper and project. I have studied the DARTS series for a while and was confused about the skip_connect issue (skip_connect appears more and more often during the search process). I found that in your paper (PC-DARTS), you use 'consistent' to explain the advantage of weight-free ops. How should I understand the 'consistent' you mention; does it mean that the outputs across iterations are consistent? And how does this consistency bring advantages; does it have some impact on the direction of gradient descent during backpropagation?
Looking forward to your explanation. Sincerely
@yuhuixu1993 thanks for your last timely reply. I am now running the search on ImageNet, using 100% of ImageNet (batch size = 1024 on 8 V100s). Two days have passed, and at epoch 5 the log shows train_acc 3.305580. Is that right? I have another question: I saw in your paper "Still, a total of 50 epochs are trained and architecture hyper-parameters are frozen during the first 35 epochs." I am a little confused about this step.
Hi, I want to know how many epochs of training are needed to reach the 2.5% error on CIFAR-10?
Thank you very much!
Hi:
I have 8 GPUs in one machine. If I use model = nn.DataParallel(model) in train_search.py, it doesn't work. How can I train PC-DARTS using 8 GPUs?
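One common reason naive wrapping fails (a generic sketch, not the repo's actual multi-GPU fix): after nn.DataParallel, custom methods such as the arch_parameters() accessor that train_search.py relies on live on model.module, not on the wrapper. SearchNet below is an illustrative stand-in for the search model.

```python
import torch
import torch.nn as nn

class SearchNet(nn.Module):
    """Illustrative stand-in for the search network: it carries architecture
    parameters and exposes them via a custom accessor, as model_search.py does."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self._alphas = nn.Parameter(1e-3 * torch.randn(3))
    def forward(self, x):
        return self.fc(x)
    def arch_parameters(self):
        return [self._alphas]

model = nn.DataParallel(SearchNet())
# model.arch_parameters() would raise AttributeError: DataParallel only
# forwards forward(); custom methods must go through model.module.
alphas = model.module.arch_parameters()
```

So any call site that touches architecture parameters, genotype export, etc. has to be rewritten to go through model.module once the model is wrapped.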
hi, @yuhuixu1993, thank you for sharing your amazing work.
I have tried your code to search for a network on CIFAR-10, and I followed the instruction 'python train_search.py' to run the code; however, the search procedure ended with a CUDA out-of-memory error, i.e.,
09/01 01:55:07 PM train 000 2.426532e-01 90.234375 100.000000
09/01 01:57:19 PM train 050 2.439179e-01 91.590074 99.862132
09/01 01:59:23 PM train_acc 91.776000
09/01 01:59:23 PM epoch 49 lr 1.000000e-03
09/01 01:59:23 PM genotype = Genotype(normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], normal_concat=range(2, 6), reduce=[('skip_connect', 1), ('max_pool_3x3', 0), ('sep_conv_5x5', 1), ('dil_conv_5x5', 0), ('sep_conv_5x5', 1), ('sep_conv_5x5', 2), ('sep_conv_5x5', 2), ('sep_conv_5x5', 1)], reduce_concat=range(2, 6))
tensor([[0.1646, 0.1040, 0.1085, 0.1185, 0.1612, 0.1110, 0.1120, 0.1202],
[0.1498, 0.1149, 0.1231, 0.1378, 0.0881, 0.1234, 0.1322, 0.1307],
[0.1124, 0.1017, 0.1039, 0.1098, 0.1700, 0.1085, 0.1513, 0.1423],
[0.1205, 0.1086, 0.1074, 0.1125, 0.1670, 0.1348, 0.1133, 0.1358],
[0.1754, 0.0947, 0.0893, 0.1235, 0.1288, 0.1079, 0.1647, 0.1156],
[0.1414, 0.0955, 0.0984, 0.1059, 0.1967, 0.1468, 0.0984, 0.1169],
[0.1193, 0.1157, 0.1068, 0.1123, 0.2055, 0.1308, 0.1107, 0.0988],
[0.1533, 0.1183, 0.1268, 0.1451, 0.1239, 0.1101, 0.1156, 0.1068],
[0.1498, 0.1063, 0.1073, 0.1328, 0.1361, 0.1421, 0.1167, 0.1089],
[0.1132, 0.1281, 0.1156, 0.1162, 0.1608, 0.1445, 0.1159, 0.1056],
[0.1024, 0.1174, 0.1393, 0.1294, 0.1359, 0.1527, 0.1220, 0.1010],
[0.1306, 0.1090, 0.1147, 0.1288, 0.1520, 0.1055, 0.1442, 0.1152],
[0.1462, 0.0942, 0.0924, 0.1288, 0.1084, 0.1127, 0.1749, 0.1425],
[0.1416, 0.0991, 0.0974, 0.1282, 0.1396, 0.1392, 0.1244, 0.1306]],
device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([[0.1148, 0.1495, 0.1392, 0.1084, 0.1359, 0.1406, 0.1165, 0.0951],
[0.1327, 0.1076, 0.0993, 0.1602, 0.1379, 0.1272, 0.1087, 0.1264],
[0.1164, 0.1317, 0.1378, 0.1009, 0.1293, 0.1330, 0.1105, 0.1405],
[0.1259, 0.1005, 0.1057, 0.1288, 0.1212, 0.1532, 0.1372, 0.1273],
[0.1387, 0.0891, 0.0919, 0.1215, 0.1347, 0.1005, 0.1860, 0.1375],
[0.1181, 0.1241, 0.1199, 0.1367, 0.1470, 0.1228, 0.1066, 0.1249],
[0.1344, 0.1070, 0.1094, 0.1165, 0.1187, 0.1590, 0.1319, 0.1233],
[0.1372, 0.1035, 0.1067, 0.1308, 0.1076, 0.1623, 0.1355, 0.1164],
[0.1400, 0.1138, 0.1198, 0.1409, 0.1042, 0.1349, 0.1158, 0.1307],
[0.1309, 0.1175, 0.1291, 0.1270, 0.1265, 0.1102, 0.1504, 0.1083],
[0.1337, 0.1022, 0.1076, 0.1221, 0.1369, 0.1550, 0.1146, 0.1279],
[0.1339, 0.0890, 0.0939, 0.1212, 0.1117, 0.1830, 0.1362, 0.1313],
[0.1431, 0.1034, 0.1148, 0.1350, 0.1092, 0.1270, 0.1375, 0.1300],
[0.1396, 0.0920, 0.0998, 0.1267, 0.1404, 0.0974, 0.1373, 0.1669]],
device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([0.3777, 0.3653, 0.2571], device='cuda:1', grad_fn=<SoftmaxBackward>)
09/01 01:59:27 PM train 000 2.647865e-01 89.843750 100.000000
09/01 02:01:39 PM train 050 2.318434e-01 91.980699 99.892770
09/01 02:03:43 PM train_acc 91.660000
09/01 02:03:43 PM valid 000 5.367675e-01 83.593750 99.218750
Traceback (most recent call last):
File "train_search.py", line 206, in <module>
main()
File "train_search.py", line 130, in main
valid_acc, valid_obj = infer(valid_queue, model, criterion)
File "train_search.py", line 190, in infer
logits = model(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 159, in forward
s0, s1 = s1, cell(s0, s1, weights,weights2)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in forward
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in <genexpr>
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model_search.py", line 44, in forward
temp1 = sum(w * op(xtemp) for w, op in zip(weights, self._ops))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 15.29 GiB already allocated; 15.56 MiB free; 7.17 MiB cached)
Then I continued by running the command 'python train.py --auxiliary --cutout', and training the searched model also raised an OOM error, i.e.,
➜ pc-darts python train.py --auxiliary --cutout --gpu 1
Experiment dir : eval-EXP-20190901-143341
09/01 02:33:41 PM gpu device = 1
09/01 02:33:41 PM args = Namespace(arch='PCDARTS', auxiliary=True, auxiliary_weight=0.4, batch_size=96, cutout=True, cutout_length=16, data='../data', drop_path_prob=0.3, epochs=600, gpu=1, grad_clip=5, init_channels=36, layers=20, learning_rate=0.025, model_path='saved_models', momentum=0.9, report_freq=50, save='eval-EXP-20190901-143341', seed=0, set='cifar10', weight_decay=0.0003)
108 108 36
108 144 36
144 144 36
144 144 36
144 144 36
144 144 36
144 144 72
144 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 144
288 576 144
576 576 144
576 576 144
576 576 144
576 576 144
576 576 144
09/01 02:33:44 PM param size = 3.634678MB
Files already downloaded and verified
Files already downloaded and verified
/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
09/01 02:33:46 PM epoch 0 lr 2.499983e-02
train.py:136: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
nn.utils.clip_grad_norm(model.parameters(), args.grad_clip)
09/01 02:33:48 PM train 000 3.258214e+00 8.333333 50.000000
09/01 02:34:18 PM train 050 3.215875e+00 13.623365 56.638069
09/01 02:34:49 PM train 100 3.148910e+00 15.459983 61.984321
09/01 02:35:19 PM train 150 3.054110e+00 18.329194 67.335814
09/01 02:35:49 PM train 200 2.970589e+00 20.677860 71.035445
09/01 02:36:19 PM train 250 2.899201e+00 22.705012 73.725927
09/01 02:36:50 PM train 300 2.842999e+00 24.228266 75.633303
09/01 02:37:20 PM train 350 2.789153e+00 25.741927 77.148620
09/01 02:37:50 PM train 400 2.736966e+00 27.153469 78.514648
09/01 02:38:20 PM train 450 2.694277e+00 28.466832 79.557000
09/01 02:38:50 PM train 500 2.656195e+00 29.663588 80.561790
09/01 02:39:03 PM train_acc 30.111999
09/01 02:39:03 PM valid 000 1.350200e+00 52.083332 91.666664
Traceback (most recent call last):
File "train.py", line 177, in <module>
main()
File "train.py", line 113, in main
valid_acc, valid_obj = infer(valid_queue, model, criterion)
File "train.py", line 161, in infer
logits, _ = model(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model.py", line 150, in forward
s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/model.py", line 51, in forward
h1 = op1(h1)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ganji/Documents/work/pc-darts/operations.py", line 66, in forward
return self.op(x)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 343, in forward
return self.conv2d_forward(input, self.weight)
File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 340, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 14.26 GiB already allocated; 1.56 MiB free; 1.05 GiB cached)
In addition, my environment is ``Ubuntu 16.04 + CUDA 10.0 + Python 3.7 + PyTorch 1.2''.
Since I am new to NAS, I cannot figure out what causes the OOM error. Could you help fix this error or give some suggestions? Thanks.
This is a very effective method. After reading through the whole framework, I have two questions:
1. How is the Genotype updated? In model_search.py, a new Genotype is generated, but how is this new Genotype used in the next iteration?
2. Each time, only 1/4 of the channels of X are operated on, and the remaining 3/4 are passed through directly. Can this be regarded as a skip_connection?
Thanks for your work.
I was wondering how to save the network that has been found by PC-DARTS. I see that at the beginning of training, utils.create_exp_dir creates a directory with all the .py scripts. However, it seems to me that the genotype of the new model is not saved.
Hi @yuhuixu1993, could you please make the pretrained ImageNet model public?
Hi @yuhuixu1993,
It seems the channel_shuffle function evenly redistributes, in a deterministic pattern, the channels that pass through the mixed operations (the first 1/K) and those that do not (the remaining 1-1/K). Although it can produce much more complex mixes in later nodes as more channel_shuffle calls contribute, there is no randomness involved. Is it possible for the network to get used to the pattern and eventually lose the regularization effect? Why didn't you apply random sampling? Thank you!
def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.data.size()
    channels_per_group = num_channels // groups
    # reshape
    x = x.view(batchsize, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten
    x = x.view(batchsize, -1, height, width)
    return x
In my toy example:
x = torch.Tensor([ [ [ [1,1,1], [1,1,1], [1,1,1] ],
[ [2,2,2], [2,2,2], [2,2,2] ],
[ [3,3,3], [3,3,3], [3,3,3] ],
[ [4,4,4], [4,4,4], [4,4,4] ],
[ [5,5,5], [5,5,5], [5,5,5] ],
[ [6,6,6], [6,6,6], [6,6,6] ],
[ [7,7,7], [7,7,7], [7,7,7] ],
[ [8,8,8], [8,8,8], [8,8,8] ] ] ])
the output of channel_shuffle(x, 4) is always
tensor([[[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]],
[[5., 5., 5.],
[5., 5., 5.],
[5., 5., 5.]],
[[7., 7., 7.],
[7., 7., 7.],
[7., 7., 7.]],
[[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]],
[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]],
[[6., 6., 6.],
[6., 6., 6.],
[6., 6., 6.]],
[[8., 8., 8.],
[8., 8., 8.],
[8., 8., 8.]]]])
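For comparison, random sampling along the lines the question suggests might look like this hypothetical variant. This is not what the released code does — model_search.py always takes the first num_channels // k channels and then applies the deterministic channel_shuffle quoted above.

```python
import torch

def random_channel_split(x, k=4):
    """Hypothetical alternative to the fixed first-1/K split: sample the
    1/K 'active' channels at random on each forward pass, returning the
    active and bypassed channel groups."""
    num_channels = x.size(1)
    perm = torch.randperm(num_channels)
    active = perm[: num_channels // k]
    inactive = perm[num_channels // k :]
    return x[:, active], x[:, inactive]
```

Unlike the fixed split, each call draws a fresh random mask, so no node can adapt to one deterministic channel pattern.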
When searching on ImageNet with seed 0, the network has 1 skip_connect and 645M FLOPs. With seed 1, the network has 0 skip_connect and 712M FLOPs. I also tested PC_DARTS_image; its FLOPs are 595M.
Testing on different datasets, it seems the FLOPs can be approximated by 700 - 50 * skip_connect (in M). Am I right? Sometimes max_pool and avg_pool show up in the normal cell; they also reduce FLOPs by about 50M each. If that's right, then since all works based on DARTS report FLOPs below 600M, most of them must have 2 or more skip_connects.
But when I search, I often get 0 or 1 skip_connect. I wonder if args.epochs should be increased? What number of epochs is suitable?
hi @yuhuixu1993, thanks for sharing your excellent work. I just wonder about the search cost when you search on ImageNet.
PC-DARTS uses 12.5% of the ImageNet images for searching in Section 4.3, so would the search cost be 3.8*8 = 30.4 GPU-days if 100% of the ImageNet images were used?
thx
Hi Yuhui,
After I search on the CIFAR-10 dataset, I get one type of genotype, which is similar to the one reported in your paper.
However, when I derive the final architecture and calculate the FLOPs and latency, something seems a little strange.
For example, I run
import torch
import genotypes
from model import NetworkCIFAR as Network
from ptflops import get_model_complexity_info  # assuming the ptflops package

genotype = eval("genotypes.%s" % "PCDARTS")
with torch.cuda.device(0):
    model = Network(36, 1000, 14, True, genotype)
    model.drop_path_prob = 0.3
    model.eval()
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print("{:<30} {:<8}".format("Computational complexity: ", flops))
    print("{:<30} {:<8}".format("Number of parameters: ", params))
The reported model complexity and number of parameters for the searched genotype (with 14 layers) are as follows:
Computational complexity: 20.11 GMac
Number of parameters: 4.3 M
But when I run the resnet50 for comparison:
import torch
from torchvision.models import resnet50
from ptflops import get_model_complexity_info

with torch.cuda.device(0):
    model = resnet50(pretrained=False)
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print('{:<30} {:<8}'.format('Computational complexity: ', flops))
    print('{:<30} {:<8}'.format('Number of parameters: ', params))
The reported model complexity and number of parameters for resnet50 are as follows:
Computational complexity: 4.12 GMac
Number of parameters: 25.56 M
The FLOPs reported in your paper for the ImageNet setting are 597M, so it seems there is something wrong with my derived final architecture. At your convenience, could you help clarify how to derive the final architecture? I am considering deploying the searched model on some hardware devices and trying to add some hardware-aware constraints to the overall design.
Additionally, the latency of the searched genotype (with 14 layers) is nearly ten times that of resnet50, which is unacceptable.
I am also an undergraduate from SJTU. Really thanks for your help. hahahaha
Thank you for this wonderful code. I have a question: in the paper's ImageNet experiments, you start with three conv layers to reduce the resolution. Why not use reduction cells to reduce the size?
In sec 4.1, you said:
instead choose the first K channels of xi for operation mixture directly. To compensate, after xj is obtained, we shuffle its channels before using it for further computations.
but I think it is not a random-sampling implementation, because choosing the first K channels and the channel shuffle are both deterministic operations.
I would be glad if you could tell me whether I have misunderstood the implementation.
Hello, thank you for sharing your code.
When looking into your code, I have a question about the implementation of your partial channel connection idea.
In your code (model_search.py), it seems that the "channel_shuffle" function, together with the "forward" function of the MixedOp class, only chooses the first quarter of the channels.
Does this mean that the channel sampling mask S_i,j defined in your paper is a fixed mask?
Please answer my question.
Thank you!
Tesla V100 which requires CUDA_VERSION >= 9000 for optimal performance and fast startup time, but your PyTorch(0.3.1) was compiled with CUDA_VERSION 8000.
Hello, we have sent you several emails before asking about this problem. We first tried using 10% of the training set for training and 2.5% for validation with a linear learning rate scheduler; with SGDM we trained for 250 epochs, but the result was horrible. Then we tried the whole ImageNet and got an even more horrible result. Later we changed the optimizer and used the whole ImageNet, but we still see a clear accuracy gap. Could you kindly give us more details about how you obtained the results in your paper? We really need your response, or we may have to follow our "wrong" configuration, do fair comparisons, and report our real observations to the conferences. Thank you so much!
Hi @yuhuixu1993,
Appreciate if you may reply to the following questions:
Best,
Bolian
In the paper, each ImageNet cell consists of N = 6 nodes. Why is the default node number in the code 4?
I executed 'python train.py --auxiliary --cutout' and got an error; the log shows:
File "train.py", line 72, in main
File "<string>", line 1, in <module>
AttributeError: module 'genotypes' has no attribute 'DARTS'
My PyTorch version is 1.1.0 and my Python version is Python 3.
Hello,
I would like to begin by thanking you for this work. As an intern, I am investigating PC-DARTS for my breast-cancer diagnosis task (classification of different cancers), and I am trying to find the best possible architecture for it.
Since all your tests were done on natural-image datasets, do you have any comments about using PC-DARTS on such medical datasets? Are there hyper-parameters we should choose carefully?
I ran some tests and am saturating at 58% on the validation set; would an end-to-end training tell me whether the cell configuration I got is bad?
Thanks in advance for answering.
Hello,
I want to load the found architecture in a different script. Is there a way to load the architecture structure (class) and the weight file externally, just by importing and using torch.load()?
Thanks
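Roughly, yes — but torch.load() alone is not enough: the repo saves a state_dict (tensors only), so the loading script must import the model class and rebuild it from the genotype before calling load_state_dict. A generic sketch with a stand-in module (TinyNet plays the role of model.NetworkCIFAR built from your genotype; the file path is made up):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for the repo's NetworkCIFAR(...): torch.load() on a weights file
# restores only tensors, so the class itself must be importable and
# re-instantiated before load_state_dict().
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
    def forward(self, x):
        return self.fc(x)

path = os.path.join(tempfile.mkdtemp(), "weights.pt")

# "Script A": the training side saves the state_dict.
model = TinyNet()
torch.save(model.state_dict(), path)

# "Script B": rebuild the architecture, then load the weights into it.
restored = TinyNet()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```

In the real repo, "Script B" would construct NetworkCIFAR from the genotype recorded in genotypes.py (or in the search log) and then load the trained weights the same way.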
Hello,
I am trying to find an architecture for the 3x320x320 input images in my dataset. I keep getting out-of-memory errors from CUDA. Is it possible to run this NAS method at this image resolution using only one Nvidia 1080 Ti (11 GB)?
Hi, thank you for your great contribution.
Could you please release the search code for ImageNet? I can hardly reproduce the results reported in the paper.
regards
hi, @yuhuixu1993, thanks for your good work.
You mentioned: "We use eight Tesla V100 GPUs for search, and the total batch size is 1,024. The entire search process takes around 11.5 hours."
I just want to know whether you have any plan to release your parallel train_search code for ImageNet. Thank you.
The model.pt file referenced in test.py cannot be found. I tried changing it to weights.pt, but that gives an out-of-memory error. What is the problem here? model.pt should be a file generated during training, right? But I cannot find it.
I ran the code as suggested but failed to get it running; I get an "out of memory" error.
Envs:
I use one GPU, a 1080 Ti with 11178 MiB.
I use Python 2.7 and PyTorch 1.0.
I simply downloaded the code and ran python train_search.py.
Could you please tell me where I went wrong?
I am getting 31% validation accuracy after searching directly on ImageNet. Is that the same as what you got? If not, what validation accuracy should I expect at the end of train_search_imagenet?
In train_search.py, you train the architect only after epoch 15, so is it necessary to fetch the validation input before epoch 15?
It would save time if the validation input were not fetched before epoch 15.
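The suggestion can be sketched as follows; the 15-epoch threshold matches train_search.py's warm-up, while the fetch counter and the dummy validation stream are illustrative stand-ins.

```python
import itertools

fetches = 0  # counts how many validation batches are actually drawn

def maybe_arch_step(epoch, valid_iter, warmup_epochs=15):
    """Skip fetching the validation batch during warm-up epochs, since the
    architecture update that consumes it does not run yet."""
    global fetches
    if epoch < warmup_epochs:
        return None            # no architecture update: no valid batch needed
    batch = next(valid_iter)   # fetch only when it will actually be used
    fetches += 1
    return batch               # would be passed on to the architect step

# Simulate 20 epochs with one step each over a dummy validation stream.
valid_iter = itertools.cycle([("input_search", "target_search")])
for epoch in range(20):
    maybe_arch_step(epoch, valid_iter)
```

Over 20 simulated epochs, only the last 5 draw a validation batch — the loading work for the first 15 is skipped entirely.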
Hello, @yuhuixu1993
Should line 130 of model_search_imagenet.py be changed to:
reduction_prev = False
Thanks,
Hi @yuhuixu1993 ,
I am trying to run train_search_imagenet,
but I find that the learning rate decay is weird.
If the initial_lr is 0.5 and there are 5 warm-up epochs,
the lr should be 0.1, 0.2, 0.3, 0.4, 0.5, 0.49xxx, 0.49xxxx, ...
However, the lr becomes 0.1, 0.04, 0.024, 0.018, ...
It seems that you multiply 1/5, 2/5, 3/5, ... onto the previous lr rather than onto the initial_lr of 0.5.
Am I right?
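The two readings of the warm-up can be checked numerically; this sketch only reproduces the arithmetic (initial_lr = 0.5 and 5 warm-up epochs, as in the issue), not the repo's actual scheduler code.

```python
def compounding_warmup(initial_lr=0.5, warmup=5):
    """Scale the *current* lr by (epoch+1)/warmup each epoch: the factors
    compound, reproducing the reported 0.1, 0.04, 0.024, ... sequence."""
    lr, out = initial_lr, []
    for epoch in range(warmup):
        lr = lr * (epoch + 1) / warmup   # multiplies the previous lr
        out.append(round(lr, 4))
    return out

def linear_warmup(initial_lr=0.5, warmup=5):
    """Scale the *initial* lr instead: the expected 0.1, 0.2, ..., 0.5 ramp."""
    return [round(initial_lr * (epoch + 1) / warmup, 4) for epoch in range(warmup)]
```

So the symptom in the issue is exactly what compounding against the previous lr produces, while scaling against initial_lr gives the linear ramp the questioner expected.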
When train_search.py starts, it keeps creating log.txt files, but in fact only the first one created is valid. Afterwards it keeps creating log files, and they are all empty.
Thanks for your awesome work.
I notice that some code is used to keep the number of iterations the same between training and search:
assert train_iters == valid_iters
I can't understand it. Is it a convention in NAS?
Hi,
I am wondering why, in the search stage, we begin to update the architecture only after 15 epochs. What is the principal reason?
If not working with CIFAR-10 or ImageNet, can this particular epoch number change? If yes, what does it depend on?
Thanks in advance
Hello, when trying to reproduce your result on CIFAR-10 with your code, I searched 4 times and trained each result with your code for 600 epochs; the best accuracies on the validation set were 96.77, 97.32, 97.35, and 97.21 respectively. But in the paper you claim the accuracy on the test set is 97.43±0.07. Obviously there is a significant gap here. Why does this happen? Hope to get your response, thank you!
First of all, thank you for your great work.
I have a gender-recognition project. The dataset I am using is CelebA. I divide the dataset into female and male according to the labels, then follow the parameters you used to search on ImageNet. As mentioned in the paper, during the first 35 of 50 epochs only the weights of the network are updated while the architecture is not changed; in the last 15 epochs, the architecture search is performed.
However, the results show that after 35 epochs, the training/validation/test accuracies decrease dramatically. The architecture search does not bring any improvement but is harmful to the accuracy. The figure of the accuracy is shown below.
The parameters I used are shown below; any parameters not shown are the same as the defaults in train_search_imagenet.py.