AlphaNet: Improved Training of Supernet with Alpha-Divergence

This repository contains our PyTorch training code, evaluation code and pretrained models for AlphaNet.

Our implementation is largely based on AttentiveNAS. To reproduce our results, please first download the AttentiveNAS repo, and use our train_alphanet.py for training and test_alphanet.py for testing.

For more details, please see AlphaNet: Improved Training of Supernet with Alpha-Divergence by Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra.
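
At the core of AlphaNet is replacing the KL divergence used for inplace knowledge distillation (the largest subnet teaching the sampled subnets) with an adaptive alpha-divergence evaluated at a negative and a positive alpha, taking the worse of the two. The sketch below illustrates this idea in PyTorch; the names, defaults, and clamping scheme are our assumptions for illustration, not the exact AdaptiveLossSoft implementation in this repo.

import torch
import torch.nn.functional as F

def alpha_divergence(p, q, alpha, iw_clip=5.0, eps=1e-8):
    # D_alpha(p || q) = 1/(alpha*(alpha-1)) * sum_i q_i * ((p_i/q_i)^alpha - 1).
    # The limits alpha -> 1 and alpha -> 0 recover KL(p||q) and KL(q||p).
    if abs(alpha - 1.0) < 1e-4:
        return (p * (p.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1)
    if abs(alpha) < 1e-4:
        return (q * (q.clamp_min(eps).log() - p.clamp_min(eps).log())).sum(dim=1)
    # Clamp the importance weights in both directions so ratio**alpha stays
    # bounded for negative alpha as well (an assumed stabilization).
    ratio = (p / q.clamp_min(eps)).clamp(min=1.0 / iw_clip, max=iw_clip)
    return (q * (ratio.pow(alpha) - 1.0)).sum(dim=1) / (alpha * (alpha - 1.0))

class AdaptiveAlphaDivergenceLoss(torch.nn.Module):
    # Evaluate the divergence at alpha_min < 0 and alpha_max > 0 and take the
    # larger (worse) of the two, following the paper's adaptive formulation.
    def __init__(self, alpha_min=-1.0, alpha_max=1.0, iw_clip=5.0):
        super().__init__()
        self.alpha_min, self.alpha_max, self.iw_clip = alpha_min, alpha_max, iw_clip

    def forward(self, student_logits, teacher_logits):
        p = F.softmax(teacher_logits.detach(), dim=1)  # teacher, no gradient
        q = F.softmax(student_logits, dim=1)           # student
        d_lo = alpha_divergence(p, q, self.alpha_min, self.iw_clip)
        d_hi = alpha_divergence(p, q, self.alpha_max, self.iw_clip)
        return torch.maximum(d_lo, d_hi).mean()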

If you find this repo useful in your research, please consider citing our work and AttentiveNAS:

@article{wang2021alphanet,
  title={AlphaNet: Improved Training of Supernet with Alpha-Divergence},
  author={Wang, Dilin and Gong, Chengyue and Li, Meng and Liu, Qiang and Chandra, Vikas},
  journal={arXiv preprint arXiv:2102.07954},
  year={2021}
}

@article{wang2020attentivenas,
  title={AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling},
  author={Wang, Dilin and Li, Meng and Gong, Chengyue and Chandra, Vikas},
  journal={arXiv preprint arXiv:2011.09011},
  year={2020}
}

Evaluation

To reproduce our results:

  • Please first download our pretrained AlphaNet models from the provided Google Drive path and put them under your local folder ./alphanet_data

  • To evaluate our pretrained AlphaNet models, from AlphaNet-A0 to A6, on ImageNet with a single GPU, please run (replacing a[0-6] with one of a0 through a6):

    python test_alphanet.py --config-file ./configs/eval_alphanet_models.yml --model a[0-6]

    Expected results:

    Name                 MFLOPs   Top-1 (%)
    AlphaNet-A0          203      77.87
    AlphaNet-A1          279      78.94
    AlphaNet-A2          317      79.20
    AlphaNet-A3          357      79.41
    AlphaNet-A4          444      80.01
    AlphaNet-A5 (small)  491      80.29
    AlphaNet-A5 (base)   596      80.62
    AlphaNet-A6          709      80.78
  • Additionally, we also provide our pretrained supernet trained with KL-based inplace-KD and our pretrained supernet trained without inplace-KD.

Training

To train our AlphaNet models from scratch, please run:

python train_alphanet.py --config-file configs/train_alphanet_models.yml --machine-rank ${machine_rank} --num-machines ${num_machines} --dist-url ${dist_url}

We adopt SGD training on 64 GPUs with a mini-batch size of 32 per GPU (an effective batch size of 2048); all training hyper-parameters are specified in train_alphanet_models.yml.
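
For reference, a launch on machine 0 of a multi-node job might look like the following, where the number of machines, host name, and port are hypothetical placeholders:

python train_alphanet.py --config-file configs/train_alphanet_models.yml \
    --machine-rank 0 --num-machines 8 --dist-url tcp://node-0:10001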

Evolutionary search

In case you want to search for a set of models of your own interest, we provide an example showing how to search for Pareto-optimal models with the best FLOPs vs. accuracy trade-offs in parallel_supernet_evo_search.py. To run this example:

python parallel_supernet_evo_search.py --config-file configs/parallel_supernet_evo_search.yml 
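
For intuition, the core loop of such an evolutionary Pareto search can be sketched as follows; sample_subnet and evaluate are placeholders standing in for the repo's actual subnet sampling and accuracy/FLOPs measurement (typically after BN calibration), not its real API:

import random

def pareto_front(population):
    # Keep candidates not dominated in (FLOPs, accuracy):
    # lower FLOPs and higher accuracy are both better.
    front = []
    for c in population:
        dominated = any(
            o["flops"] <= c["flops"] and o["acc"] >= c["acc"]
            and (o["flops"], o["acc"]) != (c["flops"], c["acc"])
            for o in population)
        if not dominated:
            front.append(c)
    return front

def evolutionary_search(sample_subnet, evaluate, generations=20,
                        pop_size=64, mutate_prob=0.3):
    # sample_subnet() -> random subnet config (a dict of searchable dims);
    # evaluate(cfg)   -> {"cfg": cfg, "flops": ..., "acc": ...}.
    population = [evaluate(sample_subnet()) for _ in range(pop_size)]
    for _ in range(generations):
        parents = pareto_front(population)
        children = []
        while len(children) < pop_size:
            cfg = dict(random.choice(parents)["cfg"])  # clone a Pareto parent
            donor = sample_subnet()                    # mutation donor
            for k in cfg:                              # resample some dims
                if random.random() < mutate_prob:
                    cfg[k] = donor[k]
            children.append(evaluate(cfg))
        population = parents + children
    return pareto_front(population)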

License

AlphaNet is licensed under CC-BY-NC.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Issues

Is re-training code available?

Thanks for your great code!
After running AlphaNet, I would like to re-train the searched model. Could you provide the re-training code?

AdaptiveLossSoft becomes NaN

Hi,
When I train with knowledge distillation using AdaptiveLossSoft on my own dataset, the loss gradually becomes NaN, and Acc@1 first increases and then decreases (Acc@5 is set the same as Acc@1 in my code, so just ignore it). Any suggestions?

Epoch: [0][ 0/7176] Time 71.193 (71.193) Data 66.870 (66.870) Loss 2.8784e-01 (2.8784e-01) Acc@1 0.09 ( 0.09) Acc@5 0.09 ( 0.09)
Epoch: [0][ 100/7176] Time 1.870 ( 2.728) Data 0.000 ( 0.662) Loss 4.6225e-02 (-7.6008e-02) Acc@1 0.08 ( 0.11) Acc@5 0.08 ( 0.11)
Epoch: [0][ 200/7176] Time 1.931 ( 2.363) Data 0.000 ( 0.333) Loss 4.9116e-01 (7.5377e-02) Acc@1 0.09 ( 0.11) Acc@5 0.09 ( 0.11)
Epoch: [0][ 300/7176] Time 1.860 ( 2.250) Data 0.000 ( 0.222) Loss 2.0824e-01 (2.0263e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
Epoch: [0][ 400/7176] Time 2.271 ( 2.200) Data 0.000 ( 0.167) Loss 8.5119e-01 (3.0746e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
Epoch: [0][ 500/7176] Time 2.325 ( 2.173) Data 0.000 ( 0.134) Loss 1.6488e+00 (4.4695e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
Epoch: [0][ 600/7176] Time 2.029 ( 2.150) Data 0.000 ( 0.111) Loss 1.2261e+00 (6.1032e-01) Acc@1 0.13 ( 0.11) Acc@5 0.13 ( 0.11)
Epoch: [0][ 700/7176] Time 1.945 ( 2.134) Data 0.000 ( 0.096) Loss -3.8604e-01 (6.4971e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
Epoch: [0][ 800/7176] Time 1.981 ( 2.121) Data 0.000 ( 0.084) Loss -1.3327e-02 (5.4336e-01) Acc@1 0.15 ( 0.12) Acc@5 0.15 ( 0.12)
Epoch: [0][ 900/7176] Time 2.034 ( 2.108) Data 0.000 ( 0.074) Loss 1.0979e+00 (5.1461e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1000/7176] Time 1.975 ( 2.101) Data 0.000 ( 0.067) Loss 1.6591e-02 (4.1266e-01) Acc@1 0.18 ( 0.12) Acc@5 0.18 ( 0.12)
Epoch: [0][1100/7176] Time 1.715 ( 2.095) Data 0.000 ( 0.061) Loss -1.0021e+00 (3.3724e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1200/7176] Time 1.852 ( 2.088) Data 0.000 ( 0.056) Loss -3.7594e-01 (2.9739e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
Epoch: [0][1300/7176] Time 1.892 ( 2.082) Data 0.000 ( 0.052) Loss -7.8806e-02 (2.5089e-01) Acc@1 0.12 ( 0.12) Acc@5 0.12 ( 0.12)
Epoch: [0][1400/7176] Time 1.956 ( 2.078) Data 0.000 ( 0.048) Loss 8.6050e-02 (2.3144e-01) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
Epoch: [0][1500/7176] Time 2.031 ( 2.074) Data 0.000 ( 0.045) Loss -1.8159e-01 (2.2123e-01) Acc@1 0.16 ( 0.13) Acc@5 0.16 ( 0.13)
Epoch: [0][1600/7176] Time 2.118 ( 2.072) Data 0.000 ( 0.042) Loss 3.8409e-01 (2.1557e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][1700/7176] Time 2.163 ( 2.069) Data 0.000 ( 0.039) Loss 3.2751e-01 (2.1508e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][1800/7176] Time 2.166 ( 2.068) Data 0.000 ( 0.037) Loss -3.0104e-01 (2.1683e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][1900/7176] Time 1.822 ( 2.066) Data 0.000 ( 0.035) Loss 3.6041e-01 (2.1936e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][2000/7176] Time 1.888 ( 2.065) Data 0.000 ( 0.034) Loss 6.0852e-02 (2.2056e-01) Acc@1 0.17 ( 0.14) Acc@5 0.17 ( 0.14)
Epoch: [0][2100/7176] Time 1.928 ( 2.064) Data 0.000 ( 0.032) Loss 7.0139e-01 (2.2213e-01) Acc@1 0.20 ( 0.14) Acc@5 0.20 ( 0.14)
Epoch: [0][2200/7176] Time 2.212 ( 2.061) Data 0.000 ( 0.031) Loss 2.7252e-01 (2.1953e-01) Acc@1 0.21 ( 0.14) Acc@5 0.21 ( 0.14)
Epoch: [0][2300/7176] Time 1.816 ( 2.060) Data 0.000 ( 0.029) Loss -1.5090e-01 (2.2140e-01) Acc@1 0.15 ( 0.14) Acc@5 0.15 ( 0.14)
Epoch: [0][2400/7176] Time 1.929 ( 2.059) Data 0.000 ( 0.028) Loss 4.2306e-01 (2.1328e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
Epoch: [0][2500/7176] Time 1.886 ( 2.057) Data 0.000 ( 0.027) Loss 2.7449e-01 (1.9290e-01) Acc@1 0.17 ( 0.15) Acc@5 0.17 ( 0.15)
Epoch: [0][2600/7176] Time 1.813 ( 2.056) Data 0.000 ( 0.026) Loss 5.1589e-02 (2.1373e-01) Acc@1 0.16 ( 0.15) Acc@5 0.16 ( 0.15)
Epoch: [0][2700/7176] Time 2.145 ( 2.055) Data 0.000 ( 0.025) Loss -6.0235e-01 (1.9399e-01) Acc@1 0.19 ( 0.15) Acc@5 0.19 ( 0.15)
Epoch: [0][2800/7176] Time 1.944 ( 2.054) Data 0.000 ( 0.024) Loss 7.8085e-02 (1.7437e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
Epoch: [0][2900/7176] Time 1.778 ( 2.053) Data 0.000 ( 0.023) Loss -1.6850e-03 (1.6211e-01) Acc@1 0.13 ( 0.15) Acc@5 0.13 ( 0.15)
Epoch: [0][3000/7176] Time 1.767 ( 2.052) Data 0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.15) Acc@5 0.00 ( 0.15)
Epoch: [0][3100/7176] Time 2.064 ( 2.050) Data 0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
Epoch: [0][3200/7176] Time 2.222 ( 2.048) Data 0.000 ( 0.021) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
Epoch: [0][3300/7176] Time 2.206 ( 2.046) Data 0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3400/7176] Time 1.906 ( 2.044) Data 0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3500/7176] Time 2.058 ( 2.042) Data 0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
Epoch: [0][3600/7176] Time 1.912 ( 2.040) Data 0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3700/7176] Time 2.006 ( 2.038) Data 0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3800/7176] Time 1.990 ( 2.036) Data 0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
Epoch: [0][3900/7176] Time 2.073 ( 2.035) Data 0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4000/7176] Time 2.152 ( 2.033) Data 0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4100/7176] Time 2.183 ( 2.033) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
Epoch: [0][4200/7176] Time 2.054 ( 2.031) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4300/7176] Time 1.870 ( 2.030) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4400/7176] Time 1.923 ( 2.029) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4500/7176] Time 1.891 ( 2.028) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4600/7176] Time 1.866 ( 2.027) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
Epoch: [0][4700/7176] Time 1.887 ( 2.026) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][4800/7176] Time 2.037 ( 2.025) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][4900/7176] Time 2.019 ( 2.024) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][5000/7176] Time 1.936 ( 2.023) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
Epoch: [0][5100/7176] Time 1.972 ( 2.022) Data 0.000 ( 0.013) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
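
A common generic mitigation for non-finite distillation losses, independent of this repository's code, is to skip updates whose loss is not finite and clip gradient norms; a minimal sketch, with every name hypothetical:

import torch

def safe_kd_step(model, optimizer, kd_criterion,
                 student_logits, teacher_logits, max_grad_norm=5.0):
    # Apply one distillation update, dropping the batch if the loss is
    # non-finite so a single bad step cannot poison the weights.
    loss = kd_criterion(student_logits, teacher_logits)
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        return None
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()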

How can I preserve the searched architecture?

Hi, when testing, I found that it uses the default architectures (a0-a6) you provided in eval_alphanet_models.yml. How can I preserve my own architecture when training at the search stage?

Some files are missing, or I can't find them

I ran test_alphanet.py with python test_alphanet.py --config-file ./configs/eval_alphanet_models.yml --model a[0-6]
and it reported an error: ModuleNotFoundError: No module named 'models'

I looked through the files but could not find the models module or some of the other files shown in the screenshot.

Thanks.
P.S. I'm a rookie, so I may be making a low-level mistake.

How were the final architectures selected?

Hello,

I like your work, but I'm a bit confused about how final models a0-a6 were selected.
In the paper, in section 4.2, subsection "Evaluation" you describe an evolutionary search procedure.
However, in the subsection "Improvements on SOTA" you write that you choose a0-a6 architectures to be the same as in the AttentiveNAS paper.
Do I understand correctly that the results of the evolutionary search were not used when selecting the final models?

Thanks in advance!

Why is the training dataset used in the test stage?

Hi, I checked test_alphanet.py and found that it uses the training dataset when testing. From your comment 'bn running stats calibration following Slimmable', does that mean the training data's mean and variance are calculated for the BN layers?
Thank you in advance!
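
For context, the BN-calibration idea from Slimmable networks recomputes each BatchNorm layer's running mean and variance for a sampled subnet by forwarding training images, without touching the weights. A minimal sketch (not this repo's exact implementation):

import torch

@torch.no_grad()
def calibrate_bn(model, loader, num_batches=64):
    # Reset BN running stats, then refill them with forward passes;
    # momentum=None switches BN to a cumulative moving average.
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None
    model.train()  # BN only updates running stats in train mode
    for i, (images, _) in enumerate(loader):
        if i >= num_batches:
            break
        model(images)
    model.eval()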

How can the loss function be modified for multi-label classification tasks?

Hello, I applied AlphaNet to a multi-label classification task, but as the following log shows, the loss does not converge. How should I modify the loss function?

Epoch: [0][4000/7176] Time 2.197 ( 2.047) Data 0.000 ( 0.010) Loss -1.7713e+00 (-6.9243e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
Epoch: [0][4050/7176] Time 1.978 ( 2.047) Data 0.000 ( 0.010) Loss -2.5454e+00 (-7.1199e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4100/7176] Time 2.234 ( 2.048) Data 0.000 ( 0.010) Loss -3.2569e+00 (-7.2793e-01) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
Epoch: [0][4150/7176] Time 1.940 ( 2.048) Data 0.000 ( 0.010) Loss -1.6426e+00 (-7.5071e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4200/7176] Time 2.101 ( 2.047) Data 0.000 ( 0.010) Loss -2.9430e+00 (-7.7234e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
Epoch: [0][4250/7176] Time 2.056 ( 2.047) Data 0.000 ( 0.009) Loss -2.6333e+00 (-7.9679e-01) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
Epoch: [0][4300/7176] Time 2.002 ( 2.047) Data 0.000 ( 0.009) Loss -2.0716e+00 (-8.1554e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4350/7176] Time 1.919 ( 2.048) Data 0.000 ( 0.009) Loss -1.1843e+00 (-8.3195e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4400/7176] Time 1.972 ( 2.047) Data 0.001 ( 0.009) Loss -2.2746e+00 (-8.4905e-01) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
Epoch: [0][4450/7176] Time 2.151 ( 2.047) Data 0.000 ( 0.009) Loss -2.3914e+00 (-8.6940e-01) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
Epoch: [0][4500/7176] Time 1.873 ( 2.047) Data 0.000 ( 0.009) Loss -2.9244e+00 (-8.8694e-01) Acc@1 0.12 ( 0.13) Acc@5 0.12 ( 0.13)
Epoch: [0][4550/7176] Time 2.112 ( 2.047) Data 0.000 ( 0.009) Loss -2.9324e+00 (-9.0775e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4600/7176] Time 1.902 ( 2.047) Data 0.000 ( 0.009) Loss -2.0961e+00 (-9.2422e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
Epoch: [0][4650/7176] Time 1.752 ( 2.047) Data 0.000 ( 0.009) Loss -1.9184e+00 (-9.3473e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4700/7176] Time 1.967 ( 2.047) Data 0.000 ( 0.009) Loss -6.9974e-01 (-9.4724e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][4750/7176] Time 1.994 ( 2.046) Data 0.000 ( 0.009) Loss -3.3847e+00 (-9.6442e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][4800/7176] Time 2.085 ( 2.047) Data 0.000 ( 0.008) Loss -2.5070e+00 (-9.8096e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][4850/7176] Time 1.899 ( 2.047) Data 0.000 ( 0.008) Loss -2.2142e+00 (-9.9467e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
Epoch: [0][4900/7176] Time 2.096 ( 2.046) Data 0.000 ( 0.008) Loss -3.2781e+00 (-1.0132e+00) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
Epoch: [0][4950/7176] Time 1.782 ( 2.046) Data 0.000 ( 0.008) Loss -2.8195e+00 (-1.0289e+00) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
Epoch: [0][5000/7176] Time 1.996 ( 2.046) Data 0.000 ( 0.008) Loss -3.1988e+00 (-1.0456e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
Epoch: [0][5050/7176] Time 2.248 ( 2.046) Data 0.000 ( 0.008) Loss -1.9582e+00 (-1.0614e+00) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
Epoch: [0][5100/7176] Time 2.029 ( 2.046) Data 0.000 ( 0.008) Loss -2.2378e+00 (-1.0686e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
Epoch: [0][5150/7176] Time 1.943 ( 2.046) Data 0.000 ( 0.008) Loss -2.6617e+00 (-1.0803e+00) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
Epoch: [0][5200/7176] Time 2.050 ( 2.046) Data 0.000 ( 0.008) Loss -2.4670e+00 (-1.0952e+00) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
Epoch: [0][5250/7176] Time 2.051 ( 2.046) Data 0.000 ( 0.008) Loss -2.1453e+00 (-1.1098e+00) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
Epoch: [0][5300/7176] Time 2.064 ( 2.045) Data 0.000 ( 0.008) Loss -2.2843e+00 (-1.1214e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
Epoch: [0][5350/7176] Time 2.232 ( 2.045) Data 0.000 ( 0.008) Loss -2.4472e+00 (-1.1356e+00) Acc@1 0.16 ( 0.13) Acc@5 0.16 ( 0.13)
Epoch: [0][5400/7176] Time 1.980 ( 2.045) Data 0.000 ( 0.008) Loss -2.9187e+00 (-1.1485e+00) Acc@1 0.18 ( 0.14) Acc@5 0.18 ( 0.14)
Epoch: [0][5450/7176] Time 2.144 ( 2.046) Data 0.000 ( 0.007) Loss -2.7685e+00 (-1.1622e+00) Acc@1 0.16 ( 0.14) Acc@5 0.16 ( 0.14)
Epoch: [0][5500/7176] Time 1.839 ( 2.045) Data 0.000 ( 0.007) Loss -2.7240e+00 (-1.1766e+00) Acc@1 0.12 ( 0.14) Acc@5 0.12 ( 0.14)
Epoch: [0][5550/7176] Time 1.953 ( 2.046) Data 0.000 ( 0.007) Loss -2.1483e+00 (-1.1920e+00) Acc@1 0.15 ( 0.14) Acc@5 0.15 ( 0.14)
Epoch: [0][5600/7176] Time 1.842 ( 2.045) Data 0.000 ( 0.007) Loss -1.6526e+00 (-1.1999e+00) Acc@1 0.17 ( 0.14) Acc@5 0.17 ( 0.14)

Training accuracy suddenly approaches zero

Hi, I hit a problem during training: the accuracy suddenly drops to nearly zero (as if the network parameters had been re-initialized). The log is as follows:

Epoch: [48][ 70/625] Time 1.383 ( 1.428) Data 0.000 ( 0.152) Loss -1.8897e+00 (-2.0853e+00) Acc@1 56.54 ( 60.11) Acc@5 79.59 ( 81.28)
Epoch: [48][ 80/625] Time 1.346 ( 1.401) Data 0.000 ( 0.133) Loss -2.3125e+00 (-2.0805e+00) Acc@1 57.71 ( 60.05) Acc@5 79.44 ( 81.26)
Epoch: [48][ 90/625] Time 1.224 ( 1.386) Data 0.000 ( 0.119) Loss -2.1656e+00 (-2.0629e+00) Acc@1 58.15 ( 59.98) Acc@5 80.13 ( 81.25)
Epoch: [48][100/625] Time 1.474 ( 1.375) Data 0.000 ( 0.107) Loss -2.3581e+00 (-2.0779e+00) Acc@1 58.06 ( 59.92) Acc@5 78.66 ( 81.22)
Epoch: [48][110/625] Time 1.215 ( 1.367) Data 0.000 ( 0.097) Loss -4.3465e+00 (-2.1967e+00) Acc@1 0.10 ( 56.28) Acc@5 0.59 ( 76.59)
Epoch: [48][120/625] Time 1.111 ( 1.357) Data 0.000 ( 0.089) Loss -4.5741e+00 (-2.3879e+00) Acc@1 0.15 ( 51.64) Acc@5 0.39 ( 70.31)
Epoch: [48][130/625] Time 1.152 ( 1.350) Data 0.000 ( 0.083) Loss -4.5668e+00 (-2.5548e+00) Acc@1 0.05 ( 47.71) Acc@5 0.44 ( 64.99)
Epoch: [48][140/625] Time 1.232 ( 1.346) Data 0.000 ( 0.077) Loss -4.5077e+00 (-2.6952e+00) Acc@1 0.29 ( 44.34) Acc@5 0.88 ( 60.44)
Epoch: [48][150/625] Time 1.219 ( 1.341) Data 0.000 ( 0.072) Loss -4.4784e+00 (-2.8117e+00) Acc@1 0.34 ( 41.41) Acc@5 1.12 ( 56.49)
Epoch: [48][160/625] Time 1.274 ( 1.338) Data 0.000 ( 0.067) Loss -4.2812e+00 (-2.9086e+00) Acc@1 0.34 ( 38.86) Acc@5 0.93 ( 53.04)

What's more, the training loss is always negative; is that expected?
