Hi Shaojie, I could not reproduce the result for MDEQ on CIFAR-10 image classifica

CIFAR-10 Reproduction about deq HOT 6 CLOSED

locuslab commented on July 21, 2024

CIFAR-10 Reproduction

from deq.

Comments (6)

jerrybai1995 commented on July 21, 2024

Hi @HieuPhan33 ,

Are you using an equivalent batch size of 1024? Could you try a smaller batch size like the default one (I usually use ~100, and found this to be important)?

In addition, when I reproduced the result, I also sometimes (though rarely) got <93%, which is within the normal run-to-run fluctuation. If you still encounter the issue, you can also reach me via email and I can send you a sample training log to compare against. I believe you should already see ~92% after 100 epochs.
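For readers comparing setups, here is a minimal sketch of how an "equivalent batch size" is commonly computed. It assumes the term means the number of samples contributing to one optimizer step, i.e. per-GPU batch size times number of GPUs times gradient-accumulation steps; the function name and the example values are illustrative and not taken from the deq codebase.

```python
def equivalent_batch_size(per_gpu: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (assumed definition)."""
    if min(per_gpu, num_gpus, accum_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_gpu * num_gpus * accum_steps


# Illustrative values only: 8 GPUs at 128 samples each reach 1024,
# while a single GPU at 96 samples stays close to the ~100 suggested above.
large = equivalent_batch_size(per_gpu=128, num_gpus=8)
small = equivalent_batch_size(per_gpu=96, num_gpus=1)
print(large, small)
```

Under this assumption, lowering either the per-GPU batch size or the number of GPUs brings the equivalent batch size down toward the suggested range.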

HieuPhan33 commented on July 21, 2024

Hi @jerrybai1995, thanks for the quick response. I will reduce the batch size and keep you updated.
Thumbs up.

HieuPhan33 commented on July 21, 2024

Hi, I achieved 92.30% when using a batch size of 128.
Would you have any advice to continue to increase the accuracy to ~93% as expected?

jerrybai1995 commented on July 21, 2024

Hmmm, 92.3% still sounds too low to me for the given default parameters (my logs are usually in the range 92.6% - 93.4%). Could you try increasing f_thres (e.g., 9) and b_thres (e.g., 8 or 9) in the yaml file and using the default batch size? I also think that increasing the momentum (e.g., to 0.99) would improve the performance but I believe you should be able to reproduce the ~93% level performance even without tuning these things.

I'll look into this but in case you might find it useful, feel free to contact me ([email protected]) and I'll send you some training logs.
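As a rough sketch of the suggested changes, the snippet below applies dotted-key overrides to a nested config dictionary. The helper `apply_overrides` is hypothetical and not part of the deq repository, the key paths assume the yaml nests `F_THRES` and `B_THRES` under `DEQ` and `MOMENTUM` under `TRAIN`, and the starting values are placeholders rather than the repository defaults.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # raises KeyError if a section is missing
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return result


# Placeholder starting values, NOT the repository defaults.
base = {"DEQ": {"F_THRES": 0, "B_THRES": 0}, "TRAIN": {"MOMENTUM": 0.0}}

# Values suggested in the comment above.
suggested = {"DEQ.F_THRES": 9, "DEQ.B_THRES": 8, "TRAIN.MOMENTUM": 0.99}

tuned = apply_overrides(base, suggested)
print(tuned)
```

Rejecting unknown keys is a deliberate choice here: a misspelled option fails loudly instead of silently leaving the original value in place.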

jerrybai1995 commented on July 21, 2024

Hi @HieuPhan33 ,

I was able to reach 93.04% and 92.78% on two independent (and slightly different) runs, basically with the modifications/settings mentioned above. E.g., I got 93.04% from the following yaml:

GPUS: (0,)
LOG_DIR: 'log/'
DATA_DIR: ''
OUTPUT_DIR: 'output/'
WORKERS: 2
PRINT_FREQ: 100

MODEL: 
  NAME: mdeq
  NUM_LAYERS: 8
  NUM_CLASSES: 10
  NUM_GROUPS: 8
  DROPOUT: 0.22
  WNORM: true
  DOWNSAMPLE_TIMES: 0
  EXPANSION_FACTOR: 5
  POST_GN_AFFINE: false
  IMAGE_SIZE: 
    - 32
    - 32
  EXTRA:
    FULL_STAGE:
      NUM_MODULES: 1
      NUM_BRANCHES: 4
      BLOCK: BASIC
      BIG_KERNELS:
      - 0
      - 0
      - 0
      - 0
      HEAD_CHANNELS:
      - 14
      - 28
      - 56
      - 112
      FINAL_CHANSIZE: 1680
      NUM_BLOCKS:
      - 1
      - 1
      - 1
      - 1
      NUM_CHANNELS:
      - 32
      - 64
      - 128
      - 256
      FUSE_METHOD: SUM
DEQ:
  F_SOLVER: 'broyden'
  B_SOLVER: 'broyden'
  STOP_MODE: 'rel'
  F_THRES: 8
  B_THRES: 7
  RAND_F_THRES_DELTA: 1
  SPECTRAL_RADIUS_MODE: false
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
LOSS:
  JAC_LOSS_FREQ: 0.02
  JAC_LOSS_WEIGHT: 0.4
  PRETRAIN_JAC_LOSS_WEIGHT: 0.0
  JAC_STOP_EPOCH: 90
DATASET:
  DATASET: 'cifar10'
  DATA_FORMAT: 'jpg'
  ROOT: 'data/cifar10/'
  TEST_SET: 'val'
  TRAIN_SET: 'train'
TEST:
  BATCH_SIZE_PER_GPU: 96
  MODEL_FILE: ''
TRAIN:
  BATCH_SIZE_PER_GPU: 96
  BEGIN_EPOCH: 0
  END_EPOCH: 220
  RESUME: false
  LR_SCHEDULER: 'cosine'
  PRETRAIN_STEPS: 12000
  LR_FACTOR: 0.1
  LR_STEP:
  - 30
  - 60
  - 90
  OPTIMIZER: adam
  LR: 0.001
  WD: 0.0
  MOMENTUM: 0.99
  NESTEROV: true
  SHUFFLE: true
DEBUG:
  DEBUG: false

Hope this helps!
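When comparing a local config against the one posted above, a small diff over the nested keys can make the differences easier to spot. The following is a minimal sketch using plain dictionaries; `flatten` and `diff_configs` are hypothetical helpers, `reference` copies only a few values from the yaml above, and `mine` is a made-up local config for illustration.

```python
MISSING = "<missing>"


def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"SECTION.KEY": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(reference: dict, mine: dict) -> dict:
    """Map each differing key path to a (reference, mine) pair."""
    ref_flat, mine_flat = flatten(reference), flatten(mine)
    diffs = {}
    for path in sorted(set(ref_flat) | set(mine_flat)):
        ref_value = ref_flat.get(path, MISSING)
        my_value = mine_flat.get(path, MISSING)
        if ref_value != my_value:
            diffs[path] = (ref_value, my_value)
    return diffs


# Excerpt of the yaml posted above.
reference = {
    "DEQ": {"F_THRES": 8, "B_THRES": 7},
    "TRAIN": {"BATCH_SIZE_PER_GPU": 96, "MOMENTUM": 0.99},
}
# Hypothetical local config, for illustration only.
mine = {
    "DEQ": {"F_THRES": 8, "B_THRES": 7},
    "TRAIN": {"BATCH_SIZE_PER_GPU": 128, "MOMENTUM": 0.99},
}

differences = diff_configs(reference, mine)
print(differences)
```

With the yaml files loaded into dictionaries by whatever loader the project uses, the same call would list every setting that deviates from the posted configuration.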

HieuPhan33 commented on July 21, 2024

Thanks Shaojie, really appreciate your help!
