Comments (6)
Hi @HieuPhan33 ,
Are you using an equivalent batch size of 1024? Could you try a smaller batch size like the default one (I usually use ~100, and found this to be important)?
In addition, when I reproduced the result, I also sometimes (but rarely) got <93%, which is part of the normal fluctuation. If you still encounter the issue, you can reach me via email and I can send you a sample training log to compare. I believe you should already see ~92% after 100 epochs.
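To make "equivalent batch size" concrete, here is a minimal sketch. It assumes the effective batch size is the per-GPU batch size multiplied by the number of GPUs, which is how data-parallel training scripts commonly combine the two settings; the function name is hypothetical and not part of the deq code.

```python
def effective_batch_size(batch_size_per_gpu: int, gpus: tuple) -> int:
    """Samples per optimizer step, assuming simple data parallelism.

    Assumption: total batch = per-GPU batch * number of GPUs.
    """
    return batch_size_per_gpu * len(gpus)


# One GPU with 96 samples per GPU gives an effective batch of 96.
single_gpu = effective_batch_size(96, (0,))

# Eight GPUs with 128 samples each would give 1024, the size asked about above.
eight_gpus = effective_batch_size(128, tuple(range(8)))

print(single_gpu, eight_gpus)
```

If the effective value comes out far above ~100, lowering the per-GPU batch size (or using fewer GPUs) brings it back toward the range suggested above.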
from deq.
Hi @jerrybai1995, thanks for the quick response. I will reduce the batch size and keep you updated.
Thumbs up.
Hi, I achieved 92.30% when using a batch size of 128.
Would you have any advice on how to push the accuracy up to the expected ~93%?
Hmmm, 92.3% still sounds too low to me for the given default parameters (my logs are usually in the range 92.6% - 93.4%). Could you try increasing f_thres (e.g., 9) and b_thres (e.g., 8 or 9) in the yaml file and using the default batch size? I also think that increasing the momentum (e.g., to 0.99) would improve the performance but I believe you should be able to reproduce the ~93% level performance even without tuning these things.
I'll look into this but in case you might find it useful, feel free to contact me ([email protected]) and I'll send you some training logs.
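The suggested changes can be sketched as overrides on a nested config dictionary. This is only an illustration: `apply_overrides` is a hypothetical helper, the `None` entries are placeholders rather than the repository defaults, and the key names follow the upper-case style of the yaml files (F_THRES and B_THRES under DEQ, MOMENTUM under TRAIN), which is an assumption here.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with dotted-key overrides applied.

    Hypothetical helper for illustration; raises KeyError on unknown keys
    so that a typo in a setting name does not pass silently.
    """
    updated = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return updated


# Placeholder base config: None stands in for the actual default values.
base = {
    "DEQ": {"F_THRES": None, "B_THRES": None},
    "TRAIN": {"MOMENTUM": None},
}

# Values suggested in the comment above.
tuned = apply_overrides(
    base,
    {"DEQ.F_THRES": 9, "DEQ.B_THRES": 8, "TRAIN.MOMENTUM": 0.99},
)
print(tuned)
```

Working on a copy keeps the base settings untouched, which makes it easier to compare a tuned run against a default one.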
Hi @HieuPhan33 ,
I was able to produce 93.04% and 92.78% on two (slightly different and) independent runs, basically with the modifications/settings mentioned above. E.g., I got 93.04% from the following yaml:
GPUS: (0,)
LOG_DIR: 'log/'
DATA_DIR: ''
OUTPUT_DIR: 'output/'
WORKERS: 2
PRINT_FREQ: 100
MODEL:
  NAME: mdeq
  NUM_LAYERS: 8
  NUM_CLASSES: 10
  NUM_GROUPS: 8
  DROPOUT: 0.22
  WNORM: true
  DOWNSAMPLE_TIMES: 0
  EXPANSION_FACTOR: 5
  POST_GN_AFFINE: false
  IMAGE_SIZE:
    - 32
    - 32
  EXTRA:
    FULL_STAGE:
      NUM_MODULES: 1
      NUM_BRANCHES: 4
      BLOCK: BASIC
      BIG_KERNELS:
        - 0
        - 0
        - 0
        - 0
      HEAD_CHANNELS:
        - 14
        - 28
        - 56
        - 112
      FINAL_CHANSIZE: 1680
      NUM_BLOCKS:
        - 1
        - 1
        - 1
        - 1
      NUM_CHANNELS:
        - 32
        - 64
        - 128
        - 256
      FUSE_METHOD: SUM
DEQ:
  F_SOLVER: 'broyden'
  B_SOLVER: 'broyden'
  STOP_MODE: 'rel'
  F_THRES: 8
  B_THRES: 7
  RAND_F_THRES_DELTA: 1
  SPECTRAL_RADIUS_MODE: false
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
LOSS:
  JAC_LOSS_FREQ: 0.02
  JAC_LOSS_WEIGHT: 0.4
  PRETRAIN_JAC_LOSS_WEIGHT: 0.0
  JAC_STOP_EPOCH: 90
DATASET:
  DATASET: 'cifar10'
  DATA_FORMAT: 'jpg'
  ROOT: 'data/cifar10/'
  TEST_SET: 'val'
  TRAIN_SET: 'train'
TEST:
  BATCH_SIZE_PER_GPU: 96
  MODEL_FILE: ''
TRAIN:
  BATCH_SIZE_PER_GPU: 96
  BEGIN_EPOCH: 0
  END_EPOCH: 220
  RESUME: false
  LR_SCHEDULER: 'cosine'
  PRETRAIN_STEPS: 12000
  LR_FACTOR: 0.1
  LR_STEP:
    - 30
    - 60
    - 90
  OPTIMIZER: adam
  LR: 0.001
  WD: 0.0
  MOMENTUM: 0.99
  NESTEROV: true
  SHUFFLE: true
DEBUG:
  DEBUG: false
Hope this helps!
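For readers comparing training logs, the learning-rate settings in the yaml above (LR: 0.001, LR_SCHEDULER: 'cosine', END_EPOCH: 220) can be pictured with a small sketch. It assumes textbook cosine annealing evaluated per epoch, decaying to zero with no warmup; the actual training script may step the schedule per iteration or handle the PRETRAIN_STEPS phase differently, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(epoch: float, base_lr: float = 0.001,
              total_epochs: int = 220, min_lr: float = 0.0) -> float:
    """Standard cosine annealing from base_lr down to min_lr.

    Assumption: the schedule spans all epochs with no warmup phase.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, quarter points, and end of training.
for epoch in (0, 55, 110, 165, 220):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.001, passes through half of that at epoch 110, and reaches zero at epoch 220, so accuracy reported at epoch 100 is still measured at a fairly high learning rate.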
Thanks Shaojie, really appreciate your help!