Comments (10)
Format: (before local fine-tuning) -> (after local fine-tuning) So if finetune_epoch = 0, x.xx% -> 0.00% is normal.
☝ finetune_epoch is set to 0 in template.yml (line 24, commit b19d935).
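For context, the two numbers in that format come from evaluating the received global model, then (only if finetune_epoch > 0) fine-tuning it locally and evaluating again. A minimal sketch of that flow, assuming hypothetical evaluate/finetune_one_epoch helpers; this is not FL-bench's actual code:

# finetune_flow.py — illustrative sketch of where the two numbers come from;
# `evaluate` and `finetune_one_epoch` are hypothetical stand-ins, not FL-bench's API
from typing import Callable, Tuple

def test_client(
    evaluate: Callable[[], float],           # returns test accuracy (%) of the current model
    finetune_one_epoch: Callable[[], None],  # one epoch of local training on the client
    finetune_epoch: int,
) -> Tuple[float, float]:
    acc_before = evaluate()                  # "before local fine-tuning"
    acc_after = 0.0                          # stays 0.00% when finetune_epoch == 0
    for _ in range(finetune_epoch):
        finetune_one_epoch()                 # local fine-tuning on the client's own data
    if finetune_epoch > 0:
        acc_after = evaluate()               # "after local fine-tuning"
    return acc_before, acc_after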
This issue has been closed due to a long time with no response.
I changed it as you recommended but got the same results. It seems the fine-tuning is still not running.
Sorry for my late response. What's your run command? If you set finetune_epoch, you need to specify the config file in the command, like python main.py fedavg your_config.yml.
I used the same command you mentioned. My config is:
# Full explanations are listed in README.md
mode: parallel # [serial, parallel]
parallel: # It's fine to keep these configs.
  # Go check the doc of `https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html` for more details.
  ray_cluster_addr: null # [null, auto, local]
  # `null` implies that all cpus/gpus are included.
  num_cpus: null
  num_gpus: null
  # Should be set larger than 1, or training mode falls back to `serial`.
  # Setting a larger `num_workers` can further boost efficiency, at the cost of giving each worker fewer computational resources.
  num_workers: 2
common:
  dataset: mnist
  seed: 42
  model: lenet5
  join_ratio: 0.1
  global_epoch: 100
  local_epoch: 5
  finetune_epoch: 20
  batch_size: 32
  test_interval: 100
  straggler_ratio: 0
  straggler_min_local_epoch: 0
  external_model_params_file: ""
  optimizer:
    name: sgd # [sgd, adam, adamw, rmsprop, adagrad]
    lr: 0.01
    dampening: 0 # SGD
    weight_decay: 0
    momentum: 0 # [SGD, RMSprop]
    alpha: 0.99 # RMSprop
    nesterov: false # SGD
    betas: [0.9, 0.999] # [Adam, AdamW]
    amsgrad: false # [Adam, AdamW]
  lr_scheduler:
    name: step # null for deactivating
    step_size: 10
  eval_test: true
  eval_val: false
  eval_train: false
  verbose_gap: 10
  visible: false
  use_cuda: true
  save_log: true
  save_model: false
  save_fig: true
  save_metrics: true
  check_convergence: true
# You can also set specific arguments for FL methods here.
# FL-bench accesses FL method arguments via `args.<method>.<arg>`,
# e.g.
fedprox:
  mu: 0.01
pfedsim:
  warmup_round: 0.7
# ...
# NOTE: For those unmentioned arguments, the default values are set in `get_<method>_args()` in `src/server/<method>.py`.
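As a rough picture of the `args.<method>.<arg>` note above: each method's defaults get merged with whatever the corresponding YAML section provides. A hedged sketch of that lookup, illustrative only; the default value shown is hypothetical and the real logic lives in `get_<method>_args()`:

# method_args.py — illustration of the defaults-plus-overrides idea, not FL-bench's code
from argparse import Namespace
from typing import Optional

def get_fedprox_args(overrides: Optional[dict] = None) -> Namespace:
    defaults = {"mu": 1.0}            # hypothetical default for an unmentioned argument
    defaults.update(overrides or {})  # the `fedprox:` section of the YAML wins
    return Namespace(**defaults)

config = {"fedprox": {"mu": 0.01}}    # as parsed from the config above
args = get_fedprox_args(config.get("fedprox"))
print(args.mu)                        # 0.01 — the YAML override, not the default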
I tested it in my workspace and everything is fine.
Here are the result, config, and commands to reproduce it:
Result
==================== FedAvg Experiment Results: ====================
Format: (before local fine-tuning) -> (after local fine-tuning) So if finetune_epoch = 0, x.xx% -> 0.00% is normal.
{100: {'all_clients': {'test': {'loss': '0.3364 -> 0.3116', 'accuracy': '91.44% -> 92.18%'}}}}
========== FedAvg Convergence on train clients ==========
test (before local training):
10.0%(11.65%) at epoch: 0
20.0%(27.31%) at epoch: 3
30.0%(35.33%) at epoch: 4
40.0%(47.46%) at epoch: 5
60.0%(63.21%) at epoch: 7
70.0%(75.43%) at epoch: 9
80.0%(86.50%) at epoch: 18
90.0%(90.34%) at epoch: 37
test (after local training):
80.0%(82.13%) at epoch: 0
90.0%(91.06%) at epoch: 1
==================== FedAvg Max Accuracy ====================
all_clients:
(test) before fine-tuning: 91.44% at epoch 100
(test) after fine-tuning: 92.18% at epoch 100
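The convergence table above lists, for each 10% accuracy milestone, the first global epoch whose evaluation reached it. A minimal sketch of how such a table can be computed from an accuracy-per-epoch curve; this is illustrative, not FL-bench's own code:

# convergence.py — first epoch at which each accuracy threshold is reached
def convergence_table(acc_per_epoch):
    """Return {threshold: first epoch whose accuracy reaches it}."""
    table = {}
    for threshold in range(10, 100, 10):          # 10%, 20%, ..., 90%
        for epoch, acc in enumerate(acc_per_epoch):
            if acc >= threshold:
                table[threshold] = epoch          # e.g. "90.0%(90.34%) at epoch: 37"
                break
    return table

# Toy curve: 90% is first reached at epoch 3
print(convergence_table([11.65, 27.31, 85.50, 90.34, 91.44]))
# {10: 0, 20: 1, 30: 2, 40: 2, 50: 2, 60: 2, 70: 2, 80: 2, 90: 3}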
Config
# cfg.yml
mode: parallel # [serial, parallel]
parallel: # It's fine to keep these configs.
  # Go check the doc of `https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html` for more details.
  ray_cluster_addr: null # [null, auto, local]
  # `null` implies that all cpus/gpus are included.
  num_cpus: null
  num_gpus: null
  # Should be set larger than 1, or training mode falls back to `serial`.
  # Setting a larger `num_workers` can further boost efficiency, at the cost of giving each worker fewer computational resources.
  num_workers: 2
common:
  dataset: mnist
  seed: 42
  model: lenet5
  join_ratio: 0.1
  global_epoch: 100
  local_epoch: 5
  finetune_epoch: 5
  batch_size: 32
  test_interval: 100
  straggler_ratio: 0
  straggler_min_local_epoch: 0
  external_model_params_file: ""
  buffers: local # [local, global, drop]
  optimizer:
    name: sgd # [sgd, adam, adamw, rmsprop, adagrad]
    lr: 0.01
    dampening: 0 # SGD
    weight_decay: 0
    momentum: 0 # [SGD, RMSprop]
    alpha: 0.99 # RMSprop
    nesterov: false # SGD
    betas: [0.9, 0.999] # [Adam, AdamW]
    amsgrad: false # [Adam, AdamW]
  lr_scheduler:
    name: step # null for deactivating
    step_size: 10
  eval_test: true
  eval_val: false
  eval_train: false
  verbose_gap: 10
  visible: false
  use_cuda: true
  save_log: true
  save_model: false
  save_fig: true
  save_metrics: true
  check_convergence: true
# You can also set specific arguments for FL methods here.
# FL-bench accesses FL method arguments via `args.<method>.<arg>`,
# e.g.
fedprox:
  mu: 0.01
pfedsim:
  warmup_round: 0.7
# ...
# NOTE: For those unmentioned arguments, the default values are set in `get_<method>_args()` in `src/server/<method>.py`.
Commands
python generate_data.py -d mnist -a 0.1 -cn 100
python main.py fedavg cfg.yml
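For reference, `-a 0.1` here is presumably the Dirichlet concentration parameter controlling how non-IID the 100 client splits are (smaller alpha means more skewed per-client label distributions). A hedged sketch of Dirichlet label partitioning under that assumption; this is not generate_data.py itself:

# dirichlet_split.py — toy Dirichlet label partition, not generate_data.py itself
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=42):
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # One share of this class per client, drawn from Dirichlet(alpha, ..., alpha)
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = np.random.randint(0, 10, size=60_000)           # MNIST-sized toy label array
parts = dirichlet_partition(labels, num_clients=100, alpha=0.1)
print(len(parts), sum(len(p) for p in parts))            # 100 clients, 60000 samples total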
Thanks for your response. Could I ask what config I can use for resnet18 and cifar10 to get the best accuracy?
There are tons of variables that can affect the final accuracy. Sorry, I can't tell you the optimal config.
Is there a config that you used that gave reasonable results? Thanks.
Just try it yourself.