wwzzz / easyfl

An experimental platform for federated learning.

License: Apache License 2.0

Python 10.36% Jupyter Notebook 89.63% Dockerfile 0.01%

deep-learning distributed-computing fairness federated-learning machine-learning pytorch

easyfl's Issues

Error when running fedfv

Hello, I ran the following command:
python main.py --task mnist_classification_cnum100_dist0_skew0_seed0 --model cnn --algorithm fedfv --num_rounds 3 --num_epochs 1 --learning_rate 0.215 --proportion 0.1 --batch_size 10 --eval_interval 1
and got an error.

How can I fix this bug? Thank you very much!

module 'flgo.simulator' has no attribute 'base'

Hello, when following 0_Quick_Start.ipynb, the "Start training with FedAvg" section raises the error below. Is there a way to fix it?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 fedavg_runner = flgo.init(task=task, algorithm=fedavg, option={'num_rounds':5, 'num_epochs':1, 'gpu':0})
      2 fedavg_runner.run()

File ~/miniconda3/lib/python3.8/site-packages/flgo/utils/fflow.py:482, in init(task, algorithm, option, model, Logger, Simulator, scene)
    480 gv.logger.info('Use `{}` as the system simulator'.format(str(Simulator)))
    481 # flgo.simulator.base.random_seed_gen = flgo.simulator.base.seed_generator(option['seed'])
--> 482 gv.clock = flgo.simulator.base.ElemClock()
    483 gv.simulator = Simulator(objects, option)
    484 gv.clock.register_simulator(simulator=gv.simulator)

AttributeError: module 'flgo.simulator' has no attribute 'base'
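
A note on this class of error: in Python, a submodule becomes an attribute of its parent package only after something has imported it. The sketch below reproduces the same kind of AttributeError condition with a throwaway package built in a temporary directory. The names `demo_pkg` and `base` are hypothetical and the code does not use flgo itself. Whether this is the actual cause in the reported flgo version is an assumption; a mismatch between the notebook and the installed package version is another possibility.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def build_demo_package(root: Path) -> None:
    """Create demo_pkg/ with an empty __init__.py and a base.py submodule."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # note: does not import .base
    (pkg / "base.py").write_text("class ElemClock:\n    pass\n")

def attribute_visible_before_and_after() -> tuple:
    """Return (visible_before_import, visible_after_import) for demo_pkg.base."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            pkg = importlib.import_module("demo_pkg")
            before = hasattr(pkg, "base")  # submodule not loaded yet
            importlib.import_module("demo_pkg.base")
            after = hasattr(pkg, "base")   # the import binds it on the parent
            return before, after
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for name in ("demo_pkg", "demo_pkg.base"):
                sys.modules.pop(name, None)

print(attribute_visible_before_and_after())
```

If this is what happens here, adding `import flgo.simulator.base` before calling `flgo.init` may work around it; upgrading flgo is the other thing to try.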

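Training with a custom dataset in a special format

Many thanks to the author for the framework. If I use a dataset I created myself, for example one in a domain-specific format, what should I pay attention to, such as data loading and data partitioning? So far I have not managed to implement my own federated task on the framework. Looking forward to your reply! The WeChat QR code has expired and needs to be updated.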

Using EfficientNet-b0 makes qfedavg stop working

Thank you for the federated learning framework! It is very clean and easy to port!
I have one question, though. When I switch the model to efficientnet-b0 and run qfedavg, fedfv, or fedprox on the CIFAR-10 dataset, the loss never changes from start to finish. This is the model I defined:
from torch import nn
from flgo.utils.fmodule import FModule
from efficientnet_pytorch import EfficientNet

class Model(FModule):
    def __init__(self):
        super(Model, self).__init__()
        pretrained = True
        self.base_model = (
            EfficientNet.from_pretrained("efficientnet-b0")
            if pretrained
            else EfficientNet.from_name("efficientnet-b0")
        )
        # self.base_model=torchvision.models.efficientnet_v2_s(pretrained=pretrained)
        nftrs = self.base_model._fc.in_features
        # print("Number of features output by EfficientNet", nftrs)
        self.base_model._fc = nn.Linear(nftrs, 10)

    def forward(self, x):
        # Convolution layers
        x = self.base_model.extract_features(x)
        # Pooling and final linear layer
        feature_x = self.base_model._avg_pooling(x)
        if self.base_model._global_params.include_top:
            x = feature_x.flatten(start_dim=1)
            x = self.base_model._dropout(x)
            x = self.base_model._fc(x)
        return x

def init_local_module(object):
    pass

def init_global_module(object):
    if 'Server' in object.__class__.__name__:
        object.model = Model().to(object.device)
This is the result I get:

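Problem during model training

Thank you very much for your work. While reproducing your code, I often run into the following problem: during training, the accuracy suddenly drops to 0 and the loss goes to NaN. What causes this?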
{"option": {"sample": "md", "aggregate": "uniform", "num_rounds": 100, "proportion": 0.6, "learning_rate_decay": 0.998, "lr_scheduler": -1, "early_stop": -1, "num_epochs": 2, "num_steps": -1, "learning_rate": 0.1, "batch_size": 64.0, "optimizer": "SGD", "clip_grad": 0.0, "momentum": 0.0, "weight_decay": 0.0, "num_edge_rounds": 5, "algo_para": [], "train_holdout": 0.1, "test_holdout": 0.0, "local_test": false, "seed": 0, "gpu": [0], "server_with_cpu": false, "num_parallels": 1, "num_workers": 0, "pin_memory": false, "test_batch_size": 512, "availability": "IDL", "connectivity": "IDL", "completeness": "IDL", "responsiveness": "IDL", "log_level": "INFO", "log_file": true, "no_log_console": false, "no_overwrite": false, "eval_interval": 1, "task": "./my_task", "algorithm": "fedprox", "model": "cnn"}

2023-08-11 12:33:58,111 fedbase.py run [line:246] INFO --------------Round 80--------------
2023-08-11 12:33:58,111 simple_logger.py log_once [line:14] INFO Current_time:80
2023-08-11 12:34:01,120 simple_logger.py log_once [line:28] INFO test_accuracy 0.5547
2023-08-11 12:34:01,120 simple_logger.py log_once [line:28] INFO test_loss 1.3093
2023-08-11 12:34:01,120 simple_logger.py log_once [line:28] INFO val_accuracy 0.5546
2023-08-11 12:34:01,121 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.5612
2023-08-11 12:34:01,121 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.1447
2023-08-11 12:34:01,121 simple_logger.py log_once [line:28] INFO val_loss 1.3208
2023-08-11 12:34:01,121 simple_logger.py log_once [line:28] INFO mean_val_loss 1.2945
2023-08-11 12:34:01,121 simple_logger.py log_once [line:28] INFO std_val_loss 0.4200
2023-08-11 12:34:01,121 fedbase.py run [line:251] INFO Eval Time Cost: 3.0099s
2023-08-11 12:34:14,004 fedbase.py run [line:246] INFO --------------Round 81--------------
2023-08-11 12:34:14,004 simple_logger.py log_once [line:14] INFO Current_time:81
2023-08-11 12:34:17,050 simple_logger.py log_once [line:28] INFO test_accuracy 0.5408
2023-08-11 12:34:17,050 simple_logger.py log_once [line:28] INFO test_loss 1.4533
2023-08-11 12:34:17,050 simple_logger.py log_once [line:28] INFO val_accuracy 0.5417
2023-08-11 12:34:17,050 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.5378
2023-08-11 12:34:17,050 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.1402
2023-08-11 12:34:17,051 simple_logger.py log_once [line:28] INFO val_loss 1.4877
2023-08-11 12:34:17,051 simple_logger.py log_once [line:28] INFO mean_val_loss 1.4758
2023-08-11 12:34:17,051 simple_logger.py log_once [line:28] INFO std_val_loss 0.5131
2023-08-11 12:34:17,051 fedbase.py run [line:251] INFO Eval Time Cost: 3.0476s
2023-08-11 12:34:29,085 fedbase.py run [line:246] INFO --------------Round 82--------------
2023-08-11 12:34:29,085 simple_logger.py log_once [line:14] INFO Current_time:82
2023-08-11 12:34:32,096 simple_logger.py log_once [line:28] INFO test_accuracy 0.5482
2023-08-11 12:34:32,096 simple_logger.py log_once [line:28] INFO test_loss 1.3859
2023-08-11 12:34:32,096 simple_logger.py log_once [line:28] INFO val_accuracy 0.5411
2023-08-11 12:34:32,096 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.5413
2023-08-11 12:34:32,097 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.1292
2023-08-11 12:34:32,097 simple_logger.py log_once [line:28] INFO val_loss 1.4131
2023-08-11 12:34:32,097 simple_logger.py log_once [line:28] INFO mean_val_loss 1.4190
2023-08-11 12:34:32,097 simple_logger.py log_once [line:28] INFO std_val_loss 0.4560
2023-08-11 12:34:32,097 fedbase.py run [line:251] INFO Eval Time Cost: 3.0121s
2023-08-11 12:34:44,857 fedbase.py run [line:246] INFO --------------Round 83--------------
2023-08-11 12:34:44,858 simple_logger.py log_once [line:14] INFO Current_time:83
2023-08-11 12:34:47,908 simple_logger.py log_once [line:28] INFO test_accuracy 0.1000
2023-08-11 12:34:47,908 simple_logger.py log_once [line:28] INFO test_loss nan
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO val_accuracy 0.0996
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.0760
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.2083
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO val_loss nan
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO mean_val_loss nan
2023-08-11 12:34:47,909 simple_logger.py log_once [line:28] INFO std_val_loss nan
2023-08-11 12:34:47,909 fedbase.py run [line:251] INFO Eval Time Cost: 3.0507s
2023-08-11 12:35:00,954 fedbase.py run [line:246] INFO --------------Round 84--------------
2023-08-11 12:35:00,954 simple_logger.py log_once [line:14] INFO Current_time:84
2023-08-11 12:35:03,999 simple_logger.py log_once [line:28] INFO test_accuracy 0.1000
2023-08-11 12:35:03,999 simple_logger.py log_once [line:28] INFO test_loss nan
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO val_accuracy 0.0996
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.0760
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.2083
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO val_loss nan
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO mean_val_loss nan
2023-08-11 12:35:04,000 simple_logger.py log_once [line:28] INFO std_val_loss nan
2023-08-11 12:35:04,000 fedbase.py run [line:251] INFO Eval Time Cost: 3.0459s
2023-08-11 12:35:14,175 fedbase.py run [line:246] INFO --------------Round 85--------------
2023-08-11 12:35:14,175 simple_logger.py log_once [line:14] INFO Current_time:85
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO test_accuracy 0.1000
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO test_loss nan
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO val_accuracy 0.0996
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.0760
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.2083
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO val_loss nan
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO mean_val_loss nan
2023-08-11 12:35:17,211 simple_logger.py log_once [line:28] INFO std_val_loss nan
2023-08-11 12:35:17,211 fedbase.py run [line:251] INFO Eval Time Cost: 3.0360s
2023-08-11 12:35:31,371 fedbase.py run [line:246] INFO --------------Round 86--------------
2023-08-11 12:35:31,371 simple_logger.py log_once [line:14] INFO Current_time:86
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO test_accuracy 0.1000
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO test_loss nan
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO val_accuracy 0.0996
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO mean_val_accuracy 0.0760
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO std_val_accuracy 0.2083
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO val_loss nan
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO mean_val_loss nan
2023-08-11 12:35:34,401 simple_logger.py log_once [line:28] INFO std_val_loss nan
2023-08-11 12:35:34,401 fedbase.py run [line:251] INFO Eval Time Cost: 3.0308s
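
The log shows the loss turning into nan at round 83 and never recovering, which is what happens once a non-finite value enters the averaged model. The options above list "clip_grad": 0.0 (presumably meaning no clipping) and "learning_rate": 0.1, so gradient clipping or a smaller learning rate are things worth trying; whether either resolves this particular run is an assumption. Below is a framework-independent sketch of the two checks involved, global-norm clipping and skipping non-finite updates. The function names are hypothetical and plain Python lists stand in for tensors.

```python
import math

def global_norm(grads):
    """L2 norm over a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))

def clip_by_global_norm(grads, max_norm):
    """Scale grads so that their L2 norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def is_finite_update(values):
    """True only if every entry is a finite number (no nan or inf)."""
    return all(math.isfinite(v) for v in values)

def average_finite_updates(updates):
    """Average client updates, ignoring any update that contains nan or inf."""
    good = [u for u in updates if is_finite_update(u)]
    if not good:
        return None  # the caller would keep the previous global model
    return [sum(column) / len(good) for column in zip(*good)]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
averaged = average_finite_updates([[1.0, 2.0], [float("nan"), 0.0], [3.0, 4.0]])
print(clipped, averaged)
```

Logging which client produced the first non-finite update usually narrows the cause down faster than inspecting the aggregated model.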

"Failed to saving splited dataset" when run the quickstart

Hi, when I run the command "python generate_fedtask.py --benchmark mnist_classification --dist 0 --skew 0 --num_clients 100", it shows "Failed to saving splited dataset". Any idea about it? Thank you!

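The fmodule.py file

Hello, I saw that you have been updating the code recently. I just noticed that lossfunc, which appears on line 90 of fmodule.py, does not seem to be initialized. Sorry to bother you.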

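Running on multiple real devices

Hello, I want to run federated learning across multiple devices, but I have not figured out how to handle the communication between server and clients. I see that your code implements it with ZeroMQ and covers every communication pattern, but I did not fully understand it. Could you explain how your communication logic works?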

Problem with setting the number of clients

Hello, when the number of clients is set to 800, an error is raised. How can I resolve it?

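Adding a custom optimizer

Which files do I need to modify to add a custom optimizer? I got a bit lost reading the code. I hope the author can help. Thank you very much!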

Hello, running the modified fedfa.py fails at round 4 every time

PS D:\easyFL> python main.py --task mnist_classification_cnum10_dist0_skew0_seed0 --model cnn --algorithm fedfa --num_rounds 20 --num_epoch 5 --learning_rate 0.001 --proportion 0.1 --batch_size 10 --eval_interval 1
2023-01-30 11:43:50,875 fflow.py initialize [line:92] INFO Using Logger in utils.logger.basic_logger
2023-01-30 11:43:50,875 fflow.py initialize [line:93] INFO Initializing fedtask: mnist_classification_cnum10_dist0_skew0_seed0
2023-01-30 11:43:51,242 fflow.py initialize [line:106] INFO Using model cnn in benchmark.mnist_classification.model.cnn as the globally shared model.
2023-01-30 11:43:51,242 fflow.py initialize [line:120] INFO No server-specific model is used.
2023-01-30 11:43:51,242 fflow.py initialize [line:132] INFO No client-specific model is used.
2023-01-30 11:43:51,242 fflow.py initialize [line:138] INFO Initializing devices: cpu will be used for this running.
2023-01-30 11:43:51,259 fflow.py initialize [line:157] INFO Initializing Clients: 10 clients of algorithm.fedfa.Client being created.
2023-01-30 11:43:51,259 fflow.py initialize [line:163] INFO Initializing Server: 1 server of algorithm.fedfa.Server being created.
2023-01-30 11:43:51,276 fflow.py initialize [line:168] INFO Initializing Systemic Heterogeneity: Availability IDL
2023-01-30 11:43:51,276 fflow.py initialize [line:169] INFO Initializing Systemic Heterogeneity: Connectivity IDL
2023-01-30 11:43:51,276 fflow.py initialize [line:170] INFO Initializing Systemic Heterogeneity: Completeness IDL
2023-01-30 11:43:51,276 fflow.py initialize [line:171] INFO Initializing Systemic Heterogeneity: Timeliness IDL
2023-01-30 11:43:51,276 fflow.py initialize [line:176] INFO Ready to start.
2023-01-30 11:43:51,276 fedbase.py run [line:59] INFO --------------Round 1--------------
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.0972
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO test_loss 2.3085
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.0944
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO train_loss 2.3092
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.0959
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.0959
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.0063
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO valid_loss 2.3088
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 2.3088
2023-01-30 11:44:53,102 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.0013
2023-01-30 11:44:53,102 basic_logger.py time_end [line:74] INFO Eval Time Cost: 61.8260s
2023-01-30 11:46:29,294 basic_logger.py time_end [line:74] INFO Time Cost: 158.0185s
2023-01-30 11:46:29,294 fedbase.py run [line:59] INFO --------------Round 2--------------
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.9679
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO test_loss 0.0991
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.9648
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO train_loss 0.1150
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.9637
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.9637
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.0024
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO valid_loss 0.1163
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 0.1163
2023-01-30 11:47:36,910 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.0086
2023-01-30 11:47:36,910 basic_logger.py time_end [line:74] INFO Eval Time Cost: 67.6151s
2023-01-30 11:49:15,304 basic_logger.py time_end [line:74] INFO Time Cost: 166.0097s
2023-01-30 11:49:15,304 fedbase.py run [line:59] INFO --------------Round 3--------------
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.9797
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO test_loss 0.0635
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.9788
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO train_loss 0.0719
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.9777
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.9777
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.0038
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO valid_loss 0.0733
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 0.0733
2023-01-30 11:50:19,216 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.0108
2023-01-30 11:50:19,216 basic_logger.py time_end [line:74] INFO Eval Time Cost: 63.9115s
2023-01-30 11:52:04,157 basic_logger.py time_end [line:74] INFO Time Cost: 168.8528s
2023-01-30 11:52:04,158 fedbase.py run [line:59] INFO --------------Round 4--------------
2023-01-30 11:53:14,734 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.9819
2023-01-30 11:53:14,734 basic_logger.py show_current_output [line:134] INFO test_loss 0.0556
2023-01-30 11:53:14,734 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.9825
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO train_loss 0.0574
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.9820
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.9820
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.0042
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO valid_loss 0.0634
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 0.0634
2023-01-30 11:53:14,736 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.0124
2023-01-30 11:53:14,736 basic_logger.py time_end [line:74] INFO Eval Time Cost: 70.5784s
2023-01-30 11:53:14,736 main.py main [line:16] ERROR Exception Logged
Traceback (most recent call last):
File "main.py", line 13, in main
server.run()
File "D:\easyFL\algorithm\fedbase.py", line 68, in run
self.iterate()
File "D:\easyFL\algorithm\fedfa.py", line 36, in iterate
dw = wnew -self.model
TypeError: unsupported operand type(s) for -: 'NoneType' and 'Model'
Traceback (most recent call last):
File "main.py", line 13, in main
server.run()
File "D:\easyFL\algorithm\fedbase.py", line 68, in run
self.iterate()
File "D:\easyFL\algorithm\fedfa.py", line 36, in iterate
dw = wnew -self.model
TypeError: unsupported operand type(s) for -: 'NoneType' and 'Model'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 22, in <module>
main()
File "main.py", line 17, in main
raise RuntimeError
RuntimeError

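Simulator problem

This is about the RSsimulator code for device responsiveness heterogeneity in section 4.0 (real-world environment simulation).
Running the code raises the following issues:
1. The running time is no different from running without a simulator. It seems to me that __init__ of BasicSimulator in flgo\system_simulator\base.py does not include the initialize and update_client_responsiveness functions; I added them afterwards.
2. For the code in

the error message says that self.clients is a list, so keys cannot be called on it.
Thanks for clarifying.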

Question about the parameter settings of result_analysis.py

I saw this code:
_default_option = {
    'plot': ['linestyle', 'marker', 'linewidth', 'markersize',],
    'scatter': ['s', 'cmap', 'alpha', 'color',],
    'group_plot': ['linestyle', 'marker', 'linewidth', 'markersize',],
}
Is it possible to set parameters so that group_plot is used and its style is configured?

How to apply model compression

If I want to quantize or sparsify the gradients, which part of the framework should I modify?
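
As a framework-independent illustration of the two operations named in the question, here is a sketch of top-k sparsification and uniform 8-bit quantization applied to a flat list of gradient values. The function names are hypothetical. Where such a step would hook into easyFL/FLGo (for example, just before a client returns its update) is an assumption, not something stated in the issue.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    if k >= len(grad):
        return list(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def quantize_uint8(grad):
    """Map values linearly to integer codes in [0, 255]; return (codes, lo, step)."""
    lo, hi = min(grad), max(grad)
    step = (hi - lo) / 255 or 1.0  # avoid a zero step when all values are equal
    codes = [round((g - lo) / step) for g in grad]
    return codes, lo, step

def dequantize_uint8(codes, lo, step):
    """Invert quantize_uint8 up to a rounding error of about step / 2."""
    return [lo + c * step for c in codes]

grad = [0.5, -2.0, 0.1, 1.5]
sparse = top_k_sparsify(grad, k=2)
codes, lo, step = quantize_uint8(grad)
restored = dequantize_uint8(codes, lo, step)
print(sparse, codes)
```

Both operations are lossy, so it is common to apply them to the update a client sends rather than to the model it keeps training locally.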

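Gradient encryption

I want to use cryptographic algorithms such as homomorphic encryption to encrypt the gradients. Which part of the framework should I modify to achieve this?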

Error when running result_analysis.py

Traceback (most recent call last):
  File "D:/zyz/paper2/code/utils/result_analysis.py", line 505, in <module>
    drawer.draw(cfg['ploter'])
  File "D:/zyz/paper2/code/utils/result_analysis.py", line 251, in draw
    f(plot_obj)
  File "D:/zyz/paper2/code/utils/result_analysis.py", line 329, in bar
    data = [d[plot_obj['y']] for d in self.rec_dicts]
  File "D:/zyz/paper2/code/utils/result_analysis.py", line 329, in <listcomp>
    data = [d[plot_obj['y']] for d in self.rec_dicts]
KeyError: 'client_datavol'
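
The KeyError means that at least one record in self.rec_dicts has no 'client_datavol' entry. Why it is missing is not stated in the issue. A defensive sketch, with hypothetical names and plain dicts standing in for the loaded records, is to collect the key only from records that have it and to report the ones that do not:

```python
def collect_field(records, key):
    """Split records into the values found for `key` and the indices lacking it."""
    values, missing = [], []
    for index, record in enumerate(records):
        if key in record:
            values.append(record[key])
        else:
            missing.append(index)
    return values, missing

# Made-up records for illustration only.
records = [
    {"client_datavol": [600, 580], "test_accuracy": [0.91]},
    {"test_accuracy": [0.88]},  # saved without client_datavol
]
values, missing = collect_field(records, "client_datavol")
print(values, missing)
```

Knowing which record files lack the field makes it easier to tell whether they came from a different logger or an older run.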

core.py for the Fashion dataset raises an error, so the task dataset cannot be generated

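Bug when training fedmgda+ on the fashion_mnist dataset

First of all, many thanks to the author for such a handy federated learning framework!

I hit an error when training the fashion_mnist task with the fedmgda+ algorithm. The details are as follows:

The fedmgda+ algorithm was copied and pasted from the author's tutorial without any changes. The data distribution gives each client only one class of data, as shown:

The other parameter settings are:
option_batch_size_10 = {'learning_rate': 0.01, 'num_steps': 1, 'num_rounds': 500, 'gpu': 1, 'batch_size': 10,
                        'proportion':0.1, 'seed': 0}

After a series of tests, I found that the nan values appear after the normalization function (gi.normalize()); the normalization is probably dividing by zero.

I hope the bug can be fixed soon. I am in the middle of algorithm experiments, so this is fairly urgent.

Thanks again!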
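
If the diagnosis above is right, the nan comes from dividing a gradient by its own norm when that norm is zero. A common guard is to leave the vector unchanged when the norm vanishes. The sketch below uses plain lists and a hypothetical function name; it is not the framework's normalize() implementation.

```python
import math

def safe_normalize(vec, eps=1e-12):
    """Return vec scaled to unit L2 norm; return it unchanged if the norm is ~0."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < eps:
        return list(vec)  # a zero vector has no direction to normalize
    return [v / norm for v in vec]

unit = safe_normalize([3.0, 4.0])
zero = safe_normalize([0.0, 0.0])
print(unit, zero)
```

With one local step and a single class per client, an all-zero gradient is plausible but not confirmed here, so it is worth checking which client produces it.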

Running on AMD GPU

Hi,

Thanks for your effort.
When I try to run on an AMD Radeon GPU, I get the following error:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

It seems that only NVIDIA GPUs are expected. If I remove the --gpu 0 option, I can run on the CPU.

I ran the following command on Ubuntu 20:
$ python3 main.py --task grasping_classification_cnum10_dist0_skew0_seed0 --optimizer Adam --num_epochs 10 --algorithm centralize --model cnn --pre squeezenet1_0 --fields "cr_" --learning_rate 0.001 --batch_size 16 --gpu 0 --lr_scheduler 0 --logger simple_logger

The full output is below:

2023-03-31 15:33:36,200 fflow.py initialize [line:94] INFO Using Logger in algorithm.centralize
2023-03-31 15:33:36,200 fflow.py initialize [line:95] INFO Initializing fedtask: grasping_classification_cnum10_dist0_skew0_seed0
fields: ['tactileColorR']
fields: ['tactileColorR']
origin_class <class 'benchmark.grasping_classification.core.Grasping'>
origin_train_data <benchmark.grasping_classification.core.Grasping object at 0x7f78d1a11690>
origin_test_data <benchmark.grasping_classification.core.Grasping object at 0x7f78bfe026b0>
2023-03-31 15:33:39,320 fflow.py initialize [line:109] INFO Using model cnn in benchmark.grasping_classification.model.cnn as the globally shared model.
2023-03-31 15:33:39,321 fflow.py initialize [line:123] INFO No server-specific model is used.
2023-03-31 15:33:39,321 fflow.py initialize [line:135] INFO No client-specific model is used.
2023-03-31 15:33:39,321 fflow.py initialize [line:141] INFO Initializing devices: cuda:2 will be used for this running.
/home/omar/.local/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/omar/.local/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=SqueezeNet1_0_Weights.IMAGENET1K_V1. You can also use weights=SqueezeNet1_0_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
File "/home/omar/fl/easyFL/main.py", line 22, in <module>
main()
File "/home/omar/fl/easyFL/main.py", line 10, in main
server = flw.initialize(option)
File "/home/omar/fl/easyFL/utils/fflow.py", line 146, in initialize
model = utils.fmodule.Model(fields=option['fields'], model_name=option['pre']).to(utils.fmodule.dev_list[0])
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/omar/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/omar/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
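
The traceback shows the model being moved to a CUDA device, which a standard PyTorch build supports only with an NVIDIA driver. PyTorch also ships ROCm builds for supported AMD GPUs, which expose the device through the same torch.cuda API; whether the card in this report is supported is not known here. A minimal guard, using a hypothetical helper name, falls back to the CPU when no usable GPU backend is present:

```python
def pick_device(requested_gpu=None):
    """Return 'cuda:<id>' only when PyTorch reports a usable GPU, else 'cpu'."""
    if requested_gpu is None:
        return "cpu"
    try:
        import torch  # optional dependency; a missing torch also means CPU
    except ImportError:
        return "cpu"
    if torch.cuda.is_available() and requested_gpu < torch.cuda.device_count():
        return f"cuda:{requested_gpu}"
    return "cpu"

print(pick_device(0))
```

The returned string can be passed to `.to(...)` in place of a hard-coded device, so the same command line runs on machines with and without a supported GPU.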

Fedmask algorithm is not implemented

The README mentions a reproduction of Fedmask, but I cannot find the corresponding algorithm file.

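Illegal logger file name

After adding the Staleness setting from the simulator, for example by adding 'algo_para':[0, 200, 0] to the option of flgo.init() while using the default logger, an illegal file name error is raised because the logger file name contains the character |.
So I modified flgo\experiment\logger\__init__.py

and added one line at the corresponding position:
filename = filename.replace("|", "-"), after which the logger file is generated normally.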
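
The one-line fix above handles |. The same idea can be generalized to every character that Windows rejects in file names. The helper name and the sample file name below are made up for illustration; this is a sketch, not the library's code.

```python
import re

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def sanitize_filename(filename, replacement="-"):
    """Replace characters that are illegal in Windows file names."""
    return _ILLEGAL.sub(replacement, filename)

print(sanitize_filename("fedasync_P0|200|0_R100.json"))
```

Only the file name should be passed in, not a full path, because path separators are replaced as well.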

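Effect of the alpha parameter on the Dirichlet function

First, thank you for providing such a good platform for everyone to use. I looked at the Dirichlet function in 1.1_Configuration - Task: a larger alpha should push the data toward IID and a smaller alpha toward non-IID, and the comparison of the generated plots shows the same thing.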
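
This observation matches the standard behavior of the Dirichlet distribution: with a symmetric parameter alpha, samples concentrate near the uniform vector as alpha grows and near one-hot vectors as alpha shrinks. The sketch below draws per-client label proportions with the standard library only, using the fact that a Dirichlet sample is a vector of Gamma(alpha, 1) draws divided by their sum. The function names are hypothetical and this is not FLGo's partitioner.

```python
import random

def sample_dirichlet(alpha, num_classes, rng):
    """Draw one symmetric Dirichlet(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_classes)]
    total = sum(draws)
    if total == 0.0:  # can underflow for very small alpha; fall back to one-hot
        one_hot = [0.0] * num_classes
        one_hot[rng.randrange(num_classes)] = 1.0
        return one_hot
    return [d / total for d in draws]

def mean_max_share(alpha, num_classes=10, num_clients=500, seed=0):
    """Average, over clients, of the largest class share in each client's label mix."""
    rng = random.Random(seed)
    shares = [max(sample_dirichlet(alpha, num_classes, rng)) for _ in range(num_clients)]
    return sum(shares) / len(shares)

# A value near 1.0 means clients are dominated by one class (non-IID);
# a value near 1 / num_classes means labels are spread evenly (close to IID).
for alpha in (0.1, 1.0, 100.0):
    print(alpha, round(mean_max_share(alpha), 3))
```

The printed statistic falls as alpha increases, which is the same trend the generated plots show.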

Hello, I get an error when using fedfa

Traceback (most recent call last):
  File "main.py", line 22, in <module>
    main()
  File "main.py", line 10, in main
    server = flw.initialize(option)
  File "D:\easyFL\utils\fflow.py", line 159, in initialize
    clients = [Client(option, name=client_names[cid], train_data=train_datas[cid], valid_data=valid_datas[cid]) for cid in range(num_clients)]
  File "D:\easyFL\utils\fflow.py", line 159, in <listcomp>
    clients = [Client(option, name=client_names[cid], train_data=train_datas[cid], valid_data=valid_datas[cid]) for cid in range(num_clients)]
  File "D:\easyFL\algorithm\fedfa.py", line 49, in __init__
    self.momentum = option ['gamma']
KeyError: 'gamma'

How to run MOON with customized models?

Since the model used by MOON requires both the 'encoder' and 'head' attributes, and few existing benchmarks support this setting, here I discuss how to customize models for MOON. The model can be specified with the keyword 'model' in flgo.init(...), and the passed model should first be converted to a format suitable for FLGo with the API flgo.convert_model(model_class:nn.Module, model_name:str). An example follows. Other methods with specific requirements on model architecture or regularization can be implemented in the same way.

This function was updated in flgo-v0.1.4; please upgrade to the latest version with 'pip install --upgrade flgo' before running the example.

Remark: the class Model should define the encoder and head.

import flgo
import flgo.benchmark.partition as fbp
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(1),
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.head = nn.Linear(512, 10)

    def forward(self, x):
        x = self.encoder(x)
        x = self.head(x)
        return x

model = flgo.convert_model(Model, 'mycnn')
moon = flgo.download_resource('.', 'moon', 'algorithm')
mnist = flgo.download_resource('.', 'mnist_classification', 'benchmark')
task = './my_task'
flgo.gen_task_by_(mnist, fbp.IIDPartitioner(num_clients=100), task)
runner = flgo.init(task, moon, option={"gpu":0, 'local_test':True, }, model=model)
runner.run()

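How can the server send different data to each local client with this framework? Thank you 🙏

I looked at the communication-related functions, communicate() and communicate_with(), but I do not know how to change them. It seems that with these two functions the server can only send the same data to every local client and cannot send different data depending on the client. Is it possible to have the server send different data to each local client?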
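
One way to think about this, independent of the framework, is to build the outgoing package per client id instead of once per round. The toy classes below are hypothetical and do not use FLGo's communicate() or communicate_with(); they only illustrate a pack step that receives the client id. Whether the installed FLGo version exposes an equivalent per-client hook should be checked in its source.

```python
class ToyClient:
    def __init__(self, client_id):
        self.client_id = client_id

    def reply(self, package):
        """Echo back what was received so the caller can inspect it."""
        return {"client_id": self.client_id, "received": package}

class ToyServer:
    def __init__(self, clients, per_client_payload):
        self.clients = {c.client_id: c for c in clients}
        self.per_client_payload = per_client_payload

    def pack(self, client_id):
        """Build the package for one specific client."""
        return {"model": "global-weights", "extra": self.per_client_payload[client_id]}

    def communicate(self, selected_ids):
        """Send each selected client its own package and gather the replies."""
        return [self.clients[cid].reply(self.pack(cid)) for cid in selected_ids]

server = ToyServer([ToyClient(0), ToyClient(1)], {0: "mask-A", 1: "mask-B"})
replies = server.communicate([0, 1])
print([r["received"]["extra"] for r in replies])
```

The shared part of the package (here the global weights) stays the same for everyone, while the "extra" field varies by client.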

A small issue: the gpu parameter of the read_option function in fflow.py should be an iterable such as a list, not an int.
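
For reference, argparse can collect zero or more integers into a list with nargs='*'. The parser below is a standalone sketch; the option name mirrors --gpu from the issue, but this is not the actual read_option code.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # nargs='*' gathers every following integer into a list;
    # the default is an empty list, which a caller can treat as "use the CPU".
    parser.add_argument("--gpu", nargs="*", type=int, default=[],
                        help="GPU ids to use; omit for CPU")
    return parser

print(build_parser().parse_args(["--gpu", "0", "1"]).gpu)
print(build_parser().parse_args([]).gpu)
```

Downstream code can then iterate over the value without first checking whether it is an int or a list.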

A fantastic framework

How to set the parameters so that the best model weights are saved?
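
The issue does not say which option, if any, controls this. Independent of the framework, the usual pattern is to keep a copy of the weights whenever the validation metric improves. The class below is a hypothetical sketch in which a plain dict stands in for the model weights.

```python
import copy

class BestCheckpoint:
    """Remember the weights that achieved the highest validation accuracy."""

    def __init__(self):
        self.best_accuracy = float("-inf")
        self.best_weights = None
        self.best_round = None

    def update(self, round_id, accuracy, weights):
        """Store a deep copy of weights if accuracy beats the best seen so far."""
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.best_weights = copy.deepcopy(weights)
            self.best_round = round_id
            return True
        return False

tracker = BestCheckpoint()
weights = {"layer.weight": [0.0, 0.0]}
for round_id, accuracy in enumerate([0.55, 0.61, 0.58], start=1):
    weights["layer.weight"][0] += 1.0  # stand-in for one training round
    tracker.update(round_id, accuracy, weights)
print(tracker.best_round, tracker.best_weights)
```

With PyTorch, the dict would typically be model.state_dict(), and the stored copy can be written to disk with torch.save. The deep copy matters because later rounds keep modifying the live weights in place.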

Thanks to the author for the framework. When I run FedAsync, an error occurs that seems to be caused by asynchronous communication

Traceback (most recent call last):
File "/home/archlab/lzr/easyFL-FLGo/main.py", line 31, in <module>
runner.run()
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedbase.py", line 243, in run
updated = self.iterate()
File "/home/archlab/lzr/easyFL-FLGo/flgo/algorithm/fedasync.py", line 25, in iterate
res = self.communicate(self.selected_clients, asynchronous=True)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout
return communicate(self, selected_clients, mtype, asynchronous)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 481, in communicate_with_dropout
return communicate(self, selected_clients, mtype, asynchronous)
File "/home/archlab/lzr/easyFL-FLGo/flgo/simulator/base.py", line 573, in communicate_with_clock
if pkgs[0].get('__cid', None) is None:
IndexError: list index out of range

Fail to run with torch.multiprocessing

With argument num_threads > 1, I got this error:
AttributeError: 'Server' object has no attribute 'delayed_communicate_with'
Can someone help me? Thank you very much!

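Do all currently implemented algorithms deepcopy a model inside the Client?

With a complex model, would running dozens of clients easily exhaust GPU memory?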

DiversityPartitioner is missing the imbalance parameter

class DiversityPartitioner(BasicPartitioner):
    """Partition the indices of samples in the original dataset according to numbers of types of a particular
    attribute (e.g. label) . This way of partition is widely used by existing works in federated learning.

    Args:
        num_clients (int, optional): the number of clients
        diversity (float, optional): the ratio of locally owned types of the attributes (i.e. the actual number=diversity * total_num_of_types)
        imbalance (float, optional): the degree of imbalance of the amounts of different local_movielens_recommendation data (0<=imbalance<=1)
        index_func (int, optional): the index of the distribution-dependent (i.e. label) attribute in each sample.
    """
    def __init__(self, num_clients=100, diversity=1.0, index_func=lambda X:[xi[-1] for xi in X]):
        self.num_clients = num_clients
        self.diversity = diversity
        self.index_func = index_func

It looks like the imbalance parameter documented above is not accepted by the constructor.

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

D:\anaconda\envs\pytorch\python.exe D:\博士论文\paper2\easyFL-main\main.py --task cifar10_classification_cnum20_dist4_skew0.7_seed0 --model resnet18 --aggregate other --algorithm scaffold --num_rounds 100 --num_epochs 1 --learning_rate 0.01 --proportion 0.6 --batch_size 16 --eval_interval 1 --gpu 0
2023-08-09 08:15:49,563 fflow.py initialize [line:92] INFO Using Logger in utils.logger.basic_logger
2023-08-09 08:15:49,563 fflow.py initialize [line:93] INFO Initializing fedtask: cifar10_classification_cnum20_dist4_skew0.7_seed0
Files already downloaded and verified
Files already downloaded and verified
2023-08-09 08:15:51,150 fflow.py initialize [line:106] INFO Using model resnet18 in benchmark.cifar10_classification.model.resnet18 as the globally shared model.
2023-08-09 08:15:51,150 fflow.py initialize [line:120] INFO No server-specific model is used.
2023-08-09 08:15:51,150 fflow.py initialize [line:132] INFO No client-specific model is used.
2023-08-09 08:15:51,150 fflow.py initialize [line:138] INFO Initializing devices: cuda:0 will be used for this running.
2023-08-09 08:15:52,339 fflow.py initialize [line:157] INFO Initializing Clients: 20 clients of algorithm.scaffold.Client being created.
2023-08-09 08:15:52,339 fflow.py initialize [line:163] INFO Initializing Server: 1 server of algorithm.scaffold.Server being created.
2023-08-09 08:15:52,413 fflow.py initialize [line:168] INFO Initializing Systemic Heterogeneity: Availability IDL
2023-08-09 08:15:52,413 fflow.py initialize [line:169] INFO Initializing Systemic Heterogeneity: Connectivity IDL
2023-08-09 08:15:52,413 fflow.py initialize [line:170] INFO Initializing Systemic Heterogeneity: Completeness IDL
2023-08-09 08:15:52,413 fflow.py initialize [line:171] INFO Initializing Systemic Heterogeneity: Timeliness IDL
2023-08-09 08:15:52,414 fflow.py initialize [line:176] INFO Ready to start.
2023-08-09 08:15:52,414 fedbase.py run [line:64] INFO --------------Round 1--------------
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.1000
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO test_loss 1.8656
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.1005
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO train_loss 1.8656
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.0980
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.1352
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.1431
2023-08-09 08:17:13,612 basic_logger.py show_current_output [line:134] INFO valid_loss 1.8651
2023-08-09 08:17:13,613 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 1.8586
2023-08-09 08:17:13,613 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.0145
2023-08-09 08:17:13,613 basic_logger.py time_end [line:74] INFO Eval Time Cost: 81.1995s
2023-08-09 08:18:34,178 basic_logger.py time_end [line:74] INFO Time Cost: 161.7642s
2023-08-09 08:18:34,178 fedbase.py run [line:64] INFO --------------Round 2--------------
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.1189
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO test_loss 1.8432
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.1188
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO train_loss 1.8449
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.1180
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.1091
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.1004
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO valid_loss 1.8496
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 1.8126
2023-08-09 08:19:55,190 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.2656
2023-08-09 08:19:55,190 basic_logger.py time_end [line:74] INFO Eval Time Cost: 81.0116s
2023-08-09 08:20:27,632 basic_logger.py time_end [line:74] INFO Time Cost: 113.4540s
2023-08-09 08:20:27,632 fedbase.py run [line:64] INFO --------------Round 3--------------
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.2738
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO test_loss 1.4739
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.2771
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO train_loss 1.4716
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.2728
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.2710
2023-08-09 08:21:48,558 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.0861
2023-08-09 08:21:48,559 basic_logger.py show_current_output [line:134] INFO valid_loss 1.4940
2023-08-09 08:21:48,559 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 1.4443
2023-08-09 08:21:48,559 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.2772
2023-08-09 08:21:48,559 basic_logger.py time_end [line:74] INFO Eval Time Cost: 80.9270s
2023-08-09 08:21:53,176 basic_logger.py time_end [line:74] INFO Time Cost: 85.5443s
2023-08-09 08:21:53,176 fedbase.py run [line:64] INFO --------------Round 4--------------
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO test_accuracy 0.1806
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO test_loss 4.0666
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO train_accuracy 0.1782
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO train_loss 4.1298
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO valid_accuracy 0.1829
2023-08-09 08:23:14,047 basic_logger.py show_current_output [line:134] INFO mean_valid_accuracy 0.2202
2023-08-09 08:23:14,048 basic_logger.py show_current_output [line:134] INFO std_valid_accuracy 0.1279
2023-08-09 08:23:14,048 basic_logger.py show_current_output [line:134] INFO valid_loss 4.0689
2023-08-09 08:23:14,048 basic_logger.py show_current_output [line:134] INFO mean_valid_loss 3.7908
2023-08-09 08:23:14,048 basic_logger.py show_current_output [line:134] INFO std_valid_loss 0.9718
2023-08-09 08:23:14,048 basic_logger.py time_end [line:74] INFO Eval Time Cost: 80.8717s
2023-08-09 08:23:14,048 main.py main [line:16] ERROR Exception Logged
Traceback (most recent call last):
File "D:\博士论文\paper2\easyFL-main\main.py", line 13, in main
server.run()
File "D:\博士论文\paper2\easyFL-main\algorithm\fedbase.py", line 73, in run
self.iterate()
File "D:\博士论文\paper2\easyFL-main\algorithm\scaffold.py", line 30, in iterate
self.model, self.cg = self.aggregate(dys, dcs)
File "D:\博士论文\paper2\easyFL-main\algorithm\scaffold.py", line 36, in aggregate
new_model = self.model + self.eta * fmodule._model_average(dys)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
Traceback (most recent call last):
File "D:\博士论文\paper2\easyFL-main\main.py", line 13, in main
server.run()
File "D:\博士论文\paper2\easyFL-main\algorithm\fedbase.py", line 73, in run
self.iterate()
File "D:\博士论文\paper2\easyFL-main\algorithm\scaffold.py", line 30, in iterate
self.model, self.cg = self.aggregate(dys, dcs)
File "D:\博士论文\paper2\easyFL-main\algorithm\scaffold.py", line 36, in aggregate
new_model = self.model + self.eta * fmodule._model_average(dys)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\博士论文\paper2\easyFL-main\main.py", line 22, in <module>
main()
File "D:\博士论文\paper2\easyFL-main\main.py", line 17, in main
raise RuntimeError
RuntimeError
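Hello, when I train with the scaffold algorithm on the cifar10 dataset, the run fails with the error above partway through the iterations, and I have not been able to find the cause. I hit the same problem when reproducing some of the other algorithms in the algorithm directory, apart from FedAvg and FedProx.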
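
The traceback shows `self.eta * fmodule._model_average(dys)` failing because the right-hand operand is `None`. One reading consistent with that message, though not confirmed by the log, is that the averaging helper returned `None`, for example because it was given an empty list of client updates. The sketch below illustrates that failure mode and a guard against it. It is not easyFL code: `average_updates` and `aggregate` are hypothetical stand-ins, and plain lists of floats replace the model objects.

```python
from typing import List, Optional

Vector = List[float]


def average_updates(updates: List[Vector]) -> Optional[Vector]:
    """Element-wise mean of the updates; returns None when the list is empty.

    Hypothetical stand-in for an averaging helper, not the library function.
    """
    if not updates:
        return None
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


def aggregate(model: Vector, updates: List[Vector], eta: float) -> Vector:
    """Compute model + eta * mean(updates), keeping the model unchanged
    when there is nothing to average instead of multiplying by None."""
    mean_update = average_updates(updates)
    if mean_update is None:
        # Without this check, eta * None raises the TypeError seen in the log.
        return list(model)
    return [w + eta * d for w, d in zip(model, mean_update)]


if __name__ == "__main__":
    model = [1.0, 2.0]
    print(aggregate(model, [[0.5, 0.5], [1.5, -0.5]], eta=1.0))
    print(aggregate(model, [], eta=1.0))
```

If this is the cause, printing the length of `dys` just before the failing line in `scaffold.py` would show whether any client updates actually reached the server in round 4.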

Errors when running afl and qfedavg

Hello, running the afl baseline produces the following error.
cmd:
python main.py --task mnist_classification_cnum100_dist0_skew0_seed0 --model cnn --algorithm afl --num_rounds 2 --num_epochs 1 --learning_rate 0.215 --proportion 0.1 --batch_size 10 --eval_interval 1

Running qffl also produces an error.
cmd:
python main.py --task mnist_classification_cnum100_dist0_skew0_seed0 --model cnn --algorithm qfedavg --num_rounds 2 --num_epochs 1 --learning_rate 0.215 --proportion 0.1 --batch_size 10 --eval_interval 1

This may be related to the output format of communicate. How can I fix this bug? Thank you very much!
