An easy-to-use federated learning platform
Home Page: https://www.federatedscope.io
License: Apache License 2.0
How to reproduce:
python federatedscope/main.py --cfg federatedscope/gfl/baseline/fedavg_gnn_node_fullbatch_citation.yaml federate.sample_client_rate 1.0
fails, but
python federatedscope/main.py --cfg federatedscope/gfl/baseline/fedavg_gnn_node_fullbatch_citation.yaml federate.sample_client_num 5
works fine.
In fact, federate.sample_client_rate=1.0 is equivalent to federate.sample_client_num=5. @yxdyc @rayrayraykk
An error happens in FederatedScope/federatedscope/cv/dataset/leaf_cv.py:
When the dataset needs to be downloaded, the function download_url is imported from the module torch_geometric.data, which is not included in the minimal version of the requirements (and, in my opinion, it shouldn't be). So maybe we should replace download_url with another implementation here. Thanks :)
An executable toy example to demonstrate how to run distributed mode with FederatedScope.
A few questions, please:
1. Does FederatedScope provide a federated algorithm library like FATE's FederatedML? FATE's FederatedML offers ready-to-use federated algorithms for horizontal and vertical LR, neural networks, decision trees, and so on.
2. Compared with FATE, does FederatedScope's architecture have any advantages?
3. Are there any cases where FederatedScope has already been deployed in production or used commercially?
Although I have successfully completed the installation procedure, I cannot import federatedscope from any path other than the root of the cloned repo.
FederatedScope/federatedscope/core/configs/cfg_fl_setting.py
Lines 81 to 82 in 4a986e0
The config option cfg.federate.client_num is allowed to be 0, which means that client_num will be determined by the dataset after loading. However, the above assertion happens before data loading and thus causes some unexpected behaviors, such as forcibly setting sample_client_num to the same value as cfg.federate.client_num:
FederatedScope/federatedscope/core/configs/cfg_fl_setting.py
Lines 103 to 105 in 4a986e0
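One way to avoid the premature check is to defer resolution until after data loading; the function below is a hypothetical sketch (names, defaults, and precedence are assumptions, not FederatedScope code):

```python
def resolve_sample_client_num(client_num, sample_client_num,
                              sample_client_rate, dataset_client_num):
    """Resolve the number of sampled clients per round.

    client_num == 0 means "determined by the dataset", so this must
    run after data loading, not at config-validation time.
    """
    if client_num == 0:  # deferred: use the value discovered during loading
        client_num = dataset_client_num
    if sample_client_rate > 0:
        return max(1, int(sample_client_rate * client_num))
    if sample_client_num > 0:
        assert sample_client_num <= client_num
        return sample_client_num
    return client_num  # default: all clients participate each round
```

With this ordering, sample_client_rate=1.0 and sample_client_num=5 both resolve to 5 on a 5-client dataset, matching the equivalence noted above.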
Hi, I am trying to follow the guidance here: https://federatedscope.io/docs/own-case/ to add data/model/trainer etc. into the project and run an FL task.
However, this guidance is not very clear; in particular: 1) how does the config section work; 2) how to tie the customized data/model/trainer/config together to complete an FL task.
Is it possible to have a complete guide using something like MNIST/CIFAR-10?
Is it possible to add a keyword, such as cfg.federate.task_type, to indicate the task type of each client? It is useful in calculating the loss function because y_true should be long and float respectively for classification and regression tasks.
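As an illustration of the proposal, a hypothetical cfg.federate.task_type could drive the label casting; the sketch below uses plain Python lists, while torch tensors would use .long() / .float() equivalently:

```python
def cast_targets(y_true, task_type):
    """Cast labels for the loss based on a hypothetical cfg.federate.task_type.

    With torch tensors the equivalents are y_true.long() for classification
    (CrossEntropyLoss needs integer class indices) and y_true.float() for
    regression (MSELoss needs floating-point targets).
    """
    if task_type == 'classification':
        return [int(y) for y in y_true]    # class indices must be integral
    if task_type == 'regression':
        return [float(y) for y in y_true]  # regression targets must be floats
    raise ValueError(f'unknown task type: {task_type!r}')
```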
'Results_raw': {'client_individual': {'val_loss': 0.7106942534446716, 'test_loss': 0.7106942534446716, 'test_avg_loss': 0.7106942534446716, 'test_total': 1.0, 'val_avg_loss': 0.7106942534446716, 'val_total': 1.0}, 'client_summarized_weighted_avg': {'val_loss'
It is difficult for users to know that client_individual means the best individual results.
In the current main branch (commit 954322c), the implementation of the gRPCCommManager
class (see federatedscope/core/communication.py) largely refers to that in FedML (see https://github.com/FedML-AI/FedML/blob/master/fedml_core/distributed/communication/gRPC/grpc_comm_manager.py in commit 0fb63dd157e55ee603b7049568bf4c4ed0586e71), as commented in FederatedScope's codebase. This class is based on gRPC, a modern open source high performance Remote Procedure Call (RPC) framework. A gRPCCommManager
(i) keeps the addresses of potential message receivers in a dict/list collection and (ii) provides wrapper functions that call gRPC APIs for message sending and receiving. Many similar variants of such wrapper functions have been widely adopted in related packages. Although FederatedScope obeys the Apache-2.0 License and includes a declaration of FedML's copyright, in order to avoid the risk of unintended infringement and unnecessary disputes with FedML, we re-implemented this class by referring to the examples in the gRPC tutorial in this commit.
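The bookkeeping side described in (i) can be sketched in a few lines; the class and method names below are illustrative, not the actual gRPCCommManager API:

```python
class CommManagerBase:
    """Sketch of the address bookkeeping in a gRPC-style comm manager.

    Only the neighbor registry is shown; the real class additionally
    wraps gRPC stubs for the actual message sending and receiving.
    """

    def __init__(self):
        self.neighbors = {}  # neighbor_id -> "host:port"

    def add_neighbor(self, neighbor_id, address):
        self.neighbors[neighbor_id] = address

    def get_neighbors(self, neighbor_id=None):
        if neighbor_id is None:  # all registered receivers
            return dict(self.neighbors)
        return self.neighbors[neighbor_id]
```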
Google's landmark datasets (e.g., GLD-23k, GLD-160k) are widely used in evaluating federated optimization algorithms, and TFF has integrated them (see https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/gldv2/load_data). I am wondering whether it is possible to use them in FederatedScope.
In Quick Start, the "build docker image and run with docker env" command raises an error.
In the docs, the command is {docker run --gpus device=all --rm --it --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash"},
it should be {docker run --gpus device=all --rm -it --name "fedscope" -w
Hi, when I try to launch the demo case, a CUDA-related error is reported as below:
I am using conda to manage the environment. In other envs, PyTorch works on CUDA without any problem.
I think this could be an installation issue: I did not install anything by myself, entirely following your guidance.
My cuda version:
NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6
and my torch version: 1.10.1
(fedscope) liangma@lMa-X1:~/prj/FederatedScope$ python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml
...
2022-05-13 22:06:09,249 (server:520) INFO: ----------- Starting training (Round #0) -------------
Traceback (most recent call last):
File "/home/liangma/prj/FederatedScope/federatedscope/main.py", line 41, in <module>
_ = runner.run()
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 136, in run
self._handle_msg(msg)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 254, in _handle_msg
self.client[each_receiver].msg_handlers[msg.msg_type](msg)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/worker/client.py", line 202, in callback_funcs_for_model_para
sample_size, model_para_all, results = self.trainer.train()
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 374, in train
self._run_routine("train", hooks_set, target_data_split_name)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 208, in _run_routine
hook(self.ctx)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 474, in _hook_on_fit_start_init
ctx.model.to(ctx.device)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 899, in to
return self._apply(convert)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 897, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/cuda/__init__.py", line 208, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
As the title says.
I went through the examples of FederatedScope and didn't find a vertical federated learning example. Does FederatedScope support vertical federated learning, and is there an example?
The global config should not be used here.
Since the optimizer is held in the context, different training routines will use the same optimizer. Therefore, the optimizer may track state variables across different training routines. Considering that the initialized model of each training routine is broadcast by the server, it may be unnecessary or even wrong to track past variables.
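One possible fix is to construct a fresh optimizer at the start of each routine rather than caching one in the context. The toy optimizer below (all names are illustrative, not FederatedScope APIs) shows how cached momentum state contaminates the next routine's first update:

```python
class SGDWithMomentum:
    """Toy optimizer illustrating why cached state is problematic."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.velocity = 0.0  # state that persists across steps

    def step(self, param, grad):
        self.velocity = self.momentum * self.velocity + grad
        return param - self.lr * self.velocity


# Reusing one optimizer across two routines: the second routine's first
# step is contaminated by velocity accumulated in the first routine.
opt = SGDWithMomentum()
p = opt.step(1.0, grad=1.0)       # routine 1
stale = opt.step(1.0, grad=1.0)   # routine 2, reused optimizer

fresh = SGDWithMomentum().step(1.0, grad=1.0)  # routine 2, fresh optimizer
assert stale != fresh  # the cached velocity changed the update
```

Since the server re-broadcasts the model each round, recreating the optimizer in a fit-start hook would make every routine start from clean state.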
When drop_last=True and batch_size is larger than the number of a client's local samples, no data is used for local training and the following error occurs:
File "/root/miniconda3/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/context.py", line 154, in pre_calculate_batch_epoch_num
num_train_epoch = math.ceil(local_update_steps / num_train_batch)
ZeroDivisionError: division by zero
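The arithmetic behind the failure can be shown with a DataLoader-style batch count (a simplified model, not the actual context.py code); a guard could fall back to drop_last=False when a client has fewer samples than batch_size:

```python
import math


def num_batches(n_samples, batch_size, drop_last):
    """Number of minibatches a DataLoader-style iterator would yield."""
    if drop_last:
        return n_samples // batch_size  # incomplete final batch is dropped
    return math.ceil(n_samples / batch_size)


# A client with 3 samples and batch_size=8 gets 0 batches when
# drop_last=True, so the later
# math.ceil(local_update_steps / num_train_batch) divides by zero.
assert num_batches(3, 8, drop_last=True) == 0
assert num_batches(3, 8, drop_last=False) == 1
```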
As the title says.
https://federatedscope.io/docs/algozoo/#additional-functions-for-fedprox-algorithm
The article formatting here ⬆️ seems to be a bit messed up.
The file name has a ':' character, which is forbidden on Windows.
Will repeatedly opening a file degrade efficiency?
As a vanilla Python file stream (vs. TF's SummaryWriter), is it possible to cache the logs without blocking the main thread?
When incorporating other packages, this might affect the logging behavior of the involved packages.
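For the Windows ':' problem, a small sanitizer over the generated file name would be enough; this is an illustrative sketch, not the project's logging code:

```python
import re


def sanitize_filename(name):
    """Replace characters that are forbidden in Windows file names.

    ':' (and a few others) cannot appear in NTFS/Windows names, so e.g.
    timestamps like 22:12:56 must be rewritten before use in a log path.
    """
    return re.sub(r'[<>:"/\\|?*]', '-', name)
```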
The unit tests use the same shared global_cfg, which does not follow our principles for the use of config.
Besides, the backup operation for the configs may be redundant.
FederatedScope/federatedscope/core/fed_runner.py
Lines 224 to 225 in b4914e6
Here the model uses the global configuration instead of the client configuration.
The results (mostly floating-point numbers) are printed to stdout without controlling the precision, so the reported results look lengthy.
A FedAvg trial on 5% of FEMNIST produces a ~500 KB log each round:
about 80% are eval logs like 2022-04-13 16:33:24,901 (client:264) INFO: Client #1: (Evaluation (test set) at Round #26) test_loss is 79.352451, about 10% are server results, and about 10% is training information.
If the number of rounds is 500, 1000, or much larger, the log files will take up too much space with a lot of redundancy. @yxdyc
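One low-effort mitigation is to round the floats in the results dict before formatting the log line; a sketch (the helper name is hypothetical):

```python
def round_floats(obj, ndigits=4):
    """Recursively round floats in nested dicts/lists before logging."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {k: round_floats(v, ndigits) for k, v in obj.items()}
    if isinstance(obj, list):
        return [round_floats(v, ndigits) for v in obj]
    return obj
```

Applied to the Results_raw example above, 0.7106942534446716 would be reported as 0.7107, shrinking each eval line considerably.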
Is it necessary to report the server id? It is inconsistent now.
As the title says.
Some ideas can be borrowed from asynchronous SGD, including but not limited to:
A simulator can also be provided for running asynchronous FL standalone.
We aim to improve logging content with more metrics and better readability:
systematic metrics are missing; we need to add metrics that reflect system performance, such as communication and computational efficiency.
we may add metrics to reflect convergence, e.g., the number of rounds to converge.
We can discuss specific metric requirements and logging timing here. @rayrayraykk
Although FederatedScope provides different kinds of scripts in FederatedScope/scripts/, it is a little hard for users to understand these scripts without some guidance, such as which example is running for a certain script and how to use these scripts for customized tasks.
Maybe some detailed guidance on the scripts should be provided for users. Thanks :)
Monitoring can be improved in terms of visualization support.
To use visualization tools such as wandb or TensorBoard, we currently parse the log file after the results are saved. We need to support logging results in real time rather than this two-step style. Besides, the parsing process should be automatic for better usability.
We can discuss other requirements for visualization here @rayrayraykk
For regression tasks, it seems incorrect to use y_pred as the input because it equals the argmax of y_prob. Is there a suggested metric for regression?
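For regression, metrics that consume the raw continuous predictions directly, such as MSE or MAE, avoid the argmax issue; minimal pure-Python sketches:

```python
def mse(y_true, y_pred):
    """Mean squared error over raw continuous predictions (no argmax)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error; more robust to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```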
When using the splitter on customized datasets, the label distributions of the train and test sets are independent within each client.
This may make client-wise performance observations meaningless because of that independence.
IMO, FederatedScope could provide an option for users to keep the label distributions of the train and test sets consistent within a client when using the splitter, which would be useful in tasks such as personalized federated learning.
When we use a subthread to execute FL (as we do in the autotune module now), @rayrayraykk observed that it is extremely slow to instantiate an FL runner (the data loader may be the bottleneck).
Will there be vertical federated learning for GAN networks?
After #95 was fixed, Windows cannot download the data package. According to the log:
2022-05-20 22:12:56,551 (utils:89) INFO: the output dir is exp\sub_exp_20220520221256
2022-05-20 22:12:56,589 (utils:214) INFO: Downloading https://federatedscope.oss-cn-beijing.aliyuncs.com/shakespeare_all_data.zip)
It seems an extra ')' is appended to the URL. After editing line 89 of federatedscope\cv\dataset\leaf_cv.py, download_url(f'{url}/{name})', self.raw_dir), to remove the extra ')', the download works normally.
Adam got an unexpected keyword argument 'momentum'.
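A defensive fix on the caller side is to filter the optimizer kwargs against the constructor's signature before instantiation; a sketch (the helper is hypothetical, and constructors that accept **kwargs would need extra care):

```python
import inspect


def filter_kwargs(ctor, kwargs):
    """Drop kwargs that `ctor` does not accept.

    A shared config may carry `momentum`, which torch.optim.SGD accepts
    but torch.optim.Adam rejects; filtering avoids the TypeError above.
    """
    accepted = inspect.signature(ctor).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}
```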
Hello guys!
I have read the tutorial on FederatedScope. It seems that the whole project is based on Python and the cross-device part is just a simulation.
I wonder whether there is any cross-language design to handle the communication between the client and the server, for example, with Android (Java) on the mobile phone and Linux (Java/Python) on the server, because some devices lack a Python environment.
What's more, are there any trials on real devices, especially for the cross-device part?
I would appreciate it if you could resolve my doubts.
Thank you for your efforts on FederatedScope!
In each round, multiple evaluation results are reported in the logs, each of which seems to be the results on a fraction of clients.
The current finetune is implemented by reusing the training routine, so we have to store the context variables that belong to the training process and recover them after finetuning. Besides, the current finetune doesn't support training by epoch. So maybe we can separate out a dedicated finetune routine and a dedicated value of ctx.cur_mode, for example, ctx.cur_mode == "finetune".
There are metrics that have invalid names. It seems that this is caused by incorrect recursive concatenation of strings.
The folder to hold the log file is augmented incorrectly.