burlachenkok / flpytorch
FL_PyTorch: Optimization Research Simulator for Federated Learning
License: Apache License 2.0
When we started, the GUI was simple, but the GUI code has since grown heavily in number of lines.
Refactoring that code by separating the classes into different source files may be a good idea for readability.
Hi @burlachenkok,
I installed the project successfully but ran into the runtime error "expected scalar type Long but found Double", which occurred with several different configurations.
While trying different datasets I also hit another error: "Using a target size (torch.Size([500])) that is different to the input size (torch.Size([500, 10])) is deprecated. Please ensure they have the same size."
I would greatly appreciate any help.
Here is the log of the project:
Job '{simcounter}job_id{now}' with algorithm 'FEDAVG' has been sumbitted
python run.py
--rounds "3000"
--client-sampling-type "uniform"
--num-clients-per-round "10"
--global-lr "0.1"
--global-optimiser "sgd"
--global-weight-decay "0.0"
--number-of-local-iters "1"
--batch-size "500"
--local-lr "0.01"
--local-optimiser "sgd"
--local-weight-decay "0.0"
--dataset "cifar10_fl"
--loss "crossentropy"
--model "tv_resnet18"
--use-pretrained
--train-last-layer
--metric "top_1_acc"
--global-regulizer "none"
--global-regulizer-alpha "0.0"
--checkpoint-dir "../check_points"
--do-not-save-eval-checkpoints
--data-path "../data/"
--compute-type "fp64"
--gpu "-1"
--log-gpu-usage
--num-workers-train "0"
--num-workers-test "0"
--deterministic
--manual-init-seed "123"
--manual-runtime-seed "456"
--group-name ""
--comment ""
--hostname "nfl"
--eval-every "100"
--eval-async-threads "0"
--save-async-threads "0"
--threadpool-for-local-opt "0"
--run-id "1_job_id_1679617053"
--algorithm "fedavg"
--algorithm-options "internal_sgd:full-gradient"
--logfile "../logs/1_log_1679617053.txt"
--client-compressor "ident:5%"
--extra-track "full_gradient_norm_train,full_objective_value_train"
--allow-use-nv-tensorcores
--initialize-shifts-policy "zero"
--wandb-key ""
--wandb-project-name "fl_pytorch_simulation"
--loglevel "debug"
--logfilter ".*"
--out "1_job_id_1679617053.bin"
2023-03-24 11:19:48.269860
python run.py
--rounds "3000"
--client-sampling-type "uniform"
--num-clients-per-round "10"
--global-lr "0.1"
--global-optimiser "SGD"
--global-weight-decay "0.0"
--number-of-local-iters "1"
--batch-size "500"
--local-lr "0.01"
--local-optimiser "SGD"
--local-weight-decay "0.0"
--dataset "cifar10_fl"
--loss "CROSSENTROPY"
--model "tv_resnet18"
--use-pretrained
--train-last-layer
--metric "top_1_acc"
--global-regulizer "none"
--global-regulizer-alpha "0.0"
--checkpoint-dir "../check_points"
--do-not-save-eval-checkpoints
--data-path "../data/"
--compute-type "fp64"
--gpu "-1"
--log-gpu-usage
--num-workers-train "0"
--num-workers-test "0"
--deterministic
--manual-init-seed "123"
--manual-runtime-seed "456"
--group-name ""
--comment ""
--hostname "nfl"
--eval-every "100"
--eval-async-threads "0"
--save-async-threads "0"
--threadpool-for-local-opt "0"
--run-id "{simcounter}job_id{now}"
--algorithm "FEDAVG"
--algorithm-options "internal_sgd:full-gradient"
--logfile "../logs/{simcounter}log{now}.txt"
--client-compressor "ident:5%"
--extra-track "full_gradient_norm_train,full_objective_value_train"
--allow-use-nv-tensorcores
--initialize-shifts-policy "zero"
--wandb-key ""
--wandb-project-name "fl_pytorch_simulation"
--loglevel "DEBUG"
--logfilter ".*"
--out "current.bin"
Release unoccupied cache memory from PyTorch...
Running the garbage collector...
Done. 0.01 MB was removed from Virtual and Resident memory of interpreter. Current used amount of memory is 38366.88 MBytes
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.2.1 (x86_64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) [Clang 6.0 (clang-600.0.57)] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.0
[conda] Could not collect
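A minimal sketch of the two usual PyTorch causes behind messages like these. It is not a confirmed diagnosis of FL_PyTorch itself. Cross-entropy with class-index targets expects a Long tensor, so labels that were cast to double (for example together with the inputs under --compute-type "fp64", which is an assumption here) can trigger the first error. The second message is of the form emitted by elementwise losses such as binary cross-entropy when the target has shape [500] and the model output has shape [500, 10]. The helper names below are hypothetical.

```python
import torch
import torch.nn.functional as F


def cross_entropy_targets(labels: torch.Tensor) -> torch.Tensor:
    """Class-index targets for cross-entropy must be integer (Long) tensors."""
    return labels.to(torch.long)


def one_hot_targets(labels: torch.Tensor, num_classes: int, dtype: torch.dtype) -> torch.Tensor:
    """Elementwise losses need targets shaped like the output: [batch, classes]."""
    return F.one_hot(labels.to(torch.long), num_classes=num_classes).to(dtype)


torch.manual_seed(0)
batch, num_classes = 500, 10
logits = torch.randn(batch, num_classes, dtype=torch.float64)  # fp64 model output
# Labels accidentally stored as fp64 (assumed cause, for illustration only).
labels = torch.randint(0, num_classes, (batch,)).to(torch.float64)

# Cross-entropy with class indices: cast labels back to Long, keep logits in fp64.
ce = F.cross_entropy(logits, cross_entropy_targets(labels))

# Elementwise loss (binary cross-entropy here): make the target [500, 10] as well.
probs = torch.sigmoid(logits)
bce = F.binary_cross_entropy(probs, one_hot_targets(labels, num_classes, probs.dtype))
```

If the loss is configured as "crossentropy", checking the dtype of the target tensor right before the loss call is a cheap first step.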
I am manually creating spreadsheets to keep track of which experiments have been done and which have not. That step needs some automation.
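One way this could be automated, sketched with the standard library only: collect the options of each run as a dictionary and write them as a CSV with one row per run. The function name experiments_to_csv and the idea of keeping one dictionary per run are assumptions for illustration; the option names and values are taken from the command line in the log above.

```python
import csv
import io


def experiments_to_csv(runs: list) -> str:
    """Render a list of {option: value} dicts as CSV text, one row per run."""
    # Use the union of all option names so runs with different flags still line up.
    columns = sorted({key for run in runs for key in run})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(runs)
    return buffer.getvalue()


runs = [
    {"algorithm": "fedavg", "dataset": "cifar10_fl", "rounds": "3000"},
    {"algorithm": "fedavg", "dataset": "cifar10_fl", "compute-type": "fp64"},
]
table = experiments_to_csv(runs)
```

Options missing from a run are left as empty cells, so the resulting file opens directly in a spreadsheet program.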
@burlachenkok Hi, I am really interested in your project and have set up the environment. However, when I ran it, I could not get a result. One of the errors I encountered was "ValueError: Cannot take a larger sample than population when 'replace=False'." Do you have a solution for this? Thank you!
Configuration:
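Regarding the ValueError quoted above: NumPy raises this message when random.choice is asked for more distinct items than the population contains. A plausible cause, assumed here and not verified against the FL_PyTorch source, is that the number of clients per round exceeds the number of clients available in the dataset. The sketch below reproduces the error and shows one guard; sample_clients is a hypothetical helper.

```python
import numpy as np


def sample_clients(total_clients: int, clients_per_round: int,
                   rng: np.random.RandomState) -> np.ndarray:
    """Uniformly sample distinct client indices, never more than exist."""
    k = min(clients_per_round, total_clients)
    return rng.choice(total_clients, size=k, replace=False)


rng = np.random.RandomState(456)

# Asking for 10 distinct clients out of 4 raises the ValueError.
try:
    rng.choice(4, size=10, replace=False)
    raised = False
except ValueError:
    raised = True

# With the guard, the request is clamped to the population size.
picked = sample_clients(total_clients=4, clients_per_round=10, rng=rng)
```

Lowering the number of clients per round to at most the number of clients in the dataset avoids the error without changing any code.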
Matlab can produce a report for code snippets.
We may want something similar: an automatically generated text report about experimental results, i.e. boilerplate text for experiments in LaTeX format.
Add the ability to import a command-line script into the GUI settings.
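A rough sketch of the parsing half of such an import, assuming the command line has the shape shown in the logs on this page (flags written as --name "value", switches written as bare --name). The function parse_command_line and the resulting dictionary layout are hypothetical; mapping the dictionary onto GUI widgets is not covered.

```python
import shlex


def parse_command_line(command: str) -> dict:
    """Turn 'python run.py --flag "value" --switch' into {flag: value or True}."""
    tokens = shlex.split(command)  # handles quotes and line breaks
    settings = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            name = token[2:]
            # A following token that is not itself a flag is this flag's value.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            settings[name] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1  # skip 'python', 'run.py', and other positional tokens
    return settings


command = 'python run.py --rounds "3000" --gpu "-1" --use-pretrained --comment "" --algorithm "fedavg"'
settings = parse_command_line(command)
```

Values such as "-1" and the empty string are kept as values because only tokens starting with two dashes are treated as flags.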
Hi,
I am trying to install fl_pytorch on macOS Ventura 13.0.1 with an arm64 processor.
I know there is no CUDA/GPU support on macOS, so does one need to edit the run.py file and comment out all the CUDA/GPU lines of code?
If not, please suggest how this should be done!
Many thanks,
Haimonti
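The log in the report further up shows a run on macOS with "Is CUDA available: False" that was launched with --gpu "-1", which suggests (this is an inference from that log, not a statement from the maintainers) that a CPU run can be requested through options without editing run.py. In plain PyTorch, falling back to the CPU is usually done as in the sketch below; pick_device is a hypothetical helper, not a function from FL_PyTorch.

```python
import torch


def pick_device(gpu_index: int) -> torch.device:
    """Use the requested CUDA device when possible, otherwise the CPU."""
    if gpu_index >= 0 and torch.cuda.is_available():
        return torch.device(f"cuda:{gpu_index}")
    return torch.device("cpu")


device = pick_device(-1)           # a negative index is treated as a CPU request
x = torch.ones(3, device=device)   # tensors are created on the chosen device
```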
The storage space needed for experimental results is pretty big
(for neural nets, it can easily be 20-100 gigabytes). Investigate how to save space in the server state "H" that is ultimately serialized.
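One inexpensive thing to try is compressing the serialized state on write. The sketch below uses only the standard library and assumes the state can be pickled; the names save_state and load_state and the toy state dictionary are made up for illustration and do not reflect the actual layout of "H".

```python
import gzip
import os
import pickle
import tempfile


def save_state(state, path):
    """Pickle the state through a gzip stream."""
    with gzip.open(path, "wb", compresslevel=6) as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path):
    """Read back a state written by save_state."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Toy state: highly repetitive, so it compresses far better than real weights would.
state = {"round": 3, "x": [0.0] * 10000}
raw_size = len(pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.bin.gz")
    save_state(state, path)
    compressed_size = os.path.getsize(path)
    restored = load_state(path)
```

Trained floating-point weights are close to incompressible, so the gain on real checkpoints is likely to be much smaller than on this toy example. Storing fewer snapshots or using a lower precision for stored tensors may matter more, but that would need measuring.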