
Comments (2)

selfcontrol7 commented on August 15, 2024

Hi,

I am trying to run the distributed text classification experiment with the following bash script:

LOG_FILE="experiments/distributed/transformer_exps/fedavg_transformer_tc.log"
CLIENT_NUM=100
WORKER_NUM=10
SERVER_NUM=1
GPU_NUM_PER_SERVER=4
ROUND=250
CI=0

PROCESS_NUM=`expr $WORKER_NUM + 1`
echo $PROCESS_NUM
HOST_FILE=experiments/distributed/transformer_exps/mpi_host_file
hostname > $HOST_FILE

mpirun -np $PROCESS_NUM -hostfile $HOST_FILE \
python -m experiments.distributed.transformer_exps.main_text_classification \
    --gpu_mapping_file "experiments/distributed/transformer_exps/gpu_mapping.yaml" \
    --gpu_mapping_key mapping_ink-ron \
    --client_num_in_total $CLIENT_NUM \
    --client_num_per_round $WORKER_NUM \
    --comm_round $ROUND \
    --ci $CI \
    --dataset 20news \
    --data_file "data/data_files/20news_data.h5" \
    --partition_file "data/partition_files/20news_partition.h5" \
    --partition_method uniform \
    --model_type distilbert \
    --model_name distilbert-base-uncased \
    --do_lower_case True \
    --train_batch_size 8 \
    --eval_batch_size 8 \
    --max_seq_length 128 \
    --learning_rate 1e-5 \
    --server_lr 1e-5 \
    --server_optimizer adam \
    --epochs 1 \
    --output_dir "/tmp/20news_fedavg/" \
    --fed_alg fedavg \
    --fp16 
    # 2> ${LOG_FILE} &

but I always get the following error:

11
No protocol specified
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 11
slots that were requested by the application:

  python

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
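The message itself names the two usual single-host remedies. The script writes only the bare hostname into the hostfile (`hostname > $HOST_FILE`), so, per item 1 above, Open MPI falls back to the number of processor cores as the slot count, and 11 processes exceed it. Below is a minimal sketch of both fixes. It assumes all 11 processes (presumably 1 server plus 10 workers) run on the single machine named in the hostfile. The temporary hostfile path is a stand-in for `experiments/distributed/transformer_exps/mpi_host_file`. It does not check whether the machine has enough GPUs or memory for 11 processes.

```shell
#!/usr/bin/env bash
# Sketch: give Open MPI enough slots on one host.
# Assumption: every process runs on this machine.
set -euo pipefail

WORKER_NUM=10
PROCESS_NUM=$((WORKER_NUM + 1))   # same value as `expr $WORKER_NUM + 1`

# Stand-in for experiments/distributed/transformer_exps/mpi_host_file
HOST_FILE=$(mktemp)
trap 'rm -f "$HOST_FILE"' EXIT    # clean up the temp file on exit

# Option 1: declare the slot count explicitly with a "slots=N" clause,
# so Open MPI no longer defaults to the number of processor cores.
echo "$(hostname) slots=${PROCESS_NUM}" > "$HOST_FILE"
cat "$HOST_FILE"

# ...then launch exactly as before:
#   mpirun -np $PROCESS_NUM -hostfile $HOST_FILE python -m ...

# Option 2: keep the plain hostfile and tell mpirun to ignore the slot limit:
#   mpirun --oversubscribe -np $PROCESS_NUM -hostfile $HOST_FILE python -m ...
```

Option 1 states the intended process count in the hostfile itself. Option 2 is a one-flag change but lets Open MPI place more processes than there are cores, which can slow each one down.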

Could you please give me some hints on how to solve this issue?

Thank you

from fednlp.

MrigankRaman commented on August 15, 2024

Hi! We at FedML have launched a new platform for FedNLP where this issue should not occur. Could you please check whether you still face the same issue there?
Here is the new FedNLP platform: https://github.com/FedML-AI/FedML/tree/master/python/app/fednlp

