python run.py with data_root="/arrows_flickr30k" num_gpus=1 num_nodes=1 task_finetune_irtr_f30k_randaug per_gpu_batchsize=4 load_path="vilt_200k_mlm_itm.ckpt",about dandelin/vilt

hongzhenwang commented on July 21, 2024

1. Make the Arrow files. The conversion scripts are located in vilt/utils/write_*.py. Run the make_arrow functions to convert the dataset to pyarrow binary files.
2. Set the environment variables:
export MASTER_ADDR="0.0.0.0"
export MASTER_PORT="8000"
export NODE_RANK=0
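
If the shell exports do not stick, the same values can be set from Python before the trainer is created. This is only a sketch under assumptions: the helper name fill_single_node_env is made up for illustration and is not part of ViLT, and the values are the single-machine ones from step 2 above.

```python
import os

# Values taken from step 2 above; only meant for a single-machine run.
SINGLE_NODE_DEFAULTS = {
    "MASTER_ADDR": "0.0.0.0",
    "MASTER_PORT": "8000",
    "NODE_RANK": "0",
}


def fill_single_node_env(env=os.environ):
    """Set each variable only when it is missing or empty.

    Hypothetical helper, not part of ViLT. Returns the entries it changed.
    """
    changed = {}
    for key, value in SINGLE_NODE_DEFAULTS.items():
        if not env.get(key):  # missing or set to the empty string
            env[key] = value
            changed[key] = value
    return changed
```

Calling fill_single_node_env() at the top of run.py, before pl.Trainer is constructed, would leave any value you already exported untouched.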

from vilt.

dandelin commented on July 21, 2024

As the error reports ("Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)"), it is highly likely that your PyTorch and PyTorch Lightning versions do not match ours.

Please install the latest PyTorch version.


jkkishore1999 commented on July 21, 2024

Could you please tell me the recommended PyTorch and PyTorch Lightning versions?
I have already run:
pip install -r requirements.txt
pip install -e .

The error above still occurs.

Do we need to modify requirements.txt?


dandelin commented on July 21, 2024

@jkkishore1999
PyTorch is not listed in requirements.txt, so you are using whatever PyTorch version you already had installed.
PyTorch > 1.7 should work fine.


jkkishore1999 commented on July 21, 2024

My PyTorch version is 1.8, but I still get other errors:

python run.py with data_root=/data2/dsets/dataset num_gpus=1 num_nodes=1 task_finetune_irtr_f30k_randaug per_gpu_batchsize=4 load_path="weights/vilt_200k_mlm_itm.ckpt"

WARNING - root - Changed type of config entry "max_steps" from int to NoneType
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
GPU available: True, used: True
INFO - lightning - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
INFO - lightning - TPU available: None, using: 0 TPU cores
Using environment variable NODE_RANK for node rank ().
INFO - lightning - Using environment variable NODE_RANK for node rank ().
ERROR - ViLT - Failed after 0:00:05!
Traceback (most recent call last):
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
return self.run(
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
run()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
self.result = self.main_function(*args)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "run.py", line 48, in main
trainer = pl.Trainer(
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
return fn(self, **kwargs)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 347, in __init__
self.accelerator_connector.on_trainer_init(
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 127, in on_trainer_init
self.trainer.node_rank = self.determine_ddp_node_rank()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 415, in determine_ddp_node_rank
return int(rank)
ValueError: invalid literal for int() with base 10: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 11, in <module>
def main(_config):
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 190, in automain
self.run_commandline()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
print_filtered_stacktrace()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
return "".join(filtered_traceback_format(tb_exception))
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'

Can you please help?


jkkishore1999 commented on July 21, 2024

Also, the same command sometimes fails with a different error:

WARNING - root - Changed type of config entry "max_steps" from int to NoneType
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
GPU available: True, used: True
INFO - lightning - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
INFO - lightning - TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO - lightning - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Using native 16bit precision.
INFO - lightning - Using native 16bit precision.
Missing logger folder: result/finetune_irtr_f30k_randaug_seed0_from_vilt_200k_mlm_itm
WARNING - lightning - Missing logger folder: result/finetune_irtr_f30k_randaug_seed0_from_vilt_200k_mlm_itm
Global seed set to 0
INFO - lightning - Global seed set to 0
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
INFO - lightning - initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
INFO - root - Added key: store_based_barrier_key:1 to store for rank: 0
ERROR - ViLT - Failed after 0:00:05!
Traceback (most recent call last):
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
return self.run(
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
run()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
self.result = self.main_function(*args)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
result = wrapped(*args, **kwargs)
File "run.py", line 71, in main
trainer.fit(model, datamodule=dm)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
results = self.accelerator_backend.train()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
results = self.ddp_train(process_idx=self.task_idx, model=model)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 268, in ddp_train
self.trainer.call_setup_hook(model)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 859, in call_setup_hook
self.datamodule.setup(stage_name)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
return fn(*args, **kwargs)
File "/others/cs16b114/ViLT/vilt/datamodules/multitask_datamodule.py", line 34, in setup
dm.setup(stage)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
return fn(*args, **kwargs)
File "/others/cs16b114/ViLT/vilt/datamodules/datamodule_base.py", line 137, in setup
self.set_train_dataset()
File "/others/cs16b114/ViLT/vilt/datamodules/datamodule_base.py", line 76, in set_train_dataset
self.train_dataset = self.dataset_cls(
File "/others/cs16b114/ViLT/vilt/datasets/f30k_caption_karpathy_dataset.py", line 15, in __init__
super().__init__(*args, **kwargs, names=names, text_column_name="caption")
File "/others/cs16b114/ViLT/vilt/datasets/base_dataset.py", line 53, in __init__
self.table_names += [name] * len(tables[i])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 11, in <module>
def main(_config):
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 190, in automain
self.run_commandline()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
print_filtered_stacktrace()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
return "".join(filtered_traceback_format(tb_exception))
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'


dandelin commented on July 21, 2024

export MASTER_ADDR=$DIST_0_IP
export MASTER_PORT=$DIST_0_PORT
export NODE_RANK=$DIST_RANK
Please check that you have set the environment variables above.
Is your data located in data_root=/data2/dsets/dataset or data_root=/arrows_flickr30k?


jkkishore1999 commented on July 21, 2024

Thanks for the help.

Even after running the three export commands above, those three environment variables are still empty:
declare -x MASTER_ADDR=""
declare -x MASTER_PORT=""
declare -x NODE_RANK=""
I have changed my data_root to /data2/dsets/dataset.
Can you please help?


dandelin commented on July 21, 2024

packages/pytorch_lightning/accelerators/accelerator_connector.py", line 415, in determine_ddp_node_rank
return int(rank)
ValueError: invalid literal for int() with base 10: ''

The error above appears to occur because DDP is not properly initialized.

Do you mean that the export commands did not change the environment variables?
PyTorch Lightning needs those environment variables to run DDP training properly.
Please make sure they are set (you can check the current environment variables with the env command).
Also, check out this guide
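
For anyone debugging this, a small check along these lines can show which variable is the problem. It is a sketch, not something from the ViLT repository: check_ddp_env is a hypothetical helper, and it only mirrors the int(rank) call that fails in the traceback. One likely cause of the empty values is that DIST_0_IP, DIST_0_PORT and DIST_RANK are themselves not defined in your shell, so export MASTER_ADDR=$DIST_0_IP expands to an empty string.

```python
import os


def check_ddp_env(env=os.environ):
    """Return a list of problems with the three variables named in this thread.

    Hypothetical helper for illustration; not part of ViLT or PyTorch Lightning.
    """
    problems = []
    for key in ("MASTER_ADDR", "MASTER_PORT", "NODE_RANK"):
        if not env.get(key):  # missing or set to the empty string
            problems.append(f"{key} is unset or empty")

    # The traceback fails on int(rank), so check that NODE_RANK parses as an int.
    rank = env.get("NODE_RANK")
    if rank:
        try:
            int(rank)
        except ValueError:
            problems.append(f"NODE_RANK={rank!r} is not an integer")
    return problems
```

Printing check_ddp_env() just before pl.Trainer is created should return an empty list once the exports have taken effect in the same shell that launches run.py.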

File "/others/cs16b114/ViLT/vilt/datasets/base_dataset.py", line 53, in __init__
self.table_names += [name] * len(tables[i])
IndexError: list index out of range

This second error was probably raised because the list tables is empty.
Please check in advance that the dataset file exists at f"{data_dir}/{name}.arrow".
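
A quick way to run that check before starting training could look like the sketch below. The helper missing_arrow_files and the example names are assumptions for illustration; the real split names come from the dataset class you are using, and only the f"{data_dir}/{name}.arrow" pattern is taken from the comment above.

```python
from pathlib import Path


def missing_arrow_files(data_dir, names):
    """Return the expected {data_dir}/{name}.arrow paths that do not exist.

    Hypothetical helper for illustration; not part of ViLT.
    """
    expected = [Path(data_dir) / f"{name}.arrow" for name in names]
    return [str(path) for path in expected if not path.is_file()]
```

For example, missing_arrow_files("/data2/dsets/dataset", ["some_split_name"]) returns the paths that the dataset would silently skip, which is what leaves tables empty. Here "some_split_name" is a placeholder, not a real ViLT split name.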


Miazzzzx commented on July 21, 2024

Have you solved your problem? I've encountered the same problem. If possible, could you please tell me how you solved it?


631212502 commented on July 21, 2024

Have you solved it? I have the same bug. I have checked the data location and the environment variables, but print(tables) always comes back empty. Maybe something is wrong with the way I set the path. Can you tell me where the data should be placed so that the command with data_root=/data2/dsets/dataset runs directly?


631212502 commented on July 21, 2024

Have you solved it? I have the same bug. I have checked the data location and the environment variables, but print(tables) always comes back empty. Maybe something is wrong with the way I set the path. Can you tell me where the data should be placed so that the command with data_root=/data2/dsets/dataset runs directly?

I have found the reason: the path was missing its quotation marks ("").


ThompsonISAT commented on July 21, 2024

Have you solved it? I have the same bug. I have checked the data location and the environment variables, but print(tables) always comes back empty. Maybe something is wrong with the way I set the path. Can you tell me where the data should be placed so that the command with data_root=/data2/dsets/dataset runs directly?

I have found the reason: the path was missing its quotation marks ("").

Hi, even after adding the "" around the path, I still get the same error. Could you help me fix it? Thank you so much!


XX1nn commented on July 21, 2024

@jkkishore1999 I noticed that you are using num_gpus=1 num_nodes=1. I use the same parameters as you, and I now get the same error:

File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
return fn(*args, **kwargs)
File "/others/cs16b114/ViLT/vilt/datamodules/datamodule_base.py", line 137, in setup
self.set_train_dataset()
File "/others/cs16b114/ViLT/vilt/datamodules/datamodule_base.py", line 76, in set_train_dataset
self.train_dataset = self.dataset_cls(
File "/others/cs16b114/ViLT/vilt/datasets/f30k_caption_karpathy_dataset.py", line 15, in __init__
super().__init__(*args, **kwargs, names=names, text_column_name="caption")
File "/others/cs16b114/ViLT/vilt/datasets/base_dataset.py", line 53, in __init__
self.table_names += [name] * len(tables[i])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 11, in <module>
def main(_config):
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 190, in automain
self.run_commandline()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
print_filtered_stacktrace()
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
return "".join(filtered_traceback_format(tb_exception))
File "/others/cs16b114/anaconda3/envs/vilt/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
current_tb = tb_exception.exc_traceback
AttributeError: 'TracebackException' object has no attribute 'exc_traceback'

May I know how this was resolved in the end? Is it to set the environment variables first and then check whether the dataset file exists? I am not using distributed training, so how should I choose the values of the environment variables (MASTER_ADDR="" MASTER_PORT="" NODE_RANK="")?


XX1nn commented on July 21, 2024

1. Make the Arrow files. The conversion scripts are located in vilt/utils/write_*.py. Run the make_arrow functions to convert the dataset to pyarrow binary files.
2. Set the environment variables:
export MASTER_ADDR="0.0.0.0"
export MASTER_PORT="8000"
export NODE_RANK=0

@hongzhenwang
Thanks for your answer. Are the values you mentioned meant for non-distributed runs? Does 0.0.0.0 mean that any IP address is accepted? Can I just use "0.0.0.0" and NODE_RANK=0 for my non-distributed fine-tuning?
