This is the part of my log file where the error happens. I suspect the error is caused by an incompatible library version. Could you tell me which version of transformers you use? It would also be helpful if you could release a requirements.txt file.
2%|▎ | 1000/40000 [15:49<10:18:48, 1.05it/s]{'loss': 0.5651, 'learning_rate': 0.0005, 'epoch': 4.35}
{'loss': 0.2853, 'learning_rate': 0.0004936708860759494, 'epoch': 8.7}
  0%|          | 0/7 [00:00<?, ?it/s]
 29%|██▊       | 2/7 [00:00<00:02, 2.49it/s]
 43%|████▎     | 3/7 [00:01<00:02, 1.75it/s]
 57%|█████▋    | 4/7 [00:02<00:01, 1.51it/s]
 71%|███████▏  | 5/7 [00:03<00:01, 1.40it/s]
 86%|████████▌ | 6/7 [00:04<00:00, 1.35it/s]
100%|██████████| 7/7 [00:04<00:00, 1.64it/s]
  2%|▎         | 1000/40000 [15:54<10:18:48, 1.05it/s]
{'eval_loss': 0.27034154534339905, 'eval_f1': 81.3953488372093, 'eval_accuracy': 68.62745098039215, 'eval_runtime': 5.1909, 'eval_samples_per_second': 39.299, 'eval_steps_per_second': 1.349, 'epoch': 8.7}
Traceback (most recent call last):
  File "/home/trunghn/DecomposedPromptTuning/train.py", line 704, in <module>
    main()
  File "/home/trunghn/DecomposedPromptTuning/train.py", line 659, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 1914, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 2279, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 2355, in _save_checkpoint
    self.save_model(staging_output_dir, _internal_call=True)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 2849, in save_model
    self._save(output_dir)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/transformers/trainer.py", line 2905, in _save
    safetensors.torch.save_file(state_dict, os.path.join(output_dir, SAFE_WEIGHTS_NAME))
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/safetensors/torch.py", line 281, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/home/trunghn/miniconda3/envs/ie/lib/python3.10/site-packages/safetensors/torch.py", line 467, in _flatten
    raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'base_model.decoder.embed_tokens.weight', 'base_model.lm_head.weight', 'base_model.encoder.embed_tokens.weight', 'base_model.shared.weight', 'word_embeddings.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors