Comments (7)
The versions I am using are:
- paddlepaddle: 2.6.1
- paddlenlp: 2.8.0
from paddlenlp.
Take a look at your checkpoint/checkpoint-170 directory and check whether the tokenizer was saved there. A simple workaround is to remove this parameter:
load_best_model_at_end
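As a quick way to run that check, here is a minimal sketch using only the standard library. The file names in `TOKENIZER_FILES` are an assumption (the exact set depends on the tokenizer class), and `find_tokenizer_files` is a hypothetical helper, not part of PaddleNLP.

```python
from pathlib import Path

# Assumed tokenizer file names; adjust to what your tokenizer actually saves.
TOKENIZER_FILES = ("tokenizer_config.json", "vocab.txt", "special_tokens_map.json")


def find_tokenizer_files(checkpoint_dir):
    """Return the assumed tokenizer files found in checkpoint_dir.

    Returns None when the directory itself does not exist, and an empty
    list when it exists but holds none of the assumed tokenizer files.
    """
    path = Path(checkpoint_dir)
    if not path.is_dir():
        return None
    return [name for name in TOKENIZER_FILES if (path / name).is_file()]


if __name__ == "__main__":
    found = find_tokenizer_files("checkpoint/checkpoint-170")
    if found is None:
        print("checkpoint directory does not exist")
    elif not found:
        print("directory exists, but no tokenizer files were found")
    else:
        print("tokenizer files present:", found)
```

The three outcomes are kept separate on purpose: a missing directory and a directory without tokenizer files point to different causes.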
from paddlenlp.
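Here is the situation: if you want to use early_stopping, load_best_model_at_end is required. By the time this error is raised, directories such as checkpoint-170 no longer exist. I checked the worklog and found that training had actually finished. The cause may be that multiple processes were launched and every process tries to run load_best_model_at_end, so only one process succeeds and the others presumably all fail.
python3 -m paddle.distributed.launch --nproc_per_node=24
Is this the correct way to launch multiple processes in CPU mode?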
from paddlenlp.
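Training on CPU is not recommended because it is inefficient. For distributed training on GPU, see the documentation:
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/launch_cn.html#launch
--nproc_per_node: the number of processes to launch on each node. In GPU training it should be less than or equal to the number of GPUs in the system. For example, --nproc_per_node=8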
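To make that constraint concrete, here is a minimal sketch that caps the requested process count at the number of GPUs before building the launch command. `build_launch_command` is a hypothetical helper and `train.py` is a placeholder script name; the GPU count is passed in by the caller rather than detected.

```python
def build_launch_command(requested_procs, gpu_count, script="train.py"):
    """Build a paddle.distributed.launch command as an argument list.

    The process count is capped at gpu_count, following the rule that
    --nproc_per_node should not exceed the number of GPUs on the node.
    `script` is a placeholder for your own training script.
    """
    if requested_procs < 1 or gpu_count < 1:
        raise ValueError("requested_procs and gpu_count must both be >= 1")
    nproc = min(requested_procs, gpu_count)
    return [
        "python3",
        "-m",
        "paddle.distributed.launch",
        f"--nproc_per_node={nproc}",
        script,
    ]


if __name__ == "__main__":
    # Asking for 24 processes on an 8-GPU node gets capped at 8.
    print(" ".join(build_launch_command(24, 8)))
```

The list form can be passed directly to `subprocess.run` without shell quoting.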
from paddlenlp.
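I don't have a GPU available right now, so I am testing on CPU. The example task takes a little under 4 hours to train on 24 CPU cores, which is still usable. What I mean is: in CPU mode, if I should not use paddle.distributed.launch, what is the correct way to enable multi-threaded or multi-process training?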
from paddlenlp.
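You can open an issue in the framework repository about this. The CPU scenario is not very common and is probably not supported. For distributed training, see the documentation:
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/06_distributed_training/index_cn.html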
from paddlenlp.
OK, understood. Thanks.
from paddlenlp.