Comments (8)

Facico commented on July 16, 2024

Yes, I can see it.

from chinese-vicuna.

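zhoujx4 commented on July 16, 2024

Have you tried the single-machine, multi-GPU case (not multi-machine, multi-GPU)? When I run bash finetune.sh on a single machine with multiple GPUs, it hangs without reporting any error, and no training loss logs are printed.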

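Facico commented on July 16, 2024

Our program currently runs in exactly that single-machine, multi-GPU setup. Do you see the data-loading screen on your side? My guess is that it is stuck at data loading.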

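Facico commented on July 16, 2024

If it is stuck at data loading, one possible cause is that you are using our earlier version of the data, which is not in UTF-8 format and does not show readable Chinese text. That version can cause problems on some systems. Check whether your data shows normal Chinese characters. If it does not, you can refer to this issue, or pull the current dataset from Hugging Face or the cloud drive.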
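
The check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the file path passed to check_dataset are hypothetical, and "readable Chinese" is approximated here as "decodes as UTF-8 and contains at least one character in the CJK Unified Ideographs range".

```python
from pathlib import Path


def looks_like_utf8_chinese(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and contains at least one CJK character."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all (for example, GBK-encoded text).
        return False
    # CJK Unified Ideographs block: U+4E00..U+9FFF.
    # Files that only hold ASCII escapes such as "\\u4f60" decode fine
    # but contain no real Chinese characters, so they return False too.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def check_dataset(path: str) -> bool:
    """Run the check on a dataset file; the path is supplied by the caller."""
    return looks_like_utf8_chinese(Path(path).read_bytes())
```

If check_dataset returns False for your local data file, that is a hint that you may still have the older version of the dataset.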

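zhoujx4 commented on July 16, 2024

> Our program currently runs in exactly that single-machine, multi-GPU setup. Do you see the data-loading screen on your side? My guess is that it is stuck at data loading.

Hi, I tried it, and it does not seem to be stuck at data loading.
Single-machine, single-GPU training runs fine, as shown below:
image
But with single-machine, multi-GPU training it hangs. There is no error; it just stays stuck, as shown below. Could it be a problem with the torchrun arguments?
image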

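Facico commented on July 16, 2024

If data loading is fine and only the multi-GPU run has the problem, check the following:
1. torchrun has a known bug under Python 3.11, so try switching to another version.
2. Confirm that the GPUs were specified correctly and that none of those cards is faulty. While the job is running, you can use nvidia-smi to check GPU memory usage.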
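
For the second point, one way to read nvidia-smi output programmatically is sketched below. It assumes the CSV query form of nvidia-smi; the helper names and the 100 MiB "idle" threshold are arbitrary choices for illustration, not part of the project.

```python
import subprocess

# CSV query form of nvidia-smi: one line per GPU, values in MiB, no header.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse 'index, used, total' lines into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def idle_gpus(rows, threshold_mib=100):
    """Return indices of GPUs using less memory than the (arbitrary) threshold."""
    return [index for index, used, _ in rows if used < threshold_mib]


def query_gpus():
    """Run nvidia-smi on this machine and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

Calling idle_gpus(query_gpus()) while the job appears stuck shows which cards, if any, never had memory allocated on them, which may indicate that a worker process did not start on that card.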

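zhoujx4 commented on July 16, 2024

Found the problem, and it is solved now. My machine is a single node with 8 A6000 GPUs.
I changed the BIOS settings and disabled ACS (PCIe Access Control Services), which fixed the problem.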
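
The comment does not say how ACS was checked or which BIOS option was changed. As a rough sketch only: NVIDIA's NCCL troubleshooting notes suggest looking at the ACSCtl lines printed by lspci -vvv, where SrcValid+ indicates that ACS may be enabled. The helper names below are hypothetical, and lspci usually needs root privileges to show these capability details.

```python
import subprocess


def acs_enabled_lines(lspci_text: str) -> list[str]:
    """Return the ACSCtl lines from `lspci -vvv` output that show SrcValid+."""
    return [
        line.strip()
        for line in lspci_text.splitlines()
        if "ACSCtl" in line and "SrcValid+" in line
    ]


def read_lspci() -> str:
    """Run `lspci -vvv` on this machine (typically as root) and return its output."""
    result = subprocess.run(
        ["lspci", "-vvv"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

If acs_enabled_lines(read_lspci()) returns a non-empty list, ACS may be active on some PCIe bridges, and disabling it in the BIOS, as described above, is one thing to try.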

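cbzhao79 commented on July 16, 2024

> Found the problem, and it is solved now. My machine is a single node with 8 A6000 GPUs. I changed the BIOS settings and disabled ACS, which fixed the problem.

Could you explain how you solved it? I am running into the same problem right now. Thank you very much!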
