
Pre-training hangs (yanmtt issue, closed)

jaspock commented on August 21, 2024
Pre-training hangs


Comments (3)

prajdabre commented on August 21, 2024

Hi,

I am not entirely sure why this happens, but let me take a stab. It is most likely related to the --ipaddr flag and to line 884 of pretrain_nmt.py, which is "os.environ['MASTER_PORT'] = '26023'".

The default value of --ipaddr, localhost, may be a problem inside Docker. It might also be that port 26023 is already in use. Either way, the process seems to be waiting for something, so experimenting with these two settings may help.

Other than that, I can suggest that you try running outside a Docker environment.
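One way to test the "port already in use" theory is to try binding to the port before launching training. The snippet below is only a sketch using the Python standard library; the helper names port_is_free and pick_free_port are made up for illustration and are not part of yanmtt. Since the port is hard-coded in pretrain_nmt.py, a different port would still have to be put into that line by hand.

```python
import socket


def port_is_free(port, host="localhost"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" (or not permitted).
            return False
        return True


def pick_free_port(host="localhost"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    default_port = 26023  # the value hard-coded in pretrain_nmt.py per this thread
    if port_is_free(default_port):
        print(f"Port {default_port} looks free; the hang is probably elsewhere.")
    else:
        print(f"Port {default_port} is busy; a candidate replacement: {pick_free_port()}")
```

Note that a port reported as free can still be taken by another process between this check and the start of training, so this only narrows the problem down.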

Hope this helps.


jaspock commented on August 21, 2024

This issue seems to have been caused by incompatibilities between my CUDA version and the versions of TensorFlow and/or PyTorch in requirements.txt. I have it working now with Python 3.6.8, PyTorch 1.10.1 and TensorFlow 2.4.3.

Just in case this is useful to someone else, this is the relevant part of my current Dockerfile:

FROM nvcr.io/nvidia/pytorch:20.12-py3

# Update and install in one layer so a cached package index is never reused
RUN apt-get update && apt-get install -y wget tmux && rm -rf /var/lib/apt/lists/*

WORKDIR /setup
WORKDIR /app

# -y answers conda's confirmation prompt, since docker build is non-interactive
RUN conda update -y conda
RUN conda create -y -n yanmtt python=3.6.8

SHELL ["conda", "run", "-n", "yanmtt", "/bin/bash", "-c"]
RUN git clone https://github.com/prajdabre/yanmtt
WORKDIR yanmtt
RUN pip install -r requirements.txt
WORKDIR transformers
RUN python setup.py install
RUN pip install tensorflow==2.4.3
SHELL ["/bin/bash", "-c"]

ENV PYTHONPATH=$PYTHONPATH:/app/yanmtt/transformers
RUN conda install -y -n yanmtt pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

WORKDIR /setup
RUN git clone --branch v0.1.95 https://github.com/google/sentencepiece.git
RUN mkdir sentencepiece/build
WORKDIR sentencepiece/build
RUN cmake .. && make -j 4
RUN make install && ldconfig -v

RUN echo 'eval "$(conda shell.bash hook)"' >>~/.bashrc && echo 'conda activate yanmtt' >>~/.bashrc

WORKDIR /app
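
To check from inside the container whether the installed PyTorch and TensorFlow builds actually see the GPU, something like the following may help. It is a rough diagnostic sketch, not part of yanmtt; the function name describe_gpu_stack is hypothetical, and any import failure is recorded rather than raised because a broken install is itself useful information here.

```python
import importlib


def describe_gpu_stack():
    """Collect version and GPU-visibility info for PyTorch and TensorFlow."""
    report = {}

    try:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        report["torch_cuda_build"] = torch.version.cuda
        report["torch_cuda_available"] = torch.cuda.is_available()
    except Exception as exc:  # missing or broken install
        report["torch"] = f"unavailable: {exc}"

    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = tf.__version__
        report["tensorflow_gpus"] = len(tf.config.list_physical_devices("GPU"))
    except Exception as exc:  # missing or broken install
        report["tensorflow"] = f"unavailable: {exc}"

    return report


if __name__ == "__main__":
    for key, value in describe_gpu_stack().items():
        print(f"{key}: {value}")
```

If torch_cuda_available is False, or the CUDA build version does not match what the driver supports, that would be consistent with the kind of mismatch described above.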


prajdabre commented on August 21, 2024

Oh fantastic. Could you make a contrib folder in the examples folder and write down these points and then send a pull request? It would really help people.
