Train with single GPU (ultralytics issue, closed)

TonightGo commented on July 24, 2024
Train with single gpu

from ultralytics.

Comments (3)

glenn-jocher commented on July 24, 2024

@TonightGo hi there,

Thank you for reaching out and providing a detailed description of the issue you're encountering. Let's work through this together.

First, to ensure we can effectively investigate and address the problem, please verify the following:

  1. Minimum Reproducible Example: Could you provide a minimal code snippet that reproduces the issue? This helps us isolate the problem more efficiently. You can refer to our Minimum Reproducible Example Guide for more details on how to create one.

  2. Package Versions: Ensure you are using the latest versions of torch and ultralytics. You can upgrade them using the following commands:

    pip install --upgrade torch ultralytics
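Before upgrading, it can help to record which versions are currently installed so you can include them in a bug report. A minimal sketch using only the standard library; the helper name installed_versions is made up for this example and is not part of torch or ultralytics:

```python
from importlib import metadata


def installed_versions(packages=("torch", "ultralytics")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this before and after the pip command shows whether the upgrade actually changed anything in the environment you train in.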

Regarding the error you're encountering, it seems related to multiprocessing and data loading. Here are a few steps you can take to troubleshoot and potentially resolve the issue:

Multiprocessing Setup

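The error suggests an issue with the multiprocessing setup. With the 'spawn' start method, each child process re-imports the main script, so any training code that is not guarded runs again in every worker. Ensure that the code is protected by if __name__ == "__main__": to avoid this: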

if __name__ == "__main__":
    import sys
    # Make the local ultralytics checkout importable before importing YOLO
    sys.path.insert(0, 'ultralytics')
    import torch
    import cv2
    from ultralytics import YOLO

    # Limit intra-op threading in PyTorch and OpenCV
    torch.set_num_threads(1)
    cv2.setNumThreads(1)
    NUM_THREADS = 2  # not used below

    # Start worker processes with 'spawn' rather than the platform default
    import torch.multiprocessing as mp
    mp.set_start_method('spawn', force=True)

    # Build the model from its YAML config and train on a single GPU (device 0)
    model = YOLO("/content/drive/MyDrive/ultralytics/ultralytics/cfg/models/v8/test.yaml", verbose=True)
    model.train(data="/content/drive/MyDrive/ultralytics/ultralytics/cfg/datasets/test.yaml", epochs=300, project='test', name='debug', device='0', imgsz=640, batch=2, exist_ok=True, workers=1, resume=False, optimizer='SGD')

DataLoader Workers

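The error message suggests rerunning with num_workers=0 for better error tracing. In model.train this is the workers argument, so set workers=0. Data loading then runs in the main process, which helps identify whether the issue is related to multiprocessing: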

model.train(data="/content/drive/MyDrive/ultralytics/ultralytics/cfg/datasets/test.yaml", epochs=300, project='test', name='debug', device='0', imgsz=640, batch=2, exist_ok=True, workers=0, resume=False, optimizer='SGD')

Dataset and Cache

The assertion error in get_labels indicates a potential issue with the dataset cache. Try clearing the cache or ensuring that the dataset paths and annotations are correct:

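# Clear cache
# The path below is an example. Ultralytics typically writes the label cache
# as a *.cache file next to the dataset's labels folder, so adjust it to match.
import os
cache_path = "/content/drive/MyDrive/ultralytics/ultralytics/cfg/datasets/test.cache"
if os.path.exists(cache_path):
    os.remove(cache_path)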
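To check the annotations themselves, here is a rough sketch that scans a YOLO detection dataset for common problems. It assumes the usual layout of parallel images and labels folders, with one .txt file per image and lines of the form class x_center y_center width height, all coordinates normalized to 0..1. The function name check_yolo_labels and the suffix list are made up for this example; this is not the check Ultralytics runs internally:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set, extend as needed


def check_yolo_labels(images_dir, labels_dir, num_classes):
    """Return a list of human-readable problems found in the label files."""
    problems = []
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = Path(labels_dir) / (image.stem + ".txt")
        if not label.exists():
            # May be intentional for background images, but worth reviewing
            problems.append(f"{image.name}: no label file")
            continue
        for lineno, line in enumerate(label.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 5:
                problems.append(f"{label.name}:{lineno}: expected 5 fields, got {len(fields)}")
                continue
            try:
                cls = int(fields[0])
                coords = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append(f"{label.name}:{lineno}: could not parse class id or coordinates")
                continue
            if not 0 <= cls < num_classes:
                problems.append(f"{label.name}:{lineno}: class {cls} outside 0..{num_classes - 1}")
            if any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append(f"{label.name}:{lineno}: coordinate outside 0..1")
    return problems
```

Call it once per split, for example check_yolo_labels("dataset/images/train", "dataset/labels/train", num_classes=2) with your own paths and class count, and fix anything it reports before retraining.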

Debugging

If the issue persists, consider running the training with a smaller dataset or fewer epochs to isolate the problem. Additionally, you can enable more verbose logging to get detailed insights into the training process.
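One way to get a smaller dataset without copying files is to write a short list of image paths and use that list as the training split. The sketch below only builds the list; write_subset_list is a made-up helper name. Pointing train: in your dataset YAML at a .txt file of image paths is an assumption to verify against the dataset docs for your ultralytics version:

```python
import random
from pathlib import Path


def write_subset_list(images_dir, out_file, n=50, seed=0):
    """Write up to n randomly chosen image paths, one per line, and return them."""
    suffixes = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set, extend as needed
    images = sorted(p for p in Path(images_dir).iterdir() if p.suffix.lower() in suffixes)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = sorted(rng.sample(images, min(n, len(images))))
    Path(out_file).write_text("".join(f"{p.resolve()}\n" for p in chosen))
    return chosen
```

Combined with a small epochs value and workers=0, a run on such a subset usually fails or succeeds within a minute or two, which makes it much easier to tell a data problem from a multiprocessing problem.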

Please try these steps and let us know if the issue persists. We're here to help!

TonightGo commented on July 24, 2024

I used the above code and it works. Thank you for your timely and effective reply.

glenn-jocher commented on July 24, 2024

Hi @TonightGo,

I'm glad to hear that the provided solution worked for you! 🎉 If you encounter any further issues or have additional questions, feel free to reach out. We're here to help!

Happy training! 🚀
