
Comments (5)

github-actions commented on June 20, 2024

👋 Hello @Sachin-Wani, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics
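For example, a minimal first run might look like this (yolov8n.pt is the default pretrained nano model; replace 'image.jpg' with your own source):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # downloads the pretrained nano weights on first use
results = model.predict("image.jpg")  # replace with your own image, video, or stream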

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.


Burhan-Q commented on June 20, 2024

@Sachin-Wani just a quick couple of points to share.

  1. When checking inference times, it's always a good idea to warm up the GPU/model first. The first few inference calls will always have a higher (overall) processing time if the model isn't sufficiently warmed up. I arbitrarily use something like:
import numpy as np  # assumes `model` is an already-loaded Ultralytics YOLO model

dummy = np.random.randint(0, 255, (640, 640, 3), np.uint8)  # update with your image size
_ = [model.predict(dummy) for _ in range(15)]  # warm up for 15 inference calls

which for my RTX 2060 and yolov8n model is sufficient as a warm-up, but you might find different/better values for your model and hardware.

  2. TensorRT 10 is a lot easier to install on Windows. You can now just run a pip install (I have Windows 10) and it should work. It won't necessarily help the preprocessing speeds, but it will help with the overall inference speeds (rough export sketch after this list).

  3. A smaller resolution will help with inference times, but not really with preprocessing times.

  4. If I recall correctly, rectangular inference is supported for PyTorch models, but not for exported ones. I wouldn't recommend rectangular training unless it's absolutely necessary.

  5. You could also explore other options for ingesting your video stream, but I can't offer much advice here as I don't have any experience with that.

Overall, when you do your timing checks, make sure to warm up the model every time, otherwise you'll see very high overall process times early on. Secondly, I suggest exploring TensorRT, as the recent update has made it much easier to install on Windows.
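A rough sketch of what points 2 and 3 could look like together (I'm assuming the stock yolov8n.pt weights, a working TensorRT install, and a placeholder video path; adjust for your own setup):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# export a half-precision TensorRT engine at a smaller, fixed input size
model.export(format="engine", half=True, imgsz=320)
trt_model = YOLO("yolov8n.engine")  # load the exported engine
results = trt_model.predict("video.mp4", imgsz=320)  # placeholder source; imgsz should match the export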


Sachin-Wani commented on June 20, 2024

Hi @Burhan-Q, thanks for sharing your insights.

  1. I usually do warm up the model a couple of times when feeding in videos, which helps me get consistent results (in terms of time taken).
  2. Thanks for this tip, I will certainly look into getting TensorRT, although it's only going to speed up my inference.
  3. I was thinking: since preprocessing is done through OpenCV (at the backend), is it possible to speed up the bottleneck of grabbing a frame and converting it into a format usable by the model?

Right now it takes me 56.2ms preprocess, 3.5ms inference, 0.1ms postprocess per image/frame.

I am not sure if this makes sense, but is there a way to do all the preprocessing on the GPU instead of CPU?

Looking at the preprocessing step in predict.py, which essentially does:

def preprocess(self, img):
    # BGR -> RGB conversion, PIL wrapping, and the transforms all run on the CPU
    img = torch.stack([self.transforms(Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))) for im in img], dim=0)
    img = (img if isinstance(img, torch.Tensor) else torch.from_numpy(img)).to(self.model.device)  # moved to the GPU only after the transforms
    return img.half() if self.model.fp16 else img.float()  # uint8 to fp16/32

is this process the source of the bottleneck, or would it be the dataloader?

For now, using vid_stride=3 makes it close to real time on the RTSP stream, but it'd be great to know if there's a way to speed up the image transforms.
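Something like this rough sketch is what I have in mind for moving the per-pixel work onto the GPU (assuming a fixed 640x640 input and a CUDA device; I haven't checked how this would plug into the Ultralytics predictor):

import cv2
import numpy as np
import torch

def preprocess_on_gpu(frame_bgr: np.ndarray, device="cuda", half=True) -> torch.Tensor:
    # cheap CPU steps: resize to the model input size and convert BGR -> RGB
    img = cv2.resize(frame_bgr, (640, 640))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # hand the raw uint8 buffer to the GPU, then do the float conversion and scaling there
    t = torch.from_numpy(img).to(device)              # HWC uint8 on the GPU
    t = t.permute(2, 0, 1).unsqueeze(0).contiguous()  # 1x3xHxW
    t = t.half() if half else t.float()
    return t / 255.0                                  # normalize on the GPU

If predict() can take an already-preprocessed torch.Tensor as the source, that might be a way to bypass the OpenCV/PIL path entirely, but I'm not sure.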

Thanks again for the help and support!

P.S. I see users who have object detection running in real time; I wonder if they have a better way of handling the data, or use a resolution that exactly matches the model input.


Burhan-Q commented on June 20, 2024

@Sachin-Wani so I ran a quick test using a public stream I found:

import time
from ultralytics import YOLO


def main():
    src = "http://77.222.181.11:8080/mjpg/video.mjpg"
    # 1/1: http://77.222.181.11:8080/mjpg/video.mjpg... Success ✅ (inf frames of shape 800x500 at 25.00 FPS)
    model = YOLO('yolov8n.pt')
    inference_stream = model.predict(src, stream=True, verbose=False)

    frames = 0
    t0 = time.time()
    times = []
    for result in inference_stream:
        frames += 1
        print(f"{result.speed = }")
        times.append(result.speed)

        if frames > 120 or time.time() - t0 > 120:
            break
    print("overall average process time: ", sum([sum(s.values()) for s in times]) / len(times))

if __name__ == "__main__":
    main()

Most of the time on my system was spent waiting for the stream to provide the next frame. The average process time at the end of a single test was 50 ms, and as you can see I didn't do any warm-up. The last few preprocessing times in the terminal were around 2.0-5.0 ms.

Looking back at your original post, I noticed the CPU you're using has a very low max turbo clock of 3.10 GHz and a base clock of 2.00 GHz, compared to my (significantly cheaper) CPU, which can boost up to 4.80 GHz and has a base clock of 4.10 GHz. I'd suspect (though I'm not certain) that the bottleneck could be the CPU itself. If you have another system you can test on with a higher base/boost clock, I'd run the test there, because otherwise I can only speculate.
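If it helps narrow things down, a small addition to the script above (reusing the times list it already collects) will split the average by stage instead of only printing the overall number:

stages = ("preprocess", "inference", "postprocess")
per_stage = {k: sum(s[k] for s in times) / len(times) for k in stages}
print(per_stage)  # shows which stage dominates on your hardware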


Sachin-Wani commented on June 20, 2024

Hi @Burhan-Q, thanks for your suggestion. While I don't have a different system to run my code on, just as a comparison I ran the same program you shared (with the same address) on my PC. These are the last 3 lines of my results:

WARNING ⚠️ Waiting for stream 0
result.speed = {'preprocess': 0.0, 'inference': 15.627861022949219, 'postprocess': 0.0}
overall average process time: 18.615119713397064

Just the first call took 375 ms, then it was around 15-16 ms. My preprocessing time was always 0.0.

As a further comparison, I also changed yolov8n to yolov8n-cls (detection to classification) and got these results:

WARNING ⚠️ Waiting for stream 0
result.speed = {'preprocess': 15.516042709350586, 'inference': 0.0, 'postprocess': 0.0}
overall average process time: 15.364925816374004

In the case of classification, the time was swinging between preprocess and inference, which was interesting to note.

Looking at these numbers, I am not sure it's the CPU that's the issue. I hope this helps get us closer to a solution.
Thanks again!

