Comments (5)
👋 Hello @Sachin-Wani, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Install
Pip install the ultralytics package, including all requirements, in a Python>=3.8 environment with PyTorch>=1.8.
pip install ultralytics
Environments
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
from ultralytics.
@Sachin-Wani just a quick couple of points to share.
- When checking inference times, it's always a good idea to warm up the GPU/model first. The first few inference calls always have higher (overall) processing times if the model isn't sufficiently warmed up. I arbitrarily use something like:
import numpy as np  # assumes `model` is your already-loaded YOLO model

dummy = np.random.randint(0, 255, (640, 640, 3), np.uint8)  # update with your image size
_ = [model.predict(dummy) for _ in range(15)]  # warm up with 15 inference calls
which for my RTX 2060 and the yolov8n model is sufficient as a warm up, but you might find different/better values for your model and hardware.
- TensorRT 10 is a lot easier to install on Windows. You can now just run a pip install (I have Windows 10) and it should work. It won't necessarily help the preprocessing speeds, but it will help with the overall inference speeds.
- A smaller resolution will help with inference times, but not really with preprocessing times.
- If I recall correctly, rectangular inference is supported for PyTorch models, but not for exported ones. I wouldn't recommend rectangular training unless it's absolutely necessary.
- You could also explore other options for ingesting your video stream, but I can't offer much advice here as I don't have any experience with that.
Overall, when you do your timing checks, make sure to warm up the model every time; otherwise you'll see very high overall process times early on. Secondly, I suggest exploring TensorRT, as the recent update has made it easier to install on Windows.
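The warm-up effect is easy to demonstrate even on CPU with a toy network (a hypothetical stand-in, not YOLO): the first call typically pays one-time costs that later calls don't.

```python
import time
import torch

# Toy two-layer conv net standing in for a real model (hypothetical, not YOLO).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.Conv2d(16, 16, 3, padding=1),
)
model.eval()

x = torch.rand(1, 3, 640, 640)  # dummy frame; match your input size

with torch.no_grad():
    t0 = time.perf_counter()
    _ = model(x)  # first ("cold") call: includes one-time setup costs
    cold = time.perf_counter() - t0

    for _ in range(15):  # warm up, mirroring the 15-call loop above
        _ = model(x)

    t0 = time.perf_counter()
    out = model(x)  # warmed call
    warm = time.perf_counter() - t0

print(f"cold: {cold * 1e3:.1f} ms, warm: {warm * 1e3:.1f} ms")
```

The exact numbers depend entirely on hardware; the point is only that timing should start after the warm-up loop, not before.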
Hi @Burhan-Q , thanks for sharing your insights.
- I usually do warm up the model a couple of times when feeding in videos, which helps me get consistent results (in terms of time taken).
- Thanks for this tip, I will certainly look into getting TensorRT, although it's only going to speed up my inference.
- I was thinking: since preprocessing is done through OpenCV (in the backend), is it possible to speed up the bottleneck of grabbing a frame and converting it into a format the model can use?
Right now it takes me 56.2 ms preprocess, 3.5 ms inference, and 0.1 ms postprocess per image/frame.
I am not sure if this makes sense, but is there a way to do all the preprocessing on the GPU instead of the CPU?
Looking at the preprocessing step in predict.py, which essentially does:

def preprocess(self, img):
    img = torch.stack([self.transforms(Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))) for im in img], dim=0)
    img = (img if isinstance(img, torch.Tensor) else torch.from_numpy(img)).to(self.model.device)
    return img.half() if self.model.fp16 else img.float()  # uint8 to fp16/32

is this process the source of the bottleneck, or would it be the dataloader?
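On the question of doing preprocessing on the GPU: one possible sketch (not Ultralytics' actual pipeline; `preprocess_on_device` is a hypothetical helper) is to upload the raw uint8 frame to the device first, then do the BGR-to-RGB swap, normalization, and resize there with torch ops:

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess_on_device(frame_bgr: np.ndarray, size: int = 640,
                         device: str = "cuda" if torch.cuda.is_available() else "cpu") -> torch.Tensor:
    """Upload a raw HxWx3 uint8 BGR frame and do all preprocessing with torch ops.

    Sketch only: uses a plain resize (no letterbox padding), which distorts the
    aspect ratio, so detection box coordinates would need adjusting.
    """
    t = torch.from_numpy(frame_bgr).to(device)           # one cheap uint8 host-to-device copy
    t = t[..., [2, 1, 0]]                                # BGR -> RGB on device
    t = t.permute(2, 0, 1).unsqueeze(0).float() / 255.0  # 1x3xHxW, scaled to [0, 1]
    t = F.interpolate(t, size=(size, size), mode="bilinear", align_corners=False)
    return t

frame = np.random.randint(0, 255, (500, 800, 3), np.uint8)  # e.g. a frame from the 800x500 test stream
batch = preprocess_on_device(frame)
print(batch.shape)  # torch.Size([1, 3, 640, 640])
```

Whether this actually beats the CPU path depends on the host-to-device transfer cost and on how busy the GPU already is with inference; it would need profiling on the target machine.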
For now, using vid_stride=3 makes it close to real time on the RTSP stream, but it'd be great to know if there's a way to speed up the image transforms.
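For reference, vid_stride is essentially frame skipping; the same logic can be sketched generically (hypothetical `strided` helper, independent of Ultralytics):

```python
def strided(frames, stride=3):
    """Yield every `stride`-th frame: the frame-skipping idea behind vid_stride."""
    for i, frame in enumerate(frames):
        if i % stride == 0:
            yield frame

# With stride 3, frame indices 0..9 reduce to 0, 3, 6, 9 - a third of the work
processed = list(strided(range(10), stride=3))
print(processed)  # [0, 3, 6, 9]
```

The trade-off is temporal resolution: fast-moving objects may be missed between processed frames.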
Thanks again for the help and support!
P.S. I see users who have object detection running in real time; I wonder if they have a better way of handling the data, or use a resolution that exactly matches the model input.
@Sachin-Wani so I ran a quick test using a public stream I found:

import time

from ultralytics import YOLO

def main():
    src = "http://77.222.181.11:8080/mjpg/video.mjpg"
    # 1/1: http://77.222.181.11:8080/mjpg/video.mjpg... Success ✅ (inf frames of shape 800x500 at 25.00 FPS)
    model = YOLO('yolov8n.pt')
    inference_stream = model.predict(src, stream=True, verbose=False)
    frames = 0
    t0 = time.time()
    times = []
    for result in inference_stream:
        frames += 1
        print(f"{result.speed = }")
        times.append(result.speed)
        if frames > 120 or time.time() - t0 > 120:
            break
    print("overall average process time: ", sum(sum(s.values()) for s in times) / len(times))

if __name__ == "__main__":
    main()
Most of the time for my system was waiting on the stream to provide the next frame. The average process time at the end of a single test was
Looking back at your original post, I noticed the CPU you're using has a very low max turbo clock speed of
Hi @Burhan-Q, thanks for your suggestion. While I don't have a different system to run my code on, just as a comparison I ran the same program you have (with the same address) on my PC. These are the last three lines of my results:
WARNING ⚠️ Waiting for stream 0
result.speed = {'preprocess': 0.0, 'inference': 15.627861022949219, 'postprocess': 0.0}
overall average process time: 18.615119713397064
Just the first call took 375 ms; after that it was around 15-16 ms. My preprocessing time was always 0.0.
For the sake of comparison, I also changed yolov8n to yolov8n-cls (detection to classification) and got these:
WARNING ⚠️ Waiting for stream 0
result.speed = {'preprocess': 15.516042709350586, 'inference': 0.0, 'postprocess': 0.0}
overall average process time: 15.364925816374004
In the classification case, the time was swinging between preprocess and inference, which was interesting to note.
Looking at these numbers, I am not sure the CPU is the issue. Hope this helps us get closer to a solution.
Thanks again!