Comments (19)

glenn-jocher commented on July 1, 2024

Hi @Gurneet1928,

It's great to hear about your progress with integrating DirectML into YOLOv8! Your steps so far sound promising. Here are a few suggestions to help you with the second step and address the runtime errors you're encountering:

  1. Extending Device Parameters: In torch_utils.py, ensure that the device selection logic correctly identifies and initializes the DirectML device. You might want to add a specific check for 'dml' and handle it accordingly. Here's a small snippet to guide you:

    import torch_directml
    
    def select_device(device=''):
        if device == 'dml':
            return torch_directml.device()
        else:
            return torch.device(device)
  2. Runtime Errors: The runtime errors within the PyTorch neural network layers could be due to compatibility issues or unsupported operations in DirectML. To debug this:

    • Ensure you are using the latest versions of torch and torch-directml.
    • Check the specific layers or operations causing the errors and see if there are any known issues or workarounds in the torch-directml documentation.
    • Simplify the model to isolate the problematic layers and test them individually (see the sketch after this list).
  3. Documentation: Once you have a stable implementation, updating the documentation will be crucial. Include instructions on how to enable DirectML, any known limitations, and troubleshooting tips.
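
If it helps, here's a minimal, hypothetical probe script (not part of the Ultralytics codebase, just a sketch assuming torch-directml is installed) that runs a few common YOLO building blocks on the DirectML device so unsupported operations surface one at a time:

import torch
import torch.nn as nn
import torch_directml

dml = torch_directml.device()
blocks = {
    "conv": nn.Conv2d(3, 16, 3, padding=1),
    "batchnorm": nn.BatchNorm2d(3),
    "silu": nn.SiLU(),
    "upsample": nn.Upsample(scale_factor=2),
    "maxpool": nn.MaxPool2d(2),
}
x = torch.randn(1, 3, 64, 64)
for name, block in blocks.items():
    try:
        block.to(dml)(x.to(dml))  # run the block on the DirectML device
        print(f"{name}: OK")
    except RuntimeError as err:
        print(f"{name}: FAILED -> {err}")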

Feel free to share specific error messages or code snippets if you need more targeted assistance. Your efforts are greatly appreciated, and I'm sure the community will benefit from this enhancement. Keep up the great work! πŸš€

glenn-jocher commented on July 1, 2024

@RobbStarkAustria hi Olaf,

Thank you for your suggestion and for providing the link to the new release of torch-directml. The idea of extending YOLOv8 to support DirectML is indeed intriguing and could potentially benefit users with a wider range of GPU hardware, including AMD accelerators.

To proceed, we need to ensure that the integration is seamless and does not introduce any unexpected issues. If you have any initial code or examples demonstrating how torch-directml can be integrated with YOLOv8, it would be greatly appreciated. This will help us better understand the implementation and testing requirements.

In the meantime, we encourage you to experiment with torch-directml on your setup and share any findings or issues you encounter. This collaborative effort will be invaluable in assessing the feasibility and performance of DirectML support in YOLOv8.

Thank you for your continued support and contributions to the YOLO community!

Gurneet1928 commented on July 1, 2024

Hey @glenn-jocher, I have been following DirectML for the past few months and I think I can implement it in the YOLOv8 model. Let me see what I can do about it. Would it be beneficial to use DirectML with YOLO, especially for AMD GPUs?

glenn-jocher commented on July 1, 2024

Hi @Gurneet1928,

That's fantastic to hear that you're interested in implementing DirectML with YOLOv8! Leveraging DirectML can indeed be beneficial, especially for users with AMD GPUs or other DirectX12 compatible hardware. It can potentially broaden the accessibility and performance of YOLOv8 across a wider range of devices.

To ensure a smooth integration, here are a few steps you might consider:

  1. Experimentation: Start by experimenting with torch-directml on your local setup. This will help you understand any potential challenges or limitations.
  2. Integration: Implement DirectML support in YOLOv8 by extending the device parameter to include 'dml'. This will allow users to specify DirectML as their preferred backend (see the sketch after this list).
  3. Testing: Thoroughly test the integration to ensure it works seamlessly with existing functionalities and does not introduce any regressions.
  4. Documentation: Update the documentation to guide users on how to enable and use DirectML with YOLOv8.
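
For reference, this is roughly the user-facing call the integration is aiming for, assuming the device argument ends up accepting 'dml' (stock Ultralytics does not accept it yet):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# 'dml' is hypothetical here; it only works once the device parsing is extended
model.train(data="coco8.yaml", epochs=1, device="dml")
model.predict(source="https://ultralytics.com/images/bus.jpg", device="dml")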

If you encounter any issues or have specific questions during the implementation, feel free to share them here. The community and the Ultralytics team are always here to help!

Looking forward to your contributions and findings. Your efforts will undoubtedly enhance the versatility and performance of YOLOv8 for many users. πŸš€

Gurneet1928 commented on July 1, 2024

@glenn-jocher
Sure, already on it actually.

  1. For the experimentation part, I am already using DirectML on my local setup, so no issues there. In fact, the Microsoft-provided YOLOv3 + DirectML sample works like a charm.
  2. Trying to extend the device parameters in the torch_utils.py file. It now accepts the DirectML device and returns it.
  3. Testing to be done.
  4. Documentation to be done.

Let me know if you can help with the 2nd step.
Thank you

Update:

torch_directml seems to accept the AMD devices. However, there appear to be runtime errors inside the PyTorch neural network layers. I'll have to dig through all the .py files.
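
One way to narrow those runtime errors down, sketched here with the standard PyTorch hook API (assuming torch-directml is installed), is to register forward pre-hooks so that the last layer name printed before the exception is the one that failed:

import torch
import torch_directml
from ultralytics import YOLO

dml = torch_directml.device()
net = YOLO("yolov8n.pt").model.to(dml)  # underlying nn.Module of the YOLO wrapper

for name, module in net.named_modules():
    # print each submodule name just before it runs
    module.register_forward_pre_hook(lambda m, inp, n=name: print("entering:", n))

try:
    net(torch.zeros(1, 3, 640, 640, device=dml))
except RuntimeError as err:
    print("failed in the last layer printed above:", err)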

Gurneet1928 commented on July 1, 2024

@glenn-jocher Thanks for the suggestion.

So now I tried to run the inference not from the CLI but from a Python notebook using the cloned repo (not the pip-installed ultralytics). I imported the predict.py file directly. This seems to work and accepts DirectML. I tried inference on a 30-second video: DirectML takes 22.9s whereas the CPU takes 26.5s to run inference on all 750 frames. Screenshots added as well.

CPU Output (screenshot: cpu)

Using DirectML (screenshot: dml_output)

Another thing I noted is that results are sometimes inconsistent between CPU and DML. I'm not sure why, but I have a theory. DML reports two devices in its device count; these can be a (CPU, GPU) pair or a (GPU, GPU) [dedicated, integrated] pair. In any case, the dedicated GPU and CPU are considerably slower than the integrated GPU, so the results should be better with the latter.
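
To see exactly which adapters DML is exposing, a quick enumeration along these lines can help (assuming the torch_directml helpers device_count(), device_name() and default_device() from the torch-directml package):

import torch_directml

print("DirectML devices:", torch_directml.device_count())
for i in range(torch_directml.device_count()):
    # list each adapter so you can tell integrated from dedicated GPUs
    print(f"  dml:{i} -> {torch_directml.device_name(i)}")
print("default device index:", torch_directml.default_device())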

Update:

I tried the webcam feed to predict in real time and the results are very promising.

CPU Real-time (screenshot)

DirectML Real-time (screenshot)

glenn-jocher commented on July 1, 2024

Hi @Gurneet1928,

Thank you for sharing your detailed observations and results! It's great to see that you've made significant progress with DirectML integration and have provided comparative performance metrics. πŸŽ‰

To address the inconsistencies between CPU and DirectML results, here are a few suggestions:

  1. Device Selection: Ensure that the correct device is being selected for DirectML. You can explicitly specify the device to avoid any ambiguity. For example:

    import torch_directml
    
    device = torch_directml.device()
    model.to(device)
  2. Reproducibility: If possible, please provide a minimum reproducible code example that demonstrates the inconsistencies. This will help us investigate the issue more effectively. You can refer to our minimum reproducible example guide for more details.

  3. Version Check: Verify that you are using the latest versions of torch, torch-directml, and ultralytics. If not, please upgrade and try again:

    pip install --upgrade torch torch-directml ultralytics
  4. Debugging: To further debug the inconsistencies, you might want to log the device properties and ensure that the same device is being used consistently across different runs.

If you continue to experience issues, please share the specific code snippets and any error messages you encounter. This will help us provide more targeted assistance.

Thank you for your contributions and for helping to enhance YOLOv8's compatibility with a broader range of hardware. Your efforts are greatly appreciated by the community! πŸš€

Gurneet1928 commented on July 1, 2024

Hey @glenn-jocher .

One more promising update from my side.
A problem popped up while using the train.py file, but it seems to be resolved now: the file was only accepting ["cpu", "cuda", "mps"] in its device list. I added "dml" to the list and now it works. Training now utilizes the GPU properly (visible in Task Manager).

Regarding the DML device count issue, it’s interesting to note that DML prioritizes the dedicated GPU over the integrated GPU, resulting in a [GPU,GPU] pair (Dedicated,Integrated).

Training Results on "yolov8n-cls" Model with "mnist" Data for 1 Epoch

Devices used:

  1. CPU - Ryzen 5 6500H
    • Performance: ~1.3-1.5 it/s
    • Utilizes default arguments.

(screenshot: 5600h_local)

  2. CPU - Intel Xeon 2.2GHz (Colab)
    • Performance: ~1-1.2 it/s
    • Comparable to the Ryzen 5.

(screenshot: xeon_colab)

  3. GPU (DirectML) - Radeon RX 6500M
    • Dedicated Memory: 4GB
    • Performance: ~3-6 it/s
    • Notable performance for a mobile GPU.

(screenshot: 6500m_local)

  4. GPU (CUDA) - Nvidia T4 (Colab)
    • Performance: ~10-12 it/s
    • Workstation GPU optimized for AI/ML tasks.

(screenshot: t4_colab)

Results Summary:

  • The CPUs (Ryzen 5 and Intel Xeon) show expected performance levels.
  • The Radeon RX 6500M shows a significant performance advantage over the CPUs, which is impressive for a mobile GPU.
  • Higher-end AMD GPUs are expected to deliver even better performance.
  • The Nvidia T4's performance aligns with expectations for a workstation GPU.

I would be happy to provide any additional metrics you need for comparison.

Training Code:

from ultralytics.models.yolo.classify import ClassificationTrainer

# train a classification model for one epoch on the DirectML device
args = dict(model='yolov8n-cls.pt', data='mnist', epochs=1, device="dml")
trainer = ClassificationTrainer(overrides=args)
trainer.train()

Prediction Code:

from ultralytics.utils import ASSETS
from ultralytics.models.yolo.classify import ClassificationPredictor

# classification predictions from the webcam feed (source=0); device can also be set to "dml"
args = dict(model='yolov8m-cls.pt', source=0, device="cpu")
predictor = ClassificationPredictor(overrides=args)
predictor.predict_cli()

(Note: make sure to clone the repo first and run the above code from within the cloned repository.)

IMP: Can you provide me with a list of tests or files that I should check and review to finalize the DirectML support? So far, I have tested the predict.py and train.py files in the ultralytics > models > yolo > classify folder.

A simplified list would make testing DirectML more efficient and help resolve errors before finalizing the repository.

glenn-jocher commented on July 1, 2024

Hi @Gurneet1928,

Thank you for the detailed update and for sharing your findings! It's fantastic to see the progress you've made with integrating DirectML into YOLOv8. Your performance comparisons across different devices are particularly insightful.

Regarding your request for a list of tests or files to check for finalizing the DirectML integration, here are some key areas to focus on:

  1. Core Functionality:

    • train.py: Ensure training works seamlessly across different datasets and configurations.
    • predict.py: Verify that predictions are accurate and consistent across various input sources (images, videos, webcam).
  2. Model Types:

    • Detection: Test with YOLOv8 detection models.
    • Segmentation: Ensure segmentation models work correctly.
    • Classification: As you've already done, test classification models.
  3. Device Compatibility:

    • Test across different hardware setups to ensure compatibility and performance consistency.
    • Validate that the device selection logic correctly prioritizes and utilizes the appropriate GPU.
  4. Edge Cases:

    • Handle scenarios where DirectML might not be available or compatible.
    • Ensure fallback mechanisms to CPU or other supported devices are robust.
  5. Documentation:

    • Update the documentation to include instructions for enabling and using DirectML.
    • Highlight any known limitations or special considerations.

Here's a simplified checklist to guide your testing:

Checklist for DirectML Integration

  1. Training:

    • Train detection models (yolov8n.pt) on a sample dataset.
    • Train segmentation models (yolov8n-seg.pt) on a sample dataset.
    • Train classification models (yolov8n-cls.pt) on a sample dataset.
  2. Prediction:

    • Run predictions on images using detection models.
    • Run predictions on videos using detection models.
    • Run predictions on webcam feed using detection models.
    • Run predictions using segmentation models.
    • Run predictions using classification models.
  3. Device Handling:

    • Verify device selection logic for DirectML.
    • Test fallback to CPU when DirectML is not available.
    • Ensure compatibility with different GPU setups (integrated vs. dedicated).
  4. Edge Cases:

    • Handle low-memory scenarios gracefully.
    • Ensure consistent performance across different hardware.
  5. Documentation:

    • Update usage instructions for DirectML.
    • Document any known issues or limitations.

By following this checklist, you can ensure a comprehensive and robust integration of DirectML into YOLOv8. If you encounter any specific issues or need further assistance, feel free to share the details here.

Thank you for your contributions and dedication to enhancing YOLOv8. Your efforts are greatly appreciated by the community! πŸš€

Gurneet1928 commented on July 1, 2024

Hey @glenn-jocher,

While working on the segmentation and detection models, I came across something: it looks like not all PyTorch modules are currently supported by DirectML. As a result, the model runs into issues while computing the loss metrics for segmentation and object detection.
(error screenshot)

Only the classification models work as of now. That said, all the files, i.e. predict.py, train.py, and val.py, work together with classification using DirectML.

Plus, I have added logs to display the DirectML device name, and the list of device names in case multiple DirectML-supported devices are present. This allows users to select a specific AMD GPU from the list by passing the device index in the argument device="dml:{device_index}".
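
For anyone following along, here is a minimal sketch of how such a "dml:{device_index}" string could be parsed; the helper name is made up, and it assumes torch_directml.device() accepts an optional integer index as described in the torch-directml docs:

import torch_directml

def parse_dml_device(device: str):
    """Turn 'dml' or 'dml:1' into a torch_directml device handle."""
    if not device.startswith("dml"):
        raise ValueError(f"not a DirectML device string: {device}")
    _, _, index = device.partition(":")
    return torch_directml.device(int(index)) if index else torch_directml.device()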

I am still trying to complete all the checks you provided. This should be possible for the classification models, while the future of the segmentation and detection models remains uncertain.

Thank you for looking into this.

RobbStarkAustria commented on July 1, 2024

Hi @Gurneet1928,

Thank you for looking into DirectML!

I have also done some tests with DirectML, but I started with object detection, not classification. I also ran into this error and tried to solve it by converting to NumPy and using NumPy's return_counts option in loss.py:

if targets.device.type == "privateuseone":
    # DirectML tensors report device type 'privateuseone'; fall back to NumPy on the
    # CPU for the counts, since the original unique() call failed on the DML device
    i = i.cpu().numpy()
    _, counts = np.unique(i, return_counts=True)
    counts = torch.as_tensor(counts, dtype=torch.int)
else:
    _, counts = i.unique(return_counts=True)
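
As a quick standalone sanity check (a sketch independent of loss.py), the NumPy fallback can be compared against torch's own unique(return_counts=True) on the CPU:

import numpy as np
import torch

i = torch.randint(0, 5, (100,))
_, counts_torch = i.unique(return_counts=True)
_, counts_numpy = np.unique(i.numpy(), return_counts=True)
# both paths sort the unique values, so the count vectors should match exactly
assert torch.equal(counts_torch, torch.as_tensor(counts_numpy))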

I have some other incompatibilities to solve. A training run finishes without a code error, but the loss is always 0.

Maybe my way could help you.

Regards

Olaf

Gurneet1928 commented on July 1, 2024

Hey, thanks for the suggestion. This seems to be working, but it throws up another error (see screenshot).

I am using the code you shared in loss.py inside the object-detection preprocess function (screenshot attached).

Thank you for your help

Regards
Gurneet

glenn-jocher commented on July 1, 2024

Hi Olaf,

Thank you for sharing your insights and the workaround using numpy for the DirectML incompatibility in loss.py. It's great to see the community collaborating to tackle these challenges!

Gurneet, it looks like you're on the right track by incorporating Olaf's suggestion. However, the new error indicates there might be additional layers or operations that are not yet supported by DirectML.

Here are a few steps to help debug and potentially resolve this issue:

  1. Check for Other Incompatibilities: Review the entire training loop and identify any other operations that might not be supported by DirectML. You might need to apply similar workarounds as Olaf's for those operations.

  2. Incremental Testing: Simplify your model and training loop to the most basic form and gradually add complexity. This can help isolate the specific operations causing issues.

  3. Logging and Debugging: Add detailed logging to track the flow of data and identify where the process breaks. This can provide more context for the errors you're encountering.

  4. Community and Documentation: Keep an eye on updates from the DirectML and PyTorch communities. New releases might address some of these incompatibilities.

If you can provide a minimum reproducible example that demonstrates the issue, it would be immensely helpful for further investigation. You can refer to our minimum reproducible example guide for more details.

Thank you both for your contributions and efforts in enhancing YOLOv8's compatibility with DirectML. Your collaboration is invaluable to the community!

Gurneet1928 commented on July 1, 2024

Thanks for the suggestion @glenn-jocher. I am still trying to fix the errors using the torch_directml documentation.

As far as a minimum reproducible example is concerned, it would be difficult to produce one in a "minimized" manner, since the files all depend on each other.

However, I have uploaded all the changed YOLOv8 files along with a trial.ipynb notebook to a GitHub repository (link: https://github.com/Gurneet1928/yolov8_directml).

To run the code, clone the repo using:
git clone https://github.com/Gurneet1928/yolov8_directml.git

In the "trials.ipynb" file there should be a section "Training Different Models on DML [Detection, Segmentation, Classification]".

Make sure to execute the code from within the repo itself.

Thank you

RobbStarkAustria commented on July 1, 2024

Hi Gurneet (@Gurneet1928),

> Hey, thanks for the suggestion. This seems to be working, but throws up another error.

Yes, I also ran into this error in tal.py. I tried to use the same 'trick' as before by copying the data to the CPU device, using the following code. It throws no error, but I haven't had time to validate whether the result matches the original code:

if topk_idxs.device.type == "privateuseone":
    # scatter_add_ fails on the DirectML device, so do it on the CPU and copy the result back
    count_tensor = count_tensor.cpu()
    count_tensor.scatter_add_(-1, topk_idxs[:, :, k : k + 1].cpu(), ones.cpu())
    count_tensor = count_tensor.to(topk_idxs.device)
else:
    count_tensor.scatter_add_(-1, topk_idxs[:, :, k : k + 1], ones)

Maybe you will have a chance to compare the value of count_tensor.

The open question is whether performance suffers too much from the constant copying back and forth.
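
To get a rough feel for that copy overhead, here is a small self-contained timing sketch; the shapes are made up and it only measures the CPU round-trip of the scatter_add_ workaround, not an end-to-end training step:

import time

import torch
import torch_directml

dml = torch_directml.device()
count_tensor = torch.zeros(8, 8400, 10, dtype=torch.int8).to(dml)
topk_idxs = torch.randint(0, 10, (8, 8400, 1)).to(dml)
ones = torch.ones(8, 8400, 1, dtype=torch.int8).to(dml)

start = time.perf_counter()
for _ in range(100):
    # copy to CPU, do the unsupported op there, copy the result back
    c = count_tensor.cpu()
    c.scatter_add_(-1, topk_idxs.cpu(), ones.cpu())
    c = c.to(dml)
print(f"avg round-trip: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")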

Regards

glenn-jocher commented on July 1, 2024

@RobbStarkAustria hi Olaf,

Thank you for sharing your workaround for the issue in tal.py. It's great to see the community collaborating to address these challenges! 😊

Gurneet, it looks like Olaf's approach might help you move forward. Here’s the modified code snippet for tal.py that you can try:

if topk_idxs.device.type == "privateuseone":
    count_tensor = count_tensor.cpu()
    count_tensor.scatter_add_(-1, topk_idxs[:, :, k : k + 1].cpu(), ones.cpu())
    count_tensor = count_tensor.to(topk_idxs.device)
else:
    count_tensor.scatter_add_(-1, topk_idxs[:, :, k : k + 1], ones)

This should help bypass the DirectML incompatibility by temporarily moving the tensors to the CPU for the scatter_add_ operation. However, as Olaf mentioned, it's important to validate whether this workaround maintains the same results as the original code and to assess any potential performance impact due to the constant copying between devices.

Next Steps:

  1. Validation: Compare the results of the modified code with the original implementation to ensure consistency (a rough comparison sketch follows this list).
  2. Performance Testing: Measure the performance to determine if the additional CPU-GPU transfers significantly affect the training speed.
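
As a rough way to do both checks at once (a sketch, assuming the 'dml' device string is wired up as discussed above), one can run a short identical training job on CPU and on DML and compare the reported metrics:

from ultralytics.models.yolo.detect import DetectionTrainer

results = {}
for dev in ("cpu", "dml"):
    # short, identical runs on both devices; "dml" assumes the patched device handling
    args = dict(model="yolov8n.pt", data="coco8.yaml", epochs=1, imgsz=320, device=dev)
    trainer = DetectionTrainer(overrides=args)
    trainer.train()
    results[dev] = trainer.metrics  # validation metrics collected during training

print(results)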

If you encounter further issues or need additional assistance, please provide more details or share a minimum reproducible example. This will help us better understand the problem and provide more targeted support.

Thank you both for your contributions and efforts in enhancing YOLOv8's compatibility with DirectML. Your collaboration is invaluable to the community! πŸš€

Gurneet1928 commented on July 1, 2024

Thanks for your suggestions @RobbStarkAustria @glenn-jocher,

After using the shared code snippet, I seem to have run into another issue. It would be great if someone could help me with this:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 8
      6 args = dict(model='yolov8n.pt', data="coco8.yaml", epochs=5, device="dml", imgsz=320)
      7 trainer = DetectionTrainer(overrides=args)
----> 8 trainer.train()

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\engine\trainer.py:199, in BaseTrainer.train(self)
    196         ddp_cleanup(self, str(file))
    198 else:
--> 199     self._do_train(world_size)

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\engine\trainer.py:424, in BaseTrainer._do_train(self, world_size)
    422 # Validation
    423 if self.args.val or final_epoch or self.stopper.possible_stop or self.stop:
--> 424     self.metrics, self.fitness = self.validate()
    425 self.save_metrics(metrics={**self.label_loss_items(self.tloss), **self.metrics, **self.lr})
    426 self.stop |= self.stopper(epoch + 1, self.fitness) or final_epoch

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\engine\trainer.py:565, in BaseTrainer.validate(self)
    559 def validate(self):
    560     """
    561     Runs validation on test set using self.validator.
    562
    563     The returned dict is expected to contain "fitness" key.
    564     """
--> 565     metrics = self.validator(self)
    566     fitness = metrics.pop("fitness", -self.loss.detach().cpu().numpy())  # use loss as fitness measure if not found
    567     if not self.best_fitness or self.best_fitness < fitness:

File c:\Users\Gurneet Singh\.conda\envs\directml\Lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\engine\validator.py:187, in BaseValidator.__call__(self, trainer, model)
    185 # Postprocess
    186 with dt[3]:
--> 187     preds = self.postprocess(preds)
    189 self.update_metrics(preds, batch)
    190 if self.args.plots and batch_i < 3:

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\models\yolo\detect\val.py:89, in DetectionValidator.postprocess(self, preds)
     87 def postprocess(self, preds):
     88     """Apply Non-maximum suppression to prediction outputs."""
---> 89     return ops.non_max_suppression(
     90         preds,
     91         self.args.conf,
     92         self.args.iou,
     93         labels=self.lb,
     94         multi_label=True,
     95         agnostic=self.args.single_cls,
     96         max_det=self.args.max_det,
     97     )

File c:\Users\Gurneet Singh\Desktop\projects\yolov8_directml\ultralytics\utils\ops.py:258, in non_max_suppression(prediction, conf_thres, iou_thres, classes, agnostic, multi_label, labels, max_det, nc, max_time_img, max_nms, max_wh, in_place, rotated)
    256 if multi_label:
    257     i, j = torch.where(cls > conf_thres)
--> 258     x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
    259 else:  # best class only
    260     conf, j = cls.max(1, keepdim=True)

RuntimeError: The parameter is incorrect.

RobbStarkAustria commented on July 1, 2024

Hi Gurneet, @Gurneet1928

For this error you can do the following:

if prediction.device.type == "privateuseone":
    # same CPU round-trip trick: concatenate on the CPU, then move the result back to the DML device
    x = torch.cat((box[i].cpu(), x[i, 4 + j, None].cpu(), j[:, None].cpu().float(), mask[i].cpu()), 1)
    x = x.to(prediction.device)
else:
    x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)

Then the code runs without errors, but the metrics are always 0, so we have to dive deeper into the code to find the reason for this behaviour.

Regards

Olaf

Gurneet1928 commented on July 1, 2024

Hey Olaf @RobbStarkAustria,
Thanks again for your code snippet, it works now.

As far as the 0 metrics are concerned, I face the same issue. I have a theory for this: maybe somewhere in the code some of the results are being stored on the CPU while others are on the DML device, and as a result the metrics come back as 0. Since the metrics show values for "cpu" but not for "dml", that's the theory I came up with. I saw several places in the code where variables are sent over to the CPU. In our approach, if an operation works we leave it unchanged; otherwise we send the tensors to the CPU, compute there, and move the result back.
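
One way to test that theory (a hypothetical helper, not part of the Ultralytics code) is to assert that everything feeding the metrics lives on the same device right before they are computed:

import torch

def assert_same_device(**tensors):
    """Raise if the named tensors are spread across different devices."""
    devices = {name: t.device for name, t in tensors.items() if isinstance(t, torch.Tensor)}
    if len(set(devices.values())) > 1:
        raise RuntimeError(f"mixed devices detected: {devices}")

# e.g. call inside the validator before the metrics are updated:
# assert_same_device(preds=preds[0], batch_img=batch["img"])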

This is what I was able to come up with.

Thank you for helping

Regards
Gurneet
