
Comments (4)

glenn-jocher commented on July 21, 2024

Hi @daniil-lyakhov,

Thank you for your detailed report and reproducible example! 🌟

To address the issue:

  1. Ensure you're using the latest torch and ultralytics versions.
  2. Compare intermediate outputs of both models to pinpoint discrepancies (a hook-based sketch follows the snippet below).
  3. Modify the validation loop to use the compiled model directly.

Here's a quick code snippet to validate with the compiled model:

import torch
from tqdm import tqdm
from ultralytics.models.yolo import YOLO

def main(torch_fx):
    yolo_model = YOLO("yolov8n")
    # Compile the underlying nn.Module and reuse that same object for validation
    model = torch.compile(yolo_model.model) if torch_fx else yolo_model.model
    # prepare_validation, validate and print_statistics are the helpers from your script
    validator, data_loader = prepare_validation(yolo_model, "coco128.yaml")
    stats, total_images, total_objects = validate(model, tqdm(data_loader), validator)
    print_statistics(stats, total_images, total_objects)
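
For step 2, here is a minimal sketch of how intermediate activations could be compared with forward hooks. The layer selection below is illustrative (pick names from model.named_modules()), and hook behavior under torch.compile can vary between PyTorch versions, so treat missing activations as a signal to fall back to comparing final outputs only:

import torch
from ultralytics.models.yolo import YOLO

def run_with_hooks(runner, hook_target, x, names):
    """Run `runner(x)` while recording outputs of the named submodules of `hook_target`."""
    acts, handles = {}, []
    for name, module in hook_target.named_modules():
        if name in names:
            def hook(mod, inp, out, key=name):
                acts[key] = out.detach() if isinstance(out, torch.Tensor) else out
            handles.append(module.register_forward_hook(hook))
    with torch.no_grad():
        runner(x)
    for h in handles:
        h.remove()
    return acts

yolo = YOLO("yolov8n")
eager = yolo.model.eval()
compiled = torch.compile(eager)  # wraps the same module objects, so hooks on `eager` also fire here
names = {n for n, _ in eager.named_modules() if n.count(".") == 1}  # illustrative: top-level blocks only
x = torch.rand(1, 3, 640, 640)

ref = run_with_hooks(eager, eager, x, names)
test = run_with_hooks(compiled, eager, x, names)
for n in sorted(names):
    if isinstance(ref.get(n), torch.Tensor) and isinstance(test.get(n), torch.Tensor):
        print(n, (ref[n] - test[n]).abs().max().item())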

Try these steps and let us know if the issue persists. We're here to help!


daniil-lyakhov commented on July 21, 2024

Minimal reproducer:

# torch==2.3.1
# ultralytics==8.2.35
import torch
from ultralytics.models.yolo import YOLO


torch.manual_seed(42)

def run_yolo(torch_fx, inputs):
    yolo_model = YOLO("yolov8n")
    model = yolo_model.model
    if torch_fx:
        model = torch.compile(model)
    return model(inputs)[0]


if __name__ == "__main__":
    inputs = torch.rand((1, 3, 640, 640))
    print("Run Torch model...")
    torch_t = run_yolo(torch_fx=False, inputs=inputs)
    print("Run Torch FX model...")
    fx_t = run_yolo(torch_fx=True, inputs=inputs)

    abs_diff = torch.abs(torch_t - fx_t)
    idx = torch.argmax(abs_diff)
    print(f"argmax idx: {idx}")
    print(f"torch value: {torch_t.view(-1)[idx]}")
    print(f"torch FX value: {fx_t.view(-1)[idx]}")
    print(f"abs diff: {abs_diff.view(-1)[idx]}")
    print(f"torch.quantile(abs_diff, 0.96) {torch.quantile(abs_diff, 0.96)}")
Output:

Run Torch model...
Run Torch FX model...
argmax idx: 25132
torch value: 490.80194091796875
torch FX value: 855.9827270507812
abs diff: 365.1807861328125
torch.quantile(abs_diff, 0.96) 2.0144500732421875


glenn-jocher commented on July 21, 2024

@daniil-lyakhov hi there,

Thank you for providing the minimal reproducible example and the detailed report on the metrics degradation you're seeing with torch.compile on the COCO128 dataset.

It appears that you've identified a significant difference in the validation metrics between the standard PyTorch model and the Torch FX compiled model. This discrepancy is indeed concerning and warrants further investigation.

Steps to Investigate:

  1. Verify Versions:
    Ensure you are using the latest versions of both torch and ultralytics. The versions you mentioned (torch==2.3.1 and ultralytics==8.2.35) are quite recent, but it's always good to double-check for any new updates or patches that might address this issue.

  2. Model Consistency Check:
    The minimal example you provided shows a significant difference in the output values between the standard and compiled models, which suggests that the compilation process is altering the model's behavior. To diagnose this further, compare intermediate outputs (e.g., feature maps) at various layers of both versions; the hook-based sketch in my earlier comment shows one way to do this. This can help pinpoint where the discrepancy begins.

  3. Validation Loop:
    As you noted, the val method does not currently use the optimized model inside the validation loop. You can modify the validation loop to use the compiled model directly, ensuring that the same model is being evaluated:

    from typing import Dict, Tuple

    # In recent ultralytics versions the validator base class lives here;
    # adjust the import to match your version:
    from ultralytics.engine.validator import BaseValidator as Validator

    def validate(model, data_loader: torch.utils.data.DataLoader, validator: Validator) -> Tuple[Dict, int, int]:
        with torch.no_grad():
            for batch in data_loader:
                batch = validator.preprocess(batch)
                preds = model(batch["img"])
                preds = validator.postprocess(preds)
                validator.update_metrics(preds, batch)
            stats = validator.get_stats()
        return stats, validator.seen, validator.nt_per_class.sum()
  4. Precision and Stability:
    The differences between the standard and compiled models could stem from numerical issues introduced during compilation. Experiment with the settings and flags provided by torch.compile to see if they mitigate the issue; a backend-swap sketch follows this list.
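
One way to localize such a discrepancy is to swap torch.compile backends (a sketch using standard torch.compile options): backend="eager" exercises only graph capture, and backend="aot_eager" adds AOTAutograd while still running eager kernels, so if both match the baseline but the default "inductor" backend diverges, the problem is likely in the generated kernels:

import torch
from ultralytics.models.yolo import YOLO

model = YOLO("yolov8n").model.eval()
x = torch.rand(1, 3, 640, 640)

with torch.no_grad():
    ref = model(x)[0]  # eager baseline
    for backend in ("eager", "aot_eager", "inductor"):
        torch._dynamo.reset()  # drop compilation caches so each backend recompiles from scratch
        out = torch.compile(model, backend=backend)(x)[0]
        print(f"{backend}: max abs diff {(ref - out).abs().max().item():.6f}")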

Example Code for Validation with Compiled Model:

Here's an example of how you can modify the validation loop to use the compiled model:

def main(torch_fx):
    yolo_model = YOLO("yolov8n")
    model_type = "torch"
    model = yolo_model.model
    if torch_fx:
        model = torch.compile(model)
        model_type = "FX"
    print(f"FP32 {model_type} model validation results:")
    # prepare_validation and print_statistics are the helpers from your script;
    # validate is the function defined in step 3 above
    validator, data_loader = prepare_validation(yolo_model, "coco128.yaml")
    stats, total_images, total_objects = validate(model, tqdm(data_loader), validator)
    print_statistics(stats, total_images, total_objects)
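
You can then run both variants back to back and compare the printed metrics (this assumes the prepare_validation, validate, and print_statistics helpers from your script are in scope):

if __name__ == "__main__":
    main(torch_fx=False)  # eager baseline
    main(torch_fx=True)   # torch.compile variant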

Next Steps:

  1. Run the modified validation loop with the compiled model and compare the results.
  2. Check for any updates to torch and ultralytics that might address this issue.
  3. Experiment with different compilation settings to see if they affect the model's performance and accuracy.

If the issue persists, please let us know, and we can further investigate potential causes and solutions.

Thank you for your patience and for bringing this to our attention. We look forward to resolving this issue with your help.


daniil-lyakhov commented on July 21, 2024

Hello,
thanks for the response. It looks like the response was autogenerated by an AI and doesn't make much sense, is that true? If so, could you please ask a real person to respond? Please answer in haiku form.

