
Comments (3)

glenn-jocher commented on June 20, 2024

@MuhabHariri hello!

Thank you for providing detailed information about the issues you're encountering with customizing the backbone of YOLOv8. Let's address your concerns:

  1. Input Size Issue: The training image size is controlled by the imgsz argument (for example, yolo train ... imgsz=640), not by the dataset YAML. Note that an input_size entry in coco8.yaml is not a recognized Ultralytics key and will be ignored, so make sure the 640x640 size is being passed where the trainer actually reads it.

  2. Channel Mismatch Error: The error message indicates a mismatch in the number of channels expected by the convolutional layer. It seems there's a discrepancy between the output channels of one layer and the expected input channels of the subsequent layer. Double-check the output channels of each layer in your modified backbone to ensure they align correctly with the next layer's input requirements.

For a more detailed examination, it would be helpful to see the configuration of your coco8.yaml and any modifications you've made to the model's architecture in the YAML file. This will help in pinpointing the exact cause of the discrepancies you're experiencing.

If the issues persist, consider providing the relevant sections of your configuration files to further diagnose the problem.
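As an illustration of checking channel alignment, here is a minimal plain-PyTorch sketch (the two layers and their shapes are hypothetical stand-ins, not your actual ConvMobileNetV2 blocks) that forwards a dummy tensor through consecutive layers and prints the shapes:

```python
import torch
import torch.nn as nn

# Two hypothetical consecutive backbone blocks.
layer_a = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)   # produces 64 channels
layer_b = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)  # expects 64 channels in

x = torch.zeros(1, 32, 160, 160)  # dummy input
x = layer_a(x)
print(x.shape)  # torch.Size([1, 64, 80, 80])
x = layer_b(x)  # raises RuntimeError here if the channel counts disagree
print(x.shape)  # torch.Size([1, 128, 80, 80])
```

If layer_b were built with the wrong in_channels, the second call would fail with the same kind of RuntimeError you are seeing, which makes the mismatched pair easy to spot.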

from ultralytics.

MuhabHariri commented on June 20, 2024

Thank you for your reply. Unfortunately, the problems still persist. Here is the configuration of the coco8.yaml file:

path: ../datasets/coco8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush


input_size: [640, 640]

# Download script/URL (optional)
download: https://ultralytics.com/assets/coco8.zip

Additionally here is the configuration of Yolov8.yaml file:

nc: 80
depth_multiple: 0.33
width_multiple: 0.25
max_channels: 1024
# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, UpsampleTensor, []]        # 0
  - [-1, 1, ConvMobileNetV2, [16, 3, 2]] 
  - [-1, 1, ConvMobileNetV2, [32, 3, 2]] 
  - [-1, 1, ConvMobileNetV2, [32, 3, 1]] 
  - [-1, 1, ConvMobileNetV2, [64, 3, 2]] 
  - [-1, 1, ConvMobileNetV2, [64, 3, 1]] 
  - [-1, 1, ConvMobileNetV2, [128, 3, 2]]
  - [-1, 1, ConvMobileNetV2, [128, 3, 1]] 
  - [-1, 1, ConvMobileNetV2, [128, 3, 2]]
  - [-1, 1, ConvMobileNetV2, [128, 3, 1]] 
  - [-1, 1, ConvMobileNetV2, [128, 3, 1]] #10
 
 
 
# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] 
 
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] 
 
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] 
 
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] 
 
  - [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)

When I run the command: yolo train data=coco8.yaml model=ultralytics/cfg/models/v8/yolov8.yaml epochs=2 lr0=0.01
I encounter two problems. Please check the error and the printed dimensions at the beginning:

Input shape: torch.Size([1, 3, 256, 256])
Before upsampling: torch.Size([1, 3, 256, 256])
After upsampling: torch.Size([1, 3, 640, 640])
Input shape: torch.Size([1, 3, 640, 640])
Before initial_conv: torch.Size([1, 3, 640, 640])
After initial_conv: torch.Size([1, 16, 320, 320])
Input shape: torch.Size([1, 16, 320, 320])
Before initial_conv: torch.Size([1, 16, 320, 320])
After initial_conv: torch.Size([1, 32, 160, 160])
Input shape: torch.Size([1, 32, 160, 160])
Before initial_conv: torch.Size([1, 32, 160, 160])
After initial_conv: torch.Size([1, 32, 160, 160])
Input shape: torch.Size([1, 32, 160, 160])
Before initial_conv: torch.Size([1, 32, 160, 160])
After initial_conv: torch.Size([1, 64, 80, 80])
Input shape: torch.Size([1, 64, 80, 80])
Before initial_conv: torch.Size([1, 64, 80, 80])
After initial_conv: torch.Size([1, 64, 80, 80])
Input shape: torch.Size([1, 64, 80, 80])
Before initial_conv: torch.Size([1, 64, 80, 80])
After initial_conv: torch.Size([1, 128, 40, 40])
Input shape: torch.Size([1, 128, 40, 40])
Before initial_conv: torch.Size([1, 128, 40, 40])
After initial_conv: torch.Size([1, 128, 40, 40])
Input shape: torch.Size([1, 128, 40, 40])
Before initial_conv: torch.Size([1, 128, 40, 40])
After initial_conv: torch.Size([1, 128, 20, 20])
Input shape: torch.Size([1, 128, 20, 20])
Before initial_conv: torch.Size([1, 128, 20, 20])
After initial_conv: torch.Size([1, 128, 20, 20])
Input shape: torch.Size([1, 128, 20, 20])
Before initial_conv: torch.Size([1, 128, 20, 20])
After initial_conv: torch.Size([1, 128, 20, 20])
Before Concatenation: [torch.Size([1, 128, 40, 40]), torch.Size([1, 128, 40, 40])]
After Concatenation: torch.Size([1, 256, 40, 40])
Input tensor shape: torch.Size([1, 256, 40, 40])
Traceback (most recent call last):
  File "C:\Users\muh\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\muh\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\muh\anaconda3\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "C:\Users\muh\ultralytics\ultralytics\cfg\__init__.py", line 556, in entrypoint
    model = YOLO(model, task=task)
  File "C:\Users\muh\ultralytics\ultralytics\models\yolo\model.py", line 23, in __init__
    super().__init__(model=model, task=task, verbose=verbose)
  File "C:\Users\muh\ultralytics\ultralytics\engine\model.py", line 150, in __init__
    self._new(model, task=task, verbose=verbose)
  File "C:\Users\muh\ultralytics\ultralytics\engine\model.py", line 219, in _new
    self.model = (model or self._smart_load("model"))(cfg_dict, verbose=verbose and RANK == -1)  # build model
  File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 305, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))])  # forward
  File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 304, in <lambda>
    forward = lambda x: self.forward(x)[0] if isinstance(m, (Segment, Pose, OBB)) else self.forward(x)
  File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 97, in forward
    return self.predict(x, *args, **kwargs)
  File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 115, in predict
    return self._predict_once(x, profile, visualize, embed)
  File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 136, in _predict_once
    x = m(x)  # run
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\muh\ultralytics\ultralytics\nn\modules\block.py", line 317, in forward
    y = list(self.cv1(x).chunk(2, 1))
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\muh\ultralytics\ultralytics\nn\modules\conv.py", line 81, in forward
    x = self.conv(x)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [128, 6, 1, 1], expected input[1, 256, 40, 40] to have 6 channels, but got 256 channels instead

The structure of the ConvMobileNetV2 and UpsampleTensor functions can be found in the first question in this thread. To better understand the data flow, I've added print statements before and after each operation in both functions. The resulting output, shown above the traceback, gives the input and output shape of each layer.

The first issue is that the input is 256x256x3 when it should be 640x640x3. I tried setting the input size to 640 in the yolo train command, but I still get the same 256x256x3 input.

The second error occurs when I start feeding the feature maps from the backbone into the head, specifically at the input of block 12 (the C2f block). I don't understand why it expects the input [1, 256, 40, 40] to have 6 channels.

I would appreciate your help with this issue.

Best wishes,


glenn-jocher commented on June 20, 2024

Hello @MuhabHariri,

Thank you for sharing the additional details and configuration files. Let's address the issues you're encountering:

  1. Input Size Issue: The 256x256 shapes at the top of your log are most likely not your training input at all. When the model is first built, Ultralytics runs a small dummy forward pass with a 256x256 zeros tensor to compute the layer strides; this is exactly the call visible in your traceback (torch.zeros(1, ch, s, s) in tasks.py). Your actual training image size is set by the imgsz argument, e.g. yolo train ... imgsz=640, so the 256x256 pass at build time is expected and harmless.

  2. Channel Mismatch Error: The error expected input[1, 256, 40, 40] to have 6 channels, but got 256 channels instead means a convolution was constructed with only 6 input channels but receives a 256-channel tensor at runtime. A likely cause: custom modules such as ConvMobileNetV2 need to be handled in parse_model (in ultralytics/nn/tasks.py) so the parser knows their output channel counts. For unrecognized modules the parser simply carries the input channel count forward, so every one of your backbone layers is recorded as producing 3 channels, and the Concat of two such layers is recorded as 3 + 3 = 6. That is where the 6 in the C2f weight [128, 6, 1, 1] comes from, even though the real tensors have 128 + 128 = 256 channels. Make sure ConvMobileNetV2's output channels are propagated into the parser's channel list.

Given the complexity of the modifications you're making, I recommend stepping through each layer's output manually or using debugging tools to ensure the transformations and channel alterations are occurring as expected. This approach will help isolate the layer or operation causing the discrepancy and allow for more targeted troubleshooting.
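To step through each layer's output without editing the modules themselves, you can use PyTorch forward hooks. This is a minimal sketch on a toy two-layer stand-in for your backbone (the layer sizes are hypothetical):

```python
import torch
import torch.nn as nn

# Toy stand-in for a custom backbone; the layer sizes are hypothetical.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
)

shapes = []

def record_shapes(module, inputs, output):
    # Log each layer's input and output tensor shapes.
    shapes.append((type(module).__name__, tuple(inputs[0].shape), tuple(output.shape)))

for layer in model:
    layer.register_forward_hook(record_shapes)

model(torch.zeros(1, 3, 640, 640))
for name, in_shape, out_shape in shapes:
    print(f"{name}: in={in_shape} out={out_shape}")
# Conv2d: in=(1, 3, 640, 640) out=(1, 16, 320, 320)
# Conv2d: in=(1, 16, 320, 320) out=(1, 32, 160, 160)
```

The same hook can be registered on every submodule of your real model (model.model.modules() in Ultralytics), which gives you the full shape trace without sprinkling print statements through the module code.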

If you continue to face difficulties, consider simplifying the model to a few layers, ensuring each step works as expected before adding more complexity. This incremental approach can often help pinpoint issues more effectively.

