Comments (3)
@MuhabHariri hello!
Thank you for providing detailed information about the issues you're encountering with customizing the backbone of YOLOv8. Let's address your concerns:
- Input Size Issue: Ensure that training actually runs at 640x640. In Ultralytics this is controlled by the imgsz training argument (default 640) rather than by the dataset YAML (coco8.yaml), so double-check the value you pass when launching training (see the example right after this list).
- Channel Mismatch Error: The error message indicates a mismatch in the number of channels expected by a convolutional layer: the output channels of one layer do not line up with the input channels expected by the layer that follows it. Double-check the output channels of each layer in your modified backbone and make sure they align with the next layer's input requirements.
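For example, a minimal way to set this explicitly through the Python API (the same imgsz=640 override is available on the yolo CLI):

from ultralytics import YOLO

# Building the model from the YAML prints the parsed layer table
# (index, from, repeats, params, module, args), which makes channel
# mismatches visible before training even starts.
model = YOLO("ultralytics/cfg/models/v8/yolov8.yaml")

# The training image size is controlled by the imgsz argument (default 640).
model.train(data="coco8.yaml", imgsz=640, epochs=2, lr0=0.01)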
For a more detailed examination, it would be helpful to see the configuration of your coco8.yaml
and any modifications you've made to the model's architecture in the YAML file. This will help in pinpointing the exact cause of the discrepancies you're experiencing.
If the issues persist, consider providing the relevant sections of your configuration files to further diagnose the problem.
Thank you for your reply. Unfortunately, the problems still persist. Here is the configuration of the coco8.yaml file:
path: ../datasets/coco8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)
# Classes
names:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
input_size: [640, 640]
# Download script/URL (optional)
download: https://ultralytics.com/assets/coco8.zip
Additionally, here is the configuration of the yolov8.yaml file:
nc: 80
depth_multiple: 0.33
width_multiple: 0.25
max_channels: 1024
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, UpsampleTensor, []] # 0
- [-1, 1, ConvMobileNetV2, [16, 3, 2]]
- [-1, 1, ConvMobileNetV2, [32, 3, 2]]
- [-1, 1, ConvMobileNetV2, [32, 3, 1]]
- [-1, 1, ConvMobileNetV2, [64, 3, 2]]
- [-1, 1, ConvMobileNetV2, [64, 3, 1]]
- [-1, 1, ConvMobileNetV2, [128, 3, 2]]
- [-1, 1, ConvMobileNetV2, [128, 3, 1]]
- [-1, 1, ConvMobileNetV2, [128, 3, 2]]
- [-1, 1, ConvMobileNetV2, [128, 3, 1]]
- [-1, 1, ConvMobileNetV2, [128, 3, 1]] #10
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]]
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]]
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]]
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]]
- [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
When I run the code: yolo train data=coco8.yaml model=ultralytics/cfg/models/v8/yolov8.yaml epochs=2 lr0=0.01
I encounter two problems. Please check the error and the printed dimensions at the beginning:
Input shape: torch.Size([1, 3, 256, 256])
Before upsampling: torch.Size([1, 3, 256, 256])
After upsampling: torch.Size([1, 3, 640, 640])
Input shape: torch.Size([1, 3, 640, 640])
Before initial_conv: torch.Size([1, 3, 640, 640])
After initial_conv: torch.Size([1, 16, 320, 320])
Input shape: torch.Size([1, 16, 320, 320])
Before initial_conv: torch.Size([1, 16, 320, 320])
After initial_conv: torch.Size([1, 32, 160, 160])
Input shape: torch.Size([1, 32, 160, 160])
Before initial_conv: torch.Size([1, 32, 160, 160])
After initial_conv: torch.Size([1, 32, 160, 160])
Input shape: torch.Size([1, 32, 160, 160])
Before initial_conv: torch.Size([1, 32, 160, 160])
After initial_conv: torch.Size([1, 64, 80, 80])
Input shape: torch.Size([1, 64, 80, 80])
Before initial_conv: torch.Size([1, 64, 80, 80])
After initial_conv: torch.Size([1, 64, 80, 80])
Input shape: torch.Size([1, 64, 80, 80])
Before initial_conv: torch.Size([1, 64, 80, 80])
After initial_conv: torch.Size([1, 128, 40, 40])
Input shape: torch.Size([1, 128, 40, 40])
Before initial_conv: torch.Size([1, 128, 40, 40])
After initial_conv: torch.Size([1, 128, 40, 40])
Input shape: torch.Size([1, 128, 40, 40])
Before initial_conv: torch.Size([1, 128, 40, 40])
After initial_conv: torch.Size([1, 128, 20, 20])
Input shape: torch.Size([1, 128, 20, 20])
Before initial_conv: torch.Size([1, 128, 20, 20])
After initial_conv: torch.Size([1, 128, 20, 20])
Input shape: torch.Size([1, 128, 20, 20])
Before initial_conv: torch.Size([1, 128, 20, 20])
After initial_conv: torch.Size([1, 128, 20, 20])
Before Concatenation: [torch.Size([1, 128, 40, 40]), torch.Size([1, 128, 40, 40])]
After Concatenation: torch.Size([1, 256, 40, 40])
Input tensor shape: torch.Size([1, 256, 40, 40])
Traceback (most recent call last):
File "C:\Users\muh\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\muh\anaconda3\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\muh\anaconda3\Scripts\yolo.exe\__main__.py", line 7, in <module>
File "C:\Users\muh\ultralytics\ultralytics\cfg\__init__.py", line 556, in entrypoint
model = YOLO(model, task=task)
File "C:\Users\muh\ultralytics\ultralytics\models\yolo\model.py", line 23, in __init__
super().__init__(model=model, task=task, verbose=verbose)
File "C:\Users\muh\ultralytics\ultralytics\engine\model.py", line 150, in __init__
self._new(model, task=task, verbose=verbose)
File "C:\Users\muh\ultralytics\ultralytics\engine\model.py", line 219, in _new
self.model = (model or self._smart_load("model"))(cfg_dict, verbose=verbose and RANK == -1) # build model
File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 305, in __init__
m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))]) # forward
File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 304, in <lambda>
forward = lambda x: self.forward(x)[0] if isinstance(m, (Segment, Pose, OBB)) else self.forward(x)
File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 97, in forward
return self.predict(x, *args, **kwargs)
File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 115, in predict
return self._predict_once(x, profile, visualize, embed)
File "C:\Users\muh\ultralytics\ultralytics\nn\tasks.py", line 136, in _predict_once
x = m(x) # run
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\muh\ultralytics\ultralytics\nn\modules\block.py", line 317, in forward
y = list(self.cv1(x).chunk(2, 1))
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\muh\ultralytics\ultralytics\nn\modules\conv.py", line 81, in forward
x = self.conv(x)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Users\muh\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [128, 6, 1, 1], expected input[1, 256, 40, 40] to have 6 channels, but got 256 channels instead
The definitions of the ConvMobileNetV2 and UpsampleTensor modules can be found in the first question in this thread. To better understand the data flow, I added print statements before and after each operation in both modules; the resulting output, shown above the traceback, gives the input and output of each layer.
The first issue is that the input is 256x256x3 when it is supposed to be 640x640x3. I tried setting the input size to 640 in the yolo train command, but I still get the same 256x256x3 input.
The second error occurs when the feature maps from the backbone are fed into the head, specifically at the input of block 12 (the C2f block). I don't understand why it expects the input [1, 256, 40, 40] to have 6 channels.
I would appreciate your help with this issue.
Best wishes,
Hello @MuhabHariri,
Thank you for sharing the additional details and configuration files. Let's address the issues you're encountering:
- Input Size Issue: The input size discrepancy might come from how the dataset images are loaded or resized during training, so make sure the resizing/augmentation settings in your training pipeline produce 640x640 images; this happens in the dataset preprocessing code rather than in the YAML files. Note also that the 1x3x256x256 tensor at the very top of your log is most likely not a training image at all: it comes from the dummy forward pass the model constructor runs at a fixed 256x256 size to compute layer strides (the torch.zeros(1, ch, s, s) call visible in your traceback).
- Channel Mismatch Error: The error expected input[1, 256, 40, 40] to have 6 channels, but got 256 channels instead means a layer was constructed expecting 6 input channels but receives 256 at runtime. This typically happens when the output channels of one layer do not match the input channels the next layer was built with. Double-check the output channels of each ConvMobileNetV2 layer and how the layers are wired together in the YAML; a likely cause of the specific number 6 is sketched right after this list.
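As a sketch rather than a confirmed diagnosis: parse_model in ultralytics/nn/tasks.py only infers output channels for the module classes it knows about. If ConvMobileNetV2 is not added to that channel-inference branch, the running channel list ch keeps the value 3 for every backbone layer, so the head's Concat of layers -1 and 6 is parsed as 3 + 3 = 6 channels, and the following C2f is built with a first conv expecting 6 input channels (exactly the weight of size [128, 6, 1, 1] in your error) while the real concatenated tensor has 128 + 128 = 256. Assuming your fork keeps the stock parse_model layout, the change looks roughly like this:

# ultralytics/nn/tasks.py, inside parse_model() -- rough sketch, the module tuple is abridged
if m in (Conv, C2f, SPPF, ConvMobileNetV2):   # <-- register ConvMobileNetV2 here
    c1, c2 = ch[f], args[0]                   # input channels from the running list, output channels from the YAML args
    args = [c1, c2, *args[1:]]                # the module is then constructed as m(c_in, c_out, k, s)
...
ch.append(c2)                                 # ch now records the true output width of every layer,
                                              # so Concat([-1, 6]) is parsed as ch[-1] + ch[6], not 3 + 3

Note that once registered this way, parse_model passes the module an explicit input-channel argument (and, in the stock code, scales the declared output channels by width_multiple), so the ConvMobileNetV2 constructor may need adjusting to match. Separately, your backbone has 11 entries (indices 0-10) instead of the stock 10, so it is worth re-checking the hard-coded indices in the head ([-1, 6], [-1, 4], [-1, 12], [-1, 9] and [15, 18, 21] in Detect); those numbers were written for a backbone that ends at index 9, and some of them likely need to shift by one.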
Given the complexity of the modifications you're making, I recommend stepping through each layer's output manually or using debugging tools to ensure the transformations and channel alterations are occurring as expected. This approach will help isolate the layer or operation causing the discrepancy and allow for more targeted troubleshooting.
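For instance, rather than editing every module to add print statements, plain PyTorch forward hooks (a small sketch below, nothing Ultralytics-specific) can log each leaf module's input and output shapes in a single pass:

import torch
import torch.nn as nn

def add_shape_hooks(model: nn.Module) -> None:
    """Register hooks that print the input/output shape of every leaf module."""
    def hook(module, inputs, output):
        in_shape = tuple(inputs[0].shape) if inputs and torch.is_tensor(inputs[0]) else "?"
        out_shape = tuple(output.shape) if torch.is_tensor(output) else "?"
        print(f"{module.__class__.__name__:>20}: {in_shape} -> {out_shape}")
    for m in model.modules():
        if not list(m.children()):  # leaf modules only
            m.register_forward_hook(hook)

Calling add_shape_hooks(model.model) once the model builds, followed by a single dummy forward pass, reproduces the per-layer trace you printed by hand.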
If you continue to face difficulties, consider simplifying the model to a few layers, ensuring each step works as expected before adding more complexity. This incremental approach can often help pinpoint issues more effectively.
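As a concrete example of that incremental approach, you could first sanity-check a single block in isolation (a sketch; the import path and constructor arguments below are assumptions, so adjust them to match your own ConvMobileNetV2 definition):

import torch
from ultralytics.nn.modules import ConvMobileNetV2  # hypothetical path -- import from wherever you defined it

block = ConvMobileNetV2(16, 3, 2)                 # arguments copied from the first ConvMobileNetV2 line of your YAML
out = block(torch.zeros(1, 3, 640, 640))
print(out.shape)                                  # your log shows this step producing torch.Size([1, 16, 320, 320])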