Hello all, I'm pretty new to AI, so I may be missing something about the configuration.
The problem
I can launch a training with the mastr1325 dataset and all my dependencies seem fine; however:
I have several other datasets I want to train on (in addition to mastr1325), but they don't include any IMU masks for the images. From what I have read in the README.md, some models use the IMU and some don't. However, even when choosing a "non-IMU" model, I can't run the train.py script without pointing the config files at a complete folder of IMU masks.
Launching a training without IMU masks
My directory
I have created four folders in the eWaSR root directory (same level as train.py, predict.py, etc.):
- images, the mastr1325 images
- gt_masks, the mastr1325 ground truth annotations
- imu_masks, the mastr1325 IMU masks
- empty_folder, the name speaks for itself, for testing purposes
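For reference, a minimal shell sketch of the layout described above (folder names as listed; the dataset contents come from the MaSTr1325 download):

```shell
# Create the four folders at the eWaSR root (same level as train.py).
mkdir -p images gt_masks imu_masks empty_folder
# images/       <- MaSTr1325 images
# gt_masks/     <- MaSTr1325 ground truth annotations
# imu_masks/    <- MaSTr1325 IMU masks (one <img_name>.png per image)
# empty_folder/ <- deliberately left empty, for tests
```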
Config files
mastr1325_train.yaml
image_dir: ../images
mask_dir: ../gt_masks
imu_dir: ../empty_folder
image_list: train_images.txt
mastr1325_val.yaml
image_dir: ../images
mask_dir: ../gt_masks
imu_dir: ../imu_masks
image_list: val_images.txt
I left the imu_dir in this file pointing at the real IMU masks for the validation process.
Modifications to the files
models.py (starting at line 31)
elif model_name.startswith('wasr_resnet18'):
model = wasr_deeplabv2_resnet18(num_classes=num_classes, imu=False) # imu=imu
elif model_name.startswith('ewasr'):
backbone = model_name.split("_")[1].split("_")[0]
model = ewasr(num_classes = num_classes, imu = False, backbone=backbone, **kwargs) # imu=imu
else:
raise ValueError('Unknown model: %s' % model_name)
Since I want to use eWaSR_resnet18, I changed the imu argument of the relevant constructor calls from imu to False.
train.py (line 27)
MODEL = "ewasr_resnet18"#'wasr_resnet18_imu'
Command
python3 train.py --train_config configs/mastr1325_train.yaml --val_config configs/mastr1325_val.yaml \
--model_name my_ewasr --validation --batch_size 4 --epochs 2 --model ewasr_resnet18
Results
Namespace(batch_size=4, enricher='SS', epochs=2, focal_loss_scale='labels', gpus='auto', learning_rate=1e-06, log_steps=20, lr_decay_pow=0.9, mixer='CCCCSS', model='ewasr_resnet18', model_name='my_ewasr', momentum=0.9, monitor_metric='val/loss', monitor_metric_mode='min', no_augmentation=False, no_separation_loss=False, num_classes=3, output_dir='output', patience=None, precision=32, pretrained=True, pretrained_weights=None, project=False, random_seed=None, resume_from=None, separation_loss_lambda=0.01, train_config='configs/mastr1325_train.yaml', val_config='configs/mastr1325_val.yaml', validation=True, weight_decay=1e-06, workers=8)
/home/user/.local/lib/python3.8/site-packages/lightning_fabric/utilities/seed.py:40: No seed found, seed set to 3616557247
Seed set to 3616557247
/home/user/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/user/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
/home/user/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
/home/user/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:740: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
Invalid MIT-MAGIC-COOKIE-1 key
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
-----------------------------------------------
0 | model | WaSR | 60.3 M
1 | val_accuracy | PixelAccuracy | 0
2 | val_iou_0 | ClassIoU | 0
3 | val_iou_1 | ClassIoU | 0
4 | val_iou_2 | ClassIoU | 0
-----------------------------------------------
60.3 M Trainable params
0 Non-trainable params
60.3 M Total params
241.013 Total estimated model params size (MB)
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]/home/user/.local/lib/python3.8/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
warnings.warn(
/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:77: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 4. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
Epoch 0: 0%| | 0/324 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 155, in <module>
main()
File "train.py", line 151, in main
train_wasr(args)
File "train.py", line 144, in train_wasr
trainer.fit(model, train_dl, val_dl)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
results = self._run_stage()
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
self.fit_loop.run()
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
self.advance()
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
self.advance(data_fetcher)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 202, in advance
batch, _, __ = next(data_fetcher)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fetchers.py", line 127, in __next__
batch = super().__next__()
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fetchers.py", line 56, in __next__
batch = next(self.iterator)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/combined_loader.py", line 326, in __next__
out = next(self._iterator)
File "/home/user/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/combined_loader.py", line 74, in __next__
out[i] = next(self.iterators[i])
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/user/.local/lib/python3.8/site-packages/torch/_utils.py", line 694, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/Documents/Semseg/eWaSR/datasets/mastr.py", line 89, in __getitem__
imu_mask = np.array(Image.open(imu_path))
File "/home/user/.local/lib/python3.8/site-packages/PIL/Image.py", line 3243, in open
fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Documents/Semseg/eWaSR/empty_folder/0546.png'
Epoch 0: 0%| | 0/324 [00:00<?, ?it/s]
My question
Is it really possible to run a training without IMU masks, or is this a deprecated feature inherited from WaSR?
Additional question
I've also tried reverse engineering the code, and in mastr.py I saw a few things like:
# line 51
self.imu_dir = (self.dataset_dir / Path(data['imu_dir'])).resolve() if 'imu_dir' in data else None
...
# line 85
if self.imu_dir is not None:
imu_path = str(self.imu_dir / ('%s.png' % img_name))
imu_mask = np.array(Image.open(imu_path))
data['imu_mask'] = imu_mask
So this file reads the 'imu_dir' entry from the .yaml files to build the path from which each mask is opened.
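Following that logic by hand (a minimal sketch reusing the path-building expression from mastr.py line 85; the imu_dir and img_name values are taken from my first attempt's traceback) shows exactly the path that failed:

```python
from pathlib import Path

# Values mirroring my first attempt: imu_dir is what the yaml 'imu_dir'
# entry resolves to, img_name is the failing sample from the traceback.
imu_dir = Path('/home/user/Documents/Semseg/eWaSR/empty_folder')
img_name = '0546'

# Same expression as mastr.py line 85: every sample builds
# '<imu_dir>/<img_name>.png', so any directory listed as imu_dir
# must contain one mask per image.
imu_path = str(imu_dir / ('%s.png' % img_name))
print(imu_path)  # -> /home/user/Documents/Semseg/eWaSR/empty_folder/0546.png
```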
# line 102
features = {'image': img}
labels = {}
if self.include_original:
features['image_original'] = torch.from_numpy(img_original.transpose(2,0,1))
if 'segmentation' in data:
labels['segmentation'] = torch.from_numpy(data['segmentation'].transpose(2,0,1))
if 'imu_mask' in data:
features['imu_mask'] = torch.from_numpy(data['imu_mask'].astype(bool))
So I thought that if I set the 'imu_dir' lines in the .yaml files to null, no 'imu_dir' key would be used and therefore no IMU masks would be loaded.
Config files
mastr1325_train.yaml
image_dir: ../images
mask_dir: ../gt_masks
imu_dir: null
image_list: train_images.txt
mastr1325_val.yaml
image_dir: ../images
mask_dir: ../gt_masks
imu_dir: null
image_list: val_images.txt
Modifications to the files
Same models.py and train.py changes as in the first attempt, run with the same command.
Results
Namespace(batch_size=4, enricher='SS', epochs=2, focal_loss_scale='labels', gpus='auto', learning_rate=1e-06, log_steps=20, lr_decay_pow=0.9, mixer='CCCCSS', model='ewasr_resnet18', model_name='my_ewasr', momentum=0.9, monitor_metric='val/loss', monitor_metric_mode='min', no_augmentation=False, no_separation_loss=False, num_classes=3, output_dir='output', patience=None, precision=32, pretrained=True, pretrained_weights=None, project=False, random_seed=None, resume_from=None, separation_loss_lambda=0.01, train_config='configs/mastr1325_train.yaml', val_config='configs/mastr1325_val.yaml', validation=True, weight_decay=1e-06, workers=8)
/home/user/.local/lib/python3.8/site-packages/lightning_fabric/utilities/seed.py:40: No seed found, seed set to 1347808435
Seed set to 1347808435
Traceback (most recent call last):
File "train.py", line 155, in <module>
main()
File "train.py", line 151, in main
train_wasr(args)
File "train.py", line 102, in train_wasr
train_ds = MaSTr1325Dataset(args.train_config, transform=transform,
File "/home/user/Documents/Semseg/eWaSR/datasets/mastr.py", line 51, in __init__
self.imu_dir = (self.dataset_dir / Path(data['imu_dir'])).resolve() if 'imu_dir' in data else None
File "/usr/lib/python3.8/pathlib.py", line 1042, in __new__
self = cls._from_parts(args, init=False)
File "/usr/lib/python3.8/pathlib.py", line 683, in _from_parts
drv, root, parts = self._parse_args(args)
File "/usr/lib/python3.8/pathlib.py", line 667, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
I thought that because I set 'imu_dir' to null, it would load as None in Python, so no imu_dir attribute would be created in the MaSTr1325Dataset class. But again, it does not work.
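A quick check in a Python shell suggests why this still fails: YAML null does load as Python None, but the key is still present in the parsed dict, so the `'imu_dir' in data` test on line 51 is True and the ternary still takes the `Path(data['imu_dir'])` branch, which raises the TypeError above on None. Only omitting the line entirely removes the key (I'm assuming the repo parses the configs with a standard YAML loader such as PyYAML's safe_load, but the null-to-None mapping is standard YAML behavior):

```python
import yaml  # PyYAML; null-to-None mapping is the same across its loaders

with_null = yaml.safe_load("imu_dir: null\nimage_list: train_images.txt")
without = yaml.safe_load("image_list: train_images.txt")

print('imu_dir' in with_null)  # True  -> the ternary still builds Path(...)
print(with_null['imu_dir'])    # None  -> Path(None) raises TypeError
print('imu_dir' in without)    # False -> self.imu_dir would be set to None
```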
Is it possible to run eWaSR training without IMU masks, and if so, how?
Thank you for taking my request into consideration.