
PyTorch implementation of multi-task learning architectures, incl. MTI-Net (ECCV2020).


Multi-Task Learning

This repo aims to implement several multi-task learning models and training strategies in PyTorch. The code base complements the following works:

Multi-Task Learning for Dense Prediction Tasks: A Survey

Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai and Luc Van Gool.

MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

Simon Vandenhende, Stamatios Georgoulis and Luc Van Gool.

An up-to-date list of works on multi-task learning can be found here.

Workshop

📢 📢 📢 We organized a workshop on multi-task learning at ICCV 2021 (Link).

  • Jan 13: The recordings of our invited talks are now available on Youtube.

Installation

The code runs with recent PyTorch versions, e.g. 1.4. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
conda install imageio scikit-image      # Image operations
conda install -c conda-forge opencv     # OpenCV
conda install pyyaml easydict           # Configurations
conda install termcolor                 # Colorful print statements

We refer to the requirements.txt file for an overview of the package versions in our own environment.

Usage

Setup

The following files need to be adapted in order to run the code on your own machine:

  • Change the file paths to the datasets in utils/mypath.py, e.g. /path/to/pascal/.
  • Specify the output directory in configs/your_env.yml. All results will be stored under this directory.
  • The seism repository is needed to perform the edge evaluation. See the README in ./evaluation/seism/.
  • If you want to use the HRNet backbones, please download the pre-trained weights here. The provided config files use an HRNet-w18 backbone. Download hrnet_w18_small_model_v2.pth and save it to the directory ./models/pretrained_models/.

The datasets will be downloaded automatically to the specified paths when running the code for the first time.

Training

The configuration files to train the model can be found in the configs/ directory. The model can be trained by running the following command:

python main.py --config_env configs/env.yml --config_exp configs/$DATASET/$MODEL.yml
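
For example, training MTI-Net with an HRNet-w18 backbone on PASCAL uses:

python main.py --config_env configs/env.yml --config_exp configs/pascal/hrnet18/mti_net.yml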

Evaluation

We evaluate the best model at the end of training. The evaluation criterion is based on Equation 10 from our survey paper and requires pre-training a set of single-task networks beforehand. To speed up training, it is possible to evaluate the model only during the final 10 epochs by adding the following line to your config file:

eval_final_10_epochs_only: True
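
For reference, the criterion from Equation 10 of the survey measures the average per-task performance change of the multi-task model $m$ relative to the single-task baselines $b$ (this is the quantity computed by calculate_multi_task_performance, quoted in the issues below):

$$\Delta_m = \frac{1}{T} \sum_{i=1}^{T} (-1)^{l_i} \, \frac{M_{m,i} - M_{b,i}}{M_{b,i}}$$

where $l_i = 1$ if a lower value means better performance for metric $M_i$ of task $i$, and 0 otherwise.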

Support

The following datasets and tasks are supported.

| Dataset | Sem. Seg. | Depth | Normals | Edge | Saliency | Human Parts |
|---------|-----------|-------|---------|------|----------|-------------|
| PASCAL  | Y         | N     | Y       | Y    | Y        | Y           |
| NYUD    | Y         | Y     | Aux     | Aux  | N        | N           |

The following models are supported.

| Backbone     | HRNet | ResNet |
|--------------|-------|--------|
| Single-Task  | Y     | Y      |
| Multi-Task   | Y     | Y      |
| Cross-Stitch |       | Y      |
| NDDR-CNN     |       | Y      |
| MTAN         |       | Y      |
| PAD-Net      | Y     |        |
| MTI-Net      | Y     |        |

References

This code repository is heavily based on the ASTMT repository. In particular, the evaluation and dataloaders were taken from there.

Citation

If you find this repo useful for your research, please consider citing the following works:

@article{vandenhende2021multi,
  author={S. Vandenhende and S. Georgoulis and W. Van Gansbeke and M. Proesmans and D. Dai and L. Van Gool},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Multi-Task Learning for Dense Prediction Tasks: A Survey},
  year={2021},
  pages={1-1},
  doi={10.1109/TPAMI.2021.3054719}}

@inproceedings{vandenhende2020mti,
  title={MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning},
  author={Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

@InProceedings{MRK19,
  Author    = {Kevis-Kokitsi Maninis and Ilija Radosavovic and Iasonas Kokkinos},
  Title     = {Attentive Single-Tasking of Multiple Tasks},
  Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  Year      = {2019}
}

@article{pont2015supervised,
  title={Supervised evaluation of image segmentation and object proposal techniques},
  author={Pont-Tuset, Jordi and Marques, Ferran},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2015}
}

Updates

For more information see issue #1.

The initial code used the NYUDv2 dataloader from ASTMT. That implementation differed from the one we used to run our experiments in the survey. We have therefore re-written the NYUDv2 dataloader to be consistent with our survey results. To avoid any issues, it is best to remove your old version of the NYUDv2 dataset; the Python script will then automatically download the correct version when the NYUDv2 dataset is used.

The depth task is evaluated in a pixel-wise fashion to be consistent with the survey. This is different from ASTMT, which averages the results across the images.
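
The two conventions generally give different numbers. A minimal sketch of the distinction, using hypothetical arrays rather than the repo's actual evaluation code:

    import numpy as np

    # Hypothetical per-image depth predictions and ground truths, shape (H, W)
    preds = [np.random.rand(4, 4) for _ in range(3)]
    gts = [np.random.rand(4, 4) for _ in range(3)]

    # Pixel-wise (this repo / the survey): pool all pixels, then one global RMSE
    sq_err = np.concatenate([((p - g) ** 2).ravel() for p, g in zip(preds, gts)])
    rmse_pixelwise = np.sqrt(sq_err.mean())

    # Per-image (ASTMT): one RMSE per image, then average across images
    rmse_per_image = np.mean([np.sqrt(((p - g) ** 2).mean()) for p, g in zip(preds, gts)])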

License

This software is released under a Creative Commons license which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here.

Acknowledgements

The authors acknowledge support by Toyota via the TRACE project and MACCHINA (KULeuven, C14/18/065).


multi-task-learning-pytorch's Issues

Backbone

Hey, you reported the best result on NYUD with MTI-Net (HRNet-48), but in your code the backbone in the configs is HRNet-w18. If I want to reproduce the best result, is just changing HRNet-w18 to HRNet-48 enough? Thanks.

About PAP-Net

Hello, thanks for sharing your work. In your paper Multi-Task Learning for Dense Prediction Tasks: A Survey, I found that you've tried PAP-Net, but I can't find the code in your project. How can I get it?

hrnet18+padnet

Thank you very much for sharing the wonderful code! Your work is definitely very helpful for the MTL community.

I am contacting you because I am trying to reproduce the result of hrnet18+padnet on the NYUD dataset. Literally, I:

  1. downloaded the pre-trained model hrnet_w18_small_model_v2.pth
  2. ran python main.py --config_env configs/env.yml --config_exp configs/nyud/hrnet18/pad_net.yml

But the performance is only Semantic Segmentation mIoU: 33.4665 and depth 0.7267, not the result in your paper. Do you have an idea why that is?

Looking forward to hearing from you soon!
Thank you!

About the pre-trained model and data augmentation

Thank you for sharing your work on the multi-task learning problem; it is very useful!

You mentioned that you used pre-trained models, e.g. a pre-trained ResNet-50, and the data augmentation trick in Sec. 4.1.4 Training Setup. Is it right, therefore, that the results on the NYUD-v2 dataset reported in Table 5(c) of your paper are also based on a pre-trained ResNet-50 model with data augmentation? Besides, does using pre-trained models and data augmentation improve performance very much?

Thanks for your patience again!

MTI-Net + Resnet FPN backbone

Hi,
Thanks for your code!

Could you provide the code for MTI-Net with the ResNet FPN backbone from your paper?
BTW, did you use pre-trained weights for resnet18-fpn?
If so, where could I find those weight files?

Thanks!

Hyperparams for HRNet-48

Could you please let me know the hyperparameters used to train the HRNet-48 model from your paper (both for the 45.7% mIoU and the ~49% mIoU scores)? I have tried really hard to train HRNet-48 on a single task in my repository, but it doesn't go beyond 44.8% mIoU.

Thank you.

MTL+Resnet50+nyud

Thank you for sharing the code. During multi-task learning, I encountered the following error while training ResNet50 on the NYUD dataset:

RuntimeError: Given groups=1, weight of size 64 3 7 7, expected input[8, 480, 640, 3] to have 3 channels, but got 480 channels instead
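
The shape [8, 480, 640, 3] suggests the image batch is in NHWC layout, while PyTorch convolutions expect NCHW. If that is indeed the cause, a permute of the kind below typically resolves it (a sketch with a hypothetical variable name, not necessarily the root cause in this repo):

    # images: tensor of shape (N, H, W, C) = (8, 480, 640, 3)
    images = images.permute(0, 3, 1, 2).contiguous()  # -> (N, C, H, W) = (8, 3, 480, 640)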

Cityscapes depth estimation

Could you please support with the following questions:

  1. In the survey paper (Revisiting Multi-Task Learning in the Deep Learning Era), it is mentioned that the depth maps of Cityscapes were generated using SGM. Would it be possible to provide the code for this?
  2. Is the depth map generated, or the disparity?
  3. Disparity maps are made available by Cityscapes; are these used? In that case, do the networks predict depth directly, or do they predict disparity which is then converted to depth?

Thank you.

PADNet loss scheme interpolation

The PAD-Net paper mentions (page 3) that the intermediate loss functions L1 to L4 are computed after re-scaling the ground-truth maps to 1/4 resolution.

However, this implementation of the PAD-Net loss scheme upsamples the intermediate predictions to the image size before applying the loss. This seems to contradict what the paper reports.
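
In code terms, the two schemes differ only in which tensor is resized before the loss. A schematic sketch with hypothetical tensors and loss, not the repo's actual loss code:

    import torch
    import torch.nn.functional as F

    criterion = torch.nn.L1Loss()
    pred = torch.rand(2, 1, 60, 80)   # hypothetical 1/4-resolution prediction
    gt = torch.rand(2, 1, 240, 320)   # hypothetical full-resolution ground truth

    # Paper: re-scale the ground truth down to the prediction's resolution
    loss_paper = criterion(pred, F.interpolate(gt, size=pred.shape[-2:], mode='nearest'))

    # This implementation: upsample the prediction to the image size
    loss_impl = criterion(
        F.interpolate(pred, size=gt.shape[-2:], mode='bilinear', align_corners=False), gt)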

Implementation for decoder-focused MTL methods based on ResNet-50

Hi,

It's a great piece of work and thanks for making the code public!

Do you have any plans to release the implementation, hyper-parameters, and pre-trained weights for the decoder-focused MTL methods based on ResNet-50, as described in Tab. 5(a) of your survey paper?

Thanks in advance!

Setup issue

I am getting a path error, since part of the path is doubled, when running:

python main.py --config_env configs/env.yml --config_exp configs/pascal/hrnet18/mti_net.yml

The path looks like this:

../../../data2/yd/mti/datasets/PASCAL_MT/../../../data2/yd/mti/datasets/PASCAL_MT/human_parts/2008_000008.mat

For us it seems there is an error in data/pascal_context.py:
In line 111, part_gt_dir is defined as an extension of self.root:

part_gt_dir = os.path.join(self.root, 'human_parts')

However, in line 172, self.root is joined with part_gt_dir, i.e. with an extension of itself:

_human_part = os.path.join(self.root, part_gt_dir, line + ".mat")
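
If that diagnosis is right, a likely fix (hypothetical, mirroring the repo's variable names) is to stop joining self.root a second time:

    # part_gt_dir already contains self.root, so join it directly
    _human_part = os.path.join(part_gt_dir, line + ".mat")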

ResNet50 and PAD-net

First of all, thank you for open-sourcing the code. I have recently been learning about multi-task learning. From your paper "Multi-Task Learning for Dense Prediction Tasks: A Survey", it can be concluded that, on the NYUD-v2 dataset, PAD-Net based on ResNet-50 seems to get the best results, but your program does not support this configuration. Could you provide one, or can the current code already achieve this?

Can we do multiple classification tasks?

Hello SimonVandenhende,

Kudos on the great work. I just want to know if we can use this repo to train multiple classification tasks, for example Vehicle Make, Color, Orientation and Model: each of the 4 attributes as an individual task.

regards
akirs

About fixed weights from a grid search experiments setting

Hi, I am very impressed by your survey on MTL, from which I have learned a lot.
I am currently working on an MTL project, so I am very curious about the grid-search experiments for the fixed weights.
I have not found details about this in your paper or in this repo. Could you give me more information?
What exactly are the grid-searched weights? Did you use all combinations of those weights to train the MTL network and evaluate it? If I want to find the best weights for my MTL network, do I need to run the same experiments? Could you give me some suggestions on this?
Thank you so much!

Human Parts

Hello, thank you very much for your outstanding contribution. I easily found methods for coloring semantic segmentation prediction maps on the Internet, but there are few for coloring PASCAL-Context human part segmentation. How do you color the human part segmentation prediction maps?
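
A common recipe, not necessarily the authors' method, is the same as for semantic segmentation: index a fixed color palette with the predicted label ids. A minimal sketch with a hypothetical palette:

    import numpy as np

    # pred: (H, W) integer part labels, e.g. 0 = background, 1..6 = body parts
    pred = np.random.randint(0, 7, size=(4, 4))

    # Hypothetical palette: one RGB color per label id
    palette = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                        [0, 0, 128], [128, 0, 128], [0, 128, 128]], dtype=np.uint8)

    colored = palette[pred]  # (H, W, 3) RGB image ready to save or display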

MTI-Net NYUD task: some files missing

Hi, I'm running your code, but I found that some files are not included in your repository:

FileNotFoundError: [Errno 2] No such file or directory: './results/NYUD/hrnet_w18/single_task/semseg/results/NYUD_test_semseg.json'

Could you please provide these files?
Thanks a lot!

Adaptation to classification and detection problem

Hi @SimonVandenhende,

Thank you for sharing your code! I was wondering whether you have tested the network on other tasks as well (I am trying to train for joint 3D object detection and classification, plus road lane detection, and would like to know whether MTI-Net is suitable for this).

Thank you!

Depth performance using ResNet-50 (Single-task performance)

Thank you very much for sharing the wonderful code!

I ran into a question when running the code: while I can get a similar accuracy on segmentation (43.5 mIoU) using ResNet-50, the accuracy on depth is not as good (0.614 RMSE). I have read the related issues (#1 and #5), but I still cannot resolve the question in my case. Could you please give me some suggestions about the single-task depth experiment?

Thanks and Regards

Epoch 100/100
----------
Adjusted learning rate to 0.00000
Train ...
Epoch: [99][ 0/99]      Loss depth 1.0003e-01 (1.0003e-01)      Loss Total 1.0003e-01 (1.0003e-01)
Epoch: [99][25/99]      Loss depth 1.4344e-01 (1.2583e-01)      Loss Total 1.4344e-01 (1.2583e-01)
Epoch: [99][50/99]      Loss depth 1.3219e-01 (1.2832e-01)      Loss Total 1.3219e-01 (1.2832e-01)
Epoch: [99][75/99]      Loss depth 1.2006e-01 (1.3160e-01)      Loss Total 1.2006e-01 (1.3160e-01)
Results for depth prediction
rmse           0.2232
log_rmse       0.0887
Evaluate ...
Save model predictions to ./results/NYUD/resnet50/single_task/depth/results
Files already downloaded
Initializing dataloader for NYUD val set
Number of dataset images: 654
Evaluate the saved images (depth)
Evaluating depth: 0 of 654 objects
Evaluating depth: 500 of 654 objects
Results for Depth Estimation
rmse           0.6204
log_rmse       0.2119
No new best depth estimation model 0.614 -> 0.620
Checkpoint ...
Evaluating best model at the end
Save model predictions to ./results/NYUD/resnet50/single_task/depth/results
Files already downloaded
Initializing dataloader for NYUD val set
Number of dataset images: 654
Evaluate the saved images (depth)
Evaluating depth: 0 of 654 objects
Evaluating depth: 500 of 654 objects
Results for Depth Estimation
rmse           0.6204
log_rmse       0.2119

About task balancing

Thank you for your excellent work and open-source code.

Could you please provide some code for task balancing, such as uncertainty weighting or GradNorm multi-objective optimization? Your paper reports experimental results for these methods. Thank you very much!
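
For anyone looking for a starting point, below is a minimal sketch of homoscedastic uncertainty weighting (Kendall et al., 2018), not the authors' implementation, where each task loss is scaled by a learned log-variance:

    import torch
    import torch.nn as nn

    class UncertaintyWeighting(nn.Module):
        """Weighs task losses by learned homoscedastic uncertainty (Kendall et al., 2018)."""
        def __init__(self, tasks):
            super().__init__()
            # One learnable log-variance s_i per task, initialized to zero
            self.log_vars = nn.ParameterDict(
                {t: nn.Parameter(torch.zeros(1)) for t in tasks})

        def forward(self, losses):
            # losses: dict mapping task name -> scalar loss tensor
            total = 0.0
            for t, loss in losses.items():
                s = self.log_vars[t]
                total = total + torch.exp(-s) * loss + s  # exp(-s_i) * L_i + s_i
            return total

Note that the log-variance parameters must be passed to the optimizer together with the network parameters.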

Question on Multi Task Evaluation Criteria

Your survey on MTL is awesome! Amazing work. I have a quick question: in section 4.2.1 Evaluation Criterion you mention:

where l_i = 1 if a lower value means better performance for metric M_i of task i, and 0 otherwise.

But in the function calculate_multi_task_performance

    for task in tasks:
        mtl = eval_dict[task]
        stl = single_task_dict[task]
        
        if task == 'depth': # rmse lower is better
            mtl_performance -= (mtl['rmse'] - stl['rmse'])/stl['rmse']

        elif task in ['semseg', 'sal', 'human_parts']: # mIoU higher is better
            mtl_performance += (mtl['mIoU'] - stl['mIoU'])/stl['mIoU']

        elif task == 'normals': # mean error lower is better
            mtl_performance -= (mtl['mean'] - stl['mean'])/stl['mean']

        elif task == 'edge': # loss lower is better
            mtl_performance += (mtl['odsF'] - stl['odsF'])/stl['odsF']

        else:
            raise NotImplementedError

For task == 'edge', the comment says "lower is better", but the code is adding instead of subtracting. Why is that?

Also, I've been trying to retrain individual tasks like depth or semseg in a multi-GPU setup, but it crashes and reboots!! (I'll probably look into this later, but I'm wondering if you ever encountered it.)

Again, awesome work! Thanks for sharing

Dataset URL

It looks like the URL specified in data/google_drive.py is no longer accessible.

download_file_from_google_drive fails with UnboundLocalError: local variable 'token' referenced before assignment, since response.cookies is empty for the NYU dataset.

About dataset training

Is MTL training done sequentially on multiple datasets? Won't this lead to sub-optimal performance on some tasks?

About 'seism'

Hi! Thanks for sharing the code.
I had one problem when training the 'PADNet' model: is 'seism' necessary? Can I train without using seism?
Thank you for your answer.

ResNet-50 single task baseline hyperparameters

Hi there, thank you for open-sourcing your amazing work.

I trained the ResNet-50 single-task baseline for the segmentation task using the config file "configs/nyud/resnet50/semseg.yml". However, I get at best 40.4 mIoU on the NYUD-v2 dataset, which is lower than the 43.9 reported in the survey. Could you please provide the hyperparameters used in the survey, or some hints on how to reach 43.9 mIoU?

Thanks in advance.

How does it work with Batch Normalization?

When iterating dataloader by dataloader, memory is sometimes limited, so each dataloader only gets a small batch. Won't this hurt the performance of batch normalization? How can this problem be fixed?
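
One common workaround, not taken from this repo, is to replace BatchNorm with GroupNorm, which does not depend on batch size. A sketch of a hypothetical helper:

    import torch.nn as nn

    def bn_to_gn(module, num_groups=8):
        """Recursively replace BatchNorm2d with GroupNorm (assumes each layer's
        channel count is divisible by num_groups)."""
        for name, child in module.named_children():
            if isinstance(child, nn.BatchNorm2d):
                setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
            else:
                bn_to_gn(child, num_groups)
        return module

For multi-GPU training, torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) is another option, since it computes batch statistics across devices.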

[Question] About dataset condition for MTL

Thank you for your awesome work.
Your work will be greatly helpful to everyone interested in MTL.

I'm about to study MTL and have one question.

I think the dataset for MTL should be of the form {Input: X(i), GT: Y_task1(i), Y_task2(i), ..., Y_taskT(i)}.

However, I think it is difficult to satisfy this condition in a real-world environment.
When we need to train on task-specific datasets D_task1 {Input: X_task1, GT: Y_task1} and D_task2 {Input: X_task2, GT: Y_task2} simultaneously, how do we do MTL?

For example, suppose we aim to set up MTL for both salient object detection and depth estimation.
For the salient object detection task, we use saliency labels from the Pascal VOC dataset.
For the depth estimation task, we use depth-map labels from the NYUD dataset.
(The two datasets consist of entirely different input images; Pascal VOC does not contain depth-map labels, and NYUD does not contain saliency labels.)

Under this condition, how do we construct MTL?
Does anyone know about MTL for task-specific datasets, or related work?
