This project forked from zhaozixiang1228/gdsr-dctnet

[CVPR 2022 Oral] Official implementation for "Discrete Cosine Transform Network for Guided Depth Map Super-Resolution."

DCTNet

Code for Discrete Cosine Transform Network for Guided Depth Map Super-Resolution (CVPR 2022 Oral)

Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin and Hanspeter Pfister.

- [Paper]
- [ArXiv]
- [Supplementary Materials]

Citation

@InProceedings{Zhao_2022_CVPR,
    author    = {Zhao, Zixiang and Zhang, Jiangshe and Xu, Shuang and Lin, Zudi and Pfister, Hanspeter},
    title     = {Discrete Cosine Transform Network for Guided Depth Map Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {5697-5707}
}

Abstract

Guided depth super-resolution (GDSR) is an essential topic in multi-modal image processing, which reconstructs high-resolution (HR) depth maps from low-resolution ones collected with suboptimal conditions with the help of HR RGB images of the same scene. To solve the challenges in interpreting the working mechanism, extracting cross-modal features and RGB texture over-transferred, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate the problems from three aspects. First, the Discrete Cosine Transform (DCT) module reconstructs the multi-channel HR depth features by using DCT to solve the channel-wise optimization problem derived from the image domain. Second, we introduce a semi-coupled feature extraction module that uses shared convolutional kernels to extract common information and private kernels to extract modality-specific information. Third, we employ an edge attention mechanism to highlight the contours informative for guided upsampling. Extensive quantitative and qualitative evaluations demonstrate the effectiveness of our DCTNet, which outperforms previous state-of-the-art methods with a relatively small number of parameters.

Usage

Network Architecture

Our DCTNet is implemented in model.py.

Training

Pretrained models are available at './models/DCTNet_4X.model', './models/DCTNet_8X.model', './models/DCTNet_16X.model' and './models/DCTNet_RealScene.model'. The first three handle upsampling factors of 4, 8, and 16, and the last handles the RGBDD real-world branch task. We train the network on NYU v2 (1000 image pairs). During training, all images are resized to 256x256.
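
The mapping from task to checkpoint file can be written as a small lookup. This is only a sketch: the paths are the ones listed above, but the helper name `checkpoint_for` and the `"real"` key are hypothetical and not part of the repository, and the sketch does not load the model itself.

```python
from pathlib import Path

# Checkpoint paths as listed in this README; the dictionary keys are illustrative.
CHECKPOINTS = {
    4: "./models/DCTNet_4X.model",
    8: "./models/DCTNet_8X.model",
    16: "./models/DCTNet_16X.model",
    "real": "./models/DCTNet_RealScene.model",
}


def checkpoint_for(task):
    """Return the checkpoint path for an upsampling factor (4, 8, 16) or "real"."""
    try:
        return Path(CHECKPOINTS[task])
    except KeyError:
        raise ValueError(f"no pretrained model for task {task!r}") from None


if __name__ == "__main__":
    # Report which checkpoints are present relative to the current directory.
    for task in CHECKPOINTS:
        path = checkpoint_for(task)
        status = "found" if path.exists() else "missing"
        print(f"{task}: {path} ({status})")
```

Running this from the repository root should show quickly whether all four checkpoints were downloaded with the clone.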

If you want to retrain the network, download the original dataset from NYU V2, then apply the same preprocessing as DKN and FDSR to obtain a training set such as ./data/NYU_Train_imgsize_256_scale_4.h5 (this file is 10+GB, so we cannot upload it). Then run 'train.py'.

Testing

The test images used in the paper are stored in './RawDatasets/Middlebury', './RawDatasets/NYUDepthv2_Test', './RawDatasets/Lu', './RawDatasets/RGBDD' and './RawDatasets/RGBDD_Test_Realscene'.

The test datasets can be downloaded from NYU v2, Middlebury, Lu and RGBDD.

Unfortunately, since the NYU v2 dataset is 600+MB and the RGBDD real-world branch is 100+MB, we upload only three image pairs from each of these two datasets to demonstrate that our code runs correctly. The other datasets contain all the test images.

To run inference with our DCTNet and obtain the RMSE results reported in our paper, first run 'processing_testsets.py' to generate the processed test sets in './DatasetsAfterProcessing/'. Then run 'test.py' to test our method.
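
The two steps can be chained from a short driver script. The sketch below is an assumption-laden convenience, not part of the repository: the script names come from this README, but the helper names `build_commands` and `run_all` are hypothetical, and it assumes both scripts are run from the repository root without arguments.

```python
import subprocess
import sys

# Script names come from this README; running them from the repository root
# with no arguments is an assumption.
STEPS = ["processing_testsets.py", "test.py"]


def build_commands(python=sys.executable):
    """Return one command per step, using the given Python interpreter."""
    return [[python, script] for script in STEPS]


def run_all():
    """Run each step in order and stop at the first failure."""
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so test.py is never run on a half-processed test set.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root runs preprocessing first and testing second; if preprocessing fails, the exception stops the sequence.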

If you use the complete test datasets, the testing results will be printed in the terminal:

==============================================
The testing RMSE results of Middlebury Dataset
     X4         X8         X16
----------------------------------------------
[1.09937036 2.04951119 4.19195414]
==============================================
==============================================
The testing RMSE results of NYU V2 Dataset    
     X4         X8         X16
----------------------------------------------
[1.59155273 3.16303039 5.84125805]
==============================================
==============================================
The testing RMSE results of Lu Dataset        
     X4         X8         X16
----------------------------------------------
[0.88223213 1.84769642 4.38759089]
==============================================
==============================================
The testing RMSE results of RGBDD Dataset
     X4         X8         X16
----------------------------------------------
[1.07670105 1.73648119 3.04929352]
==============================================
==============================================
The testing RMSE results in RealScene RGBDD
DCTNet in real-world branch
----------------------------------------------
tensor([7.3676])
==============================================
==============================================
The testing RMSE results in RealScene RGBDD
DCTNet* in real-world branch
----------------------------------------------
tensor([5.4326])
==============================================

The above output reproduces the results of DCTNet in Tab. 2 and Tab. 3 of our paper. The first four blocks correspond to the four test sets in Tab. 2, and the last two blocks show the RMSE values of DCTNet and DCTNet* in Tab. 3.
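
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and ground-truth depth values. The following is a minimal pure-Python sketch of that definition only; the function name `rmse` is illustrative, and any rescaling, unit conversion, or border cropping that 'test.py' may apply before computing the reported numbers is not reproduced here.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized 2D depth maps."""
    if len(pred) != len(target):
        raise ValueError("depth maps must have the same number of rows")
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        if len(row_p) != len(row_t):
            raise ValueError("depth maps must have the same row length")
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("depth maps are empty")
    return math.sqrt(total / count)


if __name__ == "__main__":
    predicted = [[1.0, 2.0], [3.0, 4.0]]
    ground_truth = [[1.0, 2.0], [3.0, 6.0]]
    # Squared errors are 0, 0, 0, 4; their mean is 1, so the RMSE is 1.0.
    print(rmse(predicted, ground_truth))
```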
