
dctnet's People

Contributors

kaix90


dctnet's Issues

version of jpeg2dct, opencv, libjpeg

One more question: I think it would help if you documented the required library versions, since I've found that libjpeg versions can conflict.

When I install OpenCV 4 by some method other than pip, the default libjpeg version is 9, but jpeg2dct defaults to libjpeg 6.2, which is pretty painful.

Now it seems I have to rebuild my Docker image to try your code... any tips for a quick try? Thank you.
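
One quick sanity check (my own suggestion, not part of the repo; it assumes the Python cv2 bindings are installed) is to print the JPEG-related lines of OpenCV's build configuration to see which libjpeg it links against:

    import cv2

    # cv2.getBuildInformation() returns the full build configuration as text;
    # filter for the JPEG entries to see the bundled/linked libjpeg version.
    for line in cv2.getBuildInformation().splitlines():
        if "JPEG" in line:
            print(line.strip())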

Cifar model not found

Hi, the training code download link contains a script named "cifar.py", but executing it produces the following error:
"ModuleNotFoundError: No module named 'models.cifar'".

Is a module missing? I'd appreciate clarification on this.

pretrained model can't open

Could you please provide the pretrained model again? The model downloaded from the original link appears to be a corrupted archive and cannot be opened. Please help.

Training codes

Hey, thanks for your contribution. I really like the idea. I want to do some experiments and have some questions. (PS: I apologize beforehand if my questions are basic; I am new to deep learning. :) )

  1. I could not find any training code in this repo. Are you planning to share the training code as well? If yes (I hope it's a yes), how soon? If no :( , can you tell me what a training function should look like? (See the sketch after this list.)
  2. I want to test your method on my own dataset. How can I do that? Which functions need changes: do I need to change the network input layers? If I train on my own dataset, do I need to change anything in the pre-processing steps or in the network?
  3. For evaluation, do you think it is feasible to run on a system without a GPU? I tried, but the code seems written for a GPU-based system. If I want to adapt it for a CPU-based system, can you point out the functions I need to change?
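
A minimal sketch of a generic PyTorch training loop, not the authors' actual training code; `model`, `train_loader`, and the hyperparameters are placeholders you would swap for the DCT-domain model and data loader from this repo:

    import torch
    import torch.nn as nn

    def train_one_epoch(model, train_loader, optimizer, device):
        # Standard supervised classification step; nothing DCT-specific here.
        criterion = nn.CrossEntropyLoss()
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()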

why there are 2 parameters in the SE-like block?

Hi,

In the paper,

Then, Tensor 3 is converted to Tensor 4 in Figure 4 of the shape 1 × 1 × C × 2 by multiplying every element in Tensor 3 with two trainable parameters.

I am wondering why you use two parameters to calculate the p of the Bernoulli distribution.

In my view, one parameter will also work.
Could you explain the idea? Thanks~
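
For context, here is an illustrative sketch of a two-logit gate (my own guess at the structure, not the repo's GateModule): the per-channel scalar is multiplied by two trainable weights to give an "on" logit and an "off" logit, and a Gumbel softmax over the pair yields a differentiable Bernoulli sample. With a single parameter, one logit would effectively be pinned at zero; two parameters let both sides of the distribution be learned.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoLogitGate(nn.Module):
        """Illustrative two-parameter-per-channel gate (not the official code)."""
        def __init__(self, channels):
            super().__init__()
            # Two trainable multipliers per channel: one "on" logit, one "off" logit.
            self.weight = nn.Parameter(torch.ones(channels, 2))

        def forward(self, x):                              # x: (N, C, H, W)
            squeezed = x.mean(dim=(2, 3))                  # Tensor 3: (N, C)
            logits = squeezed.unsqueeze(-1) * self.weight  # Tensor 4: (N, C, 2)
            # Gumbel softmax draws a (differentiable) sample from Bernoulli(p).
            gate = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 0]
            return x * gate.unsqueeze(-1).unsqueeze(-1)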

How to generate a mask in frequency domain corresponding to the single image

I notice you mention that "different input images may have different subsets of the frequency channels activated". I am curious about the frequency-domain mask for a single image, rather than the heat maps for the whole dataset, but I didn't find that in the released code. Could you share that part with us? It would be highly appreciated.
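
One way to capture a per-image mask, sketched under the assumption that the gating module is exposed as a submodule (the name `model.gate` is hypothetical, not the repo's actual attribute), is a forward hook that records the binary gate output:

    # Hypothetical sketch: `model.gate` is a placeholder for the gating module.
    masks = []

    def capture(module, inputs, output):
        # Record the per-channel on/off decision for this image.
        masks.append(output.detach().cpu())

    handle = model.gate.register_forward_hook(capture)
    model(single_image_batch)  # one image => one recorded mask
    handle.remove()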

Gain from larger image or from frequency domain?

Hi! Wonderful work!
I have a question.
If we use images of the same resolution as the traditional model (i.e., with conventional spatial downsampling), can we still gain an improvement from learning in the frequency domain?

About the mask-rcnn baseline. 1x or 2x ?

Great job.
It's the first work that explores frequency-domain learning in detection/segmentation.
But I noticed that the box AP of the RGB-based Mask R-CNN baseline is 37.3, which is trained with the 1x LR schedule (lr=0.02 with batch size 16, 12 epochs).

Yours is:
lr = 0.02/8 = 0.0025 with batch size 2,
20 epochs.
The total number of epochs is close to the 2x LR schedule (24 epochs).

Have you tried the same LR schedule (i.e., 1x) with the RGB-based Mask R-CNN?
Would it still gain as much?
Thanks.

DCTNet in CenterNet

It's great work!
I was trying to combine it with CenterNet. I'm not sure whether the preprocessing part in mmdetection is in
[DCTNet/segmentation/mmdet/datasets/pipelines/] or another folder. Is it the same process as in [/classification/datasets/dataset_imagenet_dct.py]?

You said the gain in AP comes from the input image being larger (no resize needed), so if the input image is resized, will the improvement be small?

Last question: it seems DCT only changes the backbone network. If I use it in a one-stage detector, will that have any influence?

Thanks so much for your reply, and I hope you get a good job!

Pretrained Model and codes

The pretrained model can't be found on the Google Cloud drive, and I can't find the pre-processing code. Will you upload them? Also, why do you select channels according to the heat map; did you consider other schemes? Thanks.

Clarification about dataset

In the instructions, you mention that the ImageNet dataset should be present in the 'data' directory. Does this mean I have to download the entire ImageNet dataset to be able to run your script?

For example, what can I do if I want to run inference on a single image with your DCT-24 model, akin to running a trained Resnet-50 for inference on an image?
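
For comparison, single-image inference with a stock torchvision ResNet-50 looks like the sketch below; the DCT-24 model would additionally need the repo's DCT preprocessing in place of these spatial transforms, so treat this only as the spatial-domain analogue:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # Standard ImageNet preprocessing for a spatial-domain ResNet-50.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    model = models.resnet50(pretrained=True).eval()
    img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        pred = model(img).argmax(dim=1)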

jpeg2dct or cv2.dct

This is a wonderful research idea and thought process.
But I have a question: are there many differences between jpeg2dct and blockwise cv2.dct?
I have used both for classification on CIFAR-10, and they give almost the same results.
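
For reference, the two differ in where the coefficients come from: jpeg2dct reads the quantized DCT coefficients directly from the JPEG bitstream (so they reflect the encoder's quantization and chroma subsampling), while cv2.dct recomputes the transform on decoded pixels. A minimal jpeg2dct read looks like this:

    from jpeg2dct.numpy import load

    # One channel per frequency: dct_y has shape (H/8, W/8, 64);
    # dct_cb/dct_cr are smaller when the JPEG uses chroma subsampling.
    dct_y, dct_cb, dct_cr = load("example.jpg")
    print(dct_y.shape, dct_cb.shape, dct_cr.shape)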

Question about the transform in dataloader_imagenet_dct.py

Thanks for your contribution!
There is something I don't understand in the transform part:
it seems that transforms.Resize(input_size1), CenterCrop(input_size2), and Upscale target the original image, while the remaining transformations like TransformUpscaledDCT and ToTensorDCT operate on the images after the DCT. My program cannot perform both at the same time. Has anyone run into the same problem?

Are there obvious differences in the experimental results brought about by different DCT implementations?

Thank you for your amazing work. I have some questions.
In my CIFAR-10 experiments, I use 4×4 blocks and cv2.dct, like this:

    import cv2
    import numpy as np

    def run_DCT(self, signal):
        # Split the 2-D signal into 4x4 patches and apply the DCT to each patch.
        rows = signal.shape[0] // 4
        cols = signal.shape[1] // 4
        patch_matrix = np.zeros((rows, cols, 4, 4))
        for r in range(rows):
            for c in range(cols):
                # cv2.dct needs floating-point input; scale pixels to [0, 1].
                patch = cv2.dct(signal[r*4 : (r+1)*4, c*4 : (c+1)*4] / 255.)
                patch_matrix[r, c] = patch
        # Group the 16 frequencies of each block into the channel dimension.
        return patch_matrix.reshape(rows, cols, 4*4)

but the performance is worse than in the spatial domain. Did you only try ImageNet, or did you try other classification datasets as well?
Or is there a problem with my data preprocessing?

Ask for Train Dataloader Composer

Hello @calmevtime,

Thank you for sharing this code. I am very interested in the effect of the DCT on classification and would like to run some experiments based on your code. However, I am not quite sure whether you do any special data preprocessing before training. Would you share the "composer" part you use?

BTW, I found that in "dataloader_imagenet_dct.py" you have both "transform4" and "transform5" in the unit-test part. What's the difference between them?

Thank you.

Dataset Version Used

I have a question about the dataset used in this project. From several sites, including Kaggle, I see the ImageNet ILSVRC2012 dataset, which is around ~150 GB and has 1,000 classes. But the ImageNet site says it contains ~21,000 classes. So was the training in this paper done on the entire ~21,000-class ImageNet or on the smaller 1,000-class dataset?

Also, do you think running your resnet_upscaled_static.sh script will give similar results if I feed it the 1,000-class dataset?

Also awesome paper btw!

Some differences between the code and paper

I'm confused about some differences between your code and paper. In Section 4.2 of the paper, you state that you "pick the top 24 high-probability channels". I would expect the high-probability channels to follow the heat map in Fig. 5. However, in your code you pick channels [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27], which form a square and don't follow that distribution. I would appreciate it if you could help me out.

Where can I find the grouping code?

Nice work!
I'd like to know: in the paper, you say that

we group the components of the same frequency in all the 8×8 blocks into one channel, maintaining their spatial relations at each
frequency.

Where can I find the corresponding code?
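
For illustration only (this is my sketch, not the repo's block_composition), the grouping described above is essentially a reshape plus transpose over a plane of blockwise DCT coefficients:

    import numpy as np

    def group_frequencies(dct_plane, block=8):
        # dct_plane: (H, W) array of blockwise DCT coefficients.
        # Returns (H/block, W/block, block*block): one channel per frequency,
        # preserving each frequency's spatial relations across blocks.
        h, w = dct_plane.shape
        blocks = dct_plane.reshape(h // block, block, w // block, block)
        return blocks.transpose(0, 2, 1, 3).reshape(h // block, w // block, -1)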

Object detection test.py throws an error

Hey, thanks for your work again. I was trying your object detection code on the COCO dataset and hoping to run the evaluation, but when I ran test.py it gave me this error:

    >>> from mmdet.apis import init_dist
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sarfaraz/DCTNet/segmentation/mmdet/__init__.py", line 1, in <module>
        from .version import __version__, short_version
    ModuleNotFoundError: No module named 'mmdet.version'
    >>> import mmdet
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sarfaraz/DCTNet/segmentation/mmdet/__init__.py", line 1, in <module>
        from .version import __version__, short_version
    ModuleNotFoundError: No module named 'mmdet.version'

Can you point out what's going wrong here?

Should I merge the original mmdetection with the segmentation part?

@calmevtime, nice idea. I am interested in it.

I have tried many ways to build and run the segmentation part, as follows:

1. Installed and tested the original mmdetection successfully.
2. Tried to cd into segmentation and run "python setup.py develop": failed.

---------------The most stupid part----------------------------------

3. So I merged the segmentation folder with the original mmdetection folder and ran "python setup.py develop": failed.

Why can I test official mmdetection successfully but not this project?
Could you give me some advice on how to run test.py?

no libturbojpeg.so file, only the libturbojpeg.dylib file?

When installing libjpeg-turbo, I set a custom install directory because I couldn't write the built dynamic libraries to -DCMAKE_INSTALL_PREFIX=/usr and -DCMAKE_INSTALL_DOCDIR=/usr/share/doc/libjpeg-turbo-2.0.3.
However, after installing there is no libturbojpeg.so file under lib, only a libturbojpeg.dylib file.
Should the TurboJPEG class load libturbojpeg.dylib?

Looking forward to your reply.
Thank you very much.

Do you transform the images from BGR to YCbCr color space?

Hi! Do you transform the images from BGR to YCbCr color space before converting them to the frequency domain?
As you have mentioned in your paper, “Then images are transformed to the YCbCr color space and converted to the frequency domain (DCT transform in Figure 2)."
But I cannot find any operation that transforms the image from BGR to YCbCr in your open-source code, and you feed the BGR images to the function "transform_dct" directly.
I'm not sure if I missed any details. I hope you can help me.
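
For reference, the conversion itself is a one-liner in OpenCV; note that OpenCV's constant orders the chroma channels as Cr then Cb, so a channel swap may be needed to get Y, Cb, Cr:

    import cv2

    img_bgr = cv2.imread("example.jpg")                      # OpenCV decodes to BGR
    img_ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)   # channels: Y, Cr, Cb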

About mean and std in transform

Thanks for your great work! I have a question: how do you obtain normalization parameters like 'train_y_mean' and 'train_y_std'?
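
A common approach, sketched here as an assumption rather than the authors' actual procedure, is a streaming per-channel mean/variance over the training set's DCT coefficients; `iter_train_dct_y()` is a placeholder for however you iterate over (H/8, W/8, 64) Y-component arrays:

    import numpy as np

    count, total, total_sq = 0, np.zeros(64), np.zeros(64)
    for dct_y in iter_train_dct_y():       # placeholder iterator over training images
        flat = dct_y.reshape(-1, 64)       # one row per 8x8 block, one column per frequency
        count += flat.shape[0]
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)

    train_y_mean = total / count
    train_y_std = np.sqrt(total_sq / count - train_y_mean ** 2)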

look for "block_composition"

Hello!
I found that your code contains "from datasets.dct_resize.block_composition import block_composition", but I can't find "dct_resize" in datasets.

GateModule192

Hi, I'm interested in GateModule192, but I can't find the code that defines it. Am I missing something?

thanks.

Questions about the dataloader and missing files

Hello @calmevtime, thank you for publishing the code. When I tried to reproduce your results, the dataloader would not run properly, and some important functions are missing. For example, a "dct_resize.py" file is imported in almost all the files in the datasets folder, but "dct_resize.py" is nowhere to be found. Also, the "block_composition" function is indispensable for upsampleDCT, but it is missing along with "dct_resize.py".

I would be very grateful if you could upload the "dct_resize.py" file with the "block_composition" function. Looking forward to hearing from you.

The mobilenetv2 models cannot be evaluated with your pretrained weights

Hi! Thank you for your great work!
I have evaluated the ResNetDCT_Upscaled_Static with your pretrained parameters successfully.
But I cannot evaluate "mobilenetv2dct_upscaled_subset" with your pretrained parameters (mobilenetv2dct_upscaled_static_24/32), because the parameters do not match the model you define.
In fact, none of the defined models match your pretrained parameters.
Did I miss something? I'm looking forward to your reply!

RuntimeError: Error(s) in loading state_dict for MobileNetV2DCT_Upscaled_Subset:
Missing key(s) in state_dict: "upconv_y.0.weight", "upconv_y.1.weight", "upconv_y.1.bias", "upconv_y.1.running_mean", "upconv_y.1.running_var", "upconv_cb.0.weight", "upconv_cb.2.weight", "upconv_cb.2.bias", "upconv_cb.2.running_mean", "upconv_cb.2.running_var", "upconv_cr.0.weight", "upconv_cr.2.weight", "upconv_cr.2.bias", "upconv_cr.2.running_mean", "upconv_cr.2.running_var".
size mismatch for features.0.conv.0.weight: copying a param with shape torch.Size([24, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 1, 3, 3]).
size mismatch for features.0.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.3.weight: copying a param with shape torch.Size([16, 24, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 32, 1, 1]).

About the channel selection operation, and upsampling in the later experiments

Hi,

(1) For the channel selection in the paper, are the three YCbCr components fed separately into the Figure 4 module for selection and then concatenated (as Figure 2 shows), or are they first concatenated into a 192-channel feature map that is fed into the Figure 4 module as a whole (as the later experimental details describe)? Could you clarify which one it is?
(2) Regarding upsampling for the segmentation task: since the network input is of size W/8 × H/8, do you upsample later to restore the original resolution, or use a reshape? I couldn't find a description of this in the paper.

Looking forward to your answer, and many thanks!!! (The paper's idea is very interesting.)

Image resized to 56 x 56 when pre-processed

Hi again, and thanks for your work. I am really interested in it and trying to understand what is actually going on.
I have a question: can you please explain why every image is reduced to 56 × 56 after passing through the whole pre-processing pipeline?
