
dctnet's People

Contributors

kaix90


dctnet's Issues

version of jpeg2dct, opencv, libjpeg

One more question: I think it would help if you documented the required library versions, since I've found that libjpeg versions can conflict.

When I install OpenCV 4 by some method other than pip, the default libjpeg version is 9, but jpeg2dct defaults to libjpeg 6.2, which is pretty painful.

Now it seems I have to rebuild my Docker image to try your code... any tips for a quick try? Thank you.
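
One quick sanity check (my own suggestion, not part of the repo; it assumes the Python cv2 bindings are installed) is to print the JPEG-related lines of OpenCV's build configuration to see which libjpeg it links against:

    import cv2

    # cv2.getBuildInformation() returns the full build configuration as text;
    # filter for the JPEG entries to see the bundled/linked libjpeg version.
    for line in cv2.getBuildInformation().splitlines():
        if "JPEG" in line:
            print(line.strip())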

Cifar model not found

Hi, the training code download link contains a script named "cifar.py", but executing it produces the following error:
"ModuleNotFoundError: No module named 'models.cifar'".

Is a module missing? I'd appreciate clarification on this.

pretrained model can't open

Could you please provide the pretrained model again? The model downloaded from the original link appears to be a corrupted archive and cannot be opened. Please help.

Training codes

Hey, thanks for your contribution. I really like the idea. I want to do some experiments and have some questions. (PS: I apologize beforehand if my questions are basic; I am new to deep learning. :) )

  1. I could not find any training code in this repo. Are you planning to share the training code as well? If yes (I hope it's a yes), how soon? If no :( , can you tell me what a training function should look like? (See the sketch after this list.)
  2. I want to test your method on my own dataset. How can I do that? Which functions need changes: do I need to change the network input layers? If I train on my own dataset, do I need to change anything in the pre-processing steps or in the network?
  3. For evaluation, do you think it is feasible to run on a system without a GPU? I tried, but the code seems written for a GPU-based system. If I want to adapt it for a CPU-based system, can you point out the functions I need to change?
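
A minimal sketch of a generic PyTorch training loop, not the authors' actual training code; `model`, `train_loader`, and the hyperparameters are placeholders you would swap for the DCT-domain model and data loader from this repo:

    import torch
    import torch.nn as nn

    def train_one_epoch(model, train_loader, optimizer, device):
        # Standard supervised classification step; nothing DCT-specific here.
        criterion = nn.CrossEntropyLoss()
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()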

why there are 2 parameters in the SE-like block?

Hi,

In the paper,

Then, Tensor 3 is converted to Tensor 4 in Figure 4 of the shape 1 × 1 × C × 2 by multiplying every element in Tensor 3 with two trainable parameters.

I am wondering why you use two parameters to calculate the p of the Bernoulli distribution.

In my view, one parameter will also work.
Could you explain the idea? Thanks~
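
For context, here is an illustrative sketch of a two-logit gate (my own guess at the structure, not the repo's GateModule): the per-channel scalar is multiplied by two trainable weights to give an "on" logit and an "off" logit, and a Gumbel softmax over the pair yields a differentiable Bernoulli sample. With a single parameter, one logit would effectively be pinned at zero; two parameters let both sides of the distribution be learned.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoLogitGate(nn.Module):
        """Illustrative two-parameter-per-channel gate (not the official code)."""
        def __init__(self, channels):
            super().__init__()
            # Two trainable multipliers per channel: one "on" logit, one "off" logit.
            self.weight = nn.Parameter(torch.ones(channels, 2))

        def forward(self, x):                              # x: (N, C, H, W)
            squeezed = x.mean(dim=(2, 3))                  # Tensor 3: (N, C)
            logits = squeezed.unsqueeze(-1) * self.weight  # Tensor 4: (N, C, 2)
            # Gumbel softmax draws a (differentiable) sample from Bernoulli(p).
            gate = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 0]
            return x * gate.unsqueeze(-1).unsqueeze(-1)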

How to generate a mask in frequency domain corresponding to the single image

I notice you mention that "different input images may have different subsets of the frequency channels activated". I am curious about the frequency-domain mask for a single image, rather than the heat maps for the whole dataset, but I didn't find that in the released code. Could you share that part with us? It would be highly appreciated.
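
One way to capture a per-image mask, sketched under the assumption that the gating module is exposed as a submodule (the name `model.gate` is hypothetical, not the repo's actual attribute), is a forward hook that records the binary gate output:

    # Hypothetical sketch: `model.gate` is a placeholder for the gating module.
    masks = []

    def capture(module, inputs, output):
        # Record the per-channel on/off decision for this image.
        masks.append(output.detach().cpu())

    handle = model.gate.register_forward_hook(capture)
    model(single_image_batch)  # one image => one recorded mask
    handle.remove()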

Gain from larger image or from frequency domain?

Hi! Wonderful work!
I have a question.
If we use images of the same resolution as the traditional model (i.e., with conventional spatial downsampling), can we still gain an improvement from learning in the frequency domain?

About the mask-rcnn baseline. 1x or 2x ?

Great job.
It's the first work that explores frequency-domain learning in detection/segmentation.
But I noticed that the box AP of the RGB-based Mask R-CNN baseline is 37.3, which is trained with the 1x LR schedule (lr=0.02 with batch size 16, 12 epochs).

Yours is:
lr = 0.02/8 = 0.0025 with batch size 2,
20 epochs.
The total number of epochs is close to the 2x LR schedule (24 epochs).

Have you tried the same LR schedule (i.e., 1x) with the RGB-based Mask R-CNN?
Would it still gain as much?
Thanks.

DCTNet in CenterNet

It's great work!
I was trying to combine it with CenterNet. I'm not sure whether the preprocessing part in mmdetection is in
[DCTNet/segmentation/mmdet/datasets/pipelines/] or another folder. Is it the same process as in [/classification/datasets/dataset_imagenet_dct.py]?

You said the gain in AP comes from the input image being larger (no resize needed), so if the input image is resized, will the improvement be small?

Last question: it seems DCT only changes the backbone network. If I use it in a one-stage detector, will that have any influence?

Thanks so much for your reply, and I hope you get a good job!

Pretrained Model and codes

The pretrained model can't be found on the Google Cloud drive, and I can't find the pre-processing code. Will you upload them? Also, why do you select channels according to the heat map; did you consider other schemes? Thanks.

Clarification about dataset

In the instructions, you mention that the ImageNet dataset should be present in the 'data' directory. Does this mean I have to download the entire ImageNet dataset to be able to run your script?

For example, what can I do if I want to run inference on a single image with your DCT-24 model, akin to running a trained Resnet-50 for inference on an image?
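
For comparison, single-image inference with a stock torchvision ResNet-50 looks like the sketch below; the DCT-24 model would additionally need the repo's DCT preprocessing in place of these spatial transforms, so treat this only as the spatial-domain analogue:

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # Standard ImageNet preprocessing for a spatial-domain ResNet-50.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    model = models.resnet50(pretrained=True).eval()
    img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        pred = model(img).argmax(dim=1)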

jpeg2dct or cv2.dct

This is a wonderful research idea and thought process.
But I have a question: are there many differences between jpeg2dct and blockwise cv2.dct?
I have used both for classification on CIFAR-10, and they give almost the same results.
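
For reference, the two differ in where the coefficients come from: jpeg2dct reads the quantized DCT coefficients directly from the JPEG bitstream (so they reflect the encoder's quantization and chroma subsampling), while cv2.dct recomputes the transform on decoded pixels. A minimal jpeg2dct read looks like this:

    from jpeg2dct.numpy import load

    # One channel per frequency: dct_y has shape (H/8, W/8, 64);
    # dct_cb/dct_cr are smaller when the JPEG uses chroma subsampling.
    dct_y, dct_cb, dct_cr = load("example.jpg")
    print(dct_y.shape, dct_cb.shape, dct_cr.shape)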

Question about the transform in dataloader_imagenet_dct.py

Thanks for your contribution!
There is something I don't understand in the transform part:
it seems that transforms.Resize(input_size1), CenterCrop(input_size2), and Upscale target the original image, while the remaining transformations like TransformUpscaledDCT and ToTensorDCT operate on the images after the DCT. My program cannot perform both at the same time. Has anyone run into the same problem?

Are there obvious differences in the experimental results brought about by different DCT implementations?

Thank you for your amazing work. I have some questions.
In my CIFAR-10 experiments, I use 4×4 blocks and cv2.dct, like this:

    import cv2
    import numpy as np

    def run_DCT(self, signal):
        # Split the 2-D signal into 4x4 patches and apply the DCT to each patch.
        rows = signal.shape[0] // 4
        cols = signal.shape[1] // 4
        patch_matrix = np.zeros((rows, cols, 4, 4))
        for r in range(rows):
            for c in range(cols):
                # cv2.dct needs floating-point input; scale pixels to [0, 1].
                patch = cv2.dct(signal[r*4 : (r+1)*4, c*4 : (c+1)*4] / 255.)
                patch_matrix[r, c] = patch
        # Group the 16 frequencies of each block into the channel dimension.
        return patch_matrix.reshape(rows, cols, 4*4)

but the performance is worse than in the spatial domain. Did you only try ImageNet, or did you try other classification datasets as well?
Or is there a problem with my data preprocessing?

Ask for Train Dataloader Composer

Hello @calmevtime,

Thank you for sharing this code. I am very interested in the effect of the DCT on classification and would like to run some experiments based on your code. However, I am not quite sure whether you do any special data preprocessing before training. Would you share the "composer" part you use?

BTW, I found that in "dataloader_imagenet_dct.py" you have both "transform4" and "transform5" in the unit-test part. What's the difference between them?

Thank you.

Dataset Version Used

I have a question about the dataset used in this project. From several sites, including Kaggle, I see the ImageNet ILSVRC2012 dataset, which is around ~150 GB and has 1,000 classes. But the ImageNet site says it contains ~21,000 classes. So was the training in this paper done on the entire ~21,000-class ImageNet or on the smaller 1,000-class dataset?

Also, do you think running your resnet_upscaled_static.sh script will give similar results if I feed it the 1,000-class dataset?

Also awesome paper btw!

Some differences between the code and paper

I'm confused about some differences between your code and paper. In Section 4.2 of the paper, you state that you "pick the top 24 high-probability channels". I would expect the high-probability channels to follow the heat map in Fig. 5. However, in your code you pick channels [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27], which form a square and don't follow that distribution. I would appreciate it if you could help me out.

Where can I find the grouping code?

Nice work!
I'd like to know: in the paper, you say that

we group the components of the same frequency in all the 8×8 blocks into one channel, maintaining their spatial relations at each
frequency.

Where can I find the corresponding code?
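
For illustration only (this is my sketch, not the repo's block_composition), the grouping described above is essentially a reshape plus transpose over a plane of blockwise DCT coefficients:

    import numpy as np

    def group_frequencies(dct_plane, block=8):
        # dct_plane: (H, W) array of blockwise DCT coefficients.
        # Returns (H/block, W/block, block*block): one channel per frequency,
        # preserving each frequency's spatial relations across blocks.
        h, w = dct_plane.shape
        blocks = dct_plane.reshape(h // block, block, w // block, block)
        return blocks.transpose(0, 2, 1, 3).reshape(h // block, w // block, -1)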

Object detection test.py throws an error

Hey, thanks for your work again. I was trying your object detection code on the COCO dataset and hoping to run the evaluation, but when I ran test.py it gave me this error:

    >>> from mmdet.apis import init_dist
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sarfaraz/DCTNet/segmentation/mmdet/__init__.py", line 1, in <module>
        from .version import __version__, short_version
    ModuleNotFoundError: No module named 'mmdet.version'
    >>> import mmdet
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sarfaraz/DCTNet/segmentation/mmdet/__init__.py", line 1, in <module>
        from .version import __version__, short_version
    ModuleNotFoundError: No module named 'mmdet.version'

Can you point out what's going wrong here?

Should I merge the original mmdetection with the segmentation part?

@calmevtime, nice idea. I am interested in it.

I have tried many ways to build and run the segmentation part, as follows:

1. Installed and tested the original mmdetection successfully.
2. Tried to cd into segmentation and run "python setup.py develop": failed.

---------------The most stupid part----------------------------------

3. So I merged the segmentation folder with the original mmdetection folder and ran "python setup.py develop": failed.

Why can I test official mmdetection successfully but not this project?
Could you give me some advice on how to run test.py?

no libturbojpeg.so file, only the libturbojpeg.dylib file?

When installing libjpeg-turbo, I set a custom install directory because I couldn't write the built dynamic libraries to -DCMAKE_INSTALL_PREFIX=/usr and -DCMAKE_INSTALL_DOCDIR=/usr/share/doc/libjpeg-turbo-2.0.3.
However, after installing there is no libturbojpeg.so file under lib, only a libturbojpeg.dylib file.
Should the TurboJPEG class load libturbojpeg.dylib?

Looking forward to your reply.
Thank you very much.

Do you transform the images from BGR to YCbCr color space?

Hi! Do you transform the images from BGR to YCbCr color space before converting them to the frequency domain?
As you have mentioned in your paper, “Then images are transformed to the YCbCr color space and converted to the frequency domain (DCT transform in Figure 2)."
But I cannot find any operation that transforms the image from BGR to YCbCr in your open-source code, and you feed the BGR images to the function "transform_dct" directly.
I'm not sure if I missed any details. I hope you can help me.
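
For reference, the conversion itself is a one-liner in OpenCV; note that OpenCV's constant orders the chroma channels as Cr then Cb, so a channel swap may be needed to get Y, Cb, Cr:

    import cv2

    img_bgr = cv2.imread("example.jpg")                      # OpenCV decodes to BGR
    img_ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)   # channels: Y, Cr, Cb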

About mean and std in transform

Thanks for your great work! I have a question: how do you obtain normalization parameters like 'train_y_mean' and 'train_y_std'?
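
A common approach, sketched here as an assumption rather than the authors' actual procedure, is a streaming per-channel mean/variance over the training set's DCT coefficients; `iter_train_dct_y()` is a placeholder for however you iterate over (H/8, W/8, 64) Y-component arrays:

    import numpy as np

    count, total, total_sq = 0, np.zeros(64), np.zeros(64)
    for dct_y in iter_train_dct_y():       # placeholder iterator over training images
        flat = dct_y.reshape(-1, 64)       # one row per 8x8 block, one column per frequency
        count += flat.shape[0]
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)

    train_y_mean = total / count
    train_y_std = np.sqrt(total_sq / count - train_y_mean ** 2)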

look for "block_composition"

Hello!
I found that your code contains "from datasets.dct_resize.block_composition import block_composition", but I can't find "dct_resize" in datasets.

GateModule192

Hi, I'm interested in GateModule192, but I can't find the code that defines it. Am I missing something?

thanks.

Questions about the dataloader and missing files

Hello @calmevtime, thank you for publishing the code. When I tried to reproduce your results, the dataloader would not run properly, and some important functions are missing. For example, a "dct_resize.py" file is imported in almost all the files in the datasets folder, but "dct_resize.py" is nowhere to be found. Also, the "block_composition" function is indispensable for upsampleDCT, but it is missing along with "dct_resize.py".

I would be very grateful if you could upload the "dct_resize.py" file with the "block_composition" function. Looking forward to hearing from you.

The mobilenetv2 models cannot be evaluated with your pretrained weights

Hi! Thank you for your great work!
I have evaluated the ResNetDCT_Upscaled_Static with your pretrained parameters successfully.
But I cannot evaluate "mobilenetv2dct_upscaled_subset" with your pretrained parameters (mobilenetv2dct_upscaled_static_24/32), because the parameters do not match the model you define.
In fact, none of the defined models match your pretrained parameters.
Did I miss something? I'm looking forward to your reply!

RuntimeError: Error(s) in loading state_dict for MobileNetV2DCT_Upscaled_Subset:
Missing key(s) in state_dict: "upconv_y.0.weight", "upconv_y.1.weight", "upconv_y.1.bias", "upconv_y.1.running_mean", "upconv_y.1.running_var", "upconv_cb.0.weight", "upconv_cb.2.weight", "upconv_cb.2.bias", "upconv_cb.2.running_mean", "upconv_cb.2.running_var", "upconv_cr.0.weight", "upconv_cr.2.weight", "upconv_cr.2.bias", "upconv_cr.2.running_mean", "upconv_cr.2.running_var".
size mismatch for features.0.conv.0.weight: copying a param with shape torch.Size([24, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 1, 3, 3]).
size mismatch for features.0.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for features.0.conv.3.weight: copying a param with shape torch.Size([16, 24, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 32, 1, 1]).

About the channel selection operation, and upsampling in the later experiments

Hi,

(1) For the channel selection in the paper, are the three YCbCr components fed separately into the Figure 4 module for selection and then concatenated (as Figure 2 shows), or are they first concatenated into a 192-channel feature map that is fed into the Figure 4 module as a whole (as the later experimental details describe)? Could you clarify which one it is?
(2) Regarding upsampling for the segmentation task: since the network input is of size W/8 × H/8, do you upsample later to restore the original resolution, or use a reshape? I couldn't find a description of this in the paper.

Looking forward to your answer, and many thanks!!! (The paper's idea is very interesting.)

Image resized to 56 x 56 when pre-processed

Hi again, and thanks for your work. I am really interested in it and trying to understand what is actually going on.
I have a question: can you please explain why every image is reduced to 56 × 56 after passing through the whole pre-processing pipeline?
