
gudovskiy / cflow-ad

221 stars · 6 watchers · 56 forks · 835 KB

Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

License: BSD 3-Clause "New" or "Revised" License

Language: Python 100.00%
Topics: unsupervised anomaly detection, mvtec-ad, mvtec, normalizing-flows

cflow-ad's People

Contributors

gudovskiy · gudovskiyd · tekno-h


cflow-ad's Issues

Detection AUROC and segmentation AUROC behavior

Hello!

I wonder whether the detection AUROC and segmentation AUROC behave the same way. In other words, when the model achieves its best seg_auroc, does it also achieve its best det_auroc, or can the best det_auroc and the best seg_auroc occur in different epochs?

P.S.: congrats on this work.

Custom datasets

Hello, I am using a custom dataset. Because my dataset has a transparency (alpha) channel, the images do not match the expected input. May I ask how to solve this problem?
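
A common fix, assuming the transparency means 4-channel RGBA inputs (a sketch, not the repository's own loader):

from PIL import Image

def load_rgb(path):
    # Composite the alpha channel away so the image has 3 channels (RGB),
    # matching the 3-channel input the pretrained encoder expects.
    return Image.open(path).convert('RGB')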

U-L2?

Hello Gudovskiy,

I wonder if you could explain this a bit more.

[screenshot of the paper excerpt being asked about]

I do not fully understand this part of the paper, thanks

RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient

Hi!
I faced the RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient when tracing the encoder (-enc mobilenet_v3_large) from the cflow-ad network with the torch.jit.trace() function.
I fed a simplified version of the test_meta_epoch function and a single image (as a tensor of shape torch.Size([1, 3, 512, 512])) into the trace function.
The script fails on the line with _ = c.encoder(image).

Could you please explain what I should change in the code to resolve this?

Traceback (most recent call last):
  File "convert_cflow-ad.py", line 187, in <module>
    traced_model = torch.jit.trace(test_model, image)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/jit/_trace.py", line 780, in trace
    traced = torch._C._create_function_from_trace(
  File "convert_cflow-ad.py", line 72, in test_model
    _ = c.encoder(image)  # BxCxHxW
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/user/Projects/project/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
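
A common workaround for this error, offered as an assumption rather than a confirmed fix: when a traced function closes over a module whose parameters still require gradients, tracing captures those parameters as constants and fails with exactly this message. Freezing the encoder and tracing the module directly often resolves it:

import torch

# Freeze the encoder so no grad-requiring Tensor is captured as a constant.
c.encoder.eval()
for p in c.encoder.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    example = torch.rand(1, 3, 512, 512)
    traced = torch.jit.trace(c.encoder, example)  # trace the module, not a wrapper function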

Training speed is very slow

Hi, training is slow and GPU memory usage is always about 9.2 GB whether I use image size 256×256, 512×512, or 1024×1024. Why is that? Thank you very much!

What is the B value when extracting the feature map?

Hello, sir

Thanks for your wonderful repos and clearly explained paper.

But one point I have trouble understanding is this line:

B, C, H, W = e.size()

What is the extracted B value? I assumed it was the batch dimension when calling the feature extractor, but after inspecting the code, train_meta_epoch already loops over the train_loader iterator --> we only extract feature maps from a single image at a time.

Doesn't this mean that the B value here always equals 1? And if it is always 1, why do we take it into account in later computations (like E = B*S, fiber processing, ...)?
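
For what it's worth, a quick sanity check along these lines (a hypothetical snippet; it assumes the repo's activation hook dict and a loader that yields image batches) would show whether B follows the loader's batch size:

# Hypothetical check: B is the batch dimension of the pooled feature map,
# so it should equal whatever batch size the DataLoader yields.
image, _, _ = next(iter(train_loader))     # image: [batch, 3, H, W]
_ = encoder(image.to(c.device))            # forward pass fills `activation`
e = activation[pool_layers[0]]
B, C, H, W = e.size()
assert B == image.size(0)                  # B tracks the loader, not a fixed 1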

IoU Evaluation

Thanks for sharing the implementations!

Just a quick question: I noticed that when evaluating the mIoU, all the numbers obtained from normal test samples are skipped. Is this common practice?

Thanks.

Why are the dataloader and training so slow?

Hello, I am running the command below.

python3 main.py --gpu 0 --pro -inp 512 --dataset mvtec --class-name bottle

Can you tell me why it takes more time to train than existing anomaly detection algorithms, and why constructing the dataloaders also takes so long? Thank you for sharing the code.

Improper normalization of the scores?

In train.py, you normalize the scores according to:

test_map = [list() for p in pool_layers]
for l, p in enumerate(pool_layers):
    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
    test_mask = test_prob.reshape(-1, height[l], width[l])
    # upsample
    test_map[l] = F.interpolate(test_mask.unsqueeze(1),
        size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
# score aggregation
score_map = np.zeros_like(test_map[0])
for l, p in enumerate(pool_layers):
    score_map += test_map[l]

This normalization is fine as long as it is done for only one map, since the normalization function is monotonically increasing. But when adding up the maps from the different layers (last line), it makes no sense to me: the relative weighting of the score maps in the aggregation then depends on the test set, or more precisely on the maxima of the individual maps over the test set. Am I missing something here, or is this normalization improper?
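
For context, one test-set-independent alternative (a sketch only, not the repository's method; train_mean[l] and train_std[l] are hypothetical statistics precomputed on the training set):

# Standardize each layer's map with train-set statistics so the aggregation
# weights do not depend on the test set's maxima.
score_map = np.zeros_like(test_map[0])
for l, p in enumerate(pool_layers):
    score_map += (test_map[l] - train_mean[l]) / train_std[l]
score_map /= len(pool_layers)  # average the standardized layer maps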

Do the encoder training parameters in the paper refer to the encoder's pretraining settings?

In the last paragraph of "5.1. Experimental setup" in the paper: "Adam optimizer with 2e-4 learning rate, 100 train epochs, 32 mini-batch size for encoder and cosine learning rate annealing with 2 warm-up epochs."
Do these encoder training parameters refer to the pretraining setup? In train.py, the encoder only loads the ImageNet-pretrained weights and does not participate in the training process.
Thanks a lot.

Huge Training Loss after a certain number of epochs

While training with wide_resnet50 on data from different surveillance cameras, I am experiencing a huge loss.
What can be the reason for such an outcome?

It would be really helpful if you could comment on this problem.

Increase hyperparameter N

Hi,

When I increase hyperparameter N to 8192, for test_meta_epoch only, I get an FPS increase of 300% and SUM scores identical to N=256. I do not fully understand the code yet, so could you tell me whether this is wise to do?

Why use positional embedding as conditional vector?

Hi,
I have read your paper and I do not fully understand why you use the positional embedding as the condition for the coupling flow layers. What is the benefit of doing this?

Is it related to positional invariance?

Thanks

WRN50 Transistor AUROC 97.99?

Hi, I have a question about the performance on the Transistor class in your paper.
It seems that your WRN50 AUROC results in Table 3 are the maximums from Table 2.
Table 3 shows the AUROC of Transistor as 97.99; however, the corresponding maximum in Table 2 is only 93.28.
What am I missing? Thanks.

About the difficulties of exporting to ONNX

Hello Denis,
Thanks a lot for your proposed CFlow method!
Do you have any plans to export the model to ONNX format? In my attempts to export, I had to use torch.jit.script as an intermediate step toward ONNX because of the loop in the forward code. But this seems difficult, and the FrEIA library always fails to export.
Sincerely looking forward to your reply!

About detection speed

Hello, have you measured the inference time per image? I am facing a near-real-time detection scenario, and I don't know whether the algorithm is suitable. Thanks very much!

Calculate detection AUROC from anomaly map

Is it possible to get the detection performance from the anomaly map (segmentation)? I mean, for example, taking the top-k highest anomaly scores in the anomaly map and using their mean as the image-level score to decide whether an image is anomalous or not.

Is there any misunderstanding, or something I am missing, about calculating detection performance this way instead of calculating it with the labels (1, 0)?

I ask because I observe that this model performs better in segmentation than in detection.
Maybe, since the model is flow-based (generative), it behaves better at the pixel level?
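
For reference, a sketch of that idea (assuming super_mask holds per-image anomaly maps of shape [num_images, H, W] and gt_label holds binary image labels; the top-k size is hypothetical):

import numpy as np
from sklearn.metrics import roc_auc_score

k = 100                                             # hypothetical top-k size
flat = super_mask.reshape(super_mask.shape[0], -1)  # [num_images, H*W]
image_scores = np.sort(flat, axis=1)[:, -k:].mean(axis=1)  # mean of top-k pixels
det_auroc = roc_auc_score(gt_label, image_scores)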

Inference time increases over iterations

Hello @gudovskiy .

I was measuring the processing time of this work on my GPU, and I noticed that the processing time per iteration increases. I wonder if there is some accumulation that could decrease the speed. You can see this below:

[plot: per-iteration processing time growing over iterations]

I am sharing the code I'm using below. I saw that you have a similar method, but I want to measure the whole pipeline until the scores are obtained.

input_data = torch.rand_like(next(iter(loader))[0], requires_grad=False, device=torch.device('cuda'))

# Warmup phase
# ...
# end of warmup

torch.cuda.synchronize()
print("Start timing...")
timings = []
with torch.no_grad():
    for i in tqdm(range(500), 'Measuring inference speed...'):
        start_time = time.time()
        _ = encoder(input_data)  # BxCxHxW
        # test decoder
        for l, layer in enumerate(pool_layers):
            e = activation[layer]  # BxCxHxW
            B, C, H, W = e.size()
            S = H * W
            E = B * S
            if i == 0:  # get stats
                height.append(H)
                width.append(W)
            p = positionalencoding2d(P, H, W).to(c.device).unsqueeze(0).repeat(B, 1, 1, 1)
            c_r = p.reshape(B, P, S).transpose(1, 2).reshape(E, P)  # BHWxP
            e_r = e.reshape(B, C, S).transpose(1, 2).reshape(E, C)  # BHWxC
            decoder = decoders[l]
            FIB = E // N + int(E % N > 0)  # number of fiber batches
            for f in range(FIB):
                if f < (FIB - 1):
                    idx = torch.arange(f * N, (f + 1) * N)
                else:
                    idx = torch.arange(f * N, E)
                c_p = c_r[idx]  # NxP
                e_p = e_r[idx]  # NxC
                z, log_jac_det = decoder(e_p, [c_p, ])
                decoder_log_prob = get_logp(C, z, log_jac_det)
                log_prob = decoder_log_prob / C  # likelihood per dim
                test_dist[l] = test_dist[l] + log_prob.detach().cpu().tolist()

        test_map = [list() for p in pool_layers]
        for l, p in enumerate(pool_layers):
            test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
            test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
            test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
            test_mask = test_prob.reshape(-1, height[l], width[l])
            test_map[l] = F.interpolate(test_mask.unsqueeze(1), size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()

        score_map = np.zeros_like(test_map[0])
        for l, p in enumerate(pool_layers):
            score_map += test_map[l]
        score_mask = score_map
        super_mask = score_mask.max() - score_mask
        score_label = np.max(super_mask)

        torch.cuda.synchronize()
        end_time = time.time()
        timings.append(end_time - start_time)
        if i % 100 == 0:
            print("It: {} average time: {}".format(i, np.mean(timings) * 1000))

print("Input shape {}".format(input_data.shape))
print("Average time {}".format(np.mean(timings) * 1000))

Thanks
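
One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: test_dist[l] is appended to on every timing iteration but never cleared, so the normalization and interpolation steps process an ever-growing list. Resetting the accumulators inside the loop keeps the per-iteration work constant:

# Hypothesis: clear the per-layer accumulators at the top of each timing
# iteration so later iterations do not re-process earlier results.
for i in tqdm(range(500), 'Measuring inference speed...'):
    test_dist = [list() for _ in pool_layers]  # reset before each pass
    start_time = time.time()
    # ... rest of the loop body unchanged ...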

NameError: name 'mobilenet_v3_large' is not defined

python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp 256 --class-name metal_nut --checkpoint mvtec_mobilenet_v3_large_freia-cflow_pl3_cb8_inp512_run0_metal_nut.pt
LR schedule: [12, 18, 22]
Number of pool layers = 3
Traceback (most recent call last):
  File "main.py", line 91, in <module>
    main(c)
  File "main.py", line 84, in main
    train(c)
  File "/home/radhakrishna/cflow-ad/train.py", line 262, in train
    encoder, pool_layers, pool_dims = load_encoder_arch(c, L)
  File "/home/radhakrishna/cflow-ad/model.py", line 158, in load_encoder_arch
    encoder = mobilenet_v3_large(pretrained=True, progress=True).features
NameError: name 'mobilenet_v3_large' is not defined
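
A hedged diagnosis, not a confirmed fix: mobilenet_v3_large first appeared in torchvision 0.9.0, so an older torchvision leaves the name undefined in model.py. A quick version check:

# Verify the installed torchvision is new enough for MobileNetV3.
from packaging import version
import torchvision

assert version.parse(torchvision.__version__.split('+')[0]) >= version.parse('0.9.0'), \
    'mobilenet_v3_large requires torchvision >= 0.9.0'
from torchvision.models import mobilenet_v3_large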

Large test loss, and thresholds

First of all, great job, and thanks for sharing.
And when I test bottle, the train loss is very small: 0.06 after 10 epochs. But the test loss grows quickly, to larger than 27000. At the 25th epoch the test loss is 6636421 while the train loss is 0.05. Is this normal? I use the default settings.

Then, a problem that has confused me for a long time: without GTs, how can I choose the SEG thresholds? Both the AUROC scores and F1 scores require gt_mask, but in practice there is no GT information for new data with only normal samples. Could you give some ideas? Thanks!
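
One common heuristic (a sketch only, assuming a held-out set of normal images is available; val_super_mask is a hypothetical array of their pixel scores): calibrate the threshold as a high percentile of normal-only scores:

import numpy as np

# Pick the segmentation threshold so that only a small fraction of pixels
# on known-normal validation images would be flagged as anomalous.
normal_pixel_scores = val_super_mask.ravel()
seg_threshold = np.percentile(normal_pixel_scores, 99.5)  # ~0.5% FP pixels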

How to check visualization?

How do I visualize the results?

Compute loss and scores on test set: Epoch: 0 test_loss: 0.1731 and 23.68 fps
Heights/Widths [32, 16, 8] [32, 16, 8]
DET_AUROC: last: 97.26 max: 97.26 epoch_max: 0
SEG_AUROC: last: 98.16 max: 98.16 epoch_max: 0

Inconsistent score_label for the same image when the test batch changes

First of all, thank you for presenting your work as open source.
When I tested the model, I found that the score_label for the same image changes between tests: when the batch size differs, the test result differs. I tested 1 image, 2 images, and 3 images in sequence; one image is present in all three tests, but its result is inconsistent across them.
For example, with batch_size = 1:
score_label: [2.193307 2.3324122 2.3044298]
score_label: [2.193307 2.3324122]
score_label: [2.159736]

When I add an image to the test data, the result on the same image differs from the previous test result.
What is going on, and how can I fix it?
Looking forward to hearing from you.

heatmap

Hello!
Thanks for your work and code! I wonder how to get the heatmap overlaid on images? Please explain in more detail. Thanks!

How many GPU hours are required?

Thanks for sharing the implementations!

I wonder how many GPU hours it takes to train one MVTec subset? I've tried your code on a single RTX 2060 Ti, but I find it really slow...

How to run inference on a single image?

Great job!
I am new to anomaly detection.
What I want to know is: in a real scenario, once I have trained a model on the bottle class of MVTec and I give it another bottle image, should the model output the class (normal or anomalous) and localize the anomalous area like the GT?
Could you give some demo code for that (load a model and predict a single image)?
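
As a rough sketch of what such a demo could look like (the transform values are the standard ImageNet ones; encoder, c.device, and the activation dict refer to the repo's objects, and the decoder loop is elided because it mirrors test_meta_epoch in train.py):

import torch
import torchvision.transforms as T
from PIL import Image

# Preprocess a single image the same way the test loader does (a sketch).
transform = T.Compose([
    T.Resize(256), T.CenterCrop(256), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = transform(Image.open('bottle_test.png').convert('RGB')).unsqueeze(0)

with torch.no_grad():
    _ = encoder(x.to(c.device))   # fills the `activation` dict via hooks
    # ...then score each pooled feature map with its decoder, aggregate the
    # per-layer probability maps, and threshold, exactly as in train.py.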

The motivation and mechanism of conditional flow

Hi,

Thanks for your incredible contribution! I have a question about understanding the term conditional in your work.
You concatenate the conditional vector into the input of the coupling layers since it provides a conditional prior. Can I understand it this way: you use the conditional vector because the input to the flow part is the output of a pooling layer of a pre-trained CNN, which is impossible to invert without the conditional vector. In other words, the conditional vector (your PE) maintains the pre-trained CNN's invertibility or makes the inverse more accurate.
If I understand it right, could you please provide some mathematical evidence or supplementary content to support this point? If not, how should I understand the conditional vector and positional prior, and how do they work?
Thanks for your time.

Best regards

heat map visualization

First of all, thank you for presenting your work as open source. When I tested the bottle class with the

python3 main.py --gpu 0 --pro -enc mobilenet_v3_large --dataset mvtec --action-type norm-test -inp INPUT --class-name CLASS --checkpoint PATH/FILE.PT

command, I got these results:
DET_AUROC: last: 100.00 max: 100.00 epoch_max: 0
SEG_AUROC: last: 98.93 max: 98.93 epoch_max: 0
SEG_AUPRO: last: 96.48 max: 96.48 epoch_max: 0

The question is: how can I display the heatmap on images?
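
One minimal way to render such an overlay (a sketch, independent of the repository's own utilities; image and anomaly_map are assumed to be the denormalized input and the upsampled score map):

import matplotlib.pyplot as plt

def show_heatmap(image, anomaly_map, alpha=0.5):
    # image: HxWx3 array; anomaly_map: HxW array of per-pixel scores.
    plt.imshow(image)
    plt.imshow(anomaly_map, cmap='jet', alpha=alpha)  # semi-transparent overlay
    plt.axis('off')
    plt.show()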

value of N

#19 (comment)
Hi,
I am a bit confused about the impact of N. Increasing N improves the speed; will this affect the performance of the model? Do I need to keep the value of N the same in train and test?

If I set N to 512 during training and set it to a much larger value during testing to achieve higher inference speed, is this appropriate?

Thanks a lot.
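
For intuition on what N does (a hedged sketch; the numbers are hypothetical): N is the fiber mini-batch size, so the E = B*H*W feature vectors of a layer are scored in ceil(E/N) chunks. Changing N only changes the chunking, not which fibers are scored, so results should match up to numerical effects:

E = 64 * 64                        # hypothetical fiber count (B*H*W) for one layer
for N in (256, 8192):
    FIB = E // N + int(E % N > 0)  # number of fiber batches, as in train.py
    print(N, FIB)                  # larger N -> fewer, larger decoder calls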

Calculate detection AUROC

In line 334 of the train file, the detection AUROC is calculated using the truth label and the score label.
Why is the truth label boolean while the score label is a float? I'm trying to replicate this using the leather class, and the score_label values are between [0.97, 2.7].
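
For what it's worth, AUROC depends only on how the scores rank the samples, so boolean labels with unbounded float scores are exactly what the metric expects; a minimal check:

from sklearn.metrics import roc_auc_score

# gt_label: 0/1 (or bool) per image; score_label: continuous anomaly scores.
# The absolute score range (here roughly [0.97, 2.7]) does not matter.
auroc = roc_auc_score(gt_label, score_label)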

Tune parameters to get best results.

Hello Denis,
I have been studying and adjusting this great repository to fit my needs in defect detection.
I have trained several models successfully (using the mobilenet_v3_large backbone) and achieved good results. However, I have removed some parts of your code, including the snippets that calculate the seg_threshold parameter from the ground truth.
Therefore, I am choosing it by hand (through trial and error), and although the results are OK, I think they can be improved further.

My questions are:
1- How can I reliably choose a value for seg_threshold in my case?
2- What parameters do you recommend fine-tuning when training a new model (given that I have a good, balanced dataset of the same product but in different colours)?
3- My last question is about exporting the model to ONNX format: do you have any comments on how to achieve that? Do you plan on adding that capability?
4- When should I stop training?

Thank you in advance, your work is truly inspiring.

Why using meta-epoch training paradigm

Hi,
Thanks for your code; it has helped my research a lot.
One question came to my mind when I tried to implement CFLOW in my own code: usually a model is trained on a batch of data, where the loss is reduced over the whole batch and then backpropagated.
But I found that here the loss is backpropagated for each sub-iteration, in which only part of the batch is sampled.
This training paradigm somewhat confuses me; does it work better than the normal way?
Here are the reasons I surmise it works:

  • Seeing only a random part of the batch makes the gradients move more stochastically, which could produce a more robust model.
  • It saves GPU memory for each forward/backward pass with a high batch size.

Thanks!

Question about image resizing and evaluation methods in localization task

I'm a student studying anomaly detection. Thank you for providing an interesting paper and well-designed code.
I would like to ask a question about image resizing and evaluation methods in the localization task.

The input image is resized during training in your code, and the target size is given when main.py is executed, like this:
python3 main.py --gpu 0 --pro -inp 128 --dataset mvtec --class-name transistor
The original size of the transistor class images in the MVTec dataset is 1024, so I guess the images are resized from 1024 to 128 here.

My question:
Are the predicted mask and the GT mask also resized at test time (e.g., 1024 -> 128 for the transistor class)?
If so, wouldn't reducing the resolution of the images used for testing artificially improve the evaluation value?

I'd appreciate it if you could answer my questions.

test loss

Hello, I used the unmodified code for training and testing. I found that during testing the test loss becomes increasingly large: when the epoch reaches 6 the test loss reaches 6000, while at epoch 1 the test loss was only about 30. Is this an overfitting phenomenon? What is the reason for it? I would greatly appreciate an answer.

Same model to inspect different but similar objects.

Hello Denis,

I am using CFLOW to inspect similar labels on similar objects.
Your paper was very insightful, however I couldn't extract useful info for my use-case.
I was wondering, if you have any suggestions to adjust the training parameters to better tackle this challenge.
I am also working with grayscale images if that is helpful.

Any help is appreciated.
