fanet's Issues

Training Details

Hello once again,

I tried training FANet-18 on the Cityscapes dataset. I replaced the InPlaceABN layers with standard BatchNorm followed by an activation, as I needed to export the trained model to ONNX for deployment in my application.
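
For context, the export path I am targeting looks roughly like the following; the one-layer model and file name are illustrative stand-ins, not the repo's code:

import torch

# Stand-in module for the trained FANet-18 with plain BN; the input
# size matches a full-resolution Cityscapes frame. Names are illustrative.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 19, 1)).eval()
dummy = torch.randn(1, 3, 1024, 2048)
torch.onnx.export(model, dummy, "fanet18.onnx", opset_version=11)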

These are my training configurations, closely adapted from the paper:

  1. Mini-batch SGD with batch size 4, as I only have 8 GB of GPU memory; weight decay = 5e-4, momentum = 0.9
  2. Initial learning rate (LR) = 1e-2, decayed each iteration by the poly policy, i.e. the initial LR multiplied by (1 - iter/max_iter)^2 (see the sketch after this list)
  3. Data augmentation: random horizontal flipping, random scaling (0.75 to 2.0)
  4. 80,000 training iterations
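
For concreteness, a minimal sketch of that schedule in PyTorch; the one-layer model and the loop body are illustrative assumptions, not the repo's training code:

import torch

# Minimal poly-LR sketch; the one-layer `model` is a stand-in, not FANet.
model = torch.nn.Conv2d(3, 19, 1)
max_iter = 80000
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1 - it / max_iter) ** 2)

for it in range(max_iter):
    # forward pass, OHEM loss, loss.backward(), optimizer.step() go here
    scheduler.step()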

I ended up with an OHEM cross-entropy loss of 0.3941 at the final iteration.

I have yet to check the mIoU.

As a preliminary comparison, BiseNet trained in a similar fashion (but with auxiliary losses) reached an OHEM cross-entropy loss of 0.2947, which corresponded to an mIoU of 0.63.
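
By OHEM cross-entropy I mean the usual hard-pixel-mining variant, sketched below; the thresh and n_min values are illustrative defaults, not necessarily what either repo uses:

import torch
import torch.nn.functional as F

def ohem_ce_loss(logits, labels, thresh=0.7, n_min=100000, ignore_index=255):
    # Per-pixel CE; keep only pixels whose loss exceeds -log(thresh),
    # but never fewer than n_min pixels.
    loss = F.cross_entropy(logits, labels, ignore_index=ignore_index,
                           reduction='none').view(-1)
    loss, _ = torch.sort(loss, descending=True)
    n_min = min(n_min, loss.numel() - 1)
    hard = -torch.log(torch.tensor(thresh))
    if loss[n_min] > hard:
        loss = loss[loss > hard]
    else:
        loss = loss[:n_min]
    return torch.mean(loss)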

Could you please give me more details on the training, especially:

  1. How many iterations did you train the model for?
  2. What was the final cross-entropy loss you ended up with?
  3. Did you also use auxiliary losses for better convergence (resulting in a lower loss)?
  4. Did you use a specifically modified version of the cross-entropy loss to achieve better convergence?

Is there anything else I am missing to achieve better results?

Trained model with mIoU of 75.5 for fa2

Hi, could you please provide the trained model for the last experiment reported in Table VI of the paper, which achieved an mIoU of 75.5:
"TABLE VI: Video semantic segmentation on Cityscapes. “+Temp” indicates FANet with spatial-temporal attention (t=2)"

Thanks

The cosine similarity is not used in FA???

Did I misunderstand the code? The cosine similarity is not used in FA; a plain dot product is used instead, and the 1/n factor from the paper's formula is not reflected in the code either.
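
For context, the two formulations can coincide: a dot product over L2-normalized features is exactly cosine similarity. A sketch of what an equivalent computation might look like (shapes and names are illustrative, not taken from the repo):

import torch
import torch.nn.functional as F

# With L2-normalized channel vectors, a dot product equals cosine
# similarity; dividing by the number of positions n gives the 1/n factor.
q = torch.randn(1, 64, 32 * 32)   # (batch, channels, n positions)
k = torch.randn(1, 64, 32 * 32)
qn = F.normalize(q, dim=1)
kn = F.normalize(k, dim=1)
n = q.shape[-1]
affinity = torch.bmm(qn.transpose(1, 2), kn) / n   # (1, n, n)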

why only 2 frames for spatial-temporal context aggregation

Hi. Your paper is very interesting and inspiring to read.
I was wondering why you integrate the features of only ONE neighboring frame to facilitate the inference of the current frame.
Have you experimented with more frames? What was the effect?

Thank you.

Torch/CUDA version

Hi Author, thanks for your work :) Could you please list the torch and CUDA versions for this repo? It raises "RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50" when running CUDA_VISIBLE_DEVICES=1 python3 speeding.py with torch==1.1.0 and CUDA 10.2. Thanks!
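
For debugging, a quick check of what the installed build actually sees; note that CUDA_VISIBLE_DEVICES=1 selects the second GPU, so on a single-GPU machine no device would be visible at all:

import torch

# Report the installed torch build, the CUDA toolkit it was compiled
# against, and how many GPUs this process can actually see.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available(), torch.cuda.device_count())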

License for the code

Hi

Could you please provide license information for the repo? Is the code made available under an MIT license?

Features extracted by ResNet are different for BiseNet and FANet

Thank you for the great work!!

The lines below are the ResNet model's return values as adapted for FANet:

feat4 = self.layer1(x)
feat8 = self.layer2(feat4) # 1/8
feat16 = self.layer3(feat8) # 1/16
feat32 = self.layer4(feat16) # 1/32
return feat4, feat8, feat16, feat32

The comments say that feat8, feat16, and feat32 are feature maps at 1/8, 1/16, and 1/32 of the image size, but the actual sizes are 1/16, 1/32, and 1/64.

I also understand why this is happening. Below is how the Resnet18 model is created for FANet:

def Resnet18(pretrained=True, norm_layer=None, **kwargs):
    model = ResNet(BasicBlock, [2,2,2,2],[2,2,2,2], norm_layer=norm_layer)
    if pretrained:
        model.init_weight(model_zoo.load_url(model_urls['resnet18']))
    return model

which is different from how it is created in BiseNet:

def Resnet18(pretrained=False, **kwargs):
    model = ResNet(BasicBlock, [2, 2, 2, 2],[1, 2, 2, 2])
    if pretrained:
        model.init_weight(model_zoo.load_url(model_urls['resnet18']))
    return model

The stride value for the first BasicBlock is different.
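
To illustrate the difference, a tiny stand-alone check using plain convolutions as stand-ins for the BasicBlocks (not the repo's code):

import torch
import torch.nn as nn

# A stride-2 first block halves the resolution once more than a
# stride-1 block, which is exactly the extra down-sampling in question.
x = torch.randn(1, 64, 64, 64)
layer1_stride1 = nn.Conv2d(64, 64, 3, stride=1, padding=1)  # BiseNet: [1,2,2,2]
layer1_stride2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # FANet:   [2,2,2,2]
print(layer1_stride1(x).shape)  # torch.Size([1, 64, 64, 64])
print(layer1_stride2(x).shape)  # torch.Size([1, 64, 32, 32])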

Could you please clarify whether the additional down-sampling is intended for FANet or not.

Would it be possible for you to share the trained model as well? I don't have access to good GPUs to train it on the Cityscapes dataset, so it would be nice if you could share it.

Resnet block feature map size

Hi,
Thanks for your great work.
I have a question regarding the feature map size of resnet blocks.
In your paper you say that the first res-block produces a feature map at h/4 x w/4 resolution,
but in the code the resolution of that feature map is h/8 x w/8.

Is this an error?
Or does the implementation not fully reproduce the paper's description?

Thank you

class BatchNorm2D in fanet.py

I am confused about this! You define batch norm, but in the forward function you only apply the activation. Can you explain this?

class BatchNorm2d(nn.BatchNorm2d):
    # (conv => BN => ReLU) * 2
    def __init__(self, num_features, activation='none'):
        super(BatchNorm2d, self).__init__(num_features=num_features)
        if activation == 'leaky_relu':
            self.activation = nn.LeakyReLU()
        elif activation == 'none':
            self.activation = lambda x: x
        else:
            raise Exception("Accepted activation: ['leaky_relu']")

    def forward(self, x):
        return self.activation(x)
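
For what it's worth, if the normalization is meant to be applied, forward would presumably invoke the parent class first; this is a hedged guess at the intended behavior, not a confirmed fix from the repo:

def forward(self, x):
    x = super(BatchNorm2d, self).forward(x)  # apply the batch norm first
    return self.activation(x)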
