blackfeather-wang / isda-for-deep-networks

An efficient implicit semantic augmentation method, complementary to existing non-semantic techniques.
Hi, ISDA is innovative.
But I want to use it to improve performance on other datasets (not ImageNet).
Can I still use your visualization method (BigGAN) to visualize my dataset? If not, could you give me some suggestions on how to visualize my dataset?
Thanks a lot!
cifar-ISDA:

```python
def forward(self, model, fc, x, target_x, ratio):
    features = model(x)
    y = fc(features)
    self.estimator.update_CV(features.detach(), target_x)
    isda_aug_y = self.isda_aug(fc, features, y, target_x, self.estimator.CoVariance.detach(), ratio)
    loss = self.cross_entropy(isda_aug_y, target_x)
    return loss, y
```

imagenet-ISDA:

```python
def forward(self, model, x, target_x, ratio):
    y, features = model(x, isda=True)
    # y = fc(features)
    self.estimator.update_CV(features.detach(), target_x)
    isda_aug_y = self.isda_aug(model.module.fc, features, y, target_x, self.estimator.CoVariance.detach(), ratio)
    loss = self.cross_entropy(isda_aug_y, target_x)
    return loss, y
```

Why is there a difference?
To apply ISDA to other models, the final fully connected layer does not need to be explicitly defined outside the model in the ImageNet version, right?
It's a bit like how center loss requires the model to return both x and the feature, right?
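A minimal sketch of the two calling conventions discussed above: the ImageNet-style model returns both logits and features when an `isda` flag is set, while the CIFAR-style model returns only features and leaves the FC layer outside. The class and names here are illustrative stand-ins, not the repo's actual API.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model illustrating the dual-return pattern: with isda=True,
    forward() yields (logits, features) so the ISDA loss can update the
    per-class covariance estimate on the penultimate features."""
    def __init__(self, feature_dim=8, num_classes=3):
        super().__init__()
        self.backbone = nn.Linear(4, feature_dim)  # stand-in for the conv stages
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, x, isda=False):
        features = torch.relu(self.backbone(x))
        logits = self.fc(features)
        if isda:
            return logits, features  # ImageNet-style interface
        return logits                # CIFAR-style keeps fc outside instead

x = torch.randn(5, 4)
logits, feats = TinyNet()(x, isda=True)
print(logits.shape, feats.shape)  # torch.Size([5, 3]) torch.Size([5, 8])
```

Either way the same quantities reach the loss; only where the classifier weights live differs.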
```python
labels = ((1 - label_mask).mul(labels) + label_mask * 19).long()
```

Is 19 the number of classes here?
Hi, ISDA is a nice piece of work and very impressive.
I have a question about the effect of ISDA on the shallow layers of the network. ISDA augments the deep features, i.e., the output of the last conv layer before the FC layer. So it seems the augmented features are only used to train a robust FC layer. I'd like to know how ISDA affects the training of the earlier conv layers.
Looking forward to your reply. Thanks!
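One way to see the effect empirically (a sketch of the general mechanism, not the authors' answer): the augmentation is a differentiable function of the features, so the loss computed on augmented features still backpropagates through them into the conv layers. The perturbation below is a generic random shift standing in for the semantic directions.

```python
import torch
import torch.nn as nn

# Toy check: a loss computed on perturbed deep features still sends
# gradients back through the conv layer, because the perturbed features
# remain a differentiable function of the conv output.
conv = nn.Conv2d(3, 4, 3)
fc = nn.Linear(4, 2)

x = torch.randn(2, 3, 8, 8)
features = conv(x).mean(dim=(2, 3))      # (2, 4) pooled deep features
noise_dir = torch.randn_like(features)   # stand-in for a semantic direction
augmented = features + 0.1 * noise_dir
loss = nn.functional.cross_entropy(fc(augmented), torch.tensor([0, 1]))
loss.backward()

print(conv.weight.grad is not None)  # True: conv layers are trained too
```

So the conv layers are not frozen spectators; they receive gradients shaped by the augmented objective.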
weight_CV: tensor([[[nan, nan, nan, ..., nan, nan, nan], ...]])  (every entry of the tensor prints as nan)
When training on my own data I get an error: weight_cv is always NaN.
I have tried the following solutions, without success:
1. Adjusting the learning rate
2. Normalizing the FC weights and the features
My task has more than 3000 classes. From the code, weight_cv is a one-hot transformation of the labels, so in principle this error should not occur.
I hope the author can give some guidance. Thanks!
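One common source of NaN in per-class statistics with thousands of classes (a guess based on the symptom, not a diagnosis of this exact code) is dividing by a per-class sample count of zero when a batch contains no examples of some class. A guarded incremental update can avoid the 0/0. This NumPy sketch is illustrative and does not mirror the repo's `EstimatorCV`:

```python
import numpy as np

def update_class_means(means, counts, features, labels, num_classes):
    """Incrementally update per-class feature means, guarding the division
    so classes absent from the batch stay untouched (no 0/0 -> nan)."""
    onehot = np.eye(num_classes)[labels]       # (N, C) one-hot labels
    batch_counts = onehot.sum(axis=0)          # samples per class in this batch
    batch_sums = onehot.T @ features           # (C, A) per-class feature sums
    new_counts = counts + batch_counts
    safe = np.maximum(new_counts, 1)[:, None]  # clamp to avoid divide-by-zero
    means = (means * counts[:, None] + batch_sums) / safe
    return means, new_counts

means, counts = np.zeros((4, 2)), np.zeros(4)
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
labels = np.array([0, 0])                      # class 3 never appears
means, counts = update_class_means(means, counts, feats, labels, 4)
print(means[0], counts[3])  # [2. 3.] 0.0, and no nan anywhere
```

The same clamp applies to the covariance update, where an unguarded division by the class count is the usual culprit.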
Thanks, I've got it now.
Hi,
I read your paper; it's very impressive work.
What I'm confused about is related to #33.
In #33, you said that because the last FC layer (the classifier) is implemented outside the model, y is the embedding feature, not the logits.
Then why do you update the covariance using the features variable rather than the y variable before entering the isda_aug method?
My understanding from reading your paper is that the proposed augmentation is applied to the embedding feature a_i using the covariance matrix of class y_i, which is constructed by computing the element-wise variance of the features whose class is y_i. Am I right?
According to my understanding, the covariance should be updated using the y variable.
Thanks a lot!
In YOLOv3/v4 the authors use BCE loss for classification; is ISDA applicable to this kind of prediction?
Hi, I'm reading your paper and your code; very useful work. But I'm wondering how exactly ISDA works, as I don't fully understand it yet. My question is: does ISDA operate while training a model? For example, when training a detection model, do we use ISDA at the same time? Or do we use ISDA first to generate data and then train the model?
I'm now thinking it's the first way, but I'm not sure. Am I right?
Thanks a lot!
Hi,
Thanks for sharing the code. Is this code only for supervised learning?
What about the code for ISDA on unlabeled samples? Could you also share it?
I would really appreciate it.
Thanks.
Thank you very much for open-sourcing the code.
I have read the paper and trained it on a vehicle-classification task (20 IDs, 10,000 images) and a vehicle re-ID task (200 IDs, 10,000 images) with our own dataset.
It worked with both ResNeSt-50 and ResNet-50 with lambda_0 = 7.5.
But when I change the backbone to ResNet-101, it does not work with lambda_0 = 7.5.
Wonderful work! But it still leaves me with a question.
It makes sense in the image classification task, as the main object in the image can undergo semantic transforms. But in the segmentation task, is ISDA operated on each pixel? How do you explain a semantic transform of a single pixel?
It seems that you only use the diagonal elements of the covariance matrix. I wonder why not use the complete matrix to augment the feature? Looking forward to your reply.
I think the means and covariances are calculated separately on each GPU in update_CV().
Do you plan to release the code to visualize CIFAR semantically augmented data any time soon ?
Dear author,
Thank you very much for your release. I have a question. In the last layer of the network, your code uses an FC layer (feature_dim, class_dim) as the example for your theory. What if a conv operation (feature_dim, class_dim, kernel_size, kernel_size) has to be used for a specific task? Would there be any differences? I am confused about whether we should take the kernel parameters into account; the code would also need to change.
Is that right: NxW_ij = weight_m.expand(N, C, A) → NxW_ij = weight_m.expand(N, C, A, K, K)?
Looking forward to your reply.
Best wishes
Hi,
Thanks for the work. I saw that in Equation 7 of the paper, the probability used in the cross-entropy is not a pure softmax: in the numerator the input of exp is w^T x + b, while in the denominator the inputs of the exps are w^T x + b + (λ v^T Σ v)/2. Why do you use nn.CrossEntropyLoss directly in the code? Shouldn't it be based on this modified softmax?
Also, as far as I know the covariance matrix of a Gaussian distribution is a square matrix, so why is it defined with shape class_num x feature_num?
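On the nn.CrossEntropyLoss point, one consistent reading: if a per-class shift is added to every logit but the shift on the target class is zero, then standard softmax cross-entropy of the shifted logits is exactly a loss of the Eq.-7 form, with the unshifted target exp in the numerator and shifted exps in the denominator. A NumPy check with generic numbers (the deltas stand in for the λ v^T Σ v / 2 terms; this is not the authors' exact code):

```python
import numpy as np

def ce(logits, target):
    """Standard softmax cross-entropy for a single example."""
    z = logits - logits.max()  # subtract max for numerical stability
    return -(z[target] - np.log(np.exp(z).sum()))

z = np.array([2.0, 0.5, -1.0])
delta = np.array([0.0, 0.3, 0.7])  # illustrative shifts; delta[target] = 0
target = 0

# Eq.-7-style loss: unshifted target logit over shifted denominators
direct = -np.log(np.exp(z[target]) / np.exp(z + delta).sum())

# Same quantity via plain cross-entropy on the shifted logits
via_ce = ce(z + delta, target)

print(np.isclose(direct, via_ce))  # True
```

So feeding shifted logits into nn.CrossEntropyLoss can realize the modified softmax without a custom loss.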
Thank you for this repo.
I am trying to implement ISDA with my own model on the CIFAR-10 dataset, but for some reason I am getting issues.
My model is defined below :
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 256, 5)
        self.dp1 = nn.Dropout2d(p=.25)
        self.fc1 = nn.Linear(256 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        residual = x
        x = self.pool(mish(self.conv1(x)))
        x = self.pool(mish(self.conv2(x)))
        x = x.view(-1, 256 * 5 * 5)
        x = self.dp1(x)
        if residual.shape == x.shape:
            x += residual
        x = mish(self.fc1(x))
        x = mish(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
batch_size = 4
feature_num = 84
num_classes = len(classes)  # 10
fc = Full_layer(feature_num, num_classes)
```
I have set everything up in the train function correctly, but for some reason I am getting:

```
RuntimeError: size mismatch, m1: [4 x 10], m2: [84 x 10] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:197
```
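A likely cause, inferred only from the error shapes and not verified against the full training loop: the forward above already applies self.fc3, so the model outputs 10-dim logits, and the external Full_layer (expecting 84-dim features) multiplies them again, hence m1: [4 x 10] vs m2: [84 x 10]. One fix is to stop the model at the 84-dim features and let the external fc classify. A sketch (mish swapped for F.relu here so the snippet is self-contained):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """Same architecture as the snippet above, but forward() stops at the
    84-dim penultimate features; the external fc produces the logits."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 256, 5)
        self.dp1 = nn.Dropout2d(p=.25)
        self.fc1 = nn.Linear(256 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)  # no fc3: the outside fc classifies

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32 -> 28 -> 14
        x = self.pool(F.relu(self.conv2(x)))   # 14 -> 10 -> 5
        x = self.dp1(x.view(-1, 256 * 5 * 5))
        x = F.relu(self.fc1(x))
        return F.relu(self.fc2(x))             # (N, 84) features

features = FeatureNet()(torch.randn(4, 3, 32, 32))
logits = nn.Linear(84, 10)(features)           # stand-in for Full_layer
print(features.shape, logits.shape)  # torch.Size([4, 84]) torch.Size([4, 10])
```

With this split, feature_num = 84 matches what Full_layer receives and the matmul shapes line up.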
Hi,
Thanks for your sharing. I have a question: does this augmentation method work for the long-tailed problem? In other words, if many classes have only a very few images, can this method still help?
In your paper, the logit in the numerator is not changed; only the logits in the denominator are changed.
However, in the code, I find that all the logits are changed.
Hi, I read EstimatorCV and found that the average feature is calculated by averaging all running features, including the old features and the new ones.
Would it be more reasonable to give a bigger weight to the new features?
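If one did want to weight recent features more heavily, a design alternative rather than what the repo implements, the usual device is an exponential moving average, where old statistics decay geometrically instead of all batches counting equally:

```python
import numpy as np

def ema_update(running_mean, batch_mean, momentum=0.5):
    """Exponential moving average: each new batch gets weight `momentum`,
    so older batches decay geometrically instead of being averaged equally."""
    return (1 - momentum) * running_mean + momentum * batch_mean

running = np.zeros(3)
for batch in [np.ones(3), 2 * np.ones(3), 3 * np.ones(3)]:
    running = ema_update(running, batch)

# Plain cumulative average of 1, 2, 3 would be 2.0; the EMA lands above it
# because the most recent batch (3) carries the largest weight.
print(running)  # [2.125 2.125 2.125]
```

Whether that helps here is an empirical question, since ISDA's estimates are meant to converge to stable class statistics.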
I found that you updated the arXiv paper on April 25 and the performance on ImageNet is largely improved, from a 0.28% gain to a 1.1% gain on ResNet-50. Is there any particular change that leads to the larger improvement? Thanks.
Hi, thanks for sharing the code. I had a look and I cannot understand why, in ./networks/resnet.py at line 169, the value of self.feature_num = 64 is the same for all models, while for WideResNet and other networks the value is different.
Thanks
Your idea of implicit and efficient feature-space augmentation is very nice and a valuable contribution.
It would be great if you could share the code for reconstructing images from augmented samples in feature space.
Thanks for sharing your code!
I was wondering: if I use your ISDA method to augment the training data for the Cityscapes semantic segmentation task,
do I need to augment the label data as well? After reading your code, I only found the augmented image data.
Looking forward to your reply!
Thanks for sharing your implementation! I have a question about it: how does the ISDA loss implementation relate to formula 10 in the TPAMI paper? Could you please explain the implementation in more detail? Thanks a lot!
Thanks for your brilliant work! May I ask how ISDA is implemented for the object detection task? I can see the object detection results on MS COCO, but detection also includes regression. Is ISDA applied to the bounding-box regression as well, or only to classification?
Lastly, it would be great if you could provide the code of the ISDA implementation for object detection. Thanks a lot!
Hi, thanks a lot for your work. Two questions confuse me:
(1) I think CV_temp is the covariance matrix corresponding to each class,
but the following augmentation happens on y, and y is the logits. If the features' shape is [N, A], then y's shape is [N, C].
As your paper says we should augment the features, shouldn't we augment the features rather than y?
Since sigma2 is [N, C] and the features are [N, A], if I'd like to augment the features directly, where should I modify the code?
(2) And what is the meaning of these two steps in the sigma2 computation?
I'm new to CV, and I learned a lot from your work. Can you share the code of Appendix C if convenient? Thank you.
Hello, an error happens:

```
y, features = model(x, isda=True)
File "C:\ProgramData\Anaconda3\envs\chango1\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'isda'
```

Can you help me?
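That TypeError usually means the loaded model class defines a plain forward(self, x), while the ImageNet-style ISDA wrapper calls model(x, isda=True). The model's forward must accept that keyword. A minimal illustrative pattern (not the repo's actual model class):

```python
import torch
import torch.nn as nn

class IsdaReadyModel(nn.Module):
    """forward() accepts an `isda` keyword so the ISDA loss wrapper can
    request (logits, features) instead of logits alone."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(6, 8)   # stand-in for the real backbone
        self.fc = nn.Linear(8, 4)

    def forward(self, x, isda=False):
        feats = torch.relu(self.body(x))
        logits = self.fc(feats)
        return (logits, feats) if isda else logits

m = IsdaReadyModel()
out, feats = m(torch.randn(2, 6), isda=True)  # no TypeError now
print(out.shape, feats.shape)  # torch.Size([2, 4]) torch.Size([2, 8])
```

If you are using a third-party model, adding this keyword (and the dual return) to its forward is the small change the wrapper expects.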