the-record-on-Cross-view-Geo-localization

Final results: Drone->Satellite: Recall@1 82.84, AP 85.31; Satellite->Drone: Recall@1 88.45, AP 77.60 (image)

Original LPN: Recall@1 75.93, AP 79.14
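For reference, Recall@1 and AP for a single query can be computed from a ranked gallery roughly as follows. This is a minimal sketch of the textbook definitions. The function names and the toy labels are illustrative, and the evaluation script behind the numbers in this log may use a different AP variant (for example an interpolated one).

```python
def recall_at_k(ranked_labels, query_label, k=1):
    """1.0 if any of the top-k ranked gallery labels matches the query, else 0.0."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels, query_label):
    """Mean of the precision values taken at each rank where a true match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: gallery labels sorted by descending similarity to the query.
ranked = ["b", "a", "c", "a"]
r1 = recall_at_k(ranked, "a", k=1)
ap = average_precision(ranked, "a")
```

The reported scores would then be the averages of these per-query values over all queries, expressed as percentages.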

week1:

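Added a local branch after layer3 of ResNet-50. Because the feature map is 16x16, a 5x5 depthwise conv (dwconv) serves as local attention: Recall@1 76.2, AP 79.41

A 7x7 conv is just large enough to cover the adjacent part: Recall@1 76.15, AP 79.41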

input: image

ours: image

LPN: image

Added this local attention to both layer3 and layer [sic], together with a residual connection: Recall@1 75.95, AP 79.22

Without the residual: Recall@1 75.15, AP 78.34
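One plausible reading of the local attention tried above is a depthwise convolution whose output, after a sigmoid, gates the feature map, with an optional residual connection. The sketch below is framework-free Python written for illustration. The sigmoid gate, the zero padding, the averaging kernels, and the names `depthwise_conv2d` and `local_attention` are assumptions, not taken from the original code.

```python
import math


def depthwise_conv2d(x, kernels):
    """Depthwise conv with zero padding: channel c is filtered only by kernels[c]."""
    out = []
    for channel, kernel in zip(x, kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        pad = k // 2
        filtered = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernel[di][dj] * channel[ii][jj]
                filtered[i][j] = acc
        out.append(filtered)
    return out


def local_attention(x, kernels, residual=True):
    """Gate x with sigmoid(dwconv(x)); optionally add the input back."""
    gates = depthwise_conv2d(x, kernels)
    out = []
    for channel, gate in zip(x, gates):
        rows = []
        for x_row, g_row in zip(channel, gate):
            row = []
            for v, g in zip(x_row, g_row):
                a = 1.0 / (1.0 + math.exp(-g))
                row.append(v * a + (v if residual else 0.0))
            rows.append(row)
        out.append(rows)
    return out


# Toy input: 2 channels of 16x16 ones, with 5x5 averaging kernels as stand-in weights.
x = [[[1.0] * 16 for _ in range(16)] for _ in range(2)]
kernels = [[[1.0 / 25] * 5 for _ in range(5)] for _ in range(2)]
y = local_attention(x, kernels, residual=True)
```

Switching the kernels to 7x7 only changes the `kernels` argument. On a 16x16 map a 7x7 window reaches 3 pixels to each side.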

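LPN: 1. Features from the individual layers do not necessarily match up with each other, and the outermost layer's features did not bring much gain.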

image

image

This recalls the dynamic-programming matching method from an earlier re-ID work.

image

It also recalls a soft partition method based on PCB.

image

Using this soft partition method:

Trained the part classifier for 60 epochs, then trained the full model for 50 epochs: Recall@1 56.5, AP 61.88

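On closer inspection, the code only splits the 2048 channels into 4 groups; it does not split the spatial region into 4 parts. So a possible direction is to start from LPN's 4 parts and learn which features to shuffle between the parts as a refinement step.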
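The distinction noted above, splitting channels versus splitting the spatial region, can be sketched as follows. This is a framework-free illustration on a toy feature map. The square-ring layout follows the general idea of LPN, but LPN's own implementation may build the rings differently, and all function names here are hypothetical.

```python
def ring_index(i, j, size, num_parts):
    """Index of the square ring containing pixel (i, j); 0 is the centre."""
    # Distance to the nearest border, counted in pixels from the outside in.
    border_dist = min(i, j, size - 1 - i, size - 1 - j)
    ring_width = size // (2 * num_parts)  # assumes size is divisible by 2 * num_parts
    return num_parts - 1 - min(border_dist // ring_width, num_parts - 1)


def ring_partition_pool(feature_map, num_parts=4):
    """Average-pool a C x H x W map inside each square ring: num_parts vectors of length C."""
    size = len(feature_map[0])
    sums = [[0.0] * len(feature_map) for _ in range(num_parts)]
    counts = [0] * num_parts
    for i in range(size):
        for j in range(size):
            p = ring_index(i, j, size, num_parts)
            counts[p] += 1
            for c, channel in enumerate(feature_map):
                sums[p][c] += channel[i][j]
    return [[s / counts[p] for s in sums[p]] for p in range(num_parts)]


def channel_split_pool(feature_map, num_parts=4):
    """Globally average-pool every channel, then cut the channel vector into groups."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    group = len(pooled) // num_parts
    return [pooled[g * group:(g + 1) * group] for g in range(num_parts)]


# Toy map: 8 channels (instead of 2048) of 16x16, channel c filled with the value c.
feature_map = [[[float(c)] * 16 for _ in range(16)] for c in range(8)]
ring_parts = ring_partition_pool(feature_map)    # 4 vectors, each over all 8 channels
channel_groups = channel_split_pool(feature_map)  # 4 vectors, each over 2 channels
```

Each ring part sees every channel but only one region, while each channel group sees every position but only a slice of the channels.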

2. Start from vegetation: keep the attention off vegetation as much as possible.

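week2: On the ResNet-50 feature map, compute a weight for each pixel position to obtain a 16x16 heatmap, then apply a conv to get 4 channels, and finally avgpool over those 4 channels. (image)

Initialized from the LPN pretrained model: Recall@1 58.02, AP 62.96. Trained from scratch: Recall@1 48.22, AP 53.33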
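A rough sketch of the week2 pipeline described above, under several assumptions: the per-pixel weight is taken as the channel mean, the conv is a 1x1 conv from 1 to 4 channels followed by a softmax over the 4 channels, and the final avgpool is a mask-weighted average of the backbone features. None of these details are confirmed by the notes, and all names and parameter values are hypothetical.

```python
import math
import random


def pixel_summary(feature_map):
    """Collapse C x H x W features to one H x W heatmap by averaging over channels."""
    c = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [[sum(feature_map[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]


def part_masks(heatmap, weights, biases):
    """1x1 conv from 1 to len(weights) channels, then softmax over those channels per pixel."""
    masks = [[[0.0] * len(heatmap[0]) for _ in heatmap] for _ in weights]
    for i, row in enumerate(heatmap):
        for j, v in enumerate(row):
            logits = [wt * v + b for wt, b in zip(weights, biases)]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            total = sum(exps)
            for p, e in enumerate(exps):
                masks[p][i][j] = e / total
    return masks


def masked_avgpool(feature_map, mask):
    """Mask-weighted average of every channel: one vector of length C."""
    norm = sum(map(sum, mask))
    pooled = []
    for channel in feature_map:
        weighted = sum(v * m for c_row, m_row in zip(channel, mask)
                       for v, m in zip(c_row, m_row))
        pooled.append(weighted / norm)
    return pooled


random.seed(0)
# Toy stand-in for a backbone output: 3 channels instead of 2048, 16x16 positions.
features = [[[random.random() for _ in range(16)] for _ in range(16)] for _ in range(3)]
heatmap = pixel_summary(features)
# Hypothetical 1x1-conv parameters; in training these would be learned.
masks = part_masks(heatmap, weights=[-4.0, -1.0, 1.0, 4.0], biases=[0.0] * 4)
part_features = [masked_avgpool(features, m) for m in masks]
```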

Applying the paration loss to the 4 channels: Recall@1 63.55, AP 67.97

image image

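The paration loss only supervises the variance. In addition, the gradient of log is large near zero and small far from it, so the loss was changed to: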

image

Recall 71.38, AP 75.29 (image)
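The remark above about the gradient of log can be made explicit. For a generic positive argument x (the actual argument of the loss is only shown in the image and is not reconstructed here):

```latex
\frac{d}{dx}\log x = \frac{1}{x},
\qquad
\lim_{x \to 0^{+}} \frac{1}{x} = \infty,
\qquad
\lim_{x \to \infty} \frac{1}{x} = 0 .
```

So a log-based term pushes hard when its argument is near zero and hardly at all when the argument is large.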

week3: Changed the loss to the power 0.5: Recall@1 68.8, AP 72.92

image

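Recall@1 69.09, AP 72.2. Changing the RPP structure from 2 conv layers to 3 or 4 conv layers both caused a severe drop.

Replaced the pixel summary operation with a 1x1 conv: Recall@1 69.07, AP 73.17

Replaced avgpool with maxpool: Recall@1 70.54, AP 74.17

Added a non-local block to RPP, after the two conv layers: Recall@1 70.11, AP 74.10. Visualization of the 4 parts: (image)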

Generated the 4 partial masks of LPN and trained the 4 heatmaps produced by RPP against them with smoothL1loss + paration loss: Recall@1 61.75, AP 66.6
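The mask supervision above can be sketched as follows. The ring masks are an assumed rendering of LPN's 4 parts as binary square rings on a 16x16 grid, and `smooth_l1` follows the usual smooth L1 definition with threshold `beta`. The paration loss is left out because its formula is not given in these notes. All names are hypothetical.

```python
def ring_mask(size, part, num_parts=4):
    """Binary size x size mask of one square ring; part 0 is the centre."""
    width = size // (2 * num_parts)  # assumes size is divisible by 2 * num_parts
    mask = []
    for i in range(size):
        row = []
        for j in range(size):
            border = min(i, j, size - 1 - i, size - 1 - j)
            ring = num_parts - 1 - min(border // width, num_parts - 1)
            row.append(1.0 if ring == part else 0.0)
        mask.append(row)
    return mask


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss over two equally shaped 2-D maps."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            d = abs(p - t)
            # Quadratic below beta, linear above it.
            total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
            n += 1
    return total / n


# Toy heatmaps: an untrained RPP that spreads weight evenly over the 4 parts.
heatmaps = [[[0.25] * 16 for _ in range(16)] for _ in range(4)]
masks = [ring_mask(16, p) for p in range(4)]
mask_loss = sum(smooth_l1(h, m) for h, m in zip(heatmaps, masks))
```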

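Visualization of the 4 parts: (image)

Added one more conv3x3 and conv1x1 layer to RPP to improve heatmap generation: Recall@1 62.10, AP 66.62. Visualization of the 4 parts: (image)

It still seems that the pixel summary step may lose too much information. Starting from the ResNet output, used 3 conv layers to gradually compress the 2048 channels down to 4, with smoothL1loss + paration loss: Recall@1 64.69, AP 69.04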

image

Used 3 CBAM attention modules to generate 3 masks; the results were very poor.

week4:

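Split into only 2 parts, one central and one peripheral. Applied a Unified Attention Fusion Module at layer2-4 (mean and max are computed separately within part1 and part2), then concatenated the part1 and part2 features and added bnn at the output. The larger the center loss weight, the more the attention concentrates on the center. lambda=0.001: Recall@1 73.29, AP 76.74; lambda=0.0005: Recall@1 70.74, AP 74.37; lambda=0.0001: Recall@1 74.13, AP 77.49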

Grad-CAM for layer2, layer3, layer4: (image)

Applied a separate loss to each of part1 and part2. The network can now attend to the surrounding buildings, but accuracy drops a lot.

image

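Used GeM pooling on part2: Recall@1 62.50, AP 66.76. Hypothesis: performance is worse when the attention falls on the periphery, which may mean the network has not yet learned the relationship between the surrounding environment and the buildings and so treats the surroundings somewhat like noise. The next step is therefore to get the network to learn the relationship between the two.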
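For reference, GeM (generalized-mean) pooling of a single channel can be sketched as below. This is the standard formula written in plain Python. The default `p=3.0` and the clamp `eps` are common choices assumed here, not values taken from these experiments.

```python
def gem_pool(channel, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one H x W channel: (mean(x^p))^(1/p)."""
    # Clamp to eps so that fractional powers stay well defined.
    values = [max(v, eps) for row in channel for v in row]
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


feature = [[1.0, 2.0], [3.0, 4.0]]
avg_like = gem_pool(feature, p=1.0)     # p = 1 reduces to average pooling
gem_default = gem_pool(feature, p=3.0)  # lies between the mean and the max
max_like = gem_pool(feature, p=50.0)    # large p approaches max pooling
```

Because p interpolates between average and max pooling, GeM puts more weight on strongly activated positions than avgpool does.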

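week5: After GeM pooling, the part1 and part2 features are concatenated and fused by a single conv1x1 that compresses the channel dimension from 4096 to 2048: Recall@1 75.07, AP 78.36. Adding a Barlow Twins loss on the concatenated part1 and part2 features to reduce feature redundancy: Recall@1 79.56, AP 82.30, and Satellite->Drone Recall@1 84.74, AP 76.72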
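A minimal sketch of a Barlow Twins style loss, assuming `z_a` and `z_b` are batches of embeddings for paired samples (for example drone and satellite features of the same location; the notes do not say which pairing was used). The off-diagonal weight is an illustrative value and the function names are hypothetical.

```python
import math


def standardize(batch):
    """Zero-mean, unit-std normalisation of each feature dimension across the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][k] = (batch[i][k] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Push the cross-correlation matrix of z_a and z_b towards the identity."""
    za, zb = standardize(z_a), standardize(z_b)
    n, d = len(za), len(za[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(za[s][i] * zb[s][j] for s in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2          # invariance term
            else:
                loss += off_diag_weight * c_ij ** 2  # redundancy-reduction term
    return loss


# Two toy batches of 4 samples with 2 feature dimensions each.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
loss_decorrelated = barlow_twins_loss(decorrelated, decorrelated)
loss_redundant = barlow_twins_loss(redundant, redundant)
```

The off-diagonal term is what penalizes redundant feature dimensions: the second toy batch, whose two dimensions are identical, gets a higher loss than the first.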

Contributors: xcco1
