
xcco1 / the-record-on-cross-view-geo-localization


the-record-on-Cross-view-Geo-localization

Final: Drone->Satellite: Recall@1: 82.84, AP: 85.31; Satellite->Drone: Recall@1: 88.45, AP: 77.60

Original LPN: Recall@1: 75.93, AP: 79.14
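For reference, the two metrics reported throughout these notes can be sketched as follows. This uses the standard retrieval definitions of Recall@K and average precision for a single query; the evaluation script behind the numbers above may differ in detail (for example in how AP is interpolated), and the function names are hypothetical.

```python
def recall_at_k(ranked_labels, query_label, k=1):
    """1.0 if a gallery item with the query's label appears in the top k, else 0.0."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels, query_label):
    """Mean of the precision values measured at each correct match in the ranking."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


# Toy ranking: gallery labels sorted by descending similarity to a query of class "a".
ranking = ["b", "a", "a", "c"]
r1 = recall_at_k(ranking, "a", k=1)
ap = average_precision(ranking, "a")
```

The reported figures would then be the mean of these per-query values over all queries, expressed as percentages.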

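week1:

Added a local branch after layer3 of ResNet-50. Since the feature map is 16x16, a 5x5 depthwise conv (dwconv) serves as local attention. Recall@1: 76.2, AP: 79.41

A 7x7 conv covers exactly the adjacent parts: Recall@1: 76.15, AP: 79.41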

Visualization panels: input, ours, LPN

Adding this local attention to both layer3 and layer, together with a residual connection: Recall@1: 75.95, AP: 79.22

Without the residual: Recall@1: 75.15, AP: 78.34
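A minimal sketch of the local attention described above, assuming the depthwise-conv response is passed through a sigmoid and multiplied back onto the feature map (the notes do not spell out the gating). It is written in plain Python on a toy map for clarity; `depthwise_conv2d` and `local_attention` are hypothetical names, and the kernels here are fixed averaging kernels rather than learned weights.

```python
import math


def depthwise_conv2d(fmap, kernels):
    """Zero-padded depthwise conv: each channel is filtered by its own k x k kernel."""
    out = []
    for chan, kern in zip(fmap, kernels):
        k = len(kern)
        pad = k // 2
        h, w = len(chan), len(chan[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kern[di][dj] * chan[y][x]
                res[i][j] = acc
        out.append(res)
    return out


def local_attention(fmap, kernels, residual=True):
    """Gate every position by sigmoid(depthwise response); optionally add the input back."""
    resp = depthwise_conv2d(fmap, kernels)
    out = []
    for chan, r in zip(fmap, resp):
        new_chan = []
        for row, rrow in zip(chan, r):
            new_row = []
            for v, a in zip(row, rrow):
                gate = 1.0 / (1.0 + math.exp(-a))  # sigmoid
                new_row.append(v * gate + (v if residual else 0.0))
            new_chan.append(new_row)
        out.append(new_chan)
    return out


# Toy input: 2 channels on a 16x16 map, 5x5 averaging kernels (assumed values).
fmap = [[[1.0] * 16 for _ in range(16)] for _ in range(2)]
kernels = [[[1.0 / 25] * 5 for _ in range(5)] for _ in range(2)]
out = local_attention(fmap, kernels, residual=True)
```

Switching `residual` to `False` gives the no-residual variant compared above. In a real model the same idea would typically be a grouped convolution with one group per channel.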

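Observations on LPN:

1. Features do not necessarily pair up from layer to layer, and the outermost layer's features bring little gain.

This recalls an earlier re-ID work that matches parts with dynamic programming,

as well as a soft partition method based on PCB.

Using this soft partition method:

Trained the part classifier for 60 epochs, then the full model for 50 epochs. Recall@1: 56.5, AP: 61.88

On closer inspection, the code only splits the 2048 channels into 4 groups rather than splitting the spatial region into 4 parts. Could we instead start from LPN's 4 parts and learn which features to shuffle between the parts for refinement?

2. Start from vegetation: keep the attention off vegetation as much as possible.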

week2: On the ResNet-50 feature map, compute a weight for every pixel position to obtain a 16x16 heatmap, use conv to turn it into 4 channels, then apply avgpool on the 4 channels.
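One plausible reading of this soft partition, sketched under assumptions: the 4 channels are treated as per-pixel part logits, a softmax over parts turns them into soft masks, and each part descriptor is the mask-weighted average of the feature map. The notes do not confirm this exact form, and `soft_part_pool` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a short list."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def soft_part_pool(feature, part_logits):
    """feature: H x W values of one channel; part_logits: P maps of H x W logits.

    Returns P pooled values, each a soft-mask-weighted average of the feature.
    """
    num_parts = len(part_logits)
    weighted = [0.0] * num_parts
    mass = [0.0] * num_parts
    for i in range(len(feature)):
        for j in range(len(feature[0])):
            probs = softmax([part_logits[p][i][j] for p in range(num_parts)])
            for p, pr in enumerate(probs):
                weighted[p] += pr * feature[i][j]
                mass[p] += pr
    return [wt / max(ms, 1e-12) for wt, ms in zip(weighted, mass)]


# Toy example with 2 parts: part 0 claims the top row, part 1 the bottom row.
feature = [[1.0, 2.0], [3.0, 4.0]]
logits = [
    [[20.0, 20.0], [-20.0, -20.0]],
    [[-20.0, -20.0], [20.0, 20.0]],
]
pooled = soft_part_pool(feature, logits)
```

With sharp logits this reduces to hard region pooling; with equal logits every part collapses to the global average, which is one way such a module can fail to separate parts.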

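With LPN pretrained weights: Recall@1: 58.02, AP: 62.96. Trained from scratch: Recall@1: 48.22, AP: 53.33

Applying the paration loss to the four channels: Recall@1: 63.55, AP: 67.97

The paration loss only supervises the variance. Also, because log has a large gradient near zero and a small gradient far from zero, the loss was changed to:

Recall: 71.38, AP: 75.29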

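week3: Changed the loss to the 0.5 power: Recall@1: 68.8, AP: 72.92

Recall@1: 69.09, AP: 72.2. Changing the rpp structure from 2 conv layers to 3 or 4 layers both caused a severe drop.

Replaced the pixel summary operation with a 1x1 conv: Recall@1: 69.07, AP: 73.17

Replaced avgpool with maxpool: Recall@1: 70.54, AP: 74.17

Added a non-local block in rpp, after the two conv layers: Recall@1: 70.11, AP: 74.10. Visualization of the 4 parts

Generated the 4 partial masks of LPN and applied smoothL1loss + paration loss between them and the 4 heatmaps produced by rpp: Recall@1: 61.75, AP: 66.6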
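A minimal sketch of how such target masks and the smooth L1 term could be built, assuming equal-width square rings on a 16x16 map and the common smooth L1 definition with `beta=1`. `ring_masks` and `smooth_l1` are hypothetical names, and the paration loss is omitted because its formula is not given in these notes.

```python
def ring_masks(size=16, num_parts=4):
    """Binary square-ring masks; part 0 is the center, the last part is the outermost ring.

    Assumes size is divisible by 2 * num_parts.
    """
    width = size // (2 * num_parts)  # ring thickness in pixels
    masks = [[[0] * size for _ in range(size)] for _ in range(num_parts)]
    for i in range(size):
        for j in range(size):
            border_dist = min(i, j, size - 1 - i, size - 1 - j)
            ring_from_outside = min(border_dist // width, num_parts - 1)
            masks[num_parts - 1 - ring_from_outside][i][j] = 1
    return masks


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance between two H x W maps."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            d = abs(p - t)
            total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
            count += 1
    return total / count


masks = ring_masks()
areas = [sum(map(sum, m)) for m in masks]  # pixel count of each part

# Loss between a uniform heatmap (hypothetical rpp output) and the center mask.
uniform = [[0.25] * 16 for _ in range(16)]
loss_center = smooth_l1(uniform, masks[0])
```

The part areas grow quickly from center to border, so a per-pixel loss like this weights the outer rings more heavily unless it is normalized per part.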

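Visualization of the 4 parts:

Added one more conv3x3 and conv1x1 in rpp to improve heatmap generation: Recall@1: 62.10, AP: 66.62. Visualization of the 4 parts:

Still suspect that too much information is lost at the pixel summary step. Starting from the ResNet output, 3 conv layers gradually compress the 2048 channels down to 4, trained with smoothL1loss + paration loss: Recall@1: 64.69, AP: 69.04

Used 3 CBAM attention modules to generate 3 masks; the results were very poor.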

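week4:

Split into only 2 parts, one for the center and one for the surroundings. At layer2-4, use the Unified Attention Fusion Module (mean and max are computed separately within part1 and part2), then concat the part1 and part2 features and add bnn at the output. The larger the center loss weight, the more the attention concentrates on the center. lambda=0.001: Recall@1: 73.29, AP: 76.74; lambda=0.0005: Recall@1: 70.74, AP: 74.37; lambda=0.0001: Recall@1: 74.13, AP: 77.49

Grad-CAM for layer2, layer3, layer4

Applying a separate loss to each of part1 and part2: the network now attends to the surrounding buildings, but accuracy drops a lot.

Using GeM pooling on part2: Recall@1: 62.50, AP: 66.76. Hypothesis: performance is worse when the attention is on the outer region, which may mean the network has not yet learned the relationship between the surrounding environment and the building and so tends to treat the surroundings as noise. The next step is therefore to find a way for the network to learn the relationship between the two.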
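For context, generalized-mean (GeM) pooling of one channel can be sketched as below. The exponent `p=3` and the clamp `eps` are commonly used defaults and are assumptions here; the notes do not state which values were used.

```python
def gem_pool(channel, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one H x W channel: (mean(x^p))^(1/p).

    Values are clamped to at least eps so the power is well defined.
    """
    vals = [max(v, eps) for row in channel for v in row]
    return (sum(v ** p for v in vals) / len(vals)) ** (1.0 / p)


channel = [[1.0, 3.0], [1.0, 3.0]]
avg_like = gem_pool(channel, p=1.0)   # p = 1 is plain average pooling
gem3 = gem_pool(channel, p=3.0)       # emphasizes larger activations
max_like = gem_pool(channel, p=50.0)  # large p approaches max pooling
```

GeM therefore sits between the avgpool and maxpool variants compared in week3, with `p` controlling how strongly peaks dominate.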

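week5: After GeM pooling, the part1 and part2 features are concatenated and fused by one conv1x1, which compresses the channel dimension from 4096 to 2048: Recall@1: 75.07, AP: 78.36. Adding a Barlow Twins loss on the concatenated part1 and part2 features to reduce feature redundancy: Recall@1: 79.56, AP: 82.30, and Satellite->Drone Recall@1: 84.74, AP: 76.72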
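A minimal sketch of a Barlow Twins style redundancy-reduction loss, assuming its two inputs are batches of paired embeddings (for example the fused features of two views of the same locations; the notes do not say how the pairs were formed). The weight `lambd=5e-3` and the function names are assumptions.

```python
import math


def standardize(batch):
    """Return one list per feature dimension, zero-mean and unit-variance over the batch."""
    n, d = len(batch), len(batch[0])
    cols = []
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) + 1e-12
        cols.append([(v - mean) / std for v in col])
    return cols


def barlow_twins_loss(za, zb, lambd=5e-3):
    """Push the cross-correlation matrix of za and zb toward the identity."""
    a, b = standardize(za), standardize(zb)
    n, d = len(za), len(a)
    on_diag, off_diag = 0.0, 0.0
    for i in range(d):
        for j in range(d):
            c = sum(x * y for x, y in zip(a[i], b[j])) / n  # correlation of dims i, j
            if i == j:
                on_diag += (1.0 - c) ** 2  # paired dimensions should agree
            else:
                off_diag += c ** 2         # different dimensions should decorrelate
    return on_diag + lambd * off_diag


# Toy batch of 4 samples with 2 feature dimensions.
za = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
zb_swapped = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
loss_same = barlow_twins_loss(za, za)
loss_swapped = barlow_twins_loss(za, zb_swapped)
```

The off-diagonal term is what penalizes redundant feature dimensions, which matches the stated goal of lowering feature redundancy after the concatenation.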
