damo-cv / ada-nets

This is an official implementation for "Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space" accepted at ICLR 2022.

License: MIT License

Python 96.68% Shell 3.32%

face-clustering gcn graph graph-structure-learning

ada-nets's Introduction

Ada-NETS

News

🔥 An improved method for face clustering (B-Attention) is accepted by NeurIPS 2022!
🔥 Ada-NETS is accepted by ICLR 2022!

Introduction

This paper presents Ada-NETS, a novel algorithm that addresses the noisy-edge problem that arises when building the graph for GCN-based face clustering. In Ada-NETS, the features are first transformed into the structure space to make the similarity metrics more accurate. An adaptive neighbour discovery method then finds neighbours for every sample, guided by a heuristic quality criterion. The discovered neighbour relations yield a graph with clean and rich edges, which is fed to GCNs to obtain state-of-the-art results on face, clothes, and person clustering tasks.

Main Results

Getting Started

Install

Clone this repo

git clone https://github.com/Thomas-wyh/Ada-NETS
cd Ada-NETS

Create a conda virtual environment and activate it

conda create -n adanets python=3.6 -y
conda activate adanets

Install PyTorch, cudatoolkit, and other requirements:

conda install pytorch==1.2 torchvision==0.4.0a0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data preparation

The process of clustering on the MS-Celeb part1 is as follows:

The original data files are from here (the feature and label files of MSMT17 used in Ada-NETS are here). For convenience, we L2-normalize the features and convert them to .npy format. The original features have 256 dimensions. The file structure should look like:

data
├── feature
│   ├── part0_train.npy
│   └── part1_test.npy
└── label
    ├── part0_train.npy
    └── part1_test.npy
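
The conversion step can be sketched as follows. This is a minimal sketch, not the script used by the authors: it assumes the raw features are stored as a flat float32 binary file, and the helper names `l2_normalize` and `convert_to_npy` as well as the paths passed to them are hypothetical.

```python
import numpy as np

def l2_normalize(feats, eps=1e-12):
    """Scale every row of `feats` to unit L2 norm."""
    feats = np.asarray(feats, dtype=np.float32)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return feats / np.maximum(norms, eps)

def convert_to_npy(bin_path, npy_path, dim=256):
    """Hypothetical helper: read raw float32 features, normalise, save as .npy."""
    feats = np.fromfile(bin_path, dtype=np.float32).reshape(-1, dim)
    np.save(npy_path, l2_normalize(feats))

# Toy check on random data standing in for real 256-d features.
rng = np.random.default_rng(0)
toy = l2_normalize(rng.normal(size=(4, 256)))
row_norms = np.linalg.norm(toy, axis=1)
```

After normalisation every row has unit length, so inner products between rows are cosine similarities.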

Build the $k$NN by faiss:

sh script/faiss_search.sh
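
To illustrate what a k-NN search produces (a neighbour index matrix and a matching distance matrix), here is a brute-force NumPy stand-in. It is not faiss and not the repo's script; it is only practical for small inputs, and the function name `brute_force_knn` is hypothetical.

```python
import numpy as np

def brute_force_knn(feats, k):
    """Return (dists, nbrs), each of shape (n, k), sorted by ascending squared L2 distance.

    The query itself is included and normally comes first, at distance ~0.
    """
    feats = np.asarray(feats, dtype=np.float32)
    sq_norms = (feats ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * feats @ feats.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :k]
    dists = np.take_along_axis(d2, nbrs, axis=1)
    return dists, nbrs

# Two tight pairs of unit vectors.
toy = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.14, 0.99]], dtype=np.float32)
toy /= np.linalg.norm(toy, axis=1, keepdims=True)
dists, nbrs = brute_force_knn(toy, k=2)
```

Each row of `nbrs` lists a sample's nearest neighbours, and the same row of `dists` holds the corresponding distances.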

Obtain the top$K$ neighbours and distances of each vertex in the structure space:

sh script/struct_space.sh

Obtain the best neighbours by the candidate neighbours quality criterion:

sh script/max_Q_ind.sh
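
The idea of picking the best number of neighbours from a ranked candidate list can be sketched as below. This README does not spell out the criterion, so the sketch assumes it is an F-beta score of precision and recall over the first j candidates; that form, the `beta=0.5` default, the way recall is normalised, and the function name `best_prefix_length` are all assumptions, not the contents of script/max_Q_ind.sh.

```python
import numpy as np

def best_prefix_length(query_label, cand_labels, beta=0.5):
    """Return (j, q): the 1-based prefix length j that maximises F_beta, and its score.

    Assumption: precision and recall are measured against the query's label,
    and recall is taken relative to the positives inside the candidate list.
    """
    hits = (np.asarray(cand_labels) == query_label).astype(np.float64)
    total_pos = hits.sum()
    if total_pos == 0:
        return 1, 0.0  # no correct candidate at all; keep a single neighbour
    tp = np.cumsum(hits)
    j = np.arange(1, len(hits) + 1)
    precision = tp / j
    recall = tp / total_pos
    denom = beta ** 2 * precision + recall
    # F_beta is taken as 0 wherever precision and recall are both 0.
    safe_denom = np.where(denom > 0, denom, 1.0)
    f_beta = np.where(denom > 0, (1 + beta ** 2) * precision * recall / safe_denom, 0.0)
    best = int(np.argmax(f_beta))
    return best + 1, float(f_beta[best])

# Candidates ranked by distance; the query has label 7.
j_best, q_best = best_prefix_length(7, [7, 7, 7, 3, 7, 5, 5, 2])
```

In this toy case the score peaks at the third candidate, just before the first wrong neighbour appears, so the remaining candidates would be dropped.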

Train the Adaptive Filter

Train the adaptive filter based on the data prepared above:

sh script/train_AND.sh

Train the GCN and cluster faces

Generate the clean yet rich graph:

sh script/gene_adj.sh

Train the GCN to obtain enhanced vertex features:

sh script/train_GCN.sh

Cluster the faces:

sh script/cluster.sh

It will print the clustering evaluation results. The BCubed F-score is about 91.4 and the Pairwise F-score is about 92.7.
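
For reference, the two metrics can be computed as in the sketch below. It follows the standard definitions of the pairwise and BCubed F-scores and is not the repo's evaluation code; the function names are hypothetical.

```python
from collections import Counter

def _f_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0

def pairwise_fscore(gt, pred):
    """Precision, recall and F-score over all same-cluster sample pairs."""
    def n_pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts)
    both = n_pairs(Counter(zip(gt, pred)).values())   # pairs joined in gt and pred
    pred_pairs = n_pairs(Counter(pred).values())
    gt_pairs = n_pairs(Counter(gt).values())
    precision = both / pred_pairs if pred_pairs else 0.0
    recall = both / gt_pairs if gt_pairs else 0.0
    return precision, recall, _f_score(precision, recall)

def bcubed_fscore(gt, pred):
    """Per-sample precision and recall, averaged over all samples."""
    gt_size, pred_size = Counter(gt), Counter(pred)
    overlap = Counter(zip(gt, pred))
    n = len(gt)
    precision = sum(overlap[(g, p)] / pred_size[p] for g, p in zip(gt, pred)) / n
    recall = sum(overlap[(g, p)] / gt_size[g] for g, p in zip(gt, pred)) / n
    return precision, recall, _f_score(precision, recall)

# One sample of class 0 is wrongly merged into the cluster of class 1.
gt = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
pw = pairwise_fscore(gt, pred)
bc = bcubed_fscore(gt, pred)
```

The pairwise score weights large clusters more heavily because it counts pairs, while BCubed averages per sample, which is why the two numbers generally differ.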

Acknowledgement

This code is based on two publicly available face clustering codebases and on dmlc/dgl.

The k-nearest neighbor search tool uses faiss.

Citing Ada-NETS

@inproceedings{wang2022adanets,
  title={Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space},
  author={Yaohua Wang and Yaobin Zhang and Fangyi Zhang and Senzhang Wang and Ming Lin and YuQi Zhang and Xiuyu Sun},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2022}
}

@misc{wang2022adanets,
      title={Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space}, 
      author={Yaohua Wang and Yaobin Zhang and Fangyi Zhang and Senzhang Wang and Ming Lin and YuQi Zhang and Xiuyu Sun},
      year={2022},
      eprint={2202.03800},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

ada-nets's Issues

Using msmt17_feature_label.zip does not reach the paper's results; cos_sim_thres and sim_thres are not defined in cluster.py

Using the dataset provided at http://idstcv.oss-cn-zhangjiakou.aliyuncs.com/Ada-NETS/MSMT17/msmt17_feature_label.zip, I cannot reproduce the results in the paper.
With top changed to 40: bcubed: ave_pre: 0.8325, ave_rec: 0.6552, fscore: 0.7333, cluster_num: 15176
With top changed to 80: bcubed: ave_pre: 0.7136, ave_rec: 0.6641, fscore: 0.6879, cluster_num: 14958, singleton_num: 11450
Also, neither cos_sim_thres nor sim_thres is defined in cluster.py.
How should I modify the code?

NameError: name 'knn' is not defined

Also, when I run "bash script/gene_adj.sh" I get this:
Traceback (most recent call last):
  File "tool/gene_adj_adanets.py", line 3, in <module>
    from knn import fast_knns2spmat, knns2ordered_nbrs, fast_knns2spmat_adaptivek
  File "/home/nigar/Ada-NETS/tool/knn.py", line 289, in <module>
    class knn_brute_force(knn):
NameError: name 'knn' is not defined

TypeError: in method 'GpuIndexIVFFlat_train', argument 3 of type 'float const *'

I use custom .npy embeddings and labels from my own face image dataset. The structure and naming are the same, but when I run "sh script/faiss_search.sh" I get this error: TypeError: in method 'GpuIndexIVFFlat_train', argument 3 of type 'float const *'.
I guess I extracted the embeddings differently from the ones you provide. Any help is appreciated; I want to cluster a custom dataset.
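
One common cause of this kind of faiss TypeError is an input array that is not C-contiguous float32; NumPy defaults to float64, so embeddings produced by other tools often need a cast. Whether that is the cause in this report is not confirmed. A minimal sketch of the cast, with the hypothetical helper name `to_faiss_input`:

```python
import numpy as np

def to_faiss_input(feats):
    """Return `feats` as a C-contiguous float32 array of shape (n, dim)."""
    feats = np.ascontiguousarray(feats, dtype=np.float32)
    if feats.ndim != 2:
        raise ValueError(f"expected a 2-D (n, dim) array, got shape {feats.shape}")
    return feats

# Stand-in for custom embeddings that were saved as float64.
custom = np.random.default_rng(0).normal(size=(5, 256))
fixed = to_faiss_input(custom)
```

Applying the cast once before saving the .npy files keeps the later steps unchanged.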

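A question I don't understand

Hello author, regarding the structure space: if struct_space can change the overall structure of the feature set, then clustering or recognition could already be done well directly, so I don't quite understand how much the subsequent adaptive step adds.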

failure.... need help

When I ran "pip install -r requirements.txt", the following error was shown:
ERROR: Could not find a version that satisfies the requirement faiss==1.6.3 (from versions: none)
ERROR: No matching distribution found for faiss==1.6.3
So I commented out the lines as "#faiss==1.6.3" and "#apex==0.1" (which failed for the same reason); the remaining requirements installed successfully.
How should I deal with the version issue? Are there any alternative versions of faiss and apex?
Thanks sincerely!

failed to get the same performance

Thank you for your great work.
I ran the train and test scripts on the ms1m dataset these days, but I got poor performance.
Here are my results on ms1m part1_test:
Bcubed fscore: 0.6264, pairwise fscore: 0.4994.
I wonder which step went wrong. Could you share the training log?
When I run the train_GCN script ,the last loss is:
loss:1.9265 bclloss_pos:0.007 bclloss_neg:1.919

Expect number of features to match number of nodes (len(u)). Got 30152 and 4819 instead.

Hi, I got the error in the title when running the script "sh script/train_GCN.sh".

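The paper does not give the value of η

Equation 2 of the paper defines the new similarity as a weighted sum of the Jaccard similarity and the cosine similarity, using a parameter η. However, I could not find the value of this parameter used in the experiments. In the code I found a parameter lamb=0.3, but it weights distances.

So is lamb=0.3 in the code equivalent to η=0.7 in the paper?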
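
The relation between a weight on distances and a weight on similarities can be worked out directly. The following is a sketch under two assumptions that are not confirmed here: that each distance is one minus the corresponding similarity, and that the code combines them as $(1-\lambda)\,d^{\mathrm{Jac}} + \lambda\,d^{\cos}$ with $\lambda$ standing for lamb.

$$
d = (1-\lambda)\,(1 - s^{\mathrm{Jac}}) + \lambda\,(1 - s^{\cos}) = 1 - \left[(1-\lambda)\,s^{\mathrm{Jac}} + \lambda\,s^{\cos}\right]
$$

Under these assumptions, sorting by ascending $d$ is the same as sorting by descending weighted similarity with unchanged weights: $1-\lambda$ on the Jaccard term and $\lambda$ on the cosine term. Whether lamb=0.3 then corresponds to η=0.3 or η=0.7 depends only on which of the two terms η multiplies in Equation 2.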

AttributeError: 'DGLBlock' object has no attribute 'create_format_'

When running 'sh script/train_GCN.sh', I get the error below. My dgl version is 0.6.1. Have you met this before?

File "train.py", line 58, in collate
block1.create_format_()
AttributeError: 'DGLBlock' object has no attribute 'create_format_'
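
A possible workaround is to call whichever spelling of the method the installed DGL provides. This sketch assumes that newer DGL releases name the method `create_formats_` while older ones used `create_format_`; the helper `create_all_formats` is hypothetical, and the dummy classes below only stand in for DGL blocks so that the sketch runs without DGL installed.

```python
def create_all_formats(graph):
    """Call the sparse-format creation method under whichever name exists.

    Assumption: newer DGL spells it `create_formats_`, older DGL `create_format_`.
    Returns the name of the method that was called.
    """
    for name in ("create_formats_", "create_format_"):
        method = getattr(graph, name, None)
        if callable(method):
            method()
            return name
    raise AttributeError("object has neither create_formats_ nor create_format_")

# Dummy stand-ins for blocks from a newer and an older DGL release.
class NewStyleBlock:
    def __init__(self):
        self.called = False
    def create_formats_(self):
        self.called = True

class OldStyleBlock:
    def __init__(self):
        self.called = False
    def create_format_(self):
        self.called = True

new_block, old_block = NewStyleBlock(), OldStyleBlock()
used_new = create_all_formats(new_block)
used_old = create_all_formats(old_block)
```

In train.py the call `block1.create_format_()` would then become `create_all_formats(block1)`; installing the DGL version the code was written against is the other option.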

Got result with 0 and nan using MS-1M

Problem Description

Hi, I have some questions about this code. I converted the part0_train and part1_test parts of the MS1M dataset into npy files as described in the README. The features were L2-normalized and the label files read, then both were saved directly as npy files. I then followed the steps suggested by the README. In the end I only computed the pairwise metric (sim_thres was increased from 0.5 to 0.8 in steps of 0.05), but the results always showed ave_pre as 0, and ave_rec and fscore as nan. How can I solve this?

By the way, when running the faiss_search.sh file, since my GPU is not compatible with faiss-gpu, I used faiss-cpu to solve this problem.

Environment

GPU A100
CUDA 11.0
Python3.6 or Python 3.9
torch 1.7.1+cu110 or torch 1.10.1

FileNotFoundError: [Errno 2] No such file or directory: '../data/adj/test/adj_adanets.npz'

The README says: "Obtain the top$K$ neighbours and distances of each vertex in the structure space." Are the top$K$ neighbours in the structure space the same as the top K neighbours selected by faiss? If so, why are the SNRs of KNN and KNN+SS different?

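Question about the MSMT17 dataset

Congratulations on your team's paper being accepted. Do you plan to release the extracted features of the MSMT17 dataset? Our group is also working on related research and would very much like to follow your work. Wishing you all the best in life and work.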

FileNotFoundError: [Errno 2] No such file or directory: 'AND/outpath/k_infer_pred.npy'

FileNotFoundError: [Errno 2] No such file or directory: 'outpath/ckpt_40000.pth'

Dear author,

When I run "bash script/train_AND.sh"

I get: FileNotFoundError: [Errno 2] No such file or directory: 'outpath/ckpt_40000.pth'

All the other necessary folders are created, but the 'AND/outpath' folder is actually empty. I downloaded the pretrained checkpoints, but the one mentioned in the error is not among them.
Maybe I missed downloading something else?

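Dataset question

Hello,

First of all, congratulations on your team's paper being accepted.

Regarding the dataset you provide: the data downloaded from the place marked "2" in the figure above has 30248 training samples and 93820 test samples, whereas the data downloaded through the link at the place marked "1" (Part1 (584K): GoogleDrive or BaiduYun (passwd: geq5)) has 576494 training samples and 584013 test samples.
On the data downloaded from the place marked "2", that is, 30248 training samples and 93820 test samples, the final results fall far short of those reported in your paper. May I ask how many training and test samples you used when training and testing the model?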

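The similarity computation in the struct space code is inconsistent with the paper

In the paper, the similarity is computed as a weighted sum of the Jaccard distance and the cosine similarity.

The code, however, seems to do something different. If I am not mistaken, this part is implemented in tool/struct_space.py, as follows: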

sim = 1.0 * len(query_Rstarset & doc_Rstarset) / len(query_Rstarset | doc_Rstarset)
jd = 1 - sim
cd = D[query_nodeid, idx]
nd = (1-lamb) * jd + lamb * cd

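Here D is computed with faiss in tool/faiss_search.py, but is the D saved here the Euclidean distance? The code is shown below; the index search uses the Euclidean distance. Also, the values of D that I printed are in ascending order.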

quantizer = faiss.IndexFlatL2(dim)      # define the quantizer/index with L2 (Euclidean) distance; smaller is better
cpu_index = faiss.IndexIVFFlat(quantizer, dim, nlist)
cpu_index.nprobe = nprobe

Is this right, or am I missing something?
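
Some background that may help with this question: an L2 index in faiss reports squared Euclidean distances, and for unit-norm vectors the squared Euclidean distance equals 2 - 2·cos, so it decreases as the cosine similarity increases. The NumPy sketch below checks that identity and then combines a Jaccard distance with such a distance in the same way as the snippet quoted above. It is an illustration only, not the repo's code; the function names are hypothetical.

```python
import numpy as np

def jaccard_distance(set_a, set_b):
    """1 - |A ∩ B| / |A ∪ B|, mirroring `jd` in the snippet quoted above."""
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return 1.0 - len(set_a & set_b) / len(union)

def combined_distance(jd, cd, lamb=0.3):
    """Weighted sum of a Jaccard distance and a feature-space distance."""
    return (1 - lamb) * jd + lamb * cd

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 256))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

sq_l2 = float(np.sum((a - b) ** 2))  # squared Euclidean distance between unit vectors
cos_sim = float(a @ b)

jd = jaccard_distance({1, 2, 3, 4}, {3, 4, 5})
nd = combined_distance(jd, sq_l2)
```

Under the L2 normalisation described in the data preparation step, `sq_l2` therefore equals twice the cosine distance `1 - cos_sim`, which changes the scale of the second term but not the neighbour ordering it induces.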