The dataset is available <a href="https://www.microsoft.com/en-us/research/project/ms-

The <a href="https://github.com/davidsandberg/facenet/blob/master/src/decode_msceleb_d

Train on the Ms-Celeb-V1 dataset (facenet issue, closed)

davidsandberg commented on May 1, 2024

Train on the Ms-Celeb-V1 dataset

from facenet.

Comments (5)

ugtony commented on May 1, 2024

The decode program does work,
but the comment and the assignment of img_name/img_string are inconsistent:

# Column1: Freebase MID
# Column2: Query/Name
# Column3: ImageSearchRank
# Column4: ImageURL
# Column5: PageURL
# Column6: ImageData_Base64Encoded
.
.
.
img_name = fields[1] + '-' + fields[4] + '.' + args.output_format
img_string = fields[6]

The comment seems to describe the full ImageThumbnails version, while the assignment matches the cropped or aligned version.
The inconsistency could be fixed by changing the comment:

# Column1: Freebase MID
# Column2: ImageSearchRank
# Column3: ImageURL
# Column4: PageURL
# Column5: FaceID
# Column6: FaceRectangle_Base64Encoded (four floats, relative coordinates of UpperLeft and BottomRight corner)
# Column7: FaceData_Base64Encoded
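
To make the index mapping concrete, here is a minimal sketch of how one record of the cropped/aligned file could be split under the seven-column layout above. It assumes the records are tab-separated and that the last column is standard base64; the function name parse_cropped_line and the default output format are hypothetical and not taken from the decode program.

```python
import base64


def parse_cropped_line(line, output_format="png"):
    """Split one record into (freebase_mid, img_name, img_bytes).

    Assumes tab-separated fields in the seven-column layout listed above.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
    # Indices are 0-based: fields[1] is Column2 (ImageSearchRank),
    # fields[4] is Column5 (FaceID).
    img_name = fields[1] + "-" + fields[4] + "." + output_format
    # fields[6] is Column7 (FaceData_Base64Encoded).
    img_bytes = base64.b64decode(fields[6])
    return fields[0], img_name, img_bytes
```

With this layout the file name combines the search rank and the face ID, which keeps multiple faces cropped from the same source image apart.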


hudvin commented on May 1, 2024

There is another dataset https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/ but it's very dirty. I trained a separate model to remove the garbage. Still, it's around 10 million images and I don't have the resources to process all of them.


davidsandberg commented on May 1, 2024

Hi @hudvin!
How did you plan to remove the garbage?
I did some tests with the CASIA dataset, where I selected a subset of the images based on the distance from each image to its class center (the implementation can be found on the branch dataset_filtering). It was only intended as an experiment to validate the method, but it actually improved the accuracy (when trained on CASIA only) from 0.984 to 0.988, which was a bit surprising. The plan is to apply the same principle to the MsCeleb dataset, which contains a lot more label noise, and see what happens.
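
The class-center idea can be sketched roughly as follows. This is not the code from the dataset_filtering branch, only an illustration under assumptions: embeddings are plain lists of floats already computed by some model, distance is Euclidean, and keep_fraction is a hypothetical setting for the share of each class to keep.

```python
import math
from collections import defaultdict


def filter_by_class_center(embeddings, labels, keep_fraction=0.75):
    """Return sorted indices of the images closest to their class center."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        dim = len(embeddings[indices[0]])
        # Class center = mean embedding of all images carrying this label.
        center = [
            sum(embeddings[i][d] for i in indices) / len(indices)
            for d in range(dim)
        ]
        # Rank images by Euclidean distance to the center, nearest first.
        ranked = sorted(indices, key=lambda i: math.dist(embeddings[i], center))
        n_keep = max(1, math.ceil(keep_fraction * len(indices)))
        kept.extend(ranked[:n_keep])
    return sorted(kept)
```

Images far from the center are the ones most likely to carry a wrong label, so dropping the tail of each class trades some data for cleaner labels.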


hudvin commented on May 1, 2024

I suppose there must be at least two stages:

1. Remove real garbage such as photos without faces, postage stamps, drawings, black-and-white images and so on. I used a basic Caffe classifier for this. It works more or less OK but is quite slow. On my hardware it would take weeks to process the whole dataset.
2. Some folders contain images of completely different people. I suppose it's possible to calculate the similarity between them or do some kind of clustering. But again, this would take a lot of time.
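
As a rough illustration of the second stage, the sketch below groups the images of one folder by cosine similarity and keeps the largest group. Everything here is an assumption for illustration: the embeddings come from some face model not specified in this thread, the threshold value is arbitrary, and single-linkage grouping is just one simple choice of clustering.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def largest_identity_cluster(embeddings, threshold=0.8):
    """Indices of the largest connected group of similar embeddings."""
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the threshold.
    # This is O(n^2) per folder, which is why it gets slow at scale.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(embeddings))]
    if not roots:
        return []
    biggest_root, _ = Counter(roots).most_common(1)[0]
    return [i for i, root in enumerate(roots) if root == biggest_root]
```

The pairwise comparison is quadratic in folder size, which matches the concern about processing time for a dataset of this scale.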


davidsandberg commented on May 1, 2024

Here are some results from the training:

I still need to test some more aggressive dataset filtering settings, but so far the best LFW accuracy is around 0.994.
