

ugtony commented on May 1, 2024

The decode program does work, but the comment and the assignment of img_name/img_string are inconsistent:

```python
# Column1: Freebase MID
# Column2: Query/Name
# Column3: ImageSearchRank
# Column4: ImageURL
# Column5: PageURL
# Column6: ImageData_Base64Encoded
# ...
img_name = fields[1] + '-' + fields[4] + '.' + args.output_format
img_string = fields[6]
```

The comment seems to describe the full ImageThumbnails version, while the assignment matches the cropped or aligned version.
The inconsistency could be fixed by changing the comment to:

```python
# Column1: Freebase MID
# Column2: ImageSearchRank
# Column3: ImageURL
# Column4: PageURL
# Column5: FaceID
# Column6: FaceRectangle_Base64Encoded (four floats, relative coordinates of UpperLeft and BottomRight corner)
# Column7: FaceData_Base64Encoded
```
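With the corrected layout, decoding one row could look roughly like the sketch below. It assumes the file is tab-separated and uses the same 0-based indexing as the `fields[...]` assignments quoted earlier. `decode_aligned_row` and its `output_format` default are hypothetical names chosen for illustration, and the sketch only returns the decoded bytes; it does not write files or decode the face rectangle.

```python
import base64
from typing import Tuple


def decode_aligned_row(line: str, output_format: str = "jpg") -> Tuple[str, str, bytes]:
    """Split one row of the cropped/aligned TSV file and decode its face image.

    Column layout (1-based): 1 MID, 2 ImageSearchRank, 3 ImageURL, 4 PageURL,
    5 FaceID, 6 FaceRectangle_Base64Encoded, 7 FaceData_Base64Encoded.
    """
    fields = line.rstrip("\r\n").split("\t")
    mid = fields[0]  # Column1: Freebase MID (one folder per identity)
    # Column2 (search rank) + Column5 (face ID) form the file name.
    img_name = fields[1] + "-" + fields[4] + "." + output_format
    img_bytes = base64.b64decode(fields[6])  # Column7: face image data
    return mid, img_name, img_bytes
```

Usage: `mid, name, data = decode_aligned_row(row)` for each line read from the TSV; the caller decides where to save `data`.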

from facenet.

hudvin commented on May 1, 2024

There is another dataset, https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/, but it's very dirty. I trained a separate model to remove garbage. Still, it's around 10 million images, and I don't have the resources to process all of them.


davidsandberg commented on May 1, 2024

Hi @hudvin!
How did you plan to remove the garbage?
I did some tests with the CASIA dataset where I selected a subset of the CASIA images based on the distance from each image to its class center (the implementation can be found on the dataset_filtering branch). It was only intended as an experiment to validate the method, but it actually improved the accuracy (when trained on CASIA only) from 0.984 to 0.988, which was a bit surprising. The plan is to apply the same principle to the MS-Celeb dataset, which contains a lot more label noise, and see what happens.
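A rough sketch of the distance-to-class-center idea in plain Python follows. This is not the code on the dataset_filtering branch: it assumes every image has already been mapped to an embedding vector, and `filter_by_center_distance` and its `keep_fraction` parameter are hypothetical. The center is taken as the mean of the L2-normalized embeddings, re-normalized, and each class keeps the images closest to it.

```python
import math
from typing import Dict, List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def filter_by_center_distance(
    dataset: Dict[str, List[Sequence[float]]], keep_fraction: float = 0.75
) -> Dict[str, List[int]]:
    """For each class, keep the images whose embeddings lie closest to the class center.

    Returns the indices of the kept images per class, in their original order.
    """
    kept: Dict[str, List[int]] = {}
    for label, embeddings in dataset.items():
        if not embeddings:
            kept[label] = []
            continue
        unit = [_normalize(e) for e in embeddings]
        dim = len(unit[0])
        # Class center: mean of the normalized embeddings, re-normalized.
        center = _normalize([sum(e[i] for e in unit) / len(unit) for i in range(dim)])
        dists = [math.dist(e, center) for e in unit]
        # Rank images by distance and keep the closest fraction (at least one image).
        n_keep = max(1, math.ceil(len(unit) * keep_fraction))
        closest = sorted(range(len(unit)), key=lambda i: dists[i])[:n_keep]
        kept[label] = sorted(closest)
    return kept
```

A fixed fraction per class is only one possible rule; a global distance threshold would drop more images from noisy classes and fewer from clean ones.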


hudvin commented on May 1, 2024

I suppose there must be at least two stages:

  1. Remove real garbage like photos without faces, postage stamps, drawings, black-and-white images and so on. I used a basic Caffe classifier for this. It works more or less OK, but it's quite slow. On my hardware it would take weeks to process the whole dataset.
  2. Some folders contain images of completely different people. I suppose it's possible to calculate the similarity between them or do some kind of clustering. But again, this would take a lot of time.
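For the second stage, one inexpensive form of clustering is to link any two images in a folder whose embeddings are closer than a threshold and keep only the largest connected group. The sketch below assumes embeddings are already computed; `largest_identity_cluster` and the `threshold=1.0` default are hypothetical and would need tuning on real data. The pairwise loop is O(n²) per folder, which is manageable for a single identity folder but adds up across a dataset of this size.

```python
import math
from typing import Dict, List, Sequence


def largest_identity_cluster(
    embeddings: List[Sequence[float]], threshold: float = 1.0
) -> List[int]:
    """Return indices of the largest group of images linked by distance < threshold."""
    n = len(embeddings)
    parent = list(range(n))  # union-find forest, one node per image

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of images that are close enough to be the same person.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(embeddings[i], embeddings[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Assume the dominant identity is the biggest connected group.
    return max(groups.values(), key=len) if groups else []
```

Images outside the returned group would be candidates for removal from that folder.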


davidsandberg commented on May 1, 2024

Here are some results from the training:
[Figure: 20160127_msceleb_accuracy]
I still need to test some more aggressive dataset filtering settings, but so far the best LFW accuracy is around 0.994.
