
Comments (5)

scotthong avatar scotthong commented on April 30, 2024 2

Here is another update with the performance improvements using the proposed changes above.

Depending on the GPU used for training (in this case a Titan X (Pascal)), the time to train 10 epochs:

Original select_triplets method
Total time to train 10 epochs = 2h 10m = 2*60 + 10 = ~130 min
Time per epoch = 130/10 = 13 min

With the proposed enhancement (one training session only)
Total time to train 10 epochs = 45 min
Time per epoch = 45/10 = 4.5 min

Speed-up: 13 / 4.5 = ~2.89x

YMMV. It's nice to see that this minor change can deliver this level of performance improvement.

from facenet.

davidsandberg avatar davidsandberg commented on April 30, 2024

Currently none of the above bullets are handled.
Is there a good way to compare the identities between the datasets? Or does it involve a lot of manual work? For the MsCeleb dataset the MIDs from Freebase have been used to get unique identities for the classes, and it would be good to have the same for the other datasets as well.


scotthong avatar scotthong commented on April 30, 2024

I use a simple Java program to merge the CASIA and FaceScrub datasets and print out the classes/sub-directories that have the same name. I validated a dozen of these merged classes and there seems to be no problem at all. Could two different celebrities share the same name? That would be very rare, so let's go with this assumption and simply merge the datasets together. One reason for doing this is the triplet selection of the semi-hard negative: according to the algorithm, if an image of the same person is randomly selected from the other dataset, it is very likely to end up being selected as the semi-hard negative, which pushes the training in the wrong direction.

Based on the same assumption (the sub-directory name in each dataset uniquely identifies a person across datasets), all classes that also exist in LFW are removed from the merged dataset before training. This is how the merged dataset is processed.
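For reference, that merge-and-filter step could be sketched directly in Python. The directory layout and the `merge_class_dirs` name are my assumptions here, not the actual Java tool; the logic is just: merge class directories by name, flag name collisions across datasets, and drop anything that also appears in LFW.

```python
import os

def merge_class_dirs(dataset_dirs, lfw_dir):
    # class names held out because they also exist in the LFW evaluation set
    lfw_classes = set(os.listdir(lfw_dir))
    merged = {}         # class name -> list of image paths
    collisions = set()  # names that appear in more than one source dataset
    for root in dataset_dirs:
        for cls in sorted(os.listdir(root)):
            cls_dir = os.path.join(root, cls)
            if not os.path.isdir(cls_dir):
                continue
            if cls in lfw_classes:
                continue  # drop classes that also exist in LFW
            if cls in merged:
                collisions.add(cls)  # assumed: same name == same person
            paths = [os.path.join(cls_dir, f) for f in sorted(os.listdir(cls_dir))]
            merged.setdefault(cls, []).extend(paths)
    return merged, collisions
```

The `collisions` set is what the Java program prints, so the merged classes can be spot-checked by hand before training.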

I have two training sessions that are running concurrently on the same machine, here are my observations so far:

  1. Compared with the chart you posted on the wiki, and with my own training results using the original datasets, the LFW chart is smoother, i.e. has fewer fluctuations (using TensorBoard with smoothing factor = 0).
  2. The LFW accuracy curve seems to be the same; I am currently waiting for the training to reach 250 epochs to compare against the results from my previous run.

It seems that the "select_triplets" routine takes up more than 60% of the training time. Do you have any plans to improve the performance of this routine? Based on my initial testing, the following idea seems to run faster.

# minimal timing sketch of the proposed idea
import numpy as np

num_per_class = 45
people_per_batch = 40
num_of_images = num_per_class * people_per_batch

# create random embeddings with values between 0 and 1
embeddings = np.random.random((num_of_images, 128)).astype('f')
# create an array to hold the pairwise distances
dists = np.zeros((num_of_images, num_of_images)).astype('f')

maxfloat = np.finfo(np.float32).max

# squared distance between each embedding and all the others
for i in np.arange(0, num_of_images):
    dists[i] = np.sum(np.square(np.subtract(embeddings, embeddings[i])), 1)

# fill the diagonal with the max float32 value to prevent an image
# from being selected as its own negative
np.fill_diagonal(dists, maxfloat)

# get the pos_dist values out of the array and then overwrite them with
# maxfloat to prevent positives from being selected as negatives
# ...
# find the argmin as the index of the semi-hard negative
idx_semi_hard_negs = np.argmin(dists, 1)
# continue with the rest of the select_triplets routine.
# ...
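For what it's worth, the per-row loop that fills `dists` can itself be collapsed into a single vectorized expression using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. A sketch (the `pairwise_sq_dists` name is mine, not something in facenet):

```python
import numpy as np

def pairwise_sq_dists(embeddings):
    # squared Euclidean distance between every pair of rows, with no
    # Python-level loop: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(np.square(embeddings), axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * np.dot(embeddings, embeddings.T)
    # clamp tiny negative values introduced by floating-point round-off
    return np.maximum(dists, 0.0)
```

This agrees with the row-by-row loop above up to round-off, and on large batches the single BLAS-backed matrix product is usually faster than one reduction per row.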

Thanks,


scotthong avatar scotthong commented on April 30, 2024

Hi David,

I tried the performance improvement idea and it actually works pretty well. The triplet selection time has improved from 10+ seconds (the time/selection scalar on TensorBoard) to less than 3 seconds. I believe further improvement is possible in these for loops. The key idea is that the distance matrix is pre-calculated once, and the for loops then only look up cached values. I validated the result and the implementation should be correct.

import numpy as np


def select_triplets(embeddings, num_per_class, image_data, people_per_batch, alpha):

    nrof_images = image_data.shape[0]

    # pre-calculate the full squared-distance matrix once, so the
    # selection loops below only do cheap lookups
    dists = np.zeros((nrof_images, nrof_images))
    for i in np.arange(0, nrof_images):
        dists[i] = np.sum(np.square(np.subtract(embeddings, embeddings[i])), 1)

    nrof_triplets = nrof_images - people_per_batch
    shp = [nrof_triplets, image_data.shape[1], image_data.shape[2], image_data.shape[3]]
    as_arr = np.zeros(shp)
    ps_arr = np.zeros(shp)
    ns_arr = np.zeros(shp)

    trip_idx = 0
    shuffle = np.arange(nrof_triplets)
    np.random.shuffle(shuffle)
    emb_start_idx = 0
    nrof_random_negs = 0
    for i in range(people_per_batch):
        n = num_per_class[i]
        for j in range(1, n):
            a_idx = emb_start_idx
            p_idx = emb_start_idx + j
            as_arr[shuffle[trip_idx]] = image_data[a_idx]
            ps_arr[shuffle[trip_idx]] = image_data[p_idx]

            # select a semi-hard negative that is further away from the
            # anchor than the positive exemplar
            pos_dist = dists[a_idx, p_idx]

            # draw a random negative outside the current class as a fallback
            sel_neg_idx = emb_start_idx
            while emb_start_idx <= sel_neg_idx <= emb_start_idx + n - 1:
                sel_neg_idx = np.random.randint(0, nrof_images)
            sel_neg_dist = dists[a_idx, sel_neg_idx]

            random_neg = True
            for k in range(nrof_images):
                if k < emb_start_idx or k > emb_start_idx + n - 1:
                    neg_dist = dists[a_idx, k]
                    if pos_dist < neg_dist < sel_neg_dist and np.abs(pos_dist - neg_dist) < alpha:
                        random_neg = False
                        sel_neg_dist = neg_dist
                        sel_neg_idx = k

            if random_neg:
                nrof_random_negs += 1

            ns_arr[shuffle[trip_idx]] = image_data[sel_neg_idx]
            trip_idx += 1

        emb_start_idx += n

    triplets = (as_arr, ps_arr, ns_arr)

    return triplets, nrof_random_negs, nrof_triplets
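One sizing detail worth spelling out: with the anchor fixed at the first image of each class, a class of n images contributes n - 1 (anchor, positive) pairs, so the output buffers can be allocated up front as nrof_images - people_per_batch. A quick check with the batch layout from the earlier timing comment:

```python
# batch layout assumed in the earlier timing comment: 40 people, 45 images each
num_per_class = [45] * 40
nrof_images = sum(num_per_class)    # 1800 images in the batch

# one triplet per (anchor, positive) pair, with the anchor fixed per class
nrof_triplets = sum(n - 1 for n in num_per_class)

assert nrof_triplets == nrof_images - len(num_per_class)
```

The same identity holds for uneven class sizes, since sum(n_i - 1) = sum(n_i) - (number of classes).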


davidsandberg avatar davidsandberg commented on April 30, 2024

That is a very good speed-up indeed!! If you could make a pull request out of it I would be happy to merge it!
Lately I have been running the training as a classifier and it has worked out pretty well. But I like the triplet loss, and I think it can also be used for fine-tuning the network when training is started in classifier mode.
For the overlapping identities issue I think the most elegant solution would be to do the "filtering" when parsing through the dataset. But it will require some Python hacking...
I guess there could also be classes from different datasets that belong to the same identity but, due to differences in spelling, are treated as different identities. But I'm not sure it would be a big issue.
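The spelling-variant concern could be checked cheaply by normalizing class names before comparing them across datasets. A hypothetical sketch (these helper names are not part of facenet; the normalization rules are an assumption about what variants actually occur):

```python
def normalize_name(name):
    # collapse case, underscores/hyphens and extra whitespace so that
    # "Tom_Hanks", "tom hanks" and "Tom-Hanks" compare equal
    return ' '.join(name.lower().replace('_', ' ').replace('-', ' ').split())

def find_spelling_collisions(class_names_by_dataset):
    # map normalized name -> set of raw spellings seen across datasets;
    # entries with more than one spelling are likely the same identity
    seen = {}
    for names in class_names_by_dataset:
        for raw in names:
            seen.setdefault(normalize_name(raw), set()).add(raw)
    return {k: v for k, v in seen.items() if len(v) > 1}
```

Running this over the class lists of each dataset would give a short candidate list to review by hand, rather than requiring full manual comparison.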


