
oneflow_face's Introduction

InsightFace in OneFlow


This document describes how to train InsightFace models in OneFlow and how to verify the trained networks on the validation datasets.

Contents

- InsightFace in OneFlow

- Contents

- Background

- InsightFace opensource project

- Implementation in OneFlow

- Preparations

- Install OneFlow

- Data preparations

- 1. Download datasets

- 2. Transformation from MS1M recordio to OFRecord

- Pretrained model

- Training and verification

- Training

- Verification

- Benchmark

Background

InsightFace opensource project

InsightFace is an open-source 2D&3D deep face analysis toolbox, mainly based on MXNet.

InsightFace supports:

  • Datasets typically used for face recognition, such as CASIA-Webface, MS1M and VGG2 (provided as binary files that can be read by MXNet; see here for more details about the datasets and how to download them).

  • Backbones such as ResNet, MobileFaceNet, InceptionResNet_v2, and other deep networks used in face recognition.

  • Implementations of different loss functions, including SphereFace Loss, Softmax Loss, etc.

Implementation in OneFlow

Building on the existing InsightFace work, OneFlow has ported its basic models and now supports:

  • Training datasets MS1M and Glint360k, validation datasets Lfw, Cfp_fp and Agedb_30, and scripts for training and validation.

  • ResNet100 and MobileFaceNet backbones for face recognition.

  • Loss functions, e.g. Softmax Loss and Margin Softmax Loss (including Arcface, Cosface and Combined Loss).

  • Model parallelism and Partial FC optimization.

  • Model conversion with MXNet.
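As a rough illustration of how the margin-based losses in the list relate to each other, the sketch below computes the target-class logit in the commonly cited combined-margin form s * (cos(m1 * theta + m2) - m3). The function name, the scale of 64 and the margin values are assumptions chosen for the example, not settings read from this repository, and real implementations also handle the case where the shifted angle exceeds pi.

```python
import math


def margin_target_logit(cos_theta: float, m1: float = 1.0, m2: float = 0.0,
                        m3: float = 0.0, scale: float = 64.0) -> float:
    """Scaled target-class logit s * (cos(m1 * theta + m2) - m3).

    m2 > 0 alone gives an ArcFace-style additive angular margin,
    m3 > 0 alone gives a CosFace-style additive cosine margin,
    and mixing the three gives a combined margin.
    """
    # Clamp before acos to guard against tiny numerical overshoots.
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * (math.cos(m1 * theta + m2) - m3)


if __name__ == "__main__":
    print("plain softmax:", margin_target_logit(0.5))
    print("ArcFace-style:", margin_target_logit(0.5, m2=0.5))
    print("CosFace-style:", margin_target_logit(0.5, m3=0.35))
```

In each margin variant the target logit is lower than the plain softmax value for the same angle, which is what forces the network to learn tighter class clusters.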

Coming soon:

  • Conversion of additional datasets.

  • More backbones.

  • A fuller set of loss function implementations.

  • Additional tutorials on distributed configuration.

This project welcomes pull requests from every developer; new implementations and lively discussion are most welcome.

Preparations

Before running anything, please make sure that you have:

  1. Installed OneFlow.

  2. Prepared the training and validation datasets in OFRecord format.

Install OneFlow

Follow the steps in Install OneFlow to install the latest release of the master whl package.

python3 -m pip install --find-links https://release.oneflow.info oneflow_cu102 --user
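After installing, it can be useful to confirm that the package is visible to the Python interpreter you plan to train with. The sketch below assumes the wheel is exposed under the import name `oneflow`; the helper `is_installed` is a made-up name for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "oneflow" is assumed to be the import name provided by the oneflow_cu102 wheel.
    print("oneflow installed:", is_installed("oneflow"))
```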

Data preparations

As described in Load and Prepare OFRecord Datasets, datasets should be converted into OFRecord format before they are used with InsightFace in OneFlow.

InsightFace provides a set of face recognition datasets that have already been pre-processed (face alignment and other steps). These datasets can be downloaded from here and should be converted into OFRecord, which performs better in OneFlow. Because the conversion steps are cumbersome, we suggest downloading the already converted OFRecord datasets:

MS1M-ArcFace(face_emore)

MS1MV3

The following shows how to convert downloaded datasets into OFRecord, taking MS1M-ArcFace as an example.

1. Download datasets

The structure of the downloaded MS1M-ArcFace dataset is as follows:

faces_emore/
    train.idx
    train.rec
    property
    lfw.bin
    cfp_fp.bin
    agedb_30.bin

The first three files are the MXNet recordio files of the MS1M training dataset; the last three .bin files are separate validation datasets.
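Before starting a long conversion run, a quick check that the download is complete can help. The sketch below only uses the file names from the listing above; the helper name `missing_files` and the example path are assumptions for illustration.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ["train.idx", "train.rec", "property",
                  "lfw.bin", "cfp_fp.bin", "agedb_30.bin"]


def missing_files(data_dir: str) -> list:
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Example path only; point this at wherever faces_emore was unpacked.
    absent = missing_files("datasets/faces_emore")
    print("missing:", absent if absent else "none")
```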

2. Transformation from MS1M recordio to OFRecord

You only need to run one of 2.1 or 2.2.

2.1 Use Python scripts directly

Run

python tools/mx_recordio_2_ofrecord_shuffled_npart.py  --data_dir datasets/faces_emore --output_filepath faces_emore/ofrecord/train --part_num 16

You will get part_num OFRecord parts (16 in this example), laid out like this:

tree ofrecord/test/
ofrecord/test/
|-- _SUCCESS
|-- part-00000
|-- part-00001
|-- part-00002
|-- part-00003
|-- part-00004
|-- part-00005
|-- part-00006
|-- part-00007
|-- part-00008
|-- part-00009
|-- part-00010
|-- part-00011
|-- part-00012
|-- part-00013
|-- part-00014
`-- part-00015

0 directories, 17 files
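To confirm that a conversion run produced every part, the output directory can be compared against the expected names. The sketch below assumes the part-00000 style naming shown in the tree output; the helper names are hypothetical.

```python
from pathlib import Path


def expected_part_names(part_num: int) -> list:
    """Names in the part-00000, part-00001, ... style shown in the tree output."""
    return [f"part-{i:05d}" for i in range(part_num)]


def missing_parts(ofrecord_dir: str, part_num: int) -> list:
    """Return the expected part files that do not exist in ofrecord_dir."""
    root = Path(ofrecord_dir)
    return [n for n in expected_part_names(part_num) if not (root / n).is_file()]


if __name__ == "__main__":
    # Example path and part count only; use your own --output_filepath and --part_num.
    print(missing_parts("faces_emore/ofrecord/train", 16))
```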

2.2 Use Python scripts + Spark Shuffle + Spark partition

Run

python tools/dataset_convert/mx_recordio_2_ofrecord.py --data_dir datasets/faces_emore --output_filepath faces_emore/ofrecord/train

This produces a single OFRecord part (part-0) containing all the data. Then use Spark to shuffle and partition it.

  1. Get the jar package. You can download spark-oneflow-connector-assembly-0.1.0.jar via Github or OSS.

  2. Run in Spark. Assuming that you have already installed and configured Spark, run:

// Start Spark
./spark-2.4.3-bin-hadoop2.7/bin/spark-shell --jars ~/spark-oneflow-connector-assembly-0.1.0.jar --driver-memory=64G --conf spark.local.dir=/tmp/
// shuffle and partition in 16 parts
import org.oneflow.spark.functions._
spark.read.chunk("data_path").shuffle().repartition(16).write.chunk("new_data_path")
sc.formatFilenameAsOneflowStyle("new_data_path")

You will then get 16 OFRecord parts, laid out like this:

tree ofrecord/test/
ofrecord/test/
|-- _SUCCESS
|-- part-00000
|-- part-00001
|-- part-00002
|-- part-00003
|-- part-00004
|-- part-00005
|-- part-00006
|-- part-00007
|-- part-00008
|-- part-00009
|-- part-00010
|-- part-00011
|-- part-00012
|-- part-00013
|-- part-00014
`-- part-00015

0 directories, 17 files
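Conceptually, both routes do the same two things: shuffle the records, then split them into a fixed number of parts. The pure-Python sketch below illustrates that idea on an in-memory list. It is not how the conversion script or Spark implement it, and the function name is made up for this example.

```python
import random


def shuffle_and_partition(records, num_parts, seed=0):
    """Shuffle records, then deal them round-robin into num_parts lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Slicing with a stride keeps the part sizes within one record of each other.
    return [shuffled[i::num_parts] for i in range(num_parts)]


if __name__ == "__main__":
    parts = shuffle_and_partition(range(100), 16)
    print([len(p) for p in parts])
```

Shuffling before splitting matters because the source recordio file is typically ordered by identity, so unshuffled parts would each contain only a narrow slice of the classes.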

Pretrained model

The 1:1 verification accuracy of the OneFlow and MXNet pretrained models on the InsightFace Recognition Test (IFRT) is compared below:

| Framework | African | Caucasian | Indian | Asian | All |
| --- | --- | --- | --- | --- | --- |
| OneFlow | 90.4076 | 94.583 | 93.702 | 68.754 | 89.684 |
| MXNet | 90.45 | 94.60 | 93.96 | 63.91 | 88.23 |
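To read the comparison at a glance, the per-group differences can be computed directly. The sketch below copies the values from the table above; the function name is made up for this example.

```python
# Accuracy values (percent) copied from the table above.
oneflow = {"African": 90.4076, "Caucasian": 94.583, "Indian": 93.702,
           "Asian": 68.754, "All": 89.684}
mxnet = {"African": 90.45, "Caucasian": 94.60, "Indian": 93.96,
         "Asian": 63.91, "All": 88.23}


def accuracy_delta(a: dict, b: dict) -> dict:
    """Per-group difference a - b in percentage points, rounded to 3 places."""
    return {k: round(a[k] - b[k], 3) for k in a}


if __name__ == "__main__":
    for group, delta in accuracy_delta(oneflow, mxnet).items():
        print(f"{group:10s} {delta:+.3f}")
```

On these numbers the largest gap is in the Asian subset, where the OneFlow model is about 4.8 points higher; the African, Caucasian and Indian subsets each differ by less than 0.3 points.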

Download link for the OneFlow pretrained model: of_005_model.tar.gz

We also provide an MXNet model converted from the OneFlow one: of_to_mxnet_model_005.tar.gz

OneFlow2ONNX

pip install oneflow-onnx==0.3.4
./convert.sh

Training and verification

Training

To lower the cost of getting started, OneFlow keeps the scripts close to Torch style; you can modify parameters directly in configs/*.py.

./run.sh

Verification

OneFlow also provides a separate validation script, val.py, which lets you check the accuracy of a saved pretrained model.

./val.sh

Benchmark

Training Speed Benchmark

Face_emore Dataset & FP32

| Backbone | GPU | model_parallel | partial_fc | BatchSize / it | Throughput (img/sec) |
| --- | --- | --- | --- | --- | --- |
| R100 | 8 * Tesla V100-SXM2-16GB | False | False | 64 | 1836.8 |
| R100 | 8 * Tesla V100-SXM2-16GB | True | False | 64 | 1854.15 |
| R100 | 8 * Tesla V100-SXM2-16GB | True | True | 64 | 1872.81 |
| R100 | 8 * Tesla V100-SXM2-16GB | False | False | 96(Max) | 1931.76 |
| R100 | 8 * Tesla V100-SXM2-16GB | True | False | 115(Max) | 1921.87 |
| R100 | 8 * Tesla V100-SXM2-16GB | True | True | 120(Max) | 1962.76 |
| Y1 | 8 * Tesla V100-SXM2-16GB | False | False | 256 | 14298.02 |
| Y1 | 8 * Tesla V100-SXM2-16GB | True | False | 256 | 14049.75 |
| Y1 | 8 * Tesla V100-SXM2-16GB | False | False | 350(Max) | 14756.03 |
| Y1 | 8 * Tesla V100-SXM2-16GB | True | True | 400(Max) | 14436.38 |
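To put the R100 rows at batch size 64 in perspective, the relative throughput change against the data-parallel baseline can be computed as below. The values are copied from the Face_emore table above; the function name is made up for this example.

```python
# (model_parallel, partial_fc) -> img/sec for R100 at batch size 64,
# copied from the Face_emore table above.
throughput = {
    (False, False): 1836.8,
    (True, False): 1854.15,
    (True, True): 1872.81,
}


def relative_gain(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline."""
    return (value / baseline - 1.0) * 100.0


if __name__ == "__main__":
    base = throughput[(False, False)]
    for setting, value in throughput.items():
        print(setting, f"{relative_gain(value, base):+.2f}%")
```

At this batch size the gains are modest, roughly 1% with model parallelism and roughly 2% with Partial FC added. The table suggests the larger practical benefit is the bigger maximum batch size that fits in memory.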

Glint360k Dataset & FP32

| Backbone | GPU | partial_fc sample_ratio | BatchSize / it | Throughput (img/sec) |
| --- | --- | --- | --- | --- |
| R100 | 8 * Tesla V100-SXM2-16GB | 0.1 | 64 | 1858.57 |
| R100 | 8 * Tesla V100-SXM2-16GB | 0.1 | 115 | 1933.88 |

Evaluation on Lfw, Cfp_fp, Agedb_30

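  • Data Parallelism

| Backbone | Dataset | Lfw | Cfp_fp | Agedb_30 |
| --- | --- | --- | --- | --- |
| R100 | MS1M | 99.717 | 98.643 | 98.150 |
| MobileFaceNet | MS1M | 99.5 | 92.657 | 95.6 |

  • Model Parallelism

| Backbone | Dataset | Lfw | Cfp_fp | Agedb_30 |
| --- | --- | --- | --- | --- |
| R100 | MS1M | 99.733 | 98.329 | 98.033 |
| MobileFaceNet | MS1M | 99.483 | 93.457 | 95.7 |

  • Partial FC

| Backbone | Dataset | Lfw | Cfp_fp | Agedb_30 |
| --- | --- | --- | --- | --- |
| R100 | MS1M | 99.817 | 98.443 | 98.217 |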

Evaluation on IFRT

r denotes the sampling rate of negative class centers.

| Backbone | Dataset | African | Caucasian | Indian | Asian | ALL |
| --- | --- | --- | --- | --- | --- | --- |
| R100 | Glint360k(r=0.1) | 90.4076 | 94.583 | 93.702 | 68.754 | 89.684 |
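The sampling rate r can be illustrated with a small sketch of the class-center sampling idea behind Partial FC: the classes present in the batch are always kept, and negative class centers are drawn at random until about r × num_classes centers are selected. This is a simplified pure-Python illustration, not the OneFlow implementation; the function name and arguments are made up for this example.

```python
import random


def sample_class_centers(labels, num_classes, sample_ratio, seed=0):
    """Pick the class centers that take part in one training step.

    Returns (sampled_class_ids, remapped_labels), where remapped_labels
    index into sampled_class_ids.
    """
    rng = random.Random(seed)
    positives = sorted(set(labels))
    # Always keep every positive class, even if that exceeds the ratio.
    num_sample = max(int(num_classes * sample_ratio), len(positives))
    positive_set = set(positives)
    negatives = [c for c in range(num_classes) if c not in positive_set]
    sampled_negatives = rng.sample(negatives, num_sample - len(positives))
    sampled = sorted(positives + sampled_negatives)
    remap = {c: i for i, c in enumerate(sampled)}
    return sampled, [remap[y] for y in labels]


if __name__ == "__main__":
    sampled, remapped = sample_class_centers([3, 7, 3, 42], 100, 0.1)
    print(len(sampled), "centers used;", "labels remapped to", remapped)
```

With r = 0.1 only a tenth of the class centers enter the softmax at each step, which is where the memory and compute savings come from.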

Max num_classes

| node_num | gpu_num_per_node | batch_size_per_device | fp16 | Model Parallel | Partial FC | num_classes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 64 | True | True | True | 2000000 |
| 1 | 8 | 64 | True | True | True | 13500000 |

For more test details, refer to OneFlow DLPerf.
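For a feel of the scale involved, the size of the classification weights alone can be estimated with simple arithmetic. The sketch below assumes a 512-dimensional embedding (an assumption, not stated in this document), 2 bytes per value for fp16, and an even split of the weight matrix across devices under model parallelism. It ignores gradients, optimizer state and logits, so it does not by itself explain the limits in the table.

```python
def fc_weight_gib(num_classes: int, embedding_size: int = 512,
                  bytes_per_value: int = 2, num_devices: int = 1) -> float:
    """GiB of classifier weights held per device when the
    num_classes x embedding_size matrix is split evenly across devices."""
    total_bytes = num_classes * embedding_size * bytes_per_value
    return total_bytes / num_devices / 2**30


if __name__ == "__main__":
    # Class counts taken from the table above; embedding size is assumed.
    print(f"1 GPU,  2,000,000 classes:  {fc_weight_gib(2_000_000):.2f} GiB")
    print(f"8 GPUs, 13,500,000 classes: {fc_weight_gib(13_500_000, num_devices=8):.2f} GiB per GPU")
```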

oneflow_face's People

Contributors

flowingsun007, guo-ran, ldpe2g, mir-of, nlqq, olojuwin, ouyangyu


oneflow_face's Issues

training

Hi, we've run into a few problems in our recent tasks.
The code in the repository files doesn't seem to come with any detailed explanation.
The --data_dir in the OFRecord conversion command is confusing; we had to set the path ourselves because the one given doesn't fit at all.
Another one is 'total_batch_number':
we seldom train models by batch number.
Could you switch the metric to epochs instead?
The .sh scripts shown seem to be for a single GPU. We expect to run the whole project on multiple GPUs with OneFlow.
Honestly, it's really unfriendly for users at the moment.
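For reference, converting between epochs and a total batch count is simple arithmetic. A minimal sketch, with a made-up function name and example numbers that are not taken from the repository:

```python
import math


def epochs_to_total_batches(num_epochs: int, num_samples: int,
                            batch_size_per_device: int, num_devices: int) -> int:
    """Number of iterations needed to pass over the dataset num_epochs times."""
    global_batch = batch_size_per_device * num_devices
    # The last batch of each epoch may be partial, hence the ceiling.
    return num_epochs * math.ceil(num_samples / global_batch)


if __name__ == "__main__":
    # Hypothetical values: 2 epochs over 1000 samples, 64 per device on 8 devices.
    print(epochs_to_total_batches(2, 1000, 64, 8))
```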

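Data conversion and partitioning

Hello OneFlow development team,
While using the repository code recently, we ran into some problems. The main one is that mx_recordio_2_ofrecord.py takes too long to run, and the repository does not provide a model partitioning program for multi-node, multi-GPU setups. We implemented the partitioning ourselves, but the code provided in the repository is indeed incomplete. To say more about data partitioning: in theory, OneFlow requires the number of training data parts to be an integer multiple of the total number of GPUs. In our earlier distributed tests we always trained on 3 nodes with 5 GPUs each, 15 GPUs in total, so the data was split into 15 parts accordingly. In practice, however, resource allocation comes into play and we cannot guarantee the exact total number of GPUs for each run: it may be 4 GPUs on a single node this time and 18 GPUs across 3 nodes next time. One compromise is to pick a part count with many divisors, such as 90, which covers GPU counts of 1, 2, 3, 6, 9, 10, 15, 18, 30, 45, 90 and so on. But that still does not cover every case, so it is not a general solution.
So we would like to raise two points:
1. Can the format conversion and partitioning be sped up?
2. Can the conversion and partitioning programs or scripts be unified?

Thanks!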
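The divisibility constraint described in this issue can be explored with a few lines of arithmetic. The sketch below lists the GPU counts a given part count supports and computes the smallest part count that works for a set of GPU counts. The function names are made up for this example.

```python
import math
from functools import reduce


def divisors(n: int) -> list:
    """All GPU counts that evenly divide n data parts."""
    return [d for d in range(1, n + 1) if n % d == 0]


def smallest_common_part_count(gpu_counts) -> int:
    """Least common multiple of the given GPU counts."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), gpu_counts, 1)


if __name__ == "__main__":
    print(divisors(90))
    # GPU counts mentioned in the issue: 4, 15 and 18.
    print(smallest_common_part_count([4, 15, 18]))
```

For the GPU counts mentioned in the issue, 90 parts work for 15 and 18 GPUs but not for 4, while 180 parts work for all three.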

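Error when converting a MobileFaceNet trained with Partial FC to an MXNet model

Hello OneFlow development team,
I trained a MobileFaceNet model with Partial FC and wanted to convert it to MXNet format, but running ./tools/model_convert/mobilefacenet/of_model_2_mxnet_model.py produced an error.

The command I ran was: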

python of_model_2_mxnet_model.py --mxnet_load_prefix='./model-y1-test2/model' --mxnet_load_epoch=0 --mxnet_save_prefix='./mxnet_save' --mxnet_save_epoch=1 --of_model_dir='./snapshot_30/'

The error was:

[22:11:10] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.0.0. Attempting to upgrade...
[22:11:10] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!

fc1_gamma gamma error
Traceback (most recent call last):
File "of_model_2_mxnet_model.py", line 88, in
of_model_path + "-weight/out", dtype=np.float32
FileNotFoundError: [Errno 2] No such file or directory: './snapshot_30/fc1-weight/out'

I checked the files generated in the snapshot_30 directory and did not find an fc1-weight folder.
