luizgh / sigver_wiwd
Learned representation for Offline Handwritten Signature Verification. Models and code to extract features from signature images.

Home Page: https://www.etsmtl.ca/Unites-de-recherche/LIVIA/Recherche-et-innovation/Projets/Signature-Verification

License: BSD 2-Clause "Simplified" License

signature-verification representation-learning deep-learning convolutional-neural-networks handwriting

sigver_wiwd's Introduction

Learned representation for Offline Handwritten Signature Verification

NEW: Code for training the models (using Pytorch) is now available in a new repository: https://github.com/luizgh/sigver

This repository contains the code and instructions to use the trained CNN models described in [1] to extract features for Offline Handwritten Signatures. It also includes the models described in [2] that can generate a fixed-sized feature vector for signatures of different sizes.

[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)

[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Fixed-sized representation learning from Offline Handwritten Signatures of different sizes" https://doi.org/10.1007/s10032-018-0301-6 (preprint)

Topics:

  • Installation: How to set up the dependencies / download the models to extract features from new signatures
  • Usage: How to use this code as a feature extractor for signature images
  • Using the features in Matlab: A script to facilitate processing multiple signatures and saving the features in Matlab (.mat) format
  • Datasets: Download extracted features (using the proposed models) for the GPDS, MCYT, CEDAR and Brazilian PUC-PR datasets (for the methods presented in [1] - .mat files that do not require any pre-processing code)

Installation

Pre-requisites

The code is written in Python 2¹. We recommend using the Anaconda Python distribution (link) and creating a new environment using:

conda create -n sigver -y python=2
source activate sigver

The following libraries are required

  • Scipy version 0.18
  • Pillow version 3.0.0
  • OpenCV
  • Theano²
  • Lasagne²

They can be installed by running the following commands:

conda install -y "scipy=0.18.0" "pillow=3.0.0"
conda install -y jupyter notebook matplotlib # Optional, to run the example in jupyter notebook
pip install opencv-python
pip install "Theano==0.9"
pip install https://github.com/Lasagne/Lasagne/archive/master.zip

We tested the code on Ubuntu 16.04. This code can be used with or without GPUs - to use a GPU with Theano, follow the instructions in this link. Note that Theano takes time to compile the model, so it is much faster to instantiate the model once and run forward propagation for many images, instead of repeatedly calling a script that instantiates the model and runs forward propagation for a single image.

¹ Python 3.5 can also be used, but the feature vectors will differ from those generated with Python 2 (due to small differences in preprocessing the images). Either version can be used, but feature vectors generated with different versions should not be mixed. Note that the data in the Datasets section below was obtained using Python 2.

² Although we used Theano and Lasagne for training, you can also use TensorFlow to extract the features. See tf_example.py for details.

Downloading the models

  • Clone (or download) this repository
  • Download the pre-trained models from the links below
  • Save / unzip the models in the "models" folder

Signet: https://drive.google.com/file/d/1KffsnZu8-33wXklsodofw-a-KX6tAsVN/view?usp=share_link

Signet SPP-models: https://drive.google.com/file/d/1KffsnZu8-33wXklsodofw-a-KX6tAsVN/view?usp=share_link

Testing

Run python example.py and python example_spp.py. These scripts pre-process a signature and compare the feature vectors obtained by the models to reference results obtained by the author. If the test fails, please check the versions of Scipy and Pillow; I noticed that different versions of these libraries produce slightly different results in the pre-processing steps.
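For a quick sanity check outside of those scripts, the sketch below re-runs the extraction pipeline and compares the result against a stored reference vector. This is only a sketch: the reference file name used here is hypothetical (check example.py for the reference data it actually loads), and a small numerical tolerance is used because library versions cause tiny differences.

import numpy as np
from scipy.misc import imread
from preprocess.normalize import preprocess_signature
import signet
from cnn_model import CNNModel

canvas_size = (952, 1360)

# Extract features for the sample signature shipped with the repository:
original = imread('data/some_signature.png', flatten=1)
processed = preprocess_signature(original, canvas_size)
model = CNNModel(signet, 'models/signet.pkl')
feature_vector = model.get_feature_vector(processed)

# Hypothetical reference file with the author's feature vector for the same image:
reference = np.load('data/some_signature_features.npy')

# Compare with a tolerance rather than exact equality, since different
# Scipy/Pillow versions produce slightly different pre-processing results:
print('Features match reference:', np.allclose(feature_vector, reference, atol=1e-4))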

Usage

The following code (from example.py) shows how to load and pre-process a signature, and how to extract features using one of the learned models:

from scipy.misc import imread
from preprocess.normalize import preprocess_signature
import signet
from cnn_model import CNNModel

# Maximum signature size (required for the SigNet models):
canvas_size = (952, 1360)  

# Load and pre-process the signature
original = imread('data/some_signature.png', flatten=1)

processed = preprocess_signature(original, canvas_size)

# Load the model
model_weight_path = 'models/signet.pkl'
model = CNNModel(signet, model_weight_path)

# Use the CNN to extract features
feature_vector = model.get_feature_vector(processed)

# Multiple images can be processed in a single forward pass using:
# feature_vectors = model.get_feature_vector_multiple(images)

Note that for the SigNet models (from [1]) the signatures used in the get_feature_vector method must always have the same size as those used for training the system (150 x 220 pixels).
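Since Theano spends noticeable time compiling the model (see Installation), it is usually worth instantiating the model once and extracting features for many signatures in one go, for example with the get_feature_vector_multiple method shown in the comment above. A minimal sketch, assuming a folder of PNG signatures (the folder name is hypothetical; the exact input layout expected by get_feature_vector_multiple can be checked against example.py):

from glob import glob

import numpy as np
from scipy.misc import imread
from preprocess.normalize import preprocess_signature
import signet
from cnn_model import CNNModel

canvas_size = (952, 1360)

# Instantiate (and compile) the model only once:
model = CNNModel(signet, 'models/signet.pkl')

# Pre-process every signature in a (hypothetical) folder; after pre-processing,
# all images have the fixed size expected by the SigNet models:
images = [preprocess_signature(imread(f, flatten=1), canvas_size)
          for f in sorted(glob('signatures/*.png'))]

# Extract features for all images in a single forward pass:
feature_vectors = model.get_feature_vector_multiple(images)
print(np.asarray(feature_vectors).shape)  # expected: (number_of_images, 2048)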

For the SigNet-SPP methods (from [2]) the signatures can have any size. We provide models trained on signatures scanned at 300dpi and signatures scanned at 600dpi. Refer to the paper for more details on this method.

For an interactive example, use jupyter notebook:

jupyter notebook

Look for the notebook "interactive_example.ipynb". You can also visualize it directly here

Using the features in Matlab

While the code requires Python (with the libraries mentioned above) to extract features, it is possible to save the results in Matlab format. We included a script that processes all signatures in a folder and saves the results in Matlab files (one .mat file per signature).

Usage:

python process_folder.py <signatures_path> <save_path> <model_path> [canvas_size]

Example:

python process_folder.py signatures/ features/ models/signet.pkl

This will process all signatures in the "signatures" folder, using the SigNet model, and save one .mat file in the folder "features" for each signature. Each file contains a single variable named "feature_vector" with the features extracted from the signature.
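The same files can also be read back from Python if needed. A minimal sketch (the .mat file name below is just an example; "feature_vector" is the variable name saved by process_folder.py):

from scipy.io import loadmat

# Each .mat file written by process_folder.py contains a single variable
# named "feature_vector" with the features of one signature:
data = loadmat('features/some_signature.mat')  # example file name
feature_vector = data['feature_vector']
print(feature_vector.shape)  # the 2048-dimensional feature vector (likely stored as a 1 x 2048 matrix)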

Datasets

To facilitate further research, we are also making available the features extracted for each of the four datasets used in this work (GPDS, MCYT, CEDAR, Brazilian PUC-PR), using the models SigNet, SigNet-F (with lambda=0.95) and SigNet-SPP-300dpi.

Dataset             SigNet            SigNet-F            SigNet-SPP-300dpi
GPDS                GPDS_signet       GPDS_signet_f       GPDS_signetspp_300dpi
MCYT                MCYT_signet       MCYT_signet_f       MCYT_signetspp_300dpi**
CEDAR               CEDAR_signet      CEDAR_signet_f      CEDAR_signetspp_300dpi**
Brazilian PUC-PR*   brazilian_signet  brazilian_signet_f  Brazilian_signetspp_300dpi**

There are two files for each user: real_X.mat and forg_X.mat. The first contains a matrix of size N x 2048 with the feature vectors of the N genuine signatures from that user. The second contains a matrix of size M x 2048 with the feature vectors of the M skilled forgeries made targeting that user.

* Note: for the Brazilian PUC-PR dataset, the first 10 forgeries are "Simple forgeries", while the last 10 forgeries are "Skilled forgeries".

** Note: These results are without fine-tuning the network to the particular datasets. We used the model trained with "SPP Fixed" and considered images at 300dpi, centered in a canvas of the size defined for GPDS (428 x 612; larger images were processed at their original size). Note that this is different from the protocol used in the paper, where we randomly split the datasets into 50% train (for fine-tuning) and 50% test.

Loading the feature vectors in matlab

f = load('real_2.mat')
% f.features: [Nx2048 single]

Loading the feature vectors in python

from scipy.io import loadmat
features = loadmat('real_2.mat')['features']
# features: numpy array of shape (N, 2048)
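As an illustration of how these feature files can be used, the sketch below trains a simple writer-dependent classifier for one user with scikit-learn. This is only a sketch under assumed user IDs and illustrative hyper-parameters; it does not reproduce the exact experimental protocol of the papers (which use specific splits and train only with genuine signatures and random forgeries, keeping skilled forgeries for testing).

import numpy as np
from scipy.io import loadmat
from sklearn.svm import SVC

# Genuine signatures and skilled forgeries for user 2 (example user ID):
genuine = loadmat('real_2.mat')['features']
skilled = loadmat('forg_2.mat')['features']

# "Random forgeries": genuine signatures from a few other users (example IDs):
others = np.vstack([loadmat('real_%d.mat' % u)['features'] for u in (3, 4, 5)])

# Use some genuine samples for training and keep the rest for testing:
train_pos, test_pos = genuine[:10], genuine[10:]

X_train = np.vstack([train_pos, others])
y_train = np.concatenate([np.ones(len(train_pos)), np.zeros(len(others))])

# Illustrative hyper-parameters (see the papers for the values actually used):
clf = SVC(C=1, gamma=2 ** -11, class_weight='balanced')
clf.fit(X_train, y_train)

# Higher decision values should correspond to genuine signatures:
print('genuine (test) scores:', clf.decision_function(test_pos)[:5])
print('skilled forgery scores:', clf.decision_function(skilled)[:5])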

Citation

If you use our code, please consider citing the following papers:

[1] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks" http://dx.doi.org/10.1016/j.patcog.2017.05.012 (preprint)

[2] Hafemann, Luiz G., Robert Sabourin, and Luiz S. Oliveira. "Fixed-sized representation learning from Offline Handwritten Signatures of different sizes" https://doi.org/10.1007/s10032-018-0301-6 (preprint)

If using any of the four datasets mentioned above, please cite the paper that introduced the dataset:

GPDS: Vargas, J.F., M.A. Ferrer, C.M. Travieso, and J.B. Alonso. 2007. “Off-Line Handwritten Signature GPDS-960 Corpus.” In Document Analysis and Recognition, 9th International Conference on, 2:764–68. doi:10.1109/ICDAR.2007.4377018.

MCYT: Ortega-Garcia, Javier, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, et al. 2003. “MCYT Baseline Corpus: A Bimodal Biometric Database.” IEE Proceedings-Vision, Image and Signal Processing 150 (6): 395–401.

CEDAR: Kalera, Meenakshi K., Sargur Srihari, and Aihua Xu. 2004. “Offline Signature Verification and Identification Using Distance Statistics.” International Journal of Pattern Recognition and Artificial Intelligence 18 (7): 1339–60. doi:10.1142/S0218001404003630.

Brazilian PUC-PR: Freitas, C., M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin. 2000. “Bases de Dados de Cheques Bancarios Brasileiros.” In XXVI Conferencia Latinoamericana de Informatica.

License

The source code is released under the BSD 2-clause license. Note that the trained models used the GPDS dataset for training (which is restricted to non-commercial use).

sigver_wiwd's People

Contributors: luizgh

sigver_wiwd's Issues

Further work

Hey, what if we add a new user with a different class? Do we then need to train it again?

Dataset for SPP

Hi,
Will you make available the features extracted using the SPP models for each of the four datasets?

Thanks

No such file or directory: 'models/signet.pkl'

Sir, I am getting the error:
No such file or directory: 'models/signet.pkl'

IOErrorTraceback (most recent call last)
in ()
----> 1 model = CNNModel(signet, model_weight_path)

/content/drive/My Drive/sigver_wiwd-master/cnn_model.py in __init__(self, model_factory, model_weight_path)
18 model_weights_path (str): A file containing the trained weights
19 """
---> 20 with open(model_weight_path, 'rb') as f:
21 if six.PY2:
22 model_params = cPickle.load(f)

IOError: [Errno 2] No such file or directory: 'models/signet.pkl'

WI classifier (CNN) training

Dear Sir,

Please answer these questions regarding training the WI classifier (CNN):
1. Were both genuine signatures and forgeries of a user from the development set used? And if both were used, were they given the same label or different labels?
2. Were all forgeries for all users given the same label?

Thanks,
Chunky

Getting error while reading signet.pkl file

Sir,
I tried to read the "signet.pkl" file in Python 3.6 but I am getting an error.

Code:

from six.moves import cPickle
with open('signet.pkl', 'rb') as f:
    model_params = cPickle.load(f)

error:

Traceback (most recent call last):
File "<ipython-input-39-0593160b4786>", line 3, in <module>
model_params = cPickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

I tried open('signet.pkl', 'rb').read().decode('utf8'), but it's not working.

Google Colab compatibility

Hello! I want to try running this code but my machine is not powerful enough for it. Would it be possible to run this in Google Colaboratory?

Loading the weights - Bias term

Hello,

I am trying to load the model weights you made available, but it doesn't seem that I can access the bias term of the Convolutional Layers.

When I run:

lasagne.layers.get_all_params(model.model['fc2'])

Those are the params I get:

[W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std, W, beta, gamma, mean, inv_std]

and

[l.shape for l in lasagne.layers.get_all_param_values(model.model['fc2'])]
      

I get:

[(96, 1, 11, 11), (96,), (96,), (96,), (96,), (256, 96, 5, 5), (256,), (256,), (256,), (256,), (384, 256, 3, 3), (384,), (384,), (384,), (384,), (384, 384, 3, 3), (384,), (384,), (384,), (384,), (256, 384, 3, 3), (256,), (256,), (256,), (256,), (3840, 2048), (2048,), (2048,), (2048,), (2048,), (2048, 2048), (2048,), (2048,), (2048,), (2048,)]

As we can see, those are the params related to the weights of the convolutional layers and the params of the batch normalization; it doesn't seem that this method returns the bias term as one of the params.

Any idea how I can get the bias information?

I am asking this because I want to port this model to Tensorflow.

Best regards,
Victor.

generating .npy files

I have been working with this code repository very recently, after going through the paper "Learning features for offline handwritten signature verification using deep convolutional neural networks" by Luiz G. Hafemann, Robert Sabourin, and Luiz S. Oliveira. The problem I am facing is how to generate the .npy files placed in the data/ directory of this project, so that we can verify any other user-defined signature image besides "some_signature.png". The reason I am asking is that a user-defined image could then be used for signature verification, since "example.py" actually compares against the .npy files at test time.
My question might sound like I am new to this field.
Thanking you in anticipation.
Best Regards
Tahir

Unable to replicate using extracted features

Hello,

I am trying to train and test a WD classifier using features extracted by SigNet and SigNet-F (0.95); however, my error scores are usually a little bit higher (1.5-2.5%) than the reported scores on Vv (Table 5, EER with global threshold, in the paper). I am using:

C=1, class_weight='balanced', gamma=2**(-11)

'balanced' option should match the skew according to the sklearn documentation. Both kernels have similar results.

Is this a normal behaviour or is there something else I might be missing?

Another question: GPDS960 consists of 960 users according to the research group, but in your paper you state that there are 881 users. In the extracted-features dataset I can also see some missing user_ids. What is the reason for that?

Thanks a lot!

EDIT: I could replicate and even surpass the CEDAR scores by using:

gamma='scale'

which hints that my implementation is correct, in my opinion.

EDIT2: The errors are also a little bit higher for MCYT. For training with 10 signatures and an RBF-kernel SVM, my 10-fold average error is around 3.91%, compared to 2.87% reported in the paper.

Training script required to retrain the network on different data sets

I was going through the source files and could not find the training script. I would like to retrain the network on a few different data sets, as the accuracy of the pre-trained model(s) on my data is not very good, and I think there can be significant improvements after re-training with the data I am working with. Could you please provide the training script that was used for the published work (and used to train the provided models), or a brief guide on how to go about doing this? Thank you for your assistance.

ImportError: libhdf5.so.10: cannot open shared object file: No such file or directory

Running Ubuntu 16.04. When running python example.py I get the below error:

Traceback (most recent call last):
  File "example.py", line 10, in <module>
    from preprocess.normalize import preprocess_signature
  File "/home/jash/Documents/Miscellaneous/sigver_wiwd/preprocess/normalize.py", line 1, in <module>
    import cv2
ImportError: libhdf5.so.10: cannot open shared object file: No such file or directory

Requesting your help.

signet_spp_300dpi and signet_spp_600dpi

Hi, is there any way I can use signet_spp_300dpi and signet_spp_600dpi with TensorFlow, as I do not want to use Theano? My current setup is in TensorFlow.

I appreciate your suggestion.
Thanks

Change classifiers

I want to change the classifier from SVM to a different one. Please guide me on where the classifiers are used.
