anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python

Home Page: https://sites.google.com/site/anhaidgroup/projects/py_stringmatching

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 94.40%, Cython 3.45%, Batchfile 1.17%, PowerShell 0.99%

py_stringmatching's People

Contributors

abdealiloko, alihitawala, anhaidgroup, anson-doan, chakshuahuja, christiemj09, kvpradap, paulgc, pjmartinkus, rishab93, ruijianw, srujithpoondla, wynksaiddestroy, zware

py_stringmatching's Issues

Issues when installing

My Python version is 3.7.3, and I get an error message when I try to install:

`Collecting py_stringmatching
Using cached https://files.pythonhosted.org/packages/e8/11/a7d8568eaac88e167fedd857640fe04e8950511e5fbe0700a42e12900a48/py_stringmatching-0.4.0.tar.gz
Requirement already satisfied: numpy>=1.7.0 in /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages (from py_stringmatching) (1.15.3)
Requirement already satisfied: six in /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages (from py_stringmatching) (1.11.0)
Building wheels for collected packages: py-stringmatching
Running setup.py bdist_wheel for py-stringmatching: started
Running setup.py bdist_wheel for py-stringmatching: finished with status 'error'
Complete output from command /home/suzil/anaconda3/envs/py3.7/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-r8vqgfyt/py-stringmatching/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-yg45dyr0 --python-tag cp37:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/py_stringmatching
copying py_stringmatching/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching
copying py_stringmatching/utils.py -> build/lib.linux-x86_64-3.7/py_stringmatching
creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaro.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/bag_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/monge_elkan.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/partial_ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/token_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/partial_token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/smith_waterman.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/levenshtein.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/tversky_index.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/dice.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/generalized_jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/needleman_wunsch.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/overlap_coefficient.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/hybrid_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/editex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/hamming_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/phonetic_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/soft_tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/sequence_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/affine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/cosine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaro_winkler.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
creating build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/whitespace_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/delimiter_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/alphanumeric_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/alphabetic_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/qgram_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/definition_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
creating build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_sim_Soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_simfunctions.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_tokenizers.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
running egg_info
writing py_stringmatching.egg-info/PKG-INFO
writing dependency_links to py_stringmatching.egg-info/dependency_links.txt
writing requirements to py_stringmatching.egg-info/requires.txt
writing top-level names to py_stringmatching.egg-info/top_level.txt
reading manifest file 'py_stringmatching.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'py_stringmatching.egg-info/SOURCES.txt'
copying py_stringmatching/similarity_measure/cython/cython_affine.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_jaro.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_jaro_winkler.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_levenshtein.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_needleman_wunsch.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_smith_waterman.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_utils.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
running build_ext
building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/py_stringmatching
creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure
creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -m64 -fPIC -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -m64 -fPIC -fPIC -I/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include -I/home/suzil/anaconda3/envs/py3.7/include/python3.7m -c py_stringmatching/similarity_measure/cython/cython_levenshtein.c -o build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython/cython_levenshtein.o
In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^~~~~~~
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSave’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18818:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18819:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
*value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18820:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
*tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionReset’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18832:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18833:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18834:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18835:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18836:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18837:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = tb;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_GetException’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18880:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18881:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18882:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18883:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = local_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18884:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18885:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSwap’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18907:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18908:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18909:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18910:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = *type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18911:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = *value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18912:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = *tb;
^~~~~~~~~~~~~
curexc_traceback
In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
At top level:
/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^~~~~~~~~~~~~
error: command 'gcc' failed with exit status 1

Running setup.py clean for py-stringmatching
Failed to build py-stringmatching
Installing collected packages: py-stringmatching
Running setup.py install for py-stringmatching: started
Running setup.py install for py-stringmatching: finished with status 'error'
Complete output from command /home/suzil/anaconda3/envs/py3.7/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-r8vqgfyt/py-stringmatching/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-d_vgnbcd/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/py_stringmatching
copying py_stringmatching/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching
copying py_stringmatching/utils.py -> build/lib.linux-x86_64-3.7/py_stringmatching
creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaro.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/bag_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/monge_elkan.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/partial_ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/token_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/partial_token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/smith_waterman.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/levenshtein.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/tversky_index.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/dice.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/generalized_jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/needleman_wunsch.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/overlap_coefficient.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/hybrid_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/editex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/hamming_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/phonetic_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/soft_tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/sequence_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/affine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/cosine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
copying py_stringmatching/similarity_measure/jaro_winkler.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
creating build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/whitespace_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/delimiter_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/alphanumeric_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/alphabetic_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/qgram_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
copying py_stringmatching/tokenizer/definition_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
creating build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_sim_Soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_simfunctions.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
copying py_stringmatching/tests/test_tokenizers.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
running egg_info
writing py_stringmatching.egg-info/PKG-INFO
writing dependency_links to py_stringmatching.egg-info/dependency_links.txt
writing requirements to py_stringmatching.egg-info/requires.txt
writing top-level names to py_stringmatching.egg-info/top_level.txt
reading manifest file 'py_stringmatching.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'py_stringmatching.egg-info/SOURCES.txt'
copying py_stringmatching/similarity_measure/cython/cython_affine.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_jaro.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_jaro_winkler.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_levenshtein.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_needleman_wunsch.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_smith_waterman.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
copying py_stringmatching/similarity_measure/cython/cython_utils.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
running build_ext
building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/py_stringmatching
creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure
creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -m64 -fPIC -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -m64 -fPIC -fPIC -I/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include -I/home/suzil/anaconda3/envs/py3.7/include/python3.7m -c py_stringmatching/similarity_measure/cython/cython_levenshtein.c -o build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython/cython_levenshtein.o
In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^~~~~~~
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSave’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18818:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18819:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
*value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18820:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
*tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionReset’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18832:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18833:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18834:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18835:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18836:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18837:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = tb;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_GetException’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18880:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18881:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18882:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18883:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = local_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18884:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18885:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSwap’:
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18907:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18908:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18909:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18910:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = *type;
^~~~~~~~
curexc_type
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18911:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = *value;
^~~~~~~~~
curexc_value
py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18912:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = *tb;
^~~~~~~~~~~~~
curexc_traceback
In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
At top level:
/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^~~~~~~~~~~~~
error: command 'gcc' failed with exit status 1`

Install fails with --use-pep517

Thanks for maintaining this project. The flag ends up being a problem because poetry calls --use-pep517 when installing dependencies. (python-poetry/poetry#3433)

Steps to reproduce

$ mkdir /tmp/repro \
      && cd /tmp/repro \
      && python3 -m venv .venv \
      && source .venv/bin/activate
$ pip install --use-pep517 py_stringmatching
Collecting py_stringmatching
  Using cached py_stringmatching-0.4.2.tar.gz (661 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/private/tmp/repro/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/private/tmp/repro/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/private/tmp/repro/.venv/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/vf/5h145x1d77b5vm70kq4xcnj80000gn/T/pip-build-env-e7h9dqmw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/private/var/folders/vf/5h145x1d77b5vm70kq4xcnj80000gn/T/pip-build-env-e7h9dqmw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/vf/5h145x1d77b5vm70kq4xcnj80000gn/T/pip-build-env-e7h9dqmw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 483, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/private/var/folders/vf/5h145x1d77b5vm70kq4xcnj80000gn/T/pip-build-env-e7h9dqmw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 14, in <module>
      ImportError: pip is not installed.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Release plan 0.1.0

Features:

  • 3 tokenizers (delim, word, qgram)
  • 13 similarity functions (levenshtein, hamming, jaro, jaro_winkler, needleman_wunsch, smith_waterman, affine, jaccard, overlap, cosine, monge_elkan, tfidf, soft_tfidf)

Owners:

Incompatible with setuptools >= 50.0.0

This package cannot be installed from PyPI when the latest version of setuptools (>= 50.0.0) is installed.

The install fails with:

distutils.errors.DistutilsClassError: command class <class 'setuptools.command.egg_info.egg_info'> must subclass Command

Deploy wheels to pypi.org

Would it be possible to deploy wheels to pypi.org for various platforms including Linux and Windows?

py_stringmatching's jaro_winkler is slower than pure-Python jellyfish's jaro_winkler

Hi, I suspect something is wrong with cython_jaro_winkler. It appears to be much slower (roughly 26 times in the timings below) than a pure-Python implementation such as the one in the jellyfish project.

On my machine, macOS Mojave 10.14.2 (18C54):

In [1]: import timeit                                                                                                                                                            

In [2]: timeit.Timer('jw.get_raw_score(\'DIXON\', \'DICKSONX\')', setup='from py_stringmatching.similarity_measure.jaro_winkler import JaroWinkler; jw = JaroWinkler()').timeit(number=10000)
Out[2]: 0.1117161939619109

In [3]: timeit.Timer('jellyfish.jaro_winkler(\'DIXON\', \'DICKSONX\')', setup='import jellyfish').timeit(number=10000) 
Out[3]: 0.004220786038786173

I also tested on an Ubuntu 16.04.4 server and got similar results.

Versions:
Python 3.6.7
jellyfish==0.6.1
py-stringmatching==0.4.0
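
To make a comparison like this easier to reproduce without either library installed, here is a minimal sketch of a timing harness with a plain-Python Jaro-Winkler baseline. The names `jaro`, `jaro_winkler`, and `time_call` are illustrative and are not part of py_stringmatching or jellyfish; the baseline follows the textbook definition with a prefix weight of 0.1 and a prefix length capped at 4.

```python
import timeit


def jaro(s1, s2):
    """Textbook Jaro similarity in plain Python."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


def time_call(func, *args, number=10000):
    """Total seconds taken by `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


if __name__ == "__main__":
    print(jaro_winkler("DIXON", "DICKSONX"))  # about 0.813
    print(time_call(jaro_winkler, "DIXON", "DICKSONX"))
```

Passing `JaroWinkler().get_raw_score` or `jellyfish.jaro_winkler` to `time_call` with the same arguments puts all three measurements on the same footing.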

Setup requires numpy version incompatible with Python 3.6

Building the wheel for this package is broken on Python 3.6 because setup_requires specifies numpy as 'numpy >= 1.7.0'. This pulls in numpy 1.21.x, which does not support Python 3.6. It seems this constraint is not resolved at install time, and numpy instead throws a runtime error.

Is it possible to specify a maximum numpy version of 1.20.x as a setup requirement for Python 3.6?
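
A minimal sketch of how such a cap could be computed inside setup.py. The helper name `numpy_setup_requirement` and the default bound are assumptions, not the project's code. The default `<1.20` reflects NumPy 1.20 dropping Python 3.6 support; pass `py36_cap="1.21"` to get the 1.20.x ceiling proposed above.

```python
import sys


def numpy_setup_requirement(version_info=sys.version_info, py36_cap="1.20"):
    """Build the numpy requirement string for setup_requires.

    py36_cap is an exclusive upper bound that is applied only when the
    interpreter is Python 3.6 or older (hypothetical default, see lead-in).
    """
    base = "numpy>=1.7.0"
    if tuple(version_info[:2]) <= (3, 6):
        return "{},<{}".format(base, py36_cap)
    return base


if __name__ == "__main__":
    # The resulting string would go into setup(setup_requires=[...]).
    print(numpy_setup_requirement())
```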

Install broken for Python 3.9

On macOS 11.2.3, with Python 3.9.4 and pip 21.0.1, the native code build fails because Cython uses some deprecated CPython functions. Upgrading Cython to 0.29.23 and rebuilding the project should fix it. Let me know if you'd like additional information, such as error logs.

Issues when installing: Cannot open include file: 'basetsd.h': No such file or directory

Hi, I am having trouble trying to install this library. I have tried to install via pip and locally by installing setup.py. I am using Windows 10 and Python 3.8. Pip, wheel and setuptools are all installed and updated. I also have Visual Studio Build tools 2019 and the Windows SDK. Not sure what else to try. This is the entire error log:

pip install py_stringmatching
Collecting py_stringmatching
  Using cached py_stringmatching-0.4.2.tar.gz (661 kB)
Requirement already satisfied: numpy>=1.7.0 in c:\users\user\appdata\local\programs\python\python38\lib\site-packages (from py_stringmatching) (1.19.1)
Requirement already satisfied: six in c:\users\user\appdata\roaming\python\python38\site-packages (from py_stringmatching) (1.15.0)    
Building wheels for collected packages: py-stringmatching
  Building wheel for py-stringmatching (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'c:\users\user\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"'; __file__='"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\user\AppData\Local\Temp\pip-wheel-aw0f9k02'
       cwd: C:\Users\user\AppData\Local\Temp\pip-install-t6i7q05j\py-stringmatching\
  Complete output (81 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.8
  creating build\lib.win-amd64-3.8\py_stringmatching
  copying py_stringmatching\utils.py -> build\lib.win-amd64-3.8\py_stringmatching
  copying py_stringmatching\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching
  creating build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\affine.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\bag_distance.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\cosine.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\dice.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\editex.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\generalized_jaccard.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure      
  copying py_stringmatching\similarity_measure\hamming_distance.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\hybrid_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\jaccard.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\jaro.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\jaro_winkler.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\levenshtein.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\monge_elkan.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\needleman_wunsch.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\overlap_coefficient.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure      
  copying py_stringmatching\similarity_measure\partial_ratio.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\partial_token_sort.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure       
  copying py_stringmatching\similarity_measure\phonetic_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\ratio.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\sequence_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure       
  copying py_stringmatching\similarity_measure\smith_waterman.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\soft_tfidf.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\soundex.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\tfidf.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\token_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure 
  copying py_stringmatching\similarity_measure\token_sort.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\tversky_index.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  copying py_stringmatching\similarity_measure\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
  creating build\lib.win-amd64-3.8\py_stringmatching\tests
  copying py_stringmatching\tests\test_simfunctions.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
  copying py_stringmatching\tests\test_sim_Soundex.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
  copying py_stringmatching\tests\test_tokenizers.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
  copying py_stringmatching\tests\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
  creating build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\alphabetic_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\alphanumeric_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\definition_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\delimiter_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\qgram_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\whitespace_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  copying py_stringmatching\tokenizer\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
  creating build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython   
  running egg_info
  writing py_stringmatching.egg-info\PKG-INFO
  writing dependency_links to py_stringmatching.egg-info\dependency_links.txt
  writing requirements to py_stringmatching.egg-info\requires.txt
  writing top-level names to py_stringmatching.egg-info\top_level.txt
  reading manifest file 'py_stringmatching.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no files found matching 'requirements.txt'
  writing manifest file 'py_stringmatching.egg-info\SOURCES.txt'
  copying py_stringmatching\similarity_measure\cython\cython_affine.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\cython_jaro.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython 
  copying py_stringmatching\similarity_measure\cython\cython_jaro_winkler.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\cython_levenshtein.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\cython_needleman_wunsch.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\cython_smith_waterman.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  copying py_stringmatching\similarity_measure\cython\cython_utils.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
  running build_ext
  building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
  creating build\temp.win-amd64-3.8
  creating build\temp.win-amd64-3.8\Release
  creating build\temp.win-amd64-3.8\Release\py_stringmatching
  creating build\temp.win-amd64-3.8\Release\py_stringmatching\similarity_measure
  creating build\temp.win-amd64-3.8\Release\py_stringmatching\similarity_measure\cython
  C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\user\appdata\local\programs\python\python38\lib\site-packages\numpy\core\include -Ic:\users\user\appdata\local\programs\python\python38\include -Ic:\users\user\appdata\local\programs\python\python38\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29333\include" "-IC:\Program Files (x86)\Windows Kits\10\Include\10.0.18362.0\ucrt" /Tcpy_stringmatching/similarity_measure/cython/cython_levenshtein.c /Fobuild\temp.win-amd64-3.8\Release\py_stringmatching/similarity_measure/cython/cython_levenshtein.obj
  cython_levenshtein.c
  c:\users\user\appdata\local\programs\python\python38\include\pyconfig.h(206): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29333\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
  ----------------------------------------
  ERROR: Failed building wheel for py-stringmatching
  Running setup.py clean for py-stringmatching
Failed to build py-stringmatching
Installing collected packages: py-stringmatching
    Running setup.py install for py-stringmatching ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\user\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"'; __file__='"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\user\AppData\Local\Temp\pip-record-v7wir2a3\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\user\appdata\local\programs\python\python38\Include\py-stringmatching'
         cwd: C:\Users\user\AppData\Local\Temp\pip-install-t6i7q05j\py-stringmatching\
    Complete output (81 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.8
    creating build\lib.win-amd64-3.8\py_stringmatching
    copying py_stringmatching\utils.py -> build\lib.win-amd64-3.8\py_stringmatching
    copying py_stringmatching\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching
    creating build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\affine.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\bag_distance.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\cosine.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\dice.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\editex.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\generalized_jaccard.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure    
    copying py_stringmatching\similarity_measure\hamming_distance.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure       
    copying py_stringmatching\similarity_measure\hybrid_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\jaccard.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\jaro.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\jaro_winkler.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\levenshtein.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\monge_elkan.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\needleman_wunsch.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure       
    copying py_stringmatching\similarity_measure\overlap_coefficient.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure    
    copying py_stringmatching\similarity_measure\partial_ratio.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\partial_token_sort.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure     
    copying py_stringmatching\similarity_measure\phonetic_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\ratio.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\sequence_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure     
    copying py_stringmatching\similarity_measure\smith_waterman.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\soft_tfidf.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\soundex.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\tfidf.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\token_similarity_measure.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\token_sort.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\tversky_index.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    copying py_stringmatching\similarity_measure\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure
    creating build\lib.win-amd64-3.8\py_stringmatching\tests
    copying py_stringmatching\tests\test_simfunctions.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
    copying py_stringmatching\tests\test_sim_Soundex.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
    copying py_stringmatching\tests\test_tokenizers.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
    copying py_stringmatching\tests\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\tests
    creating build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\alphabetic_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\alphanumeric_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\definition_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\delimiter_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\qgram_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\whitespace_tokenizer.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    copying py_stringmatching\tokenizer\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\tokenizer
    creating build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\__init__.py -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython 
    running egg_info
    writing py_stringmatching.egg-info\PKG-INFO
    writing dependency_links to py_stringmatching.egg-info\dependency_links.txt
    writing requirements to py_stringmatching.egg-info\requires.txt
    writing top-level names to py_stringmatching.egg-info\top_level.txt
    reading manifest file 'py_stringmatching.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching 'requirements.txt'
    writing manifest file 'py_stringmatching.egg-info\SOURCES.txt'
    copying py_stringmatching\similarity_measure\cython\cython_affine.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_jaro.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_jaro_winkler.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_levenshtein.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_needleman_wunsch.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_smith_waterman.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    copying py_stringmatching\similarity_measure\cython\cython_utils.c -> build\lib.win-amd64-3.8\py_stringmatching\similarity_measure\cython
    running build_ext
    building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
    creating build\temp.win-amd64-3.8
    creating build\temp.win-amd64-3.8\Release
    creating build\temp.win-amd64-3.8\Release\py_stringmatching
    creating build\temp.win-amd64-3.8\Release\py_stringmatching\similarity_measure
    creating build\temp.win-amd64-3.8\Release\py_stringmatching\similarity_measure\cython
    C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\user\appdata\local\programs\python\python38\lib\site-packages\numpy\core\include -Ic:\users\user\appdata\local\programs\python\python38\include -Ic:\users\user\appdata\local\programs\python\python38\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29333\include" "-IC:\Program Files (x86)\Windows Kits\10\Include\10.0.18362.0\ucrt" /Tcpy_stringmatching/similarity_measure/cython/cython_levenshtein.c /Fobuild\temp.win-amd64-3.8\Release\py_stringmatching/similarity_measure/cython/cython_levenshtein.obj
    cython_levenshtein.c
    c:\users\user\appdata\local\programs\python\python38\include\pyconfig.h(206): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29333\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\user\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"'; __file__='"'"'C:\\Users\\user\\AppData\\Local\\Temp\\pip-install-t6i7q05j\\py-stringmatching\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\user\AppData\Local\Temp\pip-record-v7wir2a3\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\user\appdata\local\programs\python\python38\Include\py-stringmatching' Check the logs for full command output. 

Problem with Py_stringmatching GeneralizedJaccard

I'm using GeneralizedJaccard from the py_stringmatching package to measure the similarity between two strings. According to this document:

... If the similarity of a token pair exceeds the threshold, then the token pair is considered a match ...

For example, for the word pair 'method' and 'methods' we have:

print(sm.Levenshtein().get_sim_score('method','methods'))
>>0.8571428571428572

The similarity for this word pair is 0.857, which is greater than the 0.80 threshold, so the pair must be considered a match. I therefore expect the final GeneralizedJaccard score for two near-duplicate sentences to be 1, but it is 0.97:

import py_stringmatching as sm

str1='All tokenizers have a tokenize method'
str2='All tokenizers have a tokenize methods'

alphabet_tok_set = sm.AlphabeticTokenizer(return_set=True)

gj = sm.GeneralizedJaccard(sim_func=sm.Levenshtein().get_sim_score, threshold=0.8)
print(gj.get_raw_score(alphabet_tok_set.tokenize(str1),alphabet_tok_set.tokenize(str2)))

>>0.9761904761904763

So what is the problem?!
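
One likely explanation: in the usual definition of generalized Jaccard, a matched pair contributes its similarity score to the numerator rather than a flat 1. The six tokens on each side then give five exact matches (1.0 each) plus 'method'/'methods' (about 0.857), divided by 6 + 6 - 6 = 6. The sketch below is a self-contained reimplementation of that definition, not the library's code. The function names are illustrative, the greedy best-first pairing and the strict `>` comparison are assumptions, and the tokenizer is approximated with a regular expression.

```python
import re


def levenshtein_distance(a, b):
    """Edit distance using the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    """Similarity in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def generalized_jaccard(set1, set2, sim_func, threshold):
    """Sum of matched-pair similarities over |set1| + |set2| - matches."""
    if not set1 and not set2:
        return 1.0
    candidates = [(sim_func(x, y), x, y) for x in set1 for y in set2]
    candidates = [c for c in candidates if c[0] > threshold]
    candidates.sort(key=lambda c: c[0], reverse=True)
    used1, used2 = set(), set()
    match_score = 0.0
    match_count = 0
    # Greedily take the best-scoring pairs, using each token at most once.
    for score, x, y in candidates:
        if x in used1 or y in used2:
            continue
        used1.add(x)
        used2.add(y)
        match_score += score
        match_count += 1
    return match_score / (len(set1) + len(set2) - match_count)


def alphabetic_tokens(text):
    """Set of maximal runs of ASCII letters."""
    return set(re.findall(r"[a-zA-Z]+", text))


if __name__ == "__main__":
    t1 = alphabetic_tokens("All tokenizers have a tokenize method")
    t2 = alphabetic_tokens("All tokenizers have a tokenize methods")
    print(generalized_jaccard(t1, t2, levenshtein_sim, 0.8))  # about 0.976
```

Under this definition the threshold only decides which pairs count as matches. The score reaches 1 only when every matched pair has similarity exactly 1.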

Does not work with numpy >= 1.24

In [1]: import py_stringmatching as sm
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 1
----> 1 import py_stringmatching as sm

File ~/.local/share/conda/envs/blocking/lib/python3.10/site-packages/py_stringmatching/__init__.py:21
     19 from py_stringmatching.similarity_measure.jaro import Jaro
     20 from py_stringmatching.similarity_measure.jaro_winkler import JaroWinkler
---> 21 from py_stringmatching.similarity_measure.levenshtein import Levenshtein
     22 from py_stringmatching.similarity_measure.monge_elkan import MongeElkan
     23 from py_stringmatching.similarity_measure.needleman_wunsch import NeedlemanWunsch

File ~/.local/share/conda/envs/blocking/lib/python3.10/site-packages/py_stringmatching/similarity_measure/levenshtein.py:4
      1 from __future__ import division
      3 from py_stringmatching import utils
----> 4 from py_stringmatching.similarity_measure.cython.cython_levenshtein import levenshtein
      5 from py_stringmatching.similarity_measure.sequence_similarity_measure import \
      6     SequenceSimilarityMeasure
      9 class Levenshtein(SequenceSimilarityMeasure):

File py_stringmatching/similarity_measure/cython/cython_levenshtein.pyx:11, in init py_stringmatching.similarity_measure.cython.cython_levenshtein()

File ~/.local/share/conda/envs/blocking/lib/python3.10/site-packages/numpy/__init__.py:284, in __getattr__(attr)
    281     from .testing import Tester
    282     return Tester
--> 284 raise AttributeError("module {!r} has no attribute "
    285                      "{!r}".format(__name__, attr))

AttributeError: module 'numpy' has no attribute 'int'

There is a related blog post about this.
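
The traceback ends in `module 'numpy' has no attribute 'int'`. NumPy deprecated `np.int` and similar aliases for builtin types in 1.20 and removed them in 1.24, so extension code that still references the alias fails at import time. A minimal sketch of a scanner that locates such aliases in source text and suggests the builtin replacement. The helper `find_removed_aliases` and the sample source are hypothetical and are not taken from the project's .pyx files.

```python
import re

# Aliases removed in NumPy 1.24, mapped to the builtin each one stood for.
REMOVED_ALIASES = {
    "int": "int",
    "float": "float",
    "bool": "bool",
    "complex": "complex",
    "object": "object",
    "str": "str",
}

# The trailing \b keeps np.int64, np.int_ and np.float32 from matching.
_ALIAS_PATTERN = re.compile(
    r"\b(?:np|numpy)\.(" + "|".join(REMOVED_ALIASES) + r")\b"
)


def find_removed_aliases(source):
    """Return (line number, alias as written, suggested builtin) tuples."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for match in _ALIAS_PATTERN.finditer(line):
            hits.append((lineno, match.group(0), REMOVED_ALIASES[match.group(1)]))
    return hits


if __name__ == "__main__":
    # Made-up sample in the style of a Cython module.
    sample = (
        "d_mat = np.zeros((n, m), dtype=np.int)\n"
        "count = np.int64(3)\n"
        "flag = np.int_(1)\n"
    )
    for lineno, alias, builtin in find_removed_aliases(sample):
        print("line {}: replace {} with {}".format(lineno, alias, builtin))
```

Until the package is rebuilt with the aliases replaced, pinning numpy below 1.24 in the environment avoids the error.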

Contribute with tokenizers

Hi, I've been looking at your amazing project and see that you don't have any deep learning tokenizers available. I would really like to contribute them. I tried to start a discussion on Google Groups, but I can't access it: https://groups.google.com/forum/#!forum/py_stringmatching

Do you already have any requirements or decisions about deep learning tokenizers? Can I start contributing?

Thank you for your attention

Release plan 0.3.0 (May 2017)

Release owner: Pradap Konda and Paul Suganthan GC

Release plan:

  • add six measures written by Rishab Kalra (this is the only thing new compared to the current version).

Typo in docs/Contributing.rst

The hyperlinking at this line is broken:

Like many packages, py_stringmatching uses the standard ``unittest testing library <https://docs.python.org/3/library/unittest.html>_.

Properly cite py_stringmatching in a paper

Not actually an issue, just a question. Is there a recommended way to cite py_stringmatching in a paper? A link to the project's homepage might do, but if there is an official publication for the library, I'd be happy to know.

Failure to install with Python 3.7

I am able to install with Python 3.6, but I get an error message when I try to install with Python 3.7.0. I am using Ubuntu 18.04.1 LTS.

pip install py_stringmatching
Collecting py_stringmatching
  Using cached https://files.pythonhosted.org/packages/e8/11/a7d8568eaac88e167fedd857640fe04e8950511e5fbe0700a42e12900a48/py_stringmatching-0.4.0.tar.gz
Requirement already satisfied: numpy>=1.7.0 in /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages (from py_stringmatching) (1.15.3)
Requirement already satisfied: six in /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages (from py_stringmatching) (1.11.0)
Building wheels for collected packages: py-stringmatching
  Running setup.py bdist_wheel for py-stringmatching: started
  Running setup.py bdist_wheel for py-stringmatching: finished with status 'error'
  Complete output from command /home/suzil/anaconda3/envs/py3.7/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-r8vqgfyt/py-stringmatching/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-yg45dyr0 --python-tag cp37:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/py_stringmatching
  copying py_stringmatching/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching
  copying py_stringmatching/utils.py -> build/lib.linux-x86_64-3.7/py_stringmatching
  creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/jaro.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/bag_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/monge_elkan.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/partial_ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/token_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/partial_token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/smith_waterman.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/levenshtein.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/tversky_index.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/dice.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/generalized_jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/needleman_wunsch.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/overlap_coefficient.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/hybrid_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/editex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/hamming_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/phonetic_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/soft_tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/sequence_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/affine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/cosine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  copying py_stringmatching/similarity_measure/jaro_winkler.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
  creating build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/whitespace_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/delimiter_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/alphanumeric_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/alphabetic_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/qgram_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  copying py_stringmatching/tokenizer/definition_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
  creating build/lib.linux-x86_64-3.7/py_stringmatching/tests
  copying py_stringmatching/tests/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
  copying py_stringmatching/tests/test_sim_Soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
  copying py_stringmatching/tests/test_simfunctions.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
  copying py_stringmatching/tests/test_tokenizers.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
  creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  running egg_info
  writing py_stringmatching.egg-info/PKG-INFO
  writing dependency_links to py_stringmatching.egg-info/dependency_links.txt
  writing requirements to py_stringmatching.egg-info/requires.txt
  writing top-level names to py_stringmatching.egg-info/top_level.txt
  reading manifest file 'py_stringmatching.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'py_stringmatching.egg-info/SOURCES.txt'
  copying py_stringmatching/similarity_measure/cython/cython_affine.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_jaro.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_jaro_winkler.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_levenshtein.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_needleman_wunsch.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_smith_waterman.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  copying py_stringmatching/similarity_measure/cython/cython_utils.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  running build_ext
  building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/py_stringmatching
  creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure
  creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -m64 -fPIC -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -m64 -fPIC -fPIC -I/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include -I/home/suzil/anaconda3/envs/py3.7/include/python3.7m -c py_stringmatching/similarity_measure/cython/cython_levenshtein.c -o build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython/cython_levenshtein.o
  In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
                   from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                   from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                   from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
  /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
   #warning "Using deprecated NumPy API, disable it by " \
    ^~~~~~~
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSave’:
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18818:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       *type = tstate->exc_type;
                       ^~~~~~~~
                       curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18819:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       *value = tstate->exc_value;
                        ^~~~~~~~~
                        curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18820:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       *tb = tstate->exc_traceback;
                     ^~~~~~~~~~~~~
                     curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionReset’:
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18832:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tmp_type = tstate->exc_type;
                          ^~~~~~~~
                          curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18833:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tmp_value = tstate->exc_value;
                           ^~~~~~~~~
                           curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18834:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tmp_tb = tstate->exc_traceback;
                        ^~~~~~~~~~~~~
                        curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18835:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tstate->exc_type = type;
               ^~~~~~~~
               curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18836:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tstate->exc_value = value;
               ^~~~~~~~~
               curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18837:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tstate->exc_traceback = tb;
               ^~~~~~~~~~~~~
               curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_GetException’:
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18880:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tmp_type = tstate->exc_type;
                          ^~~~~~~~
                          curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18881:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tmp_value = tstate->exc_value;
                           ^~~~~~~~~
                           curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18882:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tmp_tb = tstate->exc_traceback;
                        ^~~~~~~~~~~~~
                        curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18883:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tstate->exc_type = local_type;
               ^~~~~~~~
               curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18884:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tstate->exc_value = local_value;
               ^~~~~~~~~
               curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18885:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tstate->exc_traceback = local_tb;
               ^~~~~~~~~~~~~
               curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSwap’:
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18907:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tmp_type = tstate->exc_type;
                          ^~~~~~~~
                          curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18908:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tmp_value = tstate->exc_value;
                           ^~~~~~~~~
                           curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18909:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tmp_tb = tstate->exc_traceback;
                        ^~~~~~~~~~~~~
                        curexc_traceback
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18910:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
       tstate->exc_type = *type;
               ^~~~~~~~
               curexc_type
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18911:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
       tstate->exc_value = *value;
               ^~~~~~~~~
               curexc_value
  py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18912:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tstate->exc_traceback = *tb;
               ^~~~~~~~~~~~~
               curexc_traceback
  In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
                   from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                   from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
  At top level:
  /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
   _import_array(void)
   ^~~~~~~~~~~~~
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Running setup.py clean for py-stringmatching
Failed to build py-stringmatching
Installing collected packages: py-stringmatching
  Running setup.py install for py-stringmatching: started
    Running setup.py install for py-stringmatching: finished with status 'error'
    Complete output from command /home/suzil/anaconda3/envs/py3.7/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-r8vqgfyt/py-stringmatching/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-d_vgnbcd/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/py_stringmatching
    copying py_stringmatching/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching
    copying py_stringmatching/utils.py -> build/lib.linux-x86_64-3.7/py_stringmatching
    creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/jaro.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/bag_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/monge_elkan.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/partial_ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/token_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/partial_token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/smith_waterman.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/levenshtein.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/tversky_index.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/dice.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/generalized_jaccard.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/needleman_wunsch.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/overlap_coefficient.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/ratio.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/hybrid_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/editex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/token_sort.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/hamming_distance.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/phonetic_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/soft_tfidf.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/sequence_similarity_measure.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/affine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/cosine.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    copying py_stringmatching/similarity_measure/jaro_winkler.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure
    creating build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/whitespace_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/delimiter_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/alphanumeric_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/alphabetic_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/qgram_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    copying py_stringmatching/tokenizer/definition_tokenizer.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tokenizer
    creating build/lib.linux-x86_64-3.7/py_stringmatching/tests
    copying py_stringmatching/tests/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
    copying py_stringmatching/tests/test_sim_Soundex.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
    copying py_stringmatching/tests/test_simfunctions.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
    copying py_stringmatching/tests/test_tokenizers.py -> build/lib.linux-x86_64-3.7/py_stringmatching/tests
    creating build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/__init__.py -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    running egg_info
    writing py_stringmatching.egg-info/PKG-INFO
    writing dependency_links to py_stringmatching.egg-info/dependency_links.txt
    writing requirements to py_stringmatching.egg-info/requires.txt
    writing top-level names to py_stringmatching.egg-info/top_level.txt
    reading manifest file 'py_stringmatching.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'py_stringmatching.egg-info/SOURCES.txt'
    copying py_stringmatching/similarity_measure/cython/cython_affine.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_jaro.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_jaro_winkler.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_levenshtein.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_needleman_wunsch.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_smith_waterman.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    copying py_stringmatching/similarity_measure/cython/cython_utils.c -> build/lib.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    running build_ext
    building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/py_stringmatching
    creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure
    creating build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -m64 -fPIC -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -m64 -fPIC -fPIC -I/home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include -I/home/suzil/anaconda3/envs/py3.7/include/python3.7m -c py_stringmatching/similarity_measure/cython/cython_levenshtein.c -o build/temp.linux-x86_64-3.7/py_stringmatching/similarity_measure/cython/cython_levenshtein.o
    In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1821:0,
                     from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                     from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
    /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by " \
      ^~~~~~~
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSave’:
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18818:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         *type = tstate->exc_type;
                         ^~~~~~~~
                         curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18819:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         *value = tstate->exc_value;
                          ^~~~~~~~~
                          curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18820:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         *tb = tstate->exc_traceback;
                       ^~~~~~~~~~~~~
                       curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionReset’:
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18832:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tmp_type = tstate->exc_type;
                            ^~~~~~~~
                            curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18833:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tmp_value = tstate->exc_value;
                             ^~~~~~~~~
                             curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18834:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tmp_tb = tstate->exc_traceback;
                          ^~~~~~~~~~~~~
                          curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18835:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tstate->exc_type = type;
                 ^~~~~~~~
                 curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18836:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tstate->exc_value = value;
                 ^~~~~~~~~
                 curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18837:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tstate->exc_traceback = tb;
                 ^~~~~~~~~~~~~
                 curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_GetException’:
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18880:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tmp_type = tstate->exc_type;
                            ^~~~~~~~
                            curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18881:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tmp_value = tstate->exc_value;
                             ^~~~~~~~~
                             curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18882:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tmp_tb = tstate->exc_traceback;
                          ^~~~~~~~~~~~~
                          curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18883:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tstate->exc_type = local_type;
                 ^~~~~~~~
                 curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18884:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tstate->exc_value = local_value;
                 ^~~~~~~~~
                 curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18885:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tstate->exc_traceback = local_tb;
                 ^~~~~~~~~~~~~
                 curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c: In function ‘__Pyx_ExceptionSwap’:
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18907:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tmp_type = tstate->exc_type;
                            ^~~~~~~~
                            curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18908:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tmp_value = tstate->exc_value;
                             ^~~~~~~~~
                             curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18909:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tmp_tb = tstate->exc_traceback;
                          ^~~~~~~~~~~~~
                          curexc_traceback
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18910:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
         tstate->exc_type = *type;
                 ^~~~~~~~
                 curexc_type
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18911:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
         tstate->exc_value = *value;
                 ^~~~~~~~~
                 curexc_value
    py_stringmatching/similarity_measure/cython/cython_levenshtein.c:18912:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
         tstate->exc_traceback = *tb;
                 ^~~~~~~~~~~~~
                 curexc_traceback
    In file included from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
                     from /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from py_stringmatching/similarity_measure/cython/cython_levenshtein.c:242:
    At top level:
    /home/suzil/anaconda3/envs/py3.7/lib/python3.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
     _import_array(void)
     ^~~~~~~~~~~~~
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------

I get the same error when I try pip install py_stringmatching==0.4.0 and pip install py_stringmatching==0.3.0.

SoftTFIDF get_raw_score failing with float division by zero

I'm getting an exception when calling the get_raw_score function of the SoftTFIDF similarity measure. It only happens with a specific corpus, which I'm unfortunately unable to share, so the code snippet isn't fully reproducible.

import py_stringmatching as sm
print(sm.__version__)
soft_tfidf =sm.SoftTfIdf(corpus, threshold=0.9)
soft_tfidf.get_raw_score(['AWN', 'AL'], ['ONEP'])
0.4.1
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-100-fcbb2f491b64> in <module>
      2 print(sm.__version__)
      3 soft_tfidf =sm.SoftTfIdf(corpus, threshold=0.9)
----> 4 soft_tfidf.get_raw_score(['AWN', 'AL'], ['ONEP'])

C:\ProgramData\Anaconda3\lib\site-packages\py_stringmatching\similarity_measure\soft_tfidf.py in get_raw_score(self, bag1, bag2)
    134             v_y = idf * tf_y.get(element, 0)
    135             v_y_2 += v_y * v_y
--> 136         return result if v_x_2 == 0 else result / (sqrt(v_x_2) * sqrt(v_y_2))
    137 
    138     def get_corpus_list(self):

ZeroDivisionError: float division by zero

I added a print right before line 136. The root cause is that v_y_2 is equal to zero.
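
For illustration, the final normalization step could be guarded against a zero norm on either side, not only on v_x_2. This is a minimal sketch, not the library's code: the helper name normalize_score is hypothetical, and returning 0.0 when either vector has zero norm is an assumption about the desired behavior.

```python
from math import sqrt


def normalize_score(result, v_x_2, v_y_2):
    """Cosine-style normalization of a raw soft TF-IDF sum.

    Hypothetical helper: `result` is the accumulated numerator, and
    `v_x_2` / `v_y_2` are the squared norms of the two weight vectors.
    """
    # Assumption: if either bag has zero norm, there is nothing to
    # compare against, so the score is defined as 0.0.
    if v_x_2 == 0 or v_y_2 == 0:
        return 0.0
    return result / (sqrt(v_x_2) * sqrt(v_y_2))
```

With this guard, the case reported above (v_y_2 equal to zero) would return 0.0 instead of raising ZeroDivisionError.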

Error if the string is not encoded in utf-8

The current code doesn't handle strings that are not valid Unicode. If a string is not encoded in UTF-8, a UnicodeDecodeError is raised.

Error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 8: invalid continuation byte

Dev Analysis:
String : "sharkey_„Žs cafe"
I used convert_to_unicode in utils.py to convert the non-Unicode strings to Unicode, but it still could not parse the non-Unicode characters. At the moment we ignore non-Unicode strings, but they need to be handled.
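
One possible way to handle such input is to try UTF-8 first and fall back to another encoding. This is only a sketch: the helper name to_text is hypothetical, the actual encoding of the offending string is not known from the report, and latin-1 is assumed as a fallback simply because it can decode any byte sequence.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Return `value` as a str, trying each encoding in order.

    Hypothetical helper; the fallback encoding list is an assumption.
    """
    if isinstance(value, str):
        return value  # already decoded text
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return value.decode("utf-8", errors="replace")


# A byte string with 0xd1 at position 8, as in the error message above.
raw = b"sharkey_\xd1s cafe"
text = to_text(raw)
```

Decoding with a guessed fallback avoids the exception but may produce the wrong character if the guess is wrong, so the caller should choose the fallback to match the data source.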

release plan 0.4.0 (July 2017)

  • Rewrote five similarity measures in Cython: Affine, Jaro, Jaro-Winkler, Needleman-Wunsch, and Smith-Waterman.
  • Added benchmark scripts to measure the performance of similarity measures.
