
nppoly / cyac

Stars: 91 · Watchers: 5 · Forks: 15 · Size: 1.77 MB

High-performance Trie and Aho-Corasick automaton (AC automaton) keyword match & replace tool for Python

License: MIT License

Languages: Cython 76.72%, Python 22.87%, C 0.28%, Shell 0.13%
Topics: nlp data-extraction automata trie search search-in-text keyword-extraction double-array-trie cython
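For readers new to the technique, here is a minimal pure-Python sketch of an Aho-Corasick automaton: a trie of the keywords plus failure links, so a single left-to-right pass over the text reports every keyword occurrence, including overlapping ones. It is an illustration only. cyac itself is written in Cython on a double-array trie, and the class and method names below are hypothetical, not cyac's API.

```python
from collections import deque


class AhoCorasick:
    """Dictionary-based Aho-Corasick automaton (illustrative, not cyac's API)."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.goto = [{}]   # goto[state][char] -> child state (the trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # ids of patterns recognised on reaching each state
        for pid, pattern in enumerate(self.patterns):
            self._insert(pid, pattern)
        self._build_failure_links()

    def _insert(self, pid, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pid)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        # A failure link points to the longest proper suffix of the state's
        # string that is also a prefix of some pattern.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the matches of the suffix state (e.g. "she" -> "he").
                self.out[nxt].extend(self.out[self.fail[nxt]])

    def match(self, text):
        """Yield (pattern_id, start, end) for each occurrence; end is exclusive."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pid in self.out[state]:
                yield pid, i + 1 - len(self.patterns[pid]), i + 1
```

For example, `list(AhoCorasick(["he", "she", "his", "hers"]).match("ushers"))` gives `[(1, 1, 4), (0, 2, 4), (3, 2, 6)]`: "she", the overlapping "he", and "hers" are all found in one scan.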

cyac's People

Contributors

chenkovsky, decaz, imaurer, nppoly, oblackato

cyac's Issues

error: conflicting declaration of ‘int _PyUnicode_ToLowerFull(Py_UCS4, Py_UCS4*)’ with ‘C++’ linkage

I got an error while doing "pip install cyac":
building 'cyac.xstring' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -Ilib/cyac -I/usr/include/python3.9 -c lib/cyac/xstring.cpp -o build/temp.linux-x86_64-cpython-39/lib/cyac/xstring.o
lib/cyac/xstring.cpp:2043:31: error: conflicting declaration of ‘int _PyUnicode_ToLowerFull(Py_UCS4, Py_UCS4*)’ with ‘C++’ linkage
2043 | __PYX_EXTERN_C DL_IMPORT(int) _PyUnicode_ToLowerFull(Py_UCS4, Py_UCS4 *); /*proto*/
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.9/unicodeobject.h:1026,
from /usr/include/python3.9/Python.h:97,
from lib/cyac/xstring.cpp:35:
/usr/include/python3.9/cpython/unicodeobject.h:1103:17: note: previous declaration with ‘C’ linkage
1103 | PyAPI_FUNC(int) _PyUnicode_ToLowerFull(
| ^~~~~~~~~~~~~~~~~~~~~~
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1

ERROR: Failed building wheel for cyac
Failed to build cyac

Cython version: 3.0.0
Python version: 3.9
GCC version: 10.2.1
System: debian:bullseye-slim

Thanks!

Thread-safe

Is this library thread-safe after calling the build function?
If the trie is built and several threads access it using the match function, would this be thread-safe?
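The question is left open here. One way to probe it empirically is to run the match function from many threads at once and compare the results with a sequential run. A mismatch is strong evidence of a problem, but a pass only means none was observed; a test like this cannot prove thread safety. This is a stdlib-only sketch, assuming the match function returns an iterable of matches; the helper name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def results_match_under_threads(match_fn, texts, workers=8, rounds=20):
    """Return False if concurrent calls of match_fn ever disagree with sequential ones.

    A True result means no discrepancy was observed; it does not prove
    that the object behind match_fn is thread-safe.
    """
    # Reference results, computed on a single thread.
    expected = [list(match_fn(t)) for t in texts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            # pool.map returns results in input order, so lists are comparable.
            got = list(pool.map(lambda t: list(match_fn(t)), texts))
            if got != expected:
                return False
    return True
```

With cyac installed, it would be called on an automaton built with AC.build, e.g. `results_match_under_threads(ac.match, texts)`.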

Build failure

There is an error during the build of the latest version, 1.8, when the package is downloaded from PyPI:

      lib/cyac/xstring.cpp:1113:10: fatal error: 'unicode_portability.c' file not found
      #include "unicode_portability.c"
               ^~~~~~~~~~~~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
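Since the compiler cannot find a file that xstring.cpp includes, a reasonable first check is whether the source distribution on PyPI contains it at all (the 1.9 build log further down does copy lib/cyac/unicode_portability.c). The sdist can be fetched without building it, e.g. with `pip download --no-deps --no-binary :all: cyac==1.8`, and then listed. A minimal sketch using only the standard library; the helper name is hypothetical and the archive name `cyac-1.8.tar.gz` is an assumption based on the other versions mentioned on this page.

```python
import tarfile


def sdist_members_ending_with(sdist_path, filename):
    """Return the archive paths whose last component is `filename`."""
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz) by itself.
    with tarfile.open(sdist_path, "r:*") as tar:
        return [m.name for m in tar.getmembers()
                if m.name == filename or m.name.endswith("/" + filename)]
```

An empty result for `sdist_members_ending_with("cyac-1.8.tar.gz", "unicode_portability.c")` would mean the file was left out of the published archive (for example, not listed in MANIFEST.in), rather than a problem with the local toolchain.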

Invalid buf size raises randomly

I am trying to use several ACs, each of them shared between several processes.
I cannot share the code, so I will provide pseudo-code instead.

import mmap
from multiprocessing import Process

from cyac import AC


def target_function(ac_name):
    with open(ac_name, "r+b") as bf:
        buff_object = mmap.mmap(bf.fileno(), 0)
        automaton = AC.from_buff(buff_object, copy=False)
        ...


if __name__ == "__main__":
    processes_per_AC = 3
    total_ac = 3  # placeholder: the real number of ACs is not given
    for x in range(total_ac):
        x_patterns = [...]  # <some words here, different for every AC in the iteration>
        ac = AC.build(x_patterns)
        ac.save("ac_{}".format(x))
        # Start the child processes that share the AC just saved.
        for _ in range(processes_per_AC):
            p = Process(target=target_function, args=("ac_{}".format(x),))
            p.start()
            ...

Basically, I create several ACs with different words in the main process, and then launch several child processes that share each AC created; in this case, 3 processes per AC.
The AC I am building contains the following type of information:

The exception:

...
 File "lib/cyac/ac.pyx", line 413, in cyac.ac.AC.from_buff
 File "lib/cyac/ac.pyx", line 465, in cyac.ac.ac_from_buff
Exception: invalid data, buf size is not correct

The exception occurs when I try to load the automaton with from_buff in target_function. It occurs randomly, and I have not been able to understand why. It does not matter whether the AC has more or fewer words, and the type of words (emails, IPs, domains, etc.) does not seem to make a difference. I wish I could be more precise, but this is everything I can observe from the exception.

I can provide the file I am using; it is just randomly generated emails, domains, IPs, etc. in JSON format.
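The error says the mapped buffer's size did not match what the loader expected. One hypothesis worth ruling out (it is not confirmed in the report) is a reader mapping a file while another process is still writing or replacing it. The usual defence, sketched here with the standard library only, is to write to a temporary file in the same directory, fsync it, and publish it with os.replace, so readers see either the old file or the complete new one; readers then validate a length header against the mapped size. The 8-byte header and the function names are hypothetical and are not cyac's on-disk format.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")  # hypothetical header: payload length as uint64


def atomic_save(path, payload):
    """Write header + payload to a temp file, then atomically rename it to path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(HEADER.pack(len(payload)))
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp, path)  # readers never observe a half-written file
    except BaseException:
        os.unlink(tmp)
        raise


def load_mapped(path):
    """Map path read-only and check the stored length against the file size."""
    with open(path, "rb") as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (length,) = HEADER.unpack_from(buf, 0)
    if HEADER.size + length != len(buf):
        buf.close()
        raise ValueError("payload length does not match file size")
    return buf
```

The same ordering rule applies to the pseudo-code above: child processes should only be started after the save call for their file has returned.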

Segmentation fault when iterating over AC trie

Issue

When iterating over an AC trie, a segmentation fault occurs. For example:

import cyac
trie = cyac.AC.build(['hello', 'world'])
for w in trie:
    print(w)

Output:

hello
world










Segmentation fault

However, AC.items() works as expected.

Environment

  • Ubuntu 18.04
  • Python 3.6.9
  • Cython 0.29.21
  • cyac 1.4

Trie.get fails in case-sensitive Trie after buffer load of saved data

Summary:
Previously saved case-sensitive data loads into the trie without error. However, trie.get(val) fails to get the item, even though iterating through trie.items() shows the correct item.

Env: Arch Linux using Python 3.9, Ubuntu 20.04 using Python 3.8.5
Cyac Version: 1.2
Cython Version:

Steps to reproduce:

  1. Build a case-sensitive trie
  2. Lookup data key (success)
  3. Save trie data out to file
  4. Load trie data from file
  5. Lookup data key (fails)

The code below should be run twice:

  1. Run 1: generates two simple tries (one case-sensitive, one case-insensitive) and saves them to two files.
  2. Run 2: loads the two tries from the saved files and executes the same lookups.

Script below:

from cyac import Trie
import os
import mmap

p=["M3 1AR", "M3 2AR", "M3 3AR"]

if os.path.isfile('./pinsens.bin'):
    # Load into new tries
    print("* * * Loaded from buffer * * * ")
    with open('pinsens.bin', 'r+b') as fins:
        bins = mmap.mmap(fins.fileno(), 0)
        cins = Trie.from_buff(bins)
        bins.flush()
    with open('psens.bin', 'r+b') as fsens:
        bsens = mmap.mmap(fsens.fileno(), 0)
        csen = Trie.from_buff(bsens)
        bsens.flush()
else:
    print("* * * Clean data")
    cins = Trie(ignore_case=True)
    csen = Trie()
    for x in p:
        cins.insert(x)
        csen.insert(x)
    cins.save('pinsens.bin')
    csen.save('psens.bin')

print("Case Insensitive")
print("{} is at {}".format(p[2], cins.get(p[2])))  # correctly returns 2
print("Case Sensitive")
print("{} is at {}".format(p[2], csen.get(p[2])))  # correctly returns 2 in first run, fails in second

for id in csen.items():
    print(id)

Cannot install it from pip on Windows!

pip install cyac
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cyac
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/db/2e/4a4916514b64694dd478ab085f01fb4ca599e3127c05704f98f3067e8fc7/cyac-1.3.tar.gz (38 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\admin\AppData\Local\Temp\pip-install-h1651_cj\cyac_4973446135a546ce84624d967d791583\setup.py", line 10, in <module>
          long_description = open("README.md").read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0x8c in position 3452: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

python version: Python 3.10.1
pip version: pip 22.0.4
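The traceback shows setup.py calling open("README.md").read() without an encoding, so Python falls back to the locale's preferred encoding (GBK on this system) and fails on the UTF-8 README. A minimal sketch of the usual fix on the packaging side, assuming the README is stored as UTF-8; the function name is hypothetical.

```python
def read_long_description(path="README.md"):
    """Read the README as UTF-8 regardless of the system locale."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which is GBK on a Chinese-locale Windows system; that is what failed here.
    with open(path, encoding="utf-8") as f:
        return f.read()
```

In setup.py this would replace the bare call with `long_description = read_long_description()`. As a user-side workaround, enabling Python's UTF-8 mode (setting the environment variable `PYTHONUTF8=1`, available since Python 3.7) before running pip should make that open() call default to UTF-8 as well.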

error when installing from pip

I'm getting the following error when trying to install cyac:

% pip --version
pip 23.2.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)

I don't even know if it's cyac's fault, but maybe you can help me?

% pip install cyac
Collecting cyac
Using cached cyac-1.9.tar.gz (47 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: cython>=0.29.0 in /usr/local/lib/python3.9/site-packages (from cyac) (3.0.2)
Building wheels for collected packages: cyac
Building wheel for cyac (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for cyac (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [107 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-11-x86_64-cpython-39
creating build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/version.py -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/__init__.py -> build/lib.macosx-11-x86_64-cpython-39/cyac
running egg_info
writing cyac.egg-info/PKG-INFO
writing dependency_links to cyac.egg-info/dependency_links.txt
writing requirements to cyac.egg-info/requires.txt
writing top-level names to cyac.egg-info/top_level.txt
reading manifest file 'cyac.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found anywhere in distribution
adding license file 'LICENSE'
writing manifest file 'cyac.egg-info/SOURCES.txt'
copying lib/cyac/ac.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/ac.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/trie.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/trie.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/unicode_portability.c -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/utf8.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/utf8.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.c -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/xstring.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/xstring.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
running build_ext
building 'cyac.util' extension
creating build/temp.macosx-11-x86_64-cpython-39
creating build/temp.macosx-11-x86_64-cpython-39/lib
creating build/temp.macosx-11-x86_64-cpython-39/lib/cyac
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/usr/local/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c lib/cyac/util.c -o build/temp.macosx-11-x86_64-cpython-39/lib/cyac/util.o
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 434, in build_wheel
return self._build_with_temp_dir(
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 21, in <module>
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 364, in run
self.run_command("build")
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 88, in run
_build_ext.run(self)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
_build_ext.build_extension(self, ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
super(build_ext, self).build_extension(ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 600, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 185, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 1041, in spawn
spawn(cmd, dry_run=self.dry_run, **kwargs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/spawn.py", line 57, in spawn
proc = subprocess.Popen(cmd, env=env)
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 947, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1739, in _execute_child
env_list.append(k + b'=' + os.fsencode(v))
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py", line 810, in fsencode
filename = fspath(filename) # Does type-checking of filename.
TypeError: expected str, bytes or os.PathLike object, not int
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cyac
Failed to build cyac
ERROR: Could not build wheels for cyac, which is required to install pyproject.toml-based projects
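The last frames show subprocess building the child process environment and os.fsencode rejecting one of its values: some entry of the env mapping handed to the compiler subprocess was an int rather than a string. The log does not show which variable it was. The sketch below reproduces that mechanism with a hypothetical variable, DEMO_VAR, and shows the usual guard of converting values to str before spawning; the helper name is also hypothetical.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run cmd with os.environ plus extra_env, coercing every value to str.

    subprocess requires environment values to be str (or bytes on POSIX);
    an int value makes it raise TypeError before the child process starts,
    which is the failure shown in the traceback above.
    """
    env = dict(os.environ)
    env.update({key: str(value) for key, value in extra_env.items()})
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    try:
        # Passing an int directly reproduces the TypeError from the log.
        subprocess.run([sys.executable, "-c", "pass"], env=dict(os.environ, DEMO_VAR=11))
    except TypeError as exc:
        print("TypeError:", exc)
    run_with_env([sys.executable, "-c", "pass"], {"DEMO_VAR": 11})  # succeeds
```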

Error importing cyac in shared_buffer branch

Hello,

I would like to run some memory analysis and tests using the new shared_buffer branch. I would post the results in an issue, so that you could put them in the README.md if you like.

However, I am facing the following problem:

>>> import cyac
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/cyac-1.0-py3.7-linux-x86_64.egg/cyac/__init__.py", line 2, in <module>
    from .ac import AC
  File "lib/cyac/ac.pyx", line 1, in init cyac.ac
    #cython: language_level=3, boundscheck=False, overflowcheck=False
  File "lib/cyac/trie.pyx", line 1, in init cyac.trie
    #cython: language_level=3, boundscheck=False, overflowcheck=False
ModuleNotFoundError: No module named 'cyac.util'

What I did to install it:

  • I cloned the cyac repo
  • git checkout shared_buffer
  • python3.7 setup.py build
  • python3.7 setup.py install
  • then imported cyac, and the error occurred.

I tried with Python 3.7 and 3.6.
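The traceback shows cyac/__init__.py importing .ac, which initialises the trie module, which then fails to find cyac.util; that suggests the util extension was not compiled or not installed into the egg. A stdlib-only way to check, without importing the broken package, is to locate it with importlib.util.find_spec (which does not execute a top-level package's __init__.py) and look for a source, bytecode, or compiled extension file for each submodule. This is a sketch assuming an unpacked (not zipped) egg, as the path in the traceback suggests; the helper name is hypothetical.

```python
import importlib.machinery
import importlib.util
import os


def missing_submodules(package, names):
    """Return the submodule names that have no .py, .pyc, or extension file.

    find_spec() on a top-level package locates it without running its
    __init__.py, so this works even when "import package" itself fails.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"package {package!r} not found")
    suffixes = (importlib.machinery.SOURCE_SUFFIXES
                + importlib.machinery.BYTECODE_SUFFIXES
                + importlib.machinery.EXTENSION_SUFFIXES)  # e.g. .cpython-37m-x86_64-linux-gnu.so
    missing = []
    for name in names:
        found = any(
            os.path.exists(os.path.join(location, name + suffix))
            for location in spec.submodule_search_locations
            for suffix in suffixes
        )
        if not found:
            missing.append(name)
    return missing
```

For this report, `missing_submodules("cyac", ["ac", "trie", "util"])` returning `["util"]` would confirm that the util extension never made it into the installed package.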
