
This project forked from pytries/dawg-python


Pure-python reader for DAWGs created by dawgdic C++ library or DAWG Python extension.

Home Page: http://pypi.python.org/pypi/DAWG-Python/

License: MIT License



DAWG-Python


This pure-Python package provides read-only access to files created by the dawgdic C++ library and the DAWG Python package.

This package cannot create DAWGs. It works with DAWGs built by the dawgdic C++ library or the DAWG Python extension module. The main purpose of DAWG-Python is to provide access to DAWGs without requiring compiled extensions. It is also quite fast under PyPy (see benchmarks).

Installation

pip install DAWG-Python

Usage

The aim of DAWG-Python is to be API- and binary-compatible with DAWG wherever possible.

First, create a DAWG using the DAWG module:

import dawg
data = [u'foo', u'bar', u'foobar']  # example keys; any iterable of unicode strings
d = dawg.DAWG(data)
d.save('words.dawg')

This DAWG can then be loaded without requiring C extensions:

import dawg_python
d = dawg_python.DAWG().load('words.dawg')

Please consult the DAWG docs for detailed usage. Some features (such as constructor parameters or the save method) are intentionally unsupported.
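Because DAWG-Python aims to be API-compatible with DAWG, read-only code can prefer the C extension when it is installed and fall back to the pure-Python reader otherwise. The following is a minimal sketch of that pattern, not something shipped with either package; the helper name load_dawg_module is hypothetical, and the fallback only makes sense for features both packages support (loading and querying, not building or saving).

```python
import importlib


def load_dawg_module(candidates=("dawg", "dawg_python")):
    """Return the first importable module from candidates, or None.

    The default order prefers the compiled DAWG extension and falls
    back to the pure-Python reader. (Hypothetical helper for illustration.)
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


dawg_impl = load_dawg_module()
if dawg_impl is None:
    print("neither dawg nor dawg_python is installed")
else:
    # Both modules expose a DAWG class with a load() method,
    # as shown in the usage snippets above.
    print("using %s" % dawg_impl.__name__)
```

Code written against dawg_impl should restrict itself to loading and querying, since constructor parameters and save are unsupported in DAWG-Python.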

Benchmarks

Benchmark results (100k Unicode words, integer values (lengths of the words), PyPy 1.9, MacBook Air i5 1.8 GHz):

dict __getitem__ (hits):        11.090M ops/sec
DAWG __getitem__ (hits):        not supported
BytesDAWG __getitem__ (hits):   0.493M ops/sec
RecordDAWG __getitem__ (hits):  0.376M ops/sec

dict get() (hits):              10.127M ops/sec
DAWG get() (hits):              not supported
BytesDAWG get() (hits):         0.481M ops/sec
RecordDAWG get() (hits):        0.402M ops/sec
dict get() (misses):            14.885M ops/sec
DAWG get() (misses):            not supported
BytesDAWG get() (misses):       1.259M ops/sec
RecordDAWG get() (misses):      1.337M ops/sec

dict __contains__ (hits):           11.100M ops/sec
DAWG __contains__ (hits):           1.317M ops/sec
BytesDAWG __contains__ (hits):      1.107M ops/sec
RecordDAWG __contains__ (hits):     1.095M ops/sec

dict __contains__ (misses):         10.567M ops/sec
DAWG __contains__ (misses):         1.902M ops/sec
BytesDAWG __contains__ (misses):    1.873M ops/sec
RecordDAWG __contains__ (misses):   1.862M ops/sec

dict items():           44.401 ops/sec
DAWG items():           not supported
BytesDAWG items():      3.226 ops/sec
RecordDAWG items():     2.987 ops/sec
dict keys():            426.250 ops/sec
DAWG keys():            not supported
BytesDAWG keys():       6.050 ops/sec
RecordDAWG keys():      6.363 ops/sec

DAWG.prefixes (hits):    0.756M ops/sec
DAWG.prefixes (mixed):   1.965M ops/sec
DAWG.prefixes (misses):  1.773M ops/sec

RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec

Under CPython, expect it to be about 50x slower. The memory consumption of DAWG-Python should be the same as that of DAWG.
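For readers who want a rough feel for how ops/sec figures like those above can be obtained, here is a minimal sketch using only the standard library. It is not the project's benchmark script (that is run via tox, as described under "Running tests and benchmarks"); the helper ops_per_sec and the synthetic word list are assumptions made for illustration, and only the plain dict baseline is measured.

```python
import timeit


def ops_per_sec(func, args, repeats=3):
    """Best-of-N throughput of calling func once per item in args."""
    def run():
        for arg in args:
            func(arg)
    # Each timing covers one full pass over args; keep the fastest pass.
    best = min(timeit.repeat(run, number=1, repeat=repeats))
    return len(args) / best


# Synthetic data: 100k keys mapped to their lengths, mirroring the
# "integer values (lengths of the words)" setup described above.
words = ["word%d" % i for i in range(100000)]
lookup = dict((w, len(w)) for w in words)

rate = ops_per_sec(lookup.__contains__, words)
print("dict __contains__ (hits): %.3fM ops/sec" % (rate / 1e6))
```

Passing a loaded DAWG's __contains__ method instead of lookup.__contains__ would give a comparable figure for the same keys, assuming the DAWG was built from the same word list.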

Current limitations

  • This package is not capable of creating DAWGs;
  • all the limitations of DAWG apply.

Contributions are welcome!

Contributing

Development happens on GitHub: https://github.com/kmike/DAWG-Python

Issue tracker: https://github.com/kmike/DAWG-Python/issues

Feel free to submit ideas, bugs or pull requests.

Running tests and benchmarks

Make sure tox is installed and run

$ tox

from the source checkout. Tests should pass under Python 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy >= 1.9.

In order to run benchmarks, type

$ tox -c bench.ini -e pypy

This runs benchmarks under PyPy (they are about 50x slower under CPython).

Authors & Contributors

The algorithms are from the dawgdic C++ library by Susumu Yata & contributors.

License

This package is licensed under the MIT License.
