halfak / deltas

A library for generating deltas of the difference between two sequences of tokens.

Home Page: http://pythonhosted.org/deltas

License: MIT License

Python 99.86% Shell 0.14%

deltas's Introduction

Deltas

An open licensed (MIT) library for generating deltas (a.k.a. sequences of operations) that represent the difference between two sequences of comparable tokens.

This library is intended to make experimental difference detection strategies more easily available. Two strategies are currently available:

deltas.sequence_matcher.diff(a, b):
A shameless wrapper around difflib.SequenceMatcher to get it to work within the structure of deltas.
deltas.segment_matcher.diff(a, b, segmenter=None):
A generalized difference detector that is designed to detect block moves and copies based on the use of a Segmenter.
Example:
>>> from deltas import segment_matcher, text_split
>>>
>>> a = text_split.tokenize("This is some text.  This is some other text.")
>>> b = text_split.tokenize("This is some other text.  This is some text.")
>>> operations = segment_matcher.diff(a, b)
>>>
>>> for op in operations:
...     print(op.name, repr(''.join(a[op.a1:op.a2])),
...           repr(''.join(b[op.b1:op.b2])))
...
equal 'This is some other text.' 'This is some other text.'
insert ' ' '  '
equal 'This is some text.' 'This is some text.'
delete '  ' ''
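
For comparison, the following is a minimal sketch of the kind of opcode sequence that difflib.SequenceMatcher produces on its own, without deltas. The tokenize, diff, and apply functions are hypothetical stand-ins written for this illustration and are not part of the deltas API; the regular expression only approximates what a text tokenizer might do.

```python
import difflib
import re


def tokenize(text):
    # Hypothetical stand-in tokenizer (not deltas' text_split):
    # words, runs of whitespace, and single punctuation characters.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def diff(a, b):
    # autojunk=False disables difflib's popularity heuristic, which
    # would otherwise ignore very frequent tokens in long sequences.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return matcher.get_opcodes()


def apply(operations, a, b):
    # Rebuild b from a using the opcodes: "equal" copies tokens from a,
    # "insert" and "replace" take tokens from b, "delete" emits nothing.
    out = []
    for tag, a1, a2, b1, b2 in operations:
        if tag == "equal":
            out.extend(a[a1:a2])
        elif tag in ("insert", "replace"):
            out.extend(b[b1:b2])
    return out


a = tokenize("This is some text.  This is some other text.")
b = tokenize("This is some other text.  This is some text.")
operations = diff(a, b)

# The opcodes are sufficient to reconstruct b from a.
assert apply(operations, a, b) == b
```

Each opcode is a tuple (tag, a1, a2, b1, b2), the same index layout as the op.a1, op.a2, op.b1, and op.b2 attributes used in the example above. Because difflib only aligns sequences in order, it reports a moved sentence as a deletion plus an insertion, which is the case the segment matcher is designed to handle differently.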

deltas's People

Contributors

computermacgyver, haksoat, halfak, jodischneider, yuvipanda

deltas's Issues

SegmentMatcher diffengine sometimes returns no operations

>>> from deltas import SegmentMatcher
>>> from deltas import wikitext_split
>>> sm = SegmentMatcher(tokenizer=wikitext_split)
>>> [len(item) for item in sm.process("")]
[]
>>> operations, a, b = sm.process("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack
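
Until the behavior is changed, a caller could guard the unpacking. The sketch below assumes only what the transcript shows, namely that the result can be an empty iterable; unpack_process_result is a hypothetical helper, not part of deltas, and it is exercised here with plain lists rather than a real SegmentMatcher.

```python
def unpack_process_result(result):
    """Unpack (operations, a, b), tolerating an empty result.

    Hypothetical helper: it assumes the result is either empty or
    holds exactly three items, as in the report above.
    """
    items = tuple(result)
    if not items:
        # Nothing was returned: treat it as "no operations, no tokens".
        return [], [], []
    operations, a, b = items
    return operations, a, b


# Stand-in values; a real call would pass the value returned by sm.process(...).
assert unpack_process_result([]) == ([], [], [])
assert unpack_process_result((["op"], ["a"], ["b"])) == (["op"], ["a"], ["b"])
```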

Possible incompatibility with python 2.7

I get the following error:

>>> import deltas.tokenizers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/deltas/__init__.py", line 3, in <module>
    from .algorithms.diff_engine import DiffEngine
  File "/usr/local/lib/python2.7/dist-packages/deltas/algorithms/__init__.py", line 18, in <module>
    from .segment_matcher import SegmentMatcher
  File "/usr/local/lib/python2.7/dist-packages/deltas/algorithms/segment_matcher.py", line 20, in <module>
    from . import sequence_matcher
  File "/usr/local/lib/python2.7/dist-packages/deltas/algorithms/sequence_matcher.py", line 10, in <module>
    from ..tokenizers import text_split
  File "/usr/local/lib/python2.7/dist-packages/deltas/tokenizers/__init__.py", line 16, in <module>
    from .tokenizer import Tokenizer, RegexTokenizer
  File "/usr/local/lib/python2.7/dist-packages/deltas/tokenizers/tokenizer.py", line 6, in <module>
    from .token import Token
  File "/usr/local/lib/python2.7/dist-packages/deltas/tokenizers/token.py", line 8, in <module>
    class Token(str):
TypeError: Error when calling the metaclass bases
    nonempty __slots__ not supported for subtype of 'str'
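
The error arises because Python 2.7 does not allow a nonempty __slots__ declaration on a subclass of str, while Python 3 does. One possible way around it, shown here as a sketch rather than the fix adopted by the library, is to drop __slots__ and set the extra attribute in __new__. PortableToken and its type attribute are hypothetical names chosen for illustration.

```python
class PortableToken(str):
    """A str subclass carrying one extra attribute.

    Hypothetical illustration: __slots__ is deliberately omitted, since
    Python 2.7 raises TypeError for nonempty __slots__ on str subclasses.
    The cost is a per-instance __dict__.
    """

    def __new__(cls, content, type=None):
        # str is immutable, so the value must be set in __new__.
        inst = super(PortableToken, cls).__new__(cls, content)
        inst.type = type
        return inst


token = PortableToken("text", type="word")
assert token == "text"          # still compares as a plain string
assert token.type == "word"     # and carries the extra attribute
```

The explicit super(PortableToken, cls) form is used because the zero-argument super() is not available in Python 2.7.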
