
suspension's Introduction

suspension - cross-format merging through token suspension

This library enables the transfer of text changes from one document to another, while preserving unique tokens in the destination document.

These tokens must be context-free. That is a significant restriction in general, but certain formats, such as Markdown and wiki markup, can accommodate it.

Components

  • Suspender - Removes the given set of tokens from the document, storing them in an offsets table (lossless).
  • Unsuspender - Joins a base document and an offsets table together to produce a single output document.
  • TextReplayer - Pushes text changes from one document to another while preserving unique tokens in the destination document.
  • RelativeSuspendedTokens - (De)serializes a tab-delimited syntax for the offsets.
  • TokenReplacer - Pushes token offsets from one document to another.
  • TokenRemover - Removes tokens from a document.
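
The Suspender/Unsuspender pair can be pictured with a small standalone sketch. This is not the library's implementation: the `suspend` and `unsuspend` helpers, the `TOKEN_REGEX` constant, and the choice of `@` and `%` as tokens are assumptions made for illustration. It shows only the idea of removing tokens into an offsets table and restoring them losslessly.

```ruby
# Illustrative sketch only; not the Suspension library's implementation.
# Assumed token set: "@" and "%" stand in for context-free tokens.
TOKEN_REGEX = /[@%]/

# Removes every token match from text. Returns the filtered text plus an
# offsets table of [position_in_filtered_text, token] pairs.
def suspend(text, token_regex = TOKEN_REGEX)
  filtered = +""
  offsets = []
  pos = 0
  text.scan(token_regex) do
    match = Regexp.last_match
    filtered << text[pos...match.begin(0)]
    offsets << [filtered.length, match[0]]
    pos = match.end(0)
  end
  filtered << text[pos..-1]
  [filtered, offsets]
end

# Re-inserts the tokens at their recorded positions in the filtered text.
def unsuspend(filtered, offsets)
  out = +""
  pos = 0
  offsets.each do |offset, token|
    out << filtered[pos...offset] << token
    pos = offset
  end
  out << filtered[pos..-1]
end

filtered, offsets = suspend("@Hello %world@")
puts filtered                      # prints "Hello world"
p offsets                          # prints [[0, "@"], [6, "%"], [11, "@"]]
puts unsuspend(filtered, offsets)  # prints "@Hello %world@"
```

Because offsets are measured against the filtered text, the table stays valid for as long as that text is unchanged. Once the text is edited, the offsets have to be adjusted before the tokens are re-inserted.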

How to run specs

Run entire spec suite:

rake

or just run a single spec file:

ruby specs/repositext_tokens_spec.rb

Command line utility (incomplete)

	suspend push text from-file to-file [-tokens]
	TextReplayer.new(from-text, to-text).replay
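
To illustrate what replaying text while keeping tokens in place involves, here is a deliberately simplified sketch. It is not how `TextReplayer` works internally: it replaces a real diff with a common-prefix/common-suffix comparison, so it handles only one contiguous change. The helper name `replay_single_change` and the `[offset, token]` table layout are hypothetical.

```ruby
# Illustrative sketch only. It swaps in a common-prefix/common-suffix
# comparison for a real diff, so it handles one contiguous change.
#
# old_text: destination text with its tokens already suspended
# new_text: source text whose wording should win
# offsets:  [[position_in_old_text, token], ...] for the destination
# Returns the offsets table adjusted to fit new_text.
def replay_single_change(old_text, new_text, offsets)
  max = [old_text.length, new_text.length].min

  # Length of the unchanged run at the start of both texts.
  prefix = 0
  prefix += 1 while prefix < max && old_text[prefix] == new_text[prefix]

  # Length of the unchanged run at the end, not overlapping the prefix.
  suffix = 0
  suffix += 1 while suffix < max - prefix &&
                    old_text[-1 - suffix] == new_text[-1 - suffix]

  old_end = old_text.length - suffix          # end of changed region in old_text
  delta = new_text.length - old_text.length   # net growth or shrinkage

  offsets.map do |offset, token|
    if offset <= prefix
      [offset, token]            # before the change: unchanged
    elsif offset >= old_end
      [offset + delta, token]    # after the change: shifted
    else
      [prefix, token]            # inside the change: clamped to its start
    end
  end
end

old_offsets = [[0, "@"], [6, "%"], [11, "@"]]
p replay_single_change("Hello world", "Hello brave world", old_offsets)
# prints [[0, "@"], [6, "%"], [17, "@"]]
```

A token that sits exactly at the edit boundary could reasonably go on either side of the inserted text. The sketch keeps it in front, which is an arbitrary choice.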

	suspend push [:subtitle_mark] from-file to-file
	TokenReplacer.new(from-text,to-text, Suspension::REPOSITEXT_TOKENS, Suspension::REPOSITEXT_TOKENS).replace([:subtitle_mark]) --> to_file

	suspend strip from-file to-file [-tokens]
	Suspender.new(from-text,Suspension::REPOSITEXT_TOKENS).suspend.filtered_text

	suspend export from-file to-file [-tokens]
	Exports the token offset list.

	Suspender.new(from-text,Suspension::REPOSITEXT_TOKENS).suspend.suspended_tokens.to_relative.serialize
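
The exact tab-delimited layout that `serialize` produces is not documented here, so the following is only a guess at the general idea. It assumes "relative" means each offset is stored as the distance from the previous token, and that each line holds that distance and the token separated by a tab. The helper names are hypothetical.

```ruby
# Illustrative sketch only; the column order and the delta encoding are
# assumptions, not the documented RelativeSuspendedTokens format.

# Converts absolute offsets into distances from the previous token.
def to_relative(offsets)
  previous = 0
  offsets.map do |offset, token|
    distance = offset - previous
    previous = offset
    [distance, token]
  end
end

# Converts distances back into absolute offsets.
def to_absolute(relative)
  position = 0
  relative.map do |distance, token|
    position += distance
    [position, token]
  end
end

# One "distance<TAB>token" pair per line.
def serialize(relative)
  relative.map { |distance, token| "#{distance}\t#{token}" }.join("\n")
end

def deserialize(string)
  string.split("\n").map do |line|
    distance, token = line.split("\t", 2)
    [Integer(distance), token]
  end
end

absolute = [[0, "@"], [6, "%"], [11, "@"]]
dump = serialize(to_relative(absolute))
p dump                                   # prints "0\t@\n6\t%\n5\t@"
p to_absolute(deserialize(dump))         # prints [[0, "@"], [6, "%"], [11, "@"]]
```

Storing distances instead of absolute positions keeps an edit near the top of a document from changing every later line of the offset file.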

	suspend import offset-file text-file to-file
	Merges the offset file and the text file.

Diagrams

Repositext tokens overview

Repositext tokens

Workflow: Sync text changes from PT to AT

Retaining repositext tokens in AT.

Repositext tokens

Workflow: Sync subtitle_marks from ST to AT

Retaining plain text and other repositext tokens in AT.

Repositext tokens

Workflow: Convert AT to PT

Retaining kramdown-subset tokens, discarding at-specific tokens.

Repositext tokens

Workflow: Sync record_mark tokens from AT V2 to AT V1

Retaining all other tokens in AT V1.

Repositext tokens

suspension's People

Contributors

jhund, lilith, aaronjwalker


suspension's Issues

Suspending newlines?

We may have an additional use case: suspending newlines so that we can line up texts when merging word changes between two or more versions that have different paragraph divisions.

This would probably be used for both pt-pt and txt-txt comparison and merging.
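
One way to picture the proposal, as a sketch under stated assumptions and not a design for the library: treat each newline as a suspended token, put a single space in its place so that words stay separated, and record where the newlines were. The helper names `suspend_newlines` and `restore_newlines` are hypothetical.

```ruby
# Illustrative sketch only. Each newline is replaced by one space so that
# word boundaries survive, and its position is recorded for restoration.
def suspend_newlines(text)
  offsets = []
  flat = text.each_char.with_index.map do |char, index|
    next char unless char == "\n"
    offsets << index
    " "
  end.join
  [flat, offsets]
end

# Puts the newlines back at the recorded positions.
def restore_newlines(flat, offsets)
  restored = flat.dup
  offsets.each { |index| restored[index] = "\n" }
  restored
end

v1 = "One two.\nThree four.\n"
v2 = "One two. Three\nfour.\n"

flat1, breaks1 = suspend_newlines(v1)
flat2, breaks2 = suspend_newlines(v2)

puts flat1 == flat2   # prints true: same words, different paragraph breaks
p breaks1             # prints [8, 20]
p breaks2             # prints [14, 20]
```

With the newlines out of the way, the two versions compare equal word for word, and each keeps its own paragraph division in its offsets list. If the flattened text is edited during a merge, the recorded positions need adjusting before `restore_newlines` is applied.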

Tokens are re-inserted at increasingly incorrect locations as the document progresses

```ruby
require 'suspension'

from_file = ARGV.shift
to_file = ARGV.shift

from_contents = File.read(from_file)
to_contents = File.read(to_file)

updated_from_text = Suspension::TextReplayer.new(to_contents, from_contents).replay
```

This doesn't work with our documents - see https://github.com/imazen/repositext-vgr-specs/issues/16 for details.

Perhaps some kind of line ending normalization gone wrong, or unicode length errors?
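
Both hypotheses would produce exactly this kind of growing error, which the following standalone check illustrates. It is a diagnostic sketch, not a confirmed explanation of the bug, and the helper name `offset_drift` is hypothetical. It measures the same position in characters, in bytes, and in characters after line-ending normalization.

```ruby
# Illustrative diagnostic only. Reports where needle starts when the text
# is measured three different ways.
def offset_drift(text, needle)
  {
    chars: text.index(needle),                                # character offset
    bytes: text.b.index(needle.b),                            # byte offset
    normalized_chars: text.gsub(/\r\n?/, "\n").index(needle)  # CRLF/CR -> LF
  }
end

# "\u00E0" is a-grave and "\u00E9" is e-acute; each takes 2 bytes in UTF-8.
sample = "\u00E0 la carte\r\nd\u00E9j\u00E0 vu\r\n"
p offset_drift(sample, "vu")
# prints {chars: 17, bytes: 20, normalized_chars: 16} on Ruby 3.4+
# (older Rubies print the same values as {:chars=>17, ...})
```

Each multibyte character and each `\r\n` ahead of a token adds to the mismatch, so if offsets are computed with one measure and applied with another, the error gets larger toward the end of the document.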

diff_match_patch_native doesn't handle multibyte characters

I ran the Suspension spec suite with DMP native and got a failure for the multibyte spec:

  1) Failure:
Suspension::DiffAlgorithm::call#test_0004_handles strings with multibyte characters [/Users/johund/development/suspension/specs/diff_algorithm_spec.rb:39]:
Expected: [[1, "à"]]
  Actual: [[1, "\xC3\xA0"]]

This may be the cause of the issues I am seeing with 3-way merging.
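
The failure output is consistent with the native extension returning strings tagged as binary (ASCII-8BIT) where UTF-8 is expected, although that reading is an inference from the output above and has not been verified against diff_match_patch_native. Under that assumption, a hypothetical wrapper such as `retag_diffs` below could re-tag the returned strings. The `[operation, string]` pair layout is taken from the spec output.

```ruby
# Illustrative sketch only; assumes the diff result is an array of
# [operation, string] pairs whose strings carry a binary encoding tag.
def retag_diffs(diffs, encoding = Encoding::UTF_8)
  diffs.map do |operation, string|
    retagged = string.dup.force_encoding(encoding)
    unless retagged.valid_encoding?
      # A byte-level diff may have split a multibyte character in two.
      raise ArgumentError, "diff segment is not valid #{encoding}"
    end
    [operation, retagged]
  end
end

raw = [[1, "\xC3\xA0".b]]            # what the failing spec received
p raw == [[1, "\u00E0"]]             # prints false
p retag_diffs(raw) == [[1, "\u00E0"]] # prints true
```

Re-tagging changes only the encoding label, not the bytes. If the native diff operates on bytes, segments can begin or end in the middle of a character, and the `valid_encoding?` check rejects those segments, since a new label cannot fix them.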
