tdiff's Introduction

TDiff

Description

Calculates the differences between two tree-like structures. Similar to Ruby's built-in TSort module.

Features

  • Provides the {TDiff} mixin.
  • Provides the {TDiff::Unordered} mixin for unordered diffing.
  • Allows custom node equality and traversal logic by overriding the {TDiff#tdiff_equal} and {TDiff#tdiff_each_child} methods.
  • Implements the Longest Common Subsequence (LCS) algorithm.
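
As a rough illustration of the Longest Common Subsequence idea named above, here is a minimal sketch over two flat arrays. It is not TDiff's own code: the `lcs` method name and the use of `==` for equality are assumptions made for this example, whereas TDiff compares nodes through `tdiff_equal`.

```ruby
# Returns one longest common subsequence of two arrays.
# Hypothetical helper for illustration; not part of the tdiff gem.
def lcs(a, b)
  # table[i][j] holds the LCS length of a[0...i] and b[0...j].
  table = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }

  a.each_with_index do |x, i|
    b.each_with_index do |y, j|
      table[i + 1][j + 1] =
        if x == y
          table[i][j] + 1
        else
          [table[i][j + 1], table[i + 1][j]].max
        end
    end
  end

  # Walk back from the bottom-right corner to recover the subsequence.
  result = []
  i = a.length
  j = b.length
  while i > 0 && j > 0
    if a[i - 1] == b[j - 1]
      result.unshift(a[i - 1])
      i -= 1
      j -= 1
    elsif table[i - 1][j] >= table[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  result
end

p lcs(%w[one three], %w[one two three]) # prints ["one", "three"]
```

Elements outside the common subsequence are the ones a diff would report as added or removed.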

Examples

Diff two HTML documents:

require 'nokogiri'
require 'tdiff'

class Nokogiri::XML::Node

  include TDiff

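  # Two nodes count as equal when their text matches (text nodes),
  # their roots match (documents), or their names match (elements).
  def tdiff_equal(node)
    if (self.text? && node.text?)
      self.text == node.text
    elsif (self.respond_to?(:root) && node.respond_to?(:root))
      self.root.tdiff_equal(node.root)
    elsif (self.respond_to?(:name) && node.respond_to?(:name))
      self.name == node.name
    else
      false
    end
  end

  # Yields each child of the given node to the diff traversal.
  def tdiff_each_child(node,&block)
    node.children.each(&block)
  end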

end

doc1 = Nokogiri::HTML('<div><p>one</p> <p>three</p></div>')
doc2 = Nokogiri::HTML('<div><p>one</p> <p>two</p> <p>three</p></div>')

doc1.at('div').tdiff(doc2.at('div')) do |change,node|
  puts "#{change} #{node.to_html}".ljust(30) + node.parent.path
end

Output

+ <p>one</p>                  /html/body/div
+                             /html/body/div
  <p>one</p>                  /html/body/div
                              /html/body/div
  <p>three</p>                /html/body/div
- one                         /html/body/div/p[1]
+ two                         /html/body/div/p[2]
  three                       /html/body/div/p[2]

Requirements

Install

$ gem install tdiff

Copyright

See {file:LICENSE.txt} for details.

tdiff's People

Contributors

postmodern, bhollis

tdiff's Issues

unordered diffing

I think there is an issue with the diff using the :added and :removed options. To my understanding, using these options should provide a diff that doesn't look at the order of the nodes, but in some specific cases it seems as if order is still important.

This gist reproduces the problem:
https://gist.github.com/tinhajj/95a2d97acd35d610cec4eb61845af4b0

To my understanding, the two XML files in my example should be equivalent if ordering is not taken into account, but the output shows that there are changes.

Edit: Sorry if I filed this issue on the wrong repo. It might have been more appropriate on nokogiri/diff.
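
For reference, here is a minimal sketch of what an order-insensitive comparison of a single level of children would be expected to report. This is not the code behind `TDiff::Unordered`, and it does not recurse into subtrees. The `unordered_changes` name and the plain-string items are assumptions made for illustration.

```ruby
# Compares two lists as multisets, ignoring order.
# Hypothetical helper for illustration; not part of tdiff or nokogiri-diff.
def unordered_changes(before, after)
  # Positive count: surplus in `before`. Negative count: surplus in `after`.
  counts = Hash.new(0)
  before.each { |item| counts[item] += 1 }
  after.each  { |item| counts[item] -= 1 }

  removed = counts.select { |_, n| n > 0 }.flat_map { |item, n| [item] * n }
  added   = counts.select { |_, n| n < 0 }.flat_map { |item, n| [item] * -n }

  { added: added, removed: removed }
end

same = unordered_changes(%w[a b c], %w[c a b])
p same[:added], same[:removed] # prints [] and []
```

Under this definition, two child lists that differ only in order produce no changes at all.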

Suboptimal results when there are inserts mixed with deletes

Given these two inputs:

before:

<p>one</p>
<p>two</p>
<div>combobreaker</div>
<p>three</p>
<p>four</p>

after:

<p>one</p>
<p>one-and-a-half</p>
<p>two</p>
<div>combobreaker</div>
<p>four</p>

I would expect the following output:

expected:

<p>one</p>
<ins><p>one-and-a-half</p></ins>
<p>two</p>
<div>combobreaker</div>
<del><p>three</p></del>
<p>four</p>

But what I actually get is instead:

actual:

<p>one</p>
<p><del>two</del><ins>one-and-a-half</ins></p>
<del><div>combobreaker</div></del>
<p><del>three</del><ins>two</ins></p>
<ins><div>combobreaker</div></ins>
<p>four</p>

I realise that both outputs are technically valid, but the one produced is clearly the less desirable of the two. Since this is an old library that hasn't seen much activity recently, I don't really expect a fix from the author, but any hints on how one might remedy this would be appreciated.
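
One possible explanation, sketched below under stated assumptions: if the equality hook compares element names only (as the `tdiff_equal` in the README example does), every `<p>` matches every other `<p>`, so an LCS alignment has several equally long matchings to choose from. The `Item` struct and `flat_diff` method are hypothetical and cover only one flat level of nodes; they are not TDiff's implementation.

```ruby
# Hypothetical stand-in for a top-level node: a tag name plus its text.
Item = Struct.new(:name, :text)

# Flat LCS diff; the block decides when two items count as equal.
def flat_diff(before, after, &equal)
  table = Array.new(before.length + 1) { Array.new(after.length + 1, 0) }
  before.each_with_index do |x, i|
    after.each_with_index do |y, j|
      table[i + 1][j + 1] =
        equal.call(x, y) ? table[i][j] + 1 : [table[i][j + 1], table[i + 1][j]].max
    end
  end

  # Backtrack from the end, collecting ' ', '+' and '-' entries.
  changes = []
  i = before.length
  j = after.length
  while i > 0 || j > 0
    if i > 0 && j > 0 && equal.call(before[i - 1], after[j - 1])
      changes.unshift([' ', before[i - 1]])
      i -= 1
      j -= 1
    elsif j > 0 && (i == 0 || table[i][j - 1] >= table[i - 1][j])
      changes.unshift(['+', after[j - 1]])
      j -= 1
    else
      changes.unshift(['-', before[i - 1]])
      i -= 1
    end
  end
  changes
end

before = [%w[p one], %w[p two], %w[div combobreaker], %w[p three], %w[p four]]
         .map { |name, text| Item.new(name, text) }
after  = [%w[p one], %w[p one-and-a-half], %w[p two], %w[div combobreaker], %w[p four]]
         .map { |name, text| Item.new(name, text) }

by_name = flat_diff(before, after) { |x, y| x.name == y.name }
strict  = flat_diff(before, after) { |x, y| x == y }

p by_name.map(&:first) # prints [" ", " ", "-", " ", "+", " "]
p strict.map(&:first)  # prints [" ", "+", " ", " ", "-", " "]
```

In this sketch, name-only equality reproduces the shape of the actual output (the `<div>` is removed and re-added), while comparing name and text together yields the expected one. A stricter `tdiff_equal` override may therefore be worth trying, at the cost of reporting an edited paragraph as a removal plus an addition.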

Traverse self only

Hi
I need to compare two HTML documents to compute a "percentage tag matching similarity" of doc2 relative to doc1. For that purpose, it's necessary to get the total number of HTML tags in doc1, but the tdiff iterator yields the tags of both doc1 and doc2.

Is it possible to traverse just doc1 (self) and return when it's done?
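
One way to get the doc1 total without going through the diff is to walk that tree on its own. The sketch below uses a hypothetical `Node` struct and `count_nodes` helper rather than anything provided by tdiff. With Nokogiri itself, `Nokogiri::XML::Node#traverse` performs a comparable walk.

```ruby
# Hypothetical stand-in for a parsed HTML node; not part of tdiff or Nokogiri.
Node = Struct.new(:name, :children)

# Counts the nodes of a single tree, root included.
# With a block, only nodes for which the block returns true are counted.
def count_nodes(node, &predicate)
  own = predicate.nil? || predicate.call(node) ? 1 : 0
  own + node.children.sum { |child| count_nodes(child, &predicate) }
end

# Mirrors '<div><p>one</p> <p>three</p></div>' from the README example.
doc1 = Node.new('div', [
  Node.new('p', [Node.new('text', [])]),
  Node.new('text', []),
  Node.new('p', [Node.new('text', [])])
])

p count_nodes(doc1)                           # prints 6
p count_nodes(doc1) { |n| n.name != 'text' }  # prints 3
```

The second call counts only element nodes, which is the denominator a tag-based similarity percentage would need.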

Release a new version

The latest version released to RubyGems emits a "shadowed variable" warning. It seems the warning has been fixed in the code base, but the fix has not been released yet. It would be great to see a release that gets rid of the warning when running our tests.

Thanks!

Says first paragraph was added when it wasn't

Looking at the example, I can't see how the algorithm is mathematically correct:

doc1 = Nokogiri::HTML('<div><p>one</p> <p>three</p></div>')
doc2 = Nokogiri::HTML('<div><p>one</p> <p>two</p> <p>three</p></div>')

doc1.at('div').tdiff(doc2.at('div')) do |change,node|
  puts "#{change} #{node.to_html}".ljust(30) + node.parent.path
end

output:

+ <p>one</p>                  /html/body/div
+                             /html/body/div
  <p>one</p>                  /html/body/div
                              /html/body/div
  <p>three</p>                /html/body/div
- one                         /html/body/div/p[1]
+ two                         /html/body/div/p[2]
  three                       /html/body/div/p[2]

Why does it say that the first paragraph was added? On the first run through, it should evaluate that there is an equal for each <p> node (tag names equal). Then when iterating over text nodes, it should find an equal for "one".

As far as my understanding of the algorithm goes, I believe the output should look like this:

  <p>one</p>                  /html/body/div
  <p>three</p>                /html/body/div
  one                         /html/body/div/p[1]
+ two                         /html/body/div/p[2]
  three                       /html/body/div/p[2]
