This project forked from bem/html-differ

Compares two HTML strings.

Home Page: https://www.npmjs.com/package/@markedjs/html-differ

License: MIT License

JavaScript 84.81% HTML 15.19%

html-differ

Compares two HTML strings.

The comparison algorithm

html-differ compares HTML using the following criteria:

  • <!DOCTYPE> declarations are case-insensitive, so the following two code samples will be considered to be equivalent:
<!DOCTYPE HTML PUBLIC "_PUBLIC" "_SYSTEM">
<!doctype html public "_PUBLIC" "_SYSTEM">
  • Whitespace (spaces, tabs, new lines, etc.) inside start and end tags is ignored during the comparison.

For example, the following two code samples will be considered to be equivalent:

<span id="1"></span>
<span id=
    "1"    ></span   >
  • Two respective lists of attributes are considered to be equivalent even if they are specified in different order.

For example, the following two code samples will be considered to be equivalent:

<span id="blah" class="ololo" tabIndex="1">Text</span>
<span tabIndex="1" id="blah" class="ololo">Text</span>
  • Two corresponding class attributes are considered to be equivalent if they contain the same set of class names, regardless of order, duplicates, or extra whitespace.

For example, the following two code samples will be considered to be equivalent:

<span class="ab bc cd">Text</span>
<span class=" cd  ab bc bc">Text</span>
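The class rule above can be pictured as a set comparison. The following is a minimal sketch under that assumption; `sameClassSet` is a hypothetical helper written for illustration and is not part of html-differ's API or its actual implementation.

```javascript
// Hypothetical helper for illustration only; not html-differ's own code.
// Treats each class attribute value as a set of names, so order,
// duplicates and extra whitespace do not matter.
function sameClassSet(a, b) {
  const toSet = (value) => new Set(value.split(/\s+/).filter(Boolean));
  const setA = toSet(a);
  const setB = toSet(b);
  if (setA.size !== setB.size) return false;
  for (const name of setA) {
    if (!setB.has(name)) return false;
  }
  return true;
}

console.log(sameClassSet('ab bc cd', ' cd  ab bc bc')); // true
console.log(sameClassSet('ab bc', 'ab cd')); // false
```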

CAUTION!
html-differ does not check the validity of the HTML; it only compares the two inputs using the criteria shown above and the specified options (see the list of possible options).

Install

$ npm install @markedjs/html-differ

API

HtmlDiffer

import { HtmlDiffer } from '@markedjs/html-differ';
const htmlDiffer = new HtmlDiffer(options);

where options is an object.

Options

ignoreAttributes: [Array]

Lists the attributes whose values will be ignored during the comparison (default: []).

Example: ['id', 'for']
The following two code samples will be considered to be equivalent:

<label for="random">Text</label>
<input id="random">

<label for="sfsdfksdf">Text</label>
<input id="sfsdfksdf">

compareAttributesAsJSON: [Array]

Lists the attributes whose values will be compared as JSON objects rather than as strings (default: []).
If the value of such an attribute is invalid JSON or cannot be wrapped into a function, it will be compared as undefined.

Example: [{ name: 'data', isFunction: false }, { name: 'onclick', isFunction: true }]
The following two code samples will be considered to be equivalent:

<div data='{"bla":{"first":"ololo","second":"trololo"}}'></div>
<span onclick='return {"aaa":"bbb","bbb":"aaa"}'></span>
<button data='REALLY BAD JSON'></button>
<button onclick='REALLY BAD FUNCTION'></button>

<div data='{"bla":{"second":"trololo","first":"ololo"}}'></div>
<span onclick='return {"bbb":"aaa","aaa":"bbb"}'></span>
<button data='undefined'></button>
<button onclick='undefined'></button>

REMARK!
The first element of the array could be written in a short form as string:
['data', { name: 'onclick', isFunction: true }].
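To make the "compared as JSON" behaviour more concrete, here is a rough sketch of one way such a comparison could work. The helpers `parseAttributeValue`, `canonical` and `sameAsJSON` are hypothetical names introduced for this example; html-differ's real implementation may differ. Note that the function variant executes the attribute value as code, so a sketch like this should only be run on trusted input.

```javascript
// Hypothetical helpers for illustration only; not html-differ's own code.

// Parses an attribute value as JSON, or (when isFunction is true) wraps it
// into a function body and calls it. Anything unparsable becomes undefined.
function parseAttributeValue(value, isFunction) {
  try {
    return isFunction ? new Function(value)() : JSON.parse(value);
  } catch (err) {
    return undefined;
  }
}

// Rebuilds objects with sorted keys so that key order does not matter.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value && typeof value === 'object') {
    return Object.keys(value).sort().reduce((acc, key) => {
      acc[key] = canonical(value[key]);
      return acc;
    }, {});
  }
  return value;
}

function sameAsJSON(a, b, isFunction = false) {
  const left = JSON.stringify(canonical(parseAttributeValue(a, isFunction)));
  const right = JSON.stringify(canonical(parseAttributeValue(b, isFunction)));
  return left === right;
}

console.log(sameAsJSON('{"first":"ololo","second":"trololo"}',
                       '{"second":"trololo","first":"ololo"}')); // true
console.log(sameAsJSON('REALLY BAD JSON', 'undefined')); // true
```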

ignoreWhitespaces: Boolean

Makes html-differ ignore whitespaces (spaces, tabs, new lines etc.) during the comparison (default: true).

Example: true
The following two code samples will be considered to be equivalent:

<html>Text Text<head lang="en"><title></title></head><body>Text</body></html>

 <html>
 Text   Text
<head lang="en">
    <title>               </title>


            </head>

<body>
     Text

    </body>




</html>

ignoreComments: Boolean

Makes html-differ ignore HTML comments during the comparison (default: true).

REMARK!
Does not ignore conditional comments.

Example: true
The following two code samples will be considered to be equivalent:

<!DOCTYPE html>
<!-- comments1 -->
<html>
<head lang="en">
    <meta charset="UTF-8">
    <!--[if IE]>
        <link rel="stylesheet" type="text/css" href="all-ie-only.css" />
    <![endif]-->
    <!--[if !IE]><!-->
        <link href="non-ie.css" rel="stylesheet">
    <!--<![endif]-->
</head>
<body>
Text<!-- comments2 -->
</body>
</html>

<!DOCTYPE html>

<html>
<head lang="en">
    <meta charset="UTF-8">
    <!--[if IE]>
        <link href="all-ie-only.css" type="text/css" rel="stylesheet"/>
    <![endif]-->
    <!--[if !IE]><!-->
        <link href="non-ie.css" rel="stylesheet">
    <!--<![endif]-->
</head>
<body>
Text
</body>
</html>

ignoreEndTags: Boolean

Makes html-differ ignore end tags during the comparison (default: false).

Example: true
The following two code samples will be considered to be equivalent:

<span>Text</span>
<span>Text</spane>

ignoreSelfClosingSlash: Boolean

Makes html-differ ignore the self-closing slash of tags during the comparison (default: false).

Example: true
The following two code samples will be considered to be equivalent:

<img src="blah.jpg" />
<img src="blah.jpg">

Presets

  • bem - sets predefined options for BEM.

Usage

Passing of a preset via the constructor:

import { HtmlDiffer } from '@markedjs/html-differ';
const htmlDiffer = new HtmlDiffer('bem');

Redefinition of a preset via the constructor:

import { HtmlDiffer } from '@markedjs/html-differ';
const htmlDiffer = new HtmlDiffer({ preset: 'bem', ignoreAttributes: [] });

Methods

htmlDiffer.diffHtml

@param {String} - the first HTML string
@param {String} - the second HTML string
@returns Promise<{Array of objects}> - array of diffs between the two HTML strings

htmlDiffer.isEqual

@param {String} - the first HTML string
@param {String} - the second HTML string
@returns Promise<{Boolean}>

Logger

import * as logger from '@markedjs/html-differ/lib/logger';

Methods

logger.getDiffText

@param {Array of objects} - the result returned by htmlDiffer.diffHtml
@param {Object} - options:

  • charsAroundDiff: Number - the number of characters of context shown around each difference between the two HTML strings (default: 40).

@returns {String}

logger.logDiffText

@param {Array of objects} - the result returned by htmlDiffer.diffHtml
@param {Object} - options:

  • charsAroundDiff: Number - the number of characters of context shown around each difference between the two HTML strings (default: 40).

@returns - pretty logging of diffs

Example

import fs from 'fs';
import { HtmlDiffer } from '@markedjs/html-differ';
import * as logger from '@markedjs/html-differ/lib/logger';

const html1 = fs.readFileSync('1.html', 'utf-8');
const html2 = fs.readFileSync('2.html', 'utf-8');

const options = {
  ignoreAttributes: [],
  compareAttributesAsJSON: [],
  ignoreWhitespaces: true,
  ignoreComments: true,
  ignoreEndTags: false
};

const htmlDiffer = new HtmlDiffer(options);

async function run() {
  const diff = await htmlDiffer.diffHtml(html1, html2);
  const isEqual = await htmlDiffer.isEqual(html1, html2);
  const res = logger.getDiffText(diff, { charsAroundDiff: 40 });

  logger.logDiffText(diff, { charsAroundDiff: 40 });
}

run();

Usage as a program

$ html-differ --help
Compares two HTML

Usage:
  html-differ [OPTIONS] [ARGS]

Options:
  -h, --help : Help
  -v, --version : Shows the version number
  --config=CONFIG : Path to a configuration JSON file
  --bem : Uses predefined options for BEM (deprecated)
  -p PRESET, --preset=PRESET : Name of a preset
  --chars-around-diff=CHARSAROUNDDIFF : The number of characters around the diff (default: 40)

Arguments:
  PATH1 : Path to the 1-st HTML file (required)
  PATH2 : Path to the 2-nd HTML file (required)

Example

$ html-differ path/to/html1 path/to/html2

$ html-differ --config=path/to/config --chars-around-diff=40 path/to/html1 path/to/html2

$ html-differ --preset=bem path/to/html1 path/to/html2

Configuration file

Consider the following config.json file:

{
  "ignoreAttributes": [],
  "compareAttributesAsJSON": [],
  "ignoreWhitespaces": true,
  "ignoreComments": true,
  "ignoreEndTags": false
}

Masks

html-differ supports handling of masks in HTML.

For example, the following two code samples will be considered to be equivalent:

<div id="{{[a-z]*\s\d+}}">
<div id="text 12345">

Syntax

Masks in html-differ have the following syntax:

{{RegExp}}

where:

  • {{ – opening identifier of the mask.

  • RegExp – regular expression for matching with the corresponding value in another HTML. The syntax is similar to regular expressions in JavaScript written in a literal notation.

  • }} – closing identifier of the mask.

Escaping

Characters are escaped by the same rules that apply to regular expressions written in JavaScript literal notation.

For example, the following two code samples will be considered to be equivalent:

<div id="{{\d\.\d}}">
<div id="1.1">

If you want to use {{ or }} inside a mask, you should escape both curly braces, i.e. \{\{ or \}\}.

For example, the following two code samples will be considered to be equivalent:

<div class="{{a\{\{b\}\}c}}">
<div class="a{{b}}c">
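As a rough illustration of the mask syntax and escaping rules, the sketch below turns a masked value into a regular expression and tests another value against it. `maskToRegExp` is a hypothetical helper written for this example; it only approximates the rules described above and is not html-differ's actual implementation.

```javascript
// Hypothetical helper for illustration only; not html-differ's own code.
// Text outside {{...}} is matched literally; text inside is used as a
// regular expression source, as in a JavaScript regex literal.
function maskToRegExp(masked) {
  const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const source = masked
    // Capturing split: odd indexes hold mask bodies, even indexes hold literals.
    .split(/\{\{(.+?)\}\}(?!\})/)
    .map((part, index) => (index % 2 === 1 ? `(?:${part})` : escapeLiteral(part)))
    .join('');
  return new RegExp(`^${source}$`);
}

console.log(maskToRegExp('{{[a-z]*\\s\\d+}}').test('text 12345')); // true
console.log(maskToRegExp('{{\\d\\.\\d}}').test('1.1')); // true
console.log(maskToRegExp('{{a\\{\\{b\\}\\}c}}').test('a{{b}}c')); // true
```

The doubled backslashes appear only because the masks are written here as JavaScript string literals; in an HTML file the mask is written with single backslashes, as in the samples above.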

html-differ's Issues

update diff

The diff dependency breaks tests after v1.3.2

Not exactly sure why 😕

Ordering attributes by value changes the meaning of HTML

Hi, I wanted to discuss the following sort() call: https://github.com/markedjs/html-differ/blob/master/lib/utils/utils.js#L24-L32

When an element has two attributes with the same name (not case-sensitive), the behavior is actually well-defined: the first attribute is used.

Consider the next example: https://codepen.io/mvasilkov/pen/abvWeqp

<button style="visibility: hidden" style="display: block">
  Can't see me
</button>

If we change the order of the attrs we introduce a bug, yet html-differ will report identical HTML.

With Array#sort now being a stable sort (since ES2019 / Chrome 70 / Node 11 / V8 7.0), the only thing that needs to be done to address this, if you choose to do so, is to drop the a.value <> b.value comparisons and return zero if the names match.

What do you think?
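For illustration, the proposed comparator could look roughly like the sketch below. It assumes attributes are plain objects with `name` and `value` fields and that the runtime provides a stable `Array#sort`; `sortAttributesByName` is a hypothetical name, not the function currently in lib/utils/utils.js.

```javascript
// Hypothetical sketch of the proposal; not the code in lib/utils/utils.js.
// Sorts by attribute name only. With a stable sort, attributes that share
// a name keep their source order, so the first duplicate stays first.
function sortAttributesByName(attrs) {
  return [...attrs].sort((a, b) => {
    if (a.name < b.name) return -1;
    if (a.name > b.name) return 1;
    return 0; // same name: leave the relative order untouched
  });
}

const sorted = sortAttributesByName([
  { name: 'style', value: 'visibility: hidden' },
  { name: 'id', value: 'x' },
  { name: 'style', value: 'display: block' }
]);
console.log(sorted.map((attr) => `${attr.name}=${attr.value}`));
```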

error implement

I'm using Angular 11. After installing the dependency, I tried to create a service that compares HTML:

import { Injectable } from '@angular/core';
import { HtmlDiffer, Options } from '@markedjs/html-differ';

@Injectable({
  providedIn: 'root'
})
export class HtmlDifferService {

  options: Options = {
    // preset: 'bem',
    "ignoreAttributes": ["id", "for", "aria-labelledby", "aria-describedby"],
    "compareAttributesAsJSON": [
      "data-bem",
      { "name": "onclick", "isFunction": true },
      { "name": "ondblclick", "isFunction": true }
    ]
  };
  htmlDiffer: HtmlDiffer;

  constructor() {
    this.htmlDiffer = new HtmlDiffer('bem');
  }

  compareDesc(_old: string, _new: string, isReview: boolean = false): { text: string, hasNew: boolean } {
    const diff = this.htmlDiffer.diffHtml(_old, _new);
    return { text: _new, hasNew: true };
  }
}

but I got the following errors when Angular compiled the project:

Error: ./node_modules/parse5-sax-parser/lib/index.js
Module not found: Error: Can't resolve 'stream' in '/xxx/node_modules/parse5-sax-parser/lib'

Error: ./node_modules/parse5-sax-parser/lib/dev-null-stream.js
Module not found: Error: Can't resolve 'stream' in '/xxx/node_modules/parse5-sax-parser/lib'

Error: ./node_modules/@markedjs/html-differ/lib/defaults.js 2:36
Module parse failed: Unexpected token (2:36)
You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders
| import { createRequire } from 'module';
> const require = createRequire(import.meta.url);
|
| export const presets = {

global `modifiedTokens` cause memory leak

the module global

const modifiedTokens = {};

is updated with:

  modifiedTokens[html] = tokens.split(/({{.+?}}(?!})|[{}\(\)\[\]#\*`=:;,.<>"'\/]|\s+)/).filter(i => i);

for each diff. after using it about 20k times, my nodejs process goes out of memory.

suggestion

make them fields of the HtmlDiffer class.
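A minimal sketch of that suggestion, assuming the cache only needs to live as long as the object that owns it. The class name `TokenCache` and its methods are hypothetical and stand in for wherever html-differ keeps this state; the split pattern is the one quoted above, with the braces escaped, and is assumed to be applied to the HTML string.

```javascript
// Hypothetical sketch for illustration; not html-differ's actual code.
// The cache is an instance field, so it becomes garbage-collectable
// together with the instance instead of growing for the process lifetime.
class TokenCache {
  constructor() {
    this.modifiedTokens = new Map();
  }

  tokenize(html) {
    if (!this.modifiedTokens.has(html)) {
      const tokens = html
        .split(/(\{\{.+?\}\}(?!\})|[{}()\[\]#*`=:;,.<>"'\/]|\s+)/)
        .filter(Boolean);
      this.modifiedTokens.set(html, tokens);
    }
    return this.modifiedTokens.get(html);
  }

  get size() {
    return this.modifiedTokens.size;
  }
}

const cache = new TokenCache();
cache.tokenize('<a>');
cache.tokenize('<a>'); // served from the cache, no new entry
console.log(cache.size); // 1
```

Creating one such object per comparison, or clearing the map after each diff, would keep memory bounded in a long-running process.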

TypeError: _require is not a function

  ● Test suite failed to run

    TypeError: _require is not a function

    >  1 | import { HtmlDiffer } from '@markedjs/html-differ';
         | ^

      at Object._require (node_modules/@markedjs/html-differ/lib/defaults.js:2:30)
      at Object.require (node_modules/@markedjs/html-differ/lib/HtmlDiff.js:2:1)
      at Object.require (node_modules/@markedjs/html-differ/lib/index.js:1:1)

package.json

...
"@markedjs/html-differ": "4.0.2",
"@babel/core": "7.19.3",
"@babel/preset-env": "7.19.3",
"@babel/plugin-transform-runtime": "7.21.4",
"babel-plugin-transform-import-meta": "2.2.0",
"jest": "28.1.1",
...

jest.config.ts

const esModules = ['@markedjs/html-differ'].join('|');

const config = {
  ...
  transform: {
    [`(${esModules}).+\\.js$`]: require.resolve('babel-jest'),
   ...
  },
  transformIgnorePatterns: [
    `/node_modules/(?!${esModules})`,
   ...
  ],
  ...
}

babel.config.js

module.exports = {
  env: {
    test: {
      presets: ['@babel/preset-env'],
      plugins: ['@babel/plugin-transform-runtime', 'babel-plugin-transform-import-meta'],
    },
  },
};

The issue can be resolved by downgrading to "@markedjs/html-differ": "3.0.4", without using the abovementioned configurations.

Produce wrong diff when repetitive data

Hello, thank you for this library

>>> from html_diff import diff
>>> diff("<p>n</p><p>n</p>","<p>n</p><p>n</p>")
'<p></p><p>n</p>'

I was expecting <p>n</p><p>n</p> is that the expected behavior ?

Travis

We should get this on travis-ci

update parse5

parse5's parser is now an async stream so updating to the latest version isn't so easy. html-differ will have to become async as well.
