
JavaScript library and CLI tool for ERC validation and comparison result checking

Home Page: https://o2r.info/erc-checker/

License: Apache License 2.0


erc-checker's Introduction

erc-checker

Badges: Build Status · npm · Project Status: Active – The project has reached a stable, usable state and is being actively developed · DOI · SWH

A JavaScript library and CLI tool for ERC result checking.

The checker is part of the project Opening Reproducible Research (o2r). Its purpose is to verify the result of reproductions of scientific papers as part of the o2r reproducibility service by means of comparing the HTML of the original and reproduced article.

The checker runs on Node.js. It is implemented as a Node.js module whose main function returns a JavaScript Promise. It further implements a command line interface (work in progress).

The documentation is available at https://o2r.info/erc-checker/.

Contribute

All help is welcome: asking questions, providing documentation, testing, or even development. See CONTRIBUTING.md for details.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Publish a new release

# see npm version --help
npm version {major,minor,patch}
npm publish

Add the version tag locally, e.g., v1.2.3, and push it to the remotes. Then create a new release on GitHub.

How to cite

To cite this software please use

Nüst, Daniel, 2018. Reproducibility Service for Executable Research Compendia: Technical Specifications and Reference Implementation. Zenodo. doi:10.5281/zenodo.2203843

License

The o2r checker is licensed under the Apache License, Version 2.0; see the file LICENSE.

Copyright (C) 2020 - o2r project.


erc-checker's People

Contributors

nuest, timmimim, tnier01


erc-checker's Issues

Make diff HTML file name configurable

Currently the output file in HTML format is fixed to diffHTML.html. It would be nice to be able to configure this as a user of the tool at the time of the check.

Explore anti-aliasing settings of different plotting devices

As the image below shows, the text in these base R plots seems to have some kind of anti-aliasing effect. Note that the bars match perfectly:

image

Therefore we must explore what settings for plots are favourable and which settings are obstructive for comparison within the checker.

Tasks

  • create a small (!) set of plots with both base R and ggplot2 (use the respective documentation for examples)
  • evaluate resources (see below for a start) for potential settings changing rendering options, including the underlying devices
  • consider how Cairo might affect this

Resources

quiet configuration is broken

With quiet: true, I get the following error:

message: "debug is not a function"
stack: TypeError: debug is not a function
    at /home/daniel/git/o2r/o2r-muncher/node_modules/erc-checker/index.js:374:4
    at Promise (<anonymous>)
    at ercChecker (/home/daniel/git/o2r/o2r-muncher/node_modules/erc-checker/index.js:212:9)
    at __dirname.Executor.compendium.check.Promise.updateStep (/home/daniel/git/o2r/o2r-muncher/lib/executor.js:743:11)
    at __dirname.Executor.compendium.updateStep.Job.update (/home/daniel/git/o2r/o2r-muncher/lib/executor.js:115:7)
    at Model.compile.model.Query.callback (/home/daniel/git/o2r/o2r-muncher/node_modules/mongoose/lib/query.js:2880:9)
    at /home/daniel/git/o2r/o2r-muncher/node_modules/kareem/index.js:273:21
    at /home/daniel/git/o2r/o2r-muncher/node_modules/kareem/index.js:131:16
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickCallback (internal/process/next_tick.js:180:9)

Triggered in line 374 in index.js.

I recommend instead of

if (quiet) {
    debug = debugERROR = null;
}

simply using the enabled property, as described at https://www.npmjs.com/package/debug#checking-whether-a-debug-target-is-enabled:

if (quiet) {
    debug.enabled = false;
    debugERROR.enabled = false;
}

Evaluate ImageMagick comparison and its different metrics

ImageMagick has different metrics to calculate differences between images: https://www.imagemagick.org/script/command-line-options.php#metric

We might be able to get this running in Node, and we should compare it to our current metric based on blink-diff: https://www.npmjs.com/search?q=magick&ranking=popularity

The function is also easily available in R via magick: https://rdrr.io/cran/magick/man/analysis.html - thanks to @jeroen for the pointer!

library(magick)

> image1 <- image_read("~/Desktop/index1.png")
> image2 <- image_read("~/Desktop/index2.png")
> print(image1)
  format width height colorspace matte filesize density
1    PNG  1344    960       sRGB FALSE    84130   76x76
> image_compare_dist(image1, image2)
$distortion
[1] 0.9503712

> image_compare(image1, image2)
  format width height colorspace matte filesize density
1    PNG  1344    960       sRGB  TRUE        0   76x76
> image_compare(image1, image2, fuzz = 5)
  format width height colorspace matte filesize density
1    PNG  1344    960       sRGB  TRUE        0   76x76
> image_compare(image1, image2, fuzz = 50)
  format width height colorspace matte filesize density
1    PNG  1344    960       sRGB  TRUE        0   76x76
> image_compare_dist(image1, image2, fuzz = 50)
$distortion
[1] 0.9503712

> image_compare_dist(image1, image2, metric = "phash")
$distortion
[1] 0.2836806

> image_compare(image1, image2, metric = "phash")
  format width height colorspace matte filesize density
1    PNG  1344    960       sRGB  TRUE        0   76x76

image

It faces similar challenges, but the numeric metric could help and be more effective than the current "pixel counting".

  • build a test case where the line in the plot differs and has the exact same distortion number as a plot where only the font introduces errors
> metric_types()
 [1] "Undefined" "AE"        "Fuzz"      "MAE"       "MEPP"      "MSE"       "NCC"      
 [8] "PAE"       "PHASH"     "PSNR"      "RMSE"
> image_compare_dist(image1, image2, metric = "PAE")
$distortion
[1] 1

> image_compare_dist(image1, image2, metric = "PSNR")
$distortion
[1] 25.01049

> image_compare_dist(image1, image2, metric = "MEPP")
$distortion
[1] 1329365910

The different metrics give different output numbers, but the image is not perceivably different.
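To make the contrast between metric families concrete, here is a toy JavaScript sketch (illustrative only, not ImageMagick's implementation) of two metrics over flat grayscale pixel arrays: a changed-pixel count in the spirit of AE, and a mean squared error in the spirit of MSE:

```javascript
// Count of pixels that differ at all ("pixel counting", like AE).
function absoluteErrorCount(a, b) {
  return a.reduce((n, v, i) => n + (v !== b[i] ? 1 : 0), 0);
}

// Average squared per-pixel difference (like MSE): weights by magnitude.
function meanSquaredError(a, b) {
  return a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0) / a.length;
}

const original   = [0, 0, 128, 255];
const reproduced = [0, 0, 130, 255]; // one pixel off by 2

console.log(absoluteErrorCount(original, reproduced)); // 1
console.log(meanSquaredError(original, reproduced));   // 1
```

A count-based metric treats a barely-changed pixel the same as a completely different one, while an error-based metric grades by magnitude; that difference is exactly what the anti-aliasing test case above would probe.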

Used images for testing:

index2

index1

Re-check all usages of .map function

The function array.map(..) is wrongly used to iterate over an array, e.g. in runBlinkDiff. Make sure that in the function passed to map, return is called and the result is assigned, e.g. result = array.map(function (element) { return element; });
If only iterating, use array.forEach() instead.
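For illustration, the difference between the two in plain JavaScript:

```javascript
// map returns a new array built from the callback's return values.
const lengths = ['a', 'bb', 'ccc'].map(function (element) {
  return element.length; // each return value is collected into the result
});
console.log(lengths); // [1, 2, 3]

// forEach returns undefined and is the right tool for pure iteration.
let total = 0;
['a', 'bb', 'ccc'].forEach(function (element) {
  total += element.length; // side effects only
});
console.log(total); // 6
```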

Add text comparison and visualisation for results

The Checker should be able to compare text chunks in papers, to find differences in their content beyond just images.
Compare Issue #1.

The Checker is currently only programmed to handle HTML files. A textual check should focus on the HTML's <body>.
The body includes minor style elements and CSS, the actual text of the paper, and images. Since the Checker is already able to find and extract images, it may just as well assume that the remainder of the paper is mostly text.
So the Checker already discriminates successfully between the two major parts of the body.

I would propose running text comparisons using String Edit Distance algorithms such as Levenshtein or similar (Issue #1). There are some fast and stable implementations available in npm.

The runtime of these algorithms inflates massively (!) for larger text chunks. Therefore I would suggest dividing the text into chunks, e.g. paragraph-wise. This, however, may prove problematic if one of the papers has, say, an extra paragraph somewhere in the middle. That would 'blow up' the results, since every pair of paragraphs following the extra (or missing) paragraph would most likely differ by a lot.
Text comparison must keep such eventualities in mind!
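As a sketch of the proposed approach, a plain dynamic-programming Levenshtein distance (npm packages such as fast-levenshtein provide optimized implementations):

```javascript
// Single-row dynamic-programming Levenshtein distance (sketch).
function levenshtein(a, b) {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // value of dp[i-1][j-1]
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = prev[j]; // value of dp[i-1][j]
      prev[j] = Math.min(
        prev[j] + 1,                              // deletion
        prev[j - 1] + 1,                          // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)    // substitution
      );
      diag = tmp;
    }
  }
  return prev[b.length];
}

console.log(levenshtein('kitten', 'sitting')); // 3
```

The quadratic cost per pair of chunks is what makes paragraph-wise chunking attractive in the first place.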

The <head>s usually contain base64-encoded JavaScript and plain CSS. I'm not sure whether checking these makes sense, but it could perhaps be useful as additional information.

Furthermore, String Edit Distance algorithms quantify differences, but do nothing for highlighting.
So there should be a visualisation for quantified differences. This may be done by plotting differences; however, these plots need to represent the differences and their meaning comprehensibly, which may be tricky.
A better way may be to visualise differences in the UI as part of the diff-HTML view. Here, diff-style highlighting (as in git, vimdiff, ...) would be very helpful to make differences visible and easy to comprehend.

Finally, quantified differences and probably the position of these differences should be added to the Check-Result JSON.

Provide access to the check log when using checker as a library

In the log I can see

index:requestHandling Differences were found; Calling compareHTML to create a HTML file highlighting these differences. +0ms
  checker:slice Extracting text chunks from original HTML String and saving them for later. +6ms
  checker:general Successfully read files. +2ms
  checker:slice Sliced up those pesky Stringsens. +5ms
  checker:slice Original:  1 images, 2 chunks of text. +1ms
  checker:slice Reproduced: 1 images, 2 chunks of text. +0ms
  checker:slice Created Buffer for Reproduced image #0: 17560 +0ms
  checker:slice Created Buffer for Reproduced image #0: 17639 +1ms
  checker:slice All images were extracted successfully. +0ms
  checker:compare Begin comparing images. +0ms
  checker:compare Original 0: {"width":1344,"height":960,"type":"png"} +1ms
  checker:compare Reproduced 0: {"width":1344,"height":960,"type":"png"} +0ms
  checker:compare No resizing needed for images with index 0 +1ms
  checker:general Preparation is done, move on to visual comparison. +0ms
  checker:compare Starting visual comparison of 1 images. +0ms
  checker:compare Creating a diff-Image for images with index 0 +1ms
  checker:compare Visual Comparison completed. +978ms
  checker:reassemble Begin Reassembling HTML with Diff-Images where images were not equal. +0ms
  checker:reassemble Piecing together text chunks and images. +0ms
  checker:reassemble Reassembly done. +7ms
  index:requestHandling Check done. +0ms
  index:requestHandling Metadata JSON file written successfully +61ms
  index:requestHandling Diff-HTML file written successfully +25m

And the resolved object is (excerpt):

{ checkSuccessful: false,
  images: 
   [ { imageIndex: 0,
       resizeOperationCode: 0,
       compareResults: [Object] } ],
  display: 
   { diff: '<!DOCTYPE html>\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n\n<head>\n\n<meta charset="utf-8" />\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n<meta name="generator" content="pandoc" />\n\n\n\n<meta name="date" content="2017-01-01" />\n\n<title>Capacity of container ships in seaborne trade from 1980 to 2016 (in million dwt)*</title>\n\n<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dGhpcyxmdW5jdGlvbihhLGIpe3ZhciBjPVtdLGQ9Yy5zbGljZSxlPWMuY29uY2F0LGY9Yy5wdXNoLGc9Yy5pbmRleE9mLGg9e30saT1oLnRvU3RyaW5nLGo9aC5oYXNPd2
[...]

Ideally, a stream of the check log should be made available. If this is too complex, then the full log should be included in the resolved object.

Remove "timeOfCheck"

The property timeOfCheck does not add meaningful semantics, so I suggest removing it and having start and end as top-level properties of the result objects.
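A hypothetical sketch of a result object after this change (the shape is illustrative; only checkSuccessful and images appear in the current output):

```javascript
// Hypothetical result shape: top-level start and end replace timeOfCheck.
const start = Date.now();
const result = {
  checkSuccessful: false,
  start: start,       // when the check began
  end: start + 978,   // when the check finished (example duration)
  images: [],         // per-image comparison results
};
console.log(result.end - result.start); // 978 (duration in milliseconds)
```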

Add macOS and Windows support for site build Action

At the moment the site building Action only runs on Linux. Theoretically it should also run on Windows and macOS, but the OS differences must be handled in .github/workflows/site_build.yml.
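One possible approach, sketched here under the assumption that the build steps can otherwise run unchanged on all three systems (job names are illustrative, not the actual workflow contents), is a matrix strategy:

```yaml
# Illustrative matrix sketch for .github/workflows/site_build.yml
jobs:
  build-site:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    # OS-specific steps can then branch on the runner, e.g.:
    # if: runner.os == 'Windows'
```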

Consider to use the Edit distance for string comparison

For measuring the extent of difference in strings, we could use an algorithm for the edit distance, e.g. the Levenshtein distance.

Doing so might enable us to better compare HTML or PDF files, or segments of text within them, that are close but not exact matches. It might also be a means of quantifying the comparison result, in contrast to checksums, which are compared in a boolean way.

Output improvements

  • Absolute paths for output do not work
DEBUG=* erc-checker test/TestPapers_2/paper_9_img_A.html test/TestPapers_2/paper_9_img_B.html -o /tmp/erc-checker

[...]

  checker:reassemble    found diff image #8 +0ms
  checker:reassemble    diff image #8 read +0ms
  checker:reassemble    diff image #8 integrated +0ms
  checker:reassemble    HTML stringified and reassembled. +0ms
  checker:ERROR                 Writing diff HTML file failed.ENOENT: no such file or directory, open '/home/daniel/git/o2r/erc-checker/tmp/erc-checker.html' +1ms

  • Create missing (intermediate) output directories if they do not exist

There is a warning if the output directory does not exist (see the example above), but it would be nice if there were a flag to create the missing directories.

  • Add default file name, diff.html
DEBUG=* erc-checker test/TestPapers_2/paper_9_img_A.html test/TestPapers_2/paper_9_img_B.html -o test-output/
