
Add basic tests (remarks) · OPEN · 4 comments

lucasrla commented on May 27, 2024
Add basic tests


Comments (4)

lucasrla commented on May 27, 2024

Thanks for the write-up!

To kick things off, I plan to write a few basic tests using pytest and the demo as soon as I find some time in the upcoming weeks.
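For illustration, a first smoke test along these lines might look like the sketch below. It assumes remarks is invoked as `python -m remarks <input_dir> <output_dir>` and that the demo files live under a `demo` directory; both are placeholders to adjust.

```python
# test_demo.py -- minimal pytest smoke test (sketch, paths are assumptions)
import subprocess
import sys

DEMO_INPUT = "demo"  # placeholder: wherever the demo xochitl files live

def test_remarks_converts_demo_without_errors(tmp_path):
    # Invoke remarks the same way a user would: python -m remarks <in> <out>
    result = subprocess.run(
        [sys.executable, "-m", "remarks", DEMO_INPUT, str(tmp_path)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
    # Don't inspect contents yet; just check that something was produced.
    assert any(tmp_path.iterdir()), "expected at least one output file"
```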

Regarding your point about how to "assert" PDFs, I am not sure what approaches are available. I will have a look at what OCRmyPDF does and see if I can get some inspiration.

PS: If you want to read well-written guides on anything Python, try RealPython's. On tests, see, for instance: Getting Started With Testing in Python, Effective Python Testing With Pytest, etc.


clement-elbaz commented on May 27, 2024

> Regarding your point about how to "assert" PDFs, I am not sure what approaches are available. I will have a look at what OCRmyPDF does and see if I can get some inspiration.

Personally, I think that in the short and medium term, having everything else under test is more than enough. If both the PNG and MD files are byte-for-byte identical to what is expected, it is very likely that the PDF files are too for most possible types of regression, especially if the existence of the PDF files is also checked.

In the long term, one approach could be to convert each PDF to an image and check that those images are pixel-for-pixel identical to the expected output. Another approach could be to study the PDF format a bit and explore which parts of the file are supposed to change and which are not.
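If we ever go the rasterization route, a sketch of the pixel-for-pixel check could use PyMuPDF (`fitz`), which I believe remarks already depends on. This is an illustration, not a finished harness:

```python
# Sketch: render two PDFs page by page and compare the raw pixel buffers.
import fitz  # PyMuPDF

def pdf_pages_identical(path_a, path_b, dpi=150):
    doc_a, doc_b = fitz.open(path_a), fitz.open(path_b)
    if doc_a.page_count != doc_b.page_count:
        return False
    zoom = dpi / 72  # PDF user space is 72 units per inch
    matrix = fitz.Matrix(zoom, zoom)
    for page_a, page_b in zip(doc_a, doc_b):
        pix_a = page_a.get_pixmap(matrix=matrix)
        pix_b = page_b.get_pixmap(matrix=matrix)
        if pix_a.samples != pix_b.samples:  # raw pixel bytes per page
            return False
    return True
```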

> PS: If you want to read well-written guides on anything Python, try RealPython's. On tests, see, for instance: Getting Started With Testing in Python, Effective Python Testing With Pytest, etc.

Thanks for that! I'll keep those in mind if I have to do more serious Python work in the future.


Azeirah commented on May 27, 2024

For what it's worth, I added an extremely simple testing set-up. It doesn't even look at the contents of the files, it just asserts that there are no errors.

This is better than having no tests. I'm using this tool with many different notebooks, and I already found two fairly basic errors; testing in this manner keeps them from regressing.
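The whole setup is roughly the sketch below; `run_remarks` and the input path are stand-ins for whatever entry point and fixtures the repo actually uses. pytest fails the test on any uncaught exception, which is all this checks.

```python
# Sketch of the "no errors" test: names and paths are illustrative.
from remarks import run_remarks  # assumed entry point; may differ

def test_notebook_converts_without_errors(tmp_path):
    # If run_remarks raises anything, pytest marks the test as failed.
    run_remarks("tests/in/basic-notebook", str(tmp_path))
```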


clement-elbaz commented on May 27, 2024

Some thoughts about this.

A good starting point could be to design test cases by creating a xochitl input directory and an expected output directory (or several output directories to test various command-line options). The tests would simply consist of running remarks, comparing the generated output with the expected output, and raising an error if the files do not match. This is simple and could already catch a lot of things.
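A naive sketch of that comparison, in Python for concreteness (directory names are placeholders), could hash every file in both trees and require equality:

```python
# Sketch: compare an expected output tree with a freshly generated one.
import hashlib
from pathlib import Path

def hash_tree(root: Path) -> dict:
    """Map each file's path relative to root to its SHA256 hex digest."""
    return {
        path.relative_to(root): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def assert_outputs_match(expected_dir: str, actual_dir: str) -> None:
    expected = hash_tree(Path(expected_dir))
    actual = hash_tree(Path(actual_dir))
    assert expected == actual, "generated output differs from expected output"
```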

However, there is a pitfall to avoid with this approach. The PDF format is not reproducible, notably because it contains a creation timestamp as well as various other elements that are supposed to change on every run: https://tex.stackexchange.com/questions/229605/reproducible-latex-builds-compile-to-a-file-which-always-hashes-to-the-same-va

And indeed, if I run the demo twice, the PNG and MD files stay the same but the PDF files change:

```
# SHA256 hashes of files generated by the demo

# PDF files:

# Run 1:
00c60665d3d1b9df91565946b0471c44737c267496c57b14b2fdcdaa2b502767  1936 On Computable Numbers, with an Application to the Entscheidungsproblem - A. M. Turing _remarks.pdf
fbab4f8e5cb88666d6c84edaf0c0ecc0a478429bfbddd8c3c2ba1d99ba93982a  ./pdf/00.pdf
5c89e72f573eda68256e003c04dfcb19211eace65ca9b59f6841afad634d520f  ./pdf/01.pdf
6448127e7cc741ec145e27cae7611a5839bb9eb823fd24b38a40f03d99f339d7  ./pdf/27.pdf

# Run 2:
3af6e8fba49f90e2760847a6c5b63bf8673b97974c468402741e4b96e9cfa5d3  1936 On Computable Numbers, with an Application to the Entscheidungsproblem - A. M. Turing _remarks.pdf
c2865fe6516f08898a0365ce82d6e48c8f388bd936679f12c57d4fa18a044084  ./pdf/00.pdf
f0386058d7f9d12392cd66f82b710cad7179b18b4b85b3032e2cf9e3331b819e  ./pdf/01.pdf
67af8632f18b0cfae4c90a03a20ba7e172b2aa02e431196cbf9c3d01cdec639c  ./pdf/27.pdf

# MD files:

# Run 1:
3467455bb359d0b9d467cec1ec8020fe063ce2426984b4a7e94b017bc3492574  ./md/00.md
4f5ebd271306c84f0dbf2167ae9a0d75ba64ec362b11fbe7887f03bfdbb2b3af  ./md/01.md
bb7e45351862472a07638ba02999e48cc9f7b2c46457bcebe65da31180a71eef  ./md/27.md

# Run 2:
3467455bb359d0b9d467cec1ec8020fe063ce2426984b4a7e94b017bc3492574  ./md/00.md
4f5ebd271306c84f0dbf2167ae9a0d75ba64ec362b11fbe7887f03bfdbb2b3af  ./md/01.md
bb7e45351862472a07638ba02999e48cc9f7b2c46457bcebe65da31180a71eef  ./md/27.md

# PNG files:

# Run 1:
c142681cf9fec54fec77efa3bb33e2890315685fa1203d1f4e9230e81e75b2e1  ./png/00.png
c63f032392c46eef5528c94cb0ce44869617ed6ed3cb8ff7f816fb5b63646d3f  ./png/01.png
2a479dc4208df1dccd2946ea43ee28c92987e11e13b554d07aa20d27cb16984d  ./png/27.png

# Run 2:
c142681cf9fec54fec77efa3bb33e2890315685fa1203d1f4e9230e81e75b2e1  ./png/00.png
c63f032392c46eef5528c94cb0ce44869617ed6ed3cb8ff7f816fb5b63646d3f  ./png/01.png
2a479dc4208df1dccd2946ea43ee28c92987e11e13b554d07aa20d27cb16984d  ./png/27.png
```

Nevertheless, the approach could work quite well by checking the hashes of all MD and PNG files, and at least the existence of all PDF files, between the expected output directory and the test-generated output directory. The PDF-specific code would have to be tested differently, but everything else would get a lot of test coverage in an easy way.
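Concretely, that mixed policy could look something like this sketch (stdlib only; the suffix set and helper name are mine):

```python
# Sketch: byte-compare deterministic outputs, only require PDFs to exist.
import filecmp
from pathlib import Path

DETERMINISTIC_SUFFIXES = {".md", ".png"}

def check_against_expected(expected_dir: str, actual_dir: str) -> None:
    expected_root, actual_root = Path(expected_dir), Path(actual_dir)
    for expected in expected_root.rglob("*"):
        if not expected.is_file():
            continue
        actual = actual_root / expected.relative_to(expected_root)
        assert actual.is_file(), f"missing output file: {actual}"
        if expected.suffix.lower() in DETERMINISTIC_SUFFIXES:
            # MD and PNG outputs are reproducible, so compare bytes.
            assert filecmp.cmp(expected, actual, shallow=False), \
                f"content mismatch: {actual}"
        # PDFs embed timestamps, so existence is the only assertion here.
```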

Such a test harness could quickly catch a lot of things while being easy to set up initially: a good starting point would be to use the files from the demo as a first test case and enrich the tests from there. Adding test cases would be as easy as creating documents on our reMarkable tablets, copying them into the test directory, and then converting them with a known good version of remarks.

Assuming we follow this approach, one open question is whether to use a Python testing framework (which I'm personally not familiar with) or simply use a bash/makefile script that calls remarks from the outside and makes the appropriate checks externally. I have no opinion on this.

What do you think of all that @lucasrla ?

