htacg / tidy-html5-tests

Regression testing files and tools for HTML Tidy.

HTML 89.85% Batchfile 5.86% Shell 4.29%

tidy-html5-tests's People

Contributors

Watchers

Forkers

gagern tattocau kleopatra999 geoffmcl lhchavez ler762 trullock

tidy-html5-tests's Issues

Need to check line ending for tests 500236 and 661606

These two tests are OK in Windows, but for some strange reason, when cloned into Linux, the expected files retain the Windows line endings.

This means the diff compare fails in unix, and probably Mac, unless an option is added to ignore whitespace, like -w, which should not be required.

For the moment these tests have been moved to the cases/specials folder, and removed from the tests manifest cases\testbase\_manifest.txt.

We need to discover why git does this, and somehow fix it.

A less liked alternative would be to instruct tidy, through its config, to output a matching CRLF (see the newline option), but this should not be required!

Needless to say, at some point they must be recovered from specials and added back to the testbase manifest.
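As a first step in tracking this down, something like the following could report which expected files in a clone actually contain CRLF line endings. This is only a sketch: the helper names `has_crlf` and `list_crlf_files` are made up for illustration and are not part of the existing tools.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file has at least one line
# ending in CR LF (i.e. a CR immediately before the newline).
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Hypothetical helper: print each regular file in a directory
# that contains CRLF line endings.
list_crlf_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if has_crlf "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run as `list_crlf_files cases/testbase-expects` from the repo root, this would show whether 500236 and 661606 are the only expected files affected after a unix clone.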

Text difference, next 5.7.0, linux and Windows, for case 629

Running the regression tests using the latest next tidy 5.7.0 on testbase next, version 5.5.84, gives a difference between the linux message output and the windows message output on one test, case 629... the html outputs are the same...

The windows tidy output exactly matches the testbase-expects, but the linux tidy seems to lose the first part of the string output...

For example, for the first message the windows tidy correctly outputs the expected -

Config: option "mute" given bad argument "FAKE_TAG" (STRING_ARGUMENT_BAD)

while linux tidy only outputs -

 (STRING_ARGUMENT_BAD)

And similar problems with the next 4 message outputs... weird, very strange...

As indicated, the Windows Tidy passes 100%, while the Ubuntu linux Tidy fails on this one case, 629...

Have yet to dig deeply into it, but maybe somebody will spot either the problem, or maybe my testing problem, easily... thanks...

Am I doing something wrong in the testing... like picking up some stray config from somewhere, or something?

PS: My RPI Raspbian Tidy has the same missing output problem...

Access Tests need some TLC

While the initial part of the testing -c access runs no problem, the diff between expects and results needs an update...

The common, acceptable diffs are -

- Accessibility Checks: Version 0.1
+Accessibility Checks:
 and warning messages like
-1 warning, 0 errors were found!
+Tidy found 1 warning and 0 errors!
 and the various other info blah, blah, blah

I propose that --show-info no be added to these tests, since they have nothing to do with this Info: output... None of that output contributes to the test being conducted... just added to _onetesta.bat...

And remove the --gnu-emacs yes option, which is impossible to diff, since it leads to an automatic difference between Windows and unix/Mac! Not sure how it got into _onetesta.bat in the first place... did I do that?

Then the diff becomes manageable... some 122 differences, in 119 tests...

A few need to be examined, to really question some of the warning differences... but most look good for an update of the base...

Will study more about any questionable differences...

Feedback, and help much appreciated... thanks...

README/RUNTESTS.md

As can be seen the RUNTESTS.md could do with some updates, especially for running in unix...

And specifically the run-tests.sh needs to be greatly enhanced to run all test types, with different inputs, outputs, etc, etc...

In the equivalent Windows alltest.bat it supports commands, namely [-t path/to/tidy.exe] [-o output_directory/] [-c case_set_name], and even has a --help...

This issue was suggested in Issue 718

My unix shell scripting skills are just OK, but I really seek the help of a person with more experience to assist here... thanks...

Being able to test other than the 'testbase' set in unix!

In Windows, with the addition of the handling of PROCESS_CLI in tools-cmd/_environment.bat, it is possible to set the following parameters for the tests -

 alltest [-t [path/to/]tidy[.exe]] [-o output_directory] [-c case_set_name]

Accordingly, to further facilitate re-testing of the specials, added a _manifest.txt, and added a special-expects, with the expected results. And the tests ran well in Windows... and passed...

But it seems that CLI has not yet been added to the tools-sh, so it does not seem possible to run the specials set in unix, without directly modifying the scripts, or creating new scripts, but maybe I missed something...

This would also apply to doing some re-testing of the unknown folder... and say creating a new folder, to pre-test a test, before adding it to the testbase...

Is this CLI capability going to be added to tools-sh? Or is there another way?
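One possible shape for such option handling in tools-sh, purely as a sketch, is a small `getopts` function mirroring the Windows `[-t] [-o] [-c]` options. `TY_TIDY_PATH` and `TY_WANTS_HELP` are names used by the batch tools; `parse_cli`, `TY_RESULTS_DIR`, `TY_CASE_SET` and the default values are assumptions made up for this illustration.

```shell
#!/bin/sh
# Sketch only: parse [-t tidy] [-o output_dir] [-c case_set] [-h].
# TY_RESULTS_DIR and TY_CASE_SET are hypothetical variable names,
# and the defaults below are assumptions, not the repo's behaviour.
parse_cli() {
    TY_TIDY_PATH="tidy"
    TY_RESULTS_DIR=""
    TY_CASE_SET="testbase"
    TY_WANTS_HELP=""
    OPTIND=1    # reset so the function can be called more than once
    while getopts "t:o:c:h" opt; do
        case "$opt" in
            t) TY_TIDY_PATH="$OPTARG" ;;
            o) TY_RESULTS_DIR="$OPTARG" ;;
            c) TY_CASE_SET="$OPTARG" ;;
            h) TY_WANTS_HELP="yes" ;;
            *) return 2 ;;    # unknown option or missing argument
        esac
    done
    return 0
}
```

A script would call `parse_cli "$@"` near the top, then use `$TY_CASE_SET` to pick the case folder and manifest, so that the specials set could be run with `-c special` without editing the scripts.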

Allow tests with `alpha` by t1.bat, like 119a

Suggest a small patch, which also adds to the help output when using this single test script...

diff --git a/tools-cmd/_environment.bat b/tools-cmd/_environment.bat
index 220fb7d..bb83be1 100644
--- a/tools-cmd/_environment.bat
+++ b/tools-cmd/_environment.bat
@@ -483,8 +483,8 @@ GOTO:EOF
   echo.
   echo This Isn't Tidy Error
   echo The file you specified doesn't appear to be a valid Tidy. Specifically
-  echo an error was returned when trying to check its version. Check the file
-  echo %1.
+  echo an error was returned when trying to check its version. Check the file '%1',
+  echo or '%TY_TIDY_PATH% -v' failed. Use -t to set tidy.
   echo.
   rem # Let the calling script decide whether or not to abort.
   set /a TY_ERRORS=TY_ERRORS+1
diff --git a/tools-cmd/t1.bat b/tools-cmd/t1.bat
index e9f5c52..f47b56c 100755
--- a/tools-cmd/t1.bat
+++ b/tools-cmd/t1.bat
@@ -20,12 +20,12 @@
 @REM ------------------------------------------------
 @if "%~1x" == "x" goto :HELP
 @if "%~2x" == "x" goto :HELP
-@echo %~1| findstr /r ^[1-9][0-9]*$ > nul
+@echo %~1| findstr /r ^[1-9][0-9].*$ > nul
 @if ERRORLEVEL 1 goto :HELP
 @echo %~2| findstr /r ^[0-2]$ > nul
 @if ERRORLEVEL 1 goto :HELP
 
-
+@REM echo Got args '%1' '%2' '%3' '%4' ...
 @REM ------------------------------------------------
 @REM  Handle the CLI, and setup and test the 
 @REM  environment.
@@ -39,7 +39,6 @@
 )
 @if NOT "%TY_WANTS_HELP%" == "" goto :HELP
 
-
 @REM ------------------------------------------------
 @REM  Setup our file names, and additional checks.
 @REM ------------------------------------------------

Then it all worked like a charm... after a few more tests, will push this, unless anyone sees a problem... thanks...
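For a unix single-test script, an equivalent check on the test name might look like the sketch below. The function name `is_valid_case_id` is hypothetical, and the pattern, digits followed by at most one lowercase letter, is an assumption based on names like 119a and 836462b; it is deliberately stricter than the findstr pattern in the patch.

```shell
#!/bin/sh
# Hypothetical helper: accept test names such as 119, 119a or 1316307b.
# Assumes a name is digits (no leading zero) plus an optional single
# lowercase letter; adjust the pattern if other names are needed.
is_valid_case_id() {
    printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*[a-z]?$'
}
```

Used as `is_valid_case_id "$1" || show_help`, where `show_help` stands for whatever help routine the script provides.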

Update to 5.2.0 for clean RT run in all OSES

A number of tests require an update to pass with release 5.2.0

Read Issue #390...

These tests now output better indented html, so the expects need updating...

testbase: 427820, 504206, 505770
xml: 1510101!

Versioning of this Tests Repo

In general my idea would be this repo follows the branch/tag names of html-tidy5...

That is -

The master branch be the last release
The next branch is the last compatible version
Special branches, can be used for testing development

That means, at this moment, the master stays at 5.4.0... the last official release...

As soon as PR #514 is triggered, either a next branch is created from custom_tags, or indeed custom_tags is renamed to next, and this becomes the default clone.

So let's say we end up with a merged 5.5.7, then that is the version tagged to next... Now, subsequent version bumps, 5.5.8, 5.5.9, may not mean any change in the test expects... you get a clean diff, i.e. no diff... so the tests will stay at 5.5.7... that is the last compatible version.

The idea here is that a developer can build a new version of Tidy 5.5.nn[.rc], and can run a full regression test as part of the final, successful build... it should only take seconds, and be part of every build...

Make it usable, and easy to test lots of things, quickly, and fun...

Look forward to feedback... thanks...

Need to update test 678268 for tidy v.5.1.44

Per commit, a fix for issue 377, tidy will no longer output a table missing summary warning if the document is seen as XHTML5.

Need to adjust the cases\testbase-expects\case-678268.txt accordingly...

Some legacy doc output errors in regression tests

Recently ran into regression testing problems, in that my fix for Issue 461/531 caused quite a number of diffs, and some exit value changes... Arrgh!!!

On investigating this, it seems some errors have crept into the regression tests!

Now most of the regression tests are still legacy documents. Take, say, 1086083. It is HTML 4.01, and the testcase contains the code <li>Huh?</li>, which tidy is supposed to fix by adding a <ul>. If we go back in the tidy-tests repo, that is what we find in the original master branch, out_1086083.html, but not in the current next branch expects/case-1086083.html!

In other words, this expects/case-1086083.html now fails validation with the error Line 16, Column 6: document type does not allow element "LI" here; missing one of "UL", "OL" start-tag! While the original out_1086083.html PASSES!!

That is, we now expect, and allow, some html5 tolerance in a legacy document, and this is an error! And there seem to be several other similar cases... yet to be explored fully...

So far the differences and/or exit value regression errors seem to be in the following tests -

1086083.html 1086083.txt
1316307b.html 1316307b.txt
1331849.html 1331849.txt
1423252.html 1423252.txt
435917.html 435917.txt 
487204.html 487204.txt 
836462a.html 836462a.txt
836462b.html 836462b.txt
836462c.html 836462c.txt

Any assistance investigating and correcting these would be most appreciated... thanks...

Problem with tests 427664 and 427672 on certain OSes!

There is a difference in the output message text using ARM / Raspberry Pi 2, RPI, first reported as Issue 258, Issue 266, Issue 269, by @vielmetti, back on Sep 13, 2015. Thank you for that report. Maybe others...

One or both may also be a problem on Mac OS X, reported by @balthisar. To be verified.

First, to try to examine the exact reason for these two tests...

Test 427664 - now https://sourceforge.net/p/tidy/bugs/4/
#4 Missing attr values cause NULL segfault
Created: 2001-05-27 Creator: Terry Teague

Test 427672 - now https://sourceforge.net/p/tidy/bugs/10/
#10 Non-std attrs w/multibyte names segfault
Created: 2001-05-27 Creator: Terry Teague

Both these test inputs existed in the SF CVS source, without a special config file. Both reported a segfault at that time! And both input files seem exactly the same as in this github tests repo, in a binary compare, namely in_427664.html == case-427664.html and in_427672.html == case-427672.html! So no change has been made in the inputs. Of course, the SF CVS has no `testbase-expects` output to compare with...

However, re-running tidy04aug00, even adding the suggested -utf8 option, on each file, does NOT produce a segfault, as far as I can see...

But running tidy04aug00, for which I do not have the source, on both inputs, using DrMemory, does show it has -

 Error #1: UNADDRESSABLE ACCESS beyond top of stack: reading 4 byte(s)

But this is not exactly a segfault due to a NULL pointer! And repeating the tests using tidy2000, for which we do have the source, does not show any problems...

And while, for some reason, I can not yet run DrMemory using the current tidy 5.1.45++, it also appears not to have a segfault! I also need to try in linux, using valgrind and ASAN testing...

But, for sure, that segfault, the reason for the two tests, seems to have been solved.

So there remains this mystery of the character encoding differences in the message output in certain OS environments, which still need to be solved.

What is in the `testbase` input, and `testbase-expects`?

Essentially both input files have <body name="xx">. A comment in the files says the name is supposed to be the 2 bytes hex c3 87, but it is not! Now maybe this is a corruption from a long way back, but even in the SF CVS source the name is a 4 byte sequence of C3 31 2F 32.

Thus, in their present state, both inputs do not verify as valid utf-8 text. They would if changed back to the c3 87 given in the comment, but it is yet to be tested whether that changes the situation.

In parsing this document, tidy finds this 4 byte sequence is not a valid attribute name, and outputs a warning. Now it is the value output for that name in the warning message that differs in RPI OS. And maybe in OS X, still to be verified.

Tidy in Windows, and Ubuntu linux consistently outputs a 9 byte sequence EF BF BF EF BF BF 31 2F 32, and this is what is in testbase-expects, so the compare is exact. No problem.

While Tidy in RPI outputs, in testbase-results a 7 byte sequence c3 83 c2 83 31 2f 32, so the diff fails. A problem.
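To compare these byte sequences on each OS without relying on how a terminal renders them, a couple of small shell helpers could be used. This is only a sketch: `hex_bytes` and `is_valid_utf8` are hypothetical names, and it assumes `od` and `iconv` are installed. For reference, C3 is a utf-8 lead byte that must be followed by a continuation byte in the range 80 to BF, which 31 is not; EF BF BF decodes to U+FFFF, and c3 83 c2 83 decodes to U+00C3 U+0083.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as space separated hex bytes.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: succeed only if stdin is valid utf-8.
# Relies on iconv exiting non-zero on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# The attribute name as found in the inputs: C3 31 2F 32
printf '\3031/2' | hex_bytes
# The name the file comment says was intended: c3 87
printf '\303\207' | hex_bytes
```

Piping a results file through `hex_bytes` on Windows, Ubuntu and RPI would give directly comparable dumps of the warning message.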

What can we do?

Reduce the attribute name to just 1/2, which is still invalid, so keeps the tests meaning.
Change the file back to valid utf-8 c3 87, change the expected accordingly - to be tested.
Maybe a fix in Tidy code could force RPI to use EF BF BF output.
If also a problem in OS X, maybe exclude the 2 tests.
If only in RPI, be ready to explain that this difference exists in these 2 tests.
Other choices?

I seek ideas and comments on what would be best?

As previously expressed, I think it is important that we have a consistent set of tests across all OSes, and to not have to try and explain a difference every time someone stumbles across it.

Help Needed

Test problems with a ~/.tidyrc loaded

Although both the cmd and sh tools specifically deal with a config file pre-loaded from the ENV variable HTML_TIDY, they do not check whether a ~/.tidyrc exists...

Such a config file could pre-set any config the regression tests want to set up... and yield different results...

If detected, the test script should abort with a warning like - default config file ~/.tidyrc found. delete, rename, move before continuing... ....
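For the sh tools, such a check could be as small as the sketch below. The function name `check_no_user_tidyrc` is hypothetical, and the optional directory argument exists only so the check can be tried against a directory other than the real `$HOME`.

```shell
#!/bin/sh
# Hypothetical helper: fail with a warning if a default user config
# file exists. The first argument defaults to $HOME.
check_no_user_tidyrc() {
    home_dir="${1:-$HOME}"
    if [ -f "$home_dir/.tidyrc" ]; then
        printf 'Warning: default config file %s found.\n' "$home_dir/.tidyrc" >&2
        printf 'Delete, rename, or move it before continuing.\n' >&2
        return 1
    fi
    return 0
}
```

A test script would call `check_no_user_tidyrc || exit 1` before running any cases.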

The issue 752 exposed this failing... thanks...

Look forward to feedback, patches, PRs to fix this shortfall in the current test scripts...

Update of xml expected for test 1510101

Having just run the xml tests, the first time for quite a while, note there has been a change in the output for test xml/case-1510101.[xml/conf]...

Looking back, this test was added due to https://sourceforge.net/p/tidy/bugs/777/, which fixed a bug, and it seems this bug remains fixed, the purpose of the test...

It seems the only changes in the pretty print output are indent changes... this change could have been around for quite a time... did not check exactly which commit caused it...

It is certainly unrelated to the current xml PR #595...

Thus, barring any objections, propose to update the xml-expects\case-1510101.html to current output...

Comments welcome... thanks...

Need test for surrogate pairs

@balthisar I know you added a test case 2017030501, but it seems to be missing now...

Anyway, have used that to create a test case 483, in line with - htacg/tidy-html5#483 - which I will push shortly... thanks...

Update tests 1642186b and 431716 to match tidy 5.5.8.I119

Actions:

1642186b - Is a case using --show-info no, so just need expected updated, for removed debug message.
431716 - warning about split not existing. Again seems a case of updating the expected?
