danthedeckie / html5validate
Pure Python (currently HTML5lib based) HTML5 validator for use in django tests, etc. Works offline, does not use external services.
License: Other
Hi! Thanks for a great little library! I've used it to write a crawler that goes through my whole site on localhost and validates all the HTML. It's great when there's lots of user-generated content that I want to make sure doesn't break my site.
Anyways. I'm not sure how hard this is, but I would really like better error messages. Here's an example from my script:
Checking: http://localhost:8000/js/lazy-loading-asyncronous-javascript/ --> Traceback (most recent call last):
File "commands/crawl.py", line 88, in <module>
main()
File "commands/crawl.py", line 77, in main
print(("VALID" if html5validate.validate(html) is None else "INVALID"))
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5validate.py", line 36, in validate
dom = PARSER.parse(text)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 289, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 134, in _parse
self.mainLoop()
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 241, in mainLoop
new_token = phase.processStartTag(new_token)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 472, in processStartTag
return self.startTagHandler[token["name"]](token)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 1154, in startTagA
self.parser.parseError("unexpected-start-tag-implies-end-tag",
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 326, in parseError
raise ParseError(E[errorcode] % datavars)
html5lib.html5parser.ParseError: Unexpected start tag (a) implies end tag (a).
The checked HTML file has hundreds of links, and finding which one is missing a </a> is a lot of unnecessary work.
Another:
Checking: http://localhost:8000/css/ie7-hover-bug-z-index-ignored-and-other-properties/ --> Traceback (most recent call last):
File "commands/crawl.py", line 88, in <module>
main()
File "commands/crawl.py", line 77, in main
print(("VALID" if html5validate.validate(html) is None else "INVALID"))
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5validate.py", line 36, in validate
dom = PARSER.parse(text)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 289, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 134, in _parse
self.mainLoop()
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 217, in mainLoop
self.parseError(new_token["data"], new_token.get("datavars", {}))
File "/Users/EmilStenstrom/.pyenv/versions/friendlybit/lib/python3.8/site-packages/html5lib/html5parser.py", line 326, in parseError
raise ParseError(E[errorcode] % datavars)
html5lib.html5parser.ParseError: Unexpected character after attribute value.
This doesn't give any context about where in the document the error is, which again takes a lot of work when the document is large.
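For what it's worth, html5lib itself seems to have the position information: with `strict=False` the parser does not raise, and instead records each problem in `parser.errors` as a `((line, col), errorcode, datavars)` tuple. Below is a minimal sketch of reading that list directly. It bypasses html5validate, so it only reports html5lib parse errors, and the helper names `describe_error` and `collect_errors` are made up for illustration.

```python
def describe_error(position, code, datavars, messages):
    """Format one html5lib-style error tuple as 'line:col message'."""
    line, col = position
    # Fall back to the bare error code if no message template is known.
    template = messages.get(code, code)
    # html5lib's templates use %(name)s placeholders filled from datavars.
    return f"{line}:{col} {template % datavars}"


def collect_errors(text):
    """Parse text non-strictly and return every parse error, formatted."""
    # Imported here so describe_error stays usable without html5lib installed.
    import html5lib
    from html5lib.constants import E  # error code -> message template

    parser = html5lib.HTMLParser(strict=False)
    parser.parse(text)
    # parser.errors holds ((line, col), errorcode, datavars) tuples.
    return [describe_error(pos, code, data, E) for pos, code, data in parser.errors]
```

Calling `collect_errors(html)` in the crawler would then give one line per problem instead of a traceback for the first one.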
Any chance the error messages could be made more descriptive?
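To make the request concrete, here is a rough sketch of the kind of output I have in mind: given the document text plus a line and column for the error, print the surrounding lines with a caret under the column. The function name `show_context` and its parameters are my own invention, not part of html5validate or html5lib, and I'm assuming a 1-based line number and a 0-based column, so the caret should be treated as approximate.

```python
def show_context(text, line, col, radius=1):
    """Return the lines around (line, col) with a caret under the column.

    line is 1-based; col is treated as a 0-based offset into that line.
    radius is how many lines of context to show on each side.
    """
    lines = text.splitlines()
    first = max(line - 1 - radius, 0)
    last = min(line + radius, len(lines))
    out = []
    for number in range(first, last):
        # Five-character line number gutter, then " | ", then the source line.
        out.append(f"{number + 1:>5} | {lines[number]}")
        if number == line - 1:
            # The gutter is 8 characters wide, so shift the caret past it.
            out.append(" " * 8 + " " * col + "^")
    return "\n".join(out)
```

Even just the line number and the offending line in the exception message would save a lot of searching in a large document.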