Classes for working with HTML, in particular to make it easier to test and verify HTML in unit tests, which was the original purpose.
The library essentially provides a more fluent interface over the JSOUP HTML parser and integrates with Hamcrest, which makes tests easier to write.
- Java 7
- Apache Maven
<repositories>
<repository>
<id>canfactory</id>
<url>http://stage.canfactory.com/artifactory/libs-release</url>
</repository>
</repositories>
<dependency>
<groupId>com.canfactory</groupId>
<artifactId>canfactory-html</artifactId>
<version>0.1.0</version>
</dependency>
mvn clean install
This library is used to support unit tests in production code and can be considered relatively stable. New features that are subject to change are marked with the @Beta annotation.
An HTML element has a single root node but can have any number of children, for example:
<ul>
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>
HtmlElement.Factory.fromStream(InputStream stream)
HtmlElement.Factory.fromString(String html)
HtmlElement.Factory.fromStream(Element element)
An HTML fragment is a block of HTML that has no single root node, for example:
<p>This is paragraph 1</p>
<p>This is paragraph 2</p>
<ul>
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>
HtmlFragment.Factory.fromStream(InputStream stream)
HtmlFragment.Factory.fromString(String html)
HtmlFragment.Factory.fromElements(Elements elements)
Given either an element or a fragment, you can find other elements and fragments for use in your assertions. Other methods are available, but these are the most commonly used.
Find all the elements matching the given CSS selector. N.B. CSS selectors use the JSOUP syntax, which may differ slightly from the formal CSS specification.
html.all(String cssSelector)
Find the first element matching the given CSS selector.
html.first(String cssSelector)
Find the last element matching the given CSS selector.
html.last(String cssSelector)
Find the nth element matching the given CSS selector.
html.nth(int index, String cssSelector)
Note that nth uses 1-based indexing, as in CSS, not 0-based indexing.
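The 1-based convention can be illustrated with a small standalone sketch. The NthExample class and its nth helper below are hypothetical and are not part of this library; they only show how a 1-based position maps onto a 0-based Java list.

```java
import java.util.Arrays;
import java.util.List;

public class NthExample {

    /**
     * Hypothetical helper (not the library's nth): returns the item at the
     * given 1-based position. An out-of-range position makes List.get throw.
     */
    static String nth(int index, List<String> items) {
        // Position 1 is the first item, so subtract 1 for the 0-based list.
        return items.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> colours = Arrays.asList("Red", "Green", "Blue");
        System.out.println(nth(1, colours)); // Red
        System.out.println(nth(2, colours)); // Green
    }
}
```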
The API is designed so that methods can be chained. Methods should not return null. This is managed by providing implementations that handle the empty collection state, e.g. EmptyHtmlFragment. It's important to use the Factory methods on HtmlElement and HtmlFragment, as these will take care of creating the correct instance.
For those who care, the API is designed to be vaguely monadic, like the new stream (collections) API in Java 8. What is a monad? The first lines of the Wikipedia article sum up the behaviour: "In functional programming, a monad is a structure that represents computations defined as sequences of steps. A type with a monad structure defines what it means to chain operations, or nest functions of that type together. This allows the programmer to build pipelines that process data in steps, in which each action is decorated with additional processing rules provided by the monad."
One benefit, to lift a phrase from a blog article, is that "a function thus bound can be guaranteed to be working with an instance and not a null", so we don't have to worry about null, which is nice. In this API this is managed by implementations of HtmlElement and HtmlFragment that handle the empty collection state.
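The idea of an implementation that stands in for "nothing found" can be sketched in plain Java. Everything in this example (the Node interface, EMPTY and TextNode) is hypothetical and only illustrates the general technique; it is not how HtmlElement, HtmlFragment or EmptyHtmlFragment are actually implemented.

```java
import java.util.Arrays;
import java.util.List;

public class EmptyStateExample {

    /** Hypothetical stand-in for an element; not the library's HtmlElement. */
    interface Node {
        Node child(int position); // 1-based, never returns null
        String text();
        boolean isEmpty();
    }

    /** Represents "nothing found", so callers can keep chaining safely. */
    static final Node EMPTY = new Node() {
        public Node child(int position) { return this; }
        public String text() { return ""; }
        public boolean isEmpty() { return true; }
    };

    /** A node with some text and zero or more children. */
    static final class TextNode implements Node {
        private final String text;
        private final List<Node> children;

        TextNode(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }

        public Node child(int position) {
            // Out-of-range lookups yield the empty node instead of null.
            if (position < 1 || position > children.size()) {
                return EMPTY;
            }
            return children.get(position - 1);
        }

        public String text() { return text; }
        public boolean isEmpty() { return false; }
    }

    public static void main(String[] args) {
        Node list = new TextNode("ul", new TextNode("Red"), new TextNode("Green"));
        System.out.println(list.child(2).text()); // Green
        // A missing child still supports chaining without a NullPointerException.
        System.out.println(list.child(5).child(1).isEmpty()); // true
    }
}
```

Because every lookup returns some Node, a chain of calls never needs an intermediate null check; an assertion at the end of the chain simply fails against the empty instance.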
The matchers currently provided by the project are listed below. They may be combined with existing Hamcrest matchers, most notably not, allOf and anyOf. They are meant to be used in a fluent, readable manner, which may become more obvious in the examples that follow.
At least one of the found elements satisfies the given matcher.
any(Matcher<HtmlElement> matcher)
The number of found elements equals a given value.
count(int value)
Each of the found elements satisfies the given matcher.
each(Matcher<HtmlElement> matcher)
This is overloaded to provide the following functionality:
- The HTML element has an attribute present by name, ignoring the value
- The HTML element has an attribute that matches exactly by name and value
- The HTML element has all of the given attributes
hasAttribute(String name)
hasAttribute(String name, String value)
hasAttribute(Attribute attribute)
hasAttributes(Attribute... attributes)
hasAttributes(String... nameValuePairs)
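A flat list of name/value pairs is usually read as alternating names and values. The sketch below shows that interpretation using a hypothetical toMap helper; it is an assumption about how such pairs are typically consumed, not a description of what hasAttributes does internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NameValuePairsExample {

    /**
     * Hypothetical helper: turns ("name1", "value1", "name2", "value2")
     * into an ordered map of attribute names to values.
     */
    static Map<String, String> toMap(String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of arguments");
        }
        Map<String, String> attributes = new LinkedHashMap<String, String>();
        // Step through the array two entries at a time: name, then value.
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            attributes.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(toMap("data-id", "42", "role", "banner"));
    }
}
```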
The HTML element has the given class or classes.
hasClass(String classname)
hasClasses(String... classnames)
The HTML element has the given ID.
hasId(String id)
The HTML element contains the given text.
hasText(String text)
None of the found elements satisfies the given matcher.
none(Matcher<HtmlElement> matcher)
Exactly one of the found elements satisfies the given matcher.
one(Matcher<HtmlElement> matcher)
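The difference between any, each, none and one comes down to how many of the found elements satisfy the inner matcher. The standalone sketch below expresses that with plain Java 8 predicates over a list of strings. The QuantifierExample class is hypothetical and does not use the library or Hamcrest, and its treatment of an empty list (each and none both hold) is an assumption rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class QuantifierExample {

    /** Counts how many items satisfy the predicate. */
    static <T> int matches(List<T> items, Predicate<T> predicate) {
        int n = 0;
        for (T item : items) {
            if (predicate.test(item)) {
                n++;
            }
        }
        return n;
    }

    // Each quantifier is a different condition on the number of matches.
    static <T> boolean any(List<T> items, Predicate<T> p)  { return matches(items, p) >= 1; }
    static <T> boolean each(List<T> items, Predicate<T> p) { return matches(items, p) == items.size(); }
    static <T> boolean none(List<T> items, Predicate<T> p) { return matches(items, p) == 0; }
    static <T> boolean one(List<T> items, Predicate<T> p)  { return matches(items, p) == 1; }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("Red", "Green", "Blue");
        System.out.println(each(texts, t -> t.contains("e")));  // true
        System.out.println(one(texts, t -> t.startsWith("G"))); // true
        System.out.println(none(texts, t -> t.isEmpty()));      // true
        System.out.println(any(texts, t -> t.equals("Pink")));  // false
    }
}
```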
Assert that each li tag found has the text "list item".
assertThat(html.all("li"), each(hasText("list item")))
Assert that no li tag found has the text "hello".
assertThat(html.all("li"), none(hasText("hello")))
Assert that only one h1 has the text "Now Showing".
assertThat(html.all("h1"), one(hasText("Now Showing")))
Assert that all div tags with the class "component" have an attribute "data-id".
assertThat(html.all("div.component"), each(hasAttribute("data-id")))
Assert that there is only one form present.
assertThat(html.all("form"), count(1))
Assert that the 2nd li tag has the class "second".
assertThat(html.nth(2, "li"), hasClass("second"))