Hypertext Abstract Syntax Tree format.
HAST represents HTML as an abstract syntax tree. Abstract means that not all information is stored in this tree, so an exact replica of the original document cannot be re-created. Syntax Tree means that syntax is present in the tree, so an exact syntactic document can be re-created.
The main reasons for introducing a new “virtual” DOM are:
- The DOM is very heavy to implement outside of the browser; a lean, stripped-down virtual DOM can be used everywhere;
- Most virtual DOMs do not focus on ease of use in transformations;
- Other virtual DOMs cannot represent the syntax of HTML in its entirety, such as comments, document types, and character data;
- Neither HTML nor virtual DOMs focus on positional information.
HAST is a subset of Unist and is implemented by rehype.
This document describes version 2.0.0 of HAST.
List of Utilities
- hastscript: Hyperscript compatible DSL for creating nodes;
- hast-to-hyperscript: Convert a Node to React, Virtual DOM, Hyperscript, and more;
- hast-util-assert: Assert HAST nodes;
- hast-util-embedded: Check if a node is embedded content;
- hast-util-find-and-replace: Find and replace text;
- hast-util-from-parse5: Transform Parse5’s AST to HAST;
- hast-util-from-string: Set the plain-text value of a node;
- hast-util-has-property: Check if a node has a property;
- hast-util-heading: Check if a node is heading content;
- hast-util-interactive: Check if a node is interactive;
- hast-util-is-body-ok-link: Check if a link element is “Body OK”;
- hast-util-is-conditional-comment: Check if a node is a conditional comment;
- hast-util-is-css-link: Check if a node is a CSS link element;
- hast-util-is-css-style: Check if a node is a CSS style element;
- hast-util-is-element: Check if a node is a (certain) element;
- hast-util-is-event-handler: Check if a property is an event handler;
- hast-util-is-javascript: Check if a node is a JavaScript script element;
- hast-util-labelable: Check if a node is labelable;
- hast-util-menu-state: Check the state of a menu element;
- hast-util-parse-selector: Create an element from a simple CSS selector;
- hast-util-phrasing: Check if a node is phrasing content;
- hast-util-raw: Reparse a HAST tree;
- hast-util-sanitize: Sanitise nodes;
- hast-util-script-supporting: Check if a node is script-supporting content;
- hast-util-sectioning: Check if a node is sectioning content;
- hast-util-to-html: Stringify nodes to HTML;
- hast-util-to-nlcst: Transform HAST to NLCST;
- hast-util-to-parse5: Transform HAST to Parse5’s AST;
- hast-util-to-string: Get the plain-text value of a node;
- hast-util-transparent: Check if a node is transparent content;
- hast-util-whitespace: Check if a node is inter-element whitespace.
See the List of Unist Utilities for projects which work with HAST nodes too.
Related HTML Utilities
- a-rel: List of link types for rel on a/area;
- aria-attributes: List of ARIA attributes;
- collapse-white-space: Replace multiple white-space characters with a single space;
- comma-separated-tokens: Parse/stringify comma-separated tokens;
- html-tag-names: List of HTML tag-names;
- html-dangerous-encodings: List of dangerous HTML character encoding labels;
- html-encodings: List of HTML character encoding labels;
- html-element-attributes: Map of HTML attributes;
- html-void-elements: List of void HTML tag-names;
- link-rel: List of link types for rel on link;
- mathml-tag-names: List of MathML tag-names;
- meta-name: List of values for name on meta;
- property-information: Information on HTML properties;
- space-separated-tokens: Parse/stringify space-separated tokens;
- svg-tag-names: List of SVG tag-names;
- web-namespaces: Map of web namespaces.
AST
Root
Root (Parent) houses all nodes.
interface Root <: Parent {
  type: "root";
}
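As a rough illustration of a Root sitting at the top of a tree, here is a TypeScript sketch that walks every node below it. The type names (UnistNode, UnistParent, HastRoot) and the countByType helper are hypothetical, chosen for this example; they are not types or functions published by Unist or rehype.

```typescript
// Hypothetical shapes for this sketch only; names avoid clashing with DOM globals.
interface UnistNode {
  type: string;
  [key: string]: unknown; // other fields (tagName, value, ...) are allowed
}

interface UnistParent extends UnistNode {
  children: UnistNode[];
}

interface HastRoot extends UnistParent {
  type: "root";
}

// A node is treated as a parent when it carries a children array.
function isParent(node: UnistNode): node is UnistParent {
  return Array.isArray(node.children);
}

// Depth-first walk from the root, tallying how many nodes of each type exist.
function countByType(root: HastRoot): Record<string, number> {
  const counts: Record<string, number> = {};
  const stack: UnistNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    counts[node.type] = (counts[node.type] || 0) + 1;
    if (isParent(node)) {
      for (const child of node.children) stack.push(child);
    }
  }
  return counts;
}

// Usage: a root holding <span>Foxtrot</span>.
const tree: HastRoot = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "span",
      properties: {},
      children: [{ type: "text", value: "Foxtrot" }],
    },
  ],
};

console.log(countByType(tree)); // { root: 1, element: 1, text: 1 }
```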
Element
Element (Parent) represents an HTML element, for example a div. HAST Elements correspond to the HTML Element interface.
interface Element <: Parent {
  type: "element";
  tagName: string;
  properties: Properties;
}
For example, the following HTML:
<a href="http://alpha.com" class="bravo" download></a>
Yields:
{
  "type": "element",
  "tagName": "a",
  "properties": {
    "href": "http://alpha.com",
    "className": ["bravo"],
    "download": true
  },
  "children": []
}
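The same element can also be built in code. The sketch below uses a hypothetical makeElement helper and Hast-prefixed type names invented for this example; it is not the API of hastscript or of any other utility listed above.

```typescript
// Hypothetical shapes; Hast-prefixed to avoid clashing with DOM globals.
type HastPropertyValue = string | number | boolean | Array<string | number> | null | undefined;

interface HastProperties {
  [propertyName: string]: HastPropertyValue;
}

interface HastText {
  type: "text";
  value: string;
}

interface HastElement {
  type: "element";
  tagName: string;
  properties: HastProperties;
  children: Array<HastElement | HastText>;
}

// Hypothetical helper (not hastscript's API): builds an Element node.
function makeElement(
  tagName: string,
  properties: HastProperties = {},
  children: Array<HastElement | HastText> = [],
): HastElement {
  return { type: "element", tagName, properties, children };
}

// <a href="http://alpha.com" class="bravo" download></a> as a HAST element.
const anchor = makeElement("a", {
  href: "http://alpha.com",
  className: ["bravo"],
  download: true,
});

console.log(JSON.stringify(anchor, null, 2));
```

Note that class is stored under className as a list, and the valueless download attribute becomes the boolean true.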
Properties
Properties is a dictionary of property names to property values. Most virtual DOMs require a disambiguation between attributes and properties. HAST does not, and defers this to compilers.
interface Properties {}
Property names
Property names are keys on properties objects and reflect HTML attribute names. Often, they have the same value as the corresponding HTML attribute (for example, href is a property name reflecting the href attribute name).
If the HTML attribute name contains one or more dashes, the HAST property name must be camel-cased (for example, ariaLabel is a property reflecting the aria-label attribute).
If the HTML attribute is a reserved ECMAScript keyword, a common alternative must be used. This is the case for class, which uses className in HAST (and DOM), and for for, which uses htmlFor.
The DOM also uses other prefixes and suffixes, for example relList for the HTML rel attribute. This does not occur in HAST.
When possible, HAST property names must be camel-cased if the HTML attribute name originates from multiple words. For example, the minlength HTML attribute is cased as minLength, and typemustmatch as typeMustMatch.
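The naming rules above can be sketched as a small function. The lookup table holds only the examples given in this section and is an assumption for illustration; dashed names are camel-cased with a regular expression, while multi-word names written without dashes (like minlength) cannot be derived mechanically and need a table entry. A complete mapping would need information on every HTML attribute.

```typescript
// Hypothetical lookup table for names the simple rules cannot derive:
// reserved ECMAScript keywords and multi-word names without dashes.
const SPECIAL_PROPERTY_NAMES: { [attribute: string]: string } = {
  class: "className",
  for: "htmlFor",
  minlength: "minLength",
  typemustmatch: "typeMustMatch",
};

// Sketch: map an HTML attribute name to a HAST property name.
function toPropertyName(attribute: string): string {
  const lower = attribute.toLowerCase();
  if (Object.prototype.hasOwnProperty.call(SPECIAL_PROPERTY_NAMES, lower)) {
    return SPECIAL_PROPERTY_NAMES[lower];
  }
  // Dashed names are camel-cased: aria-label -> ariaLabel.
  return lower.replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

console.log(toPropertyName("href"));       // href
console.log(toPropertyName("aria-label")); // ariaLabel
console.log(toPropertyName("for"));        // htmlFor
```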
Property values
Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> contains a hidden (boolean) attribute, which is reflected in HAST as a hidden property set to true (boolean). Likewise, <input minlength="5"> contains a minlength (valid non-negative integer) attribute, which is reflected in HAST as a minLength property set to 5 (number).
In JSON, the property value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
The DOM is strict in reflecting these properties; HAST is not. Where the DOM treats <div hidden=no></div> as having a true (boolean) value for the hidden attribute, and <img width="yes"> as having a 0 (number) value for the width attribute, HAST should reflect these values as 'no' and 'yes', respectively. The reason for this is to allow plug-ins and utilities to inspect these values.
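The value rules in this section can be sketched as a coercion function. The type table and the toPropertyValue name are hypothetical and cover only the attributes used as examples here. The boolean rule (an empty value or the attribute's own name counts as true) follows HTML's syntax for boolean attributes; anything that does not fit the expected type is kept as the original string, in line with the lenient behavior described above.

```typescript
// Hypothetical type table, keyed by lowercase HTML attribute name.
// It covers only the attributes used as examples in this section.
type ValueKind = "boolean" | "number";

const VALUE_KINDS: { [attribute: string]: ValueKind } = {
  hidden: "boolean",
  minlength: "number",
  width: "number",
};

// Sketch: convert a raw attribute value into a HAST property value.
// Values that do not fit the expected type are kept as the original string.
function toPropertyValue(attribute: string, raw: string): string | number | boolean {
  const key = attribute.toLowerCase();
  const kind = Object.prototype.hasOwnProperty.call(VALUE_KINDS, key) ? VALUE_KINDS[key] : undefined;

  if (kind === "boolean") {
    // HTML boolean syntax: an empty value or the attribute's own name.
    return raw === "" || raw.toLowerCase() === key ? true : raw;
  }
  if (kind === "number") {
    // Valid non-negative integer: one or more ASCII digits.
    return /^[0-9]+$/.test(raw) ? parseInt(raw, 10) : raw;
  }
  return raw;
}

console.log(toPropertyValue("hidden", ""));     // true
console.log(toPropertyValue("hidden", "no"));   // no
console.log(toPropertyValue("minlength", "5")); // 5
console.log(toPropertyValue("width", "yes"));   // yes
```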
The DOM also specifies attribute values that are comma- or space-separated lists. In HAST, these should be treated as ordered lists. For example, <div class="alpha bravo"></div> is represented with a className property of ['alpha', 'bravo'].
There is no special format for style.
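Splitting list-valued attributes into ordered lists can be sketched as follows. The helper names are hypothetical; whitespace is taken to be HTML's ASCII whitespace (space, tab, line feed, form feed, carriage return), and dropping empty comma-separated tokens is an assumption of this sketch. Since style has no special format, it would simply not be passed through these helpers.

```typescript
// HTML's ASCII whitespace: space, tab, line feed, form feed, carriage return.
const EDGE_WHITESPACE = /^[ \t\n\f\r]+|[ \t\n\f\r]+$/g;
const INNER_WHITESPACE = /[ \t\n\f\r]+/;

function trimHtmlWhitespace(value: string): string {
  return value.replace(EDGE_WHITESPACE, "");
}

// Hypothetical helper: "alpha  bravo" -> ["alpha", "bravo"], order preserved.
function parseSpaceSeparated(value: string): string[] {
  const trimmed = trimHtmlWhitespace(value);
  return trimmed === "" ? [] : trimmed.split(INNER_WHITESPACE);
}

// Hypothetical helper: "a, b" -> ["a", "b"]; empty tokens are dropped (an assumption).
function parseCommaSeparated(value: string): string[] {
  return value
    .split(",")
    .map(trimHtmlWhitespace)
    .filter((token) => token !== "");
}

// The reverse direction simply joins the ordered tokens.
function stringifySpaceSeparated(tokens: string[]): string {
  return tokens.join(" ");
}

function stringifyCommaSeparated(tokens: string[]): string {
  return tokens.join(", ");
}

console.log(parseSpaceSeparated("alpha bravo")); // [ 'alpha', 'bravo' ]
```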
Doctype
Doctype (Node) defines the type of the document.
interface Doctype <: Node {
  type: "doctype";
  name: string;
  public: string?;
  system: string?;
}
For example, the following HTML:
<!DOCTYPE html>
Yields:
{
  "type": "doctype",
  "name": "html",
  "public": null,
  "system": null
}
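Going the other way, a Doctype node can be serialized back to HTML. This is a minimal sketch with hypothetical names; it assumes the public and system identifiers contain no double quotes and does not attempt to escape them.

```typescript
// Hypothetical shape mirroring the Doctype interface above.
interface HastDoctype {
  type: "doctype";
  name: string;
  public?: string | null;
  system?: string | null;
}

// Sketch: serialize a Doctype node back to HTML (no quote escaping).
function doctypeToHtml(node: HastDoctype): string {
  let result = "<!DOCTYPE " + node.name;
  if (node.public) {
    result += ' PUBLIC "' + node.public + '"';
    if (node.system) result += ' "' + node.system + '"';
  } else if (node.system) {
    result += ' SYSTEM "' + node.system + '"';
  }
  return result + ">";
}

console.log(doctypeToHtml({ type: "doctype", name: "html", public: null, system: null }));
// <!DOCTYPE html>
```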
Comment
Comment (Text) represents embedded information.
interface Comment <: Text {
  type: "comment";
}
For example, the following HTML:
<!--Charlie-->
Yields:
{
  "type": "comment",
  "value": "Charlie"
}
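A Comment node serializes by wrapping its value in comment delimiters. In this sketch (hypothetical names), a value containing --> is rejected, since it would close the comment early; a fuller serializer might escape or rewrite such values instead.

```typescript
// Hypothetical shape mirroring the Comment interface above.
interface HastComment {
  type: "comment";
  value: string;
}

// Sketch: serialize a Comment node, refusing values that would end it early.
function commentToHtml(node: HastComment): string {
  if (node.value.indexOf("-->") !== -1) {
    throw new Error("comment value cannot contain '-->'");
  }
  return "<!--" + node.value + "-->";
}

console.log(commentToHtml({ type: "comment", value: "Charlie" })); // <!--Charlie-->
```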
Text
TextNode (Text) represents everything that is text. Note that its type property is "text", but it is different from the abstract Unist interface Text.
interface TextNode <: Text {
  type: "text";
}
For example, the following HTML:
<span>Foxtrot</span>
Yields:
{
  "type": "element",
  "tagName": "span",
  "properties": {},
  "children": [{
    "type": "text",
    "value": "Foxtrot"
  }]
}
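Collecting the plain-text value of a subtree is a common operation on text nodes (compare hast-util-to-string in the list of utilities). The sketch below is a hypothetical toPlainText function, not that utility's implementation; it concatenates text values in document order and, as an assumption, skips comments.

```typescript
// Hypothetical shapes; Hast-prefixed to avoid clashing with DOM globals.
interface HastText {
  type: "text";
  value: string;
}

interface HastComment {
  type: "comment";
  value: string;
}

interface HastElement {
  type: "element";
  tagName: string;
  properties: { [propertyName: string]: unknown };
  children: HastChild[];
}

type HastChild = HastElement | HastText | HastComment;

// Sketch: concatenate the values of all text nodes below a node, in order.
function toPlainText(node: HastChild): string {
  if (node.type === "text") return node.value;
  if (node.type === "comment") return ""; // assumption: comments are not text
  let result = "";
  for (const child of node.children) result += toPlainText(child);
  return result;
}

const span: HastElement = {
  type: "element",
  tagName: "span",
  properties: {},
  children: [{ type: "text", value: "Foxtrot" }],
};

console.log(toPlainText(span)); // Foxtrot
```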