HAST

Hypertext Abstract Syntax Tree format.


HAST exposes HTML as an abstract syntax tree. Abstract means that not all information is stored in the tree, so an exact replica of the original document cannot be re-created. Syntax tree means that syntax is present in the tree, so a syntactically equivalent document can be re-created.

The reasons for introducing a new “virtual” DOM are primarily:

  • The DOM is very heavy to implement outside of the browser; a lean, stripped down virtual DOM can be used everywhere;
  • Most virtual DOMs do not focus on ease of use in transformations;
  • Other virtual DOMs cannot represent the syntax of HTML in its entirety, think comments, document types, and character data;
  • Neither HTML nor virtual DOMs focus on positional information.

HAST is a subset of Unist, and implemented by rehype.

This document describes version 2.0.0 of HAST.

List of Utilities

See the List of Unist Utilities for projects which work with HAST nodes too.

Related HTML Utilities

AST

Root

Root (Parent) houses all nodes.

interface Root <: Parent {
  type: "root";
}

Element

Element (Parent) represents an HTML Element. For example, a div. HAST Elements correspond to the HTML Element interface.

interface Element <: Parent {
  type: "element";
  tagName: string;
  properties: Properties;
}

For example, the following HTML:

<a href="http://alpha.com" class="bravo" download></a>

Yields:

{
  "type": "element",
  "tagName": "a",
  "properties": {
    "href": "http://alpha.com",
    "className": ["bravo"],
    "download": true
  },
  "children": []
}
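
As a rough illustration, the `a` element from the HTML above could be written as a typed object in TypeScript. The type names `HastElement`, `HastText`, `HastProperties`, and `PropertyValue`, and the helper `isElement`, are assumptions made for this sketch and are not defined by this document.

```typescript
// Hypothetical TypeScript mirror of the Element and Properties interfaces.
type PropertyValue =
  | string
  | number
  | boolean
  | Array<string | number>
  | null
  | undefined;

interface HastProperties {
  [name: string]: PropertyValue;
}

interface HastText {
  type: "text";
  value: string;
}

interface HastElement {
  type: "element";
  tagName: string;
  properties: HastProperties;
  children: Array<HastElement | HastText>;
}

// <a href="http://alpha.com" class="bravo" download></a>
const anchor: HastElement = {
  type: "element",
  tagName: "a",
  properties: {
    href: "http://alpha.com",
    className: ["bravo"],
    download: true,
  },
  children: [],
};

// Narrow any node-like object to an element by checking its `type` field.
function isElement(node: { type: string }): node is HastElement {
  return node.type === "element";
}
```

Checking the `type` field first is the usual way to decide which other fields (`tagName`, `properties`, `children`) can be read safely.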

Properties

A dictionary of property names to property values. Most virtual DOMs require a disambiguation between attributes and properties. HAST does not and defers this to compilers.

interface Properties {}

Property names

Property names are keys on properties objects and reflect HTML attribute names. Often, they have the same value as the corresponding HTML attribute name (for example, href is a property name reflecting the href attribute name). If the HTML attribute name contains one or more dashes, the HAST property name must be camel-cased (for example, ariaLabel is a property reflecting the aria-label attribute). If the HTML attribute name is a reserved ECMAScript keyword, a common alternative must be used. This is the case for class, which uses className in HAST (and DOM), and for, which uses htmlFor.

DOM uses other prefixes and suffixes too, for example, relList for HTML rel attributes. This does not occur in HAST.

When possible, HAST properties must be camel-cased if the HTML attribute name originates from multiple words. For example, the minlength HTML attribute is cased as minLength, and typemustmatch as typeMustMatch.
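
The naming rules above can be sketched as a small mapping function. This is a minimal illustration, not rehype's implementation: `toPropertyName`, `RESERVED`, and `MULTI_WORD` are hypothetical names, and the multi-word table holds only the two attributes mentioned in the text, since word boundaries in names such as minlength cannot be detected mechanically.

```typescript
// Reserved ECMAScript keywords and their replacements, as named in the text.
const RESERVED = new Map<string, string>([
  ["class", "className"],
  ["for", "htmlFor"],
]);

// Assumed lookup table for dash-less multi-word names (deliberately tiny).
const MULTI_WORD = new Map<string, string>([
  ["minlength", "minLength"],
  ["typemustmatch", "typeMustMatch"],
]);

function toPropertyName(attribute: string): string {
  // HTML attribute names are case-insensitive, so normalize first.
  const lower = attribute.toLowerCase();
  const known = RESERVED.get(lower) ?? MULTI_WORD.get(lower);
  if (known !== undefined) return known;
  // Dashed names: drop each dash and upper-case the letter that follows it.
  return lower.replace(/-([a-z])/g, (_match: string, letter: string) =>
    letter.toUpperCase()
  );
}
```

For example, `toPropertyName("aria-label")` gives `ariaLabel`, and `toPropertyName("href")` is returned unchanged.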

Property values

Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> contains a hidden (boolean) attribute, which is reflected in HAST as a hidden property set to true (boolean). Likewise, <input minlength="5"> contains a minlength (valid non-negative integer) attribute, which is reflected as a minLength property set to 5 (number).

In JSON, the property value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
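
A consumer might apply this rule with a helper like the one below. `hasProperty` is a hypothetical name for this sketch; it only covers the null and undefined cases described above.

```typescript
// Returns true only when the property exists and is neither null nor undefined.
function hasProperty(
  properties: Record<string, unknown>,
  name: string
): boolean {
  return (
    Object.prototype.hasOwnProperty.call(properties, name) &&
    // `!= null` is false for both null and undefined.
    properties[name] != null
  );
}
```

With this helper, `{ hidden: null }` and `{}` are indistinguishable when asking about `hidden`, which is the behavior the rule asks for.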

The DOM is strict in reflecting those properties; HAST is not. The DOM treats <div hidden=no></div> as having a true (boolean) value for the hidden attribute, and <img width="yes"> as having a 0 (number) value for the width attribute. In HAST, these should be reflected as 'no' and 'yes', respectively.

The reason for this is to allow plug-ins and utilities to inspect these values.
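
One way a parser could follow this lenient approach is to convert only values that are valid for the attribute's type and keep the raw string otherwise. The sketch below assumes the caller already knows the attribute's kind; `Kind` and `toPropertyValue` are hypothetical names, and the number branch handles non-negative integers only.

```typescript
type Kind = "boolean" | "number" | "string";

function toPropertyValue(
  kind: Kind,
  raw: string,
  attributeName: string
): string | number | boolean {
  if (kind === "boolean") {
    // In HTML, a boolean attribute is valid when its value is empty or
    // matches the attribute's own name, ignoring case.
    const value = raw.toLowerCase();
    return value === "" || value === attributeName.toLowerCase() ? true : raw;
  }
  if (kind === "number") {
    // Accept plain non-negative integers; anything else stays a string.
    return /^\d+$/.test(raw) ? Number(raw) : raw;
  }
  return raw;
}
```

Keeping the invalid value as a string, rather than coercing it as the DOM does, leaves the original input available for plug-ins to inspect or report.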

The DOM also specifies comma- and space-separated list attribute values. In HAST, these should be treated as ordered lists. For example, the class attribute of <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
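
Splitting such attribute values into ordered lists could look like the following sketch. Both function names are hypothetical. Dropping empty tokens in the comma-separated case is a simplification chosen here, not something this document prescribes.

```typescript
// ASCII whitespace as used by HTML: space, tab, line feed, form feed, carriage return.
const ASCII_WHITESPACE = /[ \t\n\f\r]+/;

// Split a space-separated value such as a class attribute, keeping order.
function parseSpaceSeparated(value: string): string[] {
  return value.split(ASCII_WHITESPACE).filter((token) => token !== "");
}

// Split a comma-separated value, trimming ASCII whitespace around each token.
function parseCommaSeparated(value: string): string[] {
  return value
    .split(",")
    .map((token) => token.replace(/^[ \t\n\f\r]+|[ \t\n\f\r]+$/g, ""))
    .filter((token) => token !== "");
}
```

For the example above, `parseSpaceSeparated("alpha bravo")` gives a two-item list with `alpha` first and `bravo` second.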

There’s no special format for style.

Doctype

Doctype (Node) defines the type of the document.

interface Doctype <: Node {
  type: "doctype";
  name: string;
  public: string?;
  system: string?;
}

For example, the following HTML:

<!DOCTYPE html>

Yields:

{
  "type": "doctype",
  "name": "html",
  "public": null,
  "system": null
}
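
To show how the three fields relate to the source syntax, here is a minimal sketch that turns a Doctype node back into a string. `HastDoctype` and `doctypeToString` are names assumed for this example, and identifiers are inserted without any quote escaping.

```typescript
interface HastDoctype {
  type: "doctype";
  name: string;
  public?: string | null;
  system?: string | null;
}

function doctypeToString(node: HastDoctype): string {
  let out = "<!DOCTYPE " + node.name;
  if (node.public != null) {
    // A public identifier may be followed by a system identifier.
    out += ' PUBLIC "' + node.public + '"';
    if (node.system != null) out += ' "' + node.system + '"';
  } else if (node.system != null) {
    // A system identifier on its own needs the SYSTEM keyword.
    out += ' SYSTEM "' + node.system + '"';
  }
  return out + ">";
}
```

Called with the node from the example above, where both `public` and `system` are null, this produces `<!DOCTYPE html>` again.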

Comment

Comment (Text) represents embedded information.

interface Comment <: Text {
  type: "comment";
}

For example, the following HTML:

<!--Charlie-->

Yields:

{
  "type": "comment",
  "value": "Charlie"
}

Text

TextNode (Text) represents everything that is text. Note that its type property is text, but it is different from the abstract Unist interface Text.

interface TextNode <: Text {
  type: "text";
}

For example, the following HTML:

<span>Foxtrot</span>

Yields:

{
  "type": "element",
  "tagName": "span",
  "properties": {},
  "children": [{
    "type": "text",
    "value": "Foxtrot"
  }]
}
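
Because text lives in child nodes rather than on the element itself, reading the text of an element means walking its children. The sketch below shows one way to do that; `TreeNode` and `collectText` are hypothetical names, and skipping comments is a choice made for this example.

```typescript
// Assumed union of the node shapes described in this document (Doctype omitted).
type TreeNode =
  | { type: "text"; value: string }
  | { type: "comment"; value: string }
  | {
      type: "element";
      tagName: string;
      properties: Record<string, unknown>;
      children: TreeNode[];
    }
  | { type: "root"; children: TreeNode[] };

// Concatenate the values of all text nodes below `node`, in document order.
function collectText(node: TreeNode): string {
  switch (node.type) {
    case "text":
      return node.value;
    case "comment":
      return ""; // comments are not part of the visible text
    default:
      // element and root are parents, so recurse into their children
      return node.children.map((child) => collectText(child)).join("");
  }
}

// <span>Foxtrot</span>
const span: TreeNode = {
  type: "element",
  tagName: "span",
  properties: {},
  children: [{ type: "text", value: "Foxtrot" }],
};
```

Calling `collectText(span)` returns the string `Foxtrot`.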
