zeed-dom's Introduction

🌱 zeed-dom

  • Lightweight virtual / offline DOM (Document Object Model)
  • Great to use in node or exporting to plain strings
  • Written in TypeScript
  • Generates HTML and XML
  • Parses HTML
  • Supports some CSS selectors and queries
  • JSX compatible
  • Easy content manipulation (e.g. through element.handle helper)
  • Pretty print HTML (tidyDOM)

Does not aim for completeness!

Get started

npm i zeed-dom

Related projects

  • zeed - Foundation library
  • zerva - Event driven server
  • hostic - Static site generator

Used by TipTap in its html-package.

Utils

Manipulation

Pass in an HTML string, query and modify the parsed document in the callback, and get the resulting HTML back. Useful for post-processing.

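import { handleHTML } from 'zeed-dom'

// `html` is the input HTML string to post-process
const newHTML = handleHTML(html, (document) => {
  // Copy the image source into its title attribute, if the image exists
  const img = document.querySelector('.img-wrapper img')
  if (img)
    img.setAttribute('title', img.getAttribute('src'))
})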

Serialization

Take any HTML node or document and serialize it to some other format:

  • serializePlaintext(node): Readable and searchable plain text
  • serializeMarkdown(node): Simple Markdown
  • serializeSafeHTML(node) or safeHTML(htmlString): Just allow some basic tags and attributes

Example

A simple example without JSX:

import { h, xml } from 'zeed-dom'

const dom = h(
  'ol',
  {
    class: 'projects',
  },
  [
    h('li', null, 'zeed ', h('img', { src: 'logo.png' })),
    h('li', null, 'zeed-dom'),
  ]
)

console.log(dom.render())
// Output: <ol class="projects"><li>zeed <img src="logo.png"></li><li>zeed-dom</li></ol>

console.log(dom.render(xml))
// Output: <ol class="projects"><li>zeed <img src="logo.png" /></li><li>zeed-dom</li></ol>

And this one with JSX:

import { h } from "zeed-dom"

let dom = (
  <ol className="projects">
    <li>zeed</li>
    <li>zeed-dom</li>
  </ol>
)

let projects = dom
  .querySelectorAll("li")
  .map((e) => e.textContent)
  .join(", ")

console.log(projects)
// Output: zeed, zeed-dom

dom.handle("li", (e) => {
  if (!e.textContent.endsWith("-dom")) {
    e.remove()
  } else {
    e.innerHTML = "<b>zeed-dom</b> - great DOM helper for static content"
  }
})

console.log(dom.render())
// Output: <ol class="projects"><li><b>zeed-dom</b> - great DOM helper for static content</li></ol>

The second example shows the manipulation helper .handle(selector, fn) in action. It also shows that HTML parsing works seamlessly: the string assigned to innerHTML is parsed into nodes. You can also parse HTML directly:

import { tidyDOM, vdom } from 'zeed-dom'

const dom = vdom('<div>Hello World</div>')
tidyDOM(dom)
console.log(dom.render())
// Output is pretty printed like: <div>
//   Hello World
// </div>

These examples are available at /example.

JSX

Usually JSX is optimized for React, i.e. it expects React.createElement to exist and to be the factory for generating the nodes. You can of course get the same effect here if you set up a helper like this:

import { html } from 'zeed-dom'

const React = {
  createElement: html,
}

But more common is the use of h as the factory function. Here is how you can set up this behavior for various environments:

If you get JSX-related error messages in your TypeScript project, try installing the React types: npm install -D @types/react.

TypeScript

In tsconfig.json:

{
  "compilerOptions": {
    "jsx": "react",
    "jsxFactory": "h"
  }
}

To avoid type checking issues you should add this to your shims.d.ts:

// https://www.typescriptlang.org/docs/handbook/jsx.html#intrinsic-elements
declare namespace JSX {
  interface IntrinsicElements {
    [elemName: string]: any
  }
}
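
With the settings above, the compiler rewrites each JSX tag into a call to the configured factory. The sketch below writes out that rewrite by hand. It uses miniH, a hypothetical string-building stand-in made up for this illustration, not zeed-dom's own h, so it only shows the shape of the generated calls.

```typescript
// Hypothetical stand-in for a JSX factory; it builds plain HTML strings.
type Props = Record<string, string> | null

function miniH(tag: string, props: Props, ...children: string[]): string {
  const attrs = Object.entries(props ?? {})
    .map(([name, value]) => ` ${name}="${value}"`)
    .join('')
  return `<${tag}${attrs}>${children.join('')}</${tag}>`
}

// With "jsx": "react" and "jsxFactory": "h", markup such as
//   <ol class="projects"><li>zeed</li><li>zeed-dom</li></ol>
// is compiled into nested factory calls of this shape (here with miniH):
const rendered = miniH(
  'ol',
  { class: 'projects' },
  miniH('li', null, 'zeed'),
  miniH('li', null, 'zeed-dom'),
)

console.log(rendered)
```

The factory receives the tag name, the attributes (or null), and the already-built children, which is why any function with that signature can serve as jsxFactory.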

esbuild

In options:

{
  jsxFactory: 'h'
}

Or alternatively as command line option: --jsx-factory=h

Browser DOM

The JSX factory can also be used to directly create HTML DOM nodes in the browser. Just create the h function and let it use the browser's document object:

import { hFactory } from 'zeed-dom'

// `document` is the browser's global document object
export const h = hFactory({ document })

Performance

The parser isn't doing too badly, according to the benchmarks of htmlparser-benchmark ;)

tl                 : 1.02699 ms/file ± 0.679139
htmlparser2        : 1.98505 ms/file ± 2.94434
node-html-parser   : 2.24176 ms/file ± 1.52112
neutron-html5parser: 2.36648 ms/file ± 1.38879
html5parser        : 2.39891 ms/file ± 2.83056
htmlparser2-dom    : 2.57523 ms/file ± 3.35587
html-dom-parser    : 2.84910 ms/file ± 3.61615
libxmljs           : 3.81665 ms/file ± 2.79295
zeed-dom           : 5.05130 ms/file ± 3.57184
htmljs-parser      : 5.58557 ms/file ± 6.47597
parse5             : 9.07862 ms/file ± 6.50856
htmlparser         : 21.2274 ms/file ± 150.951
html-parser        : 30.9104 ms/file ± 24.3930
saxes              : 49.5906 ms/file ± 141.194
html5              : 114.771 ms/file ± 148.345

Misc

  • To set namespace colons in JSX use double underscore i.e. <xhtml__link /> becomes <xhtml:link />
  • To allow CDATA use the helper function e.g. <div>{ CDATA(yourRawData) }</div>
  • style attributes can handle objects e.g. <span style={{backgroundColor: 'red'}} /> becomes <span style="background-color: red" />
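
The name and style conversions listed above can be pictured as two small string transformations. This is a minimal sketch under assumptions: jsxNameToXml and styleToString are hypothetical helpers written for illustration, and zeed-dom's internal implementation may differ.

```typescript
// Hypothetical helpers; not taken from zeed-dom's source.

// Turn the JSX-safe double underscore into a namespace colon.
function jsxNameToXml(name: string): string {
  return name.replace(/__/g, ':')
}

// Turn a style object with camelCase keys into a CSS declaration string.
function styleToString(style: Record<string, string | number>): string {
  return Object.entries(style)
    .map(([prop, value]) => {
      const kebab = prop.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`)
      return `${kebab}: ${value}`
    })
    .join('; ')
}

console.log(jsxNameToXml('xhtml__link'))
// Output: xhtml:link

console.log(styleToString({ backgroundColor: 'red', fontSize: '12px' }))
// Output: background-color: red; font-size: 12px
```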

zeed-dom's Issues

Descendant CSS selector type not supported

Hi! Wanted to cross-reference a bug I'm investigating in Tiptap: ueberdosis/tiptap#4089

Currently when parsing the following HTML:

<div data-youtube-video>
        <iframe src="https://www.youtube.com/watch?v=cqHqLQgVCgY"></iframe>
</div>

It results in:

Unknown CSS selector type descendant div[data-youtube-video] iframe [
  { type: 'tag', name: 'div', namespace: null },
  {
    type: 'attribute',
    name: 'data-youtube-video',
    action: 'exists',
    value: '',
    namespace: null,
    ignoreCase: null
  },
  { type: 'descendant' },
  { type: 'tag', name: 'iframe', namespace: null }
]

Looks like descendant support is in the code but commented out, either:

  1. Here:
    // } else if (type === 'descendant') {
  2. or here:
    // else if (type === 'descendant') {

I'll keep digging and hopefully open a PR if I can manage to fix it.
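
For reference, matching a descendant selector over a token list like the one in the error output can be sketched as follows. The MiniNode and Token types and the matches function are hypothetical and modeled only on the dump above; this is not zeed-dom's code and not necessarily how a fix would look there.

```typescript
// Hypothetical minimal node and token types, modeled on the token dump above.
interface MiniNode {
  tag: string
  attrs: Record<string, string>
  children: MiniNode[]
  parent?: MiniNode
}

type Token =
  | { type: 'tag'; name: string }
  | { type: 'attribute'; name: string; action: 'exists' }
  | { type: 'descendant' }

type SimpleToken = Exclude<Token, { type: 'descendant' }>

// Check one compound selector (e.g. div[data-youtube-video]) against one node.
function matchesCompound(node: MiniNode, tokens: SimpleToken[]): boolean {
  return tokens.every((t) =>
    t.type === 'tag' ? node.tag === t.name : t.name in node.attrs,
  )
}

// Split the tokens at each 'descendant' token, then match right to left:
// the node itself must match the last compound, and its ancestors must
// match the remaining compounds in order, nearest ancestor first.
function matches(node: MiniNode, tokens: Token[]): boolean {
  const compounds: SimpleToken[][] = [[]]
  for (const t of tokens) {
    if (t.type === 'descendant') compounds.push([])
    else compounds[compounds.length - 1].push(t)
  }

  let index = compounds.length - 1
  if (!matchesCompound(node, compounds[index])) return false
  index--

  let current = node.parent
  while (index >= 0 && current) {
    if (matchesCompound(current, compounds[index])) index--
    current = current.parent
  }
  return index < 0
}

// The structure from the report: an iframe inside div[data-youtube-video].
const iframe: MiniNode = { tag: 'iframe', attrs: { src: 'video' }, children: [] }
const div: MiniNode = { tag: 'div', attrs: { 'data-youtube-video': '' }, children: [iframe] }
iframe.parent = div

const selector: Token[] = [
  { type: 'tag', name: 'div' },
  { type: 'attribute', name: 'data-youtube-video', action: 'exists' },
  { type: 'descendant' },
  { type: 'tag', name: 'iframe' },
]

console.log(matches(iframe, selector)) // true
console.log(matches(div, selector)) // false
```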

The dist contains nullish coalescing operator.

The dist contains nullish coalescing operator:

parseAttributes(tagName, input) {
    const attrs = {};
    input.replace(
      this.attrRe,
      (attr, name, c2, value, c4, valueInQuote, c6, valueInSingleQuote) => {
        attrs[name] = valueInSingleQuote ?? valueInQuote ?? value ?? true;
      }
    );
    return attrs;
  }

Older browsers that do not support the ?? operator cannot run this code.
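
For illustration, the same selection logic can be written without the ?? operator. The sketch below uses two hypothetical helpers, firstDefined and pickAttrValue, that are not part of zeed-dom. It deliberately avoids ||, because || would also skip empty strings, which ?? does not.

```typescript
// Hypothetical helper: first value that is neither null nor undefined,
// written without the ?? operator.
function firstDefined<T>(...values: Array<T | null | undefined>): T | undefined {
  for (const value of values) {
    if (value !== null && value !== undefined) return value
  }
  return undefined
}

// Same selection order as in parseAttributes above; falls back to true
// for attributes that have no value at all.
function pickAttrValue(
  valueInSingleQuote?: string,
  valueInQuote?: string,
  value?: string,
): string | boolean {
  const picked = firstDefined(valueInSingleQuote, valueInQuote, value)
  return picked === undefined ? true : picked
}

console.log(pickAttrValue(undefined, undefined, 'a')) // a
console.log(pickAttrValue(undefined, '', 'a') === '') // true (empty string is kept)
console.log(pickAttrValue()) // true
```

Alternatively, a build step with a language target below ES2020 (for example the TypeScript compiler's target option) rewrites ?? into equivalent older syntax.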

Named export decode not found in he commonjs module

Hi, is there a way to resolve the following?

Zeed-dom 0.9.23 (as a tiptap dependency)
WSL on Windows 10
Node 16.14

file:///home/gustojs/playby-nuxt3/node_modules/zeed-dom/dist/index.js:22
import { decode } from "he";
         ^^^^^^
SyntaxError: Named export 'decode' not found. The requested module 'he' is a CommonJS module, which may not support all module.exports as named exports.
CommonJS modules can always be imported via the default export, for example using:

import pkg from 'he';
const { decode } = pkg;

Thanks!

`he` is embedded instead of part of the dependencies

Hi there!

I stumbled upon this package because Tiptap uses it for its HTML parsing. While it works great for other use cases in my app, I noticed that the he package has been embedded as per 2079834.

Initially requested in #3, the he package was already a bit outdated when it was added. zeed-dom is the only package using he (even when it's embedded) while all other packages are using the entities package instead (lots more downloads and active development too).

A quick glance at the benchmarks on entities' own README shows he being quite slow compared to them, and according to bundlephobia, he is also a bit larger in size.

Is there a reason for going with a) an embedded version of the package and b) not with a different one instead?

Thanks in advance!

Incorrect and slow parsing of base64 encoded CSS property values

When running generateJSON from @tiptap/html we noticed a huge performance problem on docs with inline styles on elements containing base64-encoded images.

The problem first pointed to prosemirror-model, but after a change there the problem moved to zeed-dom. Looking at the code, it seems to be almost identical (incorrect regex parsing of style declarations). See the ProseMirror issue.

The changes introduced in ProseMirror/prosemirror-model@899a98e also exposed other problems in the style getter. It should expose an object with the CSSStyleDeclaration interface, but it is a plain object, which is now incompatible with prosemirror-model.

Because of the dependencies between Tiptap, ProseMirror and zeed-dom, fixing this seems quite important for the compatibility of future releases.
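
To illustrate why base64 values are tricky: a data URI such as url(data:image/png;base64,...) contains both ':' and ';', so naively splitting on those characters breaks the declaration. The sketch below shows one possible single-pass approach. parseStyle is a hypothetical helper written for this illustration, not zeed-dom's or prosemirror-model's code, and it does not handle escaped quotes.

```typescript
// Hypothetical helper: parse an inline style string in one linear pass,
// ignoring ';' inside quotes or parentheses. The first ':' of each
// declaration separates the property name from its value.
function parseStyle(style: string): Record<string, string> {
  const result: Record<string, string> = {}
  let depth = 0 // nesting level of ( ... )
  let quote = '' // active quote character, or '' if none
  let start = 0 // start index of the current declaration

  const commit = (end: number): void => {
    const decl = style.slice(start, end)
    const colon = decl.indexOf(':')
    if (colon > 0) {
      const name = decl.slice(0, colon).trim()
      const value = decl.slice(colon + 1).trim()
      if (name && value) result[name] = value
    }
  }

  for (let i = 0; i < style.length; i++) {
    const ch = style[i]
    if (quote) {
      if (ch === quote) quote = ''
    } else if (ch === '"' || ch === "'") {
      quote = ch
    } else if (ch === '(') {
      depth++
    } else if (ch === ')') {
      if (depth > 0) depth--
    } else if (ch === ';' && depth === 0) {
      commit(i)
      start = i + 1
    }
  }
  commit(style.length)
  return result
}

const parsed = parseStyle(
  'color: red; background: url(data:image/png;base64,AAAA==); width: 10px',
)
console.log(parsed.background)
// Output: url(data:image/png;base64,AAAA==)
```

Each character is visited once, so the running time grows linearly with the length of the style string, regardless of how large the embedded image is.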

HTML entities are not being properly decoded

Seems like encoded html entities are not handled properly.
See this example: https://runkit.com/embed/9n1c04lodkqm
Providing the following html:

<p>Let&#x27;s go</p>

I expect the following output when calling parseHTML(html).textContent:

Let's go

However, zeed-dom outputs the entities undecoded:

Let&#x27;s go

I can see there is a small list of encoded/decoded entities in encoding.ts, but all other entities are being ignored.
Seems like there used to be a check to optionally load he if available. I understand this could cause issues, however the current behavior is not ideal either. Can this be improved in any way?

Thanks!
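
As a point of reference for the report above, numeric character references such as &#x27; can be decoded without a lookup table, since they carry their code point directly. The sketch below is a hypothetical helper, decodeNumericEntities, written for illustration; it is not part of zeed-dom and leaves named entities such as &amp; untouched.

```typescript
// Hypothetical helper: decode decimal (&#39;) and hexadecimal (&#x27;)
// character references; everything else is returned unchanged.
function decodeNumericEntities(text: string): string {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10)
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match
  })
}

console.log(decodeNumericEntities('<p>Let&#x27;s go</p>'))
// Output: <p>Let's go</p>
```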

zeed dom is collecting all JSX elements it ever used in `USED_JSX`

While digging into this issue with tiptap, my colleague @jcarvalho and I found the global variable USED_JSX to be the issue. Even serializing a simple document to HTML in a loop would quickly become slower and slower until it takes over a second to finish.

Memory was not an issue in our tests, since the contents of USED_JSX were just the same elements over and over again, so Node itself could easily optimize that.

tiptap issue #2148 lists an example of how to make this happen. It would be great if you could give an estimate of how easily this might be fixable. I tried switching it from a global variable to a parameter yesterday, but there are so many places the parameter would need to be passed to, some of them getters or setters, that it wasn't achievable.
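
The report does not show how USED_JSX is structured. Assuming, purely for illustration, a module-level array that receives an entry on every element creation, the sketch below shows how such a registry grows with every call, while a Set keyed by tag name stays at the number of distinct tags. All names here are hypothetical.

```typescript
// Hypothetical registries; the real USED_JSX in zeed-dom may look different.
const usedAsArray: string[] = []
const usedAsSet = new Set<string>()

function createElementTracked(tag: string): void {
  usedAsArray.push(tag) // grows by one entry per call
  usedAsSet.add(tag) // grows only when a new tag name appears
}

// Simulate serializing the same small document many times.
for (let i = 0; i < 1000; i++) {
  createElementTracked('p')
  createElementTracked('strong')
}

console.log(usedAsArray.length) // 2000
console.log(usedAsSet.size) // 2
```

Any code that later iterates over the array does work proportional to the total number of calls so far, which would match the observation that repeated serialization gets slower over time.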

Broken example URL in README

The example URL in the readme leads to a 404:

These examples are available at [github.com/holtwick/zeed-dom-example](https://github.com/holtwick/zeed-dom-example).
