
izyumidev / html2md-rs


HTML to Markdown Parser in Rust

Home Page: https://crates.io/crates/html2md-rs

License: MIT License

Rust 99.84% Just 0.16%
html html-to-markdown markdown rust

html2md-rs's Introduction

html2md-rs

Parses HTML and converts it to markdown.

Usage

use html2md_rs::to_md::from_html_to_md;

fn main() {
    let html = "<h1>Hello, World!</h1>";
    let md = from_html_to_md(html);
    assert_eq!(md, "# Hello, World!");
}

Markdown Convention

There are many markdown conventions/standards out there. This project references the CommonMark Spec.

Supported HTML tags

Check the supported HTML tags here. Unsupported HTML tags will be parsed as NodeType::Unknown(String).

License

This project is licensed under the MIT License - see the LICENSE file for details.

html2md-rs's People

Contributors

github-actions[bot], izyuumi, yutatokoi

Forkers

yutatokoi

html2md-rs's Issues

Relax `html2md-rs`?

I like your parser: it has very few dependencies and a solid design.
I am currently testing it for tp-note. The use case: when the clipboard contains HTML, a filter is needed to convert the input into CommonMark-compliant Markdown. The input may contain complete HTML documents or just snippets, hence my previous feature request.

Going through your Rustdoc, I discovered "panics if ...." notes. Fortunately, you offer safe variants of the panicking functions. To my knowledge, it is common practice that libraries should never panic. What is your use case?
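To illustrate the distinction raised here, the following is a minimal sketch of a panicking function next to its safe, `Result`-returning variant. All names (`ConvertError`, `try_heading_to_md`, `heading_to_md`) are hypothetical and are not the html2md-rs API; the toy converter only understands a single `<h1>` element.

```rust
/// Hypothetical error type for this sketch (not an html2md-rs type).
#[derive(Debug, PartialEq)]
enum ConvertError {
    Malformed(String),
}

/// Safe variant: reports malformed input through `Result` instead of panicking.
fn try_heading_to_md(html: &str) -> Result<String, ConvertError> {
    let inner = html
        .strip_prefix("<h1>")
        .and_then(|rest| rest.strip_suffix("</h1>"))
        .ok_or_else(|| ConvertError::Malformed(html.to_string()))?;
    Ok(format!("# {}", inner))
}

/// Panicking convenience wrapper around the safe variant.
fn heading_to_md(html: &str) -> String {
    try_heading_to_md(html).expect("malformed HTML")
}

fn main() {
    // Well-formed input works with either variant.
    println!("{}", heading_to_md("<h1>Hello, World!</h1>"));

    // A clipboard filter can fall back to the raw input instead of crashing.
    let input = "<h1>Hello";
    let output = try_heading_to_md(input).unwrap_or_else(|_| input.to_string());
    println!("{}", output);
}
```

With this shape, a caller such as a clipboard filter only ever needs the safe variant and decides for itself what to do on failure.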

Feature request:
Besides, my first tests indicate that your (safe) parser might be too strict for my use case. I would probably need a parser that also processes imperfect or incorrect input, even when the input is not perfectly valid HTML5.

Once you publish 0.7, I will test again how malformed the copied HTML in the clipboard can be.

feat: support for br tag

There is currently no support for <br />, which causes the UnknownNodeType error to be returned from the parser.
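As a rough sketch of what this support could look like, the example below renders a `<br />` node as a CommonMark hard line break (a backslash followed by a newline) and skips unknown tags instead of returning an error. The `Node` enum and `render` function are hypothetical stand-ins for illustration, not the crate's actual types.

```rust
/// Hypothetical node type for this sketch (not the html2md-rs `NodeType`).
enum Node {
    Text(String),
    Br,
    Unknown(String),
}

/// Renders nodes to Markdown. `Br` becomes a CommonMark hard line break,
/// and unknown tags are skipped with a warning instead of failing.
fn render(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Text(text) => out.push_str(text),
            // Backslash before a newline is a hard line break in CommonMark.
            Node::Br => out.push_str("\\\n"),
            Node::Unknown(tag) => eprintln!("skipping unsupported tag <{}>", tag),
        }
    }
    out
}

fn main() {
    let nodes = vec![
        Node::Text("first line".to_string()),
        Node::Br,
        Node::Text("second line".to_string()),
        Node::Unknown("blink".to_string()),
    ];
    println!("{}", render(&nodes));
}
```

Two trailing spaces before the newline would be an equally valid hard line break; the backslash form is used here only because it survives whitespace trimming.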

Incorrect parsing: "Missing quotation mark at around index 0"

Valid HTML

<meta http-equiv="content-type" content="text/html; charset=utf-8"><span><a href="https://search.nixos.org/packages?channel=unstable&amp;from=0&amp;size=50&amp;sort=relevance&amp;type=packages&amp;query=tpnote">tpnote</a></span><div>Markup enhanced granular note-taking</div><ul><li>Name: <code class="package-name">tpnote</code></li><li>Version: <strong>1.23.10</strong></li><li><a href="https://blog.getreu.net/projects/tp-note/" target="_blank">🌐 Homepage</a></li><li><a href="https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/by-name/tp/tpnote/package.nix#L53" target="_blank">📦 Source</a></li><li>License: <a href="https://spdx.org/licenses/MIT.html" target="_blank">MIT License</a></li></ul>

Incorrect parsing

html2md-rs: Malformed attribute: http-equiv="content-type" content="text/html; charset=utf-8" - Missing quotation mark at around index 0

Incorrect Error: Malformed attribute

input

Create with:

curl https://askubuntu.com/questions/189640/how-to-find-architecture-of-my-pc-and-ubuntu -o test.txt

The file:

test.txt

Incorrect Error

Malformed attribute: id="search" role="search" action=/search class="s-topbar--searchbar js-searchbar " autocomplete="off" - Missing quotation mark at around index 13951

Malformed attribute

Malformed attribute: property="og:type" content= "website" - Missing attribute name at around index 938
test.txt
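
The attribute strings reported in these issues share three traits that browsers accept: several quoted attributes in a row, an unquoted value (`action=/search`), and whitespace after `=` (`content= "website"`). The sketch below shows one way a lenient attribute scanner could handle all three. The function `parse_attributes` is hypothetical and written for illustration; it is not the parser used by html2md-rs.

```rust
/// Hypothetical lenient attribute scanner (not part of html2md-rs).
/// Accepts double-quoted, single-quoted, and unquoted values, tolerates
/// whitespace around `=`, and gives bare names an empty value.
fn parse_attributes(input: &str) -> Vec<(String, String)> {
    let chars: Vec<char> = input.chars().collect();
    let mut attrs = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        // Skip whitespace between attributes.
        while i < chars.len() && chars[i].is_whitespace() {
            i += 1;
        }
        if i >= chars.len() {
            break;
        }

        // Read the attribute name up to whitespace or `=`.
        let start = i;
        while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '=' {
            i += 1;
        }
        let name: String = chars[start..i].iter().collect();

        // Skip whitespace before a possible `=`.
        while i < chars.len() && chars[i].is_whitespace() {
            i += 1;
        }

        let mut value = String::new();
        if i < chars.len() && chars[i] == '=' {
            i += 1;
            // Tolerate whitespace after `=`, as in `content= "website"`.
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            if i < chars.len() && (chars[i] == '"' || chars[i] == '\'') {
                // Quoted value: read up to the matching quote.
                let quote = chars[i];
                i += 1;
                let value_start = i;
                while i < chars.len() && chars[i] != quote {
                    i += 1;
                }
                value = chars[value_start..i].iter().collect();
                if i < chars.len() {
                    i += 1; // Skip the closing quote.
                }
            } else {
                // Unquoted value: read up to the next whitespace.
                let value_start = i;
                while i < chars.len() && !chars[i].is_whitespace() {
                    i += 1;
                }
                value = chars[value_start..i].iter().collect();
            }
        }

        if !name.is_empty() {
            attrs.push((name, value));
        }
    }
    attrs
}

fn main() {
    let input = r#"id="search" action=/search content= "website""#;
    for (name, value) in parse_attributes(input) {
        println!("{} => {:?}", name, value);
    }
}
```

A scanner like this never fails: an unterminated quote simply runs to the end of the input, which matches the "relaxed" behavior requested above at the cost of not reporting malformed markup.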
