breakdance / breakdance

It's time for your markup to get down! HTML to markdown converter. Breakdance is highly pluggable, flexible and easy to use.

Home Page: https://breakdance.github.io/breakdance/

License: MIT License

JavaScript 70.45% CSS 24.47% HTML 5.08%
markdown html convert parse compile render to-markdown html-to-markdown converter markup

breakdance's Issues

Is there any way to get outerHTML of the node?

Hi,
I want to render some nodes as HTML, i.e. leave them as they are. Is that possible?

    const Breakdance = require('breakdance');
    const page = `
    <figure>
          <img src="/media-narrow.png" width="800" height="400" alt="A stacked component, image on the top, text below" />
      <figcaption>The component as a single column</figcaption>
    </figure>
    `;
    const breakdance = new Breakdance();
    breakdance.before(['figure'], function(node) {
      this.emit(`I DONT KNOW HOW TO PUT NODE INNER HTML HERE`, node);
    });
    const md = breakdance.render(page);

I also think it would be nice to transform this into:

    ![A stacked component, image on the top, text below][The component as a single column]
    [The component as a single column]: /media-narrow.png "A stacked component, image on the top, text below"

but I'm not sure how to solve this.
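Independently of breakdance's handler API (the shape of `node` is not shown in the issue), the proposed transformation can be sketched as a plain function. `figureToMarkdown` and its `{ src, alt, caption }` input are hypothetical; extracting those values from the `<figure>` is left out. One deviation from the layout above: a blank line separates the image from the definition, because in CommonMark a link reference definition cannot interrupt a paragraph.

```javascript
// Hypothetical helper: builds the reference-style image proposed above from
// the pieces of a <figure>. It does not use breakdance's API.
function escapeTitle(text) {
  // Backslash-escape double quotes so they cannot end the link title early.
  return text.replace(/"/g, '\\"');
}

function figureToMarkdown({ src, alt, caption }) {
  // The figcaption text is used as the reference label.
  const image = `![${alt}][${caption}]`;
  // The alt text doubles as the link title in the definition.
  const definition = `[${caption}]: ${src} "${escapeTitle(alt)}"`;
  // Blank line required: a definition cannot interrupt a paragraph.
  return `${image}\n\n${definition}`;
}

console.log(figureToMarkdown({
  src: '/media-narrow.png',
  alt: 'A stacked component, image on the top, text below',
  caption: 'The component as a single column',
}));
```

Square brackets inside the caption are not escaped here, so a caption containing `[` or `]` would need extra handling.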

Breakdance in browser

I was looking to improve the HTML to Markdown conversion in a web
application I have, but after an npm install of breakdance, importing
the module causes "TypeError: fs.readdirSync is not a function".

My suspicion is that this happens because the breakdance CLI is part of
the module, even though it isn't needed when calling the
breakdance(html_text) function in a browser context.

breakdance dot io is no longer your domain

Some of your docs still link to breakdance dot io, but the domain currently shows a for-sale page. If the project is still alive, you should re-point all of those links to the GitHub Pages version.

<a> with <img> inside returns an empty string

The following HTML code:

<a href="/"><img src="/image.jpg" alt="Alt text" /></a>

gets converted to an empty string.

I wrote a simple failing test:

it('should convert an anchor with img tag inside to markdown', function () {
  isEqual.inline('<a href="/"><img src="/image.jpg" alt="Alt text" /></a>', '[![Alt text](/image.jpg)](/)');
});
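The expected output in the test nests the image markdown inside the link text. A minimal sketch of that nesting, using hypothetical helper names that are not part of breakdance:

```javascript
// Hypothetical helpers showing the nesting the failing test expects:
// the image markdown becomes the text of the surrounding link.
function imageMarkdown(src, alt = '') {
  return `![${alt}](${src})`;
}

function linkedImageMarkdown({ href, src, alt }) {
  return `[${imageMarkdown(src, alt)}](${href})`;
}

console.log(linkedImageMarkdown({ href: '/', src: '/image.jpg', alt: 'Alt text' }));
// prints [![Alt text](/image.jpg)](/)
```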

Table conversion fails when cells contain <h> tags

Breakdance is unable to convert tables that contain text with <h> heading tags.
Example:

<table>
  <tr>
    <th><h3>Firstname</h3></th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>
</table>

(this converts to markdown that looks identical to the above input)

This behavior seems partially correct, since <h> tags are not supported inside markdown tables.
However, that alone does not justify failing the whole conversion. Table cells can contain arbitrary text such as #%&@#&!, so any unsupported or unrecognizable syntax should likewise be treated as text.
Therefore the expected output should be:

| <h3>Firstname</h3> | Lastname | Age |
| --- | --- | --- |

The problem is made worse by the fact that Breakdance currently leaves the whole <table> HTML in the markdown output, so the result is no longer usable as markdown. At a minimum, any bad syntax could be removed or replaced with a warning like:
[Unsupported HTML syntax - cannot be converted to markdown]

BTW, thanks Breakdance developers for your great contributions!
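The "treat unsupported syntax as text" idea can be sketched as a cell formatter that keeps a cell's inner HTML verbatim instead of abandoning the table. `cellText` and `headerRows` are hypothetical names, not breakdance functions; they assume the cell's inner HTML has already been extracted.

```javascript
// Hypothetical helpers: keep a cell's inner HTML as literal text.
function cellText(innerHtml) {
  return innerHtml
    .replace(/\s+/g, ' ')   // a GFM table row must stay on one line
    .trim()
    .replace(/\|/g, '\\|'); // escape pipes so they don't split the cell
}

// Builds a header row plus the delimiter row GFM needs to recognize a table.
function headerRows(cells) {
  const header = '| ' + cells.map(cellText).join(' | ') + ' |';
  const delimiter = '| ' + cells.map(() => '---').join(' | ') + ' |';
  return `${header}\n${delimiter}`;
}

console.log(headerRows(['<h3>Firstname</h3>', 'Lastname', 'Age']));
```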

[email protected] crashes breakdance

Hi, I'm using [email protected] (latest) and this is the error:

> var breakdance = require('breakdance');
> breakdance('<strong>The freaks come out at night!</strong>')
Error: expected node to be an instance of Node
    at assert (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/node_modules/snapdragon-util/index.js:1018:19)
    at Object.utils.isOpen (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/node_modules/snapdragon-util/index.js:584:3)
    at Compiler.visit (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:187:14)
    at Compiler.compiler.visit (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/lib/compiler.js:66:22)
    at Compiler.mapVisit (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:229:23)
    at Compiler.compile (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:261:12)
    at Breakdance.compile (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:348:24)
    at Breakdance.render (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:373:18)
    at Breakdance (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:26:25)
❯ node --version
v9.11.1
❯ npm --version
5.8.0

I only get the error with [email protected]; previous versions (0.11.3 and lower) work fine.

IMG loses width, height

Parsing something like this:

<img src="url1" width="394" height="106" />

Produces output like this:

![](url1)

While it is unfortunate that markdown has no way to specify width and height, I think it would be more useful to get output like this (at least optionally):

<img src="url1" width="394" height="106" />

(Which is to say, the input).

I hit upon this while working with tools that round-trip between HTML and markdown. This was the simplest common case I found that is "lossy".

(Already tried: { omit: ['img'] })
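The optional behavior requested here could look like the sketch below: emit markdown when the image has no size, and fall back to raw HTML when it does. `imgToMarkdown` is hypothetical and is not an existing breakdance option; it assumes the `<img>` attributes have already been read into an object.

```javascript
// Escape characters that would break a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Hypothetical helper: keep width/height by falling back to inline HTML.
function imgToMarkdown({ src, alt, width, height }) {
  if (width === undefined && height === undefined) {
    return `![${alt ?? ''}](${src})`;
  }
  // Object.entries keeps insertion order: src, alt, width, height.
  const attrs = Object.entries({ src, alt, width, height })
    .filter(([, value]) => value !== undefined)
    .map(([name, value]) => `${name}="${escapeAttr(value)}"`)
    .join(' ');
  return `<img ${attrs} />`;
}

console.log(imgToMarkdown({ src: 'url1', width: '394', height: '106' }));
// prints <img src="url1" width="394" height="106" />
```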

Preferences & Element Studio options disappeared.

Under builder settings there used to be two more options that disappeared for all my sites.

Preferences & Element Studio

I need to upload a custom font, which used to be under Preferences. Is there any other way to upload a custom font?

<br> tags in output

How do I avoid ending up with <br> tags in the output?

For example, if the input is: <i>italics</i> <br/> <h1> header 1 </h1>

The output is:

 _italics_ <br>

# header 1

The <br/> became a <br> (great!) but then isn't converted to markdown (not great).
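Until breakdance converts `<br>` itself, one workaround is a post-processing pass over the markdown. The sketch below (hypothetical `brToHardBreak`, not a breakdance API) replaces each `<br>` with a CommonMark hard line break, i.e. two trailing spaces plus a newline, and consumes at most one following newline so `<br>\n` does not turn into a paragraph break. Caveats: a hard break at the very end of a paragraph is ignored by CommonMark, and this regex would also rewrite `<br>` inside code spans or code blocks.

```javascript
// Hypothetical post-processing step: leftover <br>, <br/> or <br />
// become a CommonMark hard line break ("  \n").
function brToHardBreak(markdown) {
  return markdown.replace(/[ \t]*<br\s*\/?>[ \t]*(?:\r?\n)?/gi, '  \n');
}

console.log(JSON.stringify(brToHardBreak(' _italics_ <br>\n\n# header 1')));
```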

Links do not take <base ...> into account

Hi and thanks for a great lib!

Given an html document like this:

<!doctype html>
<html>
  <head>
    <base href="/pages/">
  </head>
  <body>
    <a href="page2.html">Hi</a>
  </body>
</html>

The actual location for page2.html is /pages/page2.html because there is a <base> element which sets the base url for all relative urls.

But when I compile it to markdown with Breakdance it yields:

[Hi](page2.html)

When it should instead be:

[Hi](/pages/page2.html)

I had a hard time tracing the domain option to the breakdance-util package, and because it depends on state and options from the compiler, I haven't figured out a good way to solve it.

My quick fix is to use cheerio myself like this:

const $ = cheerio.load('...the html above...');
breakdance($.html(), {domain: url.resolve(myUrl, $('base').first().attr('href'))})

This works but it would be better if Breakdance did support the <base> element, which I think it should.

What do you think?
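A different workaround from the `domain` one above is to rebase the link targets after conversion. This sketch (hypothetical `rebaseLinks`) uses Node's legacy `url.resolve()`, which, unlike the WHATWG `URL` constructor, accepts a relative base such as `/pages/`. The base value would come from `$('base').first().attr('href')` as in the cheerio snippet. It only handles bare targets like `[text](href)`; links with a title are left unchanged.

```javascript
const url = require('url');

// Hypothetical post-processing step: resolve inline link/image targets in
// the markdown against the document's <base href>.
function rebaseLinks(markdown, baseHref) {
  return markdown.replace(/\]\(([^)\s]+)\)/g, (match, href) =>
    `](${url.resolve(baseHref, href)})`);
}

console.log(rebaseLinks('[Hi](page2.html)', '/pages/'));
// prints [Hi](/pages/page2.html)
```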

<b>, <br> and <i> tags

Hello, thanks for the awesome project!

console.log(breakdance('<b>fail</b><br><i>fail</i><br><strong>success</strong>'))
<b>fail</b><br>
<i>fail</i><br>
**success**

As I found while playing with the markup for this issue, <b>, <i> and <br> are passed through as-is instead of being converted to markdown (**bold**, *italic* and a line break). It would be great if this behavior were documented in breakdance.

I do need these tags converted. Of course I can write my own regexps or tune the handlers, and I'll do so, but I think there should be an option key that changes this behavior.

Thanks for your work!
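The regexp workaround mentioned above could look like this minimal sketch, run on breakdance's output. `convertLegacyInline` is hypothetical; it handles only simple, non-nested `<b>`/`<i>` tags (optionally with attributes) and is not an HTML parser. The patterns are written so they don't match `<br>`, `<blockquote>` or `<img>`.

```javascript
// Hypothetical post-processing step: <b>/<i> left in the output become
// markdown strong/emphasis.
function convertLegacyInline(markdown) {
  return markdown
    .replace(/<b(?:\s[^>]*)?>([\s\S]*?)<\/b>/gi, '**$1**')
    .replace(/<i(?:\s[^>]*)?>([\s\S]*?)<\/i>/gi, '_$1_');
}

console.log(convertLegacyInline('<b>fail</b><br>\n<i>fail</i><br>\n**success**'));
```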

Table conversion fails for tables without headers and for cells that contain <p> elements

Cells that contain <p>

Input:

<table>
    <thead>
        <tr>
            <th>Heading 1</th>
            <th>Heading 2</th>
            <th>Heading 3</th>
        </tr>
    </thead>
    <tbody>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    <tr>
        <td>Table cell</td>
        <td>
            <p>p #1</p>
            <p>p #2</p>
        </td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    </tbody>
</table>

Output:

| Heading 1 | Heading 2 | Heading 3 |
| --- | --- | --- |
| Table cell | Table cell | Table cell | Table cell |
| Table cell |

p #1

p #2  | Table cell | Table cell |

Preferred output:

| Heading 1 | Heading 2 | Heading 3 |
| --- | --- | --- |
| Table cell | Table cell | Table cell | Table cell |
| Table cell | p #1 <br> p #2  | Table cell | Table cell |

or even just removing the line break entirely

Seems related to #7
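The preferred output for cells containing paragraphs can be sketched as a flattening step applied to a cell's inner HTML before the row is emitted. `flattenCell` is hypothetical; it assumes plain `<p>` tags and turns paragraph boundaries into `<br>`, keeping the whole cell on one line.

```javascript
// Hypothetical helper: flatten <p> children of a table cell onto one line.
function flattenCell(innerHtml) {
  return innerHtml
    .replace(/<\/p>\s*<p(?:\s[^>]*)?>/gi, ' <br> ') // paragraph boundary -> <br>
    .replace(/<\/?p(?:\s[^>]*)?>/gi, '')            // drop remaining <p>/</p> (not <pre>)
    .replace(/\s+/g, ' ')                           // collapse newlines/indentation
    .trim()
    .replace(/\|/g, '\\|');                         // keep pipes from splitting the cell
}

console.log(flattenCell('\n    <p>p #1</p>\n    <p>p #2</p>\n'));
// prints p #1 <br> p #2
```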

Tables without a header

Input:

<table>
    <tbody>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    </tbody>
</table>

Output (does not show as a table at all):

| Table cell | Table cell | Table cell | Table cell |
| Table cell | Table cell | Table cell | Table cell |

Preferred output (add a blank header to the table):

| | | | |
| - | - | - | - |
| Table cell | Table cell | Table cell | Table cell |
| Table cell | Table cell | Table cell | Table cell |
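A sketch of the suggested fix: generate an empty header row when the source table has none, since GFM only recognizes a table when a header row is followed by a delimiter row. `toGfmTable` is hypothetical and takes rows of already-converted cell strings; short rows are padded to the widest row, which also covers the mismatched column counts in the earlier example.

```javascript
// Hypothetical helper: build a GFM table, inventing a blank header if needed.
function toGfmTable(rows, header = []) {
  const width = Math.max(header.length, ...rows.map((row) => row.length));
  // Pad every row to the same number of cells.
  const pad = (row) => Array.from({ length: width }, (_, i) => row[i] ?? '');
  // Empty cells render as "| |" rather than "|  |".
  const line = (cells) => '|' + cells.map((c) => (c ? ` ${c} ` : ' ')).join('|') + '|';
  return [
    line(pad(header)),
    line(pad([]).map(() => '-')),
    ...rows.map((row) => line(pad(row))),
  ].join('\n');
}

console.log(toGfmTable([
  ['Table cell', 'Table cell', 'Table cell', 'Table cell'],
  ['Table cell', 'Table cell', 'Table cell', 'Table cell'],
]));
```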
