
node-htmlparser's Introduction

#NodeHtmlParser

A forgiving HTML/XML/RSS parser written in JS for both the browser and NodeJS (yes, despite the name it works just fine in any modern browser). The parser can handle streams (chunked data) and supports custom handlers for writing custom DOMs/output.

##Installing

npm install htmlparser

##Running Tests

###Run tests under node: node runtests.js

###Run tests in browser: View runtests.html in any browser

##Usage In Node

var htmlparser = require("htmlparser");
var util = require("util");
var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';< /  script><!--<!-- Waah! -- -->";
var handler = new htmlparser.DefaultHandler(function (error, dom) {
	if (error) {
		// ...do something for errors...
	} else {
		// ...parsing done, do something...
	}
});
var parser = new htmlparser.Parser(handler);
parser.parseComplete(rawHtml);
// Print the complete DOM (null depth = no nesting limit)
console.log(util.inspect(handler.dom, false, null));

##Usage In Browser

var handler = new Tautologistics.NodeHtmlParser.DefaultHandler(function (error, dom) {
	if (error) {
		// ...do something for errors...
	} else {
		// ...parsing done, do something...
	}
});
var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
parser.parseComplete(document.body.innerHTML);
alert(JSON.stringify(handler.dom, null, 2));

##Example output

[ { raw: 'Xyz ', data: 'Xyz ', type: 'text' }
  , { raw: 'script language= javascript'
  , data: 'script language= javascript'
  , type: 'script'
  , name: 'script'
  , attribs: { language: 'javascript' }
  , children: 
     [ { raw: 'var foo = \'<bar>\';<'
       , data: 'var foo = \'<bar>\';<'
       , type: 'text'
       }
     ]
  }
, { raw: '<!-- Waah! -- '
  , data: '<!-- Waah! -- '
  , type: 'comment'
  }
]

##Streaming To Parser

while (...) {
	...
	parser.parseChunk(chunk);
}
parser.done();	

##Streaming To Parser in Node

fs.createReadStream('./path_to_file.html').pipe(parser);

##Parsing RSS/Atom Feeds

new htmlparser.RssHandler(function (error, dom) {
	...
});

##DefaultHandler Options

###Usage

var handler = new htmlparser.DefaultHandler(
	  function (error) { ... }
	, { verbose: false, ignoreWhitespace: true }
	);

###Option: ignoreWhitespace

Indicates whether the DOM should exclude text nodes that consist solely of whitespace. The default value is "false".

####Example: true

The following HTML:

<font>
	<br>this is the text
<font>

becomes:

[ { raw: 'font'
  , data: 'font'
  , type: 'tag'
  , name: 'font'
  , children: 
     [ { raw: 'br', data: 'br', type: 'tag', name: 'br' }
     , { raw: 'this is the text\n'
       , data: 'this is the text\n'
       , type: 'text'
       }
     , { raw: 'font', data: 'font', type: 'tag', name: 'font' }
     ]
  }
]

####Example: false

The following HTML:

<font>
	<br>this is the text
<font>

becomes:

[ { raw: 'font'
  , data: 'font'
  , type: 'tag'
  , name: 'font'
  , children: 
     [ { raw: '\n\t', data: '\n\t', type: 'text' }
     , { raw: 'br', data: 'br', type: 'tag', name: 'br' }
     , { raw: 'this is the text\n'
       , data: 'this is the text\n'
       , type: 'text'
       }
     , { raw: 'font', data: 'font', type: 'tag', name: 'font' }
     ]
  }
]
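
The difference between the two outputs above can be illustrated as a post-processing pass over the DOM. This is only a minimal sketch of the idea, not the library's implementation; `dropWhitespaceText` is a hypothetical helper name, and the input is the "false" output shown above.

```javascript
// Hypothetical helper (not part of htmlparser): returns a copy of the DOM
// without text nodes that consist solely of whitespace.
function dropWhitespaceText(nodes) {
  return nodes
    .filter(function (node) {
      // Keep everything except whitespace-only text nodes
      return !(node.type === "text" && /^\s*$/.test(node.data));
    })
    .map(function (node) {
      if (!node.children) return node;
      // Shallow-copy the node so the original DOM is left untouched
      var copy = {};
      Object.keys(node).forEach(function (key) { copy[key] = node[key]; });
      copy.children = dropWhitespaceText(node.children);
      return copy;
    });
}

// The DOM from "Example: false" above
var dom = [
  { raw: "font", data: "font", type: "tag", name: "font", children: [
    { raw: "\n\t", data: "\n\t", type: "text" },
    { raw: "br", data: "br", type: "tag", name: "br" },
    { raw: "this is the text\n", data: "this is the text\n", type: "text" },
    { raw: "font", data: "font", type: "tag", name: "font" }
  ] }
];

console.log(JSON.stringify(dropWhitespaceText(dom), null, 2));
```

Applied to that input, the pass removes only the leading '\n\t' text node, which matches the "true" output.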

###Option: verbose

Indicates whether to include extra information on each node in the DOM. This information consists of the "raw" attribute (original, unparsed text found between "<" and ">") and the "data" attribute on "tag", "script", and "comment" nodes. The default value is "true".

####Example: true

The following HTML:

<a href="test.html">xxx</a>

becomes:

[ { raw: 'a href="test.html"'
  , data: 'a href="test.html"'
  , type: 'tag'
  , name: 'a'
  , attribs: { href: 'test.html' }
  , children: [ { raw: 'xxx', data: 'xxx', type: 'text' } ]
  }
]

####Example: false

The following HTML:

<a href="test.html">xxx</a>

becomes:

[ { type: 'tag'
  , name: 'a'
  , attribs: { href: 'test.html' }
  , children: [ { data: 'xxx', type: 'text' } ]
  }
]

###Option: enforceEmptyTags

Indicates whether the DOM should prevent children on tags marked as empty in the HTML spec. Typically this should be set to "true" for HTML parsing and "false" for XML parsing. The default value is "true".

####Example: true

The following HTML:

<link>text</link>

becomes:

[ { raw: 'link', data: 'link', type: 'tag', name: 'link' }
, { raw: 'text', data: 'text', type: 'text' }
]

####Example: false

The following HTML:

<link>text</link>

becomes:

[ { raw: 'link'
  , data: 'link'
  , type: 'tag'
  , name: 'link'
  , children: [ { raw: 'text', data: 'text', type: 'text' } ]
  }
]

##DomUtils

###TBD (see utils_example.js for now)

##Related Projects

Looking for CSS selectors to search the DOM? Try Node-SoupSelect, a port of SoupSelect to NodeJS: http://github.com/harryf/node-soupselect

There's also a port of hpricot to NodeJS that uses HtmlParser for HTML parsing: http://github.com/silentrob/Apricot

node-htmlparser's People

Contributors

cistov, deanmao, papandreou, tautologistics, tootallnate, vtamara


node-htmlparser's Issues

Add ./index.js

or change lib/node-htmlparser.js to lib/htmlparser.js so I can localize / expose via require.paths using require('htmlparser')

No DefaultHandler in master branch?

I don't know whether I'm missing something, so please excuse me if this is too obvious, but there is no DefaultHandler method on the htmlparser object in the master branch (version 2.0.0).
I've tried using this library in my browser, but when I inspect the htmlparser (Tautologistics.NodeHtmlParser) object, it doesn't have such a method. However, it works like a charm in the 1.x version!
Is something missing from this branch, or am I missing something?
Thanks in advance.

meta value is not parsed correctly

Hi there,

If you look at a live journal entry like this one: http://cananian.livejournal.com/60624.html
You can see the

When doing: description = htmlparser.DomUtils.getElements( { tag_name: "meta", name: "description" }, dom);
Instead of having a result like this:

    [ { raw: 'meta name="description" value="the whole post in there"/',  
        data: 'meta name="description" value="the whole post in there"/',  
        type: 'tag',  
        name: 'meta',  
        attribs:   
         { name: 'description',  
           value: 'the whole post in there' } } ]  

I have this:

    [ { raw: 'meta name="description" value="the whole post in there"/',
        data: 'meta name="description" value="the whole post in there"/',
        type: 'tag',
        name: 'meta',
        attribs: 
         { name: 'description',
           value: 'the whole post in there' ,
           the: 'the',
           whole: 'whole',
           post: 'post',
           in: 'in',
           there: 'there' } } ]

Hope it helps!

tagStack.last() fails all over

for me. However, when I change it to tagStack['-1']() it's fine, haha. No clue what you are doing here, but here is my trace. The parser is used with jsdom, and this happens when parsing even a simple string like <html><body><p>foo</p></body></html>.

    register.test.js test GET /signup: TypeError: Property 'last' of object [object Object] is not a function
     at DefaultHandler.DefaultHandler$handleElement [as handleElement] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:694:26)
     at DefaultHandler.DefaultHandler$writeTag [as writeTag] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:612:8)
     at Parser.Parser$writeHandler [as writeHandler] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:443:20)
     at Parser.Parser$parseTags [as parseTags] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:383:8)
     at Parser.Parser$parseChunk [as parseChunk] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:95:8)
     at Parser.Parser$parseComplete [as parseComplete] (/Users/tj/Projects/LearnBoost/tests/functional/support/htmlparser/lib/htmlparser.js:86:8)
     at Object.ParseHtml (/Users/tj/Projects/LearnBoost/tests/functional/support/jsdom/lib/jsdom/browser/htmltodom.js:62:16)
     at HtmlToDom.appendHtmlToElement (/Users/tj/Projects/LearnBoost/tests/functional/support/jsdom/lib/jsdom/browser/htmltodom.js:73:27)
     at Object.innerHTML (/Users/tj/Projects/LearnBoost/tests/functional/support/jsdom/lib/jsdom/browser/index.js:295:27)
     at Object.write (/Users/tj/Projects/LearnBoost/tests/functional/support/jsdom/lib/jsdom/browser/index.js:202:22)
     at Object.jsdom (/Users/tj/Projects/LearnBoost/tests/functional/support/jsdom/lib/jsdom.js:30:9)
     at /Users/tj/Projects/LearnBoost/tests/functional/register.test.js:26:23
     at next (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:769:25)
     at runSuite (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:787:6)
     at check (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:648:16)
     at runFile (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:652:10)
     at Array.forEach (native)
     at runFiles (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:629:13)
     at run (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:598:5)
     at Object.<anonymous> (/Users/tj/Projects/LearnBoost/tests/integration/support/expresso/bin/expresso:851:13)
     at Module._compile (node.js:462:23)
     at Module._loadScriptSync (node.js:469:10)
     at Module.loadSync (node.js:338:12)
     at Object.runMain (node.js:522:24)
     at Array.0 (node.js:756:12)
     at EventEmitter._tickCallback (node.js:55:22)
     at node.js:773:9

bug when parsing <script> tag using some template system

var htmlparser = require('htmlparser'),
    util = require('util'),
    handler = new htmlparser.DefaultHandler(function(err, dom){}),
    parser = new htmlparser.Parser(handler),
    rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';

parser.parseComplete(rawHtml);
console.log(util.inspect(handler.dom, false, null));

This piece of code discards "<" of <h1> and outputs:

[ { raw: 'script type="text/template"',
    data: 'script type="text/template"',
    type: 'script',
    name: 'script',
    attribs: { type: 'text/template' },
    children: 
     [ { raw: 'h1>Heading1</h1>',  // discard <
         data: 'h1>Heading1</h1>',
         type: 'text' } ] } ]

sync parse

Hi! Is it possible to synchronously parse fully collected HTML, so that the parser signature is the same as that of JSON.parse, require('qs').parse, or require('querystring').parse?

TIA,
--Vladimir
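
Judging by the README usage example, which reads handler.dom immediately after parseComplete(), parsing a complete string already appears to be synchronous in 1.x. Under that assumption, a thin wrapper along these lines might give a JSON.parse-like signature. `parseSync` is a hypothetical name, and the module is passed in as a parameter so the sketch makes no assumption about how it is loaded.

```javascript
// Hypothetical wrapper (not part of htmlparser). Assumes parseComplete()
// has finished, and the handler callback has run, before it returns,
// as the README usage example implies.
function parseSync(htmlparser, rawHtml, options) {
  var failure = null;
  var handler = new htmlparser.DefaultHandler(function (error) {
    if (error) failure = error;
  }, options);
  new htmlparser.Parser(handler).parseComplete(rawHtml);
  if (failure) throw failure;
  return handler.dom;
}

// Usage (requires the htmlparser package to be installed):
// var dom = parseSync(require("htmlparser"), "<p>hello</p>");
```

Throwing on error mirrors JSON.parse; returning the error instead would work just as well.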

form detection in js

How can I detect a form that is created in JavaScript? I am unable to detect it.

For example:
var frm = document.createElement("form")

handle '<' and '>' characters in attribute values

I have HTML that looks like the following:

<span title="first line<br>second line"></span>

It would be great if the 'br' were treated as part of the 'title' attribute value instead of as a new tag element.
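
One way a tokenizer could support this is to track quotes while scanning for the end of a tag, so that a '>' inside a quoted attribute value does not close the tag. The following is a minimal sketch of that idea only; `findTagEnd` is a hypothetical function and is not taken from the htmlparser source.

```javascript
// Returns the index of the '>' that closes the tag starting at `start`,
// ignoring any '>' that appears inside a quoted attribute value.
// Returns -1 if the tag is never closed.
function findTagEnd(html, start) {
  var quote = null; // the quote character we are currently inside, if any
  for (var i = start + 1; i < html.length; i++) {
    var ch = html.charAt(i);
    if (quote) {
      if (ch === quote) quote = null; // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote
    } else if (ch === ">") {
      return i;
    }
  }
  return -1;
}

var html = '<span title="first line<br>second line"></span>';
var end = findTagEnd(html, 0);
console.log(html.slice(0, end + 1));
// prints: <span title="first line<br>second line">
```

The sketch treats every quote character as a delimiter, so an unquoted value that contains an apostrophe would confuse it; a real tokenizer would need to know whether it is inside an attribute value at all.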

Streaming To Parser in Node - Example Request

Can you share an example of a working script using Streaming To Parser in Node?
Using your README text generates an error.

fs.createReadStream('./path_to_file.html').pipe(parser);

I get this error in console:

_stream_readable.js:476
  dest.on('unpipe', onunpipe);
       ^
TypeError: Object #<Parser> has no method 'on'
    at ReadStream.Readable.pipe (_stream_readable.js:476:8)
    at Object.<anonymous> (C:\dev\autorelease\script.js:54:8)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3

Here is my simple script:

// get a file stream reader 
var reader = fs.createReadStream(process.argv[2]);

// get a file stream writer pointing to the json file to write to
var writer = fs.createWriteStream(input_json);

var htmlparser = require("htmlparser");
//var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';< /  script><!--<!-- Waah! -- -->";
//var sys = require("sys");

var handler = new htmlparser.DefaultHandler(function (error, dom) {
    if (error)
        logger.log('error', 'handler error in parser', {error: error});
    else
        logger.log('info', '', {dom: JSON.stringify(dom)});
});
var parser = new htmlparser.Parser(handler);
//parser.parseComplete(reader);
//sys.puts(sys.inspect(handler.dom, false, null));

// pipe everything to do the conversion
reader.pipe(parser).pipe(writer);
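
The error suggests that the Parser in the installed version is not a stream. A possible workaround, assuming it exposes only parseChunk() and done() as shown in the README's "Streaming To Parser" section, is to wrap it in a Node Writable stream. `createParserStream` is a hypothetical helper, not part of htmlparser.

```javascript
var stream = require("stream");
var StringDecoder = require("string_decoder").StringDecoder;

// Hypothetical adapter: wraps any object that has parseChunk(string) and
// done() methods in a Writable stream that can be the target of pipe().
function createParserStream(parser) {
  // The decoder keeps multi-byte characters intact across chunk boundaries
  var decoder = new StringDecoder("utf8");
  return new stream.Writable({
    write: function (chunk, encoding, callback) {
      // With default stream options, chunk arrives as a Buffer
      parser.parseChunk(decoder.write(chunk));
      callback();
    },
    final: function (callback) {
      // Flush any buffered partial character, then finish parsing
      var rest = decoder.end();
      if (rest) parser.parseChunk(rest);
      parser.done();
      callback();
    }
  });
}

// Usage with the script above:
// reader.pipe(createParserStream(parser));
```

A Writable has no readable side, so the trailing .pipe(writer) cannot be chained; the JSON file would instead be written from inside the handler callback once the DOM is available.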

Error on parsing unclosed tags

I installed from npm; the version is v1.7.6.
<ul><li><ul><li>1<li>2</ul><li>3</ul>
The test failed.
Expected:
[
{
"raw": "ul",
"data": "ul",
"type": "tag",
"name": "ul",
"children": [
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "ul",
"data": "ul",
"type": "tag",
"name": "ul",
"children": [
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "1",
"data": "1",
"type": "text"
}
]
},
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "2",
"data": "2",
"type": "text"
}
]
}
]
}
]
},
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "3",
"data": "3",
"type": "text"
}
]
}
]
}
]
Complete
[
{
"raw": "ul",
"data": "ul",
"type": "tag",
"name": "ul",
"children": [
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "ul",
"data": "ul",
"type": "tag",
"name": "ul",
"children": [
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "1",
"data": "1",
"type": "text"
},
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "2",
"data": "2",
"type": "text"
}
]
}
]
}
]
},
{
"raw": "li",
"data": "li",
"type": "tag",
"name": "li",
"children": [
{
"raw": "3",
"data": "3",
"type": "text"
}
]
}
]
}
]
}
]

BODY becomes child of HEAD when closing tag is missing

Some old W3C pages were written in old style HTML. e.g. http://www.w3.org/TR/css3-2d-transforms/

node-htmlparser should be more forgiving.
version: 1.7.2

var request = require('request');
var jsdom = require('jsdom');

var url = 'http://www.w3.org/TR/css3-2d-transforms/';
request({uri:url}, function (error, response, body) {
    var html = body;
    var doc = jsdom.jsdom(html, null, {url: url});
    console.log(doc.head+''); //[ HEAD ]
    console.log(doc.body === null); //true
    console.log(doc.head.childNodes[9].tagName); //BODY
});

1.x: Less thans and greater thans in attributes break the parser

Based on the example in http://www.whatwg.org/specs/web-apps/current-work/#attr-iframe-srcdoc :

var rawHtml = '<iframe srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."></iframe>',
    htmlparser = require('./lib/htmlparser'),
    handler = new htmlparser.DefaultHandler(),
    parser = new htmlparser.Parser(handler);
parser.parseComplete(rawHtml);
console.warn(require('util').inspect(handler.dom, false, null));

Output:

[ { raw: 'iframe srcdoc="',
    data: 'iframe srcdoc="',
    type: 'tag',
    name: 'iframe',
    attribs: { srcdoc: 'srcdoc' },
    children: 
     [ { raw: 'p',
         data: 'p',
         type: 'tag',
         name: 'p',
         children: 
          [ { raw: 'Yeah, you can see it ',
              data: 'Yeah, you can see it ',
              type: 'text' },
            { raw: 'a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;',
              data: 'a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;',
              type: 'tag',
              name: 'a',
              attribs: { href: '&quot;/gallery?mode=cover&amp;amp;page=1&quot;' },
              children: [ { raw: 'in my gallery', data: 'in my gallery', type: 'text' } ] },
            { raw: '."', data: '."', type: 'text' } ] } ] } ]

Expected output:

[ { raw: 'iframe srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."',
    data: 'iframe srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."',
    type: 'tag',
    name: 'iframe',
   attribs: { srcdoc: '<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>.' } } ]

It works if I entitify the less thans and greater thans.

handle improperly escaped attributes

The wild internet sometimes contains weird stuff that makes this parser behave funny.

A tag such as this:
<a href="#" onclick="moveAddCommentBelow("div-comment-579747", 579747, true); return false;" />

Has attributes parsed like so:
{ href: '#'
, onclick: 'moveAddCommentBelow('
, 'div-comment-579747': 'div-comment-579747'
, ',': ','
, '579747,': '579747,'
, 'true);': 'true);'
, return: 'return'
, 'false;': 'false;'
}

<source> tags are not parsed properly

I might be doing the readout wrong, but this is the second time I've picked this up. It seems that <source> isn't identified as a void tag, so they become children of one another when listed inside a <video>:

var htmlparser = require('htmlparser');

var htmlContent = "<html><head></head><body><video><source src=\"foo.ogv\"><source src=\"lol.smaz\"></video><div></div></body></html>";

var handler = new htmlparser.DefaultHandler(function (error, dom) {
  function parse(dom, spacing){
    console.log(spacing, dom.name);
    if(dom.children){
      for(var i=0; i<dom.children.length; ++i){
        parse(dom.children[i], spacing + ' ');
      }
    }
  }
  parse(dom[0], '');
});

new htmlparser.Parser(handler).parseComplete(htmlContent);

cruft in npm package

Looks like you were testing with libxmljs.node in your local folder before running npm package? I noticed this because I wrote a script to scan for C++ node modules and it bumped into this libxmljs.node, which seemed odd because htmlparser does not list it as a dependency in package.json. Just thought you might want to know.

1.x: HTML comment delimiters inside <script> ends the tag

var rawHtml = '<script>document.write("<!--hello-->");</script>',
    htmlparser = require('./lib/htmlparser'),
    handler = new htmlparser.DefaultHandler(),
    parser = new htmlparser.Parser(handler);
parser.parseComplete(rawHtml);
console.warn(require('util').inspect(handler.dom, false, null));

Output:

[ { raw: 'script',
    data: 'script',
    type: 'script',
    name: 'script',
    children: 
     [ { raw: 'document.write("',
         data: 'document.write("',
         type: 'text' },
       { raw: 'hello', data: 'hello', type: 'comment' },
       { raw: '");', data: '");', type: 'text' } ] } ]

Expected output:

[ { raw: 'script',
    data: 'script',
    type: 'script',
    name: 'script',
    children: 
     [ { raw: 'document.write("<!--hello-->");',
         data: 'document.write("<!--hello-->");',
         type: 'text' } ] } ]

Problem when a <script> tag is not closed.

If the parsed document has a <script> tag which is not closed, the parser acts like the document had nothing.
Example:

var
    htmlparser = require('htmlparser'),
    parser,
    pHandler,
    data = '<html><body><h1>Bla</h1><script src="somewhere"></body></html>';

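pHandler = new htmlparser.DefaultHandler(function(err,doc){
    if ( err )
        throw err;
    console.log("doc: ",doc);
});

// Parse the document with the handler above
parser = new htmlparser.Parser(pHandler);
parser.parseComplete(data);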

The result is:
doc: []

Bug in parser

node-htmlparser parses these two snippets in the same way, even though the second has a misspelled closing tag!

<span class="copyright link">Copyright content</span>
<span class="copyright link">Copyright content</spane>

Can you give me a quick rundown on how to use this?

I'm trying to integrate this into jsDOM to fix issues with parsing '<' or '>' in attributes.
After I successfully do that, I'll update the directions in the readme based on my experience.

EDIT: Also, if it's ready, can you publish to npm?

[Suggestion] Add support for reserialisation

As a suggestion to the DomUtils "submodule", it'd be neat if it included a way to re-serialize a DOM node back to HTML.
Basically, I'm using this library as a way to pre-process HTML before it's served to the user (an XML-flavoured templating engine, if you wish).

(I didn't find any obvious way to mark an issue as a suggestion/bug/..., hence why I added [suggestion] to the title. Did I miss anything? Github newbie here...)

Convert JSON back to HTML

I would like to convert the DOM to JSON, modify the JSON, and convert it back to a DOM string.

Please suggest
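
A serializer over the node shape shown in the README examples could look roughly like the sketch below. It is not part of the library: `toHtml`, `escapeAttr`, and the `VOID_TAGS` list are assumptions made for illustration, and only the 'text', 'comment', 'tag', and 'script' node types from the README examples are handled.

```javascript
// Assumed list of HTML void elements, which take no closing tag.
var VOID_TAGS = ["area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"];

// Escape double quotes so the value can sit inside a quoted attribute
function escapeAttr(value) {
  return String(value).replace(/"/g, "&quot;");
}

// Serializes a node, or an array of nodes, back to an HTML string.
function toHtml(nodes) {
  if (!Array.isArray(nodes)) nodes = [nodes];
  return nodes.map(function (node) {
    if (node.type === "text") return node.data;
    if (node.type === "comment") return "<!--" + node.data + "-->";
    // Everything else is treated as an element ('tag' or 'script')
    var attribs = node.attribs || {};
    var open = "<" + node.name + Object.keys(attribs).map(function (key) {
      return " " + key + '="' + escapeAttr(attribs[key]) + '"';
    }).join("");
    if (VOID_TAGS.indexOf(node.name.toLowerCase()) !== -1) return open + ">";
    return open + ">" + toHtml(node.children || []) + "</" + node.name + ">";
  }).join("");
}

var dom = [
  { type: "tag", name: "a", attribs: { href: "test.html" },
    children: [ { data: "xxx", type: "text" } ] }
];
console.log(toHtml(dom)); // prints: <a href="test.html">xxx</a>
```

The sketch relies only on name, attribs, children, and data, so it works with verbose set to either value. It drops any children of void elements, which matters only when enforceEmptyTags is "false".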

maybe you should update your documents~

/Users/owen/Documents/workspace/node-htmlparser/snippet.js:8
var handler = new htmlparser.DefaultHandler(function(err, dom) {
^
TypeError: undefined is not a function
at Object.<anonymous> (/Users/owen/Documents/workspace/node-htmlparser/snippet.js:8:15)
at Module._compile (module.js:449:26)
at Object.Module._extensions..js (module.js:467:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.runMain (module.js:492:10)
at process.startup.processNextTick.process._tickCallback (node.js:244:9)

crazy IE-generated HTML is not normalized

IE8 (at least) will take the following HTML:

<!DOCTYPE html>
<html>
    <head></head>
    <body></body>
</html>

And convert it to:

<!DOCTYPE HTML>
<HTML>
    <HEAD></HEAD>
    <BODY></BODY>
</HTML>

The neat part: node-htmlparser handles this just fine!

The bad: libraries like soupselect (https://github.com/harryf/node-soupselect) and the DomUtils included with node-htmlparser will fail when trying to select 'body'. The DomUtils will select 'BODY' properly, but it's a pain to have to try to select BOTH 'body' and 'BODY'.

Should this be handled on the parser-side of things or the selector side of things? I'm not sure. Part of me thinks that the parser should normalize the HTML to an extent, such as make all the tags lowercase. At the same time, a selector engine could do this normalization when searching.

Thoughts?
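
On the selector side, the normalization could be as small as comparing lowercased names. A minimal sketch, assuming nodes carry the tag name in a `name` property and nest through `children` as in the README examples; `findByTagName` is a hypothetical helper, not part of DomUtils or soupselect.

```javascript
// Hypothetical helper: collects all elements whose tag name matches
// `tagName`, ignoring case, in document order.
function findByTagName(nodes, tagName, found) {
  found = found || [];
  var wanted = tagName.toLowerCase();
  nodes.forEach(function (node) {
    if (typeof node.name === "string" && node.name.toLowerCase() === wanted) {
      found.push(node);
    }
    if (node.children) findByTagName(node.children, tagName, found);
  });
  return found;
}

// A DOM shaped like the IE8 output above, with uppercase tag names
var dom = [
  { type: "tag", name: "HTML", children: [
    { type: "tag", name: "HEAD" },
    { type: "tag", name: "BODY" }
  ] }
];
console.log(findByTagName(dom, "body").length); // prints: 1
```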

does not support attribute names without values

example:

<div by-zero name="something">

If a tag like this appears anywhere in the html, the parsing stops shortly after a tag like this. I saw this tag in the reddit source code where my code bombed.

Any way to get textContent or innerHTML?

I need to get the text content inside an element, and before reinventing the wheel I want to be sure that nothing similar already exists.
Would it be useful if I create a few utility methods that can be merged with DomUtils?
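
Such a utility could be a short recursive walk over the DOM shape shown in the README. This is a minimal sketch; `getText` is a hypothetical name and not an existing DomUtils method.

```javascript
// Hypothetical utility: concatenates the data of all text nodes under a
// node (or an array of nodes), skipping comments.
function getText(node) {
  if (Array.isArray(node)) return node.map(getText).join("");
  if (node.type === "text") return node.data;
  if (node.type === "comment" || !node.children) return "";
  return getText(node.children);
}

var dom = [
  { type: "tag", name: "a", attribs: { href: "test.html" },
    children: [ { data: "xxx", type: "text" } ] }
];
console.log(getText(dom)); // prints: xxx
```

The text is returned exactly as the parser stored it, so any character entities in the source are not decoded by this sketch.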

Vulnerable Regular Expression

The following regular expression used in parsing the HTML documents is vulnerable to ReDoS:

/(^\s+|\s+$)/g

The slowdown is moderate: matching an input of 50,000 characters takes around 2.5 seconds. However, I would still suggest one of the following:

  • remove the regex,
  • anchor the regex,
  • limit the number of characters that can be matched by the repetition,
  • limit the input size.

If needed, I can provide an actual example showing the slowdown.
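
As an illustration of the first suggestion (removing the regex), leading and trailing whitespace can be stripped with two index scans, which take linear time regardless of where the whitespace runs sit in the input. `trimEnds` and `isSpace` are hypothetical names and this is not the library's code; the built-in String.prototype.trim may be an even simpler replacement where it is available.

```javascript
// Single-character whitespace test: no repetition, so no backtracking
function isSpace(ch) {
  return /\s/.test(ch);
}

// Removes leading and trailing whitespace by scanning inward from both ends
function trimEnds(str) {
  var start = 0;
  var end = str.length;
  while (start < end && isSpace(str.charAt(start))) start++;
  while (end > start && isSpace(str.charAt(end - 1))) end--;
  return str.slice(start, end);
}

console.log(JSON.stringify(trimEnds("  a b \n"))); // prints: "a b"
```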
