erikd / language-javascript

Parser for JavaScript, in Haskell

License: BSD 3-Clause "New" or "Revised" License

Haskell 68.96% Shell 0.38% JavaScript 0.01% Yacc 22.63% Makefile 0.14% Lex 7.87%

language-javascript's Issues

'$' character in identifier

In JavaScript, the '$' character is not special and can appear in identifiers. jQuery makes extensive use of this.

For example:

var img = document.createElement('img');
img.src = "mylogo.jpg";
$(img).click(function() {
    alert('clicked!');
});

Language.Javascript fails to parse the '$' character.
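A minimal sketch, assuming Node.js or any browser console, of the claim that `$` is an ordinary identifier character. The names `$`, `$count` and `price$` are made up for illustration and have nothing to do with jQuery's own implementation:

```javascript
// `$` may start an identifier or appear anywhere inside one.
function $(x) {
  return { wrapped: x };
}
var $count = 2;
var price$ = $count + 1;
console.log($("img").wrapped, price$);
```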

No space between `return` and identifier

Steps to reproduce:

open repl
import library (version 0.6.0.4)
type renderToString (JSAstStatement (JSReturn JSNoAnnot (Just (JSIdentifier JSNoAnnot "foo")) (JSSemi JSNoAnnot)) JSNoAnnot)

Expected outcome:

"return foo;"

Actual outcome:

"returnfoo;" (invalid javascript -- ReferenceError: returnfoo is not defined)

What is the correct construction of the AST to prevent this problem?

renderJS -> Parser -> renderJS should be idempotent

Hi,

I've just written a simple program that parses and then pretty prints some JS code. If I take the pretty printed output file and run it through the program again, the second output is different from the first. Would be nice if this operation was idempotent. Currently using version 0.5.4.

Cheers,
Erik

Do while Loop without Block is not parsed

When I try to parse a do while loop with a single statement ending in ; like the following

do x=x+1; while(x<4);

I get a parser error

SemiColonToken {token_span = TokenPn 8 1 9, token_comment = [NoComment]}

The same code executes correctly as JavaScript.
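For reference, a small sketch (assuming Node.js) showing that a JavaScript engine accepts a `do`/`while` loop whose body is a single statement without braces. The starting value of `x` is an assumption added so the snippet is self-contained:

```javascript
// The loop body is the single statement `x = x + 1;`.
var x = 0;
do x = x + 1; while (x < 4);
console.log(x);
```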

Error for high Unicode Code Points in block comments

There seems to be a problem with Unicode Code Points in the range U+10000 to U+10FFFF within block comments:

readJs "/* 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 */"
-- *** Exception: "lexical error @ line 1 and column 4"

Within normal comments, everything is fine:

readJs "// 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "
-- JSAstProgram [] (JSAnnot (TokenPn 0 0 0) [CommentA (TokenPn 0 1 1) "// \120792\120793\120794\120795\120796\120797\120798\120799\120800\120801"])
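As a point of comparison, a sketch (assuming Node.js with ES2015 support) showing that these characters lie outside the Basic Multilingual Plane and that a JavaScript engine accepts them inside a block comment. The variable names are illustrative only:

```javascript
// U+1D7D8 is the first character in the examples above (decimal 120792).
var zero = "\u{1D7D8}";
console.log(zero.codePointAt(0), zero.length);

// Build a function body containing the character inside a block comment
// and check that the engine parses and runs it.
var result = new Function("/* " + zero + " */ return 1;")();
console.log(result);
```

The length of 2 reflects the UTF-16 surrogate pair, which may be relevant to how a lexer counts or classifies these characters; that connection is a guess, not something the report confirms.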

Add unicode escape parsing

Strings like the following appear in some Purescript output. psc-bundle uses language-javascript for dead code elimination.

λ  readJs "a = '\u1234'"

<interactive>:7:16:
    lexical error in string/character literal at character 'u'

E.g.: from purescript-test-unit:

return Control_Monad_Eff_Class.liftEff(Control_Monad_Aff.monadEffAff (Test_Unit_Console.consoleLog("\u2713 Passed: " + l));

Chrome & Firefox parse this fine.
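A short sketch (assuming Node.js) of what the escape in the example denotes. The variable name `mark` is illustrative:

```javascript
// "\u2713" is a Unicode escape for U+2713 (CHECK MARK).
var mark = "\u2713 Passed: ";
console.log(mark.charCodeAt(0) === 0x2713, mark.length);
```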

0.6 on Stackage

Hi Erik!

When can we expect the 0.6.x branch on Stackage?

Build failure with GHC 8.4

language-javascript-0.6.0.10

[10 of 12] Compiling Language.JavaScript.Pretty.Printer ( src/Language/JavaScript/Pretty/Printer.hs, .stack-work/dist/x86_64-osx/Cabal-2.2.0.0/build/Language/JavaScript/Pretty/Printer.o )

/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:106:55: error:
    Ambiguous occurrence ‘<>’
    It could refer to either ‘Prelude.<>’,
                             imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
                             (and originally defined in ‘GHC.Base’)
                          or ‘Language.JavaScript.Pretty.Printer.<>’,
                             defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
    |
106 |     (|>) (PosAccum (r,c) bb) s = PosAccum (r',c') (bb <> str s)
    |                                                       ^^

/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:116:86: error:
    Ambiguous occurrence ‘<>’
    It could refer to either ‘Prelude.<>’,
                             imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
                             (and originally defined in ‘GHC.Base’)
                          or ‘Language.JavaScript.Pretty.Printer.<>’,
                             defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
    |
116 |     (|>)  (PosAccum (lcur,ccur) bb) (TokenPn _ ltgt ctgt) = PosAccum (lnew,cnew) (bb <> bb')
    |                                                                                      ^^

/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:120:22: error:
    Ambiguous occurrence ‘<>’
    It could refer to either ‘Prelude.<>’,
                             imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
                             (and originally defined in ‘GHC.Base’)
                          or ‘Language.JavaScript.Pretty.Printer.<>’,
                             defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
    |
120 |         bb' = bbline <> bbcol
    |                      ^^

Only allow valid javascript AST's

Skimming through the code, it seems that all non-leaf nodes allow any node as a child. This allows ASTs to represent invalid programs. It would be nice if the types guaranteed correctness, so that any program representable by the tree is valid.

Mozilla Parser API

Hey, it'd be really cool if this tool could emit JSON compatible with the Mozilla Parser API.
A lot of tools conform to it, so it'd make hooking into them a lot easier.

I believe it could be implemented as a (lossy) transformation on the existing AST.

Does not parse "x[i] / y".

I get a lexical error when trying to parse:

var z = x[i] / y;

Don't have time to look further into it at the moment, otherwise I'd send a pull request again.
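For context, a sketch (assuming Node.js) confirming that the `/` following `]` is a division operator rather than the start of a regular expression literal. The values assigned to `x`, `i` and `y` are assumptions added to make the line runnable; whether the lexer error comes from this ambiguity is not stated in the report:

```javascript
var x = [8, 6], i = 1, y = 3;
// After a closing bracket, `/` can only be division.
var z = x[i] / y;
console.log(z);
```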

the new version doesn't compile on windows10.

The new version (language-javascript-0.6.0.12) works well on Linux but does not compile on Windows.

Text support

Currently only String seems to be supported, but for obvious reasons (performance, memory use) Text is often preferred, so it would be great if this package could support both.

cannot parse strings with certain allowed characters

I ran into this issue using a literal "escape" character (code point 27), which should be allowed within a string. The workaround for now is to use the \x1B escape sequence.

Ref. purescript/purescript#1265
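A sketch (assuming Node.js) of the equivalence between the literal character and the workaround. Since a raw control character is hard to show in text, the source string is assembled at run time; `src` and `viaLiteral` are illustrative names:

```javascript
// Build source text containing a raw U+001B inside a string literal.
var esc = String.fromCharCode(27);
var src = 'return "' + esc + '";';

// The engine accepts the raw character and yields the same value as "\x1B".
var viaLiteral = new Function(src)();
console.log(viaLiteral === "\x1B", viaLiteral.length);
```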

mtl 2

Hi Alan,

Would it be possible to bump the mtl dependency to 2, or even better switch over to transformers? I can send you a patch if you'd like.

Thanks,
Michael

alex == 3.0.1 constraint causes problems

If someone runs cabal install alex, they will get version 3.0.2 from Hackage. language-javascript will then fail to install. Is this intentional?

Support for ECMAScript 6 and 7

This implements features for es 6 & 7.

Each of these should be implemented in a separate PR and each item needs tests. There are likely to be three tests needed: one in the appropriate XParser.hs test and one in each of Minify.hs and RoundTrip.hs.

Incorrect minification of bootstrap-datepicker

In Firefox and Chrome it gives this error:

SyntaxError: expected expression, got keyword 'else' autogen-unjVjs7z.js:4:387

You can use these files to check it.

https://gist.github.com/fgaray/0a08a3f4ab3c01f2d7c9

I managed to get a minimal example:

if(1)
    ;
else if(2){
    3;
}

Incorrect result

if(1)else if(2)3
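The difference between the two forms can be checked in any JavaScript engine. In the sketch below (assuming Node.js), `parses` is a hypothetical helper built on the standard `Function` constructor, and the first string is one possible valid minification, not necessarily what the library should emit:

```javascript
// Returns true when `src` parses as a function body, false on a SyntaxError.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// With the empty statement `;` kept, the code is valid.
console.log(parses("if(1);else if(2){3}"));
// With the empty statement dropped, `else` directly follows the condition.
console.log(parses("if(1)else if(2)3"));
```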

AlexSpan not exported

The Language.JavaScript.Parser modules do not export the AlexSpan data type. This data type is used by Token which is in turn used by ParseError. This means that reporting the AlexSpan of parse errors is not possible. I think AlexSpan should be exported or some other arrangement to allow reporting of the location of parse errors.

alex & happy requirements

In building a Yesod app, the build process stumbled over language-javascript since alex & happy were not installed. I see this was also addressed in issue #3 and that the fix for that was to add them under build-tools:. You also suggested using Haskell Platform, but it, and both Ubuntu (up to 11.10) and Fedora (and I'm guessing other distros), don't provide the version of alex that language-javascript requires. These should probably be listed under build-depends: as cabal and hackage.org seem to be the best place to get these anyway. This would make building from a clean slate much easier.

Remove install-time dependency on Alex

So that it can install with the current Haskell Platform

'invalid byte sequence' error when installing the package

c:\purescript>stack build
language-javascript> configure
language-javascript> Configuring language-javascript-0.7.0.0...
language-javascript> build
language-javascript> Preprocessing library for language-javascript-0.7.0.0..
language-javascript> happy.exe: src\Language\JavaScript\Parser\Grammar7.y: hGetContents: invalid argument (invalid byte sequence)

--  While building package language-javascript-0.7.0.0 using:
      C:\Users\Marko\AppData\Roaming\stack\setup-exe-cache\x86_64-windows\Cabal-simple_Z6RU0evB_2.4.0.1_ghc-8.6.5.exe --builddir=.stack-work\dist\e626a42b build --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1
Progress 1/2

This is towards the end of the Purescript compiler build, but it happens to me when I try doing stack install language-javascript-0.7.0.0 on a completely fresh project as well.

For all I know, this error might be Windows 10 related. I've been running into crazy stack errors since yesterday and Haskell tooling is completely broken for me.

Any ideas why this could be happening?

Provide pretty print function for all types in AST

Probably need Language.JavaScript.Parser.Token as exposed module

Hello,
First of all thanks for your precious and useful work on "language-javascript".
I'm quite new to Haskell development, so sorry in advance for any mistake I could do or noob question I could ask.

I'm working on a Haskell program, and I need to manipulate the JSAST I'm getting from the parser in order to further parse JSDoc comment annotations.

JSAnnot and CommentAnnotation are defined as:

data JSAnnot
    = JSAnnot !TokenPosn ![CommentAnnotation]
    | JSNoAnnot
    deriving (Data, Eq, Show, Typeable)

data CommentAnnotation
    = CommentA TokenPosn String
    | WhiteSpace TokenPosn String
    | NoComment
    deriving (Eq, Show, Typeable, Data, Read)

However, CommentAnnotation is defined inside the Language.JavaScript.Parser.Token module, and I'm not able to pattern match against it to extract the actual comment String because Language.JavaScript.Parser.Token is a hidden module.

As a matter of fact, when I try importing Language.JavaScript.Parser.Token I get the following error:

Could not find module ‘Language.JavaScript.Parser.Token’
    it is a hidden module in the package ‘language-javascript-0.6.0.8@langu_79vgAiOr5C496FlesUP4aI’

Would it be possible to expose Language.JavaScript.Parser.Token? Or how would you suggest to approach the problem? Any advice would be appreciated.
Thank you!

Exponentially bad performance in minifyJS (GHC 7.10.3)

At first this seemed quite odd, because it appeared to happen with GHC 7.10.3 (stackage lts-6.23) but go away with GHC 8.0.1 (stackage nightly), with a few dependency versions changing too. That was a mistake: the bug is there with GHC 8 too. I must have mixed up executables for different versions of the test, or something.

In language-javascript-0.6.0.8 the time taken by minifyJS can be exponential in the length of an expression. For example, when trying to minify "var x = 'a'+1+'a'+1+'a'+1+'a';" the time goes up by a factor of four for each additional "'a'+1+".

This was discovered when looking into yesodweb/yesod#1291

FWIW here is my complete test program

import System.Environment                 (getArgs)
import Language.JavaScript.Parser         (parse)
import Language.JavaScript.Process.Minify (minifyJS)

-- Give a single integer on the command line to specify the number of
-- copies of "'a'+1+" in the expression.  But be careful: the time taken
-- for minifyJS goes up as 4^n !!
main :: IO ()
main = do
    args <- getArgs
    let n = case args of
                nn:_ -> read nn :: Int
                _    -> error "Specify repetition count on the command line"
        expr = "var x = " ++ concat (replicate n "'a'+1+") ++ "'a';"
    putStrLn $ "Minifying: " ++ expr
    let x = parse expr "noname"
    case x of
        Left err  -> putStrLn $ "This shouldn't happen: " ++ err
        Right ast -> putStrLn $ show $ minifyJS ast

Install error on linux

Hi,

While trying to install yesod, language-javascript is failing to install.
Error Message:

$ cabal install language-javascript
Resolving dependencies...
Configuring language-javascript-0.5.8...
cabal: The program happy version >=1.18.5 is required but it could not be
found.
Failed to install language-javascript-0.5.8
cabal: Error: some packages failed to install:
language-javascript-0.5.8 failed during the configure step. The exception was:
ExitFailure 1
$ ghc --version 
The Glorious Glasgow Haskell Compilation System, version 7.6.3
$ cabal --version
cabal-install version 1.16.0.2
using version 1.16.0 of the Cabal library

I was separately able to install happy 1.9.2. without any issues.

Fails to parse foo.bar after break without semicolon

It fails to parse the following code.

var test = [];
for (var i = 0; i < 10; ++i) {
  if (false) break
  test.push(0)
}
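For reference, a sketch (assuming Node.js) of how an engine treats this code: automatic semicolon insertion ends the `break` statement at the line break, so `test.push(0)` is a separate statement that runs on every iteration.

```javascript
var test = [];
for (var i = 0; i < 10; ++i) {
  if (false) break   // a semicolon is inserted here automatically
  test.push(0)
}
console.log(test.length);
```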

Expose Language.JavaScript.Parser.StringEscape

It seems like Language.JavaScript.Parser.StringEscape was accidentally left unexposed. Its definitions are valuable to the library user, and they aren't even currently used by the package.

Escaping quotes causes parser error

When trying to parse the following JavaScript code, the parser returns a lexical error:

var a = '\"';

I know this " should not require escaping as it differs from the enclosing quotes, but it should not raise an error. The ecmascript5 specs say that " should be allowed as a SingleEscapeCharacter.

This issue also happens the other way around (when escaping a ' enclosed by ").

To add some context: I'm using this library to minify a vendor JS library which happens to use quotes this way. As the specs seem to allow this, I'm creating this issue here. If, however, I misunderstood the specs, please let me know and I will move the issue to the vendor library instead.
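A quick sketch (assuming Node.js) of how an engine reads both forms; the variable `b` is added here to cover the reverse case mentioned above:

```javascript
// An escaped quote that differs from the enclosing quote is still
// a valid escape and yields the bare quote character.
var a = '\"';
var b = "\'";
console.log(a === '"', b === "'", a.length, b.length);
```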

Lexer.x source is missing in tarball

This makes the package not legally distributable by, e.g. Debian. (See http://bugs.debian.org/669156). So Lexer.x needs to end up in the tarball.

One way is to just put it in src/ and let Cabal take care of it. Using "setup sdist" will create a tarball with the generated file Lexer.hs in dist/, so that users will not need to have alex installed and in the path to install the library. Unless they modify the source Lexer.x, of course.

language-javascript-0.5.14.0 haddock failure

Haddock generation seems to fail with latest language-javascript-0.5.14.0
(found while running the stackage nightly build script).

Running Haddock for language-javascript-0.5.14.0...
Running hscolour for language-javascript-0.5.14.0...
Preprocessing library language-javascript-0.5.14.0...
Preprocessing library language-javascript-0.5.14.0...
Warning: The documentation for the following packages are not installed. No
links will be generated to these packages: mtl-2.2.1, utf8-string-1

src/Language/JavaScript/Parser/Token.hs:16:5:
    parse error on input ‘-- * The tokens’

Parenthesized identifiers in expressions not being parsed

 function foo() {
 var x, y;
 x += (y);
}

The above program fails to parse. I looked at the grammar file, but I am not sure why it's not being parsed as JSExpressionParen.
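For comparison, a sketch (assuming Node.js) showing that an engine accepts a parenthesized identifier on the right of a compound assignment. The initial values and the `return` are assumptions added so the result can be observed:

```javascript
function foo() {
  var x = 1, y = 2;
  x += (y);   // the parentheses do not change the meaning
  return x;
}
console.log(foo());
```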

Test suite fails if locale is not an UTF8 locale

Hi,

The environment that Debian builds its packages in does not have a specific locale set (after all, the result should not depend on the settings of the particular developer building the package). But without a UTF-8 locale, your test suite fails:

  unicode5f: [Failed]
ERROR: ./test/Unicode.js: hGetContents: invalid argument (invalid byte sequence)

Unfortunately, the encoding of .js files is not specified, so you cannot just change the encoding of the filehandle to utf8 in parseFile. Maybe you should not use parseFile in the test suite, but parseJs and read the file with the correct encoding (which you know) set for the filehandle?

Automatically install build dependencies

Cabal doesn't seem to install the build-time dependencies Happy and Alex automatically. Manual 'cabal install happy && cabal install alex' works around the issue.

Are there plans to support ES6 or/and ES7

Both versions seem to be official: http://www.ecma-international.org/publications/standards/Ecma-262.htm

IIRC there aren't that many syntax changes.

Cleanup of Lexer.x

Lexer.x has very large amounts of commented code. It's often not really clear what that code is doing: is it meant to represent a future possibility, or a previous approach that was replaced by a new one?

I'm attempting to update the lexer code for ES201x, and it would be nice to have some insight into what the commented stuff represents. I can get some sense by looking at git history, but is it also possible to simply clean some of it out?

Parser fails on code and mode as variable names

From Tony Morris

Hi mate, I am using language-javascript-0.4.6 and the parser is failing
if I use the word "code". I can't find anything in the specification
that says this is not permitted, so am I assuming correctly that this is
a bug in the parser?

Here is an example that fails to parse:

var k = {
 y: code
}

If I change the word "code" to anything else it parses fine. Thanks for
any help.

Another word that appears to be disallowed but not listed in any specification I can find is "mode".
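A sketch (assuming Node.js) confirming that an engine treats `code` and `mode` as ordinary identifiers. The assigned values and the extra property `z` are assumptions for illustration:

```javascript
// Neither `code` nor `mode` is a reserved word.
var code = 1, mode = 2;
var k = {
  y: code,
  z: mode
};
console.log(k.y, k.z);
```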

Add source filename to TokenPosn

This would be a pretty invasive change, I think. One idea would be to make TokenPosn in JSAnnot a type parameter, then you could fmap over it and add the information on the filename lazily.

Lexer error for various escaped characters

The lexer in version 0.6.0.9 breaks for inputs like var x = "\/";, complaining about an invalid character after the backslash. However, according to ecma-262 arbitrary Unicode characters (with some exceptions) may be escaped in string literals as stated in rule NonEscapeCharacter of the string literal grammar. Several websites use this e.g. for escaping slashes in URLs.
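A sketch (assuming Node.js) of how an engine reads such escapes. The URL is a made-up example of the slash-escaping pattern mentioned above:

```javascript
// A backslash before a character with no special escape meaning
// simply yields that character.
var x = "\/";
var url = "http:\/\/example.com\/";
console.log(x === "/", url);
```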

Regex lexing problem

The regex /[/]/ is a valid JavaScript regex, but it currently fails in the lexer because the forward slash inside the square brackets is incorrectly detected as the end of the regex.
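A sketch (assuming Node.js) of the expected behaviour: inside a character class the slash does not terminate the literal, so the regex matches any string containing `/`. The test strings are illustrative:

```javascript
// The literal ends at the second unbracketed slash, not the one inside [ ].
var re = /[/]/;
console.log(re.test("a/b"), re.test("ab"));
```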

strings with backticks are misrecognized as template literals

function foo() {
  a = '`sadsd`';
}
function bar() {
  a = '`sadas`';
}

Above program fails to parse.
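For reference, a sketch (assuming Node.js) showing that backticks inside a quoted string are plain characters and do not start a template literal:

```javascript
// Inside single quotes, a backtick has no special meaning.
var a = '`sadsd`';
console.log(a.length, a[0] === "`", a[a.length - 1] === "`");
```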

Readme may be out of date

It describes a 0.5 series on master and a 0.6 series on new-ast, but master is up to 0.7.

Cannot render correctly do while loop without braces

This issue is related to #35: the library cannot render such a loop correctly.

import Language.JavaScript.Parser
main = print $ fmap renderToString $ parse "do x = x + 1; while (x < 4);" ""

Right "do x = x + 1  while (x < 4);"

The semicolon is dropped and the result is invalid JavaScript.

Scientific notation in object literals fails the parser

The parser appears to fail when using scientific notation in object literals.

readJs "{ y: 1e8 }" -- fine
readJs "{ y: 18 }" -- fine
readJs "x = { y: 18 }" -- fine
readJs "x = { y: 1e8 }" -- exception

Expected: No exception during parse.
Actual: Exception during parse in the above case.
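A sketch (assuming Node.js) of what an engine does with the failing case. Note that the first two inputs above are parsed as a block with a labelled statement rather than an object literal, which may be why they behave differently; that explanation is a guess, not something the report confirms:

```javascript
// Exponent notation is an ordinary numeric literal inside an object literal.
var x = { y: 1e8 };
console.log(x.y === 100000000);
```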

Leaflet 1.5.1 does not minify

I've been trying to minify Leaflet 1.5.1 with hjsmin's Text.Jasmine.minifym function, but I get a Left value reading:

"HookToken {tokenSpan = TokenPn 87840 2829 23, tokenComment = [WhiteSpace (TokenPn 87839 2829 22) \" \"]}"

I got Leaflet 1.5.1 by downloading a zip file from http://cdn.leafletjs.com/leaflet/v1.5.1/leaflet.zip. Then I ran minifym on a ByteString obtained with Data.ByteString.Lazy.readFile from the bytestring package with a path to leaflet-src.js.

Any clue what's happening here?

Invalid RegEx Lexical Error

Compiler reports lexical error on column 73 for this line

function Class(v) { return Object.prototype.toString.call(v).replace(/^\[object *|\]$/g, ''); }

I've run this line in multiple browsers and they seem to handle it fine.
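A sketch (assuming Node.js) of what the line does when an engine accepts it; the sample arguments are illustrative:

```javascript
function Class(v) { return Object.prototype.toString.call(v).replace(/^\[object *|\]$/g, ''); }

// The regex strips the leading "[object " and the trailing "]".
console.log(Class([]), Class(null), Class(new Date()));
```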

lexer does not work with GHC 7.8

The pregenerated lexer in the 0.5.9 package does not work with GHC 7.8.1rc1. Can you release an update with an updated lexer (from a newer alex)?

cabal install language-javascript gives this result:

[ 6 of 12] Compiling Language.JavaScript.Parser.Lexer ( src/Language/JavaScript/Parser/Lexer.hs, dist/build/Language/JavaScript/Parser/Lexer.o )

src/Language/JavaScript/Parser/Lexer.hs:816:29:
    Couldn't match expected type ‛Bool’ with actual type ‛Int#’
    In the first argument of ‛(&&)’, namely ‛(offset >=# 0#)’
    In the expression: (offset >=# 0#) && (check ==# ord_c)
    In the expression:
      if (offset >=# 0#) && (check ==# ord_c) then
          alexIndexInt16OffAddr alex_table offset
      else
          alexIndexInt16OffAddr alex_deflt s

src/Language/JavaScript/Parser/Lexer.hs:816:48:
    Couldn't match expected type ‛Bool’ with actual type ‛Int#’
    In the second argument of ‛(&&)’, namely ‛(check ==# ord_c)’
    In the expression: (offset >=# 0#) && (check ==# ord_c)
    In the expression:
      if (offset >=# 0#) && (check ==# ord_c) then
          alexIndexInt16OffAddr alex_table offset
      else
          alexIndexInt16OffAddr alex_deflt s
Failed to install language-javascript-0.5.9

while cleaning the package first, forcing it to rebuild the lexer, does install:

$ cabal unpack language-javascript
$ cd language-javascript-0.5.9
$ cabal clean
$ cabal install

It should work if you rebuild the included parser with alex 3.1.3

Upgrade test suite to HUnit 1.3

The current .cabal file specifies HUnit < 1.3.

Autoinstall build dependencies: happy and alex

While walking through a Yesod tutorial, cabal install stumbled over your package. As I'm experienced with neither Haskell nor cabal, I don't know how build dependencies are handled in this system. Is it possible to install happy and alex automatically if they are missing?

The parser doesn't seem to respect automatic semicolon insertion

In JavaScript, this code:

var f = function() {
  return
  'value';
}

should be parsed like:

var f = function() {
  return;
  'value';
}

The current parser seems to parse it as if the string 'value' is part of the return statement, which it (somewhat surprisingly) is not.
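A sketch (assuming Node.js) that makes the behaviour observable: because a semicolon is inserted right after `return`, the function returns `undefined` and the string is never reached.

```javascript
var f = function() {
  return
  'value';
};
console.log(f());
```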
