erikd / language-javascript
Parser for JavaScript, in Haskell
License: BSD 3-Clause "New" or "Revised" License
In JavaScript, the '$' character is not special and can appear in identifiers. jQuery makes extensive use of this.
For example:
var img = document.createElement('img');
img.src = "mylogo.jpg";
$(img).click(function() {
alert('clicked!');
});
Language.Javascript fails to parse the '$' character.
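A quick check of the claim above, as a minimal sketch in plain JavaScript (Node.js assumed). The `$` function here is a hypothetical stand-in for jQuery, defined only for illustration, and the other names are made up as well:

```javascript
// `$` is an ordinary identifier character: it can be a whole name or part of one.
// This `$` is a hypothetical stand-in for jQuery, not the real library.
function $(value) {
  return { wrapped: value };
}

const $img = $('mylogo.jpg'); // `$` at the start of an identifier
const price$ = 42;            // `$` at the end of an identifier

console.log($img.wrapped, price$);
```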
Steps to reproduce:
renderToString (JSAstStatement (JSReturn JSNoAnnot (Just (JSIdentifier JSNoAnnot "foo")) (JSSemi JSNoAnnot)) JSNoAnnot)
Expected outcome:
"return foo;"
Actual outcome:
"returnfoo;"
(invalid JavaScript: ReferenceError: returnfoo is not defined)
What is the correct construction of the AST to prevent this problem?
Hi,
I've just written a simple program that parses and then pretty prints some JS code. If I take the pretty printed output file and run it through the program again, the second output is different from the first. Would be nice if this operation was idempotent. Currently using version 0.5.4.
Cheers,
Erik
When I try to parse a do-while loop with a single statement ending in ;, like the following:
do x=x+1; while(x<4);
I get a parser error:
SemiColonToken {token_span = TokenPn 8 1 9, token_comment = [NoComment]}
although it executes correctly as JavaScript.
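To back up the "executes correctly" part, here is a minimal sketch in plain JavaScript (Node.js assumed); the initial value of `x` is an assumption, since the report does not give one:

```javascript
// A do-while loop whose body is a single expression statement ending in `;`.
let x = 0; // starting value chosen for illustration
do x = x + 1; while (x < 4);

console.log(x); // 4
```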
There seems to be a problem with Unicode code points in the range U+10000 to U+10FFFF within block comments:
readJs "/* 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 */"
-- *** Exception: "lexical error @ line 1 and column 4"
Within normal comments, everything is fine:
readJs "// 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 "
-- JSAstProgram [] (JSAnnot (TokenPn 0 0 0) [CommentA (TokenPn 0 1 1) "// \120792\120793\120794\120795\120796\120797\120798\120799\120800\120801"])
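For context, the characters in the failing comment lie outside the Basic Multilingual Plane. The sketch below (plain JavaScript, Node.js assumed) shows that U+1D7D8, the first character of the sample, corresponds to the decimal escape \120792 seen in the output above, and that a JavaScript engine accepts such a character inside a block comment. It says nothing about where the library's lexer goes wrong:

```javascript
// U+1D7D8 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO) is the first character in the report.
const zero = '\u{1D7D8}';
const codePoint = zero.codePointAt(0); // 120792 in decimal
const utf16Units = zero.length;        // 2: stored as a surrogate pair in UTF-16

// A block comment containing the character is accepted when the source is compiled.
const fn = new Function('/* ' + zero + ' */ return 1;');
const result = fn();

console.log(codePoint, utf16Units, result);
```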
Strings like the following appear in some PureScript output. psc-bundle uses language-javascript for dead code elimination.
λ readJs "a = '\u1234'"
<interactive>:7:16:
lexical error in string/character literal at character 'u'
E.g.: from purescript-test-unit:
return Control_Monad_Eff_Class.liftEff(Control_Monad_Aff.monadEffAff (Test_Unit_Console.consoleLog("\u2713 Passed: " + l));
Chrome & Firefox parse this fine.
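A minimal sketch of what an engine does with these escapes (plain JavaScript, Node.js assumed; the variable names are illustrative). One caveat on the GHCi transcript above: the message shown there is the kind GHC itself emits for a Haskell string literal, because Haskell has no \u escape, so the backslash would presumably need doubling in the Haskell literal before the input reaches readJs at all.

```javascript
// '\u1234' is a single character whose code point is 0x1234.
const a = '\u1234';

// '\u2713' is the check mark used in the purescript-test-unit output.
const passed = '\u2713 Passed: ' + 'example';

console.log(a.length, a.codePointAt(0).toString(16), passed.codePointAt(0).toString(16));
```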
Hi Erik!
When can we expect the 0.6.x branch on Stackage?
language-javascript-0.6.0.10
[10 of 12] Compiling Language.JavaScript.Pretty.Printer ( src/Language/JavaScript/Pretty/Printer.hs, .stack-work/dist/x86_64-osx/Cabal-2.2.0.0/build/Language/JavaScript/Pretty/Printer.o )
/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:106:55: error:
Ambiguous occurrence ‘<>’
It could refer to either ‘Prelude.<>’,
imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
(and originally defined in ‘GHC.Base’)
or ‘Language.JavaScript.Pretty.Printer.<>’,
defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
|
106 | (|>) (PosAccum (r,c) bb) s = PosAccum (r',c') (bb <> str s)
| ^^
/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:116:86: error:
Ambiguous occurrence ‘<>’
It could refer to either ‘Prelude.<>’,
imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
(and originally defined in ‘GHC.Base’)
or ‘Language.JavaScript.Pretty.Printer.<>’,
defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
|
116 | (|>) (PosAccum (lcur,ccur) bb) (TokenPn _ ltgt ctgt) = PosAccum (lnew,cnew) (bb <> bb')
| ^^
/Users/dan/scratch/language-javascript-0.6.0.10/src/Language/JavaScript/Pretty/Printer.hs:120:22: error:
Ambiguous occurrence ‘<>’
It could refer to either ‘Prelude.<>’,
imported from ‘Prelude’ at src/Language/JavaScript/Pretty/Printer.hs:4:8-41
(and originally defined in ‘GHC.Base’)
or ‘Language.JavaScript.Pretty.Printer.<>’,
defined at src/Language/JavaScript/Pretty/Printer.hs:31:1
|
120 | bb' = bbline <> bbcol
| ^^
Skimming through the code, it seems that every non-leaf node allows any node as a child. This lets ASTs represent invalid programs. It would be nice if the tree could only represent valid programs, so that correctness is enforced by construction.
Hey, it'd be really cool if this tool could emit JSON compatible with the Mozilla Parser API.
A lot of tools conform to it, so it'd make hooking into them a lot easier.
I believe it could be implemented as a (lossy) transformation on the AST.
I get a lexical error when trying to parse:
var z = x[i] / y;
Don't have time to look further into it at the moment, otherwise I'd send a pull request again.
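For reference, a minimal sketch of how an engine reads that line (plain JavaScript, Node.js assumed; the values of `x`, `i` and `y` are made up). One plausible cause, which is an assumption and not confirmed in the report, is the lexer mistaking the `/` after `]` for the start of a regex literal:

```javascript
const x = [8, 6]; // sample data, not from the report
const i = 0;
const y = 2;

// After `x[i]`, the `/` is the division operator, not the start of a regex literal.
var z = x[i] / y;

console.log(z); // 4
```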
The new version (language-javascript-0.6.0.12) works well on Linux, but does not compile on Windows.
Currently only String seems to be supported, but for obvious reasons (performance, memory use) Text is often preferred, so it would be great if this package could support both.
I ran into this issue using a literal "escape" character (code point 27), which should be allowed within a string. The workaround for now is to use the \x1B escape sequence.
Hi Alan,
Would it be possible to bump the mtl dependency to 2, or even better switch over to transformers? I can send you a patch if you'd like.
Thanks,
Michael
If someone runs cabal install alex, they'll get version 3.0.2 from Hackage. language-javascript will fail to install with this. Is this intentional?
This implements features for es 6 & 7.
Each of these should be implemented in a separate PR and each item needs tests. There are likely to be three tests needed, one for the appropriate XParser.hs test and one in each of Minify.hs and RoundTrip.hs.
In Firefox and Chrome it gives this error:
SyntaxError: expected expression, got keyword 'else' autogen-unjVjs7z.js:4:387
You can use these files to check it.
https://gist.github.com/fgaray/0a08a3f4ab3c01f2d7c9
I managed to get a minimal example:
if(1)
;
else if(2){
3;
}
Incorrect result
if(1)else if(2)3
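The difference can be checked mechanically. The sketch below (plain JavaScript, Node.js assumed) uses a hypothetical helper `parses` that reports whether a string compiles as a function body. The `fixed` string is only one possible valid minification, offered as an assumption about what the output could look like, not as what the library should emit:

```javascript
// Hypothetical helper: true when `source` compiles as a function body,
// false when the engine rejects it with a SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const original = 'if(1)\n;\nelse if(2){\n3;\n}'; // the minimal example
const minified = 'if(1)else if(2)3';             // the incorrect result
const fixed = 'if(1);else if(2)3';               // keeps the empty statement

console.log(parses(original), parses(minified), parses(fixed)); // true false true
```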
The Language.JavaScript.Parser modules do not export the AlexSpan data type. This data type is used by Token, which is in turn used by ParseError. This means that reporting the AlexSpan of parse errors is not possible. I think AlexSpan should be exported, or some other arrangement made to allow reporting the location of parse errors.
In building a Yesod app, the build process stumbled over language-javascript since alex & happy were not installed. I see this was also addressed in issue #3 and that the fix for that was to add them under build-tools. You also suggested using Haskell Platform, but it, and both Ubuntu (up to 11.10) and Fedora (and I'm guessing other distros), don't provide the version of alex that language-javascript requires. These should probably be listed under build-depends, as cabal and Hackage seem to be the best place to get these anyway. This would make building from a clean slate much easier.
So that it can install with the current Haskell Platform
c:\purescript>stack build
language-javascript> configure
language-javascript> Configuring language-javascript-0.7.0.0...
language-javascript> build
language-javascript> Preprocessing library for language-javascript-0.7.0.0..
language-javascript> happy.exe: src\Language\JavaScript\Parser\Grammar7.y: hGetContents: invalid argument (invalid byte sequence)
-- While building package language-javascript-0.7.0.0 using:
C:\Users\Marko\AppData\Roaming\stack\setup-exe-cache\x86_64-windows\Cabal-simple_Z6RU0evB_2.4.0.1_ghc-8.6.5.exe --builddir=.stack-work\dist\e626a42b build --ghc-options " -fdiagnostics-color=always"
Process exited with code: ExitFailure 1
Progress 1/2
This is towards the end of the PureScript compiler build, but it happens to me when I try doing stack install language-javascript-0.7.0.0 on a completely fresh project as well.
For all I know, this error might be Windows 10 related. I've been running into crazy stack errors since yesterday and Haskell tooling is completely broken for me.
Any ideas why this could be happening?
Provide pretty print function for all types in AST
Hello,
First of all thanks for your precious and useful work on "language-javascript".
I'm quite new to Haskell development, so sorry in advance for any mistakes I make or noob questions I ask.
I'm working on a Haskell program, and I need to manipulate the JSAST I'm getting from the parser, in order to further parse JSDoc comment annotations.
JSAnnot and CommentAnnotation are defined as:
data JSAnnot
= JSAnnot !TokenPosn ![CommentAnnotation]
| JSNoAnnot
deriving (Data, Eq, Show, Typeable)
data CommentAnnotation
= CommentA TokenPosn String
| WhiteSpace TokenPosn String
| NoComment
deriving (Eq, Show, Typeable, Data, Read)
However, CommentAnnotation is defined inside the Language.JavaScript.Parser.Token module, and I'm not able to pattern match against it to extract the actual comment String because Language.JavaScript.Parser.Token is a hidden module.
In fact, when I try importing Language.JavaScript.Parser.Token I get the following error:
Could not find module ‘Language.JavaScript.Parser.Token’
it is a hidden module in the package ‘language-javascript-0.6.0.8@langu_79vgAiOr5C496FlesUP4aI’
Would it be possible to expose Language.JavaScript.Parser.Token? Or how would you suggest to approach the problem? Any advice would be appreciated.
Thank you!
This is quite odd, because it seems to happen with GHC 7.10.3 (Stackage lts-6.23) but goes away with GHC 8.0.1 (Stackage nightly). Obviously a few dependency versions change too. Correction: this is a mistake, the bug is there with GHC 8 too. I must have mixed up executables for different versions of the test, or something.
In language-javascript-0.6.0.8 the time taken by minifyJS can be exponential in the length of an expression. For example, when trying to minify "var x = 'a'+1+'a'+1+'a'+1+'a';" the time goes up by a factor of four for each additional "'a'+1+'".
This was discovered when looking into yesodweb/yesod#1291
FWIW here is my complete test program
import System.Environment (getArgs)
import Language.JavaScript.Parser (parse)
import Language.JavaScript.Process.Minify (minifyJS)

-- Give a single integer on the command line to specify the number of
-- copies of "'a'+1+" in the expression. But be careful: the time taken
-- for minifyJS goes up as 4^n !!
main :: IO ()
main = do
    args <- getArgs
    let n = case args of
              nn:_ -> read nn :: Int
              _ -> error "Specify repetition count on the command line"
        expr = "var x = " ++ concat (replicate n "'a'+1+") ++ "'a';"
    putStrLn $ "Minifying: " ++ expr
    let x = parse expr "noname"
    case x of
      Left err -> putStrLn $ "This shouldn't happen: " ++ err
      Right ast -> putStrLn $ show $ minifyJS ast
Hi,
While trying to install yesod, language-javascript is failing to install.
Error Message:
$ cabal install language-javascript
Resolving dependencies...
Configuring language-javascript-0.5.8...
cabal: The program happy version >=1.18.5 is required but it could not be
found.
Failed to install language-javascript-0.5.8
cabal: Error: some packages failed to install:
language-javascript-0.5.8 failed during the configure step. The exception was:
ExitFailure 1
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
$ cabal --version
cabal-install version 1.16.0.2
using version 1.16.0 of the Cabal library
I was separately able to install happy 1.9.2 without any issues.
It fails to parse the following code.
var test = [];
for (var i = 0; i < 10; ++i) {
if (false) break
test.push(0)
}
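For reference, this is how an engine treats the snippet (plain JavaScript, Node.js assumed). The comment about automatic semicolon insertion reflects the ECMAScript rule that no line terminator may sit between `break` and a label; whether that rule is what trips the parser is an assumption:

```javascript
var test = [];
for (var i = 0; i < 10; ++i) {
  if (false) break // a semicolon is inserted here; `test` is not read as a label
  test.push(0)
}

console.log(test.length); // 10
```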
It seems like Language.JavaScript.Parser.StringEscape was accidentally left unexposed. Its definitions are valuable to the library user, and they aren't even currently used by the package.
When trying to parse the following JavaScript code, the parser returns a lexical error:
var a = '\"';
I know this " should not require escaping as it differs from the enclosing quotes, but it should not raise an error. The ECMAScript 5 spec says that " should be allowed as a SingleEscapeCharacter.
This issue also happens the other way around (when escaping a ' enclosed by ").
To add some context: I'm using this library to minify a vendor JS library which happens to use quotes this way. As the spec seems to allow this, I'm creating this issue here. However, if I misunderstood the spec, please let me know and I will move the issue to the vendor library instead.
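Both directions can be checked in an engine with a minimal sketch (plain JavaScript, Node.js assumed; the name `b` is illustrative):

```javascript
// An escaped double quote inside single quotes is just a double quote...
const a = '\"';
// ...and an escaped single quote inside double quotes is just a single quote.
const b = "\'";

console.log(a === '"', b === "'"); // true true
```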
This makes the package not legally distributable by, e.g. Debian. (See http://bugs.debian.org/669156). So Lexer.x needs to end up in the tarball.
One way is to just put it in src/ and let Cabal take care of it. Using "setup sdist" will create a tarball with the generated file Lexer.hs in dist/, so that users will not need to have alex installed and in the path to install the library. Unless they modify the source Lexer.x, of course.
Haddock generation seems to fail with the latest language-javascript-0.5.14.0 (found while running the Stackage nightly build script).
Running Haddock for language-javascript-0.5.14.0...
Running hscolour for language-javascript-0.5.14.0...
Preprocessing library language-javascript-0.5.14.0...
Preprocessing library language-javascript-0.5.14.0...
Warning: The documentation for the following packages are not installed. No
links will be generated to these packages: mtl-2.2.1, utf8-string-1
src/Language/JavaScript/Parser/Token.hs:16:5:
parse error on input ‘-- * The tokens’
function foo() {
var x, y;
x += (y);
}
The above program fails to parse. I looked at the grammar file but I am not sure why it's not being parsed as JSExpressionParen.
Hi,
the environment that Debian builds its packages in does not have a specific locale set (after all, the result should not depend on the settings of the particular developer building the package). But without a UTF-8 locale, your test suite fails:
unicode5f: [Failed]
ERROR: ./test/Unicode.js: hGetContents: invalid argument (invalid byte sequence)
Unfortunately, the encoding of .js files is not specified, so you cannot just change the encoding of the filehandle to UTF-8 in parseFile. Maybe you should not use parseFile in the test suite, but parseJs, and read the file with the correct encoding (which you know) set for the filehandle?
Cabal doesn't seem to install the build-time dependencies Happy and Alex automatically. Manual 'cabal install happy && cabal install alex' works around the issue.
Both versions seem to be official: http://www.ecma-international.org/publications/standards/Ecma-262.htm
IIRC there aren't that many syntax changes.
Lexer.x has very large amounts of commented-out code. It's often not really clear what that code is doing: is it meant to represent a future possibility, or a previous approach that was replaced by a new one?
I'm attempting to update the lexer code for ES201x, and it would be nice to have some insight into what the commented stuff represents. I can get some sense by looking at git history, but is it also possible to simply clean some of it out?
From Tony Morris
Hi mate, I am using language-javascript-0.4.6 and the parser is failing
if I use the word "code". I can't find anything in the specification
that says this is not permitted, so am I assuming correctly that this is
a bug in the parser?
Here is an example that fails to parse:
var k = {
y: code
}
If I change the word "code" to anything else it parses fine. Thanks for
any help.
Another word that appears to be disallowed but not listed in any specification I can find is "mode".
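For reference, neither word is reserved in JavaScript, so both can be used as ordinary variable names. A minimal sketch (plain JavaScript, Node.js assumed; the assigned values and the name `m` are made up):

```javascript
// `code` and `mode` are ordinary identifiers, not reserved words.
var code = 1;
var mode = 2;

var k = {
  y: code
};
var m = {
  y: mode
};

console.log(k.y, m.y); // 1 2
```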
This would be a pretty invasive change, I think. One idea would be to make TokenPosn in JSAnnot a type parameter, then you could fmap over it and add the information on the filename lazily.
The lexer in version 0.6.0.9 breaks for inputs like var x = "\/";, complaining about an invalid character after the backslash. However, according to ECMA-262, arbitrary Unicode characters (with some exceptions) may be escaped in string literals, as stated in the NonEscapeCharacter rule of the string literal grammar. Several websites use this, e.g. for escaping slashes in URLs.
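A minimal sketch of the engine behaviour being described (plain JavaScript, Node.js assumed; the URL is a made-up example):

```javascript
// A backslash before a character with no special escape meaning yields that character.
const slash = "\/";
const url = "http:\/\/example.com\/path"; // hypothetical URL with escaped slashes

console.log(slash, url);
```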
The regex /[/]/ is a valid JavaScript regex, but currently fails in the lexer because the forward slash inside the square brackets is incorrectly detected as the end of the regex.
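A minimal sketch confirming that an engine accepts the literal (plain JavaScript, Node.js assumed; the test strings are made up):

```javascript
// Inside a character class, an unescaped `/` does not terminate the regex literal.
const re = /[/]/;

console.log(re.test('a/b'), re.test('ab')); // true false
const parts = 'a/b/c'.split(re);
console.log(parts.length); // 3
```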
function foo() {
a = '`sadsd`';
}
function bar() {
a = '`sadas`';
}
Above program fails to parse.
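For comparison, an engine treats the backticks as ordinary characters inside a quoted string. The sketch below (plain JavaScript, Node.js assumed) declares `a` explicitly, which the report's snippet does not, so that it also runs in strict mode. Whether template-literal lexing is involved in the failure is an assumption:

```javascript
let a; // declared here for illustration; the report assigns to an undeclared `a`

function foo() {
  a = '`sadsd`';
}
function bar() {
  a = '`sadas`';
}

foo();
const afterFoo = a;
bar();
const afterBar = a;

console.log(afterFoo.length, afterBar.length); // 7 7
```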
It describes a 0.5 series on master and a 0.6 series on new-ast, but master is up to 0.7.
This issue is related to #35: the library cannot render the result correctly.
import Language.JavaScript.Parser
main = print $ fmap renderToString $ parse "do x = x + 1; while (x < 4);" ""
Right "do x = x + 1 while (x < 4);"
The semicolon is dropped, and the result is invalid JavaScript.
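The invalidity can be confirmed with a small sketch (plain JavaScript, Node.js assumed). The helper `compiles` is hypothetical, and the `var x = 0;` prefix is added only so that the input form is complete:

```javascript
// Hypothetical helper: true when `source` compiles as a function body,
// false when the engine rejects it with a SyntaxError.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const input = 'var x = 0; do x = x + 1; while (x < 4);';
const rendered = 'var x = 0; do x = x + 1 while (x < 4);'; // semicolon dropped

console.log(compiles(input), compiles(rendered)); // true false
```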
The parser appears to fail when using scientific notation in object literals.
readJs "{ y: 1e8 }" -- fine
readJs "{ y: 18 }" -- fine
readJs "x = { y: 18 }" -- fine
readJs "x = { y: 1e8 }" -- exception
Expected: No exception during parse.
Actual: Exception during parse in the above case.
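A note on the four cases, with a sketch in plain JavaScript (Node.js assumed). In the JavaScript grammar, a `{` at the start of a statement opens a block, so the first two inputs are read as a block containing a labelled statement and only the `x = { ... }` forms exercise object literals. That the library follows the same rule is an assumption here:

```javascript
// At statement level, `{ y: 1e8 }` is a block with the label `y`, not an object.
const blockValue = (0, eval)('{ y: 1e8 }');

// In expression position it is an object literal with a scientific-notation value.
var x;
x = { y: 1e8 };

console.log(blockValue, x.y === 100000000); // 100000000 true
```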
I've been trying to minify Leaflet 1.5.1 with hjsmin's Text.Jasmine.minifym function, but I get a Left value reading:
"HookToken {tokenSpan = TokenPn 87840 2829 23, tokenComment = [WhiteSpace (TokenPn 87839 2829 22) \" \"]}"
I got Leaflet 1.5.1 by downloading a zip file from http://cdn.leafletjs.com/leaflet/v1.5.1/leaflet.zip. Then I ran minifym on a ByteString obtained with Data.ByteString.Lazy.readFile from the bytestring package with a path to leaflet-src.js.
Any clue what's happening here?
The compiler reports a lexical error at column 73 for this line:
function Class(v) { return Object.prototype.toString.call(v).replace(/^\[object *|\]$/g, ''); }
I've run this line in multiple browsers and they seem to handle it fine.
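A minimal sketch of the function running in an engine (plain JavaScript, Node.js assumed; the sample arguments are made up). The reported column appears to fall inside the regex literal, which hints at regex lexing, though that is an inference and not something the report states:

```javascript
function Class(v) { return Object.prototype.toString.call(v).replace(/^\[object *|\]$/g, ''); }

// The regex strips the leading "[object " and the trailing "]" from the type tag.
console.log(Class([]), Class(null), Class(new Date())); // Array Null Date
```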
The pregenerated lexer in the 0.5.9 package does not work with GHC 7.8.1rc1. Can you release an update with an updated lexer (from a newer alex)?
cabal install language-javascript gives this result:
[ 6 of 12] Compiling Language.JavaScript.Parser.Lexer ( src/Language/JavaScript/Parser/Lexer.hs, dist/build/Language/JavaScript/Parser/Lexer.o )
src/Language/JavaScript/Parser/Lexer.hs:816:29:
Couldn't match expected type ‛Bool’ with actual type ‛Int#’
In the first argument of ‛(&&)’, namely ‛(offset >=# 0#)’
In the expression: (offset >=# 0#) && (check ==# ord_c)
In the expression:
if (offset >=# 0#) && (check ==# ord_c) then
alexIndexInt16OffAddr alex_table offset
else
alexIndexInt16OffAddr alex_deflt s
src/Language/JavaScript/Parser/Lexer.hs:816:48:
Couldn't match expected type ‛Bool’ with actual type ‛Int#’
In the second argument of ‛(&&)’, namely ‛(check ==# ord_c)’
In the expression: (offset >=# 0#) && (check ==# ord_c)
In the expression:
if (offset >=# 0#) && (check ==# ord_c) then
alexIndexInt16OffAddr alex_table offset
else
alexIndexInt16OffAddr alex_deflt s
Failed to install language-javascript-0.5.9
Cleaning the package first, which forces it to rebuild the lexer, does let it install:
$ cabal unpack language-javascript
$ cd language-javascript-0.5.9
$ cabal clean
$ cabal install
It should work if you rebuild the included parser with alex 3.1.3
The current .cabal file specifies HUnit < 1.3.
While walking through a Yesod tutorial, cabal install stumbled over your package. As I'm experienced with neither Haskell nor cabal, I don't know how build dependencies are handled in this system, but is it possible to install happy and alex automatically if they are missing?
In JavaScript, this code:
var f = function() {
return
'value';
}
should be parsed like:
var f = function() {
return;
'value';
}
The current parser seems to parse it as if the string 'value' is part of the return statement, which it (somewhat surprisingly) is not.
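The engine behaviour can be observed directly with a minimal sketch (plain JavaScript, Node.js assumed):

```javascript
var f = function() {
  return   // a semicolon is inserted here by automatic semicolon insertion
  'value'; // unreachable expression statement
}

console.log(f()); // undefined
```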