oftn-oswg / coca
An implementation of C in JavaScript.
License: Other
If a string literal or character constant is preceded by the capital letter L, as in L"Hello, world" and L'a', then it needs to be recognized as a wide literal. In that case, read_string_literal or read_character_constant should be passed true as its first (and only) argument.
Tokenizer.prototype.nextch should recognize trigraph sequences and return the correct replacement character instead.
Before any other processing takes place, each occurrence of one of the following
sequences of three characters (called trigraph sequences) is replaced with the
corresponding single character.
??= #
??( [
??/ \
??) ]
??' ^
??< {
??! |
??> }
??- ~
Digraphs are easier to add, since the punctuators are constructed automatically as a trie structure and can be added with the add() method of Token.punctuators.
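The trigraph replacement described above can be sketched as a standalone pass over the source text. This is only an illustration: replaceTrigraphs and TRIGRAPHS are hypothetical names that do not exist in the repository, and the real fix would presumably live inside Tokenizer.prototype.nextch so that the cursor and column stay in sync.

```javascript
// Trigraph table taken from the list above: third character -> replacement.
var TRIGRAPHS = {
  "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
  "<": "{", "!": "|", ">": "}", "-": "~"
};

// Hypothetical helper (not part of the existing Tokenizer):
// returns a copy of `source` with every trigraph sequence replaced.
function replaceTrigraphs(source) {
  var out = "";
  var i = 0;
  while (i < source.length) {
    var third = source[i + 2];
    if (source[i] === "?" && source[i + 1] === "?" &&
        Object.prototype.hasOwnProperty.call(TRIGRAPHS, third)) {
      out += TRIGRAPHS[third]; // consume all three characters
      i += 3;
    } else {
      out += source[i];        // ordinary character, copy as-is
      i += 1;
    }
  }
  return out;
}
```

For example, replaceTrigraphs("??=define") gives "#define", while a sequence such as "??x" is left untouched because it is not in the table.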
Number tokenizing is very long and complex. As of now there isn't a suitable token type for numbers. The long long type needs to be represented with 64-bit integers, which JS does not support natively.
Here's some old code which doesn't tokenize, but goes through the basic steps of tokenizing. It is untested.
https://github.com/oftn/coca/blob/4ac315fb329a09871277bbe3a033dc79488993c6/CParse.js
Keeping track of the cursor with save() and restore() is very inefficient. It may be best to implement unget-type functions. In the end the tokenizer might have to be rethought to be more efficient, but this is low priority.
Likely in read_string_literal, near line 252 of src/Tokenizer.js.
It should raise an error.
To fix: Tokenizer.prototype.nextch should recognize a high surrogate, calculate the code point from it and the following character (assuming it's a low surrogate), and advance the cursor as required.
To get the character code of a surrogate pair: ((hi - 0xD800) * 0x400) + (lo - 0xDC00) + 0x10000
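A minimal sketch of how the formula above could be applied when reading from a string. The function name readCodePoint and its return shape are assumptions made for illustration, not part of the existing Tokenizer. Unlike the description above, this sketch checks that the second unit really is a low surrogate instead of assuming it.

```javascript
// Hypothetical helper: reads one code point from `text` at `index`.
// Returns the code point and how many UTF-16 units it occupied,
// so the caller knows how far to advance the cursor.
function readCodePoint(text, index) {
  var hi = text.charCodeAt(index);
  if (hi >= 0xD800 && hi <= 0xDBFF && index + 1 < text.length) {
    var lo = text.charCodeAt(index + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return {
        codePoint: ((hi - 0xD800) * 0x400) + (lo - 0xDC00) + 0x10000,
        length: 2
      };
    }
  }
  // Not a valid surrogate pair: return the single unit unchanged.
  return { codePoint: hi, length: 1 };
}
```

The length field lets the caller advance the cursor by two units for a pair while still counting it as one character.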
ch() returns 0 on EOF, which is also what it returns for a NUL byte, so the two cases cannot be told apart. The EOF return value should be changed to -1.
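As a rough illustration of the proposed change, the following standalone function separates the two cases. The signature is hypothetical (the real ch() is a Tokenizer method and takes its text and cursor from the instance), and EOF is a name introduced here for the sentinel.

```javascript
// Sentinel that cannot collide with any character code.
var EOF = -1;

// Hypothetical standalone version of ch(): returns the character code
// at `cursor`, or EOF when the cursor is past the end of the input.
function ch(text, cursor) {
  return cursor < text.length ? text.charCodeAt(cursor) : EOF;
}
```

With this, a NUL byte in the input still yields 0, while reading past the end yields -1.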
Some of the Unicode code points will be larger than 0xFFFF which String.fromCharCode can't handle. The array must first be traversed and characters larger than 0xFFFF need to be broken up into 2 elements with this formula:
var hi, lo;
hi = Math.floor((ch - 0x10000) / 0x400) + 0xD800;
lo = ((ch - 0x10000) % 0x400) + 0xDC00;
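Put together, the traversal might look like the sketch below. The function name codePointsToString is made up for this example, and it assumes the input is a plain array of valid code points.

```javascript
// Hypothetical helper: converts an array of Unicode code points into a
// JavaScript string, splitting anything above 0xFFFF into a surrogate pair.
function codePointsToString(codePoints) {
  var units = [];
  for (var i = 0; i < codePoints.length; i++) {
    var ch = codePoints[i];
    if (ch > 0xFFFF) {
      // Same formula as above: high surrogate first, then low surrogate.
      units.push(Math.floor((ch - 0x10000) / 0x400) + 0xD800);
      units.push(((ch - 0x10000) % 0x400) + 0xDC00);
    } else {
      units.push(ch);
    }
  }
  return String.fromCharCode.apply(null, units);
}
```

Note that passing a very large array through apply() can exceed an engine's argument limit, so long literals may need to be converted in chunks.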
Or should it? I'm not sure. It should probably advance the cursor twice, but the column only once.