ryanfleury / metadesk
License: MIT License
I've been using the official reference to generate some Metadesk bindings for another language, and I've found a few issues through that process:
- MD_PushNewReference is missing the arena parameter.
- arg_string in MD_TagArgFromString is listed as type int, when it should be MD_String8.
- MD_MakeDetachedError is listed in the reference, but does not exist at all in the source.
- A string_hash field is listed on MD_Node that does not exist in the source.
(I'm sure this is not exhaustive, but this is what I've found so far.)
This could totally be something I am doing wrong, but when trying to get set up with Metadesk this afternoon, I got some compiler warnings in md.c that don't appear to be caused by my usage:
Warnings 4244, 4457, and 4456 on lines 1027, 1463, 1711, 1779, 3409, 3515, 3783, 3855, and 4022.
I just cloned the repository today. I'm using the VS2015 C++ compiler with /W4, but with warnings 4130, 4201, 4324, 4458, 4505, 4996, and 4127 disabled, so those warnings may also be present without my seeing them.
Let me know if you want any more info or if it's potentially just my own mistakes or something.
The out list filled in by MD_ReconstructionFromNode holds pointers to local variables:
https://github.com/Dion-Systems/metadesk/blob/23289461d249c787ff631ba1f82535b20acefe68/source/md.c#L4137 and
https://github.com/Dion-Systems/metadesk/blob/23289461d249c787ff631ba1f82535b20acefe68/source/md.c#L4170
This means that anybody using the list after the function returns will potentially read garbage values. This happens in the sanity_tests.c tests, which join the strings immediately afterwards (in the current code this still yields correct strings, because the stack values have not yet been overwritten).
If you want to trigger this error, inserting the following piece of code after line 845 (directly below the call to MD_ReconstructionFromNode) will produce a corrupted string and the test will fail:
{
// overwrite 4KB of stack with 0xaa
char* stack = (char*)_alloca(4096);
for (volatile int k = 0; k < 4096; k++) stack[k] = 0xaa;
}
I have a potential fix for this here: mmozeiko@a18850f
It changes the char variables to string literals, if you're OK with such a change.
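To illustrate why string literals avoid the problem, here is a minimal sketch. StrView and space_view are hypothetical names invented for this example (StrView only stands in for a pointer-plus-size string type such as MD_String8); this is not the code from the linked commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for a (pointer, size) string type such as MD_String8. */
typedef struct {
    const char *str;
    size_t size;
} StrView;

/*
 * Unsafe pattern (shown only as a comment): the view would point at `c`,
 * which stops existing as soon as the function returns.
 *
 *     char c = ' ';
 *     StrView v = { &c, 1 };
 *     return v;
 */

/* Safe pattern: a string literal has static storage duration,
 * so the pointer stays valid after the function returns. */
static StrView space_view(void)
{
    StrView v = { " ", 1 };
    return v;
}
```

The returned view can be stored in a list and read later, which is what the callers of the reconstruction function need.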
MD_DecodeCodepointFromUtf16 incorrectly calculates codepoints greater than 0xFFFF because it does not offset by 0x10000.
Adding 0x10000 to the combined surrogate bits should fix the issue. The combination is parenthesized because + binds tighter than |:
if (1 < max && 0xD800 <= out[0] && out[0] < 0xDC00 && 0xDC00 <= out[1] && out[1] < 0xE000)
{
result.codepoint = (((out[0] - 0xD800) << 10) | (out[1] - 0xDC00)) + 0x10000;
result.advance = 2;
}
Reference: Step 5 for Decoding UTF-16
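For checking the arithmetic in isolation, here is a self-contained sketch of surrogate-pair decoding. DecodedCodepoint and decode_utf16 are hypothetical names, not Metadesk's API, and returning U+FFFD for an unpaired surrogate is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t codepoint;
    size_t advance; /* number of 16-bit units consumed */
} DecodedCodepoint;

static DecodedCodepoint decode_utf16(const uint16_t *in, size_t max)
{
    /* Assumption: unpaired surrogates decode to U+FFFD and consume one unit. */
    DecodedCodepoint result = { 0xFFFD, 1 };
    if (max == 0) {
        result.advance = 0;
        return result;
    }
    if (in[0] < 0xD800 || in[0] >= 0xE000) {
        /* Not a surrogate: the unit is the codepoint. */
        result.codepoint = in[0];
    } else if (in[0] < 0xDC00 && max > 1 && in[1] >= 0xDC00 && in[1] < 0xE000) {
        /* High surrogate followed by low surrogate: combine 10 + 10 bits,
         * then add the 0x10000 offset to the combined value. */
        uint32_t hi = (uint32_t)in[0] - 0xD800;
        uint32_t lo = (uint32_t)in[1] - 0xDC00;
        result.codepoint = ((hi << 10) | lo) + 0x10000;
        result.advance = 2;
    }
    return result;
}
```

The pair 0xD840 0xDC00 (U+20000) is a useful test input, because it gives a wrong answer if the offset is added to the low bits before the OR.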
MD_Utf8FromCodepoint sets the first byte incorrectly when the codepoint requires four bytes because it left-shifts MD_bitmask4 by 3 rather than 4.
MD_bitmask4 is the value 0x0F (binary 1111), and the first byte in the UTF-8 encoding of codepoints greater than 0xFFFF should start with the binary prefix 11110 (which is then shifted left by 3 so the remaining 3 bits can hold codepoint info). Shifting 1111 left by 3 gives 01111000 (0x78), whereas shifting by 4 gives the required 11110000 (0xF0).
Shifting by 4 instead of 3 should fix the issue:
else if (codepoint <= 0x10FFFF)
{
out[0] = (MD_bitmask4 << 4) | ((codepoint >> 18) & MD_bitmask3);
out[1] = MD_bit8 | ((codepoint >> 12) & MD_bitmask6);
out[2] = MD_bit8 | ((codepoint >> 6) & MD_bitmask6);
out[3] = MD_bit8 | ( codepoint & MD_bitmask6);
advance = 4;
}
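As a cross-check of the lead-byte values, here is a standalone sketch of a UTF-8 encoder written with literal constants instead of the MD_bitmask macros. utf8_encode is a hypothetical helper for this example, not Metadesk's function, and it makes no attempt to reject surrogate codepoints.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes cp into out and returns the byte count, or 0 if cp > 0x10FFFF. */
static size_t utf8_encode(uint8_t out[4], uint32_t cp)
{
    if (cp <= 0x7F) {
        out[0] = (uint8_t)cp;                           /* 0xxxxxxx */
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));           /* 110xxxxx */
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
        return 2;
    }
    if (cp <= 0xFFFF) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));          /* 1110xxxx */
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));          /* 11110xxx */
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+1F600 this yields the bytes F0 9F 98 80, so the first byte of any four-byte sequence begins with 0xF0 rather than 0x78.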
MD_S8ChopWhitespace does not chop newlines off the end of a string, because it checks using MD_CharIsSpace, which does not include newlines, because newlines are handled differently within the parser.
This is weird and counterintuitive, since newlines are obviously whitespace. I think the root of the problem is that MD_CharIsSpace probably should recognize newlines as well, but currently cannot do so because newlines are special.
My suggestion would be to keep MD_CharIsSpace, but have it use a more conventional notion of whitespace (probably just anything recognized by isspace?). Any parser code can simply use other internal functions that are not part of the public API.
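A rough sketch of what chopping with the conventional isspace notion of whitespace could look like. chopped_length is a hypothetical helper written for this example; it works on a plain pointer and size rather than on MD_String8.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of str once trailing whitespace (as defined by
 * isspace, which includes '\n' and '\r') has been chopped off. */
static size_t chopped_length(const char *str, size_t size)
{
    /* The cast avoids undefined behavior for negative char values. */
    while (size > 0 && isspace((unsigned char)str[size - 1])) {
        size--;
    }
    return size;
}
```

With this definition a trailing "\r\n" is removed along with spaces and tabs, which matches what most callers would expect from a chop-whitespace function.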
On Windows, VirtualAlloc always returns a NULL pointer on failure. According to the mmap(2) Linux man page, mmap instead returns MAP_FAILED on failure, which is defined as (void *) -1.
This means that there is actually a pretty big discrepancy between their failure values. It doesn't currently cause any issues in the code, because in either case any failure is caught by the commit call on line 461. It is probably worth fixing though, because it could cause a big surprise in the future.
hello. i came across your project and was contemplating implementing a parser in Nim. i did however notice there isn't really a syntax guide so i read the API docs and available examples and attempted a draft of one.
Node: a node associates some flags, zero or more children, zero or more tags, and a textual identifier together. Flags store the particular type of node the parser believes the text is (such as a number) and contextual information (such as whether it comes before or after a comma).
Tags: a tag is a node that begins with a @. Tags may also be directly followed by one block which contains the arguments of the tag. Zero or more tags are written before the node they belong to.
Blocks: blocks begin with an opening symbol, contain zero or more nodes, and end with a closing symbol. The opening and closing symbols do not need to match but the symbol used is recorded as a flag so the programmer may enforce a standard if so chosen.
Opening symbols: {, (, and [
Closing symbols: }, ), and ]
Implied block: the character : after a node's name indicates an implied block. An implied block may consist of a single block, or of zero or more nodes up to the first separator encountered. This allows two forms of syntax to exist: foo: bar baz, ... and foo: { ... }.
Separator: a semicolon (;) or a comma (,). A separator punctuates a list. Which separator was used is recorded in the node's flags so the programmer may enforce a standard if so chosen.
Flags: flags hold special meanings attributed to a node. This can be whether the parser thinks the text represents a number or a string, what kind of boundary characters were used for the string, whether it came before a comma or semicolon, and so on.
Line comments: a line comment begins with // and goes until a newline character.
Block comments: a block comment begins with /* and continues until */ is read. Block comments also nest, so for every /* there must exist a matching */.
Comments: comments come in line or block forms. Comments are read during parsing but are thrown away. They exist for authors to keep notes to themselves that are of no interest to the computer.
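The nesting rule for block comments can be sketched with a depth counter. This is only an illustration of the rule as described in this draft: skip_block_comment is a hypothetical function, not taken from the Metadesk source, and treating an unterminated comment as running to the end of the input is an assumption.

```c
#include <stddef.h>

/* If a block comment starts at s[i], returns the index just past its
 * matching close. Returns i unchanged if no comment starts there, and n
 * if the comment is never closed. */
static size_t skip_block_comment(const char *s, size_t n, size_t i)
{
    size_t depth = 0;
    if (i + 1 >= n || s[i] != '/' || s[i + 1] != '*') {
        return i; /* not at a comment opener */
    }
    while (i < n) {
        if (i + 1 < n && s[i] == '/' && s[i + 1] == '*') {
            depth++;          /* every opener goes one level deeper */
            i += 2;
        } else if (i + 1 < n && s[i] == '*' && s[i + 1] == '/') {
            depth--;          /* every closer comes back one level */
            i += 2;
            if (depth == 0) {
                return i;     /* the outermost comment has ended */
            }
        } else {
            i++;
        }
    }
    return n; /* unterminated comment */
}
```

A parser written in another language, such as the Nim one mentioned above, would follow the same loop: count openers and closers, and stop when the count returns to zero.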
Escape codes: an escape code is a pair: a \ followed by some other symbol. For example, \\ means the backslash is itself escaped and so should be replaced with a single \ character in the output. Escape codes apply within strings.
Strings: a string begins and ends with a boundary character. All text between the boundaries is stored within a single node as the node's text. If the boundary characters are entered in triplicate, the string is allowed to contain multiple lines. Escape codes can be used to insert special characters that cannot otherwise be entered. Single quotes, double quotes, and backticks are allowed as boundaries or triple boundaries.
foo: {
bar baz
}
foo
{
bar
and baz
inside the group}
foo. Since we have triggered the special rule, though, we move the children individually to foo rather than adding the group as its own node.