ryanfleury / metadesk

License: MIT License

Roff 0.82% C 91.24% C++ 0.13% Shell 7.81%

metadesk's People

thephtest cedric-h dreamcat4 haifenghuang devdoshi bvisness linecode mundusnine dontbelieveme cbuttner mdodis fafok29 aspurdy mmozeiko risecai dbechrd fishcken astrolemonade

metadesk's Issues

Errors in the API reference

I've been using the official reference to generate some Metadesk bindings for another language, and I've found a few issues through that process:

MD_PushNewReference is missing the arena parameter.
arg_string in MD_TagArgFromString is listed as type int, when it should be MD_String8.
MD_MakeDetachedError is listed in the reference, but does not exist at all in the source.
The reference lists a string_hash field on MD_Node that does not exist in the source.

(I'm sure this is not exhaustive, but this is what I've found so far.)

Compiler Warnings in md.c?

This could totally be something I am doing wrong, but when trying to get set up with metadesk this afternoon, I ran into some compiler warnings in md.c that don't seem to be caused by my usage:

Warnings 4244, 4457, and 4456 on lines 1027, 1463, 1711, 1779, 3409, 3515, 3783, 3855, and 4022.
I just cloned the repository today. I'm using the VS2015 C++ compiler with /W4, but with warnings 4130, 4201, 4324, 4458, 4505, 4996, and 4127 disabled, so those warnings may also be present without my seeing them.

Let me know if you want any more info or if it's potentially just my own mistakes or something.

MD_ReconstructionFromNode returns pointers to local variables

MD_ReconstructionFromNode stores pointers to local variables in its output list:
https://github.com/Dion-Systems/metadesk/blob/23289461d249c787ff631ba1f82535b20acefe68/source/md.c#L4137 and
https://github.com/Dion-Systems/metadesk/blob/23289461d249c787ff631ba1f82535b20acefe68/source/md.c#L4170

This means that anybody using the list after the function returns will potentially read garbage values. This happens in the sanity_tests.c tests, which join the strings immediately afterwards (in the current code this still produces correct strings, because the stack values have not yet been overwritten):

https://github.com/Dion-Systems/metadesk/blob/23289461d249c787ff631ba1f82535b20acefe68/tests/sanity_tests.c#L846

If you want to trigger this error, inserting the following piece of code after line 845 (directly below the call to MD_ReconstructionFromNode) will produce a corrupted string and the test will fail:

            {
                // overwrite 4KB of stack with 0xaa 
                char* stack = (char*)_alloca(4096);
                for (volatile int k = 0; k < 4096; k++) stack[k] = 0xaa;
            }

I have a potential fix for this here: mmozeiko@a18850f
It changes the char variables to string literals, if you're ok with such a change.
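
To illustrate why string literals avoid the problem, here is a minimal sketch. The `Slice` type and `open_brace` function are hypothetical names made up for this example and are not Metadesk's own code; the point is only that a string literal has static storage duration, while a local char does not outlive the function.

```c
#include <stddef.h>

// Hypothetical (pointer, length) string type used only for this sketch.
typedef struct {
    const char *str;
    size_t size;
} Slice;

// BAD (shown only as a comment): the slice would point at `c`,
// whose lifetime ends when the function returns.
//   Slice bad(void) { char c = '{'; Slice s = { &c, 1 }; return s; }

// OK: the literal "{" lives for the whole program, so the pointer
// stored in the slice stays valid after the function returns.
static Slice open_brace(void)
{
    Slice s = { "{", 1 };
    return s;
}
```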

Unicode decoding and encoding bugs for codepoints greater than 0xFFFF

MD_DecodeCodepointFromUtf16 incorrectly calculates codepoints greater than 0xFFFF because it does not offset by 0x10000.

Adding 0x10000 to the end of the codepoint calculation should fix the issue:

if (1 < max && 0xD800 <= out[0] && out[0] < 0xDC00 && 0xDC00 <= out[1] && out[1] < 0xE000)
{
    result.codepoint = (((out[0] - 0xD800) << 10) | (out[1] - 0xDC00)) + 0x10000;
    result.advance = 2;
}

Reference: Step 5 for Decoding UTF-16
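
As a standalone sketch of the surrogate-pair arithmetic (not Metadesk's implementation; `DecodedUtf16` and `decode_utf16` are hypothetical names for this example): the high and low surrogates each contribute 10 bits, and 0x10000 is added to the combined value. Since `+` binds tighter than `|` in C, the OR needs its own parentheses before the addition.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical result type for this sketch.
typedef struct {
    uint32_t codepoint; // 0xFFFFFFFF marks an unpaired surrogate or empty input
    uint32_t advance;   // number of 16-bit units consumed
} DecodedUtf16;

static DecodedUtf16 decode_utf16(const uint16_t *in, size_t max)
{
    DecodedUtf16 result = { 0xFFFFFFFFu, 1 };
    if (max == 0) {
        result.advance = 0;
        return result;
    }
    if (in[0] < 0xD800 || in[0] >= 0xE000) {
        // Not a surrogate: the unit is the codepoint.
        result.codepoint = in[0];
    } else if (max > 1 && in[0] < 0xDC00 && in[1] >= 0xDC00 && in[1] < 0xE000) {
        // High surrogate followed by low surrogate: 10 bits from each.
        uint32_t hi_bits = (uint32_t)(in[0] - 0xD800) << 10;
        uint32_t lo_bits = (uint32_t)(in[1] - 0xDC00);
        // Combine first, then add the 0x10000 offset.
        result.codepoint = (hi_bits | lo_bits) + 0x10000u;
        result.advance = 2;
    }
    return result;
}
```

A pair such as 0xD840 0xDC00 is a useful check, because the high surrogate alone already sets bit 16 and an OR with 0x10000 would silently lose the offset.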

MD_Utf8FromCodepoint sets the first byte incorrectly when the codepoint requires four bytes, because it left-shifts MD_bitmask4 by 3 rather than 4.
MD_bitmask4 is the value 0x0F (binary 1111). The first byte of a four-byte UTF-8 sequence (codepoints greater than 0xFFFF) must have the form 11110xxx, where the low 3 bits hold codepoint data. Shifting 0x0F left by 4 gives 0xF0 (binary 11110000), which is the required prefix; shifting by 3 gives 0x78 (binary 01111000), which is not.

Bitshifting by 4 instead of 3 should fix the issue:

else if (codepoint <= 0x10FFFF)
{
    out[0] = (MD_bitmask4 << 4) | ((codepoint >> 18) & MD_bitmask3);
    out[1] = MD_bit8 | ((codepoint >> 12) & MD_bitmask6);
    out[2] = MD_bit8 | ((codepoint >>  6) & MD_bitmask6);
    out[3] = MD_bit8 | ( codepoint        & MD_bitmask6);
    advance = 4;
}
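
For comparison, here is a self-contained sketch of a UTF-8 encoder that writes the prefixes as plain constants instead of the MD_bitmask macros. The function name `utf8_from_codepoint` and its return convention are assumptions for this example, not Metadesk's API.

```c
#include <stdint.h>

// Writes the UTF-8 encoding of `cp` into `out` (at least 4 bytes) and
// returns the number of bytes written, or 0 if `cp` is not encodable.
static uint32_t utf8_from_codepoint(uint8_t *out, uint32_t cp)
{
    if (cp <= 0x7F) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));          // 110xxxxx
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));        // 10xxxxxx
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                                  // surrogates are not codepoints
        }
        out[0] = (uint8_t)(0xE0 | (cp >> 12));         // 1110xxxx
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));         // 11110xxx
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```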

MD_S8ChopWhitespace does not chop newlines

MD_S8ChopWhitespace does not chop newlines off the end of a string. It checks each character with MD_CharIsSpace, which does not include newlines because the parser handles newlines separately.

This is counterintuitive, since newlines are obviously whitespace. I think the root of the problem is that MD_CharIsSpace should recognize newlines as well, but currently cannot because the parser treats newlines specially.

My suggestion would be to keep MD_CharIsSpace, but have it use a more conventional notion of whitespace (probably just anything recognized by isspace?). Any parser code can simply use other internal functions that are not part of the public API.
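
A minimal sketch of what that behavior could look like, using the standard isspace from <ctype.h>. The `StrView` type and `chop_trailing_whitespace` function are hypothetical stand-ins for this example, not MD_String8 or the actual Metadesk function.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical (pointer, length) string view used only for this sketch.
typedef struct {
    const char *str;
    size_t size;
} StrView;

// Shrinks the view until its last character is not whitespace.
// isspace in the "C" locale covers space, \t, \n, \v, \f, and \r.
static StrView chop_trailing_whitespace(StrView s)
{
    // The cast avoids undefined behavior for negative char values.
    while (s.size > 0 && isspace((unsigned char)s.str[s.size - 1])) {
        s.size -= 1;
    }
    return s;
}
```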

Discrepancy between MD_WIN32_Reserve and MD_LINUX_Reserve return value on failure

On Windows, VirtualAlloc returns a NULL pointer on failure. According to the Linux man page mmap(2), mmap instead returns MAP_FAILED on failure, which is defined as (void *) -1.

This means the two reserve functions report failure with different values. It doesn't currently cause any issues in the code, because in either case a failure is caught by the commit call on line 461. It is probably worth fixing though, because it could cause a big surprise in the future.
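
One way to make the two platforms agree is to normalize MAP_FAILED to NULL inside the Linux wrapper. The sketch below assumes a reserve call that maps anonymous memory with PROT_NONE; the function name `reserve_memory` and the exact mmap flags are assumptions for this example, not a copy of MD_LINUX_Reserve.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE // exposes MAP_ANONYMOUS under strict -std= modes
#endif
#include <stddef.h>
#include <sys/mman.h>

// Reserves address space without making it accessible.
// Returns NULL on failure, matching what VirtualAlloc reports on Windows.
static void *reserve_memory(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL; // translate (void *) -1 into the common failure value
    }
    return p;
}
```

Callers can then test the result against NULL on every platform.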

Formal syntax guide

Hello. I came across your project and was contemplating implementing a parser in Nim. However, I noticed there isn't really a syntax guide, so I read the API docs and the available examples and attempted a draft of one.

Node: a node associates some flags, zero or more children, zero or more tags, and a textual identifier together. Flags store the kind of node the parser believes the text represents (such as a number) and contextual information (such as whether it comes before or after a comma).

Tags: a tag is a node that begins with a @. Tags may also be directly followed by one block which contains the arguments of the tag. Zero or more tags are written before the node they belong to.

Blocks: blocks begin with an opening symbol, contain zero or more nodes, and end with a closing symbol. The opening and closing symbols do not need to match, but the symbol used is recorded as a flag so the programmer may enforce a standard if desired.

Opening symbols: {, (, and [

Closing symbols: }, ), and ]

Implied block: The character : after a node's name indicates an implied block. An implied block may consist of a single Block or it may consist of zero or more nodes until the first separator encountered. This allows two forms of syntax to exist: foo: bar baz, ... and foo: { ... }.

Separator: ; or ,. A separator punctuates a list. Which separator was used is recorded in a node's flags so the programmer may enforce a standard if desired.

Flags: flags hold special meanings attributed to a node. This can be whether the parser thinks the text represents a number or a string, what kind of boundary characters were used for the string, whether it came before a comma or semicolon, and so on.

Line comments: a line comment begins with // and goes until a new line character.

Block comments: a block comment begins with /* and continues until the matching */ is read. Block comments also nest, so for every /* there must exist a matching */.

Comments: comments come in line or block forms. Comments are read during parsing but are thrown away. They exist for authors to keep notes to themselves that are of no interest to the computer.

Escape codes: an escape code is a \ followed by one or more other symbols. For example, \\ means the backslash is itself escaped and so should be replaced with a single \ character in the output. Escape codes apply inside strings.

Strings: a string begins and ends with a boundary character. All text between the boundaries is stored within a single node as the node's text. If the boundary characters are entered in triplicate, the string is allowed to span multiple lines. Escape codes make it possible to insert special characters that cannot otherwise be entered. Single quotes, double quotes, and backticks are allowed as boundaries or triple boundaries.
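
The nesting rule for block comments described above can be sketched with a depth counter. This is only an illustration of the rule as drafted here, not Metadesk's lexer; the function name `skip_block_comment` and its return convention are made up for this example.

```c
#include <stddef.h>

// Given text[pos] at the start of a block comment opener, returns the index
// just past the closer that balances it. Returns 0 if the comment is never
// closed (0 can never be a valid end index for a comment).
static size_t skip_block_comment(const char *text, size_t len, size_t pos)
{
    size_t depth = 0;
    size_t i = pos;
    while (i + 1 < len) {
        if (text[i] == '/' && text[i + 1] == '*') {
            depth += 1;              // every opener goes one level deeper
            i += 2;
        } else if (text[i] == '*' && text[i + 1] == '/' && depth > 0) {
            depth -= 1;              // every closer comes back up one level
            i += 2;
            if (depth == 0) {
                return i;            // outermost comment is balanced
            }
        } else {
            i += 1;
        }
    }
    return 0;
}
```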

Examples

foo: {
	bar baz
}

Create a node called foo
Open an implicit grouping
Create an explicit group with {
Create nodes bar and baz inside the group
Close the explicit group with }
Since we are in an implicit group we add the newly created group node as a child of foo. Since we have triggered the special rule, though, we move the children individually to foo rather than adding the group as its own node.
