pmx's Introduction

pmx: Node.js postmortem export library

This is a very early implementation of a library to emit a postmortem export file in the format described by the Node.js Postmortem Working Group.

The expectation is that this will eventually have at least two backends:

  • a JSON-based backend that just emits JSON objects representing the data
  • a sqlite-based backend that records the data in a sqlite database

TODO:

  • implement the actual export functions
  • build a small test suite

Contributors: davepacheco

pmx's Issues

Notes on string representation

(I'm recording these notes from August, 2017 for future reference.)

JSON:
- http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
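- how are strings encoded? See the standard; it's non-trivial. A string is a
  sequence of Unicode code points in double quotes. The quotation mark, the
  backslash, and the control characters U+0000 through U+001F must be escaped,
  and a code point outside the BMP may be written as a \u-escaped surrogate
  pair.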
- Object keys are encoded as regular strings.
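
A minimal sketch of the escaping that ECMA-404 requires, assuming the input is
already valid UTF-8 and is passed with an explicit length (so embedded NULs
survive). The name json_escape_utf8 and its signature are hypothetical, not
part of pmx:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pmx): write the JSON string literal for
 * the `len` bytes at `in` into `out`, including the surrounding quotes and a
 * terminating NUL.  The input is assumed to be valid UTF-8 already.  Returns
 * the number of bytes written (excluding the terminator), or (size_t)-1 if
 * `out` is too small.
 */
size_t
json_escape_utf8(const unsigned char *in, size_t len, char *out, size_t cap)
{
    size_t o = 0;

    assert(in != NULL || len == 0);
    assert(out != NULL);

    /* Even an empty string needs two quotes and a terminator. */
    if (cap < 3)
        return ((size_t)-1);
    out[o++] = '"';

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        /* Conservatively reserve the longest escape + quote + NUL. */
        if (cap - o < 6 + 2)
            return ((size_t)-1);

        if (c == '"' || c == '\\') {
            out[o++] = '\\';
            out[o++] = (char)c;
        } else if (c < 0x20) {
            /* Control characters, including embedded NULs. */
            char esc[7];

            (void) snprintf(esc, sizeof (esc), "\\u%04x", (unsigned)c);
            memcpy(out + o, esc, 6);
            o += 6;
        } else {
            /* Everything else, including UTF-8 bytes >= 0x80. */
            out[o++] = (char)c;
        }
    }

    out[o++] = '"';
    out[o] = '\0';
    return (o);
}
```

Object keys would go through the same function, since they are encoded as
regular strings.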

sqlite:
- how are strings encoded?
  - Looks like it can be UTF-8, UTF-16BE, or UTF-16LE.
    https://www.sqlite.org/datatype3.html
    This is determined by the open function that's used to create the
    database: sqlite3_open() and sqlite3_open_v2() default to UTF-8, and
    sqlite3_open16() defaults to UTF-16 in the native byte order
    (https://www.sqlite.org/c3ref/open.html).
  - Text is converted on the fly (https://www.sqlite.org/version3.html).
  - SQL is processed as UTF-8 (implementation detail).

V8/JavaScript:
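- Does JavaScript say how strings are encoded?
  Looks like UCS-2: https://mathiasbynens.be/notes/javascript-encoding
  BUT WITH SURROGATE PAIRS!  That is, a string is a sequence of 16-bit code
  units that is usually interpreted as UTF-16 but isn't required to be
  well-formed (it can contain unpaired surrogates).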
- Does V8 describe more precisely how strings are encoded in memory?
  - Two-byte string: each character is a two-byte uint16_t, and it's "Unicode"
  - ExternalTwoByteStrings appear to be explicitly UTF-16, but I think it's
    really UCS-2 because the characters really are only two bytes each.
  - All strings can contain null bytes
  - Elsewhere in objects.h, it says strings are either ASCII or two-byte "UC16",
    which isn't a standard encoding name (it appears to just mean a 16-bit
    code unit).
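
To make the UCS-2 vs. UTF-16 distinction concrete, here is a small sketch that
scans an array of 16-bit code units and reports how many code units, code
points, and unpaired surrogates it holds. The type utf16_stats_t and the
function utf16_scan are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary type, for illustration only. */
typedef struct {
    size_t nunits;      /* 16-bit code units (what JS .length reports) */
    size_t ncodepoints; /* code points; a lone surrogate counts as one */
    size_t nlone;       /* unpaired surrogates (not well-formed UTF-16) */
} utf16_stats_t;

utf16_stats_t
utf16_scan(const uint16_t *s, size_t n)
{
    utf16_stats_t st = { n, 0, 0 };

    assert(s != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint16_t u = s[i];

        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i++;        /* high + low surrogate: one code point */
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            st.nlone++; /* surrogate with no partner */
        }
        st.ncodepoints++;
    }
    return (st);
}
```

A string with nlone == 0 is well-formed UTF-16. One with nlone > 0 can only be
described as a bare sequence of 16-bit code units, and has no exact UTF-8
equivalent.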

Design points:
- what kind of strings do we want the libpmx utilities to consume? UTF-8?
  - for labels?
  - for strings?
- if so, how do consumers provide those strings?
- what if the string inside the C program isn't valid?
- representation inside the JavaScript program
- representation in JSON/sqlite
- do we want to have to re-encode every string?


Options:
- For strings represented as ASCII strings inside V8, can use a function that
  takes an array of ASCII characters (with a length), and just emit them.  Maybe
  try to deal with non-7-bit-clean characters.  Definitely deal with embedded
  NULs.
- For strings represented as two-byte strings inside V8, we can either convert
  them to UTF-8 for export or we can base64-encode the UCS-2/UTF-16 value.  I
  lean towards converting to UTF-8.  The downside is perf (but base64-encode is
  going to be crappy too) and potential loss of information (e.g., memory
  usage).
- For both cases, we want to provide metadata indicating:
  - the string contained invalid characters (e.g., non-7-bit-clean ASCII or an
    unconvertible UCS-2/UTF-16 value), in which case the UTF-8 representation is
    the closest approximation but may be truncated or otherwise incomplete.
  - the length (however V8/JavaScript represent this, which I think will be in
    code units).
  - the original string representation, so that the consumer can determine
    memory usage?
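
One possible shape for the metadata listed above, together with the one-byte
(ASCII) case, might look like the sketch below. Every name here (pmx_string_t,
pmx_repr_t, pmx_string_from_onebyte) is hypothetical, and replacing
non-7-bit-clean bytes with '?' is an assumption, not a decision recorded in
these notes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: how the string was represented inside V8. */
typedef enum { PMX_REPR_ONEBYTE, PMX_REPR_TWOBYTE } pmx_repr_t;

/* Hypothetical export record for one string. */
typedef struct {
    const char *ps_utf8;     /* exported bytes; may contain NULs */
    size_t      ps_utf8_len; /* explicit length, never strlen() */
    size_t      ps_nunits;   /* original length in code units */
    pmx_repr_t  ps_repr;     /* original representation */
    bool        ps_lossy;    /* true if any input was replaced */
} pmx_string_t;

/*
 * Copy `n` one-byte characters into `buf` and fill in `out`.  Bytes above
 * 0x7f are replaced with '?' (an assumption) and flagged as lossy.  Returns
 * false if `buf` is too small.
 */
bool
pmx_string_from_onebyte(const unsigned char *in, size_t n, char *buf,
    size_t cap, pmx_string_t *out)
{
    assert(in != NULL || n == 0);
    assert(buf != NULL && out != NULL);

    if (cap < n)
        return (false);

    /* A length-based copy preserves embedded NULs. */
    if (n > 0)
        memcpy(buf, in, n);

    out->ps_lossy = false;
    for (size_t i = 0; i < n; i++) {
        if (in[i] > 0x7f) {
            buf[i] = '?';
            out->ps_lossy = true;
        }
    }

    out->ps_utf8 = buf;
    out->ps_utf8_len = n;
    out->ps_nunits = n;
    out->ps_repr = PMX_REPR_ONEBYTE;
    return (true);
}
```

Keeping ps_nunits and ps_repr separate from ps_utf8_len is what would let a
consumer estimate the original memory usage even after conversion.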

For sqlite, we could store the UTF-16 value if we wanted, but I don't think
there's any advantage.  The underlying strings are probably UCS-2 anyway --
would they need a conversion?  I say we just do the same thing as we do for
JSON.

Remaining open questions:
- What's the best way to convert JavaScript-flavor UCS-2 to UTF-8 in C?
  - Can I use the mbtowc and wctomb functions?  Am I not supposed to use wchar?
    Is the encoding (UCS-2 vs. UTF-8) expressed in the locale, and I need to
    specify which locale I want?
  - Should I be using something else instead?
    - See the implementation of: ToCString() in objects.cc, which uses
      Utf8::Encode in unicode-inl.h.
      Looks like they've got their own non-trivial implementation :-/
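
One option that avoids wchar_t and the locale entirely is a hand-rolled
converter, in the same spirit as V8's own. The sketch below is not V8's
implementation; the name utf16_to_utf8 is hypothetical, and substituting
U+FFFD for unpaired surrogates is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert `n` 16-bit code units to UTF-8 in `out`.
 * Unpaired surrogates become U+FFFD and set *lossy (an assumption).
 * Returns the number of bytes written, or (size_t)-1 if `cap` is too small.
 */
size_t
utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out, size_t cap,
    bool *lossy)
{
    size_t o = 0;

    assert(in != NULL || n == 0);
    assert(out != NULL && lossy != NULL);
    *lossy = false;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        unsigned char enc[4];
        size_t len;

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* Combine a surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) +
                (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;    /* unpaired surrogate */
            *lossy = true;
        }

        if (cp < 0x80) {
            enc[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            enc[0] = (unsigned char)(0xC0 | (cp >> 6));
            enc[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            enc[0] = (unsigned char)(0xE0 | (cp >> 12));
            enc[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            enc[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            enc[0] = (unsigned char)(0xF0 | (cp >> 18));
            enc[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            enc[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            enc[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        if (cap - o < len)
            return ((size_t)-1);
        memcpy(out + o, enc, len);
        o += len;
    }
    return (o);
}
```

Each code unit produces at most three bytes of UTF-8 (a surrogate pair is two
units and four bytes), so a buffer of 3 * n bytes is always large enough. The
output is length-delimited rather than NUL-terminated because the input may
contain U+0000.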
