String Interpolation Performance Concerns

Buffering and Buffer Size Estimators

TODO

Computing the Formatted Result

Redundant Computations of Unused Result

Interpolated strings are a convenient feature in the context of logging. Logging functions often support logging levels, which let you adjust how fine-grained the logging should be. If interpolated strings are ordinary expressions, they will be computed even when logging is turned off, since evaluating them could have side effects. This goes against the goal of doing no extra work when logging is turned off.

  • One option is to define interpolated strings so that the embedded expressions are executed only if the result is used, like asserts in C.
  • Another option is to introduce call-by-name parameters (the equivalent of lambdas and delegates, but hidden). This is considered a bad feature in language design, because the expression might be evaluated multiple times in the function that uses call-by-name parameters, and this is invisible at the call site.
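
The difference between eager evaluation and an explicitly deferred message can be sketched in Python. This is only an illustration under assumptions: `expensive_state`, `log_eager`, and `log_lazy` are hypothetical names, and the explicit lambda stands in for what a hidden call-by-name parameter would do invisibly.

```python
from typing import Callable, Optional

evaluations = 0  # counts how often the expensive slot expression runs


def expensive_state() -> int:
    """Stand-in for a costly expression inside an interpolated string."""
    global evaluations
    evaluations += 1
    return 42


def log_eager(enabled: bool, message: str) -> Optional[str]:
    # The caller has already built `message`, even if logging is off.
    return message if enabled else None


def log_lazy(enabled: bool, make_message: Callable[[], str]) -> Optional[str]:
    # The message is built only when logging is on. The deferral is
    # visible at the call site as an explicit lambda.
    return make_message() if enabled else None


log_eager(False, f"state={expensive_state()}")
eager_evaluations = evaluations  # the slot ran although nothing was logged

log_lazy(False, lambda: f"state={expensive_state()}")
lazy_evaluations = evaluations - eager_evaluations  # the slot never ran
```

With logging disabled, the eager call still evaluates the slot expression once, while the deferred call does not evaluate it at all.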

Cache Pressure

Two conflicting goals:

  1. Maximize performance of an individual string interpolation.
  2. Keep pressure on instruction caches low.

When logging with interpolated strings is widespread, it might be desirable to recognize cases where the formatting, ignoring the string constants, is similar, so that it can be factored out and merged into a shared "logging function" with embedded formatting code.
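
A rough sketch of the factoring idea, in Python: log sites that differ only in their string constants share one formatting routine, selected by the slot format specs. The names `shared_formatter` and `log` are hypothetical, and Python only models the structure here; the instruction-cache effect itself would only show up in compiled code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shared_formatter(specs: tuple):
    """Return one formatter per distinct tuple of slot format specs."""
    def render(constants: tuple, values: tuple) -> str:
        out = [constants[0]]
        for spec, value, const in zip(specs, values, constants[1:]):
            out.append(format(value, spec))
            out.append(const)
        return "".join(out)
    return render


def log(constants: tuple, specs: tuple, values: tuple) -> str:
    # String constants are passed as data, so they do not multiply code.
    return shared_formatter(specs)(constants, values)


# Two log sites with different constants but the same slot specs.
a = log(("x=", " y=", ""), ("d", ".1f"), (3, 2.5))
b = log(("w: ", ", h: ", " cm"), ("d", ".1f"), (10, 7.5))
shared_count = shared_formatter.cache_info().currsize  # one shared routine
```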

String Interpolation Background Sketch

An interpolated string is a string with embedded expression slots that are capable of producing objects that can be turned into strings. The overall result of evaluating an interpolated string is a string-like object: either a string or something that can be converted into a string. Some possible technical alternatives follow, but only option 1 is considered at this point.

The compiler may transform the interpolated string into:

  1. a series of expressions creating strings that are concatenated with a builtin binary or n-ary string concatenation operator. This approach may cause significant overhead as several temporary strings may be generated at runtime.
  2. a series of statements that write directly into a size-estimated buffer that is preallocated (might involve putting limitations on expressions)
  3. a series of statements that write directly into a large user-specified preallocated buffer.
  4. a series of statements that write into a list of buffer-blocks with the ability to add new blocks.
  5. a series of statements that call into a user-provided string-builder object.
  6. a tree of formatting-capable objects that can be either converted into a string by a standard conversion or consumed directly by library functions.
  7. a standardized unexpanded serialization-encoding for performant free-form logging.
  8. a form that allows runtime translation and localization of substrings and number formatting.

One goal for this proposal is to come up with a design and syntax that support all these technical string-building approaches as possible future language extensions with minimal changes to syntax and user-code.

Scenarios with Special Needs

Please note: the following scenarios do not define what string interpolation should support, nor do they define what is important. The scenarios are meant to explore situations where a user would want some kind of templated textual expression that puts extra demands on design and implementation.

  1. Formatting strings for console output assumes a fixed-width font layout and requires the formatting to prepend or append spaces in order to get multi-line columns that are left-aligned, right-aligned, or centered. Formatting has to recognize Unicode runes with zero width and how code points are composed in order to get proper alignment.
  2. Formatting strings at compile time requires formatters that are compile-time compatible, so there might be a need to let the type system distinguish between statically resolvable and runtime-resolvable interpolated strings.
  3. Formatting strings with expression slots containing strings in encodings different from Carbon strings requires a transcoding pass. Example: Some Windows APIs use a 16-bit encoding (UTF-16, historically UCS-2).
  4. Building documents may involve a large number of writes to the same buffer or a chain of buffer-blocks. That makes performance especially important, and it might be desirable to support speculative writes that can fail (not size estimation) or to give the writer of a string-interpolated expression the means to add new blocks to the buffer-chain.
  5. Formatting strings for user interfaces may require combining localization of numeric formatting for natural language text and non-localized formatting for technical formatting of code and similar. Example: In <price salestax="0.25" lang="se">21,25SEK</price> we see that the sales-tax rate is a "technical" value used for computation and the content is a "localized" value for display.
  6. Internationalization (i18n) requires a solution for marking string constants so that they can be extracted, translated, and turned into a format that can be loaded at runtime. The expression slots should be named for easy translation and reordering. Example: $"Pay {price*tax:.2|price} to {iban[id]:|account}." could yield Pay {price} to {account}. for translation. After translation it could be broken down to ("Pay ", 0, " to ", 1, ".") or something similar.
  7. Formatting for data-exchange formats may benefit from automatic escaping of character sequences that are specific to the application, e.g. attributes in HTML elements. This could allow skipping a separate escape-encoding-pass. This feature is common in dedicated text-templating solutions.
  8. Formatting for efficient logging may move the computation cost of formatting and string processing out of the logging expression by allowing delayed formatting, either by passing it to a different thread, process, or server, or by writing to a temporary buffer. Example: Circular buffers are used for logs that record events and state leading up to an exceptional situation, such as a crash. Formatting could be stored in an intermediate format so that string formatting is delayed until the log is extracted from the circular buffer.

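The breakdown mentioned in scenario 6 can be sketched in Python. This is a minimal illustration, not a proposed design: `break_down` and `render` are hypothetical helpers, the reordered translation is made up, and the standard `string.Formatter().parse` is used only to split the translated template into constants and named slots.

```python
from string import Formatter


def break_down(template: str, slot_names: list) -> tuple:
    """Turn 'Pay {price} to {account}.' into ('Pay ', 0, ' to ', 1, '.')."""
    parts = []
    for literal, name, _spec, _conv in Formatter().parse(template):
        if literal:
            parts.append(literal)
        if name is not None:
            parts.append(slot_names.index(name))
    return tuple(parts)


def render(parts: tuple, values: list) -> str:
    # Constants are copied; integers select an already formatted value.
    return "".join(p if isinstance(p, str) else values[p] for p in parts)


slot_names = ["price", "account"]
english = break_down("Pay {price} to {account}.", slot_names)
# Made-up translation that swaps the slot order.
reordered = break_down("To {account}: pay {price}.", slot_names)

values = [format(21.25, ".2f"), "SE-001"]  # slot values, formatted earlier
english_text = render(english, values)
reordered_text = render(reordered, values)
```

Because the broken-down form is plain data, it could also be stored and rendered later, which is close to the delayed formatting described in scenario 8.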
Technical Constraints

C++20 has added a formatting protocol and also provides formatters for standard numbers, using a formatting spec close to the one used by Python. In addition, formatters for the chrono types have been added. We should expect third-party libraries to add formatters for their types as C++20 gains traction.

Interpolated strings in Carbon should be able to generate the input that the C++20 formatter protocol expects.

C++20 formatters may take additional parameters. Example syntax: ${n:{width}} where width is a variable specifying text-field-width.
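
Python's format spec, which the C++20 spec resembles, supports the same kind of nested parameter, so it can serve as a quick illustration. Note that Python writes the slot as `{n:{width}}` without the leading `$`; the `$` belongs to the example syntax above. The variable names are placeholders.

```python
n, width = 42, 6

# The nested {width} is evaluated first and becomes part of the spec.
padded = f"{n:{width}}"

# Equivalent call with the spec assembled by hand.
same = format(n, str(width))

# Width and precision can both be supplied by nested slots.
nested_precision = f"{3.14159:{width}.{2}f}"
```

Here `padded` is `42` right-aligned in a field of six characters, and `nested_precision` is `3.14` in a field of the same width.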

A formal description of the Formatter protocol can be found in the C++20 draft N4860, §20.20.5, page 730.

Other Technical Requirements

  • Consider custom formatting that may need to be localized, such as currency.
  • Put constraints on jumping out of a string literal.

Usability

System level developers are expected to use other languages for scripting and may use string-formatting only on occasion in system programming code. String formatting is common in scripting code and it might lower the cognitive load if the formatting specification for key types, such as floating point, is consistent with commonly used system level scripting tools. Familiarity with Python and/or C++ is relevant in this context. This suggests that the C++20 formatting spec for numbers should be considered.

Many languages mark expressions in interpolated strings with {…}, ${…}, #{…} or some other variation based on braces. While this may create more visual noise for data-exchange formats such as JSON, it is also an easily recognizable marker given Carbon syntax for blocks and what has been suggested for lambdas. Furthermore, JSON data should probably not be built using interpolated strings but by using a dedicated library. {…} is also consistent with Python 3 string interpolation, so it is familiar to many developers who use Python for scripting and building.

There is a need to escape } inside an interpolated string. Python and C++20 use }}, but it is unclear whether this will cause visual confusion when combined with Carbon expressions. A conservative alternative is to use \}.
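
For reference, this is how the doubled-brace escape reads in Python f-strings, including a case where an expression that itself contains braces sits next to the slot delimiters. The variable names are placeholders, and the example only shows existing Python behavior, not a Carbon design.

```python
value = 7

# "{{" and "}}" produce literal braces; "{value}" is a slot.
doubled = f"{{{value}}}"

# Literal braces around two slots.
set_text = f"set = {{{value}, {value + 1}}}"

# An expression with its own braces needs a space after the opening
# slot brace, otherwise "{{" would be read as an escaped literal brace.
lookup = f"{ {'a': 1}['a'] }"
```

The three results are `{7}`, `set = {7, 8}`, and `1`, which gives a feel for how quickly runs of braces become hard to read.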

Support for invisible context parameters for Carbon has been suggested, and this might be relevant for interpolated strings. It could be used for selecting the i18n context and also for providing buffers, allocators, or string-writers. An explicit form could be either $(context)"…" or $[context]"…".
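
As a loose analogy only, Python's `contextvars` module can model an implicit context that formatting code reads without it being passed at each call. The names `decimal_separator` and `show_price` are hypothetical, and nothing here reflects how Carbon context parameters would actually work.

```python
from contextvars import ContextVar

# Hypothetical implicit context: the decimal separator to use.
decimal_separator: ContextVar = ContextVar("decimal_separator", default=".")


def show_price(value: float) -> str:
    # The separator comes from the ambient context, not from a parameter.
    text = format(value, ".2f")
    return text.replace(".", decimal_separator.get())


default_text = show_price(21.25)

token = decimal_separator.set(",")  # enter a localized context
try:
    localized_text = show_price(21.25)
finally:
    decimal_separator.reset(token)  # leave the context again

restored_text = show_price(21.25)
```

An explicit form such as $(context)"…" would make the same selection visible at the point where the string is written.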
