Base repo for setting up a machine to run tests. Once the system has been set up, follow up by pushing the code to be tested.
-- notes on comparing the JVM UTF-8 decoder with others --
The JVM runs much faster when you get rid of unnecessary heap, for example by starting it with a small initial heap (-Xms1m). For AWS Lambda, see:
# JAVA_TOOL_OPTIONS environment variable on AWS Lambda
/usr/bin/time -v java -cp target/scala-2.12/hello-scala.jar coreutils.WC UTF_8_test.txt
/usr/bin/time -v java -Xms1m -cp target/scala-2.12/hello-scala.jar coreutils.WC UTF_8_test.txt
SpiderMonkey's Unicode handling seems to rely heavily on HarfBuzz. HarfBuzz's Unicode classification seems to be the most up to date.
GNU and BSD wc use mbrtowc from the C library:
#include <locale.h>
#include <wchar.h>
mbstate_t state;
// converts the next multibyte (narrow) character to a wide character, based on the locale
size_t mbrtowc(wchar_t *restrict pwc, const char *restrict s, size_t n,
               mbstate_t *restrict ps);
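To make the mbrtowc approach concrete, here is a minimal sketch of counting characters in a buffer, in the style of wc -m. It is not the actual GNU or BSD wc code. The function name count_chars is hypothetical, and counting each invalid byte as one character is an assumption made for this example. The result depends on the locale selected with setlocale.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not taken from GNU or BSD wc): count the multibyte
 * characters in buf[0..len) according to the current locale.
 * Assumption: an invalid byte is skipped and counted as one character. */
static size_t count_chars(const char *buf, size_t len)
{
    mbstate_t state;
    size_t count = 0;
    size_t i = 0;

    memset(&state, 0, sizeof state);        /* start in the initial shift state */
    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + i, len - i, &state);

        if (n == (size_t)-1) {              /* invalid sequence (EILSEQ) */
            memset(&state, 0, sizeof state); /* state is undefined after an error */
            i += 1;
            count += 1;
        } else if (n == (size_t)-2) {       /* incomplete character at end of buffer */
            break;
        } else {
            if (n == 0)                     /* decoded an embedded NUL byte */
                n = 1;
            i += n;
            count += 1;
        }
    }
    return count;
}
```

Call setlocale(LC_ALL, "") once at program start so that mbrtowc follows the user's locale. In a UTF-8 locale the 6-byte string "h\xc3\xa9llo" counts as 5 characters.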
The latest Apple libc calls a locale-specific mbrtowc.
.NET Runtime UTF-8: on Windows 10, the path of the debug binary of the C library is C:\WINDOWS\system32\ucrtbased.dll.