
Pronounced (influxdb eye-ox), short for iron oxide. This is the new core of InfluxDB written in Rust on top of Apache Arrow.

License: Apache License 2.0

Rust 99.68% Shell 0.09% Python 0.03% Dockerfile 0.02% PLpgSQL 0.03% Java 0.15% Makefile 0.01%

influxdb_iox's Introduction

InfluxDB IOx

InfluxDB IOx (short for Iron Oxide, pronounced InfluxDB "eye-ox") is the core of InfluxDB, an open source time series database. The name is in homage to Rust, the language this project is written in. It is built using Apache Arrow and DataFusion among other technologies. InfluxDB IOx aims to be:

  • The core of InfluxDB; providing industry standard SQL, InfluxQL, and Flux
  • An in-memory columnar store using object storage for persistence
  • A fast analytic database for structured and semi-structured events (like logs and tracing data)
  • A system for defining replication (synchronous, asynchronous, push and pull) and partitioning rules for InfluxDB time series data and tabular analytics data
  • A system supporting real-time subscriptions
  • A processor that can transform and do arbitrary computation on time series and event data as it arrives
  • An analytic database built for data science, supporting Apache Arrow Flight for fast data transfer

Persistence is through Parquet files in object storage. It is a design goal to support integration with other big data systems through object storage and Parquet specifically.

For more details on the motivation behind the project and some of our goals, read through the InfluxDB IOx announcement blog post. If you prefer a video that covers a little bit of InfluxDB history and high level goals for InfluxDB IOx you can watch Paul Dix's announcement talk from InfluxDays NA 2020. For more details on the motivation behind the selection of Apache Arrow, Flight and Parquet, read this.

Platforms

Our current goal is that the following platforms will be able to run InfluxDB IOx.

  • Linux x86 (x86_64-unknown-linux-gnu)
  • Darwin x86 (x86_64-apple-darwin)
  • Darwin arm (aarch64-apple-darwin)

Project Status

This project is in active development, which is why we're not producing builds yet.

If you would like to contact the InfluxDB IOx developers, join the InfluxData Community Slack and look for the #influxdb_iox channel.

We're also hosting monthly tech talks and community office hours on the project on the 2nd Wednesday of the month at 8:30 AM Pacific Time.

Get started

  1. Install dependencies
  2. Clone the repository
  3. Configure the server
  4. Compiling and Running (You can also build a Docker image to run InfluxDB IOx.)
  5. Write and read data
  6. Use the CLI
  7. Use InfluxDB 2.0 API compatibility
  8. Run health checks
  9. Manually call the gRPC API

Install dependencies

To compile and run InfluxDB IOx from source, you'll need the following:

Rust

The easiest way to install Rust is to use rustup, a Rust version manager. Follow the instructions for your operating system on the rustup site.

rustup will check the rust-toolchain file and automatically install and use the correct Rust version for you.

C/C++ Compiler

You need a C/C++ compiler for some non-Rust dependencies, such as zstd.

lld

If you are building InfluxDB IOx on Linux then you will need to ensure you have installed the lld LLVM linker. Check if you have already installed it by running lld -version.

lld -version
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead

If lld is not already present, it can typically be installed with the system package manager.

protoc

Prost no longer bundles a protoc binary. For instructions on how to install protoc, refer to the official gRPC documentation.

IOx should then build correctly.

Postgres

The catalog is stored in Postgres (unless you're running in ephemeral mode). Postgres can be installed via Homebrew:

brew install postgresql

then follow the instructions for starting Postgres either at system startup or on-demand.

Clone the repository

Clone this repository using git. If you use the git command line, this looks like:

git clone git@github.com:influxdata/influxdb_iox.git

Then change into the directory containing the code:

cd influxdb_iox

The rest of these instructions assume you are in this directory.

Configure the server

InfluxDB IOx can be configured using either environment variables or a configuration file, making it suitable for deployment in containerized environments.

For a list of configuration options, run influxdb_iox --help after installing IOx. For configuration options for specific subcommands, run influxdb_iox <subcommand> --help.

To use a configuration file, use a .env file in the working directory. See the provided example configuration file. To use the example configuration file, run:

cp docs/env.example .env

Compiling and Running

InfluxDB IOx is built using Cargo, Rust's package manager and build tool.

To compile for development, run:

cargo build

To compile for release and install the influxdb_iox binary in your path (so you can run influxdb_iox directly) do:

# from within the main `influxdb_iox` checkout
cargo install --path influxdb_iox

The development build creates the binary at target/debug/influxdb_iox, while cargo install places a release binary on your PATH (by default in ~/.cargo/bin).

Build a Docker image (optional)

Building the Docker image requires:

  • Docker 18.09+
  • BuildKit

To enable BuildKit by default, set { "features": { "buildkit": true } } in the Docker engine configuration, or run docker build with DOCKER_BUILDKIT=1.

To build the Docker image:

DOCKER_BUILDKIT=1 docker build .

Local filesystem testing mode

InfluxDB IOx supports testing backed by the local filesystem.

Note

This mode should NOT be used for production systems: it has poor performance and offers only limited tuning knobs.

To run IOx in local testing mode, use:

./target/debug/influxdb_iox
# shorthand for
./target/debug/influxdb_iox run all-in-one

This will start an "all-in-one" IOx server with the following configuration:

  1. File backed catalog (sqlite), object store, and write ahead log (wal) stored under <HOMEDIR>/.influxdb_iox
  2. HTTP v2 api server on port 8080, querier gRPC server on port 8082 and several ports for other internal services.

You can also change the configuration in limited ways, such as choosing a different data directory:

./target/debug/influxdb_iox run all-in-one --data-dir=/tmp/iox_data

Compile and run

Rather than building and running the binary in target, you can also compile and run with one command:

cargo run -- run all-in-one

Release mode for performance testing

To compile for performance testing, build in release mode then use the binary in target/release:

cargo build --release
./target/release/influxdb_iox run all-in-one

You can also compile and run in release mode with one step:

cargo run --release -- run all-in-one

Running tests

You can run tests using:

cargo test --all

See docs/testing.md for more information.

Write and read data

Data can be written to InfluxDB IOx by sending line protocol format to the /api/v2/write endpoint or using the CLI.

For example, assuming you are running in local mode, this command will send data in the test_fixtures/lineproto/metrics.lp file to the company_sensors namespace.

./target/debug/influxdb_iox -vv write company_sensors test_fixtures/lineproto/metrics.lp --host http://localhost:8080

Note that --host http://localhost:8080 is required, as the /api/v2 endpoints are hosted on port 8080 while the CLI's default is the querier gRPC port 8082.

To query the data stored in the company_sensors namespace:

./target/debug/influxdb_iox query company_sensors "SELECT * FROM cpu LIMIT 10"

Use the CLI

InfluxDB IOx is packaged as a binary with commands to start the IOx server, as well as a CLI interface for interacting with and configuring such servers.

The CLI itself is documented via built-in help, which you can access by running influxdb_iox --help.

Use InfluxDB 2.0 API compatibility

InfluxDB IOx allows seamless interoperability with InfluxDB 2.0.

Where InfluxDB 2.0 stores data in organizations and buckets, InfluxDB IOx stores data in namespaces. IOx maps organization and bucket pairs to namespaces with the two parts separated by an underscore (_): organization_bucket.
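
As a tiny illustration of that convention (the helper below is hypothetical, not part of the IOx API):

/// Hypothetical helper showing the org/bucket -> namespace convention
/// described above; it is not part of the actual IOx API.
fn namespace_for(org: &str, bucket: &str) -> String {
    format!("{}_{}", org, bucket)
}

fn main() {
    // "company" and "sensors" map to the "company_sensors" namespace
    assert_eq!(namespace_for("company", "sensors"), "company_sensors");
}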

Here's an example using curl to send data into the company_sensors namespace using the InfluxDB 2.0 /api/v2/write API:

curl -v "http://127.0.0.1:8080/api/v2/write?org=company&bucket=sensors" --data-binary @test_fixtures/lineproto/metrics.lp

Run health checks

The HTTP API exposes a healthcheck endpoint at /health

$ curl http://127.0.0.1:8080/health
OK

The gRPC API implements the gRPC Health Checking Protocol. This can be tested with grpc-health-probe:

$ grpc_health_probe -addr 127.0.0.1:8082 -service influxdata.platform.storage.Storage
status: SERVING

Manually call the gRPC API

To manually invoke one of the gRPC APIs, use a gRPC CLI client such as grpcurl. Because the gRPC server library in IOx doesn't provide service reflection, you need to pass the IOx .proto files to your client when making requests. After you install grpcurl, you can use the ./scripts/grpcurl wrapper script to make requests that use the .proto files for you. For example:

Use the list command to list gRPC API services:

./scripts/grpcurl -plaintext 127.0.0.1:8082 list
google.longrunning.Operations
grpc.health.v1.Health
influxdata.iox.authz.v1.IoxAuthorizerService
influxdata.iox.catalog.v1.CatalogService
influxdata.iox.compactor.v1.CompactionService
influxdata.iox.delete.v1.DeleteService
influxdata.iox.ingester.v1.PartitionBufferService
influxdata.iox.ingester.v1.PersistService
influxdata.iox.ingester.v1.ReplicationService
influxdata.iox.ingester.v1.WriteInfoService
influxdata.iox.ingester.v1.WriteService
influxdata.iox.namespace.v1.NamespaceService
influxdata.iox.object_store.v1.ObjectStoreService
influxdata.iox.schema.v1.SchemaService
influxdata.platform.storage.IOxTesting
influxdata.platform.storage.Storage

Use the describe command to view methods for a service:

./scripts/grpcurl -plaintext 127.0.0.1:8082 describe influxdata.iox.namespace.v1.NamespaceService
service NamespaceService {
  ...
  rpc GetNamespaces ( .influxdata.iox.namespace.v1.GetNamespacesRequest ) returns ( .influxdata.iox.namespace.v1.GetNamespacesResponse );
  ...
}

Invoke a method:

./scripts/grpcurl -plaintext 127.0.0.1:8082 influxdata.iox.namespace.v1.NamespaceService.GetNamespaces
{
  "namespaces": [
    {
      "id": "1",
      "name": "company_sensors"
    }
  ]
}

Contributing

We welcome community contributions from anyone!

Read our Contributing Guide for instructions on how to run tests and how to make your first contribution.

Architecture and Technical Documentation

There are a variety of technical documents describing various parts of IOx in the docs directory.

influxdb_iox's People

Contributors

aakashhemadri, aierui, alamb, appletreeisyellow, brandonsov, carols10cents, crepererum, dependabot[bot], domodwyer, e-dard, jacobmarble, jeffreyssmith2nd, jeivardan, joe-blount, kodiakhq[bot], lesam, lukebond, mhilton, mkmik, nga-tran, pauldix, philjb, pierwill, ryanrussell, savage-engineer, shepmaster, stuartcarnie, tustvold, wiedld, wolffcm


influxdb_iox's Issues

TSM Reader does not support escaped characters in keys

Right now the TSM Reader does not support escaped characters in TSM keys. The parser needs to be more sophisticated to do that. We are hitting some of these keys in Tools data, so the plan is to just skip these problematic keys for now until it's time to invest in making the parser compliant with the TSM key format.

Below is an example of a problematic TSM key. Note that the org/bucket ID is shown base-16 encoded for clarity (it is not in the actual key). Further, the special byte markers for measurement and field have been replaced with \x00 and \xff for clarity.

844910ece80be8bc3c0bd4c89186ca89,\x00=query_log,env=prod01-eu-central-1,error=memory\ allocation\ limit\ reached:\ limit\ 2010000000\ bytes\,\ allocated:\ 2009999616\,\ wanted:\ 1152;\ memory\ allocation\ limit\ reached:\ limit\ 2010000000\ bytes\,\ allocated:\ 2009999616\,\ wanted:\ 1152,errorCode=invalid,errorType=user,host=queryd-algow-rw-56db6bbfcc-qg6tr,hostname=queryd-algow-rw-56db6bbfcc-qg6tr,nodename=ip-10-153-10-150.eu-central-1.compute.internal,orgID=2d653fea871432bf,ot_trace_sampled=false,role=queryd-algow-rw,source=Chrome,\xff=requeueDuration	requeueDuration

This key should be parsed into the following:

measurement: query_log

tagset: 
  * env = prod01-eu-central-1
  * error = memory\ allocation\ limit\ reached:\ limit\ 2010000000\ bytes\,\ allocated:\ 2009999616\,\ wanted:\ 1152;\ memory\ allocation\ limit\ reached:\ limit\ 2010000000\ bytes\,\ allocated:\ 2009999616\,\ wanted:\ 1152
  * errorCode = invalid
  * errorType = user
  * host = queryd-algow-rw-56db6bbfcc-qg6tr
  * hostname = queryd-algow-rw-56db6bbfcc-qg6tr
  * nodename = ip-10-153-10-150.eu-central-1.compute.internal
  * orgID = 2d653fea871432bf
  * ot_trace_sampled = false
  * role = queryd-algow-rw
  * source = Chrome

field: requeueDuration

chore: define multi-core / CPU resource allocation scheme

As discussed on #221 in the multi-threaded TSM mapper prototype, we need a strategy for handling I/O requests as well as CPU heavy requests in a manner that makes the most appropriate tradeoffs between resource utilization and availability

It should also define the pattern for how CPU heavy requests interact with async functions (aka how does an async function wait for a CPU heavy request to complete)

This strategy is likely important not only for ingest / data conversion, but also query.

This ticket tracks the work to write up the CPU / execution strategy somewhere in the docs.
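
One candidate pattern for the "how does an async function wait for CPU-heavy work" question is to offload that work to a blocking/worker pool. The sketch below uses Tokio's spawn_blocking purely as an illustration (assuming the tokio runtime with default features); it is not the strategy this ticket decides on.

use tokio::task;

/// Sketch only: hand CPU-bound work to Tokio's blocking pool so executor
/// threads stay free, then await the result from async code.
async fn handle_request(data: Vec<u8>) -> usize {
    task::spawn_blocking(move || {
        // stand-in for an expensive, CPU-bound computation
        data.iter().map(|b| *b as usize).sum::<usize>()
    })
    .await
    .expect("blocking task panicked")
}

#[tokio::main]
async fn main() {
    println!("sum = {}", handle_request(vec![1, 2, 3]).await);
}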

Create Block Type

Create a block type and definition, along with generic methods for creating blocks from slices of timestamps and values. Any block meta data should be generated during the creation process.
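
A very rough sketch of what such a type could look like (field names and the "encoding" are illustrative only; the real layout is what this ticket defines):

/// Illustrative sketch only; the real block layout and metadata are what
/// this ticket will define.
#[derive(Debug)]
struct Block {
    min_time: i64,
    max_time: i64,
    data: Vec<u8>, // encoded timestamps and values
}

impl Block {
    /// Build a block from parallel slices of timestamps and values, deriving
    /// the metadata (here just min/max time) during creation.
    fn from_points(timestamps: &[i64], values: &[f64]) -> Self {
        assert_eq!(timestamps.len(), values.len());
        let min_time = timestamps.iter().copied().min().unwrap_or(0);
        let max_time = timestamps.iter().copied().max().unwrap_or(0);
        // A real implementation would compress here; this sketch stores raw bytes.
        let mut data = Vec::with_capacity(timestamps.len() * 16);
        for (t, v) in timestamps.iter().zip(values) {
            data.extend_from_slice(&t.to_le_bytes());
            data.extend_from_slice(&v.to_le_bytes());
        }
        Block { min_time, max_time, data }
    }
}

fn main() {
    let block = Block::from_points(&[100, 200, 300], &[1.0, 2.5, 3.0]);
    println!("{}..{}, {} bytes", block.min_time, block.max_time, block.data.len());
}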

Flesh out the rest of the Line Protocol Parser

There is a basic line protocol parser now, but it's not complete; it likely doesn't handle all the edge cases.

The current line protocol format is not perfect; there are bugs and weird rules, generally around how it handles escaping special values, and then how it handles backslash literals themselves.

We have a requirement to be strictly backward compatible, including any existing bugs.

There are tests for the Go implementation which should cover most of the edge cases.

There may also be some fuzz/quickcheck style tests somewhere.

Some differences:

  • Duplicate tag key/value pairs should return an error — you can't have two tags with the same key. There's no need to de-duplicate them.

  • Tag keys and field keys must be unique — you can't have a tag and a field with the same key.

  • time should be a reserved key — you shouldn't be able to use it for a tag or field.
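
Taken together, a minimal sketch of those three checks (the input shapes here are illustrative, not the parser's real types):

use std::collections::HashSet;

/// Sketch of the rules above, checking tag keys and field keys together.
fn validate(tag_keys: &[&str], field_keys: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for key in tag_keys.iter().chain(field_keys) {
        if *key == "time" {
            return Err("'time' is a reserved key".into());
        }
        // rejects duplicate tag keys and tag/field key collisions alike
        if !seen.insert(*key) {
            return Err(format!("duplicate key: {}", key));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate(&["host", "region"], &["value"]).is_ok());
    assert!(validate(&["host", "host"], &["value"]).is_err()); // duplicate tag key
    assert!(validate(&["host"], &["host"]).is_err());          // tag/field collision
    assert!(validate(&["time"], &["value"]).is_err());         // reserved key
}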

The last performance test showed the Go line protocol parser at about 3M points/second.

Running some benchmarks in influxdb/models/ shows roughly ~110 MB/s:

% go test -test.bench BenchmarkParsePointsWithOptions
goos: darwin
goarch: amd64
pkg: github.com/influxdata/influxdb/models
BenchmarkParsePointsWithOptions/line-protocol.txt/1-12         	    1136	   1037854 ns/op	 112.17 MB/s	  280930 B/op	    1167 allocs/op
BenchmarkParsePointsWithOptions/line-protocol.txt/315-12       	       3	 334336998 ns/op	 109.68 MB/s	85948802 B/op	  366663 allocs/op
PASS
ok  	github.com/influxdata/influxdb/models	3.254s

Reject conflicting field types in TSM Files

It is possible within Influx to have the same field under a measurement with different types. A simple example:

cpu,region=west value=100i
cpu value=2.33

They are different series, and field type checking is done at the series level, so they would be allowed. However, because they're under the same measurement, in Delorean value would share a single column in the data model, which isn't possible because the types differ.

We plan not to allow this on ingest into Delorean; however, for TSM -> Delorean conversion, for now we will simply drop conflicting fields and emit warnings to the logs.

Right now encountering a field that conflicts will cause the delorean convert tool to panic.

`delorean convert` will return infinite errors if the input data has an invalid field value followed by more data

I discovered this while working on the parquet writer:

air_and_water.zip

When the file ends with a newline, conversion generates an infinite stream of:


cd /Users/alamb/Software/delorean && cargo build && ./target/debug/delorean convert tests/fixtures/lineproto/air_and_water.lp   /tmp
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
[2020-06-10T14:19:00Z INFO  delorean::commands::convert] dstool convert starting
[2020-06-10T14:19:00Z INFO  delorean::commands::convert] Read 1246 bytes from tests/fixtures/lineproto/air_and_water.lp
[2020-06-10T14:19:00Z INFO  delorean::commands::convert] Writing to output directory "/tmp"
[2020-06-10T14:19:00Z WARN  delorean::commands::convert] Ignorning line with parse error: No fields were provided
[2020-06-10T14:19:00Z WARN  delorean::commands::convert] Ignorning line with parse error: No fields were provided
[2020-06-10T14:19:00Z WARN  delorean::commands::convert] Ignorning line with parse error: No fields were provided
[2020-06-10T14:19:00Z WARN  delorean::commands::convert] Ignorning line with parse error: No fields were provided
...

Which is clearly not right

Add Support for missing types (e.g String and bool) in Line Protocol Parser

The basic goal of this ticket is to implement, in the Rust line protocol parser, the remaining data types described in
https://docs.influxdata.com/influxdb/v1.8/write_protocols/line_protocol_tutorial/

Here is an example of LP data that does not parse today:

[2020-06-15T22:23:29Z DEBUG delorean_line_parser] Error parsing line: 'system,host=Andrews-MBP.hsd1.ma.comcast.net uptime_format="5 days,  7:06" 1591894340000000000'. Error was Err(FieldSetMissing)
[2020-06-15T22:23:29Z WARN  delorean::commands::convert] Ignorning line with parse error: No fields were provided

Error reading tsm data: Error decoding index entry TODO - io error: failed to fill whole buffer (Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }

As mentioned here, #111 (comment)

When I try to decode this tsm file aal.tsm.gz, which came from /Users/alamb/.influxdbv2/engine/data/000000000000010-000000001.tsm on my local checkout of influxdb, built from master, I get an error:

#117 contains a test with a reproducer:

cargo test --package delorean -- tsm
...
---- storage::tsm::tests::decode_tsm_blocks_cpu_usage stdout ----
thread 'storage::tsm::tests::decode_tsm_blocks_cpu_usage' panicked at 'Error decoding index entry TODO - io error: failed to fill whole buffer (Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" })', src/storage/tsm.rs:700:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

bug: error reading certain parquet files from delorean in python with pandas/arrow

Repro:

Download the data in here and unzip to /tmp/nulls.lp
nulls.lp.zip

And then run

./target/debug/delorean convert /tmp/nulls.lp /tmp/good.parquet

The resulting file can't be read in python for some reason:

~/Software/virtual_envs/parquet/bin/python -c "import pandas as pa ; print(pa.read_parquet('/tmp/good.parquet'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pandas/io/parquet.py", line 310, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pandas/io/parquet.py", line 125, in read
    path, columns=columns, **kwargs
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pyarrow/parquet.py", line 1551, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pyarrow/parquet.py", line 1276, in read
    use_pandas_metadata=use_pandas_metadata)
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pyarrow/parquet.py", line 721, in read
    table = reader.read(**options)
  File "/Users/alamb/Software/virtual_envs/parquet/lib/python3.7/site-packages/pyarrow/parquet.py", line 337, in read
    use_threads=use_threads)
  File "pyarrow/_parquet.pyx", line 1130, in pyarrow._parquet.ParquetReader.read_all
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
OSError: Unknown encoding type.

dstool test failure

The tests need the dstool binary to have been built in order to run successfully:

running 2 tests
thread 'thread 'dstool_tests::convert_good_input_filenamedstool_tests::convert_bad_input_filename' panicked at '' panicked at 'called `Result::unwrap()` on an `Err` value: CargoError { cause: Some(NotFoundError { path: "/Users/edd/rust/delorean/target/debug/dstool" }) }called `Result::unwrap()` on an `Err` value: CargoError { cause: Some(NotFoundError { path: "/Users/edd/rust/delorean/target/debug/dstool" }) }', ', tests/dstool.rstests/dstool.rs::723::2323

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test dstool_tests::convert_bad_input_filename ... FAILED
test dstool_tests::convert_good_input_filename ... FAILED

failures:

failures:
    dstool_tests::convert_bad_input_filename
    dstool_tests::convert_good_input_filename

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to 

Fix: add underlying error sources to storage errors

This was noticed by @pauldix and @carols10cents here: #131 (comment)

The basic problem is that the underlying error sources are not captured in storage::Error -- see https://github.com/influxdata/delorean/blob/master/src/storage.rs#L66

So we see messages like "no such file or directory" but don't have the info about what file was being looked for. For example:

[2020-06-09T22:12:00Z WARN  delorean] Server shutdown with error: StorageError { description: "TODO - io error: No such file or directory (os error 2) (Os { code: 2, kind: NotFound, message: \"No such file or directory\" })" 

Does not inform the user that ~/.delorean was not found.
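
One common shape for the fix (a sketch only; the project may well use snafu or another error library instead) is to carry both the path and the underlying error in the error type, so the message can name the missing file:

use std::{fmt, io, path::PathBuf};

/// Sketch of an error that keeps the underlying io::Error *and* the path
/// being accessed; names are illustrative, not the real storage::Error.
#[derive(Debug)]
struct OpenDatabaseError {
    path: PathBuf,
    source: io::Error,
}

impl fmt::Display for OpenDatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unable to open database dir {:?}: {}", self.path, self.source)
    }
}

impl std::error::Error for OpenDatabaseError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

fn open_db(path: PathBuf) -> Result<std::fs::ReadDir, OpenDatabaseError> {
    std::fs::read_dir(&path).map_err(|source| OpenDatabaseError { path, source })
}

fn main() {
    if let Err(e) = open_db(PathBuf::from("/nonexistent/.delorean")) {
        // prints both the path and the underlying "No such file or directory"
        println!("{}", e);
    }
}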

Implement object storage wrapper library

We need to support multiple object stores for the durability layer. At the very least:

  • S3
  • Google Cloud Storage (objects)
  • Azure Blob Storage

And later we'll want to be able to support Minio and Ceph, but I assume those will be fine by just using the S3 API.

We'll need to support operations for put, get, delete and we'll need to be able to traverse the directory tree of object storage to see all the file listings given some prefix.
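
Roughly the kind of interface this implies, sketched with made-up names and a toy in-memory backend (the real wrapper library will likely be async and streaming):

use std::collections::HashMap;
use std::io;

/// Illustrative trait only; the real wrapper library will define its own
/// interface over S3, GCS, and Azure.
trait ObjectStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>) -> io::Result<()>;
    fn get(&self, location: &str) -> io::Result<Vec<u8>>;
    fn delete(&mut self, location: &str) -> io::Result<()>;
    /// List every stored location that starts with `prefix`.
    fn list_with_prefix(&self, prefix: &str) -> io::Result<Vec<String>>;
}

/// Tiny in-memory stand-in, just to show the shape of the operations.
#[derive(Default)]
struct InMemory(HashMap<String, Vec<u8>>);

impl ObjectStore for InMemory {
    fn put(&mut self, location: &str, bytes: Vec<u8>) -> io::Result<()> {
        self.0.insert(location.to_string(), bytes);
        Ok(())
    }
    fn get(&self, location: &str) -> io::Result<Vec<u8>> {
        self.0
            .get(location)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, location))
    }
    fn delete(&mut self, location: &str) -> io::Result<()> {
        self.0.remove(location);
        Ok(())
    }
    fn list_with_prefix(&self, prefix: &str) -> io::Result<Vec<String>> {
        Ok(self.0.keys().filter(|k| k.starts_with(prefix)).cloned().collect())
    }
}

fn main() -> io::Result<()> {
    let mut store = InMemory::default();
    store.put("org/bucket/data.parquet", vec![1, 2, 3])?;
    println!("{:?}", store.list_with_prefix("org/")?);
    Ok(())
}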

Evaluate different summary statistic data types (histogram)

We should support a summary statistic for rolled up data. Rather than storing min, max, sum, count, and different percentiles, we might look into having a single series rollup into a single summary statistic that can get serialized and deserialized and used depending on what kind of value a query is asking for.

Here's what I can think of to evaluate:

  • t-digest
  • HDRHistogram
  • DDSketch

Remove `unsafe` code that may lead to undefined behavior

I found two uses of unsafe which are invalid:

https://github.com/influxdata/delorean/blob/418b89a87b6a7e274c1e7e97b72e8d9a33004c26/src/storage/rocksdb.rs#L1051-L1053

https://github.com/influxdata/delorean/blob/418b89a87b6a7e274c1e7e97b72e8d9a33004c26/src/storage/rocksdb.rs#L1146-L1148

These convert a byte into an enum, but there's no guarantee that the enum is the size of a byte (missing #[repr(u8)]). There's also no guarantee what a specific enum variant's discriminant will be without specifying it, so the value might change between subsequent compilations.

Even if both of those cases were addressed, the code doesn't appear to guard that the values passed in are one of the valid values. Creating an enum with an invalid discriminant can lead to undefined behavior.
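
A safe alternative, sketched here with a made-up enum, is to pin the representation with explicit discriminants and convert through a fallible match instead of a transmute:

use std::convert::TryFrom;

/// Illustrative enum standing in for the one stored in RocksDB; the explicit
/// repr and discriminants pin down the byte values.
#[repr(u8)]
#[derive(Debug, PartialEq)]
enum SeriesDataType {
    I64 = 0,
    F64 = 1,
}

impl TryFrom<u8> for SeriesDataType {
    type Error = u8;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(SeriesDataType::I64),
            1 => Ok(SeriesDataType::F64),
            other => Err(other), // invalid discriminants are rejected, never transmuted
        }
    }
}

fn main() {
    assert_eq!(SeriesDataType::try_from(1), Ok(SeriesDataType::F64));
    assert!(SeriesDataType::try_from(42).is_err());
}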

Create a custom CI image to speed up builds

I feel you have outgrown the standard circleci/rust container. We could set up a nightly build pipeline that builds a new CI container with these dependencies. The pipeline will produce a container image that can be used in the CircleCI jobs.

@BondAnthony

@BondAnthony what should our next steps be here?

Implement Block reader

I need to implement a Block reader. A block reader can take a serialised block, e.g. one stored in a file, and deserialise it into a Rust Block object.

Things to consider:

  • provide support for just reading the initial part of the block without the block summary and block data.
  • provide support for lazily reading block data (not decoding it but just storing compressed data).

test: Larger data sets and streaming with object stores

From this comment:

Given all the APIs above work with async streams too, I wonder how well the 'data was sent / came back in multiple chunks' case is tested.

In particular I was thinking about these cases:

  • Input data stream: data in this function seems to be sent in a single chunk via stream_data below. I wonder if breaking it into two pieces ("arbitrary", "data") might add some additional coverage.

  • Data comes back in multiple chunks -- this will be hard to test without sending a significant number of bytes to AWS I suspect, which is likely not appropriate for unit tests.

  • The state machine for streaming list https://github.com/influxdata/delorean/pull/134/files#diff-e062a137534b1fbd6b7bfdd7d4dd9b75R282. I don't have a great idea of how to test that, as it depends on AWS. I have written code before that works great on small data (aka dev environments) but then had challenges in real situations where the data is larger; there were lurking bugs, which only appeared sporadically, that we couldn't cover with tests.

I am not sure what to suggest here (I would probably suggest using the simpler, though less efficient initial implementation if we can't come up with some way to test the more efficient streaming implementation, but I realize that in some way that is the opposite of what @pauldix suggested)

feat: Add INFO level messages for each gRPC request that is received.

From #197 (comment)

To manage and debug a production server, we need to have visibility into the requests it is serving

This issue tracks adding an INFO level message for each gRPC request that is received -- one INFO log message per request seems to be the standard in most servers (e.g. web servers)

The message should also include the client address, and other salient information

Making Google cloud-storage crate async

I wanted to have this issue to track my work on making the cloud-storage crate async as this work will be... asynchronous, awaiting (puns intended) the maintainer to review my PRs, etc.

Support OpenTelemetry for tracing Delorean

We are using the tracing crate now for structured, event-based diagnostics.

If we also use something that builds on the tracing framework, such as tracing-opentelemetry, then we will be able to emit Delorean traces to our InfluxData Jaeger instance.

This will be particularly useful for when we begin performance testing in GKE. What do you think @influxdata/delorean?

Bug: incorrect null handling in parquet files

While trying to file what I thought was a bug in the parquet writer, I found a bug in the delorean parquet writer's null handling (again due to my misunderstanding of how it works)

Specifically, a def_level of 0 means there is a missing (null) value in that row.

So, for example, to encode 5 rows with two nulls, there should be only 3 string values:

let string_vals = [ByteArray::from("one"), ByteArray::from("two"), ByteArray::from("three")];
let def_levels = [1, 0, 1, 0, 1]; // 1 = value present, 0 = null
let rep_levels = [1, 1, 1, 1, 1];

If you encode the above data using parquet, and read it back (via python/pandas) you get the following:

  string_field
0          one
1         None
2          two
3         None
4        three

If we do the same today in the delorean parquet writer we'll end up with

string_field
0          one
1         None
2              <-- empty string
3         None
4         two

Fix parquet writer encodings to be as intended

It turns out that the parquet files written by delorean/master are dictionary encoding all columns, due to a misunderstanding on my part of how the rust parquet writer properties work.

Slack reference: https://influxdata.slack.com/archives/CRK9M8L5Q/p1592917468423600

This ticket is for updating the code so that the encodings used are those described in https://github.com/influxdata/delorean/blob/master/docs/encoding_thoughts.md, with the exception of BYTE_STREAM_SPLIT for float fields, which isn't supported by the rust parquet writer yet (see details on #180)

Swap any use of std::sync::RwLock to tokio::sync::RwLock

Just came across this post today: https://www.reddit.com/r/rust/comments/f4zldz/i_audited_3_different_implementation_of_async/

Looks like std::sync::RwLock will end up having the problem of readers starving the writers on Linux (our primary deployment platform). Based on the post, the tokio implementation is fair, so writers will be able to obtain the lock even in heavy read workloads.

Although this may not be necessary in the end if we end up partitioning all data and going lock free in the actual storage and caching bits.
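
For reference, a minimal sketch of the target pattern (illustrative types, assuming tokio with its rt, macros, and sync features enabled); the tokio guard is awaited rather than blocked on, so it also composes with async code:

use std::sync::Arc;
use tokio::sync::RwLock;

/// Illustrative only: shared state guarded by tokio's fair RwLock instead of
/// std::sync::RwLock. Readers and writers `.await` the guard.
#[tokio::main]
async fn main() {
    let shared = Arc::new(RwLock::new(Vec::<String>::new()));

    {
        // writers are not starved even under heavy read load
        let mut w = shared.write().await;
        w.push("cpu,host=a value=1".to_string());
    }

    let r = shared.read().await;
    println!("{} lines buffered", r.len());
}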

Rearrange / rename modules to reflect better what they do

Per @pauldix -- #168 (comment)

I think the work is basically to rearrange the code in delorean along the following lines:


This can all come later in a refactor, but here's how I think of organizing things, let me know if this makes sense. I'm not sure there's a need for a delorean_ingest crate. I think of it like:

  • influxdb_line_protocol crate, which parses to Packers
  • influxdb_tsm crate, which parses a single tsm file, many tsm files, or a directory (recursively) into Packers
  • delorean_parquet crate, which can convert Packers to Parquet and back again
  • delorean_mem crate, or some such name, to hold the Packers definition and other in-mem representation, which I'll talk about next

For those conversions, I'd assume there would be interfaces to return a vec, iterator of vecs, or stream of vecs. But I'm basically thinking that we have a single in-memory representation for this data. It's Packers for now, but originally we were thinking of Arrow RecordBatches.

For the actual code tying those crates together, it seems like it would be minimal so I'd just put it on the convert.rs command itself.

Support full nanosecond precision timestamps in parquet

We discovered that the version of the rust parquet writer we are using doesn't support timestamps with nanosecond precision, so we have worked around it in #166 by truncating time values to microsecond level precision.
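
The workaround boils down to dropping the sub-microsecond digits, roughly like this (a sketch of the idea, not the actual #166 code):

/// Sketch of the truncation workaround: an InfluxDB nanosecond timestamp is
/// reduced to microsecond precision so the current parquet writer can store it.
fn truncate_to_micros(nanos: i64) -> i64 {
    nanos / 1_000
}

fn main() {
    let ts_nanos: i64 = 1_591_894_340_000_000_123; // nanosecond epoch
    assert_eq!(truncate_to_micros(ts_nanos), 1_591_894_340_000_000);
}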

The goal of this issue is to figure out how to support nanosecond precision timestamps in parquet files (perhaps by adding it to the rust parquet implementation).

Here is some additional commentary from @pauldix from here: #166 (comment)

Hmmm this one is tricky. We'll need to add nanosecond precision to the Rust library since we need to be able to support it. Although if we have a bunch of LP data that's coming in with the nano precision areas zeroed out, then we could automatically convert that over.

Basically, Influx timestamps are nanosecond epochs. We encourage people to zero out the precision they don't need, but in practice I think most people don't (the client libraries don't make this easy). I'm not sure if Telegraf does this, but it definitely should. No reason to have even millisecond level of precision if you're only collecting every 10s.

At the time of writing, the underlying rust parquet library doesn't support nanosecond timestamp precision yet.

Timestamp handling (including nanosecond support) was changed as part of Parquet version 2.6 according to https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-260

The rust implementation claims to only support parquet-version 2.4 https://github.com/apache/arrow/tree/master/rust/parquet#supported-parquet-version

Create float64 encoder and decoder

This is net new so it doesn't need to support TSM1 blocks. This one must support NaN and +/-INF. Just the float encoder itself, not the timestamps.
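
One way to keep that requirement honest is to treat values as raw bit patterns, since NaN and +/-INF only round-trip reliably at the bit level. The sketch below shows only the round trip, not any actual compression scheme:

/// Sketch: going through the raw u64 bit pattern preserves every f64 exactly,
/// including NaN and +/-INF. A real encoder would compress the bits (e.g. XOR
/// with the previous value); this only demonstrates the round trip.
fn encode(value: f64) -> u64 {
    value.to_bits()
}

fn decode(bits: u64) -> f64 {
    f64::from_bits(bits)
}

fn main() {
    for v in [1.5, f64::INFINITY, f64::NEG_INFINITY, f64::NAN] {
        let round_tripped = decode(encode(v));
        // NaN != NaN, so compare bit patterns instead of values
        assert_eq!(round_tripped.to_bits(), v.to_bits());
    }
}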

TSM decoder for u64

This is pretty easy; it's basically the i64 encoders but with a different conversion.
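
My reading of that "different conversion" (a hedged sketch, not code from this repo): signed values go through a zigzag step before the integer compression so small negative numbers stay small, while unsigned values can be used as-is:

/// Sketch of the difference between the i64 and u64 paths.
fn zigzag_decode(encoded: u64) -> i64 {
    ((encoded >> 1) as i64) ^ -((encoded & 1) as i64)
}

fn u64_decode(encoded: u64) -> u64 {
    // no zigzag step needed; the bits already represent an unsigned value
    encoded
}

fn main() {
    assert_eq!(zigzag_decode(0), 0);
    assert_eq!(zigzag_decode(1), -1);
    assert_eq!(zigzag_decode(2), 1);
    assert_eq!(u64_decode(42), 42);
}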

Improved `delorean convert` ergonomics - convert entire directories of files

As proposed by @pauldix on #164 (review)

Goal is to make the delorean convert command easier to use:

  • You can hand it a directory and it will convert all TSM files in that dir. It should also have an option that walks the directory tree and converts all TSM files below. In practice, that's what people would use this for over just a single file.

How the collection of TSM files is organized (as in what data goes into each file) should be totally opaque to the user. So the thing they'll know is what their database directory is and they will just point the tool at that.
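
The directory walk itself is straightforward; a standard-library-only sketch (the real CLI flags and the recursion option are up to the implementation):

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sketch only: recursively collect every `.tsm` file under `dir`, in the
/// spirit of "point the tool at your database directory".
fn find_tsm_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            find_tsm_files(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "tsm") {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    find_tsm_files(Path::new("."), &mut files)?;
    for f in &files {
        // a real `delorean convert` would run the conversion here
        println!("would convert {}", f.display());
    }
    Ok(())
}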

Build IOx with rust stable rather than a nightly build (apache arrow needs rust nightly-2020-04-22)

This issue tracks figuring out how to build apache arrow using rust stable rather than the nightly-2020-04-22 toolchain. The reason to do this is so that IOx itself could also be built using rust stable.

@carols10cents and @shepmaster had previously looked at the Rust Arrow implementation and think that nightly is used because it uses packed_simd and SIMD is still nightly-only https://github.com/apache/arrow/blob/master/rust/arrow/README.md#simd-single-instruction-multiple-data

it does look like arrow can have a feature that can be enabled/disabled for SIMD support, so theoretically it could work (but just be slower) on stable: https://github.com/apache/arrow/blob/a70b4a06f3cf657f08f80cee83b61f8799828539/rust/arrow/Cargo.toml#L48-L56

Here is a link to more details in slack:
https://influxdata.slack.com/archives/CRK9M8L5Q/p1589984277329600

Parquet: Investigate using BYTE_STREAM_SPLIT encoding for float fields

It is likely float measurements would get better compression if we used BYTE_STREAM_SPLIT encoding (an encoding specifically designed for floating point values) as called for in https://github.com/influxdata/delorean/blob/master/docs/encoding_thoughts.md

However, as of this writing, the rust parquet writer doesn’t support that encoding (it was added in parquet version 2.8 https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-280 but the rust implementation only supports parquet-version 2.4: https://github.com/apache/arrow/tree/master/rust/parquet#supported-parquet-version)

The goal of this ticket is to:

  1. Test how much better compression can be obtained for floating point fields (for example, by re-compressing a delorean parquet file using one of the libraries that DOES support BYTE_STREAM_SPLIT)
  2. Summarize the results of this test
  3. If the results show significant promise, file a ticket to actually add support for BYTE_STREAM_SPLIT (e.g. by adding a Rust implementation https://influxdata.slack.com/archives/CRK9M8L5Q/p1592917609423900?thread_ts=1592917468.423600&cid=CRK9M8L5Q or some other mechanism)

feat: multi-core TSM conversion

When running a command such as delorean convert input.tsm output, at the moment only a single CPU core is used. The goal of this ticket is to improve the speed of the conversion process by using multiple cores (and the pattern described in #223)

Improve error handling / log errors in src/server.rs

There is a lot of code in server.rs that translates some error into a BadRequest but does not log the underlying error. This made it more challenging to debug my interactions with delorean. The idea is to replace this pattern (note the map_err with _):

    let read_info: ReadInfo =
        serde_urlencoded::from_str(query).map_err(|_| StatusCode::BAD_REQUEST)?;

with one that logs the error to the server logs.
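
A self-contained sketch of that pattern with stand-in types (the real code would use the server's tracing/log macros and its actual request types rather than eprintln! and a toy parser):

/// Sketch only: the underlying cause is logged before being mapped to a
/// generic "bad request" value, instead of being discarded by map_err(|_| ...).
#[derive(Debug)]
struct BadRequest;

fn parse_query(query: &str) -> Result<u32, BadRequest> {
    query.parse::<u32>().map_err(|e| {
        // keep the underlying cause in the server logs before returning 400
        eprintln!("error parsing query {:?}: {}", query, e);
        BadRequest
    })
}

fn main() {
    assert!(parse_query("12").is_ok());
    assert!(parse_query("not-a-number").is_err()); // logs the parse error first
}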

Create S3 data organization configuration objects

We'll need to be able to specify many different ways to organize data in S3. This means that the way to organize this data will need to be configurable at a per-organization or per-bucket level.

Add readers and writers for storage files

Storage files contain indexes, blocks and footers. See here:

      ╔════════════════════════════════════════════════════════════════════════════════════FILE═════════════════════════════════════════════════════════════════════════════════════╗                                                                                             
      ║┌──────────┐┌───────┐┌──────────┐┌──────────┐┌───────┐╔══════════════╗╔═══════════════════════════════════════════════════════════╗╔════════════════════════════╗┌──────────┐║                                                                                             
      ║│          ││       ││          ││          ││       │║              ║║                                                           ║║                            ║│          │║                                                                                             
      ║│ Checksum ││Version││ Min Time ││ Max Time ││ Index │║    INDEX     ║║                          BLOCKS                           ║║           FOOTER           ║│  Footer  │║                                                                                             
      ║│    4B    ││  1B   ││    8B    ││    8B    ││ Size  │║     <N>      ║║                            <N>                            ║║            <N>             ║│  Offset  │║                                                                                             
      ║│          ││       ││          ││          ││<vint> │║              ║║                                                           ║║                            ║│    4B    │║                                                                                             
      ║│          ││       ││          ││          ││       │║              ║║                                                           ║║                            ║│          │║                                                                                             
      ║└──────────┘└───────┘└──────────┘└──────────┘└───────┘╚══════════════╝╚═════════════════════════════════╦═════════════════════════╝╚════════════════════════════╝└──────────┘║                                                                                             
      ╚═══════════════════════════════════════════════════════════════╦════════════════════════════════════════╬═════════════════════════════════════════╦══════════════════════════╝                                                                                             
                                                                      │                                        │                                         │                                                                                                                        
                                                                      │                                        │                                         │                                                                                                                        
                                                                      │                                        │                                         │                                                                                                                        
                                                                      │                                        │                                         │                                                                                                                        
                                                              ┌───────┘                                        └───────────┐                             │                                                                                                                        
                                                              │                                                            │                             └────────────────────────────────────────────────────────┐                                                               
                                                              │                                                            │                                                                                      │                                                               
                                                              │                                                            │                                                                                      │                                                               
                                                              │                                                            │                                                                                      │                                                               
                                                              │                                                            │                                                                                      │                                                               
                                                              │                                                            │                                                                                      │                                                               
╔════════════════════════════INDEX════════════════════════════▼╗                                         ╔═════════════════▼═══════════BLOCKS═════════════════════════════╗                 ╔═════════════════════▼═════════════════FOOTER═══════════════════════════════════════╗
║ ┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌──────────┐ ║                                         ║╔══════════════╗╔══════════════╗╔══════════════╗╔══════════════╗║                 ║┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌──────────┐            ║
║ │          ││          ││          ││          ││          │ ║                                         ║║              ║║              ║║              ║║              ║║                 ║│          ││          ││          ││          ││          ││          │            ║
║ │    ID    ││ Min Time ││ Max Time ││  Offset  ││          │ ║                                         ║║    BLOCK     ║║    BLOCK     ║║    BLOCK     ║║              ║║                 ║│    ID    ││ Key Size ││   Key    ││    ID    ││ Key Size ││   Key    │            ║
║ │    4B    ││    8B    ││    8B    ││    4B    ││  . . .   │ ║                                         ║║     <N>      ║║     <N>      ║║     <N>      ║║     ...      ║║                 ║│  <vint>  ││  <vint>  ││   <N>    ││  <vint>  ││  <vint>  ││   <N>    │            ║
║ │          ││          ││          ││          ││          │ ║                                         ║║              ║║              ║║              ║║              ║║                 ║│          ││          ││          ││          ││          ││          │            ║
║ │          ││          ││          ││          ││          │ ║                                         ║║              ║║              ║║              ║║              ║║                 ║│          ││          ││          ││          ││          ││          │            ║
║ └──────────┘└──────────┘└──────────┘└──────────┘└──────────┘ ║                                         ║╚══════╦═══════╝╚══════════════╝╚══════════════╝╚══════════════╝║                 ║└──────────┘└──────────┘└──────────┘└──────────┘└──────────┘└──────────┘            ║
╚══════════════════════════════════════════════════════════════╝                                         ╚═══════╬════════════════════════════════════════════════════════╝                 ╚════════════════════════════════════════════════════════════════════════════════════╝
                                                                                                                 │                                                                                                                                                                
             ┌───────────────────────────────┐                                                                   │                                                                                                  ┌──────────────────────────────────────┐                      
             │ The Index section for a file  │                                                                   │                                                                                                  │   The Footer section for a file is   │                      
             │   maps series ID to a block   │                                                                   │                                                                                                  │stored as a compressed block of data. │                      
             │      offset for that ID.      │                                                                   │                                                                                                  │ It is designed to not be frequently  │                      
             │                               │                                                                   │                                                                                                  │read, but used for DR purposes, or to │                      
             │ Multiple blocks can exist for │                                                                   └──────────┐                                                                                       │       initialise new indexes.        │                      
             │ the same ID. In that case the │                                                                              │                                                                                       │                                      │                      
             │offset will point to the first │                                                                              │                                                                                       └──────────────────────────────────────┘                      
             │ of those blocks. The min and  │                                                                              │                                                                                                                                                     
             │max times will cover all blocks│                                                                              │                                                                                                                                                     
             │          for the ID.          │                                                                              │                                                                                                                                                     
             │                               │                                                                              │                                                                                                                                                     
             │                               │                                                                              │                                                                                                                                                     
             └───────────────────────────────┘                                                                              │                                                                                                                                                     
                                                  ╔═════════════════════════════════════════════════════════════════BLOCK═══▼═════════════════════════════════════════════════════════════╗                                                                                       
                                                  ║┌────────┐┌──────┐┌────────┐┌────────┐┌──────┐┌───────┐┌────────────┐┌──────────┐┌─────────┐╔═════════════╗╔══════════════════════════╗║                                                                                       
                                                  ║│        ││      ││        ││        ││      ││       ││            ││          ││         │║             ║║                          ║║                                                                                       
                                                  ║│Checksum││  ID  ││Min Time││Max Time││ Rem  ││ Block ││Summary Size││   Data   ││  Data   │║   SUMMARY   ║║           DATA           ║║                                                                                       
                                                  ║│   4B   ││  4B  ││   8B   ││   8B   ││  4B  ││ Type  ││     1B     ││  Offset  ││  Size   │║     <N>     ║║           <N>            ║║                                                                                       
                                                  ║│        ││      ││        ││        ││      ││  1B   ││            ││    2B    ││   4B    │║             ║║                          ║║                                                                                       
                                                  ║│        ││      ││        ││        ││      ││       ││            ││          ││         │║             ║║                          ║║                                                                                       
                                                  ║└────────┘└──────┘└────────┘└────────┘└──────┘└───────┘└────────────┘└──────────┘└─────────┘╚═════════════╝╚══════════════════════════╝║                                                                                       
                                                  ╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝                                                                                       

The Block portion is done, but the rest needs implementing.

chore: Move tsm reader code into its own crate

Right now, https://github.com/influxdata/delorean/blob/master/src/storage/tsm.rs is in the delorean crate which means that any other crate in delorean that needs the tsm reader (e.g. the dstool) ends up needing a dependency on delorean which is a bit messy, as pointed out by @shepmaster here: https://github.com/influxdata/delorean/pull/112/files#r435880289

The idea would be to pull the tsm reader into its own crate; I think @e-dard said he had some idea / plan to do so.
