elastic / apm-data

apm-data holds definitions and code for manipulating Elastic APM data

License: Apache License 2.0

Makefile 0.09% Go 99.73% Shell 0.17%


apm-data's Issues

Remove model-modelpb compatibility layers

During the migration to protobuf we added some compatibility layers.

Once we are done, we should remove them:

model.ProtoBatchProcessor
modelprocessor.Chained
modelprocessor.PbChained -> modelprocessor.Chained

invalid input for HTTPHeader: nil, numbers and maps

Part of the ops KPI review, from the ecs logs:

decode error: data read error: v2.transactionRoot.Transaction: v2.transaction.Context: v2.context.Response: v2.contextResponse.Headers: invalid input for HTTPHeader: [<nil>]
decode error: data read error: v2.transactionRoot.Transaction: v2.transaction.Context: v2.context.Tags: Response: v2.contextResponse.Headers: invalid input for HTTPHeader: 301
decode error: data read error: v2.transactionRoot.Transaction: v2.transaction.Context: v2.context.Response: v2.contextResponse.Headers: invalid input for HTTPHeader: map[httponly:true path:/ samesite:Lax secure:true]
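The three failure modes above (null entries, bare numbers, and cookie-style maps) could be absorbed by a tolerant normalizer before strict HTTPHeader validation. A minimal sketch; `normalizeHeaderValue` is a hypothetical helper, not apm-data's decoder:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeHeaderValue coerces the loosely typed header values seen in the
// errors above (nil, bare numbers, cookie-style maps) into the []string
// form an HTTP header expects. Hypothetical helper, not part of apm-data.
func normalizeHeaderValue(v any) []string {
	switch val := v.(type) {
	case nil:
		return nil // drop null entries instead of rejecting the event
	case string:
		return []string{val}
	case float64: // JSON numbers decode as float64 (e.g. a 301 status code)
		return []string{fmt.Sprint(val)}
	case []any:
		var out []string
		for _, item := range val {
			out = append(out, normalizeHeaderValue(item)...)
		}
		return out
	case map[string]any: // e.g. map[httponly:true path:/ samesite:Lax secure:true]
		keys := make([]string, 0, len(val))
		for k := range val {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		pairs := make([]string, 0, len(keys))
		for _, k := range keys {
			pairs = append(pairs, fmt.Sprintf("%s=%v", k, val[k]))
		}
		return []string{strings.Join(pairs, "; ")}
	}
	return nil
}

func main() {
	fmt.Println(normalizeHeaderValue(float64(301)))            // [301]
	fmt.Println(normalizeHeaderValue([]any{nil, "text/html"})) // [text/html]
}
```

Whether dropping or coercing is the right behaviour is exactly what the issue should decide; the sketch only shows the mechanics.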

Add `span.id` field to transactions

We want to converge the span and transaction data models. For that, all transaction documents should also have a span.id field (where the field value is a copy of transaction.id).
This will also allow better correlation between logs and arbitrary spans (that are either spans or transactions) for OTel use cases.

OTel log records know about the span.id they belong to. However, if they are not tied to spans as SpanEvents, there's no way to tell whether the span.id in a log record refers to a span document or a transaction document. So, in the case of a transaction, the correlation breaks: we query for span.id = XYZ while transactions do not have a span.id field at all.
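The proposed change boils down to copying transaction.id into span.id on every transaction document. A sketch over a simplified flat document map; the helper name is hypothetical, only the field names come from the issue:

```go
package main

import "fmt"

// addSpanID copies transaction.id into span.id on transaction documents,
// so that logs correlated via span.id also match transactions.
// Hypothetical helper over a simplified flat document representation.
func addSpanID(doc map[string]string) {
	if txID, ok := doc["transaction.id"]; ok {
		if _, exists := doc["span.id"]; !exists {
			doc["span.id"] = txID
		}
	}
}

func main() {
	doc := map[string]string{"transaction.id": "abc123"}
	addSpanID(doc)
	fmt.Println(doc["span.id"]) // abc123
}
```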

Introduce fuzz testing

Follow-up from #63

We should introduce fuzz testing to make sure we are not missing anything

As this is a library, we might also evaluate adding fuzz testing to APM Server and fuzzing the intake endpoint directly.

OTel logs correlation breaks for transactions-mapped spans

When does the problem occur?

When receiving an OTel span S that is being mapped to a transaction (e.g. root span, or SpanKind = SERVER) and in addition an OTel log event L that is correlated to that span S (i.e. the log event has the OTLP field SpanID pointing to that span).

Problem

Correlation on the span / transaction breaks.

Reason

In the above situation we map the OTel span S to a transaction document. Thus, the OTLP field SpanID is being mapped to the transaction.id field in the internal model.

When receiving the corresponding log event L, the log event points to S through an OTLP SpanID field. However, since the log event L does not carry the characteristics of the span S (but only the SpanID) we cannot decide whether the OTLP field SpanID on the log event needs to be mapped to a span.id or a transaction.id field. As a result the SpanID OTLP field is always being mapped to the span.id field (even for associated transaction documents).

Remove processor fields

The "processor" fields are a bit of a relic, and we should aim to remove them in the long term. With that in mind, I wonder if we should change the model a little so we can remove them from the apm-data codebase, and instead set the `processor.*` fields in our ingest pipelines, or set a value on `constant_keyword` fields where it makes sense.

e.g. for metrics, processor.name and processor.event are both always "metric", so we can update their field definitions to set the value in the mapping: https://github.com/elastic/apm-server/blob/23fb1577909836ebf45e65705df3fd560de5adb1/apmpackage/apm/data_stream/app_metrics/fields/fields.yml#L30-L35
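A sketch of what such field definitions could look like, with the constant value baked into the `constant_keyword` mapping (the exact layout is assumed from the linked apm-server fields.yml):

```yaml
- name: processor.event
  type: constant_keyword
  description: Processor event.
  value: metric
- name: processor.name
  type: constant_keyword
  description: Processor name.
  value: metric
```

With a `value` set in the mapping, documents no longer need to carry the field at all; Elasticsearch fills it in at query time.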

IIANM the only exception to this is the apm.traces and apm.rum data streams, where spans and transactions end up. These both have processor.name: transaction, but they differ in processor.event (one is "span", one is "transaction"). Eventually spans and transactions should converge, but for now I think we could set the value in an ingest pipeline.

Maybe we could:

  • set event.kind to either "span" or "transaction" in the apm-data code
  • update the traces ingest pipeline to use this to populate processor.event, and then remove event.kind since those values are not valid for event.kind

WDYT?

Originally posted by @axw in #58 (comment)

Related: #47.

OTel instrumentation of OTLP consumer rejected metrics

For OTLP input, there is currently "monitoring" for UnsupportedMetricsDropped in the OTLP consumer. #156 added partial-success support to it, but it returns rejected data points instead of dropped metrics.

We would like to move away from monitoring and have OTel instrumentation of rejected metrics so that all apm-data library users will have access to it.

automatically update license when 'model_generated.go' is generated

This is a follow-up of #17 , in particular this comment.

When generating model_generated.go through the make generate command, the generated file does not contain the required license headers.

As a consequence, we have to also run make update-licenses in order to fix that.

Making make generate also update the license in model_generated.go would remove the need to execute a separate command.

input/otlp: record map-type attributes

We currently do not record map-type attributes when translating OTLP events to Elastic APM events. For now we may want to flatten the map, adding dots as needed. Hopefully in the future we will be using the Elasticsearch flattened field type, and this will be unnecessary.
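The flattening described above can be sketched as a small recursive helper; `flattenAttr` is illustrative, not the apm-data implementation:

```go
package main

import "fmt"

// flattenAttr recursively flattens a map-type attribute into dotted keys,
// e.g. {"http": {"method": "GET"}} -> {"http.method": "GET"}, which is the
// interim approach suggested above until the flattened field type is used.
func flattenAttr(prefix string, v any, out map[string]any) {
	m, ok := v.(map[string]any)
	if !ok {
		out[prefix] = v // scalar leaf: record it under the dotted key
		return
	}
	for k, val := range m {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		flattenAttr(key, val, out)
	}
}

func main() {
	out := map[string]any{}
	flattenAttr("labels", map[string]any{"http": map[string]any{"method": "GET"}, "retries": 3}, out)
	fmt.Println(out["labels.http.method"]) // GET
}
```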

Investigate proto.Clone performance impact

Local benchmarks show proto.Clone taking a noticeable amount of CPU time (~8% of total). It's probably not enough to cause a regression, but it's not great that something we introduced consumes this much CPU, as it decreases the impact of other performance improvements.

Clone uses reflection under the hood; we should try to minimize its usage.

Avoid map allocations when mapping modelpb to modeljson

The new protobuf logic allocates maps and copies from structpb.Struct to map[string]any.

We don't really need to do this: we could investigate passing the structpb.Struct type directly, encoding it to JSON with a custom marshaling method or something similar. This would improve performance and decrease memory allocations.

Improve validation strategy on empty elements

Something along the lines of []foo{validFoo, null, validFoo1} shouldn't parse successfully.

IMO we have two options here:

  • ensure that each element of a slice is set (not empty/null) as part of validation process
  • add required fields to each slice element type so that the error is caught when validating the slice elements
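Both options above amount to checking each slice element during validation. A sketch of what that could look like (the `foo` type, helper name, and error texts are illustrative, not apm-data's validator):

```go
package main

import "fmt"

// foo is a stand-in decoded type.
type foo struct{ Name string }

// validateFoos rejects slices containing null or empty elements instead of
// silently accepting them (option 1); the required-field check on Name also
// catches empty-but-non-nil objects (option 2).
func validateFoos(foos []*foo) error {
	for i, f := range foos {
		if f == nil {
			return fmt.Errorf("foos[%d]: null element", i)
		}
		if f.Name == "" {
			return fmt.Errorf("foos[%d]: missing required field 'name'", i)
		}
	}
	return nil
}

func main() {
	// []foo{validFoo, null, validFoo1} from the issue: should fail validation.
	fmt.Println(validateFoos([]*foo{{Name: "validFoo"}, nil, {Name: "validFoo1"}}))
}
```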

OpenTelemetry JVM metrics are not properly mapped

The JVM metrics reported by OpenTelemetry Java agents are not properly mapped. In elastic/apm-server#8777 we changed the mapping to comply with the change in the metrics semantic convention, however, this mapping logic seems to be ignored.

The metric documents do appear in Discover, however with wrong field names.

Example

This is how the metric document looks right now:
image

And this is how a valid JVM metrics document would look like:
image

So it seems that this mapping logic is not being applied.

Example data:

Here is some OTLP example data for the JVM metrics: https://gist.github.com/AlexanderWert/bf3b8a6cbbd02a345038bd8e8cac520f

Hypothesis:

Let's take a concrete metric: process.runtime.jvm.memory.usage

In this mapping logic it is assumed that this metric is reported as a Gauge metric type; however, in fact this metric (process.runtime.jvm.memory.usage) has the type sum (as we can see in the example data).
So very likely the root cause is the wrong metric type in the mapping logic.

Here is the OTel spec for the metrics: https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/runtime-environment-metrics/#jvm-metrics

All types of counters (Counter, UpDownCounter) are mapped to the sum metric type in the OTLP protocol!
So we need to have the mapping in the MetricTypeSum if-branch.
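The shape of the fix can be sketched with simplified stand-ins for the OTLP data-point kinds (these are not the pmetric API; the helper is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// metricType is a simplified stand-in for OTLP metric data-point kinds.
type metricType int

const (
	typeGauge metricType = iota
	typeSum
	typeHistogram
)

// shouldRemapJVM sketches the fix: the JVM remapping must accept the Sum
// type as well as Gauge, because OTLP encodes both Counter and
// UpDownCounter as Sum.
func shouldRemapJVM(name string, t metricType) bool {
	if t != typeGauge && t != typeSum {
		return false
	}
	return strings.HasPrefix(name, "process.runtime.jvm.")
}

func main() {
	fmt.Println(shouldRemapJVM("process.runtime.jvm.memory.usage", typeSum)) // true
}
```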

Setup semver

We should start versioning this library with semver, with a changelog.

Provide enablement material

We want to onboard apm-agent developers to this repository, to be able to work on open-telemetry mappings and to add new fields to the Intake API and processing.

For enablement we need to

  • add a high level overview over data flow and processing
  • provide a recorded video with a code-walkthrough
  • document which code changes, make commands, approval test adoptions, etc. are required for adding a new field; link to a reference PR
  • document which code changes are required for adding or changing open telemetry mappings
  • show which steps are required in the APM Server code base to pick up updates

Protobuf benchmarks

We do have some Go benchmarks, but as we're trying to optimize our protobuf setup, it would be nice to have an automated/reproducible way of benchmarking our protobuf setup as well.

So the idea here is to set up a suite of benchmarks which would build structs from the generated protobuf definitions, and analyze the generated size of the objects and the time to encode/decode.

Integrate protobuf definitions for model types in apm-server

Follow up on #36

Phase 2

Use object notation for data_stream fields

Currently, we're setting the data_stream.* fields in dotted notation:

DataStreamType string `json:"data_stream.type,omitempty"`
DataStreamDataset string `json:"data_stream.dataset,omitempty"`
DataStreamNamespace string `json:"data_stream.namespace,omitempty"`

This causes issues when using the reroute processor:

While I think that the reroute processor, and all processors for that matter, should support both dotted and nested field notations, we should use nested fields to work around that issue for now.

It seems unlikely that users have relied on the dotted field notation in their ingest pipeline as the set processor doesn't even work with dotted field names. The only processor for which it's possible to access dotted field names is the script processor.

The primary way to set the data_stream.* fields in an ingest pipeline is the reroute processor, but it can't be used for APM due to the dotted field notation.

[docs] create data mapping dictionary for otel -> ecs mappings

Make it easier for anyone to understand which otel semantic conventions are mapped to ECS fields when processing with apm-data logic. The challenge will be keeping this up to date if done manually.
The main audience for this documentation is UI devs, users, PMs & support engineers.

docs: document release and tag process

This repo is generally supposed to be stack-version independent, but some changes need to be pulled into minor or patch releases of the stack. New features and bug fixes need to be released in minor and patch versions that can be matched with stack versions. We need to document this.

Introduce protobuf definitions for model types

For several reasons we would like to define an efficient, stable, binary encoding for the model types, e.g. for storing events in Badger for tail-based sampling. These would be much faster to encode/decode and, more importantly, will have strong stability guarantees.

To achieve the above, we will define our intermediate, in-memory/on-disk, model types in protobuf -- this will be the source of truth. We'll take a phased approach to this, given that the existing types are used all across apm-server, and making a big-bang change would carry a significant amount of risk of introducing bugs.

With #35 merged, we have created a cleaner separation between the model types and the way they are encoded to JSON. The model types no longer have to directly map to the final document structure, though for our sanity we should probably keep them close. The model types do not need to be ECS-compliant, and we can instead evolve the JSON encoding over time without changing the model types.

Phase 1 (iteration-05)

  • introduce protobuf definitions (probably worth basing off @marclop's work in https://github.com/elastic/apm-ingest-queueless)
  • generate Go types from protobuf (into model/modelpb or something like that)
  • introduce code for JSON encoding protobuf types by transforming to internal/modeljson types, like we're doing with model.APMEvent now
  • introduce code for mapping model events to the protobuf-generated types; remove the code for translating from model types to modeljson, and instead translate model types to protoc-generated types, then protoc-generated types to modeljson

Phase 2 (iteration-06)
#52

Review protobuf int size and signed/unsigned usage

I am going to use this issue to verify the size and signed/unsigned for integers in protobuf.
Below is the list of all ints in the proto definitions. For each of them, I will validate the definition with the ingest pipeline and the json decoder and leave comments on the issue.

Note: this only looks at the integers. The floats/double aren't in here.

Once each int is validated and possibly fixed, this issue will be closed.
See #47

client.proto

  • port

uint32 port = 3;

The maximum port number is 65,535, which is well below what a uint32 can hold.
Protobuf doesn't allow setting int16, so this type, and every other port using uint32, is valid.
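Since protobuf can't enforce the 16-bit range in the schema, it has to be enforced at validation time. An illustrative helper (not apm-data's validator):

```go
package main

import "fmt"

// validatePort checks that a uint32-carried port fits the real port range.
// Protobuf has no 16-bit integer type, so uint32 is used on the wire and
// the range must be enforced by validation. Illustrative helper.
func validatePort(port uint32) error {
	if port > 65535 {
		return fmt.Errorf("port %d out of range (max 65535)", port)
	}
	return nil
}

func main() {
	fmt.Println(validatePort(443), validatePort(70000))
}
```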

destination.proto

  • port

uint32 port = 2;

See port comment in client.proto above.

event.proto

  • severity

int64 severity = 9;

Defined as int64 in modeljson, same as we have here. So there doesn't seem to be any reason to downgrade this to a lower size.

Severity int64 `json:"severity,omitempty"`

See #123

experience.proto

  • count

int64 count = 1;

This field is defined as int in the JSON decoder.

Count nullable.Int `json:"count" validate:"required,min=0"`

See #122

http.proto

  • transfer_size

optional int64 transfer_size = 4;

Defined as int64 in modeljson.

TransferSize *int64 `json:"transfer_size,omitempty"` // Non-ECS field.

  • encoded_body_size

optional int64 encoded_body_size = 5;

Defined as int64 in modeljson.

EncodedBodySize *int64 `json:"encoded_body_size,omitempty"` // Non-ECS field.

  • decoded_body_size

optional int64 decoded_body_size = 6;

Defined as int64 in modeljson.

DecodedBodySize *int64 `json:"decoded_body_size,omitempty"` // Non-ECS field.

  • status_code

int32 status_code = 7;

Defined as int in modeljson.

StatusCode int `json:"status_code,omitempty"`

See #123

log.proto

  • line

int32 line = 2;

Defined as int in modeljson.

Line int `json:"line,omitempty"`

See #123

message.proto

  • age_millis

optional int64 age_millis = 3;

Defined as int64 in modeljson.

Millis *int64 `json:"ms,omitempty"`

metricset.proto

  • Metricset/doc_count

int64 doc_count = 4;

Defined as an int64 in modeljson.
https://github.com/elastic/apm-data/blob/main/model/internal/modeljson/document.go#L76

  • Histogram/counts

repeated int64 counts = 2;

Defined as int64 in modeljson.

Counts []int64 `json:"counts"`

  • SummaryMetric/count

int64 count = 1;

Defined as int64 in modeljson.
https://github.com/elastic/apm-data/blob/e5765b8f8d8992d4360231ce86d5f57a8d637366/model/internal/modeljson/metricset.go#L51C1-L51C1

See #123

  • AggregationDuration/count

int64 count = 1;

Defined as int in modeljson.

See #122
See #123

process.proto

  • Process/ppid

uint32 ppid = 1;

This is defined as an int32 in modeljson, which seems valid.
https://github.com/elastic/apm-data/blob/main/input/elasticapm/internal/modeldecoder/v2/model.go#L509

Note: Hasn't this been deprecated in ECS?
https://github.com/elastic/ecs/blob/2fb814f063746a1fac3ff1390d2e9387bdd47a2f/docs/release-notes/8.0.asciidoc?plain=1#L16

  • Process/pid

uint32 pid = 7;

Defined as an int in modeljson, which seems valid.

Pid int `json:"pid,omitempty"`

  • ProcessThread/id

int32 id = 2;

Defined as an int in modeljson, which seems valid.

ID int `json:"id,omitempty"`

session.proto

  • sequence

int64 sequence = 2;

Defined as an int in modeljson.

Sequence int `json:"sequence,omitempty"`

See #122

source.proto

  • port

uint32 port = 4;

See port comment in client.proto above.

span.proto

  • DB/rows_affected

optional uint32 rows_affected = 1;

This is a uint32 in modeljson.

RowsAffected *uint32 `json:"rows_affected,omitempty"`

  • Composite/count

uint32 count = 2;

This is an int in modeljson.

Count int `json:"count"`

This can't be negative, so switching to uint seems valid.

stacktrace.proto

  • StacktraceFrame/lineno

optional uint32 lineno = 2;

This is a uint32 in modeljson, which seems valid.

Number *uint32 `json:"number,omitempty"`

  • StacktraceFrame/colno

optional uint32 colno = 3;

This is a uint32 in modeljson, which seems valid.

Column *uint32 `json:"column,omitempty"`

  • Original/lineno

optional uint32 lineno = 3;

This is a uint32 in modeljson, which seems valid.

Lineno *uint32 `json:"lineno,omitempty"`

  • Original/colno

optional uint32 colno = 5;

This is a uint32 in modeljson, which seems valid.

Colno *uint32 `json:"colno,omitempty"`

transaction.proto

  • SpanCount/dropped

optional uint32 dropped = 1;

This is a uint32 in modeljson, which seems valid.

Dropped *uint32 `json:"dropped,omitempty"`

  • SpanCount/started

optional uint32 started = 2;

This is a uint32 in modeljson, which seems valid.

Started *uint32 `json:"started,omitempty"`

url.proto

  • port

uint32 port = 8;

See port comment in client.proto above.

ci: Run micro-benchmarks on every commit and PR

Description

APM Data is a crucial module. Rather than relying solely on the APM Server benchmarks to detect potential performance regressions, run the Go micro-benchmarks for every commit and in every PR, so that any regressions (and performance improvements) can be caught early.

Use `modelpb.<Type>FromVTPool` wherever possible

Description

Since we added back pooling in #128, we should start using the pooled modelpb.<Type>FromVTPool wherever possible and indicate that clients should use ReturnToVTPool after they're done processing an event.
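The FromVTPool/ReturnToVTPool pair generated by vtprotobuf behaves like a typed object pool. A self-contained sketch of the intended usage pattern, using sync.Pool and a stand-in `event` type as assumptions in place of the generated code:

```go
package main

import (
	"fmt"
	"sync"
)

// event is a stand-in for a modelpb message; eventPool mimics what the
// generated <Type>FromVTPool / ReturnToVTPool pair does internally.
type event struct{ TraceID string }

var eventPool = sync.Pool{New: func() any { return &event{} }}

func eventFromPool() *event { return eventPool.Get().(*event) }

// returnToPool resets the event before handing it back, as ReturnToVTPool
// does, so the next caller never sees stale data.
func (e *event) returnToPool() {
	*e = event{}
	eventPool.Put(e)
}

func main() {
	e := eventFromPool()
	e.TraceID = "abc"
	fmt.Println(e.TraceID)
	e.returnToPool() // clients must return the event once processing is done
}
```

The key contract for library users is the last line: every pooled event must be returned exactly once, after processing completes.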

Fuzz testing

We should fuzz test the inputs, and ensure for example that decoding cannot cause panics during translation to model types.

We should also provide a test package for producing randomised/fuzzed model.Batches, to feed into a BatchProcessor. This could then be used to ensure processors do not panic or otherwise behave badly when encountering arbitrary data that passes decoding.
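A panic-safety check like the one described can be wrapped so a fuzz target only has to assert "no panics". A sketch, with json.Unmarshal standing in for the intake decoder (the real target would exercise the elasticapm/otlp input packages):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeSafely wraps a decoder so that any panic during decoding is
// surfaced as an error instead of crashing; json.Unmarshal is a stand-in
// for the intake decoder here.
func decodeSafely(data []byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("decoder panicked: %v", r)
		}
	}()
	var v map[string]any
	return json.Unmarshal(data, &v)
}

func main() {
	fmt.Println(decodeSafely([]byte(`{"transaction":{}}`))) // <nil>
}
```

A native Go fuzz target in a _test.go file would then feed it arbitrary bytes: `func FuzzIntake(f *testing.F) { f.Fuzz(func(t *testing.T, data []byte) { _ = decodeSafely(data) }) }`.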

Define a process for dealing with OTel SemConv changes

The mapping for OTel data currently supports a certain (old) version of the semantic conventions.

With the OTel Semantic Conventions being merged with ECS and being stabilized, we have to expect SemConv versions soon that will introduce many breaking changes in field names.

We need to define a process for dealing with different versions of Semantic Conventions so that we can support newer versions of SemConv while keeping backwards compatibility with older versions. The implication is that, with a given version of apm-data we should support a range of SemConv versions.

Related Info

  • Semantic conventions define schema files that enumerate all the changes between versions
  • the semantic conventions versions theoretically may even vary per signal (within a single connection / agent / SDK)

Derivation of the `span.type` from OTel data is non-deterministic

In OTel a single span can have a mix of attributes from different namespaces. For example, a span could have db.* attributes and at the same time http.* attributes.

We use the logic of this switch statement to determine the foundSpanType variable. This is done while iterating over all the attributes on a span, so effectively the last attribute visited defines foundSpanType and the corresponding mapping logic below. This is non-deterministic because we don't know the order of the span attributes.

We need a more explicit logic to derive the span.type.
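One explicit alternative is a fixed priority ranking over the attribute namespaces, so the result no longer depends on map iteration order. A sketch; the priority order and the type names are illustrative assumptions, not the apm-data mapping:

```go
package main

import "fmt"

// deriveSpanType picks span.type by a fixed priority order instead of
// "last attribute wins", making the result independent of attribute
// iteration order. attrs flags which namespaces are present, e.g.
// attrs["db"] == true if any db.* attribute is set. The ranking and the
// returned type names are illustrative.
func deriveSpanType(attrs map[string]bool) string {
	for _, ns := range []string{"db", "messaging", "rpc", "http"} {
		if attrs[ns] {
			switch ns {
			case "db":
				return "db"
			case "messaging":
				return "messaging"
			case "rpc", "http":
				return "external"
			}
		}
	}
	return "app"
}

func main() {
	// A span with both db.* and http.* attributes now always maps to "db".
	fmt.Println(deriveSpanType(map[string]bool{"db": true, "http": true}))
}
```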

Map child-ids in OTel attributes

To support inferred spans in OTel-based agents, we need to map the child-ids field from OTel attributes to the child.id field.

TBD: the OTel attribute name to map from.

review truncating otel strings that are indexed as keywords

We currently truncate otel attributes that are indexed as keywords to 1024 chars

stringval := truncate(v.Str())

The mappings are generally created with ignore_above: 1024, which would lead to not indexing this field if the value exceeds 1024 chars.

We should review whether truncating the values of otel strings is the best choice. With truncation, the field will always be indexed, but anything above 1024 chars is completely lost. Without truncation, fields exceeding the limit would not be indexed or searchable, but would remain available in _source. When moving to synthetic source, the time to retrieve the non-indexed values might increase.
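The behaviour under review is essentially a cap at the mapping's ignore_above limit. A self-contained, rune-safe sketch of such a truncate function (illustrative, not the apm-data implementation):

```go
package main

import "fmt"

const keywordLimit = 1024 // matches ignore_above: 1024 in the mappings

// truncate caps a keyword-indexed string at the mapping's ignore_above
// limit, trading the tail of the value for guaranteed indexability.
func truncate(s string) string {
	r := []rune(s)
	if len(r) <= keywordLimit {
		return s
	}
	return string(r[:keywordLimit])
}

func main() {
	fmt.Println(truncate("short value stays intact"))
}
```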
