raystack / firehose

Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.

Home Page: https://raystack.github.io/firehose/

License: Apache License 2.0

Languages: Java 99.93%, Dockerfile 0.07%
Topics: kafka, sink, streaming, firehose, dataops, bigquery, postgresql, influxdb, prometheus, apache-kafka

firehose's People

Contributors

akhildv, ankittw, anukin, arujit, deepakmarathe, eyeofvinay, fdgod09, fzrvic, gauravsinghania, h4rikris, hmoniaga2, jesrypandawa, kaiwren, kevinbheda, kevinbhedag, kn-sumanth, lavkesh, mahendrakariya, mayurgubrele, njhaveri, nncrawler, prakharmathur82, pyadav, rajarammallya, ravisuhag, rohilsurana, shreyansh228, sumitaich1998, tygrash, vaish3496

firehose's Issues

Add instrumentation and logging to BQ sink.

Acceptance criteria:

  • Analysis of metrics for BQ sink.
  • Implementation.

Discussion:

Insert time (see the sketch below this list).
Counter for successes/failures of insert messages.
Number of error messages (deserialization and response errors from BigQuery) or DLQ'd messages.
Log the offset info when an error happens.
Table/dataset creation logging and metrics.
Stencil proto update logging and metrics (log exceptions etc.).
Think about completeness/freshness/deduplication.
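
A sketch of how the insert-time and success/failure metrics discussed above could be captured, assuming the Instrumentation helper seen elsewhere in the codebase (the metric names and the captureDurationSince usage here are assumptions, not the final design):

    import com.google.cloud.bigquery.InsertAllRequest;
    import com.google.cloud.bigquery.InsertAllResponse;
    import java.time.Instant;

    // Hypothetical metric names; the actual names come out of the analysis task above.
    private static final String SINK_BQ_INSERT_TIME = "firehose_sink_bq_insert_time_milliseconds";
    private static final String SINK_BQ_INSERT_TOTAL = "firehose_sink_bq_insert_total";

    private void insertWithMetrics(InsertAllRequest request) {
        Instant start = Instant.now();
        try {
            // `bigquery` is an assumed BigQuery client field on the sink.
            InsertAllResponse response = bigquery.insertAll(request);
            // Counter for successes/failures of insert messages.
            getInstrumentation().captureCount(SINK_BQ_INSERT_TOTAL, 1, "success=" + !response.hasErrors());
        } finally {
            // Insert time.
            getInstrumentation().captureDurationSince(SINK_BQ_INSERT_TIME, start);
        }
    }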

Allow ignoring unknown values in the Firehose BQ client

There's an issue when starting Firehose when the proto schema does not match the BQ table schema. A sample error looks like this:
Provided Schema does not match Table xxx. Field yyy is missing in new schema

We need a feature to ignore this error and fill in the default value if the proto schema is incomplete.
This can be done by enabling ignoreUnknownValues when building inserts on the BQ client.
source: https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll#request-body
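
A minimal sketch of what enabling this could look like with the google-cloud-bigquery Java client (dataset, table, and row contents are placeholders):

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.InsertAllRequest;
    import com.google.cloud.bigquery.InsertAllResponse;
    import com.google.cloud.bigquery.TableId;
    import java.util.Map;

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    Map<String, Object> rowContent = Map.of("yyy", "default"); // placeholder row
    InsertAllRequest request = InsertAllRequest.newBuilder(TableId.of("dataset", "table"))
            .addRow(rowContent)
            // Skip row values that do not match the table schema instead of
            // rejecting the whole insert (per the insertAll docs linked above).
            .setIgnoreUnknownValues(true)
            .build();
    InsertAllResponse response = bigquery.insertAll(request);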

Improve documentation

  • Add changelog
  • Update roadmap

Guides

  • Filters - How to use

Concepts

  • Overview
  • Add glossary
  • Add structure - to cover the code structure
  • Add details about monitoring
  • Details about filters
  • Explain in detail about templating

Reference

  • Metric names
  • FAQS

Contribute

  • Release process
  • Development Guide - Add details about local setup

Add support for JSON data for BQ sink

Currently, only the Elasticsearch and MongoDB sinks support JSON messages.
Let's add support for parsing JSON messages in the other sinks as well.

Extend OffsetManager to use for BQ sink

Description :
The AsyncConsumer in the BQ sink sends a list of futures to the OffsetManager. The OffsetManager was implemented with the CloudSink in mind; as part of this story we extend it to be used with the BQ sink as well.

Configuration :

Acceptance Criteria :

  1. Implement a commit strategy for the asynchronous consumer (see the sketch after this list)
  2. Firehose Log sink works with Consumer Mode = Async
  3. No data loss
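
One possible commit strategy is to track offsets per partition and commit only up to the first still-running message; a minimal sketch (class and method names are hypothetical, not the actual OffsetManager API):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    // Tracks per-partition offsets and computes the highest offset that is
    // safe to commit: everything below it has finished processing.
    class CommittableOffsets {
        private final Map<Integer, TreeMap<Long, Boolean>> partitions = new HashMap<>();

        synchronized void track(int partition, long offset) {
            partitions.computeIfAbsent(partition, p -> new TreeMap<>()).put(offset, false);
        }

        synchronized void markDone(int partition, long offset) {
            partitions.computeIfAbsent(partition, p -> new TreeMap<>()).put(offset, true);
        }

        // Highest committable offset: one past the last contiguous completed
        // offset, or -1 if the lowest tracked offset is still in flight.
        synchronized long committable(int partition) {
            long commit = -1;
            for (Map.Entry<Long, Boolean> e : partitions.getOrDefault(partition, new TreeMap<>()).entrySet()) {
                if (!e.getValue()) break;   // stop at the first incomplete offset
                commit = e.getKey() + 1;    // Kafka commits the next offset to read
            }
            return commit;
        }
    }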

Dependency Scan Vulnerabilities - Snyk

Below is the list of vulnerabilities reported by the dependency scan.

Summary

Tested 195 dependencies for known issues, found 127 issues, 479 vulnerable paths.

Issues to fix by upgrading:

The full list of issues is attached in the report below.
scan report.zip

If there is an exact replica of this repo on source.golabs.io, I can also help by raising an MR to fix all of these dependencies; that would make it easier for you to review.
For some reason I am not able to do this in GitLab.

firehose_sink_http_response_code_total metric sends url tag. The tag values can be unbounded.

๐Ÿ› Bug Report

private void captureHttpStatusCount(HttpEntityEnclosingRequestBase httpRequestMethod, HttpResponse response) {
        String urlTag = "url=" + httpRequestMethod.getURI().getPath();
        String statusCode = statusCode(response);
        String httpCodeTag = statusCode.equals("null") ? "status_code=" : "status_code=" + statusCode;
        getInstrumentation().captureCount(SINK_HTTP_RESPONSE_CODE_TOTAL, 1, httpCodeTag, urlTag);
    }

The URL path can be unbounded, which can result in a large number of series.

Expected Behavior

URL tag should not be sent.
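
A minimal sketch of the expected fix, dropping the unbounded url tag so only the bounded status_code tag is reported (a suggestion, not the merged change):

    private void captureHttpStatusCount(HttpResponse response) {
        String statusCode = statusCode(response);
        String httpCodeTag = statusCode.equals("null") ? "status_code=" : "status_code=" + statusCode;
        // The url tag is no longer sent, so the number of series stays bounded.
        getInstrumentation().captureCount(SINK_HTTP_RESPONSE_CODE_TOTAL, 1, httpCodeTag);
    }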

Steps to Reproduce

Steps to reproduce the behavior.

  1. Run firehose
  2. Have a Cortex series limit configured.
  3. Cortex will throw a 400 when trying to push metrics after the series limit is reached.

Environment

Kubernetes

For JDBC sink, connections are recreated with every push

๐Ÿ› Bug Report

The connections are reset after every write to the DB, leading to new connection creation and deletion with every write.

Expected Behavior

The connection should be recycled: the pool should reuse it rather than creating and destroying it on every write.

Steps to Reproduce

  1. Sink the data to a Postgres database with debug logs
  2. You will see logs like this (a sketch of the expected pooling behavior follows)
    [null connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - null - Added connection org.postgresql.jdbc.PgConnection@5211101e
    [pool-2-thread-1] DEBUG io.odpf.firehose.sink.jdbc.JdbcSink - DB response: [1]
    [pool-2-thread-1] INFO io.odpf.firehose.sink.jdbc.JdbcSink - Pushed 1 messages to db.
    [null connection closer] DEBUG com.zaxxer.hikari.pool.PoolBase - null - Closing connection org.postgresql.jdbc.PgConnection@5211101e: (connection evicted by user)
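
For reference, the expected behavior is to borrow a connection from the pool and return it on close rather than evicting it; a minimal HikariCP sketch (connection parameters and table are placeholders):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class PooledWriter {
        private final HikariDataSource pool;

        PooledWriter(String jdbcUrl, String user, String password) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl(jdbcUrl);   // e.g. jdbc:postgresql://localhost:5432/mydb
            config.setUsername(user);
            config.setPassword(password);
            this.pool = new HikariDataSource(config);
        }

        void write(String value) throws SQLException {
            // try-with-resources returns the connection to the pool on close();
            // the next write should reuse it instead of opening a new one.
            try (Connection conn = pool.getConnection();
                 PreparedStatement stmt = conn.prepareStatement("INSERT INTO t (c) VALUES (?)")) {
                stmt.setString(1, value);
                stmt.executeUpdate();
            }
        }
    }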

BQ sink to create/update the destination table/dataset.

Description :
The table is to be created based on the proto. The table is to be created/updated during startup, and schema updates are to be synchronous to avoid errors from BQ rate limiting. Updates can happen from multiple workers in parallel.
Configurations :
Configurations for the destination BQ table.
BQ labels to be applied on the table.
Acceptance Criteria :
Metrics to be captured whenever the schema is updated.
Fail if there are backward-incompatible changes with the destination, and capture metrics for the same.
Table labels to be updated appropriately.

Support asynchronous consumers with automated offset management

Description :
As part of this story, a user should be able to configure an asynchronous mode of consuming messages for the supported sinks. When the configured parallelism for sink processing is insufficient, consumers should wait.

Configurations :
Configuration of the consumption mode.
Number of threads processing batches in parallel.
Time to wait for an empty slot in the executor pool.

Acceptance Criteria :

  1. There's an AsyncConsumer that maintains futures per partition and sends the futures list to the OffsetManager (see the sketch after this list)
  2. If the pool is full, the consumer should wait for a slot to free up.
  3. No data loss.
    Note: The offset manager is out of scope for this story.
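
A minimal sketch of the pool-full wait described in point 2, using a semaphore in front of a fixed thread pool (class and config names are hypothetical, not the actual Firehose consumer):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    class BoundedSinkExecutor {
        private final ExecutorService pool;
        private final Semaphore slots;
        private final long slotWaitMillis;

        BoundedSinkExecutor(int threads, long slotWaitMillis) {
            this.pool = Executors.newFixedThreadPool(threads);
            this.slots = new Semaphore(threads);
            this.slotWaitMillis = slotWaitMillis;
        }

        // Blocks (up to the configured wait) until a slot is free, so the
        // consumer naturally backs off when parallelism is exhausted.
        Future<?> submit(Runnable batch) throws InterruptedException {
            if (!slots.tryAcquire(slotWaitMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("no free slot in executor pool");
            }
            return pool.submit(() -> {
                try {
                    batch.run();
                } finally {
                    slots.release();
                }
            });
        }
    }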

Prometheus/Cortex sink

Create a new Firehose sink that can push metrics in the Prometheus format. The sink we are going to support is Prometheus/Cortex.

Prometheus Remote Write:
Prometheus remote write allows transparently sending samples to a remote endpoint. This is primarily intended for long-term storage.

Why Cortex/Prometheus?

  • Cortex is an open-source time-series database and monitoring system for applications and microservices. Based on Prometheus, Cortex adds horizontal scaling and virtually indefinite data retention.
  • It supports Prometheus write API that can push metrics so we can use it in Firehose.
  • It supports Amazon DynamoDB, Google Bigtable, Cassandra, S3, GCS, and Microsoft Azure for long-term storage of metric data.
  • It offers a global view of Prometheus time-series data that includes data in long-term storage, greatly expanding the usefulness of PromQL for analytical purposes.
  • It can isolate data and queries from multiple different independent Prometheus sources in a single cluster, allowing untrusted parties to share the same cluster.

Support for BigQuery sink Table Clustering

Currently, Firehose already supports table partitioning in the BigQuery sink.

There are cases where partitioning alone is not enough to improve query performance on BigQuery. To cover those cases, we need a feature that also supports clustering on BigQuery tables.

Expected Behaviour (see the sketch after this list):

  • BQ sink should be able to create partitioned and clustered table
  • BQ sink should be able to create clustered tables without partitioned table
  • BQ sink should be able to modify the clustered table
  • BQ sink should be able to modify the non-clustered table
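
A minimal sketch of creating a partitioned and clustered table with the google-cloud-bigquery Java client (dataset, table, and field names are placeholders):

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.Clustering;
    import com.google.cloud.bigquery.Field;
    import com.google.cloud.bigquery.Schema;
    import com.google.cloud.bigquery.StandardSQLTypeName;
    import com.google.cloud.bigquery.StandardTableDefinition;
    import com.google.cloud.bigquery.TableId;
    import com.google.cloud.bigquery.TableInfo;
    import com.google.cloud.bigquery.TimePartitioning;
    import java.util.List;

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    Schema schema = Schema.of(
            Field.of("event_timestamp", StandardSQLTypeName.TIMESTAMP),
            Field.of("customer_id", StandardSQLTypeName.STRING));
    StandardTableDefinition definition = StandardTableDefinition.newBuilder()
            .setSchema(schema)
            // Daily partitioning on the timestamp column.
            .setTimePartitioning(TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
                    .setField("event_timestamp").build())
            // Cluster rows within each partition by customer_id.
            .setClustering(Clustering.newBuilder().setFields(List.of("customer_id")).build())
            .build();
    bigquery.create(TableInfo.of(TableId.of("my_dataset", "my_table"), definition));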

GCS sink to handle all error scenarios in message/sink errors

Description :
Implement error handling; it needs to be backward compatible: if error info is not configured (is null), the messages need to be retried.

All input message routing needs to happen as per the configuration. The error list should be part of the configuration.

We need to list some errors that are generic enough for all sinks.

Error need to be handled:

  • Deserialization Errors need to be logged
  • Unknown Fields need to be logged

Configurations :
Fail on Deserialization Errors : Default true
Fail on Unknown Fields : Default true

Acceptance Criteria :
Metrics for timeouts to be captured.
Metrics for errors to be captured.
Skipped messages to be captured, but only if the batch is successfully processed (a routing sketch follows).
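
A sketch of the configurable routing described above, assuming a generic error enum (all names here are hypothetical, not the actual Firehose types):

    import java.util.Set;

    enum ErrorType { DESERIALIZATION_ERROR, UNKNOWN_FIELDS_ERROR, SINK_UNKNOWN_ERROR }
    enum Decision { RETRY, DLQ, FAIL }

    class ErrorRouter {
        // Backward compatible: when no error info is attached (null), retry.
        static Decision route(ErrorType error, Set<ErrorType> retryable, Set<ErrorType> dlq) {
            if (error == null) return Decision.RETRY;
            if (retryable.contains(error)) return Decision.RETRY;
            if (dlq.contains(error)) return Decision.DLQ;
            return Decision.FAIL;
        }
    }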

Cloud Storage sink can be configured to write to GCS

Description :
The CloudStorage sink, after converting the proto input to Parquet files, writes them to the configured destination. The cloud storage provider is configurable.

Configuration :
Destination configurations.
Partition field

Acceptance Criteria :
Parquet files to be written to GCS filesystem.

NullPointerException on using repeated field for JSON body template in HTTP sink

๐Ÿ› Bug Report

For the HTTP sink's templatized JSON body, if a repeated field is used for JSON body creation, Firehose silently fails with a NullPointerException while reporting metrics:

java.lang.NullPointerException: null
	at java.time.Duration.between(Duration.java:473)
	at io.odpf.firehose.metrics.StatsDReporter.captureDurationSince(StatsDReporter.java:53)
	at io.odpf.firehose.metrics.Instrumentation.captureSinkExecutionTelemetry(Instrumentation.java:165)
	at io.odpf.firehose.sink.AbstractSink.pushMessage(AbstractSink.java:56)
	at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)

Expected Behavior

Firehose should throw proper errors for unsupported field types.

Steps to Reproduce

Steps to reproduce the behavior.

  1. Use an input proto that has a repeated field
  2. Use that repeated field in the JSON body template
  3. Run firehose

Need Clarification on Protobuf, Kafka and Prometheus interaction

Hello,

Apologies in advance; I was unable to access the Slack linked in the bug report menu. I am trying to connect Kafka to a Prometheus instance and I am confused about how index mapping works with Protobuf.

From the guide:

SINK_PROM_METRIC_NAME_PROTO_INDEX_MAPPING

The mapping of fields and the corresponding proto index which will be set as the metric name on Cortex. This is a JSON field.

Example value: {"2":"tip_amount","1":"feedback_ratings"}
The proto field value with index 2 will be stored as a metric named tip_amount in Cortex, and so on.
Type: required
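
For illustration, the indexes in this mapping refer to proto field numbers; a hypothetical message matching the example value would be:

    syntax = "proto3";

    message FeedbackLog {
      // Field number 1 -> metric "feedback_ratings" per the mapping above.
      float feedback_ratings = 1;
      // Field number 2 -> metric "tip_amount" per the mapping above.
      float tip_amount = 2;
    }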

Would "tip_amount" be a record header? Can firehose handle Kafka messages with variable record list lengths?

Thank you,
Liam

GCS as DLQ sink to be supported & be configurable.

Description : As part of this, multiple DLQ sinks are to be supported: Log sink and GCS, along with the Kafka DLQ sink.
Configuration :
DLQ sink & the corresponding configurations.
Acceptance Criteria :
The GCS DLQ sink should write a single file for every batch & follow the same contract.
Design Consideration :
Change in Sink Response can be carried out as part of this.

BigTable Sink using Depot

Let's add a BigTable sink in Firehose using the BigTable sink connector implementation in the ODPF Depot library.

Tasks -

  • Implement Bigtable sink using Depot library
  • Bigtable sink documentation

Add Troubleshooting section in docs

Add a Troubleshooting section in Firehose documentation containing common runtime problems and their solutions.

Sink-specific troubleshooting as well as generic issues (related to the Stencil client, Kafka consumer, etc.) need to be covered in this section.

Firehose to gRPC sink job failure with error `Uncaught exception in the SynchronizationContext. Panic! java.lang.IllegalStateException: Could not find policy 'pick_first'.`

Nov 14, 2022 5:57:37 PM io.grpc.internal.ManagedChannelImpl$2 uncaughtException
SEVERE: [Channel<1>: (127.0.0.1:6565)] Uncaught exception in the SynchronizationContext. Panic!
java.lang.IllegalStateException: Could not find policy 'pick_first'. Make sure its implementation is either registered to LoadBalancerRegistry 
        or included in META-INF/services/io.grpc.LoadBalancerProvider from your jar files.
        at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.<init>(AutoConfiguredLoadBalancerFactory.java:92)
        at io.grpc.internal.AutoConfiguredLoadBalancerFactory.newLoadBalancer(AutoConfiguredLoadBalancerFactory.java:63)
        at io.grpc.internal.ManagedChannelImpl.exitIdleMode(ManagedChannelImpl.java:406)
        at io.grpc.internal.ManagedChannelImpl$RealChannel$2.run(ManagedChannelImpl.java:972)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.internal.ManagedChannelImpl$RealChannel.newCall(ManagedChannelImpl.java:969)
        at io.grpc.internal.ManagedChannelImpl.newCall(ManagedChannelImpl.java:911)
        at io.grpc.internal.ForwardingManagedChannel.newCall(ForwardingManagedChannel.java:63)
        at io.grpc.stub.MetadataUtils$HeaderAttachingClientInterceptor.interceptCall(MetadataUtils.java:74)
        at io.grpc.ClientInterceptors$InterceptorChannel.newCall(ClientInterceptors.java:156)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
        at io.odpf.firehose.sink.grpc.client.GrpcClient.execute(GrpcClient.java:59)
        at io.odpf.firehose.sink.grpc.GrpcSink.execute(GrpcSink.java:38)
        at io.odpf.firehose.sink.AbstractSink.pushMessage(AbstractSink.java:46)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkWithFailHandler.pushMessage(SinkWithFailHandler.java:34)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkWithRetry.pushMessage(SinkWithRetry.java:54)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkFinal.pushMessage(SinkFinal.java:28)
        at io.odpf.firehose.consumer.FirehoseSyncConsumer.process(FirehoseSyncConsumer.java:43)
        at io.odpf.firehose.launch.Main.lambda$multiThreadedConsumers$0(Main.java:65)
        at io.odpf.firehose.launch.Task.lambda$run$0(Task.java:49)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Expected Behavior

The Firehose job needs to interact with the gRPC API and receive the response.

Steps to Reproduce

  1. Proto used to reproduce the scenario:
syntax = "proto3";
package io.odpf.dagger.consumer;
option java_multiple_files = true;
option java_package = "io.odpf.dagger.consumer";
option java_outer_classname = "SampleGrpcServerProto";

service TestServer {
  rpc TestRpcMethod (TestGrpcRequest) returns (TestGrpcResponse) {}
}
message TestGrpcRequest {
  string field1 = 1;
  string field2 = 2;
}
message TestGrpcResponse {
  bool success = 1;
  repeated Error error = 2;
  string field3 = 3;
  string field4 = 4;
}
message Error {
  string code = 1;
  string entity = 2;
}
  2. Write a simple gRPC API that expects two fields and returns the response as is.

  3. Run a Firehose job locally with the below properties, which consumes data from local Kafka and uses gRPC as the sink.

java -jar build/libs/firehose-0.4.2.jar

KAFKA_RECORD_PARSER_MODE=message
SINK_TYPE=grpc
INPUT_SCHEMA_PROTO_CLASS=io.odpf.dagger.consumer.TestGrpcRequest
SCHEMA_REGISTRY_STENCIL_ENABLE=false
SOURCE_KAFKA_BROKERS=127.0.0.1:9092
SOURCE_KAFKA_TOPIC=test-grpc-request
SOURCE_KAFKA_CONSUMER_GROUP_ID=sample-grpc-group-id2
SINK_GRPC_SERVICE_HOST=127.0.0.1
SINK_GRPC_SERVICE_PORT=6565
SINK_GRPC_METHOD_URL=io.odpf.dagger.consumer.TestServer/TestRpcMethod
SINK_GRPC_RESPONSE_SCHEMA_PROTO_CLASS=io.odpf.dagger.consumer.TestGrpcResponse

The job fails with the above-mentioned error.

Analysis:
In the current implementation, the gRPC client chooses the default LoadBalancerProvider ('pick_first') and the default NameResolverProvider (DNS), but their implementation classes, PickFirstLoadBalancerProvider and DnsNameResolverProvider respectively, are missing from the classpath.

We were able to solve the issue by registering the implementation classes through the service provider mechanism: create a META-INF/services folder, add a file named io.grpc.LoadBalancerProvider containing io.grpc.internal.PickFirstLoadBalancerProvider, and another file io.grpc.NameResolverProvider containing io.grpc.internal.DnsNameResolverProvider.
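
Concretely, the two provider-configuration files described above would contain (assuming the standard Gradle/Maven resources layout; ServiceLoader files allow '#' comments):

    # src/main/resources/META-INF/services/io.grpc.LoadBalancerProvider
    io.grpc.internal.PickFirstLoadBalancerProvider

    # src/main/resources/META-INF/services/io.grpc.NameResolverProvider
    io.grpc.internal.DnsNameResolverProvider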

Also, if we provide only the io.grpc.LoadBalancerProvider service file and miss the other, we get the below error:

Failed to resolve name. status=Status{code=UNAVAILABLE, description=Failed to initialize xDS, 
cause=io.grpc.xds.XdsInitializationException: Cannot find bootstrap configuration
Environment variables searched:
- GRPC_XDS_BOOTSTRAP
- GRPC_XDS_BOOTSTRAP_CONFIG

Java System Properties searched:
- io.grpc.xds.bootstrap
- io.grpc.xds.bootstrapConfig
        at io.grpc.xds.BootstrapperImpl.bootstrap(BootstrapperImpl.java:101)
        at io.grpc.xds.SharedXdsClientPoolProvider.getOrCreate(SharedXdsClientPoolProvider.java:90)
        at io.grpc.xds.XdsNameResolver.start(XdsNameResolver.java:155)
        at io.grpc.internal.ManagedChannelImpl.exitIdleMode(ManagedChannelImpl.java:412)
        at io.grpc.internal.ManagedChannelImpl$RealChannel$2.run(ManagedChannelImpl.java:972)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.internal.ManagedChannelImpl$RealChannel.newCall(ManagedChannelImpl.java:969)
        at io.grpc.internal.ManagedChannelImpl.newCall(ManagedChannelImpl.java:911)
        at io.grpc.internal.ForwardingManagedChannel.newCall(ForwardingManagedChannel.java:63)
        at io.grpc.stub.MetadataUtils$HeaderAttachingClientInterceptor.interceptCall(MetadataUtils.java:74)
        at io.grpc.ClientInterceptors$InterceptorChannel.newCall(ClientInterceptors.java:156)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
        at io.odpf.firehose.sink.grpc.client.GrpcClient.execute(GrpcClient.java:59)
        at io.odpf.firehose.sink.grpc.GrpcSink.execute(GrpcSink.java:38)
        at io.odpf.firehose.sink.AbstractSink.pushMessage(AbstractSink.java:46)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkWithFailHandler.pushMessage(SinkWithFailHandler.java:34)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkWithRetry.pushMessage(SinkWithRetry.java:54)
        at io.odpf.firehose.sinkdecorator.SinkDecorator.pushMessage(SinkDecorator.java:28)
        at io.odpf.firehose.sinkdecorator.SinkFinal.pushMessage(SinkFinal.java:28)
        at io.odpf.firehose.consumer.FirehoseSyncConsumer.process(FirehoseSyncConsumer.java:43)
        at io.odpf.firehose.launch.Main.lambda$multiThreadedConsumers$0(Main.java:65)
        at io.odpf.firehose.launch.Task.lambda$run$0(Task.java:49)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Redis TTL is not set for keyvalue and hashset data structures

๐Ÿ› Bug Report

TTL is not set for the hashset and keyvalue Redis data types.

Expected Behavior

TTL should be set

Steps to Reproduce

Steps to reproduce the behavior.
  1. Set SINK_REDIS_TTL_TYPE: DURATION|EXACT_TIME
  2. Set SINK_REDIS_TTL_VALUE: 1000 (in seconds) | <unix_timestamp_in_future>
  3. Run the Firehose (a sketch of the expected TTL behavior follows)
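
For reference, the expected behavior is to apply an expiry right after the write; a minimal Jedis sketch (key, field, and TTL values are placeholders):

    import redis.clients.jedis.Jedis;

    try (Jedis jedis = new Jedis("localhost", 6379)) {
        // Hashset entry written by the sink...
        jedis.hset("driver:42", "rating", "4.8");
        // ...should be followed by an expiry matching SINK_REDIS_TTL_VALUE.
        jedis.expire("driver:42", 1000L);   // DURATION: TTL in seconds
        // For EXACT_TIME, an absolute unix timestamp would be used instead:
        // jedis.expireAt("driver:42", unixTimestampInFuture);
    }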

BQ sink to write to the configured destination.

Description : As part of this, all records are to be written to the destination table, with error handling.
Acceptance criteria :

  • Records should be written to BQ without any data loss if there are no errors.
  • Check what exceptions should be thrown.
  • All errors should be handled and set in the response, to be retried or DLQ'd.

Cloud Storage Sink in Firehose to write Parquet Files

Description :
As part of this we introduce a Cloud Storage (CS) sink in Firehose which creates parquet-mr files and writes them to the local filesystem, to be rotated based on the configured size/time.

Configuration :
Destination configurations.
Partition field

Acceptance Criteria :

  1. Parquet files to be written with the metadata {topic, offset, partition}.
  2. Parquet files to be rotated based on size, defaulting to 256 MB.
  3. Parquet files to be rotated based on time as well, defaulting to one hour (see the rotation sketch after this list).
  4. No cleanup is necessary for Parquet files.
  5. Parquet files to be created in the configured destination path.
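
A minimal sketch of the size/time rotation check (thresholds mirror the defaults above; the class name is hypothetical):

    import java.time.Duration;
    import java.time.Instant;

    class RotationPolicy {
        private static final long MAX_BYTES = 256L * 1024 * 1024;     // 256 MB default
        private static final Duration MAX_AGE = Duration.ofHours(1);  // hourly default

        // A file is rotated when it crosses the size threshold OR has been
        // open longer than the configured duration, whichever comes first.
        static boolean shouldRotate(long currentBytes, Instant openedAt) {
            return currentBytes >= MAX_BYTES
                    || Duration.between(openedAt, Instant.now()).compareTo(MAX_AGE) >= 0;
        }
    }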

GCS sink to be optimized by parallelizing the parquet-creation / uploads.

Description :
As part of this, the GCS sink is to be optimized so that creation of Parquet files and uploading to GCS happen in parallel, without blocking each other.
Acceptance Criteria :
GCS sinks to handle commits by themselves.
Design Considerations :
SinkFactory to create sinks and fail if the async consumer is configured.
SinkFactory to decide whether commits are auto-managed by the sink.
Refactor SinkFactory & FirehoseConsumerFactory.

End to end verification of no data loss in GCS sinks.

  1. A given message is missing.
  2. The number of messages is lower on a given day.

Pre-requisite :

  1. Code fixes are done from the review comments.

Outcome:

  1. Capture any metrics that will help us answer/confirm that there is no issue in the deployment.
  2. SOPs to verify there are no missing records.
  3. Dashboards to check that metrics are coming in.
  4. Deploy on Kubernetes.
  5. Load testing.

Deprecate jaeger tracing in Firehose

WHAT ?

Remove dependency of jaeger-client from Firehose along with usages anywhere in the code.

WHY ?

As mentioned on their GitHub page, the Jaeger clients are being deprecated and users are recommended to move to the OpenTelemetry APIs and SDKs.

Announcement on Github handle
Announcement on their documentation
Issue on Github for the deprecation

Firehose, as of release 0.2, has a dependency on jaeger-client. However, tracing is not used actively in Firehose in production and hence this dependency can be removed safely.

Is there a deadline ?

As per the notice on jaeger-tracing :

We plan to continue accepting pull requests and making new releases of Jaeger clients through the end of 2021. In January 2022 we will enter a code freeze period for 6 months, during which we will no longer accept pull requests with new features, with the exception of security-related fixes. After that we will archive the client library repositories and will no longer accept new changes.

Add instrumentation and logging to GCS sink.

Acceptance criteria:

  1. Analysis of metrics for GCS sink.
  2. Implementation.
  3. Tracing.

Metrics:

  1. record_write_count, tags: filename (partition+uuid)
  2. file_open_total
  3. file_closed_total, tags: success (true/false)
  4. file_closing_time_milliseconds
  5. file_size_bytes_total
  6. file_upload_total, tags: success (true/false)
  7. file_upload_time_milliseconds
  8. file_upload_bytes
Discussion:

  1. Distribution of the file size.
  2. Upload time.
  3. Successes/failures of uploads.
  4. How many open files there are.
  5. Time taken to close Parquet files.
  6. Messages read / messages per Parquet file.
  7. Number of error messages or DLQ'd messages.
  8. Think about completeness/freshness/deduplication.

Deprecate config SINK_HTTP_PARAMETER_SCHEMA_PROTO_CLASS

WHAT ?

Deprecate config SINK_HTTP_PARAMETER_SCHEMA_PROTO_CLASS

WHY ?

For a Firehose HTTP sink configured with a header or query parameter source (that is, SINK_HTTP_PARAMETER_SOURCE != disabled), the proto class used for parsing the incoming Kafka message during request creation is configured via SINK_HTTP_PARAMETER_SCHEMA_PROTO_CLASS.

This is confusing, as there is already a config INPUT_SCHEMA_PROTO_CLASS which specifies the proto class used for parsing the incoming Kafka message.

Ideally, we would like to keep a single variable which denotes this.
