
Rust client for Apache Kafka

License: MIT License


kafka-rust's Introduction

Kafka Rust Client

Build Status

Project Status

This project is now maintained by John Ward. The current status: I am bringing the project up to date with the latest dependencies, removing deprecated Rust code, and adjusting the tests.

New Home

Welcome to kafka-rust's new home: https://github.com/kafka-rust

Documentation

Sponsors

Thank you to our sponsors. Their support lets me spend more time on this project and also helps with infrastructure costs.

Upstash

Upstash: Serverless Kafka

  • True Serverless Kafka with per-request-pricing
  • Managed Apache Kafka, works with all Kafka clients
  • Built-in REST API designed for serverless and edge functions

Start for free in 30 seconds!

Installation

This crate works with Cargo and is available on crates.io. The API is currently under heavy development, although we do follow semantic versioning (but expect the version number to grow quickly).

[dependencies]
kafka = "0.9"

To build kafka-rust the usual cargo build should suffice. The crate supports various features which can be turned off at compile time. See kafka-rust's Cargo.toml and cargo's documentation.

Supported Kafka version

kafka-rust is tested for compatibility with a select few Kafka versions from 0.8.2 to 3.1.0. However, not all features from Kafka 0.9 and newer are supported yet.

Examples

As mentioned, the cargo generated documentation contains some examples. Further, standalone, compilable example programs are provided in the examples directory of the repository.

Consumer

This is a higher-level consumer API for Kafka and is provided by the module kafka::consumer. It provides convenient offset management support on behalf of a specified group. This is the API a client application of this library wants to use for receiving messages from Kafka.
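A minimal sketch of typical Consumer usage, modeled on the crate's documented example; exact builder options (e.g. with_offset_storage) may differ slightly between versions:

    use kafka::consumer::{Consumer, FetchOffset, GroupOffsetStorage};

    fn main() {
        let mut consumer = Consumer::from_hosts(vec!["localhost:9092".to_owned()])
            .with_topic("my-topic".to_owned())
            .with_group("my-group".to_owned())
            .with_fallback_offset(FetchOffset::Earliest)
            .with_offset_storage(GroupOffsetStorage::Kafka)
            .create()
            .unwrap();

        loop {
            for ms in consumer.poll().unwrap().iter() {
                for m in ms.messages() {
                    println!("{:?}", m);
                }
                // mark the message set as consumed
                let _ = consumer.consume_messageset(ms);
            }
            // commit the consumed offsets on behalf of the group
            consumer.commit_consumed().unwrap();
        }
    }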

Producer

This is a higher-level producer API for Kafka and is provided by the module kafka::producer. It provides convenient automatic partition assignment capabilities through partitioners. This is the API a client application of this library wants to use for sending messages to Kafka.
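A minimal sketch of typical Producer usage, likewise modeled on the crate's documented example:

    use std::time::Duration;
    use kafka::producer::{Producer, Record, RequiredAcks};

    fn main() {
        let mut producer = Producer::from_hosts(vec!["localhost:9092".to_owned()])
            .with_ack_timeout(Duration::from_secs(1))
            .with_required_acks(RequiredAcks::One)
            .create()
            .unwrap();

        // The default partitioner assigns the record to a partition automatically.
        producer
            .send(&Record::from_value("my-topic", "hello, kafka".as_bytes()))
            .unwrap();
    }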

KafkaClient

KafkaClient in the kafka::client module is the central point of this API. However, it is a mid-level abstraction for Kafka, suitable rather for building higher-level APIs on top of it. Applications typically want to use the already mentioned Consumer and Producer. Nevertheless, the main features of KafkaClient are (see the sketch after this list):

  • Loading metadata
  • Fetching topic offsets
  • Sending messages
  • Fetching messages
  • Committing a consumer group's offsets
  • Fetching a consumer group's offsets
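A minimal sketch of direct KafkaClient usage, loading metadata and listing the topics known to the cluster; consult the generated documentation for the exact API of your version:

    use kafka::client::KafkaClient;

    fn main() {
        let mut client = KafkaClient::new(vec!["localhost:9092".to_owned()]);
        // Load metadata for all topics known to the cluster.
        client.load_metadata_all().unwrap();
        for topic in client.topics().names() {
            println!("topic: {}", topic);
        }
    }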

Bugs / Features / Contributing

There's still a lot of room for improvement on kafka-rust. Not everything works right at the moment, and testing coverage could be better. Use it in production at your own risk. Have a look at the issue tracker and feel free to contribute by reporting new problems or contributing to existing ones. Any constructive feedback is warmly welcome!

As usual with open source, don't hesitate to fork the repo and submit a pull request if you see something that should be changed. We'll be happy to see kafka-rust improving over time.

Integration tests

When working locally, the integration tests require Docker (1.10.0+) and docker-compose (1.6.0+). Run them via the included run-all-tests script in the tests directory; see the script itself for details on its usage.

Creating a topic

Note: unless otherwise explicitly stated in the documentation, this library will ignore requests to topics which it doesn't know about. In particular, it will not try to retrieve messages from non-existing/unknown topics. (This behavior is very likely to change in a future version of this library.)

Given a local kafka server installation you can create topics with the following command (where kafka-topics.sh is part of the Kafka distribution):

kafka-topics.sh --topic my-topic --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1

See also Kafka's quickstart guide for more information.

Alternative/Related projects

  • rust-rdkafka is an emerging alternative Kafka client library for Rust based on librdkafka. rust-rdkafka provides a safe Rust interface to librdkafka.

kafka-rust's People

Contributors

aaabramov, dead10ck, flier, hamirmahal, humb1t, jedisct1, johnward, l4l, mblair, midnightexigent, msk, quartzinquartz, r3h0, sevagh, spicavigo, thijsc, tshepang, winding-lines, xitep


kafka-rust's Issues

Handle broker outage

kafka-rust currently does a poor job of handling errors. It basically pushes all the responsibility to client code, whose only option is to reload metadata and retry the operation. There are various situations, though, where kafka-rust can do much better.

In particular - in the scope of this ticket - we'd like Producer and Consumer to try to "survive" the outage of a particular kafka broker in a cluster. In other words, the two clients shall transparently deal with partition leader re-assignment and temporary kafka errors.

More details to come.

Specify Kafka Protocol Version

I've been playing around with this library, and it appears to not work with Kafka version 0.8.2.1 as I keep getting unexpected EOF errors. Does this library just not support Kafka 0.8 or am I doing something wrong?

library not found for -lsnappy on OS X El Capitan

Just updated to OS X El Capitan and can't use your library

 cargo build

returns

ld: library not found for -lsnappy

here is the output from homebrew:

Igors-MacBook-Pro: igorkhomenko$ brew install snappy
Warning: snappy-1.1.3 already installed
Warning: You are using OS X 10.11.
We do not provide support for this pre-release version.
You may encounter build failures or other breakage.

Fetching group topic offsets does not work

I'm trying to use this lib to calculate the offset lag for a particular group ID. It seems able to fetch topic metadata through KafkaClient::fetch_offsets just fine, but when specifying a group with KafkaClient::fetch_group_topic_offsets, it returns an UnknownTopicOrPartition error in the resulting offset entry.

fn main() {
    ...
    let kafka_config = config.kafka_config;

    let mut client = KafkaClient::new(kafka_config.broker_list.clone());
    client.load_metadata(&kafka_config.topics).unwrap();

    let offsets = client.fetch_offsets(&kafka_config.topics, FetchOffset::Latest)
        .unwrap();
    println!("{:#?}", offsets);

    for topic in kafka_config.topics {
        let group_offsets = client.fetch_group_topic_offsets(&kafka_config.group, &topic)
            .unwrap();
        println!("{:#?}", group_offsets);
    }
}

this prints out

{
    "foo": [
        PartitionOffset {
            offset: Ok(
                299905
            ),
            partition: 3
        },
        PartitionOffset {
            offset: Ok(
                299905
            ),
            partition: 6
        },
    ...
}
[
    TopicPartitionOffset {
        offset: Err(
            Kafka(
                UnknownTopicOrPartition
            )
        ),
        topic: "foo",
        partition: 3
    },
    TopicPartitionOffset {
        offset: Err(
            Kafka(
                UnknownTopicOrPartition
            )
        ),
        topic: "foo",
        partition: 6
    },
    ...
]

These are active Kafka (v0.9) topics that are known to be working with the official Java library, and have committed offsets. It's also worth mentioning that when I create a Consumer and poll messages on the same group ID, it does not find a committed offset for the group, even though it does have one; it falls back to the offset specified by the fallback option.

consumer: allow usage without a group

It should be possible to use Consumer without a group; if no group is specified reading starts at the Consumer's fallback_offset. Committing offsets through a Consumer without a group shall not result in an error but translate into a no-op.
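A minimal sketch of the intended semantics, using a stand-in type rather than the crate's actual Consumer:

    struct Consumer {
        group: String, // empty string: no group configured
    }

    impl Consumer {
        fn commit_consumed(&mut self) -> Result<(), String> {
            if self.group.is_empty() {
                // No group: committing offsets is a no-op, not an error.
                return Ok(());
            }
            // ... otherwise commit the consumed offsets for `self.group` ...
            Ok(())
        }
    }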


Consumer performance

The Consumer currently clones every message it finally delivers to its client code (and further continues to reference the originally cloned message). Given that messages can be pretty large, this needs to be changed. (I fear we might need to drop the Iterator implementation and provide some other, potentially more complicated, means of iterating the fetched messages.)

consumer: move skipping of messages with earlier offsets than requested into KafkaClient

Consumer already properly skips messages with a smaller offset than the one requested. This occurs mainly on topics with compressed messages as Kafka merely hands out the compressed block of messages containing the requested offset.

We'd like to make this "skipping" feature available directly through the KafkaClient::fetch_messages method, since that is a guarantee the client should already be providing. It seems this could already be done in the protocol layer while parsing the fetch_messages response, which would avoid iterating over the messages again after protocol parsing.
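A sketch of the skipping step itself, with a hypothetical message type; the real implementation would operate on the protocol's fetch response structures:

    struct FetchedMessage {
        offset: i64,
        value: Vec<u8>,
    }

    // Drop messages whose offset precedes the one the client asked for; Kafka
    // may return such messages when handing out a whole compressed block.
    fn skip_earlier(messages: Vec<FetchedMessage>, requested: i64) -> Vec<FetchedMessage> {
        messages.into_iter().filter(|m| m.offset >= requested).collect()
    }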

We shall not forget to clean up the code in Consumer.

Clarify ownership in API / Avoid needless allocations

Hello ...

with ... edec13c Removed references from struct methods since they want the ownership of objects anyway ... the API has changed to "swallow" host and topic names. Was there a particular reason for doing so? If I understand it correctly, this is consistent with various client input/output structures (e.g. util::ProduceMessage).

However, the fact that the API "swallows" topics as Strings requires clients to clone them on their input side (for repeated calls of the corresponding methods). Further, the fact that "topics" are Strings in the output structures implies repeated allocations, even though the topic often varies little. This seems needless given that the client already maintains an allocated map of topics internally (which could be reused).

Are we open to changes that would break the current client API?

SASL/Kerberos support

Similar to #50, Kafka 0.9.0 added support for SASL/Kerberos authentication in KAFKA-1686, so it'd also be nice if this library supported it. I've started framing out a GSSAPI binding which can take care of the authentication handshake, but we'll also need some socket wrapper that can manage wrapping and unwrapping messages in SASL encrypted packets.

KafkaClient.send_messages() only sends the last message

Hi,

Looks like KafkaClient.send_messages() only sends one message (always the last one) no matter what the size of the messages vector is.

Even the sample code:

    let m1 = "a".to_string().into_bytes();
    let m2 = "b".to_string().into_bytes();
    let req = vec!(utils::ProduceMessage{topic: "my-topic".to_string(), message: m1},
                    utils::ProduceMessage{topic: "my-topic".to_string(), message: m2});
    client.send_messages(1, 100, req);  // required acks, timeout, messages

Actually only writes m2. Consumers only get b, and the offset is only increased by 1 even though both messages should go to the same topic.

Reading snappy compressed message

Reading a topic with snappy compressed messages results in fewer messages delivered through fetch_messages (and fetch_messages_multi) than actually delivered by Kafka (I'm using a 0.8.2.1 Kafka server).

Snappy compression

Sorry for lazily opening an issue instead of coming up with a pull request, but would it be possible to support Snappy compression for sent (not received) messages?

Thanks for kafka-rust!

Update dependency on crc

Once crc with PR#7 is published on crates.io, update our dependency on crc. This will align the version of lazy_static, which we currently inherit in two distinct versions: once through the crc module and once through the openssl module.

cannot build, missing `-lsnappy`

I tried to build with this crate for simple consumer usage locally on osx 10.11.3. System details:

(master)$ uname -a
Darwin Computer-2.local 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
(master)$ cargo --version
cargo 0.11.0-nightly (3ff108a 2016-05-24)
(master)$ rustc --version
rustc 1.10.0-nightly (97e3a2401 2016-05-26)
(master)$

Example app here: https://github.com/samphippen/kafka-example-app.

The error is this:

(master)$ cargo build --verbose
       Fresh gcc v0.3.28
       Fresh log v0.3.6
       Fresh fnv v1.0.2
       Fresh pkg-config v0.3.8
       Fresh lazy_static v0.1.16
       Fresh crc v1.2.0
       Fresh libc v0.2.11
       Fresh byteorder v0.4.2
       Fresh ref_slice v1.0.0
       Fresh miniz-sys v0.1.7
       Fresh openssl-sys v0.7.13
       Fresh snappy v0.3.0
       Fresh flate2 v0.2.14
       Fresh lazy_static v0.2.1
       Fresh bitflags v0.7.0
       Fresh openssl-sys-extras v0.7.13
       Fresh openssl v0.7.13
       Fresh kafka v0.3.1
   Compiling scraper_worker v0.1.0 (file:///Users/sam/dev/jack-the-scrobbler/scraper_worker)
     Running `rustc src/main.rs --crate-name scraper_worker --crate-type bin -g --out-dir /Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug --emit=dep-info,link -L dependency=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug -L dependency=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps --extern kafka=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libkafka-d02df2bf964e3426.rlib -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-d799b404184be4f1/out -L native=/usr/local/Cellar/openssl/1.0.2g/lib -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-sys-extras-c284211a61168afa/out -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/miniz-sys-60c8d67696f63a43/out`
error: linking with `cc` failed: exit code: 1
note: "cc" "-m64" "-L" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/scraper_worker.0.o" "-o" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/scraper_worker" "-Wl,-dead_strip" "-nodefaultlibs" "-L" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug" "-L" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps" "-L" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-d799b404184be4f1/out" "-L" "/usr/local/Cellar/openssl/1.0.2g/lib" "-L" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-sys-extras-c284211a61168afa/out" "-L" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/miniz-sys-60c8d67696f63a43/out" "-L" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libkafka-d02df2bf964e3426.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/liblog-bf16bb9a4912b11d.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libflate2-d719035eaa7c6a88.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libsnappy-4450cd7892562f9a.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libcrc-1fceeef3c778b3f2.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libfnv-4e449da685eb30c8.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libbyteorder-7a494f72a43262ac.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/liblazy_static-3a04918be71c80ee.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libminiz_sys-722889de4af2439c.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libopenssl-77636b95c8014a35.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/liblazy_static-359f5533c970cd71.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libbitflags-0e272044714c8076.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libopenssl_sys_extras-4b36efa6bfc0515d.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libopenssl_sys-c114e0e82665d5da.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/liblibc-6b483f9a7097e9a4.rlib" "/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libref_slice-115b3b4ccf1706d0.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libstd-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libcollections-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/librustc_unicode-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libunwind-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/librand-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liballoc-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liballoc_jemalloc-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liblibc-cb705824.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libcore-cb705824.rlib" "-l" "snappy" "-l" "snappy" "-l" "ssl" "-l" "crypto" "-l" "System" "-l" "pthread" "-l" "c" "-l" "m" "-l" "compiler-rt"
note: ld: library not found for -lsnappy
clang: error: linker command failed with exit code 1 (use -v to see invocation)

error: aborting due to previous error
error: Could not compile `scraper_worker`.

Caused by:
  Process didn't exit successfully: `rustc src/main.rs --crate-name scraper_worker --crate-type bin -g --out-dir /Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug --emit=dep-info,link -L dependency=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug -L dependency=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps --extern kafka=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/deps/libkafka-d02df2bf964e3426.rlib -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-d799b404184be4f1/out -L native=/usr/local/Cellar/openssl/1.0.2g/lib -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/openssl-sys-extras-c284211a61168afa/out -L native=/Users/sam/dev/jack-the-scrobbler/scraper_worker/target/debug/build/miniz-sys-60c8d67696f63a43/out` (exit code: 101)

Any advice? It looks like I'm missing lib snappy, but I'd prefer to not have to install any libraries on my system to work with this package. Is this something where we need to statically link libsnappy in the build of this crate?

client/consumer/producer: time api

Rework the public API such that it accepts std::time::Duration in places where it currently expects the user to provide a duration in milliseconds. This will be less prone to unit mismatches and accidental errors. So far the places identified are:

  • kafka::producer::Builder::with_ack_timeout
  • kafka::client::KafkaClient::produce_messages (ack_timeout parameter)
  • kafka::client::KafkaClient::set_retry_backoff_time and kafka::client::KafkaClient::retry_backoff_time
  • kafka::client::KafkaClient::fetch_max_wait_time and kafka::client::KafkaClient::set_fetch_max_wait_time
  • kafka::consumer::Builder::with_fetch_max_wait_time

Since const_fn is not stable yet (and Duration constructor functions are not const), we'll keep the type of the following constants but rename them for the sake of clarity:

  • kafka::producer::DEFAULT_ACK_TIMEOUT -> DEFAULT_ACK_TIMEOUT_MILLIS
  • kafka::client::DEFAULT_RETRY_BACKOFF_TIME -> DEFAULT_RETRY_BACKOFF_TIME_MILLIS
  • kafka::client::DEFAULT_FETCH_MAX_WAIT_TIME -> DEFAULT_FETCH_MAX_WAIT_TIME_MILLIS

Internally, we might persist the values as milliseconds if necessary to avoid repeated unnecessary conversions from the duration representation. However, the conversion of a Duration to an i32 (as often required by the kafka protocol) might fail; we shall introduce an InvalidDuration error (ideally with some information about which duration was found to be invalid) and fail if the conversion is not safely possible: For Producer and Consumer we would fail upon the call to their Builder::create() method. In KafkaClient we'll be failing lazily in fetch_message and produce_messages.
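A sketch of the proposed fallible conversion; the InvalidDuration variant and the helper name are assumptions, not the final API:

    use std::convert::TryFrom;
    use std::time::Duration;

    #[derive(Debug)]
    enum Error {
        // hypothetical error variant carrying the offending duration
        InvalidDuration(Duration),
    }

    // Convert a Duration to the i32 milliseconds expected by the Kafka
    // protocol, failing instead of silently truncating on overflow.
    fn protocol_millis(d: Duration) -> Result<i32, Error> {
        i32::try_from(d.as_millis()).map_err(|_| Error::InvalidDuration(d))
    }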

This will be a breaking change of course.

All new, future public API aspects around a duration shall be represented with the information being exposed as a std::time::Duration.

consumer: automatically assigned partitions

Great work on this library so far! Do you have any plans to support automatic load balancing of partitions over consumers, as described here?

If so, anything I could do to help get this done?

producer: ensure distributing messages with the same key to the same partition

The "key" - if available - of a message is typically used to derive a consistent partition which a message should be delivered to. Usually one computes a hash code of the key and maps that consistently on one of the partitions of the topic.

Clients of the Producer API can achieve this already by supplying a custom Partitioner to the Producer. However, we shall make this the standard behaviour of the DefaultPartitioner.

Unfortunately, topic metadata held by the Producer (which is supplied by KafkaClient) deliver only "available" partitions, which makes it impossible to implement the "consistency" aspect of the requirement. Therefore, this will require an extension to KafkaClient as well.

Regarding the hash algorithm, we don't need a cryptographic hash function. However, we do need one whose output doesn't differ depending on the platform; this rules out farmhash, for example. (It seems the Java-based producer uses murmur2-32bit, but I don't see the need to use the same one, since various Kafka clients don't seem to stick to a "default" hash function; e.g. the Java-based producer uses a different hash function than the Scala-based one.)
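A sketch of the key-to-partition mapping. The hash shown here (std's DefaultHasher, i.e. SipHash with fixed keys) is only a placeholder since its output is not guaranteed stable across Rust releases; a fixed, well-known non-cryptographic hash would be the safer choice:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Map a message key consistently onto one of the topic's partitions.
    fn partition_for_key(key: &[u8], num_partitions: u32) -> u32 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % num_partitions as u64) as u32
    }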

Use latest rust-openssl

It would be nice to upgrade to rust-openssl 0.8.x (current stable branch) in order to avoid conflicts with other crates linking openssl 0.8.

consumer: allow resetting offset

Consumer should allow its clients to reset/seek its offset. This allows repeated runs over a topic's data (for testing purposes for example).

OS X issue

Hi there,

I have Kafka in my dependencies

[dependencies]
kafka = "0.1.6"

then I added

extern crate kafka;

then do 'cargo build'

and receive this big error log:

error: linking with `cc` failed: exit code: 1
note: "cc" "-m64" "-L" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib" "-o" "
....
note: ld: warning: directory not found for option '-L/Users/igor/kafka-pipe/kafka-pipe/.rust/lib/x86_64- apple-darwin'
ld: warning: directory not found for option '-L/Users/igor/kafka-pipe/kafka-pipe/lib/x86_64-apple-darwin'
ld: library not found for -lsnappy
clang: error: linker command failed with exit code 1 (use -v to see invocation)

error: aborting due to previous error 
Could not compile `kafka-pipe`

utils::{TopicPartitionOffset, PartitionOffset} have Result fields

Currently, the utils::{TopicPartitionOffset, PartitionOffset} structs' offset fields are Results. In my opinion, it's kind of strange to have a struct with a Result in the middle of it for anything but maybe custom Error types. It can make error handling on the user-side really inconvenient.

Additionally, it's not entirely clear to me why TopicPartitionOffset is needed at all; it seems that it exists only to flatten responses, which leads to lots of inefficient cloning of data that it already had. I think in an instance like the linked one, it would be more appropriate to return a map which maps topics to PartitionOffsets; that way, only a move of one string is necessary, rather than a clone for every single offset response.

What do you think? I tried to start by changing the Results to just their underlying type, but I can see it's a non-trivial refactor, so I thought I'd get your input first.

error: multiple rlib candidates for `kafka` found

I'm a beginner with Kafka and Rust. To use the Rust client as a consumer, I added the GitHub example to src/main.rs:

extern crate kafka;
use kafka::client::KafkaClient;
fn main() {
    let mut client = KafkaClient::new(&vec!("localhost:9092".to_string()));
    client.load_metadata_all();
    // OR
    // client.load_metadata(&vec!("my-topic".to_string())); // Loads metadata for vector of topics
 }

and added kafka = "*" to the Cargo.toml dependencies.

Then I ran cargo build and got the following error. Can anyone help me solve this?

   Compiling kafka v0.1.6 (file:///Users/admin/myrustdemo/kafka-rust-master)
src/main.rs:2:1: 2:20 error: multiple rlib candidates for `kafka` found
src/main.rs:2 extern crate kafka;
              ^~~~~~~~~~~~~~~~~~~
src/main.rs:2:1: 2:20 note: candidate #1: /Users/admin/myrustdemo/kafka-rust-master/target/debug/libkafka.rlib
src/main.rs:2 extern crate kafka;
              ^~~~~~~~~~~~~~~~~~~
src/main.rs:2:1: 2:20 note: candidate #2: /Users/admin/myrustdemo/kafka-rust-master/target/debug/deps/libkafka-73c7ae9da4235a7d.rlib
src/main.rs:2 extern crate kafka;
              ^~~~~~~~~~~~~~~~~~~
src/main.rs:2:1: 2:20 error: can't find crate for `kafka`
src/main.rs:2 extern crate kafka;
              ^~~~~~~~~~~~~~~~~~~
error: aborting due to 2 previous errors
Could not compile `kafka`.

Consider formatting with rustfmt

It keeps the code nice and clean :) Currently, it would fix quite a lot of formatting:

 src/client/metadata.rs     |  20 +++----
 src/client/mod.rs          | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
 src/client/network.rs      |  61 +++++++++++---------
 src/client/state.rs        |  36 ++++++------
 src/client_internals.rs    |  11 ++--
 src/codecs.rs              |  75 ++++++++++++++-----------
 src/compression/gzip.rs    |   9 +--
 src/compression/snappy.rs  |  31 ++++++----
 src/consumer/assignment.rs |   5 +-
 src/consumer/builder.rs    |  16 ++++--
 src/consumer/mod.rs        | 113 ++++++++++++++++++++++++-------------
 src/consumer/state.rs      | 179 +++++++++++++++++++++++++++++++++++-----------------------
 src/error.rs               |  32 ++++++-----
 src/lib.rs                 |  11 ++--
 src/producer.rs            |  32 +++++------
 src/protocol/consumer.rs   | 162 ++++++++++++++++++++++++----------------------------
 src/protocol/fetch.rs      | 186 +++++++++++++++++++++++++++++++++++++-----------------------
 src/protocol/metadata.rs   |  60 ++++++++------------
 src/protocol/mod.rs        |  58 ++++++++++---------
 src/protocol/offset.rs     |  69 +++++++++--------------
 src/protocol/produce.rs    |  82 +++++++++++++--------------
 src/protocol/zreader.rs    |  17 +++---
 22 files changed, 877 insertions(+), 666 deletions(-)

Allow producing key/value pairs

KafkaClient::send_messages doesn't allow sending key/value pairs but limits itself to sending "values" only. This keeps kafka-rust from general purpose usage. We shall allow the user to specify a key and a value as part of a ProduceMessage. This will require a breaking change.

SSL support

Kafka has support for SSL in KAFKA-1684, so it'd be nice to support it here as well. Hyper supports SSL channels, so its implementation would be a useful reference.

Stronger validity on produce_message#required_acks

It appears that as of Kafka 0.10 the protocol specifies the following regarding the RequiredAcks field in a "produce message request":

This field indicates how many acknowledgements the servers should receive before responding to the request. If it is 0 the server will not send any response (this is the only case where the server will not reply to a request). If it is 1, the server will wait the data is written to the local log before sending a response. If it is -1 the server will block until the message is committed by all in sync replicas before sending a response.

If anything else is specified by the client, the broker should respond with the InvalidRequiredAcksCode (21) error code. 1) We should verify this. 2) We should prepare our API for this already now, even for Kafka 0.8 and 0.9 servers. Changes to kafka-rust's API are desired to make this type-safe and prevent clients from accidentally supplying an invalid value; see the sketch below.
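A sketch of a type-safe representation covering exactly the three valid values; the names are assumptions, not necessarily kafka-rust's final API:

    #[derive(Debug, Clone, Copy)]
    pub enum RequiredAcks {
        // The server sends no response at all.
        None = 0,
        // The server waits until the data is written to the leader's local log.
        One = 1,
        // The server blocks until all in-sync replicas have committed the message.
        All = -1,
    }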

Producer performance

This library is very promising :) Did you do any performance comparison with the official Kafka client library?

client: actively close inactive connections as soon as possible

We shall actively close inactive/idle connections as soon as possible to prevent brokers from closing them first. In particular, we'd like to close an idle connection as soon as possible after the "connection idle timeout" is reached (that is, after a connection has been idle for this amount of time but not less); this timeout configuration option is/was being introduced by #117. See the sketch after the list below.

By avoiding the active connection shutdown at the broker side, we'll ...

  • ... free the remote peer socket sooner for re-use at the brokers and ...
  • ... avoid the broker having connections sitting in the TIME_WAIT state.
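A sketch of the idle-timeout bookkeeping, with illustrative names rather than kafka-rust's actual internals:

    use std::time::{Duration, Instant};

    struct TrackedConn {
        last_used: Instant,
        // ... the underlying TCP stream would live here ...
    }

    // Proactively drop (and thereby close) connections that have been idle for
    // at least the configured timeout, so the broker never has to close them first.
    fn close_idle(conns: &mut Vec<TrackedConn>, idle_timeout: Duration) {
        conns.retain(|c| c.last_used.elapsed() < idle_timeout);
    }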

client: re-connect if connection closed by broker

As of Kafka 0.9, brokers will actively shutdown client connections which are idle for more than a broker configured timeout (the broker configuration parameter is connections.max.idle.ms; defaults to 10 minutes). kafka-rust should re-establish its broken connections to brokers.

Using example/console-producer.rs to send a message, leaving it untouched for more than 10 minutes, and then sending another message should not result in an error (currently UnexpectedEOF.)

Regression causing lost messages

Change 62535d6 introduced a serious regression.

The first message for a given topic is now always lost.

    let m1 = "a".to_owned().into_bytes();
    let m2 = "b".to_owned().into_bytes();
    let m3 = "c".to_owned().into_bytes();
    let req = vec!(utils::ProduceMessage{topic: "test".to_owned(), message: m1},
                   utils::ProduceMessage{topic: "test".to_owned(), message: m2},
                   utils::ProduceMessage{topic: "test".to_owned(), message: m3}
                   );
    client.send_messages(1, 100, req);

=> Only m2 and m3 are sent, m1 is ignored.

I'd suggest reverting this change.

producer: transparently avoid sending too many messages at once

Client code can supply a list with far too many messages to be sent at once, leading to Kafka rejecting the "produce_messages" request entirely. Producer shall be able to deal with such a situation by breaking the list into smaller chunks and issuing multiple produce requests instead (see the sketch after this list). Keep in mind:

  • It is important to preserve the order of messages (as passed in by client code) for a particular topic partition.
  • We might need to re-design the result of the Producer::send_all method, since it could end up containing multiple offsets for the same topic partition. Actually, I think we should just return a plain Result<()> here. Alternatively, we ensure that only the latest offset produced to a particular topic partition is contained in the result.
  • KafkaClient::produce_messages shall not be extended by any "retry" or "break-up the list" logic, to retain precise control for clients. However, its return type will need to be changed such that client code can easily find out which messages were successfully sent and which are subject to a retry.
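A sketch of the chunking step: splitting the caller's batch into order-preserving chunks (which also preserves per-partition ordering), independent of the crate's actual types:

    // Split a batch into chunks of at most `max_per_request` items, keeping
    // the original order.
    fn chunked<T>(messages: Vec<T>, max_per_request: usize) -> Vec<Vec<T>> {
        assert!(max_per_request > 0);
        let mut out = Vec::new();
        let mut iter = messages.into_iter().peekable();
        while iter.peek().is_some() {
            out.push(iter.by_ref().take(max_per_request).collect());
        }
        out
    }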

consumer: lots of empty messages in kafka live data

I deployed a consumer following the example and came across lots of empty messages, but in our production Kafka live data there are no empty messages. Could anyone help me figure this out? Many thanks!

Allow builds without certain dependencies

It would be desirable to be able to build without dependencies on native libraries. In certain deployments it is not necessary to link against openssl and/or the utilized compression libraries. We can exclude them and the corresponding functionality using Cargo features.
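A sketch of how such gating could look in code; the feature name "snappy" is an assumption used for illustration:

    #[cfg(feature = "snappy")]
    fn decompress_snappy(_data: &[u8]) -> Vec<u8> {
        // would call into the native snappy bindings
        unimplemented!()
    }

    #[cfg(not(feature = "snappy"))]
    fn decompress_snappy(_data: &[u8]) -> Vec<u8> {
        panic!("kafka-rust was built without snappy support")
    }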

Genericize consumer builder functions

Currently the builder API only takes String for several of its functions. This isn't very ergonomic and could be improved. For example:

    pub fn with_group(mut self, group: String) -> Builder {
        self.group = group;
        self
    }

Becomes:

    pub fn with_group<S: Into<String>>(mut self, group: S) -> Builder {
        self.group = group.into();
        self
    }

This would not require call site changes because the Into<String> implementation for String is a no-op, so it should be a non-breaking API change.

Building on a system without installed Snappy

Hey there,

I'm working on a Heroku application that will be utilizing Kafka and am trying to build this library. As part of this I have a script which builds libsnappy.a by make-installing Snappy into $CACHE_DIR/snappy-build/lib; here is the script:

#!/usr/bin/env bash
# bin/compile <build-dir> <cache-dir> <env-dir>

set -e
set -o pipefail

# Build related variables.
BUILD_DIR=$1
CACHE_DIR=$2
ENV_DIR=$3

if [[ ! -d "$CACHE_DIR/snappy" ]]; then
    mkdir -p $CACHE_DIR
    cd $CACHE_DIR
    git clone https://github.com/google/snappy
    cd snappy
    ./autogen.sh
    ./configure --prefix="$(pwd)/../snappy-build"
    make
    make install
fi

export LD_LIBRARY_PATH="$CACHE_DIR/snappy-build/lib"
export LD_RUN_PATH="$CACHE_DIR/snappy-build/lib"

However, whenever I try to push my application, the kafka-rust library fails to link it correctly. See:

src/main.rs:5:5: 5:31 warning: unused import, #[warn(unused_imports)] on by default
src/main.rs:5 use kafka::client::KafkaClient;
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
error: linking with `cc` failed: exit code: 1
note: "cc" "-Wl,--as-needed" "-m64" "-L" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib" "/source/target/debug/kafka_rust_test.0.o" "-o" "/source/target/debug/kafka_rust_test" "-Wl,--gc-sections" "-pie" "-nodefaultlibs" "-L" "/source/target/debug" "-L" "/source/target/debug/deps" "-L" "/source/target/debug/build/miniz-sys-d03126dbc9ee0074/out" "-L" "/source/target/debug/build/openssl-bade188098f75a08/out" "-L" "/usr/lib" "-L" "/source/target/debug/build/openssl-sys-extras-4daf08572eefce4c/out" "-L" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "-Wl,-Bdynamic" "/source/target/debug/deps/libkafka-9ca67eb32e4e1893.rlib" "/source/target/debug/deps/libiron-4fdc411e735a87b7.rlib" "/source/target/debug/deps/libmodifier-550b2f0e350e2963.rlib" "/source/target/debug/deps/liblibc-dd3420cb049117bb.rlib" "/source/target/debug/deps/libhyper-6df9e9da10e4a36f.rlib" "/source/target/debug/deps/libunicase-4ad2965620fe21a9.rlib" "/source/target/debug/deps/libsolicit-8632432b3a4330d6.rlib" "/source/target/debug/deps/libhttparse-5c82294627258d33.rlib" "/source/target/debug/deps/libhpack-8f91a695370f3d75.rlib" "/source/target/debug/deps/libplugin-99f6a9d43520a61a.rlib" "/source/target/debug/deps/liberror-d27f5056e37dd662.rlib" "/source/target/debug/deps/liblanguage_tags-9df4f269b9ee12d4.rlib" "/source/target/debug/deps/libnum_cpus-9a6b3f359403ec12.rlib" "/source/target/debug/deps/libbyteorder-3945c3ad3e0e1d9c.rlib" "/source/target/debug/deps/libtraitobject-4ea485452a3a4a0b.rlib" "/source/target/debug/deps/libtypemap-487ec1564575dd9b.rlib" "/source/target/debug/deps/libconduit_mime_types-8c1fe30d92f8233a.rlib" "/source/target/debug/deps/libmime-f5540d134e188f4e.rlib" "/source/target/debug/deps/liblog-87d547eff707fc8e.rlib" "/source/target/debug/deps/libcookie-9ec7d33888fc3f77.rlib" "/source/target/debug/deps/liburl-b7bb31aacec501cd.rlib" "/source/target/debug/deps/libuuid-fed17b74aa7673e2.rlib" "/source/target/debug/deps/libmatches-737aa40e66529b02.rlib" "/source/target/debug/deps/libopenssl-bade188098f75a08.rlib" "/source/target/debug/deps/liblazy_static-007034d2ad8108ce.rlib" "/source/target/debug/deps/libbitflags-646076c1f4684754.rlib" "/source/target/debug/deps/libopenssl_sys_extras-4daf08572eefce4c.rlib" "/source/target/debug/deps/libopenssl_sys-5ec15f2e42328238.rlib" "/source/target/debug/deps/libflate2-1e8d03096e238ad4.rlib" "/source/target/debug/deps/libminiz_sys-d03126dbc9ee0074.rlib" "/source/target/debug/deps/libserde-a8734c231f4c36ee.rlib" "/source/target/debug/deps/libnum-02177b937f857300.rlib" "/source/target/debug/deps/librand-340832a8942cb900.rlib" "/source/target/debug/deps/librustc_serialize-3c33cb2a40992011.rlib" "/source/target/debug/deps/libtypeable-7ddee84661471c9b.rlib" "/source/target/debug/deps/libtime-22c21fe32894ddad.rlib" "/source/target/debug/deps/liblibc-adb8b8e7aaa2f93f.rlib" "/source/target/debug/deps/libunsafe_any-081a220f4ebf6660.rlib" "/source/target/debug/deps/libtraitobject-a729e3ae66a1ef57.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcollections-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_unicode-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/librand-17a8ccbd.rlib" 
"/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc_jemalloc-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-17a8ccbd.rlib" "/root/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-17a8ccbd.rlib" "-l" "snappy" "-l" "c" "-l" "m" "-l" "ssl" "-l" "crypto" "-l" "dl" "-l" "pthread" "-l" "gcc_s" "-l" "pthread" "-l" "c" "-l" "m" "-l" "rt" "-fuse-ld=gold" "-l" "compiler-rt"
note: /usr/sbin/ld.gold: error: cannot find -lsnappy

Could you perhaps offer some guidance? I've read through the build.rs script in this repo and have been trying to convince it to pick up my locally built Snappy, without success.

mismatched types with KafkaClient::new

Hi there,

I have just copied your example

 let kafka_client = KafkaClient::new(&vec!("localhost:9092".to_string()));

and got the following error

src/main.rs:48:45: 48:80 error: mismatched types:
 expected `collections::vec::Vec<collections::string::String>`,
 found `&collections::vec::Vec<collections::string::String>`
 (expected struct `collections::vec::Vec`,
 found &-ptr) [E0308]

Could you provide a working example and/or update the README file?

consumer: interoperability with java clients

Recent Java clients seem to commit the offset of the message they want to consume next (observed with KafkaConsumer from org.apache.kafka:kafka_2.10:0.9.0.1), while kafka-rust's Consumer commits the offset of the message it already consumed. For interoperability we might want to consider switching to the Java client's strategy. Note: this would be a breaking change for existing kafka-rust consumer programs!

consumer: allow consuming multiple topics

I'm trying to use a Rust consumer to read from multiple topics. This is the code I have now:

extern crate kafka;
use kafka::client::KafkaClient;
use kafka::consumer::Consumer;
use kafka::utils;
fn main(){
    let mut client = KafkaClient::new(vec!("localhost:9092".to_owned()));
    let res = client.load_metadata_all();
    let topics = client.topic_partitions.keys().cloned().collect();
    let offsets = client.fetch_offsets(topics, -1);
    for topic in &topics {
        let mut con = Consumer::new(client, "test-consumer-group".to_owned(), "topic".to_owned()).partition(0);
        let mut messagedata = 0;
        for msg in con {
            println!("{}", str::from_utf8(&msg.message).unwrap().to_string());
        }
    }
}

below is the error:

    src/main.rs:201:19: 201:25 error: use of moved value: `topics` [E0382]
src/main.rs:201     for topic in &topics {
                                  ^~~~~~
    note: in expansion of for loop expansion
    src/main.rs:201:5: 210:6 note: expansion site
    src/main.rs:167:40: 167:46 note: `topics` moved here because it has type `collections::vec::Vec<collections::string::String>`, which is non-copyable
    src/main.rs:167     let offsets = client.fetch_offsets(topics, -1);
                                                       ^~~~~~
    src/main.rs:203:37: 203:43 error: use of moved value: `client` [E0382]
    src/main.rs:203     let mut con = Consumer::new(client, "test-consumer-group".to_owned(), "topicname".to_owned()).partition(0);
                                                    ^~~~~~
    note: in expansion of for loop expansion
    src/main.rs:201:5: 210:6 note: expansion site
    note: `client` was previously moved here because it has type     `kafka::client::KafkaClient`, which is non-copyable
    error: aborting due to 2 previous errors

To better explain my question, here is my partially working code for just one topic:

let mut con = Consumer::new(client, "test-consumer-group".to_owned(), "testtopic".to_owned()).partition(0);

for msg in con {
    println!("{}", str::from_utf8(&msg.message).unwrap().to_string());
}

And I tested the fetch_message function; it works for multiple topics, but the result I have (msgs) is Topicmessage, and I don't know how to get the message from Topicmessage.

let msgs = client.fetch_messages_multi(vec!(utils::TopicPartitionOffset{
                                            topic: "topic1".to_string(),
                                            partition: 0,
                                            offset: 0 //from the begining
                                            },
                                        utils::TopicPartitionOffset{
                                            topic: "topic2".to_string(),
                                            partition: 0,
                                            offset: 0
                                        },
                                        utils::TopicPartitionOffset{
                                            topic: "topic3".to_string(),
                                            partition: 0,
                                            offset: 0
                                        }));
for msg in msgs{
    println!("{}", msg);
}

consumer: optional message crc validation

Kafka provides the possibility to validate fetched messages based on a CRC check. Currently, this is not implemented in kafka-rust.

The validation shall be part of KafkaClient. It shall be possible to disable it; we'll go with validation enabled by default. Consumer shall provide a convenient "setter" method for this setting through its Builder as well.
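A sketch of the check itself, using the crc 1.x API (crc::crc32::checksum_ieee) that kafka-rust already depends on; for the classic message format, Kafka message CRCs are plain CRC-32 (IEEE) over the bytes following the crc field:

    extern crate crc;

    // Validate a fetched message against the CRC advertised by the broker.
    fn message_is_valid(crc_covered_bytes: &[u8], expected_crc: u32) -> bool {
        crc::crc32::checksum_ieee(crc_covered_bytes) == expected_crc
    }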

It'd be nice to have separate benchmarks for fetching messages with validation enabled and disabled.

Parallel communication to brokers

KafkaClient maintains a connection to each explored kafka broker. When sending or fetching messages the communication to these brokers happens synchronously.

Example 1: when delivering a bunch of messages to two (or more) brokers, first a message set is sent to broker 1 and its acknowledgement awaited, and only then is a message set sent to broker 2 and its acknowledgement awaited.

Example 2: when fetching messages from partitions spread over multiple brokers, first we check broker 1, and only then check broker 2, etc.

This is clearly inefficient and results in poor latencies on a multi-node kafka deployment. We might want to spawn and maintain multiple threads within a KafkaClient, though I believe MIO would be ideal for this (and would avoid the overhead for maintaining the threads).
