
uswitch / bifrost


Safely archive data from Apache Kafka to S3 with no Hadoop dependencies :)

License: Eclipse Public License 1.0

Clojure 100.00%

bifrost's People

Contributors

dacamo76, dayooliyide, pingles, tgk


bifrost's Issues

RAM usage

Hi all -

I don't know much about Java / Clojure, so it's a bit hard for me to give a great bug report. But basically, I updated the default config with my S3 / ZooKeeper settings and started Bifrost with Lein, per the README.

It starts and runs, makes ZooKeeper connections, reads the metadata from my Kafka nodes, and writes some data in /tmp/ as expected. Then my container's supervisor comes around and kills the process for using too much memory (I have 1.8 GB free for use by Bifrost).

Is there a way to limit the memory usage? Or do I just need more memory? If this amount of usage is unexpected, any tips on debugging?

I tried changing uploaders-n to 1 hoping that would help, but no luck.

Thanks!
-Charlie

Schema for config

Use Prismatic Schema to validate the system config before starting.

Handle rebalance events

Bifrost currently doesn't handle rebalancing events; it assumes a single consumer process.

ZooKeeper error hangs creation of process

When starting Bifrost, if a ZooKeeper exception is thrown (e.g. #<ZkTimeoutException org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000>), the creation of the processes that consume each partition's messages hangs. Add proper retry handling with a decaying timeout.
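
One way to read the retry request above is "retry with exponential backoff". The following is a minimal sketch under that assumption, not Bifrost's implementation: the class name RetryWithBackoff, the helpers delayMs and retry, and all the numbers are hypothetical, and the ZooKeeper connection is simulated by a lambda that fails twice before succeeding.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    /** Delay before the next try: baseDelayMs * 2^attempt (attempt is 0-based), capped at maxDelayMs. */
    public static long delayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        // Cap the exponent so the multiplication stays well inside the long range.
        long factor = 1L << Math.min(attempt, 20);
        return Math.min(maxDelayMs, baseDelayMs * factor);
    }

    /** Calls action until it succeeds or maxAttempts is reached, sleeping between failed attempts. */
    public static <T> T retry(Callable<T> action, int maxAttempts,
                              long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMs(attempt, baseDelayMs, maxDelayMs));
                }
            }
        }
        throw last; // every attempt failed: surface the last error instead of hanging
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for a ZooKeeper connection attempt: fails twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated connection timeout");
            }
            return "connected";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Capping both the delay and the number of attempts means a ZooKeeper outage ends in a visible exception rather than an indefinite hang. The right limits depend on the deployment.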

Functionality Questions about how customized data extraction can get

Hi - I have some functionality questions about Bifrost:

  1. Can Bifrost output data onto S3 in a format that is consumable by Redshift?
  2. How often can files be written out? Every hour? Every minute?
  3. Can Bifrost extract data by a data field in the Kafka data, rather than the Kafka created_at timestamp?

Thanks!

Add support for tar files

The baldr file format is minimal and simple. We have since realised that tar files offer a similar sequential file format: readers are slightly more difficult to implement from scratch, but tar has operating system support on many platforms, as well as in Java libraries. It would be nice to have the option to store messages in tar files.
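
To illustrate what "sequential" means for tar, here is a minimal sketch of a writer that stores each message as one archive entry in the POSIX ustar layout. It is not Bifrost code. The class name MiniTarWriter, the entry names message-0 and message-1, and the fixed mode and owner values are assumptions made for the example. It writes the headers by hand so that the example depends on nothing beyond java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class MiniTarWriter {
    private static final int BLOCK = 512; // tar is a sequence of 512-byte blocks

    /** Copies s into header[offset..offset+len); unused bytes stay NUL. */
    private static void putString(byte[] header, int offset, int len, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        if (b.length > len) throw new IllegalArgumentException("field too long: " + s);
        System.arraycopy(b, 0, header, offset, b.length);
    }

    /** Writes value as zero-padded octal in len-1 bytes; the final byte stays NUL. */
    private static void putOctal(byte[] header, int offset, int len, long value) {
        putString(header, offset, len - 1, String.format("%0" + (len - 1) + "o", value));
    }

    /** Builds one 512-byte ustar header for a regular file. */
    public static byte[] header(String name, long size, long mtimeSeconds) {
        byte[] h = new byte[BLOCK];
        putString(h, 0, 100, name);          // entry name
        putOctal(h, 100, 8, 0644);           // mode (assumed for the example)
        putOctal(h, 108, 8, 0);              // uid
        putOctal(h, 116, 8, 0);              // gid
        putOctal(h, 124, 12, size);          // payload size in bytes
        putOctal(h, 136, 12, mtimeSeconds);  // modification time
        for (int i = 148; i < 156; i++) h[i] = ' '; // checksum field counts as spaces
        h[156] = '0';                        // typeflag: regular file
        putString(h, 257, 6, "ustar");       // magic, NUL-terminated
        putString(h, 263, 2, "00");          // version
        int sum = 0;
        for (byte b : h) sum += b & 0xFF;    // unsigned sum of all header bytes
        putString(h, 148, 8, String.format("%06o", sum) + "\0 ");
        return h;
    }

    /** Appends one entry: header, payload, then zero padding up to a block boundary. */
    public static void writeEntry(OutputStream out, String name, byte[] data,
                                  long mtimeSeconds) throws IOException {
        out.write(header(name, data.length, mtimeSeconds));
        out.write(data);
        out.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]);
    }

    /** Ends the archive with two all-zero blocks. */
    public static void finish(OutputStream out) throws IOException {
        out.write(new byte[2 * BLOCK]);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeEntry(out, "message-0", "hello".getBytes(StandardCharsets.UTF_8), 0L);
        writeEntry(out, "message-1", new byte[600], 0L);
        finish(out);
        System.out.println(out.size() + " bytes"); // 2 headers + 3 data blocks + 2 end blocks
    }
}
```

Because every entry is a header followed by its padded payload, new messages can be appended without rewriting earlier ones. The trade-off is at least 512 bytes of overhead per message.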
