
apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Home Page: https://heron.apache.org/

License: Apache License 2.0

Python 11.33% C++ 16.53% Shell 2.10% Java 53.54% C 0.14% HTML 0.37% JavaScript 11.17% Makefile 0.01% CSS 0.65% M4 0.18% Scala 1.32% Perl 0.09% Dockerfile 0.01% Starlark 2.57% Mustache 0.01%
heron streaming messaging

incubator-heron's Introduction

Heron is a realtime analytics platform developed by Twitter. It has a wide array of architectural improvements over its predecessor.

Heron in Apache Incubation

Documentation

https://heron.incubator.apache.org/
Confluence: https://cwiki.apache.org/confluence/display/HERON

Heron Requirements:

  • Java 11
  • Python 3.6
  • Bazel 6.0.0

Contact

Mailing lists

Name Scope
[email protected] User-related discussions Subscribe Unsubscribe Archives
[email protected] Development-related discussions Subscribe Unsubscribe Archives

Slack

Self-Register to our Heron Slack Workspace

Meetup Group

Bay Area Heron Meetup: we meet on the third Monday of every month in Palo Alto.

For more information:

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

incubator-heron's People

Contributors

aahmed-se, ajorgensen, ashvina, cckellogg, code0x58, congwang, dependabot[bot], erenavsarogullari, huijunw, huijunwu, jerrypeng, jingwei, joshfischer1108, jrcrawfo, kramasamy, lewiskan, lucperkins, maosongfu, mycfelix, nicknezis, nlu90, nwangtw, objmagic, saileshmittal, srkukarni, thinker0, windhamwong, windie, xiaoyao1991, yaoliclshlmch

incubator-heron's Issues

Graceful Shutdown for Kill Topology

Use executor termination instead of using a separate Shutdown class

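import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Let the ExecutorService drain its own tasks instead of using a separate Shutdown class.
ExecutorService taskExecutor = Executors.newFixedThreadPool(4);
while (...) {
  taskExecutor.execute(new MyTask());
}
// Stop accepting new tasks; tasks already submitted still run to completion.
taskExecutor.shutdown();
try {
  // Block until every submitted task has finished.
  taskExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
  ...
}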

python:topology_unittest fails on darwin (Python 2.7.9)

$ bazel clean
$ bazel build --config=darwin heron/...
$ bazel test --config=darwin heron/...
Executed 87 out of 87 tests: 86 tests pass and 1 fails locally.
FAIL: //heron/common/tests/python:topology_unittest

"""
============================= test session starts ==============================
platform darwin -- Python 2.7.9 -- py-1.4.27 -- pytest-2.6.4
collected 9 items

../../../../../../../../heron/common/tests/python/topology_unittest.py ...F.....

=================================== FAILURES ===================================
____________________ TopologyTest.test_set_execution_state _____________________

self = <topology_unittest.TopologyTest testMethod=test_set_execution_state>

def test_set_execution_state(self):
  # Set it to None
  self.topology.set_execution_state(None)
  self.assertIsNone(self.topology.execution_state)
  self.assertIsNone(self.topology.dc)

E AttributeError: Topology instance has no attribute 'dc'

../../../../../../../../heron/common/tests/python/topology_unittest.py:28: AttributeError
====================== 1 failed, 8 passed in 0.52 seconds ======================
"""

bazel build is failing to find setuptools

Hi,

I am trying to build heron following the steps documented here: docs/developers/compiling.md. On Ubuntu (15.10) the build process is failing with the following error:

~/workspace/heron$ bazel build --config=ubuntu heron/...
INFO: Found 245 targets...
ERROR: /home/ashvin/workspace/heron/heron/controller/src/python/BUILD:5:1: null failed: _pex failed: error executing command bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex --entry-point heron.controller.src.python.controller bazel-out/local_linux-fastbuild/bin/heron/controller/src/python/heron-controller.pex ... (remaining 1 argument(s) skipped).
Traceback (most recent call last):
File "/home/ashvin/.cache/bazel/_bazel_ashvin/204570d05ac2208dc12ec96bf54e525f/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/_pex.py", line 169, in <module>
sys.exit(main())
File "/home/ashvin/.cache/bazel/_bazel_ashvin/204570d05ac2208dc12ec96bf54e525f/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/_pex.py", line 162, in main
pex_builder.build(output)
File "/home/ashvin/.cache/bazel/_bazel_ashvin/204570d05ac2208dc12ec96bf54e525f/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 414, in build
self.freeze(bytecode_compile=bytecode_compile)
File "/home/ashvin/.cache/bazel/_bazel_ashvin/204570d05ac2208dc12ec96bf54e525f/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 398, in freeze
self._prepare_bootstrap()
File "/home/ashvin/.cache/bazel/_bazel_ashvin/204570d05ac2208dc12ec96bf54e525f/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 358, in _prepare_bootstrap
raise RuntimeError('Failed to find setuptools while building pex!')
RuntimeError: Failed to find setuptools while building pex!
INFO: Elapsed time: 0.959s, Critical Path: 0.61s

I have installed all the packages listed in docker/Dockerfile.ubuntu14.04, including setuptools:
python-setuptools/wily,now 18.4-1 all [installed]

I must be missing something basic. Could you please help me resolve this?

Thanks,
Ashvin

Failed to load properties file

I got the following error:

SEVERE: Failed to load properties file: /home/ajorgensen/cluster/heron.local.working.directory:.conf

When running:

ajorgensen@heron2001:~/heron-cli-0.0.1-TEST$ bin/heron-cli2 submit --config-loader=com.twitter.heron.scheduler.aurora.AuroraConfigLoader --config-path /home/ajorgensen/ "" /home/ajorgensen/deploy/test-topology-CURRENT.jar com.crashlytics.heron.TestTopology
Deprecation Warning: fatjar will be deprecated soon. Please use tar format ..
Jan 21, 2016 6:07:49 PM com.twitter.heron.api.HeronSubmitter submitTopology
INFO: To deploy a topology in initial state: RUNNING
Launching topology test-topology
Jan 21, 2016 6:07:52 PM com.twitter.heron.scheduler.service.SubmitterMain submitTopology
INFO: Config to override in Submitter:  config.property="" config.path="/home/ajorgensen/" heron.config.loader="com.twitter.heron.scheduler.aurora.AuroraConfigLoader" deactivated="False" filepath="/home/ajorgensen/deploy/test-topology-CURRENT.jar" config.loader="com.twitter.heron.scheduler.aurora.AuroraConfigLoader" classname="com.crashlytics.heron.TestTopology" heron.dir="/home/ajorgensen/heron-cli-0.0.1-TEST" command="submit" heron.unknown.args="[]" heron.verbose="False" config.overrides="" heron.config.path="/home/ajorgensen/"
Jan 21, 2016 6:07:52 PM com.twitter.heron.scheduler.aurora.AuroraConfigLoader applyConfigOverride
SEVERE: Cluster parts must be dc/role/environ (without spaces)
Exception in thread "main" java.lang.RuntimeException: Failed to load config. File: /home/ajorgensen/ Override:  config.property="" config.path="/home/ajorgensen/" heron.config.loader="com.twitter.heron.scheduler.aurora.AuroraConfigLoader" deactivated="False" filepath="/home/ajorgensen/deploy/test-topology-CURRENT.jar" config.loader="com.twitter.heron.scheduler.aurora.AuroraConfigLoader" classname="com.crashlytics.heron.TestTopology" heron.dir="/home/ajorgensen/heron-cli-0.0.1-TEST" command="submit" heron.unknown.args="[]" heron.verbose="False" config.overrides="" heron.config.path="/home/ajorgensen/"
        at com.twitter.heron.scheduler.service.SubmitterMain.submitTopology(SubmitterMain.java:76)
        at com.twitter.heron.scheduler.service.SubmitterMain.main(SubmitterMain.java:59)
User main failed with status 1. Bailing out...

It's unclear to me what config file it failed to load.

Python cli jars specification should be automated

Currently, the Python invocation of Java uses a hard-coded list of jars. This list should live in a file that is included as a resource or config file. Wildcards should not be used, since they could introduce undesirable bugs.
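
One way this could look is sketched below, assuming a plain-text manifest with one jar name per line. The function names, the manifest format, and the jar names are all hypothetical and are not taken from the Heron source.

```python
import os


def load_jar_list(manifest_text):
    """Parse a jar manifest: one jar file name per line, '#' starts a comment."""
    jars = []
    for raw_line in manifest_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        if "*" in line or "?" in line:
            # Reject wildcards so an unexpected jar can never slip onto the classpath.
            raise ValueError("wildcard not allowed in jar list: %r" % line)
        jars.append(line)
    return jars


def build_classpath(lib_dir, jars):
    """Join the explicitly listed jars into a value suitable for `java -cp`."""
    return os.pathsep.join(os.path.join(lib_dir, jar) for jar in jars)


# Hypothetical manifest contents; the jar names are placeholders.
EXAMPLE_MANIFEST = """
# jars needed by the submit command
heron-scheduler.jar
heron-api.jar   # topology API
"""
```

Keeping the list in a resource file means that adding or removing a jar is a data change rather than a code change, while the explicit names keep the classpath predictable.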

CLI should check if JAVA_HOME is set

heron-cli should check whether JAVA_HOME is set. For the sandbox, when running in the scheduler, either the Java home needs to be set explicitly or it will be picked up from the JAVA_HOME environment variable.
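
A minimal sketch of such a check follows, assuming an explicitly configured value takes precedence over the environment variable. The function names and the `configured` parameter are hypothetical.

```python
import os


def resolve_java_home(environ=None, configured=None):
    """Return the Java home to use, or fail early with a clear message.

    `configured` stands for an explicitly supplied value (hypothetical);
    otherwise the JAVA_HOME environment variable is used.
    """
    environ = os.environ if environ is None else environ
    java_home = configured or environ.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError(
            "JAVA_HOME is not set; set it in the environment or configure it explicitly"
        )
    return java_home


def java_binary(java_home):
    """Path of the java executable under the given Java home."""
    return os.path.join(java_home, "bin", "java")
```

Failing before any Java process is launched gives the user one actionable message instead of a later, less obvious launch error.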

Disable openssl since it is not used

Disable OpenSSL during the compilation of libevent. OpenSSL is not used in Heron; furthermore, on Mac OS X El Capitan it is installed in a non-standard path, which libevent does not seem to pick up.

docker build should use a separate scratch pad directory

The Heron docker build should use a separate scratch directory rather than the source docker directory. This would prevent intermediate files from being checked in accidentally. A suggested scratch pad directory is ~/.heron-compile.

Packer installation needs to be added to vagrant init.sh

PackerUploader.java uses the packer binary to upload packages.
The related code is at heron/scheduler/src/java/com/twitter/heron/scheduler/twitter/PackerUploader.java:45.
Packer should also be added as a dependency somewhere in the compile files.

Default minimum heap size for instance

Currently, the default value is 1 GB, but users can override it with a smaller value. However, we use a code cache size of 64 MB and a perm gen size of 128 MB. This means users cannot specify a size lower than 192 MB. We need to address this.
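
The arithmetic behind the floor can be sketched as follows: the two reserved regions sum to 64 MB + 128 MB = 192 MB, so any request at or below that leaves no heap. The function names are hypothetical, and treating the remainder as heap is an assumption made for illustration, not Heron's actual sizing logic.

```python
MB = 1024 * 1024
CODE_CACHE_BYTES = 64 * MB   # code cache size quoted in the issue
PERM_GEN_BYTES = 128 * MB    # perm gen size quoted in the issue


def min_instance_ram_bytes():
    """Memory reserved before any heap can be allocated."""
    return CODE_CACHE_BYTES + PERM_GEN_BYTES


def heap_bytes_for(requested_ram_bytes):
    """Heap left after the reserved regions (assumed model), or raise if none."""
    floor = min_instance_ram_bytes()
    if requested_ram_bytes <= floor:
        raise ValueError(
            "instance RAM must exceed %d MB, got %d MB"
            % (floor // MB, requested_ram_bytes // MB)
        )
    return requested_ram_bytes - floor
```

With the 1 GB default, this model leaves 1024 MB - 192 MB = 832 MB for the heap.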

Config validator

Since several configs are read at startup, we need to ensure that the basic config is present before starting anything.
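
A rough sketch of such a validator is shown below, assuming the merged config is available as a dictionary. The key names in `REQUIRED_KEYS` are placeholders chosen for illustration, not Heron's actual config keys.

```python
# Hypothetical key names, used only for illustration.
REQUIRED_KEYS = ("heron.cluster", "heron.role", "heron.environ")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank, in declaration order."""
    missing = []
    for key in required:
        value = config.get(key)
        if value is None or str(value).strip() == "":
            missing.append(key)
    return missing


def validate_config(config, required=REQUIRED_KEYS):
    """Raise before anything is started if the basic config is incomplete."""
    missing = missing_keys(config, required)
    if missing:
        raise ValueError("missing required config: " + ", ".join(missing))
    return config
```

Reporting every missing key at once, rather than stopping at the first, saves the user repeated submit attempts.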

Ability to inject IScheduler instance in SchedulerMain

I am developing a custom scheduler for Heron and looking at MesosScheduler as an example. As I understand it, the Launcher launches an instance of the Scheduler. In my case, the Scheduler is a service running in its own container to serve topology management requests such as deactivate. The instance launched by the Launcher has the context information needed to communicate with the cluster resource manager (RM). Once the Scheduler starts, I invoke SchedulerMain.runScheduler() to start the SchedulerServer. I notice that this method creates a new instance of the Scheduler. Is this necessary? If the caller of runScheduler could provide the Scheduler instance, the Scheduler could retain some of the context information provided by the Launcher.

build error, missing file?

Hey, I just got a vagrant image going to build heron this morning and bumped into this issue:

vagrant@master:/vagrant$ ~/bin/bazel build --config=darwin heron/...
INFO: Found 238 targets...
ERROR: missing input file '//scripts:env_exec.sh'.
ERROR: /vagrant/3rdparty/yaml-cpp/BUILD:11:1: //3rdparty/yaml-cpp:yaml-cpp-srcs: missing input file '//scripts:env_exec.sh'.
ERROR: /vagrant/3rdparty/yaml-cpp/BUILD:11:1 1 input file(s) do not exist.
INFO: Elapsed time: 12.516s, Critical Path: 1.30s.
vagrant@master:/vagrant$ ls -l scripts/
total 16
-rw-r--r-- 1 vagrant vagrant  208 Dec 27 18:25 BUILD
-rwxr-xr-x 1 vagrant vagrant 8799 Dec 27 18:25 errors.sh
vagrant@master:/vagrant$ cat scripts/BUILD |grep env
    name = "env_exec",
    srcs = ["env_exec.sh"],

This seems like a missing file. I haven't dug into the tgz yet to see if it was there; I figured I would ask first, in case it just needs a commit or something else.

--config-file is no longer a cli option

It looks like the --config-file option has been removed from the cli (commit: 9e37a8b#diff-b8bd943536eddc058a45964a78def943) (https://github.com/twitter/heron/blame/master/docs/operators/deployment/aurora.md#L52).

usage: heron-cli2 submit [--config-path CONFIG_PATH]
                         [--config-loader CONFIG_LOADER]
                         [--config-property CONFIG_PROPERTY] [--deactivated]
                         [--verbose]
                         config-overrides filepath classname

The documentation for this is now out of date. One improvement that I think we should make here is to check whether --config-path is a file or a directory. If it is a directory, then do what it does now and look for aurora_scheduler.conf in that folder; if it is a file, then we should just use it. For example, if my config file is called heron.aurora instead of aurora_scheduler.conf, I would like to be able to pass it in directly instead of having to rename the file.
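
The proposed behavior could be sketched roughly as follows. The function name `resolve_config_file` is hypothetical; the default file name is the one mentioned in the issue.

```python
import os

DEFAULT_CONFIG_NAME = "aurora_scheduler.conf"


def resolve_config_file(config_path, default_name=DEFAULT_CONFIG_NAME):
    """Map --config-path to a concrete config file.

    A directory keeps the current behavior (look for the default file name
    inside it); a regular file is used directly, whatever it is called.
    """
    if os.path.isdir(config_path):
        return os.path.join(config_path, default_name)
    if os.path.isfile(config_path):
        return config_path
    raise ValueError("--config-path is neither a file nor a directory: %r" % config_path)
```

Under this sketch, passing /home/ajorgensen/ would still resolve to aurora_scheduler.conf in that folder, while passing a file named heron.aurora would use that file as-is.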

Make local scheduler container 0 fault tolerant

Currently, container 0, which runs the tmaster and the scheduler, is not restarted. We should restart it when there is a failure. This could be tricky, since it is not clear which agent would restart it. The executor will restart the tmaster and scheduler processes, but the death of the executor itself is not tolerated.

SchedulerStateManager should be synchronous

Currently, the scheduler state manager is nothing but a delegate to IStateManager. This adds a network wait every time a scheduler state manager call is invoked, which is unnecessary since all the calls are synchronous. A better approach would be to convert it to a synchronous interface that returns the appropriate value directly.

Build failure on AWS Ubuntu (14.04): common.proto, tmaster.proto File not found.

Using Docker Ubuntu 14.04 install, including manual GNU libtools upgrade to 2.4.6 (from 2.4.2 using apt-get install).

INFO: From ProtocJava heron/proto/proto_metrics_java_src.srcjar:
common.proto: File not found.
tmaster.proto: File not found.

metrics.proto: Import "common.proto" was not found or had errors.
metrics.proto: Import "tmaster.proto" was not found or had errors.
metrics.proto:52:12: "Status" is not defined.
metrics.proto:64:12: "heron.proto.tmaster.TMasterLocation" is not defined.
ERROR: /home/ubuntu/heron/heron/proto/BUILD:96:1: error executing shell command: 'set -e
rm -rf bazel-out/local_linux-fastbuild/genfiles/heron/proto/proto_metrics_java_src.srcjar.srcs
mkdir bazel-out/local_linux-fastbuild/genfiles/heron/proto/proto_metrics_java_src.srcjar.srcs
b...' failed: bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped).
INFO: Elapsed time: 256.834s, Critical Path: 253.01s.

Add the notion of Sink to topologies

Storm does not have the notion of a Sink; it uses bolts themselves for that purpose. It might be worthwhile to introduce Sink as a special form of bolt. This would decouple the processing logic from storing results and pave the way for writing bolts for various databases, file systems, etc. (similar to spouts).
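
To make the proposal concrete, here is a toy sketch of how a Sink might relate to a bolt. None of these classes come from the Heron or Storm APIs; `Bolt`, `Sink`, `WordCountBolt`, and `ListSink` are hypothetical names used only to show the separation between processing and storing.

```python
class Bolt(object):
    """Hypothetical minimal bolt: processes one tuple, returns tuples to emit."""

    def execute(self, tup):
        raise NotImplementedError


class Sink(Bolt):
    """A terminal bolt: stores each tuple and emits nothing downstream."""

    def execute(self, tup):
        self.write(tup)
        return []

    def write(self, tup):
        raise NotImplementedError


class WordCountBolt(Bolt):
    """Processing logic only; it does not know where results end up."""

    def __init__(self):
        self.counts = {}

    def execute(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        return [(word, self.counts[word])]


class ListSink(Sink):
    """Stand-in for a database or file-system sink; keeps rows in memory."""

    def __init__(self):
        self.rows = []

    def write(self, tup):
        self.rows.append(tup)
```

Swapping `ListSink` for a sink backed by a different store would not require touching `WordCountBolt`, which is the decoupling the issue is after.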

Build failure on local Ubuntu (14.04), "Failed to find setuptools while building pex"

The PEX failure may be related to Anaconda Python (version 2.7.6). I also re-installed Python following the Docker Ubuntu 14.04 instructions, including manually installing python-setuptools.

INFO: From PexPython heron/shell/src/python/heron-shell.pex:
Traceback (most recent call last):
File "/home/tony/.cache/bazel/_bazel_tony/91ccb56ec8436235118164d4e388a63e/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/_pex.py", line 169, in <module>
sys.exit(main())
File "/home/tony/.cache/bazel/_bazel_tony/91ccb56ec8436235118164d4e388a63e/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/_pex.py", line 162, in main
pex_builder.build(output)
File "/home/tony/.cache/bazel/_bazel_tony/91ccb56ec8436235118164d4e388a63e/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 414, in build
self.freeze(bytecode_compile=bytecode_compile)
File "/home/tony/.cache/bazel/_bazel_tony/91ccb56ec8436235118164d4e388a63e/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 398, in freeze
self._prepare_bootstrap()
File "/home/tony/.cache/bazel/_bazel_tony/91ccb56ec8436235118164d4e388a63e/heron/bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex.runfiles/3rdparty/pex/pex/pex_builder.py", line 358, in _prepare_bootstrap
raise RuntimeError('Failed to find setuptools while building pex!')
RuntimeError: Failed to find setuptools while building pex!
ERROR: /home/tony/workspace/heron/heron/shell/src/python/BUILD:17:1: null failed: _pex failed: error executing command bazel-out/local_linux-fastbuild/bin/3rdparty/pex/_pex --entry-point heron.shell.src.python.main bazel-out/local_linux-fastbuild/bin/heron/shell/src/python/heron-shell.pex ... (remaining 1 argument(s) skipped).

Modularize heron executor

Currently, the heron executor is not modularized and looks monolithic. It needs to be separated into multiple files, one for each process it starts; it should parse args using argparse and pick up some of the args from config files.
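
The argument-handling part might be sketched as below, assuming config-file values arrive as a dictionary and command-line flags override them. The flag names (`--shard`, `--topology-name`, `--log-dir`) and the function names are hypothetical, not the executor's real interface.

```python
import argparse


def build_parser(config_defaults=None):
    """Build the executor argument parser.

    `config_defaults` holds values read from config files (hypothetical);
    they replace the built-in defaults but lose to explicit flags.
    """
    parser = argparse.ArgumentParser(prog="heron-executor")
    parser.add_argument("--shard", type=int, required=True)
    parser.add_argument("--topology-name", required=True)
    parser.add_argument("--log-dir", default="log-files")
    if config_defaults:
        # Parser-level defaults take precedence over argument-level defaults.
        parser.set_defaults(**config_defaults)
    return parser


def parse_executor_args(argv, config_defaults=None):
    """Parse argv (without the program name) into a namespace."""
    return build_parser(config_defaults).parse_args(argv)
```

Each process the executor starts could then live in its own module and receive only the parsed namespace, rather than reading positional arguments directly.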
