GithubHelp home page GithubHelp logo

pyleus-1's Introduction

写在前言: 旨在分享,写的很简单
需求:实时分析squid的命中率,根据需求,使用到了flume,storm,kafka 
数据读写,思路:
 	1.先通过squid定义,将log写到rsyslog上 
	2.通过flume syslogtcp type方式,获取数据,sink到kafka 
	3.Python编写topology,借助pyleus模块,spout从kafka读取数据,bolt做一次分析,将数据实时写入到mysql中,写入到hbase中更好



pyleus安装: 
	安装包:git clone https://github.com/Yelp/pyleus.git 
	运行命令:pip2.6 install pyleus (如果不执行,将无法使用pyleus相关命令,前提是安装了pip)

pyleus命令: 
	pyleus build examples/squid_hit_topology/pyleus_topology.yaml 编译topology应用 pyleus local squid_hit_topology.jar 本地运行
	pyleus --verbose build /home/pyleus/examples/squid_hit_topology/pyleus_topology.yaml 查看编译时的详细信息
	pyleus --verbose submit -n 10.4.2.176 squid_hit_topology.jar 查看提交运行时的详细信息,-n 后面的IP是storm 的nimbusIP,可以为主机名,前提是你的/etc/hosts文件有针对其主机名的解析 	
	pyleus kill -n 10.4.2.176 topology_name 停止topology程序的运行
	topology_name名称的定义在:pyleus_topology.yaml name: squid_hit_topology 即topology项目名,详细看代码文件:examples/squid_hit_topology/pyleus_topology.yaml

    在整个程序设计中,kafka的spout是需要注意的,同时代码文件中上传了flume的配置文件
    pyleus_topology.yaml 定义spout和bolt
    name: squid_hit_topology
    workers: 2
    topology:
        - spout:
            name: data_source_kafka
            type: kafka
            options:
                #配置kafka的topic
                topic: squid_log
                #配置zookeeper地址,多个用逗号隔开
                zk_hosts: 10.63.13.138:2181,10.63.13.107:2181,10.63.13.143:2181
                #配置给kafka存储consumer offsets的ZK Root Path
                zk_root: /pyleus/partition
                #Kafka consumer ID
                consumer_id: squid
                #定义需要从某个offset开始吗
                #默认为false
                from_start: false
                binary_data: true
主要注意的是:
	zk_root和consuer_id 在zk的路径是会自动创建的,自动创建的前提是,数据被成功的消费

pyleus-1's People

Contributors

caibird1990 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.