nycleaner / pyleus-1 Goto Github PK
View Code? Open in Web Editor NEWThis project forked from caibird1990/pyleus
python pyleus kafka spout
This project forked from caibird1990/pyleus
python pyleus kafka spout
写在前言: 旨在分享,写的很简单 需求:实时分析squid的命中率,根据需求,使用到了flume,storm,kafka 数据读写,思路: 1.先通过squid定义,将log写到rsyslog上 2.通过flume syslogtcp type方式,获取数据,sink到kafka 3.Python编写topology,借助pyleus模块,spout从kafka读取数据,bolt做一次分析,将数据实时写入到mysql中,写入到hbase中更好 pyleus安装: 安装包:git clone https://github.com/Yelp/pyleus.git 运行命令:pip2.6 install pyleus (如果不执行,将无法使用pyleus相关命令,前提是安装了pip) pyleus命令: pyleus build examples/squid_hit_topology/pyleus_topology.yaml 编译topology应用 pyleus local squid_hit_topology.jar 本地运行 pyleus --verbose build /home/pyleus/examples/squid_hit_topology/pyleus_topology.yaml 查看编译时的详细信息 pyleus --verbose submit -n 10.4.2.176 squid_hit_topology.jar 查看提交运行时的详细信息,-n 后面的IP是storm 的nimbusIP,可以为主机名,前提是你的/etc/hosts文件有针对其主机名的解析 pyleus kill -n 10.4.2.176 topology_name 停止topology程序的运行 topology_name名称的定义在:pyleus_topology.yaml name: squid_hit_topology 即topology项目名,详细看代码文件:examples/squid_hit_topology/pyleus_topology.yaml 在整个程序设计中,kafka的spout是需要注意的,同时代码文件中上传了flume的配置文件 pyleus_topology.yaml 定义spout和bolt name: squid_hit_topology workers: 2 topology: - spout: name: data_source_kafka type: kafka options: #配置kafka的topic topic: squid_log #配置zookeeper地址,多个用逗号隔开 zk_hosts: 10.63.13.138:2181,10.63.13.107:2181,10.63.13.143:2181 #配置给kafka存储consumer offsets的ZK Root Path zk_root: /pyleus/partition #Kafka consumer ID consumer_id: squid #定义需要从某个offset开始吗 #默认为false from_start: false binary_data: true 主要注意的是: zk_root和consuer_id 在zk的路径是会自动创建的,自动创建的前提是,数据被成功的消费
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.