Ansible Scripts to Deploy ZooKeeper, Kafka and YARN on a Linux-based Environment
Command execution order:
- ansible-playbook -i openstack_inventory ansible_kafka.yml
- ansible-playbook -i openstack_inventory ansible_prepare_yarn.yml
- ansible-playbook -i openstack_inventory ansible_yarn.yml
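The order above can be wrapped in a small fail-fast helper, so that a later playbook never runs after an earlier one has failed. This is a minimal sketch assuming bash; the function name `run_playbooks` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: run the three playbooks in the documented order, stopping at the first failure.
# run_playbooks and ANSIBLE_PLAYBOOK are illustrative names, not part of the repo.

run_playbooks() {
  local inventory="${1:-openstack_inventory}"
  # Allow the runner to be overridden (for example with "echo" for a dry run).
  local runner="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
  local playbook
  for playbook in ansible_kafka.yml ansible_prepare_yarn.yml ansible_yarn.yml; do
    echo "Running ${playbook}"
    # Return immediately so later playbooks do not run against a half-configured cluster.
    "$runner" -i "$inventory" "$playbook" || {
      echo "Playbook ${playbook} failed" >&2
      return 1
    }
  done
}
```

After sourcing the file, `run_playbooks openstack_inventory` runs all three playbooks, and `ANSIBLE_PLAYBOOK=echo run_playbooks` only prints the commands that would be executed.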
Artefact versions:
- kafka_version: kafka_2.12-3.0.0
- zookeeper_version: zookeeper-3.5.5
- hadoop_version: hadoop-3.3.1
- Java JDK 8
Known issues during the process:
- Kafka runs into permission issues on startup. These are fixed in install.yml.
- The major issue is the Hadoop version. I fixed it manually, following https://www.morling.dev/blog/bytebuffer-and-the-dreaded-nosuchmethoderror/: hadoop-yarn-common-3.3.1.jar needs a rebuild with some changes.
- An additional issue is described in this thread: https://stackoverflow.com/questions/58300578/how-to-fix-resource-changed-on-src-filesystem-issue/58405426#58405426. Once it is fixed together with the change above, copy the rebuilt hadoop-yarn-common-3.3.1.jar to {soft_link_base_path}/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.3.1.jar.
- The yarn.properties file is also needed to run the Samza job.
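The jar copy step can be scripted so that the stock jar is backed up before it is overwritten. This is a minimal sketch assuming bash; the function name `install_patched_jar` and the `.orig` backup suffix are hypothetical, and the caller supplies both the rebuilt jar and the value used for {soft_link_base_path}.

```shell
#!/usr/bin/env bash
# Sketch: replace the stock hadoop-yarn-common jar with the rebuilt one, keeping a backup.
# install_patched_jar and the ".orig" suffix are illustrative choices, not part of the repo.

install_patched_jar() {
  local rebuilt_jar="$1"          # path to the rebuilt hadoop-yarn-common-3.3.1.jar
  local soft_link_base_path="$2"  # value used for {soft_link_base_path} on this node
  local target="${soft_link_base_path}/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.3.1.jar"

  [ -f "$rebuilt_jar" ] || { echo "Missing rebuilt jar: ${rebuilt_jar}" >&2; return 1; }
  [ -d "$(dirname "$target")" ] || { echo "Missing directory for ${target}" >&2; return 1; }

  # Back up the original jar only once, so the change can be reverted later.
  if [ -f "$target" ] && [ ! -f "${target}.orig" ]; then
    cp -p "$target" "${target}.orig"
  fi
  cp "$rebuilt_jar" "$target"
}
```

A call such as `install_patched_jar ./hadoop-yarn-common-3.3.1.jar "<soft_link_base_path>"` would then leave the original jar next to the new one as `hadoop-yarn-common-3.3.1.jar.orig`.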
Lessons learned:
- Package your app and run everything from the YARN ResourceManager VM.
- Copy your packaged tar.gz app to every node, at the path set by yarn.package.path in the yarn.properties file.
Example yarn.properties:
yarn.package.path="<your packaged app>-dist.tar.gz"
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.container.count=2
# default system
job.coordinator.system=kafka
job.default.system=kafka
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=127.0.0.1:2181
systems.kafka.producer.bootstrap.servers=127.0.0.1:9092
systems.kafka.default.stream.replication.factor=1
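Copying the packaged app to every node can be scripted along the following lines. This is a minimal sketch assuming bash, key=value lines without spaces around `=`, and password-less `scp` access to the nodes; the names `get_property`, `distribute_package` and `COPY_CMD`, as well as the host names in the usage note, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read a key from yarn.properties and copy the packaged app to every node.
# get_property, distribute_package and COPY_CMD are illustrative names, not part of the repo.

# Print the value of a key from a properties file, without surrounding double quotes.
get_property() {
  local file="$1" key="$2"
  awk -F= -v key="$key" '
    $1 == key {
      sub(/^[^=]*=/, "")    # drop "key=" and keep everything after the first "="
      gsub(/^"|"$/, "")     # strip one leading and one trailing double quote
      print
      exit
    }' "$file"
}

# Copy one local package to the same remote directory on each listed host.
distribute_package() {
  local package="$1" remote_dir="$2"
  shift 2
  local copy_cmd="${COPY_CMD:-scp}"
  local host
  for host in "$@"; do
    # Stop at the first host that cannot be reached, so the failure is visible.
    "$copy_cmd" "$package" "${host}:${remote_dir}/" || return 1
  done
}
```

For example, `get_property yarn.properties yarn.package.path` prints the configured package path, and `distribute_package "<your packaged app>-dist.tar.gz" /some/remote/dir node1 node2` copies the archive to two placeholder hosts. The remote directory should match the location that yarn.package.path points to.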