kafka-ops / julie

A solution to help you build automation and GitOps in your Apache Kafka deployments. The Kafka GitOps!

License: MIT License

Java 94.07% Shell 5.37% Dockerfile 0.56%
Topics: kafka, topics, configuration, acls, ci-cd, gitops-toolkit, kafka-cluster, gitops

julie's Introduction

An operational manager for Apache Kafka (Automation, GitOps, SelfService)

Note - Governance - Hibernation state

I'm grateful for how many people the JulieOps project has helped during its existence; it is totally mind-blowing to get more than 300 stars for a humble human like me. Thanks everyone!!

Sadly, these days, between my workload and personal arrangements, the project has been lacking proper maintenance and care, which honestly makes me very sad, as I would love to see it grow and provide more and more people with these features. I'm a big believer in self-service and automation.

So, until further notice, or until something changes, you should approach the project with care, as it is currently mostly in a long winter hibernation :-) I'm sorry for this, but I can't do more as a mostly sole maintainer.

Thanks again to everyone who was, is or will be involved with the project life.


-- Pere

README

NOTE: This project was formerly known as Kafka Topology Builder; old versions of this project can still be found under that name.


JulieOps helps you automate the management of your resources within Apache Kafka, from topics, configuration and metadata to access control and schemas. More items are planned; check here for details.

The motivation

One of the typical questions when building an Apache Kafka infrastructure is how to handle topics, configurations and the required permissions to use them (Access Control Lists).

The JulieOps CLI, in close collaboration with git and Jenkins (CI/CD), is here to help you set up an organised and automated way of managing your Kafka cluster.

Where's the docs?

We recommend taking the time to read the docs. There's quite a bit of detailed information about GitOps, Apache Kafka and how this project can help you automate common operational tasks.

Automating Management with CI/CD and GitOps


You might be wondering what the usual workflow to implement this approach looks like:

Action: As a user, part of a developer team (for example), I would like to make some changes in Apache Kafka.

Change Request: As a user:

  • Go to the git repository where the topology is described
  • Create a new branch
  • Perform the changes needed
  • Make a pull request targeting the master branch

Approval process: As an ops admin, I can:

  • Review the pull request (change request) initiated by teams
  • Request changes when needed
  • Merge the requests.

Considerations:

  • Using webhooks, the git server (GitHub, GitLab or Bitbucket) will inform the CI/CD system that changes have happened and need to be applied to the cluster.
  • All direct changes (git push) to the master branch are disabled. Changes can only happen through a pull request, providing a change-management mechanism that fits into your org's procedures. A minimal pipeline sketch is shown after this list.
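
For illustration, here is a minimal CI sketch using GitHub Actions; the workflow file name, mount paths and secret names are assumptions, not something JulieOps prescribes:

# .github/workflows/kafka-gitops.yml (illustrative)
name: kafka-gitops
on:
  pull_request:            # validate proposed changes
  push:
    branches: [master]     # apply merged changes
jobs:
  julie:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Plan on pull requests, apply on master
        run: |
          FLAGS=""
          if [ "${{ github.event_name }}" = "pull_request" ]; then FLAGS="--dryRun"; fi
          docker run -v "$PWD:/work" purbon/kafka-topology-builder:latest \
            julie-ops-cli.sh $FLAGS \
            --brokers "${{ secrets.KAFKA_BROKERS }}" \
            --clientConfig /work/client.properties \
            --topology /work/topology.yaml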

Help??

Are you using the JulieOps tool, or planning to use it in your project? Maybe you have encountered a bug or a challenge, or need a certain feature? Feel free to reach out to our Gitter community.


Feature list, not only bugs ;-)

What can you achieve with this tool:

  • Support for multiple access control mechanisms:
    • Traditional ACLs
    • Role Based Access Control (RBAC) as provided by Confluent
  • Automatically set access control rules for:
    • Kafka Consumers
    • Kafka Producers
    • Kafka Connect
    • Kafka Streams applications (microservices)
    • KSQL applications
    • Schema Registry instances
    • Confluent Control Center
    • KSQL server instances
  • Manage topic naming with a topic name convention
    • Including the definition of projects, teams, datatypes and, of course, the topic name
    • Some topics can be flexibly defined by user requirements
  • Allow for the creation, deletion and update of:
    • Topics, following the topic naming convention
    • Topic configuration: variables like retention, segment size, etc.
    • ACLs or RBAC rules
    • Service Accounts (Experimental feature only available for now in Confluent Cloud)
  • Manage your cluster schemas.
    • Support for Confluent Schema Registry

Out of the box support for Confluent Cloud and other clouds that enable you to use the AdminClient API.
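
For illustration, a minimal client configuration for such a cluster might look like this (all values are placeholders):

# illustrative AdminClient configuration for a SASL_SSL endpoint such as Confluent Cloud
bootstrap.servers=<BOOTSTRAP_ENDPOINT>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<API_KEY>" password="<API_SECRET>";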

How can I run JulieOps directly?

This tool is available in multiple formats:

  • As a Docker image, available from docker hub
  • As an RPM package, for RedHat-like distributions
  • As a DEB package, for Debian based distros
  • Directly as a fat jar (zip/tar.gz)

The latest versions are available from the releases page.

How to execute the tool

This is how you can run the tool directly as a docker image:

docker run purbon/kafka-topology-builder:latest julie-ops-cli.sh  --help
Parsing failed cause of Missing required options: topology, brokers, clientConfig
usage: cli
    --brokers <arg>                  The Apache Kafka server(s) to connect
                                     to.
    --clientConfig <arg>             The client configuration file.
    --dryRun                         Print the execution plan without
                                     altering anything.
    --help                           Prints usage information.
    --overridingClientConfig <arg>   The overriding AdminClient
                                     configuration file.
    --plans <arg>                    File describing the predefined plans
    --quiet                          Print minimum status update
    --topology <arg>                 Topology config file.
    --validate                       Only run configured validations in
                                     your topology
    --version                        Prints useful version information.

If you install the tool as an RPM, julie-ops-cli.sh will be available in your $PATH. You can run this script with the same options shown earlier; note, however, that you will need to run it as the julie-kafka user or be in that user's group.
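
An illustrative invocation (the paths below are examples, not defaults shipped with the package):

sudo -u julie-kafka julie-ops-cli.sh \
    --brokers localhost:9092 \
    --clientConfig /etc/julie/client.properties \
    --topology /etc/julie/topology.yaml \
    --dryRun

Running with --dryRun first prints the execution plan without altering anything.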

An example topology

An example topology should look like this (in yaml format):

context: "context"
source: "source"
projects:
- name: "foo"
  consumers:
  - principal: "User:app0"
  - principal: "User:app1"
  streams:
  - principal: "User:App0"
    topics:
      read:
      - "topicA"
      - "topicB"
      write:
      - "topicC"
      - "topicD"
  connectors:
  - principal: "User:Connect1"
    topics:
      read:
      - "topicA"
      - "topicB"
  - principal: "User:Connect2"
    topics:
      write:
      - "topicC"
      - "topicD"
  topics:
  - name: "foo" # topicName: context.source.foo.foo
    config:
      replication.factor: "2"
      num.partitions: "3"
  - name: "bar" # topicName: context.source.foo.bar
    config:
      replication.factor: "2"
      num.partitions: "3"
- name: "bar"
  topics:
  - name: "bar" # topicName: context.source.bar.bar
    config:
      replication.factor: "2"
      num.partitions: "3"

More examples can be found in the example/ directory.

Also, please check the documentation in the docs for extra information and examples on managing ACLs, RBAC, principals, schemas and many others.

Troubleshooting guides

If you're having problems with JulieOps, we recommend looking at two main sources of information:

  • The project issue tracker. It is highly possible others have had your problem before.
  • Our always-work-in-progress troubleshooting guide

Interested in contributing back?

Interested in contributing back? Maybe you have an idea for a great feature, or want to fix a bug? Check our contributing doc for guidance.

Building JulieOps from scratch (source code)

The project is built using Java and Maven, so both are required if you aim to build the tool from source. The minimum supported Java version is Java 8; note that Java 8 support will soon be deprecated here, as it is only kept for very legacy environments.

It is recommended to run JulieOps with Java 11 and an OpenJDK build.

Building a release

If you are interested in building a release artifact from the source code, check our release doc for guidance.

Nightly builds as well as release builds are regularly available from the Actions in this project.

Nightly release builds are available from here as well.

julie's People

Contributors

akselh, bjaggi, cedillomarcos, christophschubert, danielmabbett, dependabot[bot], fobhep, grampurohitp, jeqo, jniebuhr, khaes-kth, kikulikov, leonardobonacci, lsolovey, ludovic-boutros, magnussmith, michaelandrepearce, michaelpearce-gain, mikaello, minyibii, mvanbrummen, nachomdo, niyiodumosu, piotrsmolinski, purbon, schnaker85, schocco, sknop, solita-juusoma, sverrehu


julie's Issues

Option to store ACL status outside of filesystem

Currently the ACL dump is stored in the local filesystem; however, as a user of the Kafka Topology Builder in a k8s environment, I might be interested in keeping the state in third-party storage, such as a key-value store.

Restructure header of descriptor file

Currently, the Topology Builder interprets every top-level field in the YAML between the context and projects as an additional naming component. IMHO this leads to two issues:

  1. We cannot add additional metadata fields to the 'header' of a topology YAML, at least not between context and projects.
  2. This goes against the YAML spec (https://yaml.org/spec/1.2/spec.html), see the quote below, which in turn means we have to rely on hand-written parsers.

From the spec: "Construction of native data structures from the serial interface should not use key order or anchor names for the preservation of application data."

I propose to change the header definition to either

  1. allow a list of entries for the config, or
  2. use a sub-object together with an explicit formatting string.

Example for 1:

context:
- contextValue
- subContextValue1
- subContextValue2

Example for 2:

context:
  main: mainValue
  sub1: sub1Value
  sub2: sub2Value
topicNameFormat:
  {main}.{sub1}.{sub2}.{project.name}.{topic.name}

Add web ui

As a user of the kafka topology builder, I would like to have a simple web UI where users and teams can manage their own setup.

Schema Registry client does not support security configs

When the Schema Registry client is built here https://github.com/purbon/kafka-topology-builder/blob/e97fc7390cd7e60bfd0e481b1916bf71815f1e67/src/main/java/com/purbon/kafka/topology/KafkaTopologyBuilder.java#L73, it uses the no-config constructor. This means we cannot pass security parameters.

Can we pass parameters from the tool config (maybe prefixed?) here and use this constructor:

https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/CachedSchemaRegistryClient.java#L106
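
A minimal sketch of the idea in Java, assuming a hypothetical schema.registry. prefix convention for the tool config (the prefix, class and method names are illustrative, not part of the codebase):

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

import java.util.HashMap;
import java.util.Map;

public class SchemaRegistryClientFactory {

  // Forward tool configs carrying an agreed prefix to the Schema Registry client
  public static SchemaRegistryClient build(String url, Map<String, ?> toolConfig) {
    Map<String, Object> srConfigs = new HashMap<>();
    toolConfig.forEach((k, v) -> {
      if (k.startsWith("schema.registry.")) {
        // strip the prefix so the client sees its native config keys
        srConfigs.put(k.substring("schema.registry.".length()), v);
      }
    });
    // the configs-aware constructor referenced in the issue
    return new CachedSchemaRegistryClient(url, 10, srConfigs);
  }
}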

Platform RBAC limitations

As an (ops) user I would love to be able to do something like this:

---
platform:
  kafka:
    ClusterAdmin:
      - principal: "User:Hans"
    SecurityAdmin:
      - principal: "User:Fritz"
  schema_registry:
    ClusterAdmin:
      - principal: "User:Hans"
    SecurityAdmin:
      - principal: "User:Fritz"

Above being a complete yaml file.
However, I think there are currently two things "forbidding" this.

  • One cannot create a descriptor containing "only" platform descriptions

  • There is currently no detailed role-assignment for platform components.
    Using a deployment like in the example file will assign ClusterAdmins only?

platform:
  schema_registry:
    - principal: "User:SchemaRegistry"



Error with connectors block

Deploying like this will cause an error:

---
team: "planetexpress"
source: "source"
projects:
  - name: "natas"
    consumers:
      - principal: "User:Bender"
      - principal: "User:Fry"
      - principal: "User:Lila"
    producers:
      - principal: "User:Fry"
      - principal: "User:Lila"
    connectors:
      - principal: "User:Connect1"
        topics:
          read:
            - "topicA"
            - "topicB"
      - principal: "User:Connect2"
        topics:
          write:
            - "topicC"
            - "topicD"
    topics:
      - name: "foo"
        config:
          replication.factor: "1"
          num.partitions: "1"
      - dataType: "avro"
        name: "bar"
        config:
          replication.factor: "1"
          num.partitions: "1"
    rbac:
      - ResourceOwner:
        - principal: "User:Professor"
      - DeveloperManage:
        - principal: "User:Zoidberg"

However removing the connectors block will work.

kafka-topology-builder.sh --brokers localhost:9093 --clientConfig topology.properties --topology topology_docs.yaml 
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.admin.AdminClientConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
        at com.purbon.kafka.topology.api.mds.MDSApiClient.getClusterIds(MDSApiClient.java:202)
        at com.purbon.kafka.topology.roles.AdminRoleRunner.forKafkaConnect(AdminRoleRunner.java:54)
        at com.purbon.kafka.topology.roles.RBACProvider.setAclsForConnect(RBACProvider.java:61)
        at com.purbon.kafka.topology.AccessControlManager.syncApplicationAcls(AccessControlManager.java:157)
        at com.purbon.kafka.topology.AccessControlManager.lambda$sync$2(AccessControlManager.java:107)
        at java.util.ArrayList.forEach(ArrayList.java:1257)
        at com.purbon.kafka.topology.AccessControlManager.lambda$sync$3(AccessControlManager.java:105)
        at java.util.ArrayList.forEach(ArrayList.java:1257)
        at com.purbon.kafka.topology.AccessControlManager.sync(AccessControlManager.java:74)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.run(KafkaTopologyBuilder.java:87)
        at com.purbon.kafka.topology.BuilderCLI.processTopology(BuilderCLI.java:154)
        at com.purbon.kafka.topology.BuilderCLI.main(BuilderCLI.java:118)

RBAC provider should support acls status sync

Currently the RBAC provider does not support a roles status sync: it only maps the currently defined roles, and no cleanup is done automatically.

For this to happen, we need to implement:

Have an ansible role that wraps the functionality

As a DevOps team, it could be very valuable to have an Ansible role that runs the tool automatically.

One can directly call the shell script available when installing the RPM package, like this:

docker run purbon/kafka-topology-builder:latest kafka-topology-builder.sh  --help
Parsing failed cause of Missing required options: topology, brokers, clientConfig
usage: cli
    --allowDelete          Permits delete operations for topics and
                           configs.
    --brokers <arg>        The Apache Kafka server(s) to connect to.
    --clientConfig <arg>   The AdminClient configuration file.
    --help                 Prints usage information.
    --quite                Print minimum status update
    --topology <arg>       Topology config file.

This could later be integrated into cp-ansible as well.
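
A minimal sketch of what such a role's main task could look like (the role layout and variable names are illustrative):

# tasks/main.yml of a hypothetical julie-ops role
- name: Apply Kafka topology with JulieOps
  command: >
    julie-ops-cli.sh
    --brokers {{ kafka_brokers }}
    --clientConfig {{ julie_client_config }}
    --topology {{ julie_topology_file }}
  become: true
  become_user: julie-kafka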

Generate yaml of existing topic.

Hi,
it would be nice to have an option for the Kafka Topology Builder to generate the YAML for existing topics/ACLs in a Kafka cluster. This would be useful when migrating from imperative topic creation to a declarative approach like this project. If you forget about an existing topic, the tool could otherwise remove data.

Support Confluent Cloud CLI

As a user of the topology builder, I would like to have a more complete ccloud mode, adding support for the ccloud CLI.

NOTE: Currently Confluent Cloud can be used through the benefits of AdminClient API, users can manage topics and acls, for example.

Add a "dry run" parameter

Many CLI tools have a "dry run" option (sometimes -n) which does everything except execute the modifications. This should be a very small change, but useful for the paranoid admin.

Topologies from different teams

The topology builder won't allow topology files for different teams when using the "dir" option, i.e. specifying a directory path with the --topology flag.
Being able to do that, however, would IMHO be highly useful.

Exception in thread "main" java.io.IOException: Topologies from different teams are not allowed
        at com.purbon.kafka.topology.KafkaTopologyBuilder.buildTopology(KafkaTopologyBuilder.java:101)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.run(KafkaTopologyBuilder.java:78)
        at com.purbon.kafka.topology.BuilderCLI.processTopology(BuilderCLI.java:154)
        at com.purbon.kafka.topology.BuilderCLI.main(BuilderCLI.java:118)

EDIT: When applying single topology files for different teams it won't work either; only the first file will be "executed".

Reduce number of ACLs created by moving to prefixed ACLs

Given the limitations on the total number of ACLs in Confluent Cloud (currently 1,000 for basic and standard, and 10,000 for dedicated clusters) and when using centralized ACLs (currently 1,000 per cluster), we should strive to optimize the total number of ACLs being created.

  • Currently, we create one literal ACL for each consumer/producer and each topic in a project for the necessary operations (READ/WRITE and DESCRIBE).
    This could be replaced with a single prefixed ACL for each consumer/producer and operation combination; see the sketch after this list.

  • READ entries for consumer groups should be prefixed instead of providing wildcard access by default.
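
As a hedged illustration using the stock kafka-acls CLI (the topic names reuse the naming example from the README and are assumptions):

# literal ACLs: one entry per topic the principal consumes
kafka-acls --bootstrap-server localhost:9092 --add \
  --allow-principal User:app0 --operation Read \
  --topic context.source.foo.topicA
kafka-acls --bootstrap-server localhost:9092 --add \
  --allow-principal User:app0 --operation Read \
  --topic context.source.foo.topicB

# a single prefixed ACL can cover every topic under the project prefix
kafka-acls --bootstrap-server localhost:9092 --add \
  --allow-principal User:app0 --operation Read \
  --resource-pattern-type prefixed --topic context.source.foo.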

Kafka Topology Builder 1.0-RC1

The first release candidate for 1.0 has been cut today, the process would look like:

  1. At least 3 release candidates to iron out as many issues as possible. If they are not all needed in the end, 1.0.0 might come earlier; if more are needed, there will be more.
  2. Have a voting and vetting process with the help of the community. Help testing that the functionality meets the bare minimum expectations of people using GitOps for Apache Kafka is very welcome.

More details about the process can be found in our wiki -> https://github.com/purbon/kafka-topology-builder/wiki/1.0-RC-tests.

This issue has been created to track the questions and votes related to the 1.0.0 release.

NOTE: If you find bugs, please open them as separate issues in this project.

Thanks a lot for your help,

Delete ACLs

What is the proposed way to remove ACLs using the Topology Builder?

allowDelete will delete internal topics

Hi - when running the topology-builder with allowDelete, the CLI tool seems stuck for quite a while.
Looking at the topics afterwards shows that ALL topics not described in the descriptor are gone, including e.g. control-center topics.
That also explains why the control-center crashes afterwards.

I propose that allowDelete should ONLY touch topics that are not internal,
or maybe only work in the project-space(s) the descriptor contains.

Also, allowDelete seems not to change role assignments; is that correct?

Use cluster defaults when no number of partitions and replication factor is given in topology

When no replication factor and number of partitions are specified, the cluster defaults (broker configs num.partitions and default.replication.factor) should be used.

This would be especially beneficial for Confluent Cloud use cases (only possible replication factor is 3).

Alternatively, a configuration exception should be used.

Current behavior in this case: use RF=2 and num.partitions=3 as fallback (hard-coded).
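
For illustration, with cluster defaults in play a topic entry could omit the config block entirely (a sketch, not current behavior):

projects:
  - name: "foo"
    topics:
      - name: "events" # partitions and replication factor taken from broker defaults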

Preliminary implementation: https://github.com/christophschubert/kafka-topology-builder/tree/add-cluster-defaults

Schema management fails when schemas are outside the execution context

When the topology-builder runs in a Docker context, the schema management feature fails if schemas are outside of the Docker execution context.

  • java -Dlog4j.configuration=file:/var/lib/jenkins/workspace/myProject/tmp/log4j.properties -cp /var/lib/jenkins/workspace/myProject/dev/topology/configuration.yaml /var/lib/jenkins/workspace/myProject/dev/topology/test-value.avsc -jar /usr/local/kafka-topology-builder/bin/kafka-topology-builder.jar --clientConfig /var/lib/jenkins/workspace/myProject/dev/kafka-admin.properties --topology /var/lib/jenkins/workspace/myProject/dev/topology/configuration.yaml --brokers kafka-dev.mydomain.com:9094
    Error: Could not find or load main class .var.lib.jenkins.workspace.myProject.dev.topology.test-value.avsc

Let me know if you want more information

Extend the number of top level attributes in the descriptor file

Currently the descriptor file can only select team and source; however, many teams might like to customise this structure much more deeply.

As a user of the topology tool I would like to have a variable level of top level attributes.

They should compose first to last, as of today.

Add support for handling more than 1 topology per call

Currently the Kafka Topology Builder receives a single file as a parameter; however, as files grow it would be interesting and possible to receive a directory, with the tool internally handling the wrapping of all content into a single description.

QQ: How should project structure/directory meanings be managed, and should multiple files be read in order or without order? etc.

Removal of acls is not working

I've found a bug related to the removal of ACLs after a change in the topology; e.g. removing a consumer from a topic will not remove the ACLs for that user. As I understand the code, I cannot see that removal of ACLs works in any case, actually. The problem originates in AccessControlManager:

public void clearAcls() {
  try {
    clusterState.load();
    if (allowDelete) {
      plan.add(new ClearAcls(controlProvider, clusterState));
    }
  } catch (Exception e) {
    LOGGER.error(e);
  } finally {
    if (allowDelete && !dryRun) {
      clusterState.reset();
    }
  }
}

The clusterState is given to the ClearAcls action, but in the finally block the clusterState is reset, and all bindings that ClearAcls should later remove are cleared. So when the ClearAcls action runs, there are no bindings left to remove ACLs for.

It was initially a bit confusing that there was a test in AccessControlManagerIT named testAclsCleanup that seemed to verify the ACL removal feature. But this test is actually broken, because when executing accessControlManager.apply() in the test there are actually three actions executed, in this order:

  1. ClearAcls
  2. SetAclsForConsumer
  3. ClearAcls

So when the last ClearAcls action runs, the clusterState has again been populated with the 3 bindings from SetAclsForConsumer... so they will be removed.

I already have a fork with a fix for this bug and I'm happy to contribute a PR for the fix.

Check Kafka Cluster ID

If you create a properties file containing valid settings to connect to a cluster as an admin client, but INCORRECT values for the MDS part, you won't get an error.

example:

This is my correct kafka cluster id:

 kafka-cluster-id: abcd

This is the topologybuilder.properties file:

[...]
topology.builder.mds.kafka.cluster.id=efgh # --> incorrect

--> Topics will be created without a problem; however, there is no error message whatsoever indicating that ACLs/role assignments could not be applied.

Make internal topic and group.id configurable for RBACProvider

The RBACProvider implementation currently assumes that the internal topics and group id for Connect (https://github.com/purbon/kafka-topology-builder/blob/master/src/main/java/com/purbon/kafka/topology/roles/RBACProvider.java#L82-L87) and Schema Registry (https://github.com/purbon/kafka-topology-builder/blob/master/src/main/java/com/purbon/kafka/topology/roles/RBACProvider.java#L171-L174) are the defaults.

It would be great to make this configurable.
This is particularly true for Connect, as we can have multiple Connect clusters, and this requires using a different group.id and different topics; a sketch follows below.
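
For illustration, configurable keys could follow the existing topology.builder.* naming; the keys below are hypothetical, not implemented:

# hypothetical properties (illustrative names only)
topology.builder.connect.group.id=connect-cluster-analytics
topology.builder.connect.config.storage.topic=connect-configs-analytics
topology.builder.connect.offset.storage.topic=connect-offsets-analytics
topology.builder.connect.status.storage.topic=connect-status-analytics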

Configuration from environment variables

It would be good for the tool to support reading configuration from environment variables.

This could be done by exposing command-line options for each of the sensitive configuration options, like sasl.jaas.config, or by having a convention where environment variables with a certain prefix are converted to client properties; a sketch of the latter follows.
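
A sketch of such a convention (the JULIE_ prefix and the mapping rule are assumptions, not implemented):

# hypothetical mapping: JULIE_FOO_BAR -> foo.bar in the client properties
export JULIE_SECURITY_PROTOCOL=SASL_PLAINTEXT
export JULIE_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin-secret";'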

Improve documentation

The documentation could be polished by

  • amending the example.yaml with an explanation of what each block actually does
  • explaining the "allowDelete" function in more detail
  • explaining the possibility to use multiple topology files at once
  • explaining the possibility to use a directory instead of single topology files
  • explaining the platform RBAC feature and stating the limitations

Managing Existing Cluster [Users, Topics, ACLs are preset] with kafka-topology-builder

Hello @purbon, I was present at your recent talk at the Linuxstammtisch and am excited to try out kafka-topology-builder.

But I am a bit afraid! I am afraid of damaging our existing setup. We have a running Kafka cluster with some users, topics, and ACLs.

I fear that if I try a test.yml with some test data, it may overwrite all the other existing data; that is what is meant by GitOps and a 'single point of truth'.

Is there a way I can retrieve the old settings as a base.yml file and test further by adding something to it?

Please feel free to ask in case my intention is not clear.

Warm regards,
Kalinga Ray

Cannot deploy topologies - no proper log

Hi - I am using the topology-manager to deploy a basic set of topologies:

---
team: "team"
source: "source"
projects:
  - name: "bar"
    zookeepers: []
    consumers: []
    streams: []
    connectors: []
    topics:
      - dataType: "json"
        name: "events"
        config:
          replication.factor: "1"
          num.partitions: "1"
      - dataType: "avro"
        name: "events"
        config:
          replication.factor: "1"
          num.partitions: "1"

I am using the following config:

bootstrap.servers=localhost:9093
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
   username="admin" \
   password="admin-secret";

I can successfully use the exact same config file in a kafka-topics command:

kafka-topics --command-config topology.properties --bootstrap-server localhost:9093 --list

However the topology-manager won't work and I get the following log output:

# kafka-topology-builder.sh --broker localhost:9093 --clientConfig topology.properties --topology topology.yaml 

log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.admin.AdminClientConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
        at com.purbon.kafka.topology.serdes.JsonSerdesUtils.parseApplicationUser(JsonSerdesUtils.java:17)
        at com.purbon.kafka.topology.serdes.ProjectCustomDeserializer.deserialize(ProjectCustomDeserializer.java:58)
        at com.purbon.kafka.topology.serdes.ProjectCustomDeserializer.deserialize(ProjectCustomDeserializer.java:21)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4189)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2476)
        at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:2929)
        at com.purbon.kafka.topology.serdes.JsonSerdesUtils.parseApplicationUser(JsonSerdesUtils.java:19)
        at com.purbon.kafka.topology.serdes.JsonSerdesUtils.addProject2Topology(JsonSerdesUtils.java:28)
        at com.purbon.kafka.topology.serdes.TopologyCustomDeserializer.deserialize(TopologyCustomDeserializer.java:44)
        at com.purbon.kafka.topology.serdes.TopologyCustomDeserializer.deserialize(TopologyCustomDeserializer.java:18)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4218)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3079)
        at com.purbon.kafka.topology.serdes.TopologySerdes.deserialise(TopologySerdes.java:30)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.parseListOfTopologies(KafkaTopologyBuilder.java:126)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.buildTopology(KafkaTopologyBuilder.java:95)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.run(KafkaTopologyBuilder.java:77)
        at com.purbon.kafka.topology.BuilderCLI.processTopology(BuilderCLI.java:153)
        at com.purbon.kafka.topology.BuilderCLI.main(BuilderCLI.java:118)

Are there any means to get a proper log, or is there any other error in my usage?

Sign RPMs

Would it be possible to sign the RPMs that are built from the release pipeline?

Add Schema management support

The current proposal is to add schemas support for the project.

    topics:
      - name: "foo"
        config:
          replication.factor: "1"
          num.partitions: "1"
      - name: "bar"
        dataType: "avro" # TODO: what is this field for?
        schemas:
          key.schema.string: '{\"type\": \"string\"}'
          key.schema.type: "AVRO"
          # TODO key.schema.file
          value.schema.string: '{\"type\": \"string\"}'
          value.schema.type: "AVRO"
          # TODO value.schema.file
#...

So there are two alternative ways to define schemas: string or file.

After CP 5.5, schema type is an essential field, as we need to support JSON, Protobuf & AVRO out of the box.

Perhaps, add support for custom schema types later on.

ClassNotFoundException: com.purbon.topology.roles.RBACProvider

Hi I am getting the following error:

kafka-topology-builder.sh --broker localhost:9093 --clientConfig topology.properties --topology topology.yaml 
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.admin.AdminClientConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: java.lang.ClassNotFoundException: com.purbon.topology.roles.RBACProvider
        at com.purbon.kafka.topology.KafkaTopologyBuilder.buildAccessControlProvider(KafkaTopologyBuilder.java:208)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.run(KafkaTopologyBuilder.java:78)
        at com.purbon.kafka.topology.BuilderCLI.processTopology(BuilderCLI.java:153)
        at com.purbon.kafka.topology.BuilderCLI.main(BuilderCLI.java:118)
Caused by: java.lang.ClassNotFoundException: com.purbon.topology.roles.RBACProvider
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at com.purbon.kafka.topology.KafkaTopologyBuilder.buildAccessControlProvider(KafkaTopologyBuilder.java:177)

topology.yaml

team: "team"
source: "source"
projects:
  - name: "bar"
    zookeepers: []
    consumers:
      - principal: "User:Blub"
    producers: []
    streams: []
    connectors: []
    topics:
      - dataType: "json"
        name: "events"
        config:
          replication.factor: "1"
          num.partitions: "1"
    rbac:
      - ResourceOwner:
          - principal: "User:Foo"

topology.properties

bootstrap.servers=localhost:9093
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
   username="admin" \
   password="admin-secret";

topology.builder.access.control.class="com.purbon.topology.roles.RBACProvider"
topology.builder.mds.server="http://localhost:8090"
topology.builder.mds.user="alice"
topology.builder.mds.password="alice-secret"
topology.builder.mds.kafka.cluster.id="UtBZ3rTSRtypmmkAL1HbHw"

When leaving the RBAC-related settings out of the topology properties and YAML file, the deployment works like a charm.
The method of deploying was copying the descriptor.yaml and topology files into a Docker container running your latest build from Docker Hub.

Should topologies with only topics be supported?

Currently it is possible to use a topology with only topics. While this might be of interest for setting up environments, it is clearly not good practice for production-like environments.

example topology:

---
team: "team"
source: "source"
projects:
  - name: "foo"
    topics:
      - name: "foo"
        config:
          replication.factor: "1"
          num.partitions: "1"
      - dataType: "avro"
        name: "bar"
        config:
          replication.factor: "1"
          num.partitions: "1"
  - name: "bar"
    topics:
      - dataType: "avro"
        name: "bar"
        config:
          replication.factor: "1"
          num.partitions: "1"

questions:

  • should that even be supported?
  • should this be supported, but raise a HUGE warning to flag the problem?

thoughts?

Allow customized topic name format

Add a topicNameFormat configuration element with tokens that pull in environment variables, and also ones that correspond to the config elements. That way, if people wanted to tweak the default conventions, it's configurable.
e.g.
topicNameFormat={env.CP_ENVIRONMENT_NAME}-{team}-{source}-{project}-{topic}

Debian package name

The Debian package (.deb) is not installable due to a non-accepted package name.

It seems that you placed the package description in the package name value; spaces are not accepted.

Adding validation rules

Hi,

As an ops person I may want to add some restrictions on the topology itself, like:

  • a maximum number of partitions
  • a maximum retention
  • enforcing the group.id naming with a regexp or similar
  • enforcing that a public topic (based on naming convention) has a schema

Even if most of this can be done via some scripting, it would be really awesome to get it built into the tool.
Maybe adding a --validator option through the CLI or something like that.
If the validation rules are OK => perform the action and return a 0 exit code.
If the validation rules are KO => print the error and stop before doing any actions.

This may help people with their CI/CD integration.

Ideally, the validation should be a public interface so it can be customized beyond the common validation rules that we see; a sketch of what such rules could look like follows.
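
For example, a rules file could look something like this (purely illustrative; neither the option nor this format exists today):

# hypothetical rules file passed via --validator (not implemented)
validations:
  max.partitions: 12
  max.retention.ms: 604800000 # 7 days
  group.id.pattern: "^[a-z]+(-[a-z]+)*$"
  public.topics.require.schema: true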

WDYT about that?

Cheers,
Jean-Louis
