Comments (9)

amacciola commented on July 22, 2024

@josevalim the master branch fixes the ton of rebalancing errors and GenServer crash errors that are thrown repeatedly when stopping pipelines, but it also brings along the error mentioned above. I cannot live with that error, so for now I will have to live with the errors and crashes. A fix for these would be greatly appreciated.

from broadway_kafka.

josevalim commented on July 22, 2024

@amacciola your fixes were not supposed to change the semantics of the code, but it seems they did. I just pushed another approach to master. Can you please try it out?

amacciola commented on July 22, 2024

@josevalim Will do. Trying now.

amacciola commented on July 22, 2024

@josevalim I just tested it out and see the same issue. It fixes all the crashes and errors from the :drain_after_revoke method.

But now the messages are not ingested properly after starting, stopping, and starting the pipeline again.

josevalim commented on July 22, 2024

I see. It seems that assignments_revoked has to fail so brod cleans it up. So I think the failures will have to stay. We can rewrite the failures to something else, so it is a bit more pleasant on the eyes though.

It may also be that terminate_child is too abrupt and you may be better off using GenServer.stop(NameOfThePipeline) to shut it down. Does it make a difference?
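
For illustration, here is a minimal sketch of what stopping via GenServer.stop looks like. It uses a plain GenServer as a stand-in for the pipeline, so the module name FakePipeline and the notify option are hypothetical and not part of Broadway or broadway_kafka. With a real pipeline you would pass the name given to Broadway.start_link instead. GenServer.stop/3 blocks until the process has terminated and runs terminate/2 first. Whether that is enough for brod to clean up its assignments is exactly the open question here.

```elixir
defmodule FakePipeline do
  # Hypothetical stand-in for a named pipeline process.
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    # Keep the pid that should be told when cleanup has run.
    {:ok, Keyword.fetch!(opts, :notify)}
  end

  @impl true
  def terminate(reason, notify) do
    # Cleanup would go here; we only report that it ran.
    send(notify, {:terminated, reason})
    :ok
  end
end

{:ok, pid} = FakePipeline.start_link(notify: self())

# Ask the process to stop with reason :normal and wait up to 30 seconds.
:ok = GenServer.stop(FakePipeline, :normal, 30_000)
```

The explicit timeout is an assumption. The default is :infinity, which may be preferable if draining in-flight messages can take a while.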

amacciola commented on July 22, 2024

@josevalim Makes sense. I will try using GenServer.stop(NameOfThePipeline) and see if it makes a difference. Otherwise I will deal with the errors for now.

amacciola commented on July 22, 2024

@josevalim

Sorry I have been in here so much lately. I am now on the master branch and using GenServer.stop(NameOfThePipeline) instead of terminate_child, and I do notice fewer crashes and errors.

However, I am still having the same issue. This time I have dug into it more and have more details. The issue only happens if we stop a Broadway pipeline right after starting it. In low-latency environments like my local machine I see no issues, but in our cloud environments, which have more latency to Kafka, we do.

I believe that when we start a Broadway pipeline it sends its assignments via brod to the consumer group for each partition (please correct me if I am butchering this). But when we shut down the pipeline, all we are doing is sending a shutdown signal to a GenServer. So I do not believe it finishes starting the Broadway pipeline and subscribing the assignments for the consumer group. The next time I start the Broadway pipeline, since I am using the same consumer_group_id, it just rejoins the consumer group instead of reconfiguring it. That is what it should do, but now it never fetches data from a handful of partitions.

Is there a proper way to stop a Broadway pipeline? Something like a Broadway.stop or BroadwayKafka.stop? I am trying to figure out whether this is something we need to solve in our applications, by fetching metadata to ensure a pipeline has started successfully before we shut it down, or whether a graceful shutdown would simply wait for the pipeline to start successfully.

josevalim commented on July 22, 2024

Stopping a Broadway pipeline is not a scenario that we tested because there is no public API for it. If we implemented it, it would be implemented with GenServer.stop, and I would assume it would work in most cases, but Kafka is extremely complicated so we would need to run a bunch of manual tests.

You are welcome to try a pull request that adds this functionality, or at least add some integration tests that reproduce the failures you are seeing so we can investigate them further.

amacciola commented on July 22, 2024

In our scenario users are able to start, pause, and start ingestion again from a UI, which stops the pipelines on the backend. They can perform these actions right after one another if they choose. The fix I found is this: just before I stop a pipeline, I use :brod.fetch_committed_offsets() to check whether the number of partitions with committed offsets matches the number of partitions the topic has. If no offsets were committed at all, I just reset the consumer_group_id so that a new ID is used the next time the pipeline starts.

If only a subset of the partitions have committed offsets, then something went wrong (the pipeline was shut down too quickly after starting). In that case I clear any data from our system that may have been ingested and then reset the consumer_group_id.
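
The decision logic described above could be sketched roughly as follows. This is not the actual application code: the module name StopCheck, the function decide/2, and the returned atoms are hypothetical. The two counts are assumed to have been gathered beforehand, the first from the result of :brod.fetch_committed_offsets and the second from the topic metadata. Keeping the existing group ID when every partition has a committed offset is implied by the description rather than stated.

```elixir
defmodule StopCheck do
  # Hypothetical helper: decides what to do with the consumer group
  # just before a pipeline is stopped.
  #
  # committed - number of partitions that have a committed offset
  # total     - number of partitions the topic has
  def decide(committed, total)
      when is_integer(committed) and committed >= 0 and
             is_integer(total) and total > 0 do
    cond do
      # Nothing was committed: start fresh with a new group ID next time.
      committed == 0 -> :reset_group_id
      # Only some partitions committed: the pipeline was stopped too early.
      committed < total -> :clear_data_and_reset_group_id
      # Every partition committed: the group is in a good state.
      true -> :keep_group_id
    end
  end
end

# Example with a six-partition topic where only four partitions committed.
action = StopCheck.decide(4, 6)
```

Keeping this as a pure function separates the decision from the Kafka calls, so it can be tested without a broker.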
