
Comments (4)

lofifnc commented on August 17, 2024

This is an interesting pattern. This issue has come up on GitHub before, but I wasn't able to come up with a solution. I think I'm going to put this into the wiki in some form.
Thanks for your writeup.

from flink-spector.

whipit commented on August 17, 2024

This was not an issue with the framework, but rather my understanding (or lack thereof) of the timing of events. I'll explain here with a contrived example so others don't make the same mistake I did.

Basically, let's say I have two streams: a "Number" stream and a "Multiplier" stream. My CoProcessFunction accepts both of these streams as input. It keeps track (via ValueState) of both the last Number it has seen and the last Multiplier it has seen. Upon receiving a Number N, if it has already seen a multiplier LM (last multiplier), it produces a new Number with value N * LM. Similarly, upon seeing a multiplier M, if it has already seen a number LN (last number), it produces a new Number with value LN * M.
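
To make the behaviour concrete, here is a minimal plain-Java model of the logic described above. It is only a sketch and does not use Flink: the class name `LastValueMultiplier` and the methods `onNumber` / `onMultiplier` are made up for illustration, the two fields stand in for the two ValueState entries, and the two methods play the role of `processElement1` / `processElement2` in a CoProcessFunction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CoProcessFunction described above (no Flink dependency).
public class LastValueMultiplier {
    private Integer lastNumber;      // stands in for the "last Number" ValueState; null until set
    private Integer lastMultiplier;  // stands in for the "last Multiplier" ValueState; null until set
    private final List<Integer> output = new ArrayList<>();

    // Called for each element of the Number input.
    public void onNumber(int n) {
        lastNumber = n;
        if (lastMultiplier != null) {
            output.add(n * lastMultiplier);
        }
    }

    // Called for each element of the Multiplier input.
    public void onMultiplier(int m) {
        lastMultiplier = m;
        if (lastNumber != null) {
            output.add(lastNumber * m);
        }
    }

    public List<Integer> output() {
        return output;
    }

    public static void main(String[] args) {
        LastValueMultiplier op = new LastValueMultiplier();
        op.onNumber(3);       // no multiplier seen yet, so nothing is emitted
        op.onMultiplier(10);  // emits 3 * 10
        op.onNumber(4);       // emits 4 * 10
        System.out.println(op.output()); // prints [30, 40]
    }
}
```

Note that whether an element produces output depends entirely on what has already arrived on the other input.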

For example, given a Number stream [1] and a Multiplier stream [10, 20], the output stream should be Numbers [10, 20]... or so I thought!

But what actually happened is that for the test that "passed", the sequence of events was:

  1. Saw Number 1, stored it in LastNumberSeen. LastMultiplierSeen is NULL, so no output.
  2. Saw Multiplier 10, stored in LastMultiplierSeen. LastNumberSeen is 1 - so output 1*10 = 10
  3. Saw Multiplier 20, stored in LastMultiplierSeen. LastNumberSeen is 1 - so output 1*20 = 20

and the output was [10, 20]

However, for the test that failed, the sequence of events was:

  1. Saw Multiplier 10, stored in LastMultiplierSeen. LastNumberSeen is NULL - so no output
  2. Saw Multiplier 20, stored in LastMultiplierSeen. LastNumberSeen is NULL - so no output
  3. Saw Number 1, stored it in LastNumberSeen. LastMultiplierSeen is 20, so output 1*20 = 20

and the output was [20]

My mistake was making an invalid assumption about the ordering of events between the two streams.
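
The two sequences above can be replayed without Flink to show that the same inputs give different outputs depending only on arrival order. This is a rough sketch: the class `InterleavingDemo`, the method `replay`, and the `"N:<value>"` / `"M:<value>"` string encoding for Number and Multiplier events are all invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterleavingDemo {
    // Replays one arrival order through the "last Number x last Multiplier" logic.
    // "N:<value>" is a Number event, "M:<value>" is a Multiplier event.
    public static List<Integer> replay(List<String> arrivals) {
        Integer lastNumber = null;
        Integer lastMultiplier = null;
        List<Integer> out = new ArrayList<>();
        for (String event : arrivals) {
            int value = Integer.parseInt(event.substring(2));
            if (event.startsWith("N:")) {
                lastNumber = value;
            } else {
                lastMultiplier = value;
            }
            // Emit only once both sides have been seen at least once.
            if (lastNumber != null && lastMultiplier != null) {
                out.add(lastNumber * lastMultiplier);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Number arrives first: both multipliers produce output.
        System.out.println(replay(Arrays.asList("N:1", "M:10", "M:20"))); // prints [10, 20]
        // Multipliers arrive first: only the last one is still stored when the Number arrives.
        System.out.println(replay(Arrays.asList("M:10", "M:20", "N:1"))); // prints [20]
    }
}
```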

We can control the ordering on a single stream (by using EventTimeInputBuilder), but I'm not sure how we can control the ordering across two separate streams. In production we cannot, but for testing it would be good to have this determinism.

If you have any thoughts on that please share.

Thanks, and sorry for the false alarm.


whipit commented on August 17, 2024

"But not sure how we can control the ordering on 2 separate streams."

I was able to guarantee ordering by using the EventTimeInputBuilder to create one ordered stream that contains both Number and Multiplier types (a stream of type DataStream<Object>). I then filtered this stream twice to get two streams, one containing only Number elements and the other only Multiplier elements. Finally, I mapped both of those streams to produce a DataStream<Number> and a DataStream<Multiplier>.

Here's a code snippet for illustration.

        // Build one ordered input that carries both record types.
        // The initial new Object() is only a placeholder; neither filter below lets it through.
        EventTimeInputBuilder<Object> myCombinedBuilder = EventTimeInputBuilder.startWith(new Object())
                .emit(new Number(1), after(1, TimeUnit.SECONDS))
                .emit(new Multiplier(10), after(1, TimeUnit.SECONDS))
                .emit(new Multiplier(20), after(1, TimeUnit.SECONDS));

        DataStream<Object> combinedStream = this.createTestStream(myCombinedBuilder);

        DataStream<Number> numberOnlyStream = combinedStream.filter(new FilterFunction<Object>() {
            @Override
            public boolean filter(Object value) throws Exception {
                return value instanceof Number;
            }
        }).map(new MapFunction<Object, Number>() {
            @Override
            public Number map(Object value) throws Exception {
                return (Number) value;
            }
        });

        DataStream<Multiplier> multiplierOnlyStream = combinedStream.filter(new FilterFunction<Object>() {
            @Override
            public boolean filter(Object value) throws Exception {
                return value instanceof Multiplier;
            }
        }).map(new MapFunction<Object, Multiplier>() {
            @Override
            public Multiplier map(Object value) throws Exception {
                return (Multiplier)value;
            }
        });


whipit commented on August 17, 2024

One oddity you may notice is that I start the EventTimeInputBuilder with a placeholder object whose type is a superclass of both the Number and Multiplier types; in my case I just used new Object(). This gets the builder to produce a stream that can accept both types. The placeholder gets filtered out, so it is a bit wasteful.

EventTimeInputBuilder.startWith(new Object())

