Realtime Join (trill issue, closed)

microsoft commented on August 25, 2024
Realtime Join

from trill.

Comments (32)

afriedma commented on August 25, 2024

Greetings @cybertyche, any chance you will be able to provide feedback soon? Thanks in advance, @afriedma

cybertyche commented on August 25, 2024

My sincere apologies for the delay - I had the unfortunate luck of catching the flu this week.

The behavior you describe is currently by design, but that doesn't mean it can't change. The situation is this: there are a number of scenarios where the join operator cannot reasonably emit its output until after time moves forward. These scenarios usually involve the possibility of an end edge arriving and thus "deactivating" a join result.

That said, it seems as if we could be doing better with this and be more proactive about pushing data to the results from the join synopsis.

So, if this is more a question of "this is odd behavior, what's up", then that's what's up. :-) If this is more of a question of "I really need lower latency and this issue is blocking me" then I will raise the priority of this issue.

afriedma commented on August 25, 2024

Thank you for the reply, @cybertyche. I hope you are feeling better, as warm weather seems to be almost upon us :)

I am glad you confirmed this behavior. I am getting around the issue at the moment by merging in a punctuation stream with an interval of x, which forces Trill to generate output.

I'd like to switch gears a bit, if you don't mind. I have another use case where I need to implement a sliding window. From what I gathered in the examples, you do it with `AlterEventDuration((start, end) => end - start + TimeSpan.FromSeconds(windowSize).Ticks)`.

Once again I have a real-time scenario, and I create events with `StreamEvent.CreateStart`.

The output I am getting shows that events never leave the window. Am I doing something incorrectly, or is this the intended behavior?

Thanks again, @afriedma

cybertyche commented on August 25, 2024

I would suggest instead using the built-in method ExtendLifetime(long). Let me know if you still get the same response.

afriedma commented on August 25, 2024

Just to follow up on my previous comment: I am trying to count children within a sliding window of 10 seconds, with punctuation inserted every 5 seconds. My child object is keyed on id and parent id, and I have GetHashCode() implemented as well. If id and parent id are the same, the child should not be counted twice. Here is the stream construct:

    var slidingWindow = ingressStream
        .Where(x => x.Market == market)
        .AlterEventDuration((start, end) => end - start + windowSize.Ticks);

    var clippedIngressStream = slidingWindow
        .ClipEventDuration(slidingWindow, x => x, x => x)
        .GroupApply(
            x => x.ParentId,
            x => x.Count(),
            (key, count) => new ChildOrderRateStatModel
                { ParentOrderId = key.Key, ChildCount = count, Market = market });

My test code is the following

    childSubject.OnNext(CreateChild("3", "2", "188", 1m, OrderState.Open));
    childSubject.OnNext(CreateChild("3", "2", "188", 2m, OrderState.Open));
    childSubject.OnNext(CreateChild("3", "2", "188", 1m, OrderState.Open));
    childSubject.OnNext(CreateChild("3", "2", "188", 4m, OrderState.Closed));
    Thread.Sleep(5000);

I am expecting a single record with a ReportedValue of 1, but instead I get the following output on StreamEventKind.Start:

ParentOrderId[2] - Market[188] - ReportedValue[1] - ReportTms[4/4/2019 6:16:53 PM]
ParentOrderId[2] - Market[188] - ReportedValue[3] - ReportTms[4/4/2019 6:16:53 PM]

Your input would be greatly appreciated.

afriedma commented on August 25, 2024

Sorry, I missed your response. I did try the function you recommended, but I am getting the same unexpected result as in my previous comment. Code below:

    var slidingWindow = ingressStream
        .Where(x => x.Market == market)
        .ExtendLifetime(windowSize.Ticks);

    var clippedIngressStream = slidingWindow
        .ClipEventDuration(slidingWindow, x => x, x => x)
        .GroupApply(
            x => x.ParentId,
            x => x.Count(),
            (key, count) => new ChildOrderRateStatModel
                { ParentOrderId = key.Key, ChildCount = count, Market = market });

cybertyche commented on August 25, 2024

Just so I have some clarity, what is the reason for adding the ClipEventDuration here? It seems to undermine the sliding window so that there are not going to be overlapping lifespans.

afriedma commented on August 25, 2024

I am trying to use ClipEventDuration to truncate the last event with the same key and replace it with the latest one. For example, Child A arrives with a quantity of 1M, and then the quantity is updated to 2M. I send the latest Child A into Trill and want Trill to treat both as the same object within the sliding window.

Hope this helps.

cybertyche commented on August 25, 2024

(one side note while I think about the issue - if you reuse a stream variable, such as in stream.ClipEventDuration(stream, ...), you will want to wrap that in a .Multicast, such as stream.Multicast(s => s.ClipEventDuration(s, ...)); doing so will prevent multiple subscription chains down to your ingress source)

cybertyche commented on August 25, 2024

I think I see the problem, maybe. Because your ClipEventDuration is using the full object identity on both sides (x => x) as the key, then quantity is being taken into consideration as part of that equality test. Thus you will not be treating child A with quantity 1M as the same as child A with quantity 2M. To do that, you'll want to make your keys only those fields that you want considered for equality, maybe something like (x => new { x.id, x.parentid }).

afriedma commented on August 25, 2024

My model object does implement the IEquatable interface, and I implemented GetHashCode and Equals using the key you are suggesting. Are you saying Trill does not use my object's equality by default?

cybertyche commented on August 25, 2024

If your class implements IEquatable, we should indeed be using it. I just double-checked the clip operation, it does use it. It's a little strange to have IEquatable simply ignore fields with relevant data...

I'm curious whether it fixes the issue if you simply omit the quantity field altogether, or if you change the quantity fields to all have the same value.
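As a point of comparison, here is a minimal plain .NET sketch of how the default comparer treats a type whose IEquatable implementation ignores a data field. The `Order` class is a simplified, hypothetical stand-in for the model in this thread. It does not exercise Trill at all, so it only shows the .NET side of the equality question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the ChildOrderModel in this thread.
public sealed class Order : IEquatable<Order>
{
    public Order(string id, string parentId, decimal quantity)
    {
        Id = id;
        ParentId = parentId;
        Quantity = quantity;
    }

    public string Id { get; }
    public string ParentId { get; }
    public decimal Quantity { get; }

    // Identity is (Id, ParentId); Quantity is deliberately ignored.
    public bool Equals(Order other) =>
        !(other is null) && Id == other.Id && ParentId == other.ParentId;

    // Keep the object-based overload consistent with the typed one.
    public override bool Equals(object obj) => Equals(obj as Order);

    public override int GetHashCode() => Id.GetHashCode() ^ ParentId.GetHashCode();
}

public static class EqualityDemo
{
    public static void Main()
    {
        var a = new Order("3", "2", 1m);
        var b = new Order("3", "2", 2m); // same key, different quantity

        // The default comparer picks up IEquatable<Order>.
        Console.WriteLine(EqualityComparer<Order>.Default.Equals(a, b)); // True

        // Hash-based collections therefore keep only one of the two.
        Console.WriteLine(new HashSet<Order> { a, b }.Count); // 1
    }
}
```

With equality defined this way, two updates to the same child are indistinguishable to anything that relies on the default comparer, which is the behavior the query above is counting on.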

afriedma commented on August 25, 2024

I set the quantities to be the same and put breakpoints in my object's GetHashCode and Equals methods; they are being called. I also rewrote the query with Multicast, but unfortunately I still get a weird result. See my query and my object below:

    var slidingWindow = ingressStream
        .Where(x => x.Market == market)
        .ExtendLifetime(windowSize.Ticks);

    var clippedIngressStream = slidingWindow
        .Multicast(s => s.ClipEventDuration(s, x => x, x => x))
        .GroupApply(
            x => x.ParentId,
            x => x.Count(),
            (key, count) => new ChildOrderRateStatModel
                { ParentOrderId = key.Key, ChildCount = count, Market = market });

    public class ChildOrderModel : IEquatable<ChildOrderModel>
    {
        public ChildOrderModel(string id, string parentId)
        {
            Id = id.DeepCopy();
            ParentId = parentId.DeepCopy();
        }

        public string Id { get; }

        public string ParentId { get; }

        public OrderState State { get; set; }

        public string Market { get; set; }

        public decimal Quantity { get; set; }

        public bool Equals(ChildOrderModel other)
        {
            if (ReferenceEquals(null, other)) return false;
            if (ReferenceEquals(this, other)) return true;
            return Id == other.Id && ParentId == other.ParentId;
        }

        public DateTime StartTime { get; set; }

        public override int GetHashCode()
        {
            return Id.GetHashCode() ^ ParentId.GetHashCode();
        }

        public override string ToString() => $"Id[{Id}] - ParentId[{ParentId}] - State[{State}] - Market[{Market}] - Quantity[{Quantity}]";
    }

Any other ideas? It seems like such a simple use case, but the results don't make any sense.

cybertyche commented on August 25, 2024

I'm curious whether, if you also override Equals(object), you hit a breakpoint there as well.

If you'd like, you can also put together a standalone testcase file and I can investigate further from there.

afriedma commented on August 25, 2024

I do hit the breakpoint on Equals as well, once. Yes, I would very much appreciate it if we could get to the bottom of this use case, as it is an important one for me. Just to reiterate: I am trying to count children within a sliding window. The only caveat is that my child objects are updated over time, and I want Trill to understand that and use the latest version inside the window.

cybertyche commented on August 25, 2024

OK - if you can drop a .cs file here I'll pick it up and debug further.

afriedma commented on August 25, 2024

Program.zip

Here you go, thanks in advance for your help.

cybertyche commented on August 25, 2024

OK, I think I have some answers for you.

Part of the issue is that, either because of the JIT or because C# code is magically fast, several of your DateTime.Now calls are returning the same value. You're creating 4 values (for now, let's call them A, B, C, and E, just because I'm mean) and they're being assigned time values like this:
A -> t1
B -> t2
C -> t2
E -> t2

This pattern means that ClipEventDuration doesn't do what you'd expect. You're going to get A clipped by B, C, or E (take your pick), while B, C, and E are all unclipped. What you'll need to do in code is make sure that your input data is not simultaneous, because ClipEventDuration is not going to remove that simultaneity.
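One way to keep start times from colliding is to hand out strictly increasing tick values at ingress. The sketch below is plain C# and assumes nothing about Trill. `MonotonicClock` and `NextTicks` are hypothetical names, and it uses `DateTime.UtcNow` rather than the `DateTime.Now` seen in the attached code.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Trill): returns strictly increasing tick
// values so that no two events are assigned the same start time.
public static class MonotonicClock
{
    private static long lastTicks;

    public static long NextTicks()
    {
        while (true)
        {
            long previous = Interlocked.Read(ref lastTicks);
            long now = DateTime.UtcNow.Ticks;

            // If the clock has not moved past the last issued value, bump by one tick.
            long candidate = now > previous ? now : previous + 1;

            // Publish the candidate only if no other thread got there first.
            if (Interlocked.CompareExchange(ref lastTicks, candidate, previous) == previous)
            {
                return candidate;
            }
        }
    }
}

public static class MonotonicClockDemo
{
    public static void Main()
    {
        long a = MonotonicClock.NextTicks();
        long b = MonotonicClock.NextTicks();
        long c = MonotonicClock.NextTicks();
        Console.WriteLine(a < b && b < c); // True
    }
}
```

The trade-off is that under a burst of events the issued timestamps can run slightly ahead of the wall clock, so any punctuation derived from the wall clock should be checked against the last issued value.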

A couple of other small points:

  • In the multicast, you'll want to use the lambda variable in place of all instances of the stream, so instead of stream.Multicast(s => s.ClipEventDuration(stream, ...)) you want to use stream.Multicast(s => s.ClipEventDuration(s, ...)).
  • The code in the .zip file called ExtendLifetime on data that is just start edges. Start edges already have a lifetime duration of infinity, so ExtendLifetime won't change those durations at all. My suspicion is that you'd like to do that after the ClipEventDuration, but that doing so would impact your condition that "equal" payloads shouldn't overlap.
  • Lastly, there's a pretty cool operator called Stitch - it takes two payloads considered "equal" that have adjacent timelines and merges them into a single payload with a single lifetime. So, if A goes from t1 to t2 and also from t2 to t3, then Stitch will merge them into A from t1 to t3.
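The Stitch behavior described in the last point can be modeled in a few lines of plain C#. This is only an illustration of the semantics stated above, not Trill's implementation. `StitchModel` and the tuple shape are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StitchModel
{
    // Merges intervals that carry an equal payload and touch end-to-start.
    public static List<(string Payload, long Start, long End)> Stitch(
        IEnumerable<(string Payload, long Start, long End)> events)
    {
        var result = new List<(string Payload, long Start, long End)>();

        // Group equal payloads together and walk each group in start order.
        foreach (var e in events.OrderBy(x => x.Payload, StringComparer.Ordinal)
                                .ThenBy(x => x.Start))
        {
            int last = result.Count - 1;
            if (last >= 0 && result[last].Payload == e.Payload && result[last].End == e.Start)
            {
                // Adjacent lifetime with the same payload: extend the previous interval.
                result[last] = (e.Payload, result[last].Start, e.End);
            }
            else
            {
                result.Add(e);
            }
        }

        return result;
    }
}

public static class StitchDemo
{
    public static void Main()
    {
        var input = new[] { ("A", 1L, 2L), ("A", 2L, 3L), ("B", 5L, 6L) };
        foreach (var e in StitchModel.Stitch(input))
        {
            Console.WriteLine($"{e.Payload}: [{e.Start}, {e.End})");
        }
        // Prints:
        // A: [1, 3)
        // B: [5, 6)
    }
}
```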

Here's the code I ended up with after a little playing around. I hope this helps!

    public class ChildOrderModel : IEquatable<ChildOrderModel>
    {
        public ChildOrderModel(string id, string parentId)
        {
            Id = id;
            ParentId = parentId;
        }

        public string Id { get; }

        public string ParentId { get; }

        public OrderState State { get; set; }

        public string Market { get; set; }

        public decimal Quantity { get; set; }

        public bool Equals(ChildOrderModel other)
        {
            if (other is null) return false;
            if (ReferenceEquals(this, other)) return true;
            return Id.Equals(other.Id) && ParentId.Equals(other.ParentId);
        }

        public override bool Equals(object obj) => Equals(obj as ChildOrderModel);

        public DateTime StartTime { get; set; }

        public override int GetHashCode() => Id.GetHashCode() ^ ParentId.GetHashCode();

        public override string ToString() => $"Id[{Id}] - ParentId[{ParentId}] - State[{State}] - Market[{Market}] - Quantity[{Quantity}] - StartTime[{StartTime}]";
    }

    [TestMethod]
    public void Main()
    {
        ISubject<ChildOrderModel> childSubject = new Subject<ChildOrderModel>();
        ISubject<StreamEvent<ChildOrderModel>> punctuationStream = new Subject<StreamEvent<ChildOrderModel>>();
        var result = new List<object>();

        var childStreamable = childSubject
            .Select(e => StreamEvent.CreateStart(e.StartTime.Ticks, e))
            .Merge(punctuationStream)
            .ToStreamable(DisorderPolicy.Adjust());

        var slidingWindow = childStreamable
            .Where(x => x.Market == "1234")
            .Multicast(s => s.ClipEventDuration(s, x => x, x => x))
            .Stitch()
            .ExtendLifetime(TimeSpan.FromSeconds(10).Ticks);

        var clippedIngressStream = slidingWindow
           .GroupApply(
               x => x.ParentId,
               x => x.Count(),
               (key, count) => new
               { ParentOrderId = key.Key, ChildCount = count, Market = "1234" });

        var task = clippedIngressStream.ToStreamEventObservable().Where(o => o.IsData).ForEachAsync(o => result.Add(o));

        Observable.Interval(TimeSpan.FromSeconds(5)).Subscribe(t =>
        {
            var timeTicker = DateTime.Now.Ticks;
            punctuationStream.OnNext(StreamEvent.CreatePunctuation<ChildOrderModel>(timeTicker + 1));
        });

        childSubject.OnNext(CreateChild("3", "2", "1234", 1m, OrderState.Open));
        childSubject.OnNext(CreateChild("3", "2", "1234", 2m, OrderState.Open));
        childSubject.OnNext(CreateChild("3", "2", "1234", 1m, OrderState.Open));
        childSubject.OnNext(CreateChild("3", "2", "1234", 4m, OrderState.Closed));
        childSubject.OnCompleted();
        punctuationStream.OnCompleted();
        task.Wait();

        Assert.Fail();
    }

    private static long count = 0;

    private static ChildOrderModel CreateChild(string id, string parentId, string market, decimal quantity, OrderState state)
    {
        return new ChildOrderModel(id, parentId)
        {
            Market = market,
            Quantity = quantity,
            State = state,
            StartTime = new DateTime(count++)
        };
    }

afriedma commented on August 25, 2024

Thank you so much for the analysis and the reply, it makes sense. I will study the Stitch operator in detail.
All my use cases are real-time and, as you mentioned, I do end up creating only start edges. Would I eventually have a memory blow-up? What would be the recommended approach for creating events in a real-time environment?

afriedma commented on August 25, 2024

I also tried running your code, but I get no elements in the result list. Am I missing any other code?
Shouldn't there be one record with count of one in the result?

cybertyche commented on August 25, 2024

You're getting the empty result because there is a bug I fixed on the Stitch operator that I had not pushed to master yet - I've pushed it to the branch called "Provider" for the moment, and will try to push to master next week.

How you create your events really depends on the query that you are trying to execute - however, creating start edges shouldn't result in a memory blow-up.

afriedma commented on August 25, 2024

Got it, I will wait for the new version and try again. Thanks again.

Earlier you wrote "What you'll need to do in code is make sure that your input data is not simultaneous, because ClipEventDuration is not going to remove that simultaneity." I was hoping that it would only be an issue in testing, but it looks like I am getting simultaneous timestamps in real life as well. I don't see a good way to guarantee that the timestamps are unique. I browsed System.Reactive, and there is not much there to delay each individual element occurrence. Any idea how I should tackle that?

cybertyche commented on August 25, 2024

If the events that are simultaneous are also equivalent to one another (in that they should only be counted once like you had in other cases) then I would suggest doing a Distinct() over them to remove all but one of the simultaneous events. The multiplicity of equivalent data that is also simultaneous seems to be the issue for your query, so removing the duplication would make the simultaneity irrelevant.

afriedma commented on August 25, 2024

I am not sure if Distinct will work, as it won't pass the latest element if it's equal to the previous one. In my case, two child orders may be equal but have different quantities, so I need both of them in Trill. I don't want to add quantity to my equality criteria.

cybertyche commented on August 25, 2024

Distinct in Trill does not eliminate duplicates over all data across all time, just within each timestamp. So if you have data value A valid from t1 until t2, and value A again valid from t3 until t4, if t3 > t2 then Distinct will have no effect. If t3 < t2, then for the period where the two lifetimes overlap (starting at t3) one of the A's would be flagged as a duplicate and removed.

afriedma commented on August 25, 2024

Got it, I will try it and report back.

nsulikowski commented on August 25, 2024

Dear @cybertyche,
Earlier you wrote:
“The behavior you describe is currently by design, but that doesn't mean it can't change”
I'm also trying to use Trill in real-time scenarios, and delayed joins make the tool unusable.
I would expect that a setting, maybe BatchSize=1, should configure Trill as a real-time tool, where joins report as soon as the overlap is detected (and everything else is real time).
I hope you consider the change.
Best,
Nestor

nsulikowski commented on August 25, 2024

Dear @afriedma, you mentioned "I am getting around this issue at the moment by merging a punctuation stream with interval of x, which forces Trill to generate output."
Can you be more specific? I'd love to implement a workaround until this gets fixed. Thanks.

afriedma commented on August 25, 2024

Hi @nsulikowski, see sample code below

    ISubject<ChildOrderModel> ChildSubject = new Subject<ChildOrderModel>();
    ISubject<StreamEvent<ChildOrderModel>> punctuationStream = new Subject<StreamEvent<ChildOrderModel>>();

    // Merge data events with the punctuation stream before handing them to Trill.
    var childStreamable = ChildSubject
        .Select(e => StreamEvent.CreateStart(e.StartTime.Ticks, e))
        .Merge(punctuationStream)
        .ToStreamable(DisorderPolicy.Adjust());

    // Emit a punctuation every 5 seconds so that Trill moves time forward and produces output.
    Observable.Interval(TimeSpan.FromSeconds(5)).Subscribe(t =>
    {
        var timeTicker = DateTime.Now.Ticks;
        punctuationStream.OnNext(StreamEvent.CreatePunctuation<ChildOrderModel>(timeTicker + 1));
    });

Hope this helps.

cybertyche commented on August 25, 2024

@afriedma A new version of the NuGet has been built and pushed, and the Stitch fix is there and in the master branch.

afriedma commented on August 25, 2024

@cybertyche thank you for your help and prompt feedback, I will try it out.
