Comments (11)

cybertyche commented on August 25, 2024

Ah, the wonderment that is spam filters. I'm glad you're having fun with Trill - it's a blast to work on, too.

Re: Partition

The partition method does something kind of special and magical. I'll try to explain as best I can.

There is a concept within Trill called "partitioned streams". This feature is one way to get around the restriction within Trill that all data must be in order post-ingress. It allows data to follow an independent timeline per partition. For instance, suppose you have data coming from a collection of sensors, and you want to do a query per sensor (normally done using GroupApply), but each sensor's data may arrive at the processing node with different network lag. Partitioned streams allow each sensor's data to be treated as its own timeline. Disorder policies (e.g., Drop) are then applied on a per-sensor basis rather than globally.
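To make the per-partition timeline idea concrete, here is a minimal sketch in plain C#. It does not use Trill at all; it only imitates a Drop policy with no reorder window, once over a single global timeline and once per sensor. The tuple shape, the sensor names, and the `ApplyDrop` helper are all hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical readings as (Sensor, Time) tuples, in arrival order.
// Sensor "B" lags behind sensor "A" on the network.
var arrivals = new List<(string Sensor, long Time)>
{
    ("A", 10), ("A", 20), ("B", 5), ("A", 30), ("B", 15), ("B", 12),
};

// Imitates a Drop policy with no reorder window: an event is kept only if its
// timestamp is not older than the latest one already seen on the same timeline.
// keyOf decides which timeline an event belongs to.
List<(string Sensor, long Time)> ApplyDrop(
    IEnumerable<(string Sensor, long Time)> events,
    Func<(string Sensor, long Time), string> keyOf)
{
    var highWatermark = new Dictionary<string, long>();
    var kept = new List<(string Sensor, long Time)>();
    foreach (var e in events)
    {
        var key = keyOf(e);
        if (highWatermark.TryGetValue(key, out var latest) && e.Time < latest)
        {
            continue; // out of order on this timeline: dropped
        }
        highWatermark[key] = e.Time;
        kept.Add(e);
    }
    return kept;
}

// One global timeline: every reading from the lagging sensor B is dropped.
var keptGlobally = ApplyDrop(arrivals, _ => "all");
// One timeline per sensor: only B's genuinely late reading (12 after 15) is dropped.
var keptPerSensor = ApplyDrop(arrivals, e => e.Sensor);

// Prints "global kept 3, per-sensor kept 5"
Console.WriteLine($"global kept {keptGlobally.Count}, per-sensor kept {keptPerSensor.Count}");
```

With a single timeline, sensor B loses all of its data simply because it is slower than A. With one timeline per sensor, B only loses the one reading that is late relative to B's own history.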

The way that you "enable" this feature is by ingressing PartitionedStreamEvents instead of StreamEvents. Alternatively, one can enable this feature by using ToPartitionedStreamable instead of ToTemporalStreamable. In both of these cases, the result is that you end up with a stream of type IStreamable<PartitionKey, P>. The marker "PartitionKey" in the key type of the streamable means you're in partitioned world, the world's strangest theme park.

Now, the method Partition allows the user to introduce partitions in the middle of a query rather than at ingress. This method allows the user to then do temporal operations on the data without worrying about keeping all data in order. For instance, a concrete feature request that we got was to be able to do different windowing on data based on a key. The Partition method allows the user to split the timeline, thus allowing each individual partition to be windowed independently without any fear of misordering. You could then have one partition do a tumbling window on an hour, another partition have a hopping window of 10 minutes with a hop of a minute, and so forth.
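As a rough illustration of "different windowing per key", the sketch below computes which windows a timestamp falls into under a per-key window setting. It is plain C# using the textbook definition of tumbling and hopping windows, not Trill's Partition method or its lifetime-adjusting operators. The key names, the minute-based time unit, and the `WindowStarts` helper are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-key window settings, in minutes: (Size, Hop).
// A tumbling window is the special case where Hop == Size.
var windowByKey = new Dictionary<string, (long Size, long Hop)>
{
    ["hourly"] = (60, 60),  // tumbling window of an hour
    ["rolling"] = (10, 1),  // 10-minute window that hops every minute
};

// Returns the start of every window [start, start + Size) that contains `time`.
// Assumes time >= 0 and window starts aligned to multiples of Hop.
List<long> WindowStarts(long time, (long Size, long Hop) w)
{
    var starts = new List<long>();
    var latest = time - (time % w.Hop); // latest aligned start <= time
    for (var s = latest; s >= 0 && s > time - w.Size; s -= w.Hop)
    {
        starts.Add(s);
    }
    starts.Reverse(); // report the earliest window first
    return starts;
}

var hourly = WindowStarts(125, windowByKey["hourly"]);
var rolling = WindowStarts(125, windowByKey["rolling"]);

// Prints "hourly: [120]"
Console.WriteLine($"hourly: [{string.Join(", ", hourly)}]");
// Prints "rolling: [116, 117, 118, 119, 120, 121, 122, 123, 124, 125]"
Console.WriteLine($"rolling: [{string.Join(", ", rolling)}]");
```

The same timestamp lands in one window for the "hourly" key and in ten overlapping windows for the "rolling" key, which is why each key needs its own timeline if the windows are to be evaluated independently.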

A good example of the Partition method in action is the Rules Engine example in our samples repo.

from trill.

AlgorithmsAreCool commented on August 25, 2024

For posterity,

If you set your periodic punctuations to the same interval as your window, Trill will emit a punctuation event for each of the missing intervals.

var query =
    inputObservable
    .ToTemporalStreamable(
        startEdgeExtractor: item => item.Timestamp.Ticks, 
        periodicPunctuationPolicy: PeriodicPunctuationPolicy.Time(TimeSpan.TicksPerHour)
        )
    .TumblingWindowLifetime(TimeSpan.TicksPerHour)
    ...

cybertyche commented on August 25, 2024

My sincere apologies for how long it has taken to comment on this - for some reason, I didn't get a notification that your issue posted. I'm looking at this right now and will give you some guidance within a couple of hours.

cybertyche commented on August 25, 2024

Re: Discoverability

Most of the things marked as not being discoverable are fields, methods, or types that we would rather not be public but have to be in order for code generation to function properly. However, the specific example you give is clearly an oversight - one should be able to access the key directly. I will fix that in the next release.

AlgorithmsAreCool commented on August 25, 2024

Hey, it isn't your fault about the delay. I tripped some kind of spam filter when I made this issue and GitHub suspended my account, so it was hidden from you. It took them a couple of days to fix it.

Also, there isn't a massive rush on me needing answers. I've been having a lot of fun reading all the papers and working with Trill.

AlgorithmsAreCool commented on August 25, 2024

That is a nice bit of flexibility to have!

cybertyche commented on August 25, 2024

Re: your initial example

Have you tried using TumblingWindowLifetime(TimeSpan.TicksPerHour) instead of your GroupApply? That should give you one result per hour, only returned when the full hour of data has passed.

AlgorithmsAreCool commented on August 25, 2024

Hmm, looking at my notes, later versions of my queries did switch to tumbling windows. They seem to work great with one exception:

If there is no data for a time period, the returned data will skip over the empty windows.

As an example, I have a bunch of logs I'm ingressing and filtering for particular events. I'm trying to get per-hour counts of event occurrence. My query looks like this:

await
    LogExtractor
        .Create()
        .ExtractSingleSiteDirectory(siteLogFolder)
        .AsObservable()
        .Where(e => e.Entry.Method == "ScaryEvent")
        .ToTemporalStreamable(
            e => e.Entry.Timestamp.Ticks,
            DisorderPolicy.Drop(TimeSpan.TicksPerMinute),
            flushPolicy: FlushPolicy.FlushOnPunctuation,
            periodicPunctuationPolicy: PeriodicPunctuationPolicy.None()
            )
        .Select(e => Empty.Default) //hopefully saving memory since i only need counts???
        .TumblingWindowLifetime(TimeSpan.TicksPerHour)
        .Count()
        .ToStreamEventObservable()
        .ForEachAsync(evt => {
            var start = ToDateTime(evt.StartTime).Value;
            if (evt.IsData)
            {
                var end = ToDateTime(evt.EndTime);
                Console.WriteLine($"{start} {end}  - {evt.Payload}");
            }
        });

This works, but produces output like this:

5/22/2019 4:00:00 AM 5/22/2019 5:00:00 AM  - 2970
5/22/2019 5:00:00 AM 5/22/2019 6:00:00 AM  - 2750
5/22/2019 6:00:00 AM 5/22/2019 7:00:00 AM  - 35
5/22/2019 7:00:00 AM 5/22/2019 8:00:00 AM  - 1595
5/23/2019 4:00:00 PM 5/23/2019 5:00:00 PM  - 240
5/23/2019 5:00:00 PM 5/23/2019 6:00:00 PM  - 855
5/23/2019 6:00:00 PM 5/23/2019 7:00:00 PM  - 4610

Note that there is a time skip between 8 AM on 5/22 and 4 PM on 5/23, where there were no events.
I can manually fill in these empty windows with a little stateful selector on the egress, but is there a built-in way to get the empty windows too?
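For reference, the kind of stateful egress selector I have in mind is sketched below. It is plain C# with no Trill calls: it walks the per-window counts in order and yields a zero count for each window that was skipped. The `FillEmptyWindows` name, the tuple shape, and the sample counts are made up for the example, and it assumes the input is already ordered by window start and aligned to the window size.

```csharp
using System;
using System.Collections.Generic;

// Yields every window between the first and last observed one, inserting a
// zero count for each window that the query skipped. Assumes the input is
// ordered by start time and aligned to `windowTicks`.
IEnumerable<(long Start, long Count)> FillEmptyWindows(
    IEnumerable<(long Start, long Count)> windows, long windowTicks)
{
    long? expected = null; // start time of the next window we expect to see
    foreach (var w in windows)
    {
        // Emit placeholders until we catch up with the window that arrived.
        for (var t = expected ?? w.Start; t < w.Start; t += windowTicks)
        {
            yield return (t, 0L);
        }
        yield return w;
        expected = w.Start + windowTicks;
    }
}

var hour = TimeSpan.TicksPerHour;
var day = new DateTime(2019, 5, 22).Ticks;
// Hypothetical counts for hours 6, 7, and 10; hours 8 and 9 are missing.
var observed = new List<(long Start, long Count)>
{
    (day + 6 * hour, 35L), (day + 7 * hour, 1595L), (day + 10 * hour, 240L),
};

// Prints one line per hour from 6 through 10, with a count of 0 for hours 8 and 9.
foreach (var (start, count) in FillEmptyWindows(observed, hour))
{
    Console.WriteLine($"hour {(start - day) / hour}: {count}");
}
```

This only fills gaps between windows that were actually observed; it cannot know about empty windows before the first result or after the last one.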

AlgorithmsAreCool commented on August 25, 2024

Oh, and lastly, has the research from this paper made it into the wild yet? I'm super interested in the multiple egress latency selection stuff.

cybertyche commented on August 25, 2024

I'm sorry to say the multiple latency work hasn't made it into the main branch, nor is it on the GitHub site (it is still only on our deprecated internal repo). I'll reach out to Yinan and see if he's willing to at least port his research prototype to GitHub in a branch.

cybertyche commented on August 25, 2024

I believe that there is a way to fill in the missing gaps with nulls - let me try to reach into the deep well of my brain's recycle bin and see if I can page that back.
