GithubHelp home page GithubHelp logo

yinzheng-zhong / tfgen Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 87 KB

A transition-based feature generator Python package for machine learning.

License: GNU General Public License v3.0

Python 94.81% Shell 5.19%

tfgen's Introduction

Transition-based Feature Generator (TFGen)

PyPI version

Description

The process log/event log will be used as input for the feature generator. The feature generator will generate transition matrices.

How to use

Installation

pip install tfgen    # normal install
pip install --upgrade tfgen  # update tfgen

How to use

The observable event classes are first required, and they can be acquired using the method "get observable ec()". The generated features will change if the event classes or the order of the event classes are changed, so it is preferable to save the observable event classes for later use. A dataframe/list/array of attributes should be the method's first parameter. The datasets we'll use for the following code examples are available in release v0.2.1 on Github.

from tfgen.observe_event_classes import get_observable_ec

data_for_ec = pd.read_csv('test_data_for_ec.csv')
ec = get_observable_ec(data_for_ec[['Flags', 'S/C']])  # Flags and S/C are the attributes

We can now start creating the TFGen object. A list of observable event classes that we obtained in the previous step is the first parameter. The window size is the second variable.

from tfgen import TFGen
tfgen = TFGen(ec, window_size=500)

We are currently loading the event log data to create features. Before sending the data to TFGen, make sure it is already in chronological order. EOT marking is required at the ending of each case, and it must be done under each attribute. Without EOT, the TET will continue to expand. Possibly like this:

Example of input data.
Case_ID Flags S/C
... ... ...
13 000.ACK.FIN. C
13 000.ACK. S
14 000.SYN. C
13 000.ACK.RST. S
13 EOT EOT
14 000.ACK.SYN. S
... ... ...
data_for_feature = pd.read_csv('test_data_with_eot.csv')

We can load the offline dataset or load the dataset in an online streaming mode. The method for loading the dataset in offline mode is:

tfgen.load_from_dataframe(data_for_feature, case_id_col='Case_ID', attributes_cols=['Flags', 'S/C'])
output = tfgen.get_output_list()  # this will return a list of data.

Note that the output is a list (or other iterable) of tuples where each tuple contains two variables (case_id, transition_table). The case_id comes from the last processed event, and it can be used for labelling the data for supervised learning or validation. "get_output_list()" can only be used in offline mode.

The following example uses the generator as input for online streaming mode.

# replace this generator with the actual generator
def replace_with_the_actual_generator():
    while True:
        for rows in data_for_feature.values:
            case_id = rows[0]
            event_attrs = rows[[2, 3]]

            yield case_id, event_attrs  # event_attr is an iterable with multiple attributes.

# Use the generator as an input for online streaming.
tfgen.load_from_generator(replace_with_the_actual_generator)
out = tfgen.get_output_generator()  # this will return a generator as the output.

Only the input methods "load_from_dataframe()" or "load_from_generator()" can be used with the output method "get_output_generator()".

The data can be entered one at a time into TFGen. Due to the fact that TFGen requires several events to initialise, the output is not guaranteed. To use this method, handle the InitialisingException exception.

from tfgen import InitialisingException

data_for_feature_array = data_for_feature.values
for sample in data_for_feature_array:
    case_id = sample[0]
    event_attrs = sample[[2, 3]]

    # tfgen.load_next(<your data sample>). The sample is a tuple of (case_id, event_attrs)
    # and event_attrs is an iterable with multiple attributes.
    tfgen.load_next(case_id, event_attrs)
    try:
        print(tfgen.get_output_next())
    except InitialisingException:
        continue

The output method "get_output_next()" is compatible with all input methods.

Methods

Currently, the "Classic" and the "ClassicLargeSparse" methods for feature generation are available. The "Classic"method is employed by default. The "ClassicLargeSparse" method can be used to output Scipy sparse matrices for event logs that contain a larger number of event classes.

from tfgen import TFGen

tfgen = TFGen(ec, window_size=500, method=TFGen.ClassicLargeSparse)

Implementing New Methods

By deriving from the "BaseMethod" located in "tfgen/methods/base method.py," one can extend the existing methods by creating new method classes. All classes must be placed under "tfgen/methods/" directory. The next event sample must be obtained using method "self.get_next_data()," and the generated feature must be sent to the output using method "self.send_data()". "self.finished" will become "True" if the input stream reaches the end.

from tfgen.methods.base_method import BaseMethod

class NewMethod(BaseMethod):
def __init__(self, ec_lookup_table, window_size, input_stream, output_stream):
    super().__init__(ec_lookup_table, window_size, input_stream, output_stream)

# entry
def start_processing(self):
    while True:
        # event is a tuple of (case_id, event_attrs)
        event = self.get_next_data()
        # do something
        self.send_data(processed_data)
        if self.finished:
            break

Then include the new method in the "TFGen" class found in "tfgen/tfgen.py". Two locations are required to be modified.

class TFGen:
    METHOD_CLASSIC = 101
    METHOD_CLASSIC_LARGE_SPARSE = 102
    METHOD_NEW_METHOD = 103  # The first location. New method class

    def _select_method(self, method):
    if self.method == TFGen.METHOD_CLASSIC:
        return Classic(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
    elif self.method == TFGen.METHOD_CLASSIC_LARGE_SPARSE:
        return ClassicLargeSparse(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
    # The second location. The instance to the new method class
    elif self.method == TFGen.METHOD_NEW_METHOD:
        return NewMethod(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
    else:
        raise Exception("Method not supported")

tfgen's People

Contributors

yinzheng-zhong avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.