painless-data-science-examples's Introduction

Overview

This is a place to prototype interesting examples of using Painless for ad hoc data analysis with Elasticsearch. The idea is to have somewhere we can collaborate on developing examples which either showcase what you can do with Painless or are pre-product features which we can explore as scripts. An example must include a working Painless snippet and a Python test harness. The test harness must be able to create an index against which one can exercise the functionality, and must allow one to run it via the Python client. Ideally, any important implementation details should be discussed in the README of each example. It is fine to include multiple implementations of the same task to showcase different features of Painless. As a minimum, discussions should cover dangers and mitigations, such as how to avoid using too much memory in a scripted metric aggregation.

Motivation

This grew out of a request to implement the apriori algorithm within the Elastic stack. It turns out that a scripted metric aggregation is able to do this, which is great. However, it is not straightforward to work out how to do this if (1) your primary programming language is not Java, or (2) you use only the existing documentation. These examples are intended to provide a reference where data scientist users of Elasticsearch can see pedagogical examples of using scripting to perform ad hoc data analysis tasks with Elasticsearch. Aside from providing useful out-of-the-box functionality, the hope is to showcase how much one can achieve with scripting and to help introduce it to this community.

Usage

Set up a virtual environment called env

python3 -m venv env

Activate it

source env/bin/activate

Install the required dependencies

pip3 install -r requirements.txt

Start an Elasticsearch instance. Each example includes code to generate some sample data. This is typically done using the Demo object from the example's demo module, for example:

>>> from examples.apriori.demo import Demo
>>> demo = Demo(user_name='my_user', password='my_password')
>>> demo.setup()

where 'my_user' and 'my_password' are the user name and password for the Elasticsearch instance you've started. The Demo object also allows you to run the aggregation using the Elasticsearch Python client and see the result on the demo data set, for example:

>>> demo.run()

For the apriori example you should see output like:

FREQUENT ITEM SETS DEMO...
FREQUENT_SETS(size=1)
   DIAMETER_PEER_GROUP_DOWN / support = 0.163
   DIAMETER_PEER_GROUP_DOWN_RX / support = 0.1385
   NO_PEER_GROUP_MEMBER_AVAILABLE / support = 0.309
   DIAMETER_PEER_GROUP_UP_TX / support = 0.1535
   PAD-Failure / support = 0.175
   NO_PROCESS_STATE / support = 0.1385
   NO_RESPONSE / support = 0.3305
   DIAMETER_PEER_GROUP_UP_RX / support = 0.145
   IP_REACHABLE / support = 0.5105
   RELAY_LINK_STATUS / support = 0.3675
   POM-Failure / support = 0.1765
   MISMATCH_REQUEST_RESPONSE / support = 0.1815
   vPAS-Failure / support = 0.1755
   PROCESS_STATE / support = 0.291
   IP_NOT_REACHABLE / support = 0.351
   DIAMETER_PEER_GROUP_DOWN_GX / support = 0.1405
FREQUENT_SETS(size=2)
   PAD-Failure PROCESS_STATE / support = 0.1445
   DIAMETER_PEER_GROUP_UP_TX POM-Failure / support = 0.1475
   MISMATCH_REQUEST_RESPONSE PAD-Failure / support = 0.1525
   ...
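
To make the "support" values in this output concrete, here is a minimal pure-Python sketch of frequent item set counting with apriori-style pruning. It is not the Painless implementation from this repository: the function name `frequent_sets`, its parameters, and the toy item names are all hypothetical and chosen only for illustration. The support of an item set is taken to be the fraction of transactions (documents) that contain every item in the set.

```python
from itertools import combinations


def frequent_sets(transactions, min_support, max_size=2):
    """Return {frozenset(items): support} for item sets with support >= min_support.

    Illustrative sketch only; names and parameters are assumptions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    result = {}
    # Size-1 candidates: every distinct item seen in any transaction.
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        level = {}
        for candidate in candidates:
            # Support = fraction of transactions containing the whole candidate set.
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        result.update(level)
        # Apriori pruning: a set of size k+1 can only be frequent
        # if every one of its size-k subsets is frequent.
        size += 1
        items = sorted({item for s in level for item in s})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        }
    return result


# Toy data with hypothetical item names.
toy = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
for items, support in sorted(frequent_sets(toy, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(" ".join(sorted(items)), "/ support =", support)
```

The pruning step is what keeps the number of candidate sets manageable as the set size grows, which matters when the same idea runs inside an aggregation with limited memory.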

Each example directory also includes the scripted metric request in a text file, for example examples/apriori/scripted_metric_frequent_sets.txt. This can also be pasted into and run from the Kibana Dev Console as follows:

GET apriori_demo/_search
{
  "size": 0,
  "query": {
    "function_score": {
      "random_score": {}
    }
  },
  ...
}
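
For readers who have not seen the overall shape of a scripted metric request, the following is a minimal sketch of building one as a Python dictionary. It reuses the `size` and `random_score` query from the request above, but the aggregation itself is a deliberately trivial document counter and is not the frequent sets aggregation from this repository. The function name `build_count_request` and the aggregation name `doc_count_sketch` are hypothetical.

```python
import json


def build_count_request(agg_name="doc_count_sketch"):
    """Build a search body with a trivial scripted metric aggregation.

    Illustrative assumption: the four scripts simply count documents.
    """
    return {
        "size": 0,  # return only aggregation results, no hits
        "query": {"function_score": {"random_score": {}}},
        "aggs": {
            agg_name: {
                "scripted_metric": {
                    # Runs once per shard before any documents are collected.
                    "init_script": "state.count = 0",
                    # Runs once per matching document.
                    "map_script": "state.count += 1",
                    # Runs once per shard; its return value is sent to the coordinating node.
                    "combine_script": "return state.count",
                    # Runs on the coordinating node over the per-shard results.
                    "reduce_script": "long total = 0; for (c in states) { total += c } return total",
                }
            }
        },
    }


print(json.dumps(build_count_request(), indent=2))
```

The printed JSON can be pasted after a `GET <index>/_search` line in the Dev Console, or the dictionary can be passed to the Python client's search call. Because the per-shard state here is a single counter, its memory use is bounded. Aggregations that accumulate collections in `state` need an explicit cap or sampling to stay safe.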

painless-data-science-examples's People

Contributors

tveasey
