GithubHelp home page GithubHelp logo

sketch-imiss / sketch Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 1.0 1.09 MB

This repository provides streaming algorithms that can be used for monitoring large-scale data streams.

License: MIT License

Python 100.00%
cardinality-estimation frequency-estimation streaming persistency-estimation data-partitioning sampling

sketch's Introduction

A Python Library for Monitoring High-speed Network Traffic

version python ray

This repository contains our Python implementation of sketch methods for monitoring high-speed network traffic.

Monitoring high-speed network traffic is a fundamental but important task in the fields of network security and management, which helps network administrators to better understand the runtime status by measuring a variety of metrics such as cardinality, frequency and persistency. Due to limited computational and memory resources, it becomes impractical to exactly compute metrics. Sketch is an effective and efficient data structure for network monitoring, which utilizes a family of hash functions to embed the raw dataset into a compact structure. Sketch keeps a trade-off among accuracy, computational costs, and memory usage. However, existing sketch methods still suffer from the problems of low accuracy, high time and memory complexities.

In this repository, we select three representative metrics, namely cardinality, frequency, and persistency, and develop fast and accurate sketch structures for estimations.

Cardinality Estimation

Given a sequence of user-item pairs, cardinality estimation aims to approximate the number of distinct items that each user connects to.

Frequency Estimation

The frequency of each user is defined as its number of occurrences.

Persistency Estimation

When dividing a long period into a set of timeslots without overlapping, the persistency of each user is defined as the number of timeslots that it occurs at least once.

Sampling Strategy

This repository supports three sampling strategies according to different estimation tasks:

  • Fixed-probability sampling
  • Reservoir sampling
  • Sample and hold

Partitioning Strategy

This repository provides two partitioning strategies:

  • Random partitioning
  • Hashing partitioning

Requirement

  • Python >= 3.9
  • Ray >= 1.12.1

Note

The part of persistency estimation is being rebuilt, we will upload it soon.

Citation

If you use our code, please cite:

Cardinality Estimation

Peng Jia, Pinghui Wang, Yuchao Zhang, Xiangliang Zhang, Jing Tao, Jianwei Ding, Xiaohong Guan, Don Towsley. Accurately Estimating User Cardinalities and Detecting Super Spreaders over Time. Transactions on Knowledge and Data Engineering (TKDE), 2020, 34 (1): 92-106.
Pinghui Wang, Peng Jia, Xiangliang Zhang, Jing Tao, Xiaohong Guan, Don Towsley. Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities over Time. International Conference on Data Engineering (ICDE), 2019: 1094-1105.

Frequency Estimation

Peng Jia, Pinghui Wang, Junzhou Zhao, Ye Yuan, Jing Tao, Xiaohong Guan. LogLog Filter: Filtering Cold Items within a Large Range over High Speed Data Streams. International Conference on Data Engineering (ICDE), 2021: 804-815.

Persistency Estimation

Pinghui Wang, Peng Jia, Jing Tao, Xiaohong Guan. Detecting a Variety of Long-Term Stealthy User Behaviors on High Speed Links. Transactions on Knowledge and Data Engineering (TKDE), 2018, 31 (10): 1912-1925.
Pinghui Wang, Peng Jia, Jing Tao, and Xiaohong Guan. Mining Long-Term Stealthy User Behaviors on High Speed Links. International Conference on Computer Communications (INFOCOM), 2018: 2051-2059.

Contact

For more details, you can drop us an email.

sketch's People

Contributors

jphgxq avatar sketch-imiss avatar

Stargazers

 avatar

Watchers

 avatar

Forkers

brahimmade

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.