This project forked from sea-shunned/hawks

A package for generating synthetic clusters with control over "difficulty"

License: MIT License

HAWKS Data Generator

HAWKS is a tool for generating controllably difficult synthetic data, used primarily for clustering. This repo is associated with the following paper:

  1. Shand, C, Allmendinger, R, Handl, J, Webb, A & Keane, J 2019, Evolving Controllably Difficult Datasets for Clustering. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). The Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13/07/19. https://doi.org/10.1145/3321707.3321761 (Nominated for best paper on the evolutionary machine learning track at GECCO '19)

The academic/technical details can be found there. What follows here is a practical guide to using this tool to generate synthetic data.

If you use this tool to generate data that forms part of a paper, please consider either linking to this work or citing the paper above.

Installation

HAWKS can be installed with pip:

pip install hawks

or by cloning this repo (and installing locally using pip install .).

Running HAWKS

To use HAWKS, import the hawks package. Its parameters are set through a config, and the available parameters are described in the user guide. Any parameter that is not specified takes its default value (as defined in hawks/defaults.json).

The example below illustrates how to run hawks. Either a dictionary or a path to a JSON config can be provided to override any of the default values.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import hawks

SEED_NUM = 42

# Fix the seed number
config = {
    "hawks": {
        "seed_num": SEED_NUM
    }
}
# Any missing parameters will take the default seen in hawks/defaults.json
generator = hawks.create_generator(config)
# Run the generator
generator.run()
# Get the best dataset found and its labels
datasets, label_sets = generator.get_best_dataset()
# Stored as a list for multiple runs
data, labels = datasets[0], label_sets[0]
# Run KMeans on the data
km = KMeans(
    n_clusters=len(np.unique(labels)), random_state=SEED_NUM
).fit(data)
# Get the Adjusted Rand Index for KMeans on the data
ari = adjusted_rand_score(labels, km.labels_)
print(f"ARI: {ari}")
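
The example above passes the config as a dictionary. As noted earlier, a path to a JSON config can be given instead. The following is a minimal sketch of that route, not a definitive recipe: the file name hawks_config.json and the helper make_generator are hypothetical, and passing the path as a plain string is an assumption about what hawks.create_generator accepts.

```python
import json
import tempfile
from pathlib import Path

# Same override as in the dictionary example: only the seed is fixed,
# every other parameter falls back to its default.
config = {
    "hawks": {
        "seed_num": 42
    }
}

# Write the config to a JSON file (hypothetical file name, temporary folder)
config_path = Path(tempfile.mkdtemp()) / "hawks_config.json"
config_path.write_text(json.dumps(config, indent=4))

# Read it back to confirm the file holds exactly what was intended
loaded = json.loads(config_path.read_text())
print(loaded)


def make_generator(path):
    """Hypothetical helper: build a generator from a JSON config path.

    Assumes hawks is installed and that create_generator accepts
    the path as a string.
    """
    import hawks  # imported here so the rest of the sketch runs without hawks
    return hawks.create_generator(str(path))
```

With hawks installed, make_generator(config_path) followed by run() would then be used in the same way as the generator in the dictionary example.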

User Guide

For a more detailed explanation of the parameters and how to use HAWKS, please read the user guide.

Issues

As this work is still in development, plain sailing is not guaranteed. If you encounter an issue, first ensure that hawks is running as intended by navigating to the tests directory, and running python tests.py. If any test fails, please add details of this alongside your original problem to an issue on the GitHub repo.

Feature Requests

At present, this is primarily academic work, so future developments will be released here after they have been published. If you have any suggestions or simple feature requests for HAWKS as a tool to use, please raise that on the GitHub repo.

If you are interested in extending this work or collaborating, please email cameron(dot)shand(at)manchester(dot)ac(dot)uk.
