
jackzhousz / vc_sample


This project forked from tobiasrp/vc_sample


Python prototype for optimally stratified sampling

License: MIT License

Python 0.47% Jupyter Notebook 99.53%


vc sample

Our void-and-cluster method (https://arxiv.org/abs/1907.05073) was designed to perform optimally stratified sampling of spatiotemporal scattered data in 2D and 3D. This repository contains a prototype implementation in Python that is easy to use and easy to extend.

This implementation is not restricted to spatiotemporal data. However, since the sampling strategy is based on kernel density estimation, the curse of dimensionality will be problematic in higher dimensions. One goal of this prototype is to explore stratification in higher dimensions.

Example

This is a simple example of a two-dimensional point set from which we sample a subset using vc sample:

docs/input.png

The original dataset

docs/output.png

10% of samples

Samples should be maximally pairwise distant whilst still respecting the density of the data points. That is, dense regions in the original data receive proportionally more samples.
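To make the two goals concrete, here is a minimal sketch of a greedy, density-aware subsampler. It is a simplified illustration and not the void-and-cluster algorithm from the paper: the function names, the toy data and the bandwidth are assumptions chosen for this example. Each chosen sample "covers" part of the estimated data density around it, and the next sample is taken where the most density is still uncovered.

```python
import math
import random


def gaussian_kernel(p, q, bandwidth):
    """Unnormalised Gaussian kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def greedy_stratified_sample(points, n_samples, bandwidth=0.1):
    """Pick n_samples indices that are well spread yet follow the data density.

    Simplified greedy illustration, NOT the algorithm from the paper.
    """
    n = len(points)
    # Kernel density estimate of the full data set, evaluated at every point.
    data_density = [
        sum(gaussian_kernel(p, q, bandwidth) for q in points) for p in points
    ]
    # Density already covered by the chosen samples, evaluated at every point.
    covered = [0.0] * n
    scale = n / n_samples  # each sample stands in for n / n_samples points
    chosen = []
    available = set(range(n))
    for _ in range(n_samples):
        # The largest uncovered density marks the biggest remaining void.
        best = max(available, key=lambda i: data_density[i] - covered[i])
        chosen.append(best)
        available.remove(best)
        for i in range(n):
            covered[i] += scale * gaussian_kernel(points[i], points[best], bandwidth)
    return chosen


random.seed(0)
# Hypothetical toy data: a dense cluster plus a sparse uniform background.
dense = [(random.gauss(0.25, 0.05), random.gauss(0.25, 0.05)) for _ in range(150)]
sparse = [(random.random(), random.random()) for _ in range(50)]
points = dense + sparse
sample_idx = greedy_stratified_sample(points, n_samples=20)
print(sorted(sample_idx))
```

Subtracting the covered density pushes new samples away from existing ones, while starting from the data density keeps dense regions more heavily sampled.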

Extending this sampling method to higher dimensions is problematic due to the curse of dimensionality. More specifically, kernel density estimation does not scale well with the number of dimensions. In many cases, however, the data is intrinsically lower dimensional and merely embedded in a higher-dimensional space.
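One symptom of this problem can be reproduced in a few lines: as the dimension grows, pairwise distances between random points become nearly equal, so a distance-based kernel can no longer tell near neighbours from far ones. The sketch below uses uniform random points as a hypothetical stand-in for real data and reports the relative contrast (max - min) / min over all pairwise distances.

```python
import math
import random


def relative_contrast(dim, n_points=100, seed=0):
    """(max - min) / min over all pairwise distances of uniform random points."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the dimension increases.
contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100)}
for dim, value in contrast.items():
    print(f"dim={dim:>3}: relative contrast = {value:.2f}")
```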

For example, this S-curve is a 2D manifold (think of a deformed rectangle) that lies in a 3D space:

docs/full_s-curve.png

To sample such a dataset, we only have to change the density estimation. If the density is specified on the S-curve itself, so that no distances are computed in 3D, the void-and-cluster algorithm can be applied directly without changes. Note that this also reduces the dimensionality of the density estimation from three to two.
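The difference between the two density estimates can be sketched on a synthetic S-curve whose parametrization is known. This is an illustration under assumptions: the helper names and the bandwidth are made up for this example, and the known coordinates (t, h) stand in for whatever low-dimensional coordinates a learned embedding would provide. Because the curve is traversed at unit speed, distances in (t, h) equal distances measured along the sheet.

```python
import math
import random


def kernel_density(coords, bandwidth):
    """Unnormalised Gaussian kernel density estimate at every input point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2)) for q in coords)
        for p in coords
    ]


def make_s_curve(n_points, seed=0):
    """Return (3D points, intrinsic 2D coordinates) of a synthetic S-curve."""
    rng = random.Random(seed)
    xyz, intrinsic = [], []
    for _ in range(n_points):
        t = 3.0 * math.pi * (rng.random() - 0.5)  # position along the "S"
        h = 2.0 * rng.random()                    # position across the sheet
        x = math.sin(t)
        z = math.copysign(1.0, t) * (math.cos(t) - 1.0)
        xyz.append((x, h, z))
        intrinsic.append((t, h))  # t is the arc length along the curve
    return xyz, intrinsic


xyz, intrinsic = make_s_curve(300)
density_3d = kernel_density(xyz, bandwidth=0.3)              # ambient 3D distances
density_manifold = kernel_density(intrinsic, bandwidth=0.3)  # distances on the sheet
```

Straight-line distances in 3D are never longer than distances along the sheet, so the 3D estimate is at least as large as the manifold estimate at every point. The gap is largest where the sheet curves back towards itself.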

The difficult part is now finding such a density estimation method on this embedded manifold. Here, we use an estimate based on the UMAP dimensionality reduction technique and then sample accordingly:

docs/sampled_s-curve.png

10% of stratified samples on the S-curve

Installation

This project is still under development. The development setup uses PyScaffold to simplify Python packaging.

Clone the repo:

` git clone git@github.com:TobiasRp/vc_sample.git && cd vc_sample `

Optionally, create and activate a conda environment or another Python virtual environment first. The steps for that are not covered here. Then install PyScaffold:

` pip install pyscaffold `

Lastly, install vc_sample as an editable package including its dependencies:

` pip install -e . `

Now vc_sample can be imported and used as a Python package.
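A quick way to confirm that the editable install is visible to the current interpreter is sketched below. It only looks the package up by name and assumes nothing about the package's API.

```python
import importlib.util

# Look up the package without importing it; find_spec returns None if absent.
spec = importlib.util.find_spec("vc_sample")
installed = spec is not None
print("vc_sample importable:", installed)
```

If this prints False, the most likely cause is that pip installed into a different environment than the one running the script.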

Usage

Example notebooks can be found in the notebooks folder. Start with notebooks/sampling_examples.

Contributors

tobiasrp
