LIBKDV - A Versatile Kernel Density Visualization Library for Geospatial Analytics (Heatmap)

Kernel Density Visualization (KDV), which produces heatmaps, has been extensively used for many geospatial analysis tasks. Representative examples include traffic accident hotspot detection, crime hotspot detection, and disease outbreak detection. Many scientific software packages, including Scipy, Statsmodels, and Scikit-learn, geographical software packages, including QGIS and ArcGIS, and visualization software packages, including Deck.gl and KDV-Explorer, support KDV. However, to the best of our knowledge, none of these tools scales to both high resolutions (e.g., 1280 x 960) and large-scale datasets (e.g., one million data points). This huge computational cost limits the applicability of off-the-shelf software tools for advanced (or more complex) geospatial analytics, e.g., bandwidth-tuning analysis and spatiotemporal analysis, which involve computing multiple KDVs in one batch.
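To make the computational cost concrete, the following minimal sketch evaluates a KDV naively: every pixel scans every data point. This is not LIBKDV's algorithm. The Epanechnikov kernel, the function names, and the toy points are assumptions chosen only for illustration.

```python
import math


def epanechnikov(dist, bandwidth):
    """Epanechnikov kernel: 1 - (d/b)^2 inside the bandwidth, 0 outside."""
    if dist > bandwidth:
        return 0.0
    return 1.0 - (dist / bandwidth) ** 2


def naive_kdv(points, x_range, y_range, cols, rows, bandwidth):
    """Evaluate the kernel density at the centre of every pixel.

    The cost is O(rows * cols * len(points)) because each pixel
    visits each data point once.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    dx = (x_max - x_min) / cols
    dy = (y_max - y_min) / rows
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        qy = y_min + (r + 0.5) * dy          # pixel centre, y
        for c in range(cols):
            qx = x_min + (c + 0.5) * dx      # pixel centre, x
            total = 0.0
            for px, py in points:
                total += epanechnikov(math.hypot(qx - px, qy - py), bandwidth)
            grid[r][c] = total
    return grid


# Toy input (hypothetical): two nearby points and one far away.
points = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0)]
grid = naive_kdv(points, (0.0, 10.0), (0.0, 10.0), cols=10, rows=10, bandwidth=2.0)
```

At the scale mentioned above (1280 x 960 pixels and one million data points), this triple loop would perform roughly 1.2 x 10^12 kernel evaluations for a single heatmap, and batch tasks repeat that work for every bandwidth or timestamp.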

Introduction:

To overcome the above issue, we develop LIBKDV [2], the first versatile programming library for KDV. It combines our recent studies (SLAM [1] and SWS [3]) to reduce the worst-case time complexity of different types of KDV-based geospatial analytics, including:

(1) Bandwidth-tuning analysis (cf. Figure 1): Domain experts can first set multiple bandwidths in a batch, and then generate multiple KDVs with respect to these bandwidths.

[Figure 1]

(2) Spatiotemporal analysis (cf. Figure 2): Domain experts can leverage a more complex spatiotemporal kernel density function to generate time-dependent hotspot maps that correspond to different timestamps.

[Figure 2]

To further enhance the efficiency for these two tasks, we fully parallelize our methods, SLAM and SWS.
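The spatiotemporal kernel density function mentioned in (2) is commonly written as a sum, over all data points, of a spatial kernel multiplied by a temporal kernel. The sketch below assumes that product form with Epanechnikov kernels. The name stkdv_value and the toy events are hypothetical, and the code does not reflect how SWS computes the result.

```python
import math


def epanechnikov(dist, bandwidth):
    """Epanechnikov kernel: 1 - (d/b)^2 inside the bandwidth, 0 outside."""
    if dist > bandwidth:
        return 0.0
    return 1.0 - (dist / bandwidth) ** 2


def stkdv_value(events, qx, qy, qt, bandwidth_s, bandwidth_t):
    """Density at location (qx, qy) and time qt.

    Each event is (x, y, t). An event contributes only if it lies within
    the spatial bandwidth AND the temporal bandwidth of the query.
    """
    total = 0.0
    for px, py, pt in events:
        k_space = epanechnikov(math.hypot(qx - px, qy - py), bandwidth_s)
        k_time = epanechnikov(abs(qt - pt), bandwidth_t)
        total += k_space * k_time
    return total


# Toy events (hypothetical): same place, ten days apart.
events = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
at_day_0 = stkdv_value(events, 0.0, 0.0, 0.0, bandwidth_s=1000, bandwidth_t=6)
at_day_3 = stkdv_value(events, 0.0, 0.0, 3.0, bandwidth_s=1000, bandwidth_t=6)
```

Evaluating this function for every pixel and every timestamp yields one heatmap per timestamp, which is why the temporal dimension multiplies the cost of a single KDV.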

Installation Guidelines:

(for Win64, Linux, and MacOS)

  1. First, create a virtual environment in Anaconda (Python 3.9 is recommended)
conda create -n libkdv python=3.9
  2. Activate the virtual environment
conda activate libkdv
  3. Install the dependencies and the library
conda install -c conda-forge geopandas keplergl notebook
pip install libkdv
  4. Anticipated problem(s) and possible solution(s)

If you encounter the following error:

OSError: could not find or load spatialindex_c-64.dll

install this version of rtree:

pip install rtree==0.9.3
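After installation, a quick way to confirm that the environment is complete is to check whether each package can be located by Python. The helper check_modules below is a hypothetical convenience, not part of LIBKDV. It only tests that a module can be found and does not import it, so it will not detect the spatialindex DLL error above.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return {module_name: True/False} depending on whether Python can find it."""
    return {name: find_spec(name) is not None for name in names}


# Module names taken from the installation steps above.
status = check_modules(["libkdv", "pandas", "geopandas", "keplergl", "rtree"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run this inside the activated libkdv environment. Any module reported as MISSING can be reinstalled with the corresponding conda or pip command.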

How to Use:

  1. Import LIBKDV and Pandas in your code
# kdv is called directly below, so it is imported by name
# (this assumes the libkdv package exposes kdv at the top level)
from libkdv import kdv
import pandas as pd
  2. Create the LIBKDV object and compute the heatmap
libkdv_obj = kdv(dataset, KDV_type,
                 GPS=True,
                 bandwidth_s=1000, row_pixels=800, col_pixels=640,
                 bandwidth_t=6, t_pixels=32,
                 num_threads=8)
libkdv_obj.compute()

Required arguments

dataset: Pandas object, the dataset. (for preparation, please refer to the steps in data_processing.ipynb)
KDV_type: String, "KDV" - single KDV or "STKDV" - Spatio-Temporal KDV.

Optional arguments

GPS: Boolean, True - use the geographic coordinate system, or False - use simple (X, Y) coordinates (see evaluation.ipynb).
bandwidth_s: Float, the spatial bandwidth (in meters), default is 1000.
row_pixels: Integer, the number of grid cells along the x-axis, default is 800.
col_pixels: Integer, the number of grid cells along the y-axis, default is 640.
bandwidth_t: Float, the temporal bandwidth (in days), default is 6. REQUIRED if KDV_type="STKDV".
t_pixels: Integer, the number of grid cells along the t-axis, default is 32. REQUIRED if KDV_type="STKDV".
num_threads: Integer, the number of threads, default is 8.
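Because bandwidth_t and t_pixels are required only for "STKDV", it can help to collect the arguments in one place and check them before calling kdv. The helper build_kdv_kwargs below is hypothetical and not part of LIBKDV. It only mirrors the argument names and defaults listed above.

```python
def build_kdv_kwargs(kdv_type, bandwidth_s=1000, row_pixels=800, col_pixels=640,
                     bandwidth_t=None, t_pixels=None, num_threads=8):
    """Collect keyword arguments for kdv() and enforce the STKDV requirement."""
    if kdv_type not in ("KDV", "STKDV"):
        raise ValueError('KDV_type must be "KDV" or "STKDV"')
    kwargs = {
        "KDV_type": kdv_type,
        "bandwidth_s": bandwidth_s,
        "row_pixels": row_pixels,
        "col_pixels": col_pixels,
        "num_threads": num_threads,
    }
    if kdv_type == "STKDV":
        # The temporal arguments must be given explicitly for STKDV.
        if bandwidth_t is None or t_pixels is None:
            raise ValueError('bandwidth_t and t_pixels are required for "STKDV"')
        kwargs["bandwidth_t"] = bandwidth_t
        kwargs["t_pixels"] = t_pixels
    return kwargs


single = build_kdv_kwargs("KDV", bandwidth_s=500)
spatiotemporal = build_kdv_kwargs("STKDV", bandwidth_t=6, t_pixels=32)
# Intended use (requires LIBKDV and a prepared dataset):
#     libkdv_obj = kdv(dataset, **spatiotemporal)
```

Keeping the temporal arguments out of the dictionary for a single KDV avoids passing values that have no effect in that mode.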

Example for computing a single KDV:

NewYork = pd.read_csv('./Datasets/New_York.csv')
traffic_kdv = kdv(NewYork,KDV_type="KDV",bandwidth_s=1000)
traffic_kdv.compute()

Example for supporting the bandwidth-tuning analysis task:

bandwidths_traffic_kdv = [500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]  # Set the bandwidths
result_traffic_kdv = []  # Stores the final results
traffic_kdv = kdv(NewYork, KDV_type="KDV")
for band in bandwidths_traffic_kdv:
    traffic_kdv.bandwidth_s = band  # Update the spatial bandwidth before recomputing
    result_traffic_kdv.append(traffic_kdv.compute())
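A variant of the loop above stores each result under its bandwidth, so that a heatmap can later be looked up by the bandwidth that produced it. This is a sketch under assumptions: sweep_bandwidths is a hypothetical helper, and FakeKDV is a stand-in object used only so the example runs without LIBKDV. It assumes, as in the loop above, that compute() returns the result.

```python
class FakeKDV:
    """Stand-in for a LIBKDV object (hypothetical, for demonstration only)."""

    def __init__(self):
        self.bandwidth_s = 1000

    def compute(self):
        # A real object would return the heatmap; here we echo the setting.
        return {"bandwidth_s": self.bandwidth_s}


def sweep_bandwidths(kdv_obj, bandwidths):
    """Recompute the heatmap once per bandwidth and key the results by bandwidth."""
    results = {}
    for band in bandwidths:
        kdv_obj.bandwidth_s = band
        results[band] = kdv_obj.compute()
    return results


bandwidths = list(range(500, 2301, 200))  # 500, 700, ..., 2300
results = sweep_bandwidths(FakeKDV(), bandwidths)
```

With a real LIBKDV object in place of FakeKDV, results[900] would hold the heatmap computed with a 900-meter bandwidth.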

Example for supporting the spatiotemporal analysis task:

NewYork = pd.read_csv('./Datasets/New_York.csv')
traffic_kdv = kdv(NewYork,KDV_type="STKDV",bandwidth_s=1000,bandwidth_t=10)
traffic_kdv.compute()
  3. Show the heatmaps with KeplerGL

To generate a single KDV or support the spatiotemporal analysis task, you can use the following code.

from keplergl import KeplerGl
map_traffic_kdv = KeplerGl(height=600, data={"data_1": traffic_kdv.result})
map_traffic_kdv

To support the bandwidth-tuning analysis task, you can use the following code.

from keplergl import KeplerGl
map_traffic_kdv_bands = KeplerGl(height=500)

for i in range(len(bandwidths_traffic_kdv)):
    map_traffic_kdv_bands.add_data(data=result_traffic_kdv[i], name='data_%d'%(i+1))
map_traffic_kdv_bands

Sample datasets:

We offer five sample datasets for testing: (1) Atlanta crime dataset [a], (2) Seattle crime dataset [b], (3) New York traffic accident dataset [c], (4) Hong Kong COVID-19 dataset [d], and (5) China Hainan Sanya taxi dataset [e]. The Python code (data_processing.py) and the Jupyter notebook (data_processing.ipynb) for extracting these datasets are provided in this GitHub link.

[a] Atlanta Open Data. http://opendata.atlantapd.org/.
[b] Seattle Open Data. https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5.
[c] NYC Open Data. https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95.
[d] Hong Kong Open Data. https://geodata.gov.hk/gs/view-dataset?uuid=d4ccd9be-3bc0-449b-bd27-9eb9b615f2db&sidx=0.
[e] Hainan Sanya taxi Data. https://github.com/libkdv/libkdv/blob/main/hainan-sanya-taxi.csv.

Advantages:

There are three main advantages of using LIBKDV.
Easy-to-use software package: Domain experts only need to write a few lines of Python code to use LIBKDV, which is as easy as using other Python packages, including Scikit-learn and Scipy.
High efficiency: LIBKDV is the first library that can reduce the worst-case time complexity for generating KDV, which cannot be achieved by other software tools. We also conduct an experiment on the Seattle crime dataset to compare the efficiency of different Python packages for generating KDV. In this experiment, we fix the resolution to 1280 x 960 and sample the dataset with different percentages. Observe from Figure 3 that all the existing libraries, including Scipy, Scikit-learn, and Statsmodels, take at least 100 seconds to generate a single KDV even when we sample only 1% of the data points in this dataset. In contrast, LIBKDV takes less than 10 seconds, which makes it more scalable. Therefore, instead of calling the KDV function in other Python packages, domain experts can call the efficient KDV function in LIBKDV.


Figure 3: Response time of different Python libraries for generating KDV on the Seattle dataset, varying the dataset size.
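The experiment described above (fixed resolution, varying sample percentage) can be reproduced in outline with a small timing harness. The sketch below is not the authors' benchmark code. The names sample_fraction, time_call, and benchmark are hypothetical, and count_points is a placeholder for whichever KDV function is being measured.

```python
import random
import time


def sample_fraction(records, fraction, seed=0):
    """Draw a reproducible random sample containing `fraction` of the records."""
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(records, fractions, make_heatmap):
    """Time make_heatmap on samples of increasing size."""
    timings = {}
    for fraction in fractions:
        subset = sample_fraction(records, fraction)
        _, elapsed = time_call(make_heatmap, subset)
        timings[fraction] = elapsed
    return timings


def count_points(subset):
    """Placeholder workload; replace with a call that generates a KDV."""
    return len(subset)


records = list(range(1000))
fractions = [0.01, 0.1, 1.0]
timings = benchmark(records, fractions, count_points)
```

Using a fixed seed keeps the samples identical across libraries, so differences in the measured times come from the KDV implementations rather than from the sampled points.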

High versatility: Due to the high efficiency of LIBKDV, our library can support more KDV-based geospatial analysis tasks, including bandwidth-tuning analysis (cf. Figure 4) and spatiotemporal analysis (cf. Figure 5), which cannot be natively and feasibly supported by other software tools.


Figure 4: Bandwidth-tuning analysis for the New York traffic accident dataset.


Figure 5: Spatiotemporal analysis for the Hong Kong COVID-19 dataset.

Example Jupyter Notebooks for Calling LIBKDV:

In this GitHub link, we also provide three Jupyter notebooks, namely Demo_single_KDV.ipynb, Demo_KDV_bandwidth.ipynb, and Demo_STKDV.ipynb, which support generating a single KDV, bandwidth-tuning analysis, and spatiotemporal analysis, respectively. Interested users can download these Jupyter notebooks to test our library. Please also refer to the demonstration video for more details.

Project Members:

Prof. (Edison) Tsz Nam Chan, Hong Kong Baptist University
Mr. Pak Lon Ip, University of Macau
Mr. Kaiyan Zhao, University of Macau
Prof. (Ryan) Leong Hou U, University of Macau
Prof. Byron Choi, Hong Kong Baptist University
Prof. Jianliang Xu, Hong Kong Baptist University

Collaborators:

Prof. Reynold Cheng, The University of Hong Kong
Prof. (Ken) Man Lung Yiu, Hong Kong Polytechnic University
Dr. Zhe Li, Alibaba Cloud
Mr. Ye Li, University of Macau
Mr. Weng Hou Tong, University of Macau
Mr. Shivansh Mittal, The University of Hong Kong

Publications:

  1. Tsz Nam Chan, Leong Hou U, Byron Choi, Jianliang Xu. SLAM: Efficient Sweep Line Algorithms for Kernel Density Visualization. Proceedings of ACM Conference on Management of Data (SIGMOD), 2022.
  2. Tsz Nam Chan, Pak Lon Ip, Kaiyan Zhao, Leong Hou U, Byron Choi, Jianliang Xu. LIBKDV: A Versatile Kernel Density Visualization Library for Geospatial Analytics. Proceedings of the VLDB Endowment (PVLDB), 2022.
  3. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu. SWS: A Complexity-Optimized Solution for Spatial-Temporal Kernel Density Visualization. Proceedings of the VLDB Endowment (PVLDB), 2022.
  4. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Byron Choi, Jianliang Xu. SAFE: A Share-and-Aggregate Bandwidth Exploration Framework for Kernel Density Visualization. Proceedings of the VLDB Endowment (PVLDB), 2022.
  5. Tsz Nam Chan, Pak Lon Ip, Leong Hou U, Weng Hou Tong, Shivansh Mittal, Ye Li, Reynold Cheng. KDV-Explorer: A Near Real-Time Kernel Density Visualization System for Spatial Analysis. Proceedings of the VLDB Endowment (PVLDB), 2021.
  6. Tsz Nam Chan, Zhe Li, Leong Hou U, Jianliang Xu, Reynold Cheng. Fast Augmentation Algorithms for Network Kernel Density Visualization. Proceedings of the VLDB Endowment (PVLDB), 2021.
  7. Tsz Nam Chan, Reynold Cheng, Man Lung Yiu. QUAD: Quadratic-Bound-based Kernel Density Visualization. Proceedings of ACM Conference on Management of Data (SIGMOD), 2020.
  8. Tsz Nam Chan, Leong Hou U, Reynold Cheng, Man Lung Yiu, Shivansh Mittal. Efficient Algorithms for Kernel Aggregation Queries. IEEE Transactions on Knowledge and Data Engineering (TKDE).
  9. Tsz Nam Chan, Man Lung Yiu, Leong Hou U. KARL: Fast Kernel Aggregation Queries. IEEE International Conference on Data Engineering (ICDE), 2019.
