Influence of tracking duration on the privacy of individual mobility graphs

This is the open-source code for our paper presented at LBS 2022, the 17th International Conference on Location Based Services in Munich. Our paper analyzes how privacy-preserving graph representations of mobility are, i.e., how well users can be reidentified from their mobility graphs.

This repository includes all source code used to produce the results in the paper; however, the dataset cannot be published for data privacy reasons.

Installation

The following packages are required to run this code:

trackintel
graph_trackintel
pandas, numpy, scipy
matplotlib, seaborn
networkx
psycopg2, sqlalchemy

Analysis

Our analysis comprises the following steps:

precompute features: A script that precomputes the graph features used in the publication:
- transition_feats: The distribution of transition weights over the 20 most popular trips.
- shortest_path_feats: The distribution of shortest-path lengths in the graph.
- centrality_feats: The betweenness centrality of the nodes, i.e., how often a node lies on the shortest paths between other nodes.
- in_degree_feats: The distribution of (unweighted) node in-degrees.
- out_degree_feats: Analogous to the in-degree, the distribution of out-degrees over the 20 locations with the highest out-degree.
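The feature types above can be sketched with networkx on a single directed, weighted mobility graph. This is a minimal illustration, not the repository's implementation: the helper names `top_k_distribution` and `graph_features` are hypothetical, edge weights are assumed to count transitions, and for simplicity the same top-k truncation and sum-to-one normalization are applied to every feature except shortest paths.

```python
import networkx as nx
import numpy as np


def top_k_distribution(values, k=20):
    """Keep the k largest values (zero-padded to length k) and normalize them to sum to 1."""
    vals = np.sort(np.asarray(list(values), dtype=float))[::-1][:k]
    vals = np.pad(vals, (0, k - len(vals)))
    total = vals.sum()
    return vals / total if total > 0 else vals


def graph_features(g: nx.DiGraph, k: int = 20) -> dict:
    """Hypothetical helper: toy versions of the five feature types for one mobility graph."""
    # Assumption: edge weights count the transitions between two locations.
    weights = [data.get("weight", 1.0) for _, _, data in g.edges(data=True)]
    # Unweighted (hop-count) shortest-path lengths between all reachable node pairs.
    sp_lengths = [
        length
        for src, targets in nx.shortest_path_length(g)
        for dst, length in targets.items()
        if src != dst
    ]
    # Histogram of path lengths (index = number of hops), normalized to sum to 1.
    sp_hist = np.bincount(np.asarray(sp_lengths, dtype=int))
    return {
        "transition": top_k_distribution(weights, k),
        "shortest_path": sp_hist / max(sp_hist.sum(), 1),
        "centrality": top_k_distribution(nx.betweenness_centrality(g).values(), k),
        "in_degree": top_k_distribution((d for _, d in g.in_degree()), k),
        "out_degree": top_k_distribution((d for _, d in g.out_degree()), k),
    }
```

For example, a graph with the weighted edges home→work (5), work→home (4), home→gym (1) and gym→home (1) yields the transition feature `[0.5, 0.4, 0.1]` for `k=3`.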
create cross join table:
- Loads the feature table
- Combines all pairs of subsequent time periods for reidentification tests
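The cross join can be sketched with pandas' `merge(how="cross")`. The column names `user_id` and `period` are hypothetical, and treating the earlier period as the pool and the later one as the test set is an assumption made for illustration.

```python
import pandas as pd


def cross_join_periods(features: pd.DataFrame) -> pd.DataFrame:
    """Pair every (user, period) feature row with every row of a later period."""
    # A cross join yields all row combinations; overlapping column names get suffixes.
    pairs = features.merge(features, how="cross", suffixes=("_pool", "_test"))
    # Assumption: the earlier period serves as the pool, the later one as the test set.
    later = pairs["period_test"] > pairs["period_pool"]
    return pairs[later].reset_index(drop=True)
```

With two users observed in periods 0 and 1, this keeps the four pool/test combinations from period 0 to period 1.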
compute similarity:
- Loads the cross-joined pairs
- Computes similarity with several metrics for each pair
- Writes the result to the database
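The paper's exact similarity metrics are not listed here. As a hedged sketch, three common distances between two discrete feature distributions (L1, Jensen-Shannon and Wasserstein, all via NumPy/SciPy) could be computed per pair as follows. The helper name `feature_distances` is hypothetical, and both inputs are assumed to have the same length.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance


def feature_distances(p, q) -> dict:
    """Distances between two discrete feature distributions of equal length."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Bin indices act as the support for the Wasserstein (earth mover's) distance.
    support = np.arange(len(p))
    return {
        "l1": float(np.abs(p - q).sum()),
        # SciPy returns the Jensen-Shannon *distance* (square root of the divergence).
        "js": float(jensenshannon(p, q)),
        "wasserstein": float(wasserstein_distance(support, support, u_weights=p, v_weights=q)),
    }
```

Applied row by row to the cross-joined table, e.g. on the pool and test versions of one feature, this yields one distance per metric and user pair.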
rank users:
- Loads the similarities
- For each duration-bin combination, ranks the users from the pool by their distance to the current test user
- Writes the rank of the matching user (the same person in the pool) to the database
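The ranking step can be sketched with a pandas group-wise rank. The function name and the columns `period_pool`, `period_test`, `user_id_pool`, `user_id_test` and `distance` are hypothetical stand-ins for the duration bins and similarity results.

```python
import pandas as pd


def rank_true_match(sims: pd.DataFrame, dist_col: str = "distance") -> pd.DataFrame:
    """Rank pool users by distance per test user and bin combination; return the true match's rank."""
    sims = sims.copy()
    group_cols = ["period_pool", "period_test", "user_id_test"]
    # Rank 1 = closest pool user; ties share the lowest (most optimistic) rank.
    sims["rank"] = sims.groupby(group_cols)[dist_col].rank(method="min", ascending=True)
    # Keep only the row where the pool user is the same person as the test user.
    matched = sims[sims["user_id_pool"] == sims["user_id_test"]]
    return matched[group_cols + ["rank"]].reset_index(drop=True)
```

`method="min"` is one possible tie-breaking choice; `method="average"` would be a less optimistic alternative when several pool users have identical distances.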
fill_matrix:
- Loads similarities for all combinations (all users & time period bins)
- Computes reidentification accuracy for all time-bin combinations (over users)
- Computes reciprocal ranks
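Given the rank of the true match per test user, top-k accuracy is the share of users whose rank is at most k, and the mean reciprocal rank averages 1/rank. A minimal pandas sketch, assuming the hypothetical columns `period_pool`, `period_test` and `rank`:

```python
import pandas as pd


def summarize_ranks(ranks: pd.DataFrame, ks=(1, 5, 10)) -> pd.DataFrame:
    """Top-k accuracy and mean reciprocal rank per (pool period, test period) combination."""
    scored = ranks[["period_pool", "period_test"]].copy()
    scored["mrr"] = 1.0 / ranks["rank"]
    for k in ks:
        # 1.0 if the true match is among the k closest pool users, else 0.0.
        scored[f"top{k}"] = (ranks["rank"] <= k).astype(float)
    return scored.groupby(["period_pool", "period_test"], as_index=False).mean()


def accuracy_matrix(summary: pd.DataFrame, metric: str = "top1") -> pd.DataFrame:
    """Reshape one metric into a pool-period x test-period matrix."""
    return summary.pivot(index="period_pool", columns="period_test", values=metric)
```

For two test users whose true matches are ranked 1 and 2, this gives a top-1 accuracy of 0.5 and a mean reciprocal rank of 0.75.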
visualization: Functions for visualizing / summarizing all results reported in the paper, namely
- The reidentification accuracies (Figure 2)
  - What is the top-k reidentification accuracy for different tracking periods?
- Regression analysis (Table 1)
  - Performs the regression analysis to evaluate the effect of pool- and test-user tracking duration on the matching performance
- Feature analysis (Table 2)
  - Which features improve user reidentification performance?
- Privacy loss analysis (Figure 3)
  - What is the privacy loss due to reidentifying users?
- Intra and inter user differences (Figure 4)
  - Is the variance explained by differences between users or by differences between time periods?
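As a rough illustration of the regression analysis, an ordinary least-squares fit of accuracy on the pool and test tracking durations can be written with NumPy. This is a sketch under the assumption of a plain linear model; the paper's actual specification (covariates, transformations, interaction terms) may differ, and `fit_duration_effects` is a hypothetical name.

```python
import numpy as np


def fit_duration_effects(pool_duration, test_duration, accuracy) -> dict:
    """OLS fit of: accuracy ~ b0 + b_pool * pool_duration + b_test * test_duration."""
    pool_duration = np.asarray(pool_duration, dtype=float)
    test_duration = np.asarray(test_duration, dtype=float)
    # Design matrix with an intercept column and one column per duration.
    X = np.column_stack([np.ones_like(pool_duration), pool_duration, test_duration])
    coef, *_ = np.linalg.lstsq(X, np.asarray(accuracy, dtype=float), rcond=None)
    return dict(zip(["intercept", "pool_duration", "test_duration"], coef))
```

The fitted coefficients estimate how much accuracy changes per unit of additional pool or test tracking duration, holding the other duration fixed.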
