Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data (KDD'24)

Hanyang Yuan, Jiarong Xu*, Cong Wang, Ziqi Yang, Chunping Wang, Keting Yin and Yang Yang. (*Corresponding author)

Brief Introduction

The public sharing of user information opens the door for adversaries to infer private data, leading to privacy breaches. While studies have concentrated on privacy leakage via public user attributes, the threats associated with user relationships are often neglected. This study aims to advance the understanding of privacy risks emanating from network structure, moving beyond direct neighbor connections to the broader implications of indirect network structural patterns.

  • Problem and measure: Our work pioneers a comprehensive investigation into the problem of Graph Privacy leakage via Structure (GPS), introducing the innovative Generalized Homophily Ratio (GHRatio) as a measure of privacy leakage.
  • Attack model: We introduce a novel private attribute inference attack that uses a data-centric strategy to exploit all identified privacy breaches. Feeding a GNN various data forms lets it learn from the multiple homophily types that give rise to privacy risks.
  • Defensive model: To counter the attacks, we propose a graph data publishing method that employs learnable graph sampling, rendering the sampled graph suitable for publication.
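As a point of reference for the homophily-based measure mentioned above, the sketch below computes the classic edge homophily ratio: the fraction of edges whose two endpoints share the same attribute value. This is the standard direct-neighbor measure, not the paper's GHRatio, which generalizes the idea to indirect structural patterns. The function name and the toy graph are illustrative assumptions.

```python
from typing import Hashable, Iterable, Mapping, Tuple


def edge_homophily_ratio(
    edges: Iterable[Tuple[Hashable, Hashable]],
    attr: Mapping[Hashable, Hashable],
) -> float:
    """Fraction of edges whose endpoints share the same attribute value."""
    total = 0
    same = 0
    for u, v in edges:
        total += 1
        if attr[u] == attr[v]:
            same += 1
    if total == 0:
        raise ValueError("graph has no edges")
    return same / total


# Toy graph (hypothetical data): 4 nodes, 4 undirected edges.
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
toy_attr = {0: "a", 1: "a", 2: "b", 3: "b"}
ratio = edge_homophily_ratio(toy_edges, toy_attr)
```

A ratio close to 1 means that connected users tend to share the attribute, so an adversary could guess a hidden value from a user's neighbors.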

For more details, please refer to the paper.

Environment Set-up

Please first clone the repo and install the required environment, which can be done by running the following commands:

# Clone our repo
git clone https://github.com/xxx08796/GPS_KDD.git
cd GPS_KDD
mkdir dataset
# Create conda env
conda create -n gps python=3.8.0 -y
conda activate gps
# Torch 2.0.1 with CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
# Install required libraries
pip install -r requirements.txt
pip install torch-sparse==0.6.17 -f https://pytorch-geometric.com/whl/torch-2.0.1+cu118.html
pip install torch-scatter==2.1.1 -f https://pytorch-geometric.com/whl/torch-2.0.1+cu118.html
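
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of the repo, and the pins are copied from the commands above. It compares only the version part before any local suffix such as `+cu118`.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "torch-sparse": "0.6.17",
    "torch-scatter": "2.1.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version, whether it matches the pin)."""
    report = {}
    for package, pinned in expected.items():
        found = installed_version(package)
        # Wheels may report a local suffix, e.g. "2.0.1+cu118"; ignore it.
        ok = found is not None and found.split("+")[0] == pinned
        report[package] = (found, ok)
    return report


if __name__ == "__main__":
    for package, (found, ok) in check_pins(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: pinned {EXPECTED[package]}, found {found} [{status}]")
```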

Dataset Set-up

We use three datasets in this work: Pokec-z, Pokec-n, and NBA. You can download the pokec/ and NBA/ directories here and add them to ./dataset/ in this project.
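
A quick way to confirm that the two directories landed in the right place is sketched below. It checks only that `pokec/` and `NBA/` exist under the dataset root, since those are the only names given above; the files inside them are not inspected, and the helper name is hypothetical.

```python
from pathlib import Path
from typing import List

# Directory names taken from the text above; their contents are not checked.
REQUIRED_DIRS = ("pokec", "NBA")


def missing_dataset_dirs(root: Path) -> List[str]:
    """Return the required sub-directories that are absent under `root`."""
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs(Path("./dataset"))
    if missing:
        print("Missing under ./dataset/:", ", ".join(missing))
    else:
        print("Dataset directories found.")
```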

Experiment on graph private attribute inference attack

  • Our proposed graph private attribute inference attack is based on a data-centric strategy: feeding different data forms (i.e., graph vs. subgraphs) into a GNN so that it learns different knowledge (i.e., proximity homophily vs. structure-role homophily).
  • Launching an attack: To run the proposed graph private attribute inference attack, execute exp_attack.py.
python exp_attack.py \
    --device <GPU ID> \
    --num_layers <GNN layers> \
    --train_ratio <training data ratio> \
    --p_encoder <GNN encoder for proximity homophily> \
    --s_encoder <GNN encoder for structure-role homophily> \
    --num_workers <dataloader workers> \
    --dataset <dataset name> \
    --sens_attr <private attribute>
  • Below is a demo on NBA, treating country as the private attribute:
python exp_attack.py --device 0 --num_layers 2 --train_ratio 0.1 --p_encoder P_GIN --s_encoder S_GIN --num_workers 1 --dataset nba --sens_attr country
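
To repeat the demo under several settings, the command line can be assembled programmatically. The sketch below is one way to do that, assuming only the flags listed above. The helper name and the sweep values are hypothetical; only the train ratio 0.1 appears in the demo.

```python
import shlex
from typing import Dict, List, Union


def build_attack_command(args: Dict[str, Union[str, int, float]]) -> List[str]:
    """Turn a flag -> value mapping into an argv list for exp_attack.py."""
    command = ["python", "exp_attack.py"]
    for flag, value in args.items():
        command += [f"--{flag}", str(value)]
    return command


# Values copied from the NBA demo above.
demo_args = {
    "device": 0,
    "num_layers": 2,
    "train_ratio": 0.1,
    "p_encoder": "P_GIN",
    "s_encoder": "S_GIN",
    "num_workers": 1,
    "dataset": "nba",
    "sens_attr": "country",
}

# Hypothetical sweep over the training ratio.
for ratio in (0.1, 0.2, 0.3):
    argv = build_attack_command({**demo_args, "train_ratio": ratio})
    # Print the command; pass argv to subprocess.run(argv, check=True) to execute it.
    print(shlex.join(argv))
```

Building an argument list rather than a single string avoids shell-quoting problems if a value ever contains spaces.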

Experiment on privacy-preserving graph data publishing

  • The proposed method for graph data publishing is based on an adversarial training strategy with three objectives: (1) defending against the worst-case attack, (2) limiting the benefit that graph structure provides for attribute inference, and (3) controlling the deviation of the sampled graph from the original.
  • To learn the sampled graph and evaluate its privacy protection and utility, execute exp_defense.py.
python exp_defense.py \
    --device <GPU ID> \
    --num_layers <GNN layers> \
    --train_ratio <training data ratio> \
    --p_encoder <GNN encoder for proximity homophily> \
    --s_encoder <GNN encoder for structure-role homophily> \
    --num_workers <dataloader workers> \
    --dataset <dataset name> \
    --sens_attr <private attribute> \
    --ds_label <downstream label> \
    --lam <hyper-param> \
    --gamma <hyper-param> \
    --eta <hyper-param>
  • Below is a demo on NBA, treating country as the private attribute and salary as the downstream label:
python exp_defense.py --device 0 --num_layers 2 --train_ratio 0.1 --p_encoder P_GIN --s_encoder S_GIN --num_workers 1 --dataset nba --sens_attr country --ds_label SALARY --lam 2.0 --gamma 3.0 --eta 2.0
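
To illustrate what sampling a graph for publication can look like, the sketch below keeps each edge independently with a probability derived from a per-edge score. It is a simplified illustration and not the repository's implementation: in the proposed method the sampling is learned through adversarial training, whereas here the scores are fixed inputs, and the per-edge parametrization and all names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def keep_probability(logit: float) -> float:
    """Numerically stable sigmoid: maps an edge score to a keep probability."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def sample_graph(edges: Sequence[Edge], logits: Sequence[float], seed: int = 0) -> List[Edge]:
    """Keep each edge independently with probability sigmoid(logit)."""
    if len(edges) != len(logits):
        raise ValueError("need exactly one logit per edge")
    rng = random.Random(seed)  # seeded so the sampled graph is reproducible
    return [e for e, s in zip(edges, logits) if rng.random() < keep_probability(s)]


# Hypothetical scores: a high score keeps an edge, a low score drops it.
edges = [(0, 1), (1, 2), (2, 3)]
published = sample_graph(edges, [50.0, -50.0, 50.0])
```

In a learnable setting the scores would be trainable parameters, updated so that the sampled graph reveals less about the private attribute while staying close to the original graph.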

Contact

For any questions or feedback, feel free to contact Hanyang Yuan.

Citation

If you find our work useful in your research or applications, please kindly cite:

@inproceedings{yuan2024unveiling,
title={Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data}, 
author={Hanyang Yuan and Jiarong Xu and Cong Wang and Ziqi Yang and Chunping Wang and Keting Yin and Yang Yang},
year={2024},
booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
}

Acknowledgements

This code implementation was inspired by GraphGPS and ADGCL. This readme.md was inspired by GraphGPT. Thanks for their wonderful work.
