DiffuGreedy Influence Maximization

Code and instructions to reproduce the analysis of the paper "DiffuGreedy: An Influence Maximization Algorithm Based on Diffusion Cascades".

Folder structure

Root folders: Code, Data, Figures

Code: Contains the files of this repository and the NETRATE code. You will also need the code for IMM and SIMPATH. PMIA.py and runIAC.py are taken from a Python implementation of PMIA.

Data -> Init Data: Contains the cascades and the follower network from Sina Weibo, i.e. total.txt and graph_170w_1month.txt
Data -> Empty folder Logs
Data -> Empty folder Netrate
Data -> Empty folder Seeds
Data -> Empty folder Results

Requirements

gcc version >=4.7

MATLAB 2017b

Python 2.7, packages: igraph, pandas, numpy, networkx

R packages: ggplot, reshape2

Code

The scripts run in the order indicated by the number in their file names.
Below is an explanation of how each influence maximization technique is implemented through the scripts.

Diffusion Greedy

_2_diffusion_greedy.py runs diffusion-based influence maximization using the train cascades.
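The README does not describe the internals of _2_diffusion_greedy.py, so the following is only a minimal sketch of one way a cascade-based greedy selection can work, not the script's actual code. It assumes each training cascade is given as an (initiator, participants) pair and greedily picks the initiator that adds the most not-yet-covered nodes. The function name and the input layout are hypothetical.

```python
def greedy_seeds_from_cascades(cascades, k):
    """Pick up to k seeds by greedy marginal coverage (illustrative only).

    cascades: iterable of (initiator, participants) pairs, where
    participants is a set of nodes that took part in that cascade.
    This input layout is an assumption, not the repository's format.
    """
    # Union of the nodes reached by each initiator over all its cascades.
    reach = {}
    for initiator, participants in cascades:
        reach.setdefault(initiator, set()).update(participants)

    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for node in sorted(reach):  # sorted for deterministic tie-breaking
            if node in seeds:
                continue
            gain = len(reach[node] - covered)  # newly covered nodes
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining candidate adds new nodes
            break
        seeds.append(best)
        covered |= reach[best]
    return seeds


example_cascades = [
    ("a", {"b", "c", "d"}),
    ("b", {"c"}),
    ("e", {"f", "g"}),
    ("a", {"e"}),
]
example_seeds = greedy_seeds_from_cascades(example_cascades, 3)
```

In this toy input the selection stops after two seeds, because the remaining initiator only reaches nodes that are already covered.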

Ranking by K-core decomposition

_2_train.py computes the k-core number of each node in the active graph and stores the results in kcores.csv.
_3_rank_nodes.py ranks the nodes by core number and stores the top ones in the Seeds folder.
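For readers unfamiliar with k-core ranking, here is a small dependency-free sketch of the idea: repeatedly peel off the node of minimum remaining degree, record its core number, and then rank nodes by that number. It is an illustration under the assumption of an undirected graph stored as a dict of neighbour sets; it is not the code in _2_train.py or _3_rank_nodes.py, and the function names are hypothetical.

```python
def core_numbers(adj):
    """Return the core number of every node.

    adj: dict mapping node -> set of neighbours (undirected graph).
    Simple quadratic peeling, fine for small illustrative graphs.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Remove the node with the smallest remaining degree.
        v = min(remaining, key=lambda n: (degree[n], str(n)))
        k = max(k, degree[v])  # core numbers never decrease while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def rank_by_core(core, top_n):
    """Top nodes by descending core number, ties broken by node label."""
    return sorted(core, key=lambda v: (-core[v], str(v)))[:top_n]


# Triangle a-b-c with a pendant node d attached to c.
example_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
example_core = core_numbers(example_adj)
example_top = rank_by_core(example_core, 3)
```

The triangle nodes end up in the 2-core and the pendant node in the 1-core, so the top three nodes are the triangle.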

Influence Maximization via Martingales

_2_train.py extracts the active network for the first 25 days and stores it in train_network.pickle.
_3_extract_weighted_cascade.py adds edge weights to the network based on the weighted cascade model and stores the result in follower_weighted.txt. It also creates the attribute file required by the IMM algorithm.
Use the IMM code to produce the seed set for follower_weighted.txt and store it in a file with the same name in Data\Seeds.
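As a reminder of what weighted cascade weighting means, the sketch below assigns each edge (u, v) the probability 1 / in-degree(v), which is the usual definition of that model. The edge-list input, the edge direction (u influences v), and the function name are assumptions made for illustration; the layout of follower_weighted.txt and of the IMM attribute file is not reproduced here.

```python
from collections import Counter


def weighted_cascade(edges):
    """Attach weighted-cascade probabilities to a directed edge list.

    edges: list of (u, v) pairs meaning "u can influence v".
    Assumes the list contains no duplicate edges.
    Returns a list of (u, v, p) with p = 1 / in_degree(v).
    """
    in_degree = Counter(v for _, v in edges)
    return [(u, v, 1.0 / in_degree[v]) for u, v in edges]


example_edges = [(1, 3), (2, 3), (1, 2)]
example_weighted = weighted_cascade(example_edges)
```

With this weighting, the probabilities on the incoming edges of any node sum to 1.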

PMIA on the Diffusion-based Network

_4_reform_cascades.py uses top_nodes.csv, created by _3_rank_nodes.py, to filter the training cascades so that they include only the top nodes by degree, and converts them to the format required by NETRATE. The cascade file is stored in Data\Netrate.
_5_call_netrate.m calls the NETRATE algorithm for each cascade file and stores the resulting adjacency list in Data\Netrate.
_6_run_pmia.py creates a network out of the adjacency matrix, weights it based on weighted cascade and computes NETRATE's accuracy in retrieving follow relationships. It then uses PMIA to derive the seed set.

SIMPATH on the Data-based weighted Network

_2_train.py extracts the active network for the first 25 days and stores it in train_network.pickle.
_3_extract_bernouli_and_time.py extracts three weighted networks, with edge weights based on influence strength (the Bernoulli IC model from the literature), the inverse of the average influence delay, and their product.
Use the SIMPATH code and the .inf files from the previous step to derive the seed sets and store them in Data\Seeds, in text files with the same name as the .inf file and the format "seed1 seed2 seed3 ...".
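Because the SIMPATH seed sets have to be stored by hand, a small helper such as the one below may be convenient. It writes the seeds on one line separated by single spaces, as described above. Treating "the same name as the .inf" as "same base name with a .txt extension" is an assumption, as are the function name, the trailing newline, and the example file name bernoulli.inf.

```python
import os
import tempfile


def write_seed_file(seeds, inf_path, seeds_dir):
    """Write seeds as "seed1 seed2 seed3" into seeds_dir.

    The output name reuses the base name of the .inf file with a
    .txt extension (an assumption about the expected naming).
    Returns the path of the written file.
    """
    base = os.path.splitext(os.path.basename(inf_path))[0]
    out_path = os.path.join(seeds_dir, base + ".txt")
    with open(out_path, "w") as handle:
        handle.write(" ".join(str(s) for s in seeds) + "\n")
    return out_path


# Demo in a temporary directory; in the repository layout the target
# directory would be the Seeds folder under Data.
demo_dir = tempfile.mkdtemp()
demo_path = write_seed_file([12, 7, 42], "bernoulli.inf", demo_dir)
```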
