bunsenfeng / botrgcn

Code listing for the paper 'BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks'. ASONAM 2021.

License: MIT License

Python 100.00%

botrgcn's Introduction

BotRGCN

Introduction to BotRGCN

Twitter users operated by automated programs, also known as bots, have become more prevalent recently and have induced undesirable social effects. While extensive research efforts have been devoted to the task of Twitter bot detection, previous methods leverage only a small fraction of user semantic and profile information, which leads to their failure in identifying bots that exploit multi-modal user information to disguise themselves as genuine users. Apart from that, state-of-the-art bot detectors fail to leverage user follow relationships and the graph structure they form. As a result, these methods fall short of capturing new generations of Twitter bots that act in groups and seem genuine individually. To address these two challenges of Twitter bot detection, we propose BotRGCN, which is short for Bot detection with Relational Graph Convolutional Networks. BotRGCN addresses the challenge of community by constructing a heterogeneous graph from follow relationships and applying relational graph convolutional networks to the Twittersphere. Apart from that, BotRGCN makes use of multi-modal user semantic and property information to avoid feature engineering and augment its ability to capture bots with diversified disguise. Extensive experiments demonstrate that BotRGCN outperforms competitive baselines on TwiBot-20, a comprehensive benchmark that provides follow relationships. BotRGCN is also shown to effectively leverage three modalities of user information, namely semantic, property and neighborhood information, to boost bot detection performance.

Affiliated Paper

The affiliated paper of this repository, 'BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks', is accepted at ASONAM'21. Work in progress.

Dataset

More details are available at TwiBot-20 data. Download 'Twibot-20.zip' to the folder that also contains 'Dataset.py' and extract it there.

Code Description

Dataset.py
- ```
  class Twibot20(self, root='./Data/', device='cpu', process=True, save=True)
  ```
  - root - the folder where the processed data is saved; the default folder is './Data', which has already been created
  - save - whether to save the processed data or not (setting it to True can save you a lot of time if you want to run this model for further ablation studies)
  - process - whether to process the raw data; if you have already saved the processed data, set it to False so the saved copy is reused
model.py
- BotRGCN - the standard BotRGCN
- BotRGCN1 - using the description feature alone
- BotRGCN2 - using the tweets feature alone
- BotRGCN3 - using the numerical properties feature alone
- BotRGCN4 - using the categorical properties feature alone
- BotRGCN12 - using the description feature + the tweets feature
- BotRGCN34 - using the numerical properties feature + the categorical properties feature
- BotGCN - replace the RGCNConv layers with GCNConv layers
- BotGAT - replace the RGCNConv layers with GATConv layers
- BotRGCN_4layers - BotRGCN with 4 RGCNConv layers
- BotRGCN_8layers - BotRGCN with 8 RGCNConv layers

The Pre-Processing is too slow!

This is a common issue: we did not include parallelized code in this repo because it depends on specific CPU/GPU/device configurations. You can parallelize it yourself or download our pre-processing results here at link.
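For readers who want to parallelize the work themselves, one common pattern is to split the user list into chunks and map a worker over them with `concurrent.futures`. The sketch below is not the authors' pipeline: `embed_user` is a hypothetical stand-in for whatever per-user step is slow (for example, encoding a description), and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def embed_user(text):
    """Hypothetical placeholder for the slow per-user step (e.g. text encoding)."""
    return [float(len(text))]


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_chunk(texts):
    """Process one chunk sequentially inside a single worker."""
    return [embed_user(t) for t in texts]


def embed_all(texts, chunk_size=1000, workers=4):
    """Embed all texts with a worker pool while preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order,
        # so output rows stay aligned with the input users.
        per_chunk = pool.map(embed_chunk, chunks(texts, chunk_size))
        return [row for chunk in per_chunk for row in chunk]
```

Threads only help when the heavy step releases the interpreter lock; for pure-Python CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface. Processing in chunks also keeps the number of pending tasks small.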

The zip file of pre-processing results includes four generated embeddings, each of shape [number_of_users, embedding_size]:

- des_tensor.pt (user_description)
- tweets_tensor.pt (user_tweets)
- num_properties_tensor.pt (numerical_properties)
- cat_properties_tensor.pt (categorical_properties)

It also includes edge_index.pt, edge_type.pt and label.pt.
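A quick consistency check after downloading can catch a mismatched or truncated file early. The sketch below assumes each file has already been loaded (for instance with `torch.load`) and relies only on the `.shape` attribute. The `[2, number_of_edges]` layout for `edge_index` follows the usual PyTorch Geometric convention and is an assumption here, as is the helper name `check_shapes`.

```python
def check_shapes(features, edge_index, edge_type):
    """Validate that loaded tensors agree with each other.

    `features` maps a file name to a 2-D tensor of shape
    [number_of_users, embedding_size]. `edge_index` is assumed to be
    [2, number_of_edges] and `edge_type` [number_of_edges].
    Returns (number_of_users, number_of_edges) or raises ValueError.
    """
    user_counts = {name: tuple(t.shape)[0] for name, t in features.items()}
    if len(set(user_counts.values())) != 1:
        raise ValueError(f"feature files disagree on number_of_users: {user_counts}")
    index_shape = tuple(edge_index.shape)
    type_shape = tuple(edge_type.shape)
    if len(index_shape) != 2 or index_shape[0] != 2:
        raise ValueError(f"edge_index should be [2, number_of_edges], got {index_shape}")
    if type_shape != (index_shape[1],):
        raise ValueError(f"edge_type should be [{index_shape[1]}], got {type_shape}")
    return next(iter(user_counts.values())), index_shape[1]
```

With the real files, `features` would be a dictionary such as `{'des_tensor.pt': torch.load('des_tensor.pt'), ...}`. Printing each tensor's second dimension at the same point is also a cheap way to confirm the embedding sizes the model will receive.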

The Pre-Trained weight of BotRGCN on Twibot-22

To facilitate future research, we provide here the state_dict() of our BotRGCN model trained on Twibot-22.

botrgcn's Issues

Question

Sorry to bother you, but why can't I load the dataset? It keeps reporting 'Twibot20' object has no attribute 'df_data_labeled'.

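Pre-processed dataset

Is there a problem with the pre-processed files (train.json, test.json, etc.) that can be downloaded directly? Do I need to rerun the pre-processing code you released (in TwiBot-22) and process everything again myself? I ask because I keep getting an error in the code that encodes the tweet text.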

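About downloading the roberta-base model from Hugging Face

Hello, I get an error at this step when running the code.

It reports that it cannot connect to the Hugging Face Hub. I tried the methods under "Fetch models and tokenizers to use offline" on the official site, but two of them still report that the connection fails, so I wanted to download the files manually through the web interface. Searching for "roberta-base" returns many models, and I don't know which one to download.

Could you **provide a download link for the model**? Thank you very much!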

Help: the model inputs don't seem to match the weight dimensions of the linear layer.

Hello, authors. I have recently been studying the use of RGCN for bot detection, but when I ran this model, the error message indicated a matrix shape mismatch. My understanding is that the model's inputs don't match the weight dimensions of the linear layer. I was wondering whether I should set the num_prop dimension to (229580, 5) instead of (229580, 6). After trying many approaches, I still don't know how to fix this. Could you tell me how to solve it? Thank you very much!
The error message is as follows:
File "main.py", line 57, in
train(epoch)
File "main.py", line 23, in train
output = model(des_tensor,tweets_tensor,num_prop,category_prop,edge_index,edge_type)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/111/BotRGCN-main/model.py", line 653, in forward
n = self.linear_relu_num_prop(num_prop)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (229580x6 and 5x32)
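
The last line of the traceback describes a plain matrix-shape mismatch: the input has 6 columns, while the layer's weight (reported as 5x32) expects 5. A minimal sketch of that rule in plain Python, using a hypothetical helper name and not code from this repository:

```python
def matmul_result_shape(mat1, mat2):
    """Return the shape of mat1 @ mat2, or raise if the inner sizes differ."""
    rows, inner1 = mat1
    inner2, cols = mat2
    if inner1 != inner2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{inner1} and {inner2}x{cols})"
        )
    return (rows, cols)
```

Calling `matmul_result_shape((229580, 6), (5, 32))` raises with the same message as above, while `(229580, 5)` and `(5, 32)` multiply cleanly. So either the layer needs an input size of 6, or `num_prop` needs 5 columns; which one is right depends on which version of the pre-processed features is paired with which version of `model.py`.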

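Something wrong with the pre-processed data

The pre-processed data does not match the feature_size given in the paper. Also, when I print the num_properties_tensor.pt you provided, why is it of float type, and why does it have 5 dimensions when there should be 6 features?
And why does cat_properties_tensor.pt print as 3-dimensional when it should have 11 dimensions?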

Data Preprocessing hardware requirements.

I tried to run the data processing code on Google Colab Pro, yet even with the GPU and 24 GB of memory provided, I constantly ran out of memory.

May I know what hardware you are using, and whether there is any possible solution to my issue?

Solve the problem of the missing description.npy file.

Hello, authors. I have recently been studying the use of RGCN for bot detection, but when I ran this model, the description.npy file could not be read. My understanding is that this file and its contents are missing. Could you tell me how to solve this? Thank you very much!

twibot-22 preprocessed files

Hello dear authors,

Would you have the preprocessed output files for the twibot-22 dataset? I have been struggling with reproducing results as I run out of memory during preprocess_2.py.

Thanks in advance for your help.

Request for pre-trained model

Hi authors, I'm currently doing some research in twitter bot detection for a course project. I came across your BotRGCN project from your Twibot-22 page, and got quite interested in it. Could I ask if you would kindly share the pre-trained model for BotRGCN on twibot-22? I just finished my own training and wish to have a reference using the pre-trained model if possible~

Thanks very much for your great work!