
bunsenfeng / botrgcn

Stars: 29 · Watchers: 2 · Forks: 6 · Size: 22 KB

Code listing for the paper 'BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks'. ASONAM 2021.

License: MIT License

Python 100.00%

botrgcn's Introduction

BotRGCN

Introduction to BotRGCN

Twitter users operated by automated programs, also known as bots, have become increasingly prevalent and have induced undesirable social effects. While extensive research efforts have been devoted to Twitter bot detection, previous methods leverage only a small fraction of user semantic and profile information, so they fail to identify bots that exploit multi-modal user information to disguise themselves as genuine users. Moreover, state-of-the-art bot detectors do not leverage user follow relationships or the graph structure these relationships form. As a result, they fall short of capturing new generations of Twitter bots that act in groups while appearing genuine individually. To address these two challenges, we propose BotRGCN, short for Bot detection with Relational Graph Convolutional Networks. BotRGCN addresses the challenge of community by constructing a heterogeneous graph from follow relationships and applying relational graph convolutional networks to the Twittersphere. In addition, BotRGCN uses multi-modal user semantic and property information to avoid feature engineering and to improve its ability to capture bots with diversified disguises. Extensive experiments demonstrate that BotRGCN outperforms competitive baselines on TwiBot-20, a comprehensive benchmark that provides follow relationships. Experiments also show that BotRGCN effectively leverages three modalities of user information, namely semantic, property and neighborhood information, to boost bot detection performance.
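The relational graph convolution at the core of BotRGCN can be illustrated with a small, dependency-free sketch. It implements the standard R-GCN update of Schlichtkrull et al., h_i' = W_0 h_i + Σ_r Σ_{j ∈ N_r(i)} (1/|N_r(i)|) W_r h_j, without the nonlinearity, on a toy graph. The relation ids, feature sizes and weights below are illustrative assumptions; the repository itself builds its model from RGCNConv layers in model.py, and this sketch is not that code.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def rgcn_layer(
    h: List[Vector],
    edges: List[Tuple[int, int, int]],  # (source, target, relation) triples
    w_self: Matrix,
    w_rel: Dict[int, Matrix],
) -> List[Vector]:
    """One relational graph convolution step (no nonlinearity):
    h_i' = W_0 h_i + sum_r sum_{j in N_r(i)} (1 / |N_r(i)|) W_r h_j
    """
    # Group the incoming neighbours of each node by relation type.
    neighbours: Dict[Tuple[int, int], List[int]] = {}
    for src, dst, rel in edges:
        neighbours.setdefault((dst, rel), []).append(src)

    # Self-connection term W_0 h_i for every node.
    out = [matvec(w_self, h_i) for h_i in h]
    for (dst, rel), srcs in neighbours.items():
        norm = 1.0 / len(srcs)  # mean over this node's neighbours of relation rel
        for src in srcs:
            msg = matvec(w_rel[rel], h[src])
            out[dst] = [o + norm * m for o, m in zip(out[dst], msg)]
    return out


# Toy example: three users with 2-d features and two hypothetical relation ids.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(1, 0, 0), (2, 0, 1)]  # user 1 -> user 0 (relation 0), user 2 -> user 0 (relation 1)
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
print(rgcn_layer(h, edges, w_self=identity, w_rel={0: identity, 1: double}))
# [[3.0, 3.0], [0.0, 1.0], [1.0, 1.0]]
```

Each relation gets its own weight matrix, which is what lets a relational GCN treat different kinds of follow edges differently instead of pooling all neighbours with a single shared weight.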

Affiliated Paper

The affiliated paper of this repository, 'BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks', has been accepted at ASONAM 2021. Work in progress.

Dataset

More details are available at TwiBot-20 data. Please download 'Twibot-20.zip' into the folder that contains 'Dataset.py' and extract it there.
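If you prefer to script this step, the sketch below extracts an archive next to Dataset.py using Python's standard zipfile module. The helper name extract_dataset and the paths are assumptions for illustration; the archive's internal layout is not documented here, so the function simply returns whatever member names it finds.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dataset(zip_path: str, target_dir: str) -> List[str]:
    """Extract a dataset archive into target_dir and return its member names.

    Hypothetical helper: adjust the paths to wherever 'Twibot-20.zip' and
    'Dataset.py' actually live on your machine.
    """
    target = Path(target_dir)
    if not (target / "Dataset.py").exists():
        # The README asks for the archive to be extracted next to Dataset.py.
        print(f"warning: no Dataset.py found in {target.resolve()}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, run from the folder that contains Dataset.py: `extract_dataset("Twibot-20.zip", ".")`.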

Code Description

  • Dataset.py

    • class Twibot20(root='./Data/', device='cpu', process=True, save=True)
      • root - the folder where the processed data is saved; the default folder is './Data', which has already been created
      • save - whether to save the processed data (setting it to True can save you a lot of time if you want to rerun this model for further ablation studies)
      • process - whether to load and process the raw data; if you have already saved the processed data, set it to False
  • model.py

    • BotRGCN - the standard BotRGCN
    • BotRGCN1 - using the description feature alone
    • BotRGCN2 - using the tweets feature alone
    • BotRGCN3 - using the numerical properties feature alone
    • BotRGCN4 - using the categorical properties feature alone
    • BotRGCN12 - using the description feature + the tweets feature
    • BotRGCN34 - using the numerical properties feature + the categorical properties feature
    • BotGCN - replaces the RGCNConv layers with GCNConv layers
    • BotGAT - replaces the RGCNConv layers with GATConv layers
    • BotRGCN_4layers - BotRGCN with 4 RGCNConv layers
    • BotRGCN_8layers - BotRGCN with 8 RGCNConv layers
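The save option follows a compute-once, reuse-later idea: expensive preprocessing results are written to root so that later runs, such as the ablation variants listed above, can reuse them. Below is a minimal plain-Python sketch of that pattern. The function name load_or_build and the pickle format are hypothetical choices for illustration; the repository's own cached outputs are tensor files such as des_tensor.pt.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(path: str, build: Callable[[], Any], save: bool = True) -> Any:
    """Load a cached artifact from `path` if present, otherwise build it.

    When `save` is True the freshly built result is written to `path`, so
    later runs (e.g. ablation experiments) can skip the slow build step.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = build()  # the slow preprocessing step
    if save:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(result, f)
    return result
```

With save=False nothing is written, so every run pays the full preprocessing cost again.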

The Pre-Processing is too slow!

This is a common issue: we did not include parallelized code in this repo because parallelization depends on specific CPU/GPU/device configurations. You can parallelize it yourself or download our pre-processing results here.
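One way to parallelize a slow per-user preprocessing step is to split the users into chunks and map a batch function over them with a worker pool. The sketch below uses the standard concurrent.futures module. The batch function you pass in (for example a hypothetical encode_batch that embeds a list of descriptions or tweets), the chunk size and the worker count are assumptions to tune for your own hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map_chunks(fn: Callable[[Sequence[T]], List[R]],
                        items: Sequence[T], chunk_size: int = 1000,
                        workers: int = 4) -> List[R]:
    """Apply `fn` to chunks of `items` concurrently, preserving input order.

    `fn` receives one chunk (e.g. a batch of user descriptions) and must
    return exactly one result per element of that chunk.
    """
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the chunks were submitted.
        for chunk_result in pool.map(fn, chunked(items, chunk_size)):
            results.extend(chunk_result)
    return results
```

Threads pay off when the batch function spends its time in code that releases the GIL (I/O, or native libraries such as many tensor operations). For pure-Python CPU-bound work, ProcessPoolExecutor with a module-level batch function is the usual alternative.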

The zip file above contains four generated embeddings:

  • des_tensor.pt (user_description)

  • tweets_tensor.pt (user_tweets)

  • num_properties_tensor.pt (numerical_properties)

  • cat_properties_tensor.pt (categorical_properties)

Each embedding has shape [number_of_users, embedding_size]. The zip file also includes edge_index.pt, edge_type.pt and label.pt.
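Before training on downloaded or self-generated files, it can help to confirm that the tensors agree with each other and with the input sizes your model is configured for. The sketch below checks shapes only, passed as plain tuples. The function name is hypothetical, and the [2, E] / [E] layout for edge_index and edge_type is an assumption based on the usual PyTorch Geometric convention.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def check_preprocessed_shapes(
    features: Dict[str, Shape],
    edge_index: Shape,
    edge_type: Shape,
    expected_dims: Optional[Dict[str, int]] = None,
) -> int:
    """Sanity-check shapes of preprocessed tensors; return the user count.

    features: name -> shape, each expected to be [number_of_users, embedding_size].
    edge_index / edge_type: assumed PyTorch Geometric layout, [2, E] and [E].
    expected_dims: optional name -> input size the model's layer expects.
    Raises ValueError on the first inconsistency found.
    """
    n_users: Optional[int] = None
    for name, shape in features.items():
        if len(shape) != 2:
            raise ValueError(f"{name}: expected a 2-D shape, got {shape}")
        if n_users is None:
            n_users = shape[0]
        elif shape[0] != n_users:
            raise ValueError(f"{name}: {shape[0]} rows, expected {n_users}")
        if expected_dims and name in expected_dims and shape[1] != expected_dims[name]:
            raise ValueError(
                f"{name}: embedding_size {shape[1]}, model expects {expected_dims[name]}")
    if n_users is None:
        raise ValueError("no feature tensors given")
    if len(edge_index) != 2 or edge_index[0] != 2:
        raise ValueError(f"edge_index: expected shape [2, E], got {edge_index}")
    if tuple(edge_type) != (edge_index[1],):
        raise ValueError(f"edge_type {edge_type} does not match {edge_index[1]} edges")
    return n_users
```

With PyTorch installed, each shape can be obtained as `tuple(torch.load("num_properties_tensor.pt").shape)`. Passing the model's configured input sizes through expected_dims surfaces a feature-size mismatch before the first forward pass.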

Pre-trained weights of BotRGCN on TwiBot-22

To facilitate future research, we provide here the state_dict() of our BotRGCN model trained on TwiBot-22.

botrgcn's People

Contributors

leopoldwhite, gabrielham, bunsenfeng


botrgcn's Issues

Question

Sorry to bother you, but why can't I load the dataset here? It keeps showing 'Twibot20' object has no attribute 'df_data_labeled'.
[Screenshot 2023-12-10 174612]

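Preprocessing the dataset

Are the processed train.json, test.json and other files that can be downloaded directly problematic? Do I need to rerun the preprocessing code you released (in TwiBot-22) and process them again myself? I keep getting errors in the code that encodes the tweet text.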

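About downloading the roberta-base model from Hugging Face

[screenshot] Hello, when running the code I get an error at this step: [screenshot] It says it cannot connect to the Hugging Face Hub. I tried the methods in the official guide "Fetch models and tokenizers to use offline", but two of them still reported a connection failure. So I wanted to download the files manually through the web interface, but searching for "roberta-base" turns up many models, and I don't know which one to download. [screenshot] Could you **provide a download link for the model**? Thank you very much!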

Help, It seems the model inputs doesn't match the weight dimensions of the linear layer.

Hello. Recently I have been studying the use of RGCN to detect bots, but when I run this model, the error message indicates a matrix shape mismatch. My understanding is that the model's input does not match the weight dimensions of the linear layer. I was wondering whether I should set the num_prop dimension to (229580, 5) instead of (229580, 6). After trying many things, I still don't know how to fix this. Could you tell me how to solve it? Thank you very much!
The error message is as follows:

    File "main.py", line 57, in <module>
      train(epoch)
    File "main.py", line 23, in train
      output = model(des_tensor,tweets_tensor,num_prop,category_prop,edge_index,edge_type)
    File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
      return forward_call(*args, **kwargs)
    File "/root/autodl-tmp/111/BotRGCN-main/model.py", line 653, in forward
      n = self.linear_relu_num_prop(num_prop)
    File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
      return forward_call(*args, **kwargs)
    File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
      input = module(input)
    File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
      return forward_call(*args, **kwargs)
    File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
      return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (229580x6 and 5x32)
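The last line of this traceback can be read with a little shape arithmetic. PyTorch's F.linear computes input @ weight.T, where weight has shape (out_features, in_features), so "5x32" is the transposed weight of a Linear layer inside linear_relu_num_prop. That layer was built with in_features=5, while num_prop has 6 columns. The pure-Python sketch below reproduces only that compatibility check. The function name linear_output_shape is hypothetical and the message format merely mimics the one above. The sketch does not decide which side, the tensor or the layer size, is correct for this repository.

```python
from typing import Tuple


def linear_output_shape(input_shape: Tuple[int, int], in_features: int,
                        out_features: int) -> Tuple[int, int]:
    """Return the output shape of y = x @ W.T + b, or raise on a mismatch.

    x has shape (N, K) and W has shape (out_features, in_features), so W.T is
    (in_features, out_features) and the product needs K == in_features.
    """
    n, k = input_shape
    if k != in_features:
        # Mimics the wording of PyTorch's error, which reports x and W.T.
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({n}x{k} and {in_features}x{out_features})")
    return (n, out_features)


# A layer built for 5 numerical properties, fed a 6-column tensor:
try:
    linear_output_shape((229580, 6), in_features=5, out_features=32)
except RuntimeError as err:
    print(err)  # mat1 and mat2 shapes cannot be multiplied (229580x6 and 5x32)

# A layer whose input size matches the tensor would pass the shape check:
print(linear_output_shape((229580, 6), in_features=6, out_features=32))  # (229580, 32)
```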

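Something wrong with the pre-processed data

  1. The preprocessed data does not match the feature_size in the paper. Also, when I print the num_properties_tensor.pt you provided, why is it of float type? Isn't it supposed to be a 6-dimensional feature? Why does it print as 5-dimensional?

  2. cat_properties_tensor.pt is supposed to be 11-dimensional; why does it print as 3-dimensional?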

Data Preprocessing hardware requirements.

I tried to run the data processing code on Google Colab Pro, but even with the provided GPU and 24 GB of memory, I constantly ran out of memory.

May I know what hardware you are using, and whether there is any possible solution to my issue?

Solve the problem of missing description.npy files

Hello. Recently I have been studying the use of RGCN to detect bots, but when I run this model, the description.npy file fails to be read. My understanding is that this file and its contents are missing. Could you tell me how to solve this? Thank you very much!


twibot-22 preprocessed files

Hello dear authors,

Would you have the preprocessed output files for the twibot-22 dataset? I have been struggling with reproducing results as I run out of memory during preprocess_2.py.

Thanks in advance for your help.

Request for pre-trained model

Hi authors, I'm currently doing some research in twitter bot detection for a course project. I came across your BotRGCN project from your Twibot-22 page, and got quite interested in it. Could I ask if you would kindly share the pre-trained model for BotRGCN on twibot-22? I just finished my own training and wish to have a reference using the pre-trained model if possible~

Thanks very much for your great work!
