yingqichao / fnd-bootstrap

Python 99.38% Shell 0.62%

fnd-bootstrap's Introduction

Bootstrapping Your Own Representations for Fake News Detection

This repo is built upon "Masked Autoencoders: A PyTorch Implementation"

Data Preparation. Prepare the data using the scripts in ./data_prepare. Only Weibo, Weibo-21, and GossipCop are supported so far, and the data should be obtained from exactly the following sources. Weibo/Weibo-21: email Dr Qiong Nan. GossipCop: email the original authors of GossipCop. (Apologies to Dr Singhal for my earlier incorrect pointer and the confusion and bother it caused.) In our experience, they will kindly help.
After processing the data, run the .sh scripts for training or testing.
We also provide an alternative network design in ./models/UAMFDv2_Net.py. The differences are minor: 1) ELU is replaced with SimpleGate, which splits the tensor into two halves and uses the second half to reweigh the first; this also provides non-linearity. 2) AdaIN controls the mean and std of the refined representations, replacing the original reweighing MLPs. To reproduce exactly the network design reported in the paper, use UAMFD_Net instead of UAMFDv2_Net, though the latter performed slightly better in our later tests.
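
The two changes described above can be illustrated with a minimal, framework-free sketch on plain Python lists. This is not the code in ./models/UAMFDv2_Net.py: the function names `simple_gate` and `adain`, the `eps` value, and the hand-picked target statistics are assumptions made for illustration. In the actual model the tensors are batched and the target mean and std would presumably come from another part of the network.

```python
import math


def simple_gate(x):
    """Split x into two halves and use the second half to reweigh the first."""
    if len(x) % 2 != 0:
        raise ValueError("SimpleGate needs an even number of features")
    half = len(x) // 2
    first, second = x[:half], x[half:]
    # The element-wise product is what makes the operation non-linear in x.
    return [a * b for a, b in zip(first, second)]


def adain(x, target_mean, target_std, eps=1e-5):
    """Normalise x to zero mean and unit std, then impose the target statistics."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)  # eps (assumed value) guards against division by zero
    return [target_std * (v - mean) / std + target_mean for v in x]


features = [1.0, 2.0, 3.0, 4.0]

# Halves are [1.0, 2.0] and [3.0, 4.0]; the output has half as many features.
gated = simple_gate(features)

# The target statistics are chosen by hand here (hypothetical values).
restyled = adain(features, target_mean=10.0, target_std=2.0)
```

Note that SimpleGate halves the feature dimension, so a layer feeding it needs twice as many output features as the layer that consumes its result.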

Pre-training

The pre-training models of MAE can be downloaded from "Masked Autoencoders: A PyTorch Implementation".

Because of the restriction on upload size, we are unable to upload pretrained models and the processed data. We will further open-source them on GitHub after the anonymous reviewing process.

License

We have been granted permission to use the Weibo, Weibo-21, and GossipCop datasets for academic studies only.

fnd-bootstrap's Issues

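About the Twitter dataset

Thank you for your work! I have a few questions about the Twitter dataset. First, I don't know how the files at the following two paths in Twitter_dataset.py were obtained:
train_path = root_path + '/train_twitters.xlsx'
test_path = root_path+'/test_datasets.xlsx'

Second, the same goes for the “Data/twitter-merged/” files in process_data_twitter.py and the file at path = "/home/groupshare/mae-main/ztrain_en.xlsx".

Finally, could you upload the Twitter files under root_path directly? I think they should contain everything I need. Thank you very much!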

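Data labels

Hello, I'd like to ask how the ground-truth labels for the single views are determined in this paper: by the threshold mentioned in the paper, or by the label of the whole news item?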

MAE pre-train

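Is the MAE pre-trained separately, outside the model, with mae_pretrain_vit_base.pth then loaded into the model?
Was the MAE pre-trained on the datasets used in the experiments?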

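About the validation and test sets

Hello, and thank you for your excellent work! I noticed that you split the dataset into a training set and a test set, and that validation loads the test set directly, with no separate test set. How were the results in the paper obtained? Are they the highest accuracies averaged over multiple training runs?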

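About iMMoE

1. When iMMoE refines the feature ris, it produces eis0 and eis1. What is the difference between eis0 and eis1? Are they output by two different gates as they pass through iMMoE? If so, on what basis is each one routed through its gate?
2. When iMMoE refines the fused feature [eis1,et1], and when the bootstrapping stage guides the refinement of the multi-view representations [wis; wip; wm; wx; wt], do the representations only need to pass through a single gate network?
3. How do the roles of the three expert networks in iMMoE differ?
I hope you can answer these questions. Thank you very much.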

Datasets

Could you provide the pre-trained models and the processed datasets?

How exactly should the datasets be processed?

For example, I have the raw Weibo-21 dataset. Which files in your code should I use to process it?

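Gossip dataset

I'd like to ask how you handle text length for the Gossip data. BERT cannot process text that long. Did you truncate the text to a fixed length, or split it into multiple segments? Thank you very much for your answer.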

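Hello, could you share the dataset again?

The Baidu Cloud share link you posted earlier in the issues has expired. Could you share it once more?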

Which code should I run to perform fake news detection directly?

Hello, if I want to run a fake news detection test directly, which code should I use?
