This GitHub repository contains the dataset used for our research article titled "An Exploratory Study of COVID-19 Misinformation on Twitter".
In our article, we present a synthesis of established work in social media analytics and new streams of research in the detection and mitigation of misinformation, applied to one of the most challenging topics for societies (and possibly also for scientific research): the COVID-19 crisis.
We used two datasets in our study. The first dataset consists of tweets that were mentioned by fact-checking websites and classified as false or partially false. The second dataset consists of COVID-19 tweets collected from the publicly available corpus TweetsCOV19 (January-April 2020) and from in-house crawling (May-July 2020). A detailed description of the data collection process is given in Section 3.1 of the paper.
We share two datasets. Dataset I contains around 1,500 tweets annotated for misinformation in four categories. Dataset II contains tweets sampled from each day. The data format is as follows:
- tweet_id - unique ID of a Tweet
- tweet_class - class of the tweet defined by us.
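The two documented columns can be loaded and summarized with a short script. This is a minimal sketch, assuming the annotated file is a CSV with a header row `tweet_id,tweet_class`; the actual file name and format in the repository may differ, and the sample rows and class labels (`class_a`, `class_b`) are hypothetical placeholders, not real data. Tweet IDs are kept as strings because 64-bit tweet IDs lose precision when parsed as floating-point numbers.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an ASSUMED CSV layout using the two documented columns.
# The IDs and class labels are placeholders, not rows from the real dataset.
SAMPLE_CSV = """tweet_id,tweet_class
1000000000000000001,class_a
1000000000000000002,class_b
1000000000000000003,class_a
"""


def load_annotations(text: str) -> list[dict[str, str]]:
    """Parse CSV text with tweet_id and tweet_class columns into a list of rows."""
    reader = csv.DictReader(io.StringIO(text))
    # Keep tweet_id as a string so large IDs are not rounded by float conversion.
    return [
        {"tweet_id": row["tweet_id"], "tweet_class": row["tweet_class"]}
        for row in reader
    ]


def class_counts(rows: list[dict[str, str]]) -> Counter:
    """Count how many tweets fall into each tweet_class."""
    return Counter(row["tweet_class"] for row in rows)


if __name__ == "__main__":
    rows = load_annotations(SAMPLE_CSV)
    for label, n in class_counts(rows).most_common():
        print(f"{label}: {n}")
```

To use it on the real data, read the file contents (e.g. with `open(path, encoding="utf-8").read()`) and pass them to `load_annotations`, adjusting the column names if the released file uses a different layout.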
Please cite the OSNEM paper:
@article{shahi2021exploratory,
title={An exploratory study of covid-19 misinformation on twitter},
author={Shahi, Gautam Kishore and Dirkson, Anne and Majchrzak, Tim A},
journal={Online Social Networks and Media},
volume={22},
pages={100104},
year={2021},
publisher={Elsevier}
}
For help with or issues in using the data, please submit a GitHub issue.
For personal communication related to our work, please contact Gautam Kishore Shahi ([email protected]), Anne Dirkson ([email protected]) and Tim A. Majchrzak ([email protected]).
For updates on related publications on FakeCovid, please visit https://gautamshahi.github.io/FakeCovid/