This project forked from enyandai/fakehealth

FakeHealth

The FakeHealth repository supplements the paper "Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository". It was collected to address challenges in fake health news detection and includes news contents, news reviews, social engagements, and user networks.

Overview

Our repository consists of two datasets: HealthStory and HealthRelease. Due to Twitter's policy of protecting user privacy, the full contents of user social engagements and networks may not be published directly. Instead, we store the IDs of all social engagements and the related user network in JSON files, and supplement them with an API for retrieving the social engagements and user network from Twitter. The IDs are stored in ./dataset/engagements/HealthRelease.json, ./dataset/engagements/HealthStory.json, ./dataset/user_network/followers/, and ./dataset/user_network/following/. Due to size limitations, the IDs of followers and following are uploaded to Zenodo as version 2 of FakeHealth.
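As a rough illustration of working with the ID files, the sketch below loads one of the engagement JSON files and counts the IDs it holds. The internal layout of these files is not described here, so the helper names (`count_leaf_ids`, `summarize_id_file`) are hypothetical and the code deliberately avoids assuming a schema: it walks whatever nested dicts and lists it finds and counts the leaf values.

```python
import json
from pathlib import Path


def count_leaf_ids(node):
    """Count leaf values in an arbitrarily nested dict/list structure."""
    if isinstance(node, dict):
        return sum(count_leaf_ids(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_leaf_ids(item) for item in node)
    # Anything else (str, int, ...) is treated as a single ID.
    return 1


def summarize_id_file(path):
    """Load an ID file and report its top-level entry count and leaf count."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    return {"top_level_entries": len(data), "leaf_ids": count_leaf_ids(data)}


if __name__ == "__main__":
    # Path taken from the paragraph above; skipped if the file is absent.
    id_file = Path("./dataset/engagements/HealthStory.json")
    if id_file.exists():
        print(summarize_id_file(id_file))
```

A quick summary like this can help confirm that the ID files downloaded completely before any Twitter API calls are spent on them.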

Requirements

  • twython==3.7.0
  • A Twitter developer app to generate app_key, app_secret, oauth_token, and oauth_token_secret

Running Code

  1. Set .\API\resources\tweet_keys_file.txt in the following format:

    app_key,app_secret,oauth_token,oauth_token_secret
    XXXXXX,XXXXXXX,XXXXXXXXX,XXXXXXXXXXXXX
    
  2. Build HealthStory:

    python main.py news_type=HealthStory sav_dir=../dataset
    
  3. Build HealthRelease:

    python main.py news_type=HealthRelease sav_dir=../dataset
    
  4. Build user network:

    1. Download dataset/user_network/followers and dataset/user_network/following from https://zenodo.org/record/3606756.

    2. (Optional) Collect the profiles of the followers and followings and save them into dataset/user_network/user_profiles:

      python crawl_friends_profiles.py sav_dir=../dataset
      

      Note that the number of friends is extremely large. We recommend crawling the friends' profiles only if necessary.
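Before running the build steps above, it can be useful to check that the keys file parses as expected. The following is a minimal sketch, not the repository's own loader: `load_twitter_keys` is a hypothetical helper, and the relative path in the `__main__` block assumes the script is run from the API directory. It reads the header row and the first credentials row in the format shown in step 1, then passes the four values to Twython.

```python
import csv
from pathlib import Path

REQUIRED_FIELDS = ("app_key", "app_secret", "oauth_token", "oauth_token_secret")


def load_twitter_keys(path):
    """Read the header row and the first credentials row from the keys file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, skipinitialspace=True)
        row = next(reader, None)
    if row is None:
        raise ValueError(f"{path} contains no credentials row")
    missing = [name for name in REQUIRED_FIELDS if not row.get(name)]
    if missing:
        raise ValueError(f"{path} is missing values for: {', '.join(missing)}")
    return {name: row[name].strip() for name in REQUIRED_FIELDS}


if __name__ == "__main__":
    # Assumed location relative to the API directory; adjust as needed.
    keys_path = Path("resources/tweet_keys_file.txt")
    if keys_path.exists():
        from twython import Twython  # twython==3.7.0, as listed in Requirements

        keys = load_twitter_keys(keys_path)
        client = Twython(
            keys["app_key"],
            keys["app_secret"],
            keys["oauth_token"],
            keys["oauth_token_secret"],
        )
```

Failing early on a missing or empty field avoids starting a long crawl that would only fail at the first authenticated request.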

Data Format

The downloaded dataset will have the following folder structure:

  • content
    • HealthStory
      • <news_id>.json: a list of news contents which include URL, Title, Key words, Tags, Image URL, Author and Publishing Date.
    • HealthRelease.json: ~
  • reviews
    • HealthStory.json: a list of news reviews which include rating, news source, description, summary of the review, ground truth labels of the ten standard criteria, explanations of the criteria judgements, and image link.
    • HealthRelease.json: ~
  • engagements
    • HealthStory
      • <news_id>
        • tweets
          • <ID>.json: The JSON file of the tweet object. The attributes of the tweet object are described in Twitter's API documentation.
          • ......
        • retweets
          • <ID>.json
          • ......
        • replies
          • <ID>.json
    • HealthRelease
      • ......
  • user_network
    • user_profiles
      • <user_name>.json: The JSON file of the user profile object. The attributes of the user profile object are described in Twitter's API documentation.
      • ......
    • user_timelines
      • <user_name>.json: a list of tweet objects
      • ......
    • user_followers
      • <user_name>.json: a list of user follower IDs (up to 200 per user)
      • ......
    • user_following
      • <user_name>.json: a list of user following IDs (up to 5000 per user)
      • ......
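The folder structure above can be traversed with a few lines of standard-library Python. The sketch below is illustrative only: `count_engagements` and `load_tweets` are hypothetical helpers, and the code assumes the engagements folder follows the layout listed above, with one folder per news item containing tweets, retweets, and replies subfolders.

```python
import json
from pathlib import Path

ENGAGEMENT_KINDS = ("tweets", "retweets", "replies")


def count_engagements(dataset_dir, news_type="HealthStory"):
    """Return {news_id: {kind: number of <ID>.json files}} for one dataset."""
    root = Path(dataset_dir) / "engagements" / news_type
    counts = {}
    if not root.is_dir():
        return counts
    for news_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[news_dir.name] = {
            # A missing subfolder simply yields a count of zero.
            kind: len(list((news_dir / kind).glob("*.json")))
            for kind in ENGAGEMENT_KINDS
        }
    return counts


def load_tweets(dataset_dir, news_type, news_id, kind="tweets"):
    """Yield each tweet object stored under one news item."""
    folder = Path(dataset_dir) / "engagements" / news_type / news_id / kind
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)


if __name__ == "__main__":
    for news_id, kinds in count_engagements("./dataset").items():
        print(news_id, kinds)
```

Using a generator in `load_tweets` keeps memory use low when a single news item has many engagement files.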

References

If you use the FakeHealth datasets, please cite the following paper:

@article{dai2020ginger,
  title={Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository},
  author={Dai, Enyan and Sun, Yiwei and Wang, Suhang},
  journal={arXiv preprint arXiv:2002.00837},
  year={2020}
}
