cofacts / opendata

Open data of Cofacts collaborative fact-checking database

License: MIT License

JavaScript 97.49% Shell 2.51%
open-data csv fact-checking crowdsourcing

opendata's People

Contributors

eddiechocho, mrorz, renovate-bot

opendata's Issues

Make dataset ready for Hugging Face

TODO

  • Fix CSV file line endings
  • Add YAML config
  • Update README: include the use of [datafiles] to stop HF from doing automatic splits
  • Update README: include joins and their output results. Scenarios:
    • Topic classification: articles + categories
    • Predicting crowd label (rumor / not-rumor / opinionated) of a text: articles + article-replies
    • Q&A dataset: articles + article-replies + replies
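
The Q&A scenario above (articles + article-replies + replies) could be sketched roughly as follows, using only Python's standard library. The column names (`id`, `articleId`, `replyId`, `type`, `text`), the label value `RUMOR`, and the sample rows are assumptions made for illustration; check the headers of the actual CSV files before relying on them.

```python
import csv
import io

# Hypothetical miniature tables; the real files and column names may differ.
ARTICLES_CSV = 'id,text\na1,"Milk rumor\nhttps://example.com/video"\na2,Another message\n'
ARTICLE_REPLIES_CSV = "articleId,replyId\na1,r1\n"
REPLIES_CSV = "id,type,text\nr1,RUMOR,This claim is false.\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def join_qa(articles, article_replies, replies):
    """Inner-join articles -> article_replies -> replies into Q&A pairs."""
    article_by_id = {row["id"]: row for row in articles}
    reply_by_id = {row["id"]: row for row in replies}
    pairs = []
    for link in article_replies:
        article = article_by_id.get(link["articleId"])
        reply = reply_by_id.get(link["replyId"])
        if article is None or reply is None:
            continue  # skip links that point at missing rows
        pairs.append(
            {
                "question": article["text"],
                "label": reply["type"],
                "answer": reply["text"],
            }
        )
    return pairs


qa_pairs = join_qa(
    read_rows(ARTICLES_CSV),
    read_rows(ARTICLE_REPLIES_CSV),
    read_rows(REPLIES_CSV),
)
```

The other two scenarios would follow the same pattern with a different pair of tables. Article `a2` has no reply in this sample, so the inner join drops it.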

Some fields are not properly double-quoted in CSV

For example, in https://github.com/cofacts/opendata/blob/master/data/articles.csv.zip:

  1. The text field is not double-quoted but contains a newline character.
    Was:
AV6yLoz4yCdS-nWhuglc,LINE,dd6282f4a3d6da926bd62131faf68862840927cacb182cb93c6621950413e38a,,1,RUMORS_LINE_BOT,牛奶到底怎麼來的?請看看牛奶背後的醜陋真相,人類的確太險惡了 !! 以後還是少喝牛奶吧 !!!
https://youtu.be/S9gKQwmNq9M,[object Object],2017-09-24T04:39:08.790Z,2017-09-24T04:39:08.790Z,2017-09-24T04:39:08.806Z

Should be:

AV6yLoz4yCdS-nWhuglc,LINE,dd6282f4a3d6da926bd62131faf68862840927cacb182cb93c6621950413e38a,,1,RUMORS_LINE_BOT,"牛奶到底怎麼來的?請看看牛奶背後的醜陋真相,人類的確太險惡了 !! 以後還是少喝牛奶吧 !!!
https://youtu.be/S9gKQwmNq9M",[object Object],2017-09-24T04:39:08.790Z,2017-09-24T04:39:08.790Z,2017-09-24T04:39:08.806Z
  2. The text field is not double-quoted but contains a newline character.
29qaa5b669uhd,LINE,c32056853587055102d4e2e580887836fbe440821d86d935856399092919f935,,1,RUMORS_LINE_BOT,「2003年民進黨廢止政務官18%」2004年(93年)又公布「政務人員退職撫卹條例
」讓政務官再領18%,現在立法院民進黨絕對多數,就把政務官退職撫卹條例第十條修正,讓政務官18%成為歷史,徹底貫徹民進黨的目標吧!,,2018-07-08T14:57:21.134Z,2018-07-08T14:57:21.134Z,2018-07-08T14:57:21.168Z
  3. The text field is not double-quoted but contains a newline character.
5592851235519-rumor,LINE,,,1,BOT_LEGACY,中研院努力了8年才完成的
排毒最強的食物…依序

發佈日期:
2015年11月16日

1.地瓜2.綠豆
3.燕麥 4. 薏仁
5.小米6.糙米
7.紅豆8.胡蘿蔔
9.山藥10.牛蒡
11.蘆筍
12.洋蔥13. 蓮藕
14.白蘿蔔
15.山茼蒿
(裂葉茼蒿)
16.地瓜葉
17.蘿蔔葉
18.川七
19.優格20.醋

記得儲存,
也請轉發您的親朋好友們,,2017-02-03T02:11:00.000Z,2017-02-03T02:11:00.000Z,2017-07-03T02:58:45.899Z
  4. The text field is not double-quoted but contains a newline character.
36sxgyfxuokjr,LINE,f810578040dea180f13043650c24a59e641b7318fdb1cf3ecd3e50a752123280,,1,RUMORS_LINE_BOT,有点意思,看看有益😁
    現在地球的人口,有 70多億,虽然統計学能告訴我們人口的數量、分布、種族等信息,但因為數量实在太大,所以单纯的学术性統計报告,绝大多數人來說其实沒什么意义!

    因此,有人制作了这组有趣的统计报告,把世界上的 70亿人想像成 100人,然后各种百分比的统计资料,看起來就有点意思了!

统计资料看起來会是这样

11人在欧洲,
5人在北美洲,
9人在南美洲,
15人在非洲,
60人在亞洲!

49人生活在乡下,
51人生活在城市。

12人讲中文!
5人讲西班牙語,
5人讲英語,
3人讲阿拉伯語,
3人讲印度語,
3人讲孟加拉國語,
3人讲葡萄牙語,
2人讲俄羅斯語,
2人讲日語,
还有62人各讲一种語言。

77人有自己的住房,
23人沒有居住地方。

21人营养过盛,
63人能吃饱饭,
15人营养不夠,
1人吃了上頓沒下頓。
48人每天的生活費,
不到10元人民币。

87人有干净的饮用水,
13人缺水或水源污染!

75个人有手機,
25个人沒有。

30人能上网,
70人沒有条件上网。

7人能接受大学教育,
93人沒有上过大学。

83人能识字,
17人是文盲。

33人是基督徒,
22人是穆斯林,
14人是印度教徒,
7人是佛教徒,
12人信仰其他宗教,
还有12人沒有宗教信仰。

26人活不到 14岁,
66人介于15-64岁死亡,
8人超过 65岁。

男人有50个,女人有50个

看完这组数据,假如你有自己的住房,能吃饱饭,能喝上干净的水,有手机能上网,上过大学,你还有什么理由抱怨?

世界上 100个人中,能活过或超過过65岁的,只有 8人

如果你已经超过65岁了,知足吧!感恩吧!珍惜生活,把握当下吧!你没有向92人那样在64岁之前离开,已经是人类中的佼佼者了!😘😘,,2019-01-08T13:19:18.650Z,2019-01-08T13:19:18.650Z,2019-01-08T13:19:18.687Z
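
The rows above break because a line break inside an unquoted field is read as the end of the record. Below is a minimal sketch, using Python's standard `csv` module rather than the project's own export code (which is not shown here), of how a CSV writer quotes such fields and how broken records might be detected by counting fields. The sample record and the helper `find_broken_records` are hypothetical.

```python
import csv
import io

# Hypothetical record modelled on the rows above: the text field holds a newline.
record = [
    "example-id",
    "LINE",
    "first line of the message\nhttps://example.com/video",
    "2017-09-24T04:39:08.790Z",
]

# With the default minimal quoting, csv.writer quotes fields that contain the
# delimiter, the quote character, or a line-terminator character.
buffer = io.StringIO(newline="")
csv.writer(buffer, lineterminator="\n").writerow(record)
serialized = buffer.getvalue()

# Reading the quoted output back yields a single record with four fields.
parsed = list(csv.reader(io.StringIO(serialized, newline="")))


def find_broken_records(text, expected_fields):
    """Return 1-based record numbers whose field count differs from expected."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if len(row) != expected_fields
    ]


# The same record joined without quoting splits into two short records.
unquoted = ",".join(record) + "\n"
broken = find_broken_records(unquoted, expected_fields=4)
```

A field-count check like this only flags the symptom. Reliably repairing a file that is already broken needs knowledge of the schema, since an unquoted text field may also contain commas.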
