menegolli / kdem

This project forked from mengtingwan/kdem

This repository includes data and code for the Kernel Density Estimation from Multiple Sources (KDEm) algorithm proposed in a KDD'16 paper.

Python 100.00%

kdem's Introduction

I'm very sorry that these files are significantly out of date!

I'll try to maintain them if I have any bandwidth, but I strongly recommend that you consult our original paper or one of the many state-of-the-art open-source truth-finding algorithms.

This repository includes data and code for the following paper:

Mengting Wan, Xiangyu Chen, Lance Kaplan, Jiawei Han, Jing Gao, Bo Zhao, "From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Approach", in Proc. of 2016 ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD'16), San Francisco, CA, Aug. 2016

Specifically, the core algorithm, KDEm (Kernel Density Estimation from Multiple Sources), is implemented in KDEm.py.

The other baseline methods are implemented based on the following papers:

  • TruthFinder (TruthFinder.py): Xiaoxin Yin, Jiawei Han, and Philip S. Yu, "Truth Discovery with Multiple Conflicting Information Providers on the Web", in Proc. 2007 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'07), San Jose, CA, Aug. 2007.
  • AccuSim (Accu.py): Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava, "Integrating conflicting data: the role of source dependence", in Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09), Lyon, France, Aug. 2009.
  • GTM (GTM.py): Bo Zhao and Jiawei Han, "A Probabilistic Model for Estimating Real-Valued Truth from Conflicting Sources", in Proc. of 10th Int. Workshop on Quality in Databases, in conjunction with VLDB 2012 (QDB'12), Istanbul, Turkey, Aug. 2012.
  • CRH (CRH.py): Qi Li, Yaliang Li, Jing Gao, Bo Zhao, Wei Fan, and Jiawei Han, "Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation", in Proc. of 2014 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'14), Snowbird, UT, June 2014.
  • CATD (CATD.py): Qi Li, Yaliang Li, Jing Gao, Lu Su, Bo Zhao, Murat Demirbas, Wei Fan, and Jiawei Han, "A Confidence-Aware Approach for Truth Discovery on Long-Tail Data", PVLDB 8(4): 425-436, 2015. Also in Proc. 2015 Int. Conf. on Very Large Data Bases (VLDB'15), Kohala Coast, Hawaii, Sept. 2015.

If you have any questions, feel free to contact me at [email protected]

To run the experiments on the synthetic datasets, Synthetic(unimodal) and Synthetic(mix), type

     "python test.py synuni" or "python test.py synmix"

Note that the experiments may take a while to run.

To run the experiments on the real-world datasets, Population(outlier) and Tripadvisor, you must first download the Population(outlier) data from this link and the Tripadvisor data from this link.

For Population(outlier), put the two files "popAnswerOut.txt" and "pupTuples.txt" in the folder "./data_pop/". For Tripadvisor, put all the hotel review files "hotel_?????.dat" in the folder "./data_tripadvisor/Review_Texts". Then type

     "python test.py realpop" or "python test.py realtrip"
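Before starting a long run on the real-world datasets, it can help to confirm that the downloaded files are where the instructions above say they should be. The following is a minimal sketch of such a check, not part of the repository: the function names are hypothetical, the file and folder names are copied from the text above, and the glob pattern "hotel_*.dat" is an assumption about what "hotel_?????.dat" stands for.

```python
from pathlib import Path

# File and folder names copied from the instructions above.
POP_DIR = Path("data_pop")
POP_FILES = ("popAnswerOut.txt", "pupTuples.txt")
TRIP_DIR = Path("data_tripadvisor") / "Review_Texts"
# Assumption: "hotel_?????.dat" means any hotel id, so a wildcard is used.
TRIP_PATTERN = "hotel_*.dat"


def missing_population_files(root="."):
    """Return the Population(outlier) files that are not yet in place."""
    base = Path(root) / POP_DIR
    return [name for name in POP_FILES if not (base / name).is_file()]


def count_tripadvisor_files(root="."):
    """Count the hotel review files found in the Tripadvisor folder."""
    base = Path(root) / TRIP_DIR
    if not base.is_dir():
        return 0
    return sum(1 for p in base.glob(TRIP_PATTERN) if p.is_file())


if __name__ == "__main__":
    missing = missing_population_files()
    if missing:
        print(f"Population(outlier) files missing: {missing}")
    else:
        print("Population(outlier) files are in place")
    print(f"Tripadvisor review files found: {count_tripadvisor_files()}")
```

Run it from the repository root. An empty "missing" list and a non-zero review file count suggest the layout matches what is described above.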

If you type "python test.py" in the terminal with no argument, the experiments run on the default dataset, Synthetic(unimodal). You can then open the folder "./measure_syn" to see the results.
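The four dataset keywords and the no-argument default can be collected in a small wrapper if you want to launch runs from a script. This is a hedged sketch rather than code from the repository: `build_command` and `run_experiment` are hypothetical helpers, and the mapping of each keyword to a dataset name is inferred from the order in which they are listed above. Since the files are out of date, the interpreter that works may not be the current one, so it is left as a parameter.

```python
import subprocess
import sys

# Dataset keywords for test.py as listed in this README.
# The keyword-to-dataset pairing is inferred from the listing order.
DATASETS = {
    "synuni": "Synthetic(unimodal)",
    "synmix": "Synthetic(mix)",
    "realpop": "Population(outlier)",
    "realtrip": "Tripadvisor",
}


def build_command(dataset=None, python=sys.executable):
    """Return the argument list for one experiment run.

    With dataset=None the keyword is omitted, which per the README
    runs the default Synthetic(unimodal) experiments.
    """
    if dataset is not None and dataset not in DATASETS:
        raise ValueError(f"unknown dataset keyword: {dataset!r}")
    cmd = [python, "test.py"]
    if dataset is not None:
        cmd.append(dataset)
    return cmd


def run_experiment(dataset=None, python=sys.executable):
    """Run test.py for one dataset and raise if it exits with an error."""
    subprocess.run(build_command(dataset, python), check=True)
```

For example, `run_experiment("synmix")` called from the repository root is meant to be equivalent to typing "python test.py synmix".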
