Automatically exported from code.google.com/p/diversify

License: Other

Result Set Diversifier -- A Java framework for evaluating various
                          result set diversification algorithms.

Author: Scott Sanner


Description
===========

This code implements a Java framework for producing result
sets for queries on a corpus (currently given as a directory
of text files).  It is intended as a testing framework for
diversification algorithms, but may be used for general 
result list construction in information retrieval.

The code provides an implementation of Maximal Marginal Relevance 
(MMR) (Carbonell & Goldstein, SIGIR 1998) along with a variety of 
kernels:

  * Term Frequency (TF) Kernel
  
  * Term Frequency - Inverse Document Frequency (TF-IDF) Kernel
    n.b., currently uses raw (non-log) TF and log IDF; other
          variants could be used, cf.
          http://cseweb.ucsd.edu/~elkan/papers/spire05.pdf
  
  * BM25 Kernel
    cf. http://research.microsoft.com/en-us/people/junxu/airs2010_rankusekernel.pdf

  * LDA Kernel
    n.b., essentially an LSI Kernel using LDA to derive topic-document distributions
    
  * Probabilistic Latent Set Relevance (PLSR) Kernel
    Preliminary work on the derivation of this kernel appeared as
     
      Probabilistic Latent Maximal Marginal Relevance, SIGIR 2010, 
      S. Guo and S. Sanner.

    The kernel as currently implemented, and its derivation, are unpublished.
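
For orientation, the greedy MMR selection rule can be sketched in a few
lines of Java.  This is a minimal, self-contained illustration and not
the framework's own code: the class name MmrSketch, the whitespace
tokenizer, and the use of cosine similarity over raw term counts as a
stand-in for the TF Kernel are all assumptions made for the example.
At each step it picks the unselected document d that maximizes
lambda * sim(query, d) - (1 - lambda) * max over selected s of sim(d, s).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not a class from the diversify source tree.
final class MmrSketch {

    /** Raw term counts of a lower-cased, whitespace-tokenized text. */
    static Map<String, Integer> tf(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) counts.merge(tok, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two term-count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    /** Greedy MMR: indices of up to k documents, in selection order. */
    static List<Integer> mmr(String query, List<String> docs, int k, double lambda) {
        Map<String, Integer> q = tf(query);
        List<Map<String, Integer>> vecs = new ArrayList<>();
        for (String d : docs) vecs.add(tf(d));

        List<Integer> selected = new ArrayList<>();
        int target = Math.min(k, docs.size());
        while (selected.size() < target) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < docs.size(); i++) {
                if (selected.contains(i)) continue;
                // Redundancy: similarity to the closest already-selected document.
                double maxSim = 0.0;
                for (int s : selected) {
                    maxSim = Math.max(maxSim, cosine(vecs.get(i), vecs.get(s)));
                }
                double score = lambda * cosine(q, vecs.get(i)) - (1 - lambda) * maxSim;
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            selected.add(best);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "health legislation passes senate",
            "health legislation passes senate vote",
            "oil spill legal charges filed");
        System.out.println(mmr("health legislation", docs, 2, 0.5));
    }
}
```

With lambda = 0.5 the second pick in this toy corpus is the unrelated
third document rather than the near-duplicate second one; with
lambda = 1.0 the ranking falls back to pure relevance.  Swapping the
cosine function for another similarity is the point at which a
different kernel would plug in.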
    

Basic Installation and Invocation
=================================

diversify/ provides the following subdirectories:

    src   All source code (.java files)
    bin   All binaries (.class files)
    lib   All 3rd party libraries (.jar files)
    files All supplementary files (i.e., data, results)

Always ensure that all .jar files in lib/ are included in your
CLASSPATH, both at compile time and at runtime.  It is
recommended that you use Eclipse for Java development:

    http://www.eclipse.org/downloads/

In Eclipse, the CLASSPATH libraries can be set via:

    Project -> Properties -> Java Build Path -> Libraries Tab

To run this code from a terminal, use one of the two scripts:

    run     For Windows/Cygwin and UNIX/Linux systems
    run.bat For the Windows CMD prompt


Starting Point
==============

See class TestDiversity in the default package, which evaluates
a variety of MMR algorithms (with different kernels) w.r.t.
various queries on the news content in files/data.  The command
line parameters are as follows:

    *   arg 1: directory of files to rank
    *   arg 2: directory for output
    *   arg 3: query (enclose in 'single quotes')

The code exports results both to stdout and the output directory 
with filename constructed according to the query.
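
As a rough sketch of how the three arguments fit together, the
following stand-alone class parses them and derives an output file
name from the query.  Everything here is hypothetical: ArgsSketch is
not part of the framework, and the naming rule (lower-case the query,
replace runs of non-alphanumeric characters with "_", append ".txt")
is an assumption for illustration only.  The rule TestDiversity
actually uses may differ.

```java
import java.io.File;

// Hypothetical illustration; not a class from the diversify source tree.
final class ArgsSketch {

    /** Assumed naming rule: "health legislation" becomes "health_legislation.txt". */
    static String outputFileName(String query) {
        return query.trim().toLowerCase().replaceAll("[^a-z0-9]+", "_") + ".txt";
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: ArgsSketch <docs-dir> <output-dir> '<query>'");
            return;
        }
        File docsDir = new File(args[0]);   // arg 1: directory of files to rank
        File outDir = new File(args[1]);    // arg 2: directory for output
        String query = args[2];             // arg 3: query (quoted in the shell)

        File outFile = new File(outDir, outputFileName(query));
        System.out.println("Ranking files in " + docsDir + " for query \"" + query + "\"");
        System.out.println("Results would be written to " + outFile);
    }
}
```

The single quotes around the query matter on the command line: without
them the shell splits a multi-word query into several arguments.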

> Sample data:

Each directory in files/data contains 50 news articles retrieved 
with the query given by the directory name.

> Examples:

From Eclipse, run TestDiversity with the following arguments, 
or use the following commands (substituting 'run' with 'run.bat' 
if working on a non-Cygwin Windows system).

    ./run TestDiversity files/data/Healthcare files/results 'health legislation' 
    ./run TestDiversity files/data/BP_Oil files/results 'legal charges' 
    ./run TestDiversity files/data/Barack_Obama/ files/results 'gun control lobby'

> Debugging:

Most classes (e.g., MMR and all Kernels) have a DEBUG flag which, 
if set to true, will print debug information as the code in that 
class executes.  For the LDAKernel and PLSRKernel, setting DEBUG=true 
will display the topic models for each document.
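
The pattern looks roughly like the sketch below.  The class
DebugFlagSketch and its score method are hypothetical, and whether the
real classes declare the flag as static or final is an assumption;
check the declaration at the top of each class before editing it.

```java
// Hypothetical illustration; not a class from the diversify source tree.
final class DebugFlagSketch {

    /** Set to true to trace execution of this class. */
    static boolean DEBUG = false;

    /** MMR-style score: relevance traded off against redundancy. */
    static double score(double relevance, double redundancy, double lambda) {
        double s = lambda * relevance - (1 - lambda) * redundancy;
        if (DEBUG) {
            System.out.println("[DEBUG] relevance=" + relevance
                + " redundancy=" + redundancy + " score=" + s);
        }
        return s;
    }

    public static void main(String[] args) {
        DEBUG = true;  // in the framework this would be edited in the source
        score(0.8, 0.4, 0.5);
    }
}
```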


GraphViz Visualization
======================

To enable Java Graphviz visualization (e.g., agglomerative clustering):

- Download and install GraphViz on your system:
 
  http://www.graphviz.org/

- Make sure "dot" and "neato" ("dot.exe" and "neato.exe" on Windows)
  are in your PATH, i.e., you can execute them from any directory

Run graph.Graph.main() and verify that a cleanly formatted Java window
displaying a graph appears.
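
If no window appears, one quick check is whether Java itself can
launch the Graphviz tools.  The sketch below is a hypothetical helper
(GraphvizCheck is not part of the framework) that tries to start each
tool with the -V flag, which prints the Graphviz version and exits.
It assumes Java 7 or later for ProcessBuilder.inheritIO().

```java
import java.io.IOException;

// Hypothetical illustration; not a class from the diversify source tree.
final class GraphvizCheck {

    /** Returns true if the executable starts from the current PATH and exits with 0. */
    static boolean onPath(String exe) {
        try {
            // inheritIO() forwards the tool's version banner to this console,
            // so no output buffer can fill up and block the child process.
            Process p = new ProcessBuilder(exe, "-V").inheritIO().start();
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false;  // executable not found or not runnable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        for (String tool : new String[] {"dot", "neato"}) {
            System.out.println(tool + (onPath(tool) ? ": found" : ": NOT found in PATH"));
        }
    }
}
```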
