jsennett / top-k-insights

Reproducing results from "Extracting Top-K Insights from Multidimensional Data"

License: MIT License

Python 6.13% Jupyter Notebook 93.65% Shell 0.22%

top-k-insights

Joshua Sennett -- 645 DBMS Final Project -- UMass Amherst

Instructions

  1. Install dependencies using pip. This project was developed with Python 3.6.1 and only a few common libraries. You may have to use pip3 instead of pip.
     pip install --user -r requirements.txt
  2. Extract insights from the 'papers' or 'collaborators' DBLP dataset by running the top_k_insights/analyze_dblp.py script with input arguments. The script prints results to the console and writes more verbose logs to the log/ directory.

Examples

# Top-10 depth-2 insights from DBLP papers 
python3 ./top_k_insights/analyze_dblp.py papers 2 10

# Top-10 depth-2 insights from DBLP collaborators 
python3 ./top_k_insights/analyze_dblp.py collaborators 2 10

# Top-10 depth-1 insights from DBLP papers 
python3 ./top_k_insights/analyze_dblp.py papers 1 10

# Top-10 depth-1 insights from DBLP collaborators 
python3 ./top_k_insights/analyze_dblp.py collaborators 1 10
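
Judging from the examples above, the three positional arguments are the dataset name, the insight depth, and k, in that order. The sketch below shows one way such a command line could be parsed with argparse. It is an illustration only: the function name parse_args and the attribute names dataset, depth, and k are hypothetical, and the actual analyze_dblp.py may read its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse '<dataset> <depth> <k>' as used in the examples above.

    Hypothetical helper: the names here are illustrative and are not
    taken from analyze_dblp.py.
    """
    parser = argparse.ArgumentParser(
        description="Extract top-k insights from a DBLP dataset."
    )
    # The README names two datasets: 'papers' and 'collaborators'.
    parser.add_argument("dataset", choices=["papers", "collaborators"])
    # Depth of the insights to extract (1 or 2 in the examples).
    parser.add_argument("depth", type=int)
    # Number of top-scoring insights to report.
    parser.add_argument("k", type=int)
    return parser.parse_args(argv)


# Equivalent to: python3 ./top_k_insights/analyze_dblp.py papers 2 10
args = parse_args(["papers", "2", "10"])
print(f"Top-{args.k} depth-{args.depth} insights from DBLP {args.dataset}")
```

Declaring the dataset with choices makes a misspelled dataset name fail immediately with a usage message, before any data is loaded.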

Project Layout

top_k_insights/ contains source code for insight extraction

top_k_insights/insight_extractor.py contains the insight extraction engine, including the InsightExtractor class

top_k_insights/significance_tests.py contains the point and trend significance functions

top_k_insights/analyze_dblp.py is a command-line program you can use to extract insights from the DBLP dataset

tests/ contains unit tests for the significance functions. Run them with the pytest command, if pytest is installed.

log/ Log files with timestamped filenames will be created here each time ./top_k_insights/analyze_dblp.py is called.

data/ The datasets all-papers.csv and all-paperauths.csv are expected to be here.

report/final-report.pdf Final Report

report/notebooks/ Jupyter Notebooks containing analysis and figures highlighted in the final report.
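
To give a sense of what a trend significance function like the one in top_k_insights/significance_tests.py might compute, here is a minimal sketch based on one reading of the paper's trend (shape) insight: fit a line to the series, then combine the goodness of fit r² with the p-value of the slope under a logistic null distribution. This is not the repository's code. The function name trend_significance and the default parameters mu=0.2 and scale=2.0 are assumptions made for illustration.

```python
import math


def trend_significance(values, mu=0.2, scale=2.0):
    """Score how strongly a series follows a linear trend, in [0, 1].

    Hypothetical sketch: fits y = a*x + b by least squares over
    x = 0..n-1, then returns r^2 * (1 - p), where p is the upper-tail
    probability of |slope| under a logistic distribution with location
    `mu` and scale `scale` (both assumed values, not from the repo).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")

    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    syy = sum((y - mean_y) ** 2 for y in values)

    slope = sxy / sxx
    # r^2 of a simple linear fit; a constant series has no trend to explain.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)

    # Logistic CDF at |slope|; the p-value is the remaining upper tail.
    cdf = 1.0 / (1.0 + math.exp(-(abs(slope) - mu) / scale))
    p_value = 1.0 - cdf
    return r_squared * (1.0 - p_value)


print(trend_significance([1, 2, 3, 4]))   # perfectly linear, slope 1
print(trend_significance([5, 5, 5, 5]))   # flat series, no trend
```

Multiplying by r² means a steep slope counts for little when the points do not actually lie near a line, so noisy series are not reported as strong trends.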
