
Code and data from the paper "Email formality in the workplace: A case study on the Enron corpus"



enron-formality


Results for 'Email Formality in the Workplace: A Case Study on the Enron Corpus'

URL for paper: http://aclweb.org/anthology-new/W/W11/W11-0711.pdf

Kelly Peterson

Matt Hohensee

Fei Xia


Data references : ISI database : Retrieved December, 2010 from:


File Contents:

/README

Dearest reader, you are reading me as we speak

/annotations/

All of the human-annotated files for formality and requests.
NOTE : when an email's filename ends in .txt, the file is numbered by the 'mid' column in the ISI database.
Otherwise, the filenames are the same as the originals from the CMU dataset.

/formality/

    /100_emails_agreement
    
        Initially, 3 annotators annotated 100 emails to measure agreement on formality annotation
        
    /400_emails_training
    
        After agreement was measured, one annotator annotated another 300 emails, for a total of 400, so that the classifier could be trained
        
/requests/

    2 annotators annotated each email for the presence of a request.
    In these files, a '1' to the right of the file name indicates a request; otherwise there was no request
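As an illustration of the convention described above, here is a minimal Python sketch that reads request annotations into a dictionary. It assumes the file name and the optional '1' are separated by whitespace and that file names contain no spaces; the real annotation files may use a different delimiter. The function name `parse_request_annotations` and the sample lines are hypothetical and not part of this repository.

```python
from typing import Dict, Iterable


def parse_request_annotations(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each file name to True when a '1' follows it, otherwise False.

    ASSUMPTION: fields are whitespace-separated and file names have no spaces.
    """
    labels: Dict[str, bool] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        file_name = fields[0]
        # Anything other than a literal '1' after the name counts as "no request".
        labels[file_name] = len(fields) > 1 and fields[1] == "1"
    return labels


# Made-up sample lines, for illustration only.
sample_lines = ["12345.txt 1", "67890.txt", "allen-p_inbox_3 0"]
labels = parse_request_annotations(sample_lines)
print(labels)
```

To read a real annotation file, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.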

/furcoat

Python scripts for generating the feature vectors for the formality classifier.  

/mysql/

/queries/

    /get_formality_and_requests_by_position.sql
    
        This query was used to derive the results in Table 6 of the paper
        
    /get_formality_by_rank_diff.sql
    
        This query was used to derive the results in Table 7 of the paper
        
    /requests_and_formality.sql
    
        This query was used to derive the results in Table 8 of the paper
        
    /get_recipient_count_and_formality.sql
    
        This query was used to derive the results in Table 9 of the paper
        
/tables/

    3 tables were added to the ISI database during our case study.  All of these can be JOINed against the MySQL tables provided by ISI.
    No indexes are defined on any of the columns in these .sql files, so please note that queries will run EXTREMELY slowly until indexes are added.
    You will likely want to add an INDEX to the following columns : 'mid', 'Address' and 'Rank'
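One way to prepare the indexing step is to generate the `CREATE INDEX` statements with a small Python helper, sketched below. The table names are taken from the .sql file names, and the assignment of columns to tables is an assumption that should be checked against the actual schemas. The helper name `build_index_statements` and the index names are hypothetical.

```python
from typing import Dict, List


def build_index_statements(columns_by_table: Dict[str, List[str]]) -> List[str]:
    """Build one MySQL CREATE INDEX statement per (table, column) pair."""
    statements = []
    for table, columns in columns_by_table.items():
        for column in columns:
            index_name = f"idx_{table}_{column.lower()}"
            # Backticks quote the identifiers, which matters for names such as
            # Rank that newer MySQL versions treat as reserved words.
            statements.append(
                f"CREATE INDEX {index_name} ON `{table}` (`{column}`);"
            )
    return statements


# ASSUMPTION: which table holds which column is a guess based on the file
# descriptions; verify against the real table definitions before running.
assumed_columns = {
    "enron_formality": ["mid"],
    "enron_requests": ["mid"],
    "enron_positions": ["Address", "Rank"],
}

for statement in build_index_statements(assumed_columns):
    print(statement)
```

The printed statements can be pasted into a MySQL client. If a column turns out to be a TEXT type, MySQL requires a prefix length in the index definition, which this sketch does not add.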
    
    /enron_formality.sql
    
        This table comprises the formality classification results in the column 'EffectiveLabel' (0=Empty, 1=Formal, 2=Informal)
        It also contains counts of the various features that the classifier extracted
        
    /enron_positions.sql
    
        A table created based on the positions in the ISI spreadsheet
        
    /enron_requests.sql
    
        This table comprises the requests classification results in the column 'RawLabel' (0=NonRequest, 1=Request)
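The label codes documented above for 'EffectiveLabel' and 'RawLabel' can be decoded as in the following sketch. It assumes rows have already been fetched as pairs of integers, for example by joining the two tables on 'mid', which is itself an assumption about the schema. The function name `tally_labels` and the sample rows are made up for illustration and are not results from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Codes as documented for the two result tables.
FORMALITY_LABELS = {0: "Empty", 1: "Formal", 2: "Informal"}
REQUEST_LABELS = {0: "NonRequest", 1: "Request"}


def tally_labels(rows: Iterable[Tuple[int, int]]) -> Counter:
    """Count emails per (formality, request) combination.

    Each row is an (EffectiveLabel, RawLabel) pair of integer codes.
    """
    counts: Counter = Counter()
    for effective_label, raw_label in rows:
        key = (FORMALITY_LABELS[effective_label], REQUEST_LABELS[raw_label])
        counts[key] += 1
    return counts


# Made-up rows, for illustration only.
sample_rows = [(1, 1), (2, 0), (1, 1), (0, 0)]
counts = tally_labels(sample_rows)
for (formality, request), n in sorted(counts.items()):
    print(formality, request, n)
```

An unknown code raises a `KeyError`, which is deliberate here so that unexpected label values are noticed rather than silently counted.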

/python/

For more information on running these scripts or reproducing these results, please contact Kelly Peterson

/contact_frequency/

    This script was used in combination with a CSV file exported from MySQL to derive the results in Table 5 of the paper
    
/personal_vs_business_formality/

    This script was used in conjunction with Mallet's output from running against the 
    University of Sheffield Personal vs. Business dataset to derive the results

/scripts/

Lots of random scripts for Wordnik informal word API scraping, preprocessing, data reporting, etc.

NOTE : It's been a LOOOOONG time since looking at these so please use at your own risk!

/enron_employee_positions/

Included here are positions and ranks from Diesner et al. and a re-ranking of some positions that we performed in our work


enron-formality's Issues

Your paper and contact

Hi, we are planning to use the ranking you created. Is it possible to contact you by email? Also, could you please mention the paper that we can cite as a reference?

Thanks !
Rajesh
