rCAT

This page describes rCAT, the relational character analysis tool. The purpose of rCAT is to extract relations between entities in a text and generate a PDF report outlining these relations. For more details about the implementation, and for citation, see the abstract: Barth, Florian and Kim, Evgeny and Murr, Sandra and Klinger, Roman (2018). "A Reporting Tool for Relational Visualization and Analysis of Character Mentions in Literature". Abstract presented at DHd 2018 Conference. Köln, Germany.

Installation

You will need several dependencies to get rCAT working. Follow the installation instructions on the relevant pages:

Make sure that TreeTagger is on your PATH; otherwise the program will not work. If you still get TreeTagger-related errors, you can hardcode the TreeTagger installation directory in the source file (treetaggerwrapper.py).
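To check the PATH requirement before starting rCAT, a small script along these lines may help. It is only a sketch: the executable name "tree-tagger" is an assumption about your TreeTagger installation, and find_executable is a hypothetical helper, not part of rCAT.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


# "tree-tagger" is an assumed binary name; adjust it to your installation.
path = find_executable("tree-tagger")
print(path if path else "tree-tagger was not found on PATH")
```

If the script reports that nothing was found, add the TreeTagger directory to your PATH and open a new terminal before trying again.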

In addition, the following dependencies and Python wrappers should be installed with pip:

$ pip install pylatex
$ pip install numpy
$ pip install nltk
$ pip install graphviz
$ pip install wordcloud 
$ pip install treetaggerwrapper

If you use the Anaconda Python distribution, you may skip numpy and nltk, as they are preinstalled. The next step is to get the necessary data packages from NLTK. In a terminal, type:

$ python

A Python interpreter will open:

Python 3.6.1 |Anaconda custom (x86_64)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()

A window opens: select "popular packages" and download them. Close the window after the download process is finished.
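On a machine without a graphical desktop, the download window cannot open. A possible alternative, assuming the "popular" collection identifier matches the "popular packages" entry in the window, is to request the collection by name. The wrapper function below is a hypothetical helper, not part of rCAT.

```python
def download_popular_nltk_data():
    """Download NLTK's "popular" collection without opening the GUI window."""
    # Imported here so the function can be defined even before nltk is installed.
    import nltk

    # Assumption: "popular" is the identifier of the collection shown
    # as "popular packages" in the downloader window.
    return nltk.download("popular")
```

Call download_popular_nltk_data() once from a Python session; it needs network access.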

Clone this repository and navigate to the "flask_app" directory inside the "web-rcat" folder. Then start the program with:

$ python flask_form.py

You will see the following status:

 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Open the URL in your browser. This will take you to the main page of the tool. Now you can start working with it!

Working with rCAT

The web interface has the following fields:

Select text file:

Click Choose file to select the text you want to analyze. The file should be in plain text format.

Select character file:

Click Choose file to select the file with character names.

The character file should be formatted as follows: each line describes a single character. The line starts with the canonical name of the character, followed by the aliases of this character, separated by tabs. Each character and his/her list of aliases should be entered on a separate line.
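The format can be illustrated with a short parser. This is a minimal sketch of how such a file could be read, not rCAT's own code; parse_character_lines and the sample names are hypothetical.

```python
def parse_character_lines(lines):
    """Map each canonical name to the list of its aliases."""
    characters = {}
    for line in lines:
        # Fields are separated by tabs; drop empty fields and blank lines.
        fields = [field.strip() for field in line.split("\t")]
        fields = [field for field in fields if field]
        if not fields:
            continue
        canonical, aliases = fields[0], fields[1:]
        characters[canonical] = aliases
    return characters


# Illustrative content of a character file (made-up example data).
sample = [
    "Elizabeth\tLizzy\tMiss Bennet\n",
    "Darcy\tMr. Darcy\n",
    "\n",
]
characters = parse_character_lines(sample)
print(characters)
```

To read a real file, pass an open file object instead of the sample list, for example parse_character_lines(open("characters.txt", encoding="utf-8")), where the file name is a placeholder.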

Specify relation distance:

This should be an integer: the maximum number of words between the mentions of two characters for the mentions to count as being in proximity. Default is 10.

Specify words before:

This should be an integer: the number of words before the mention of the first character to include in the contextual analysis. Default is 8.

Specify words after:

This should be an integer: the number of words after the mention of the second character to include in the contextual analysis. Default is 8.
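The interplay of the three parameters can be sketched as follows. This is an illustration under stated assumptions, not rCAT's implementation: mentions are single tokens, only consecutive mentions are paired, and "distance" is taken to mean the number of words strictly between two mentions (inclusive limit). The function name find_relations is hypothetical.

```python
def find_relations(tokens, mentions, distance=10, before=8, after=8):
    """Pair consecutive mentions of different characters that are close enough.

    tokens:   list of words of the text
    mentions: list of (token_index, character) tuples, sorted by index
    Returns a list of (first_character, second_character, context_window).
    """
    relations = []
    for (i, first), (j, second) in zip(mentions, mentions[1:]):
        if first == second:
            continue  # two mentions of the same character are not a relation
        words_between = j - i - 1
        if words_between <= distance:
            # Window: `before` words ahead of the first mention up to
            # `after` words behind the second mention.
            start = max(0, i - before)
            end = min(len(tokens), j + after + 1)
            relations.append((first, second, tokens[start:end]))
    return relations


tokens = "yesterday Anna met Ben at the old station".split()
mentions = [(1, "Anna"), (3, "Ben")]
print(find_relations(tokens, mentions, distance=10, before=1, after=2))
```

A larger relation distance yields more character pairs, while larger before and after values widen the context that feeds the word clouds.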

Remove stop words (y/n)?:

Whether to remove stop words from the contextual analysis. Default is yes.

Lemmatization

This option will lemmatize each word in the text (reduce it to its base form). This option is especially useful when working with relatively short texts.

Word clouds to show

Defines how many word clouds will be generated. This should be an integer. Only the top n word clouds will be shown for each character and character pair. Default is 5.

Text language

The language of the text: German or English.

Segments:

This should be an integer: the number of segments into which the book is split to track the development of word fields over the course of the story. Default is 10.
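One simple way to picture the segmentation is a split into contiguous parts of nearly equal length. The sketch below assumes token-based segments of this kind; how rCAT actually draws the segment boundaries is not stated here, and split_into_segments is a hypothetical name.

```python
def split_into_segments(tokens, n_segments=10):
    """Split a token list into n_segments contiguous, near-equal parts."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    size, remainder = divmod(len(tokens), n_segments)
    segments = []
    start = 0
    for k in range(n_segments):
        # The first `remainder` segments get one extra token.
        end = start + size + (1 if k < remainder else 0)
        segments.append(tokens[start:end])
        start = end
    return segments


tokens = list(range(23))
segments = split_into_segments(tokens, n_segments=10)
print([len(segment) for segment in segments])
```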

Analyze with word fields:

There are two ways in which you can provide word fields.

Single category: One plain text file with one word per line. The tool will then use these words to characterize characters and relations between characters, and plot the development of this word field in a single plot.

Multi-category: Multiple files structured as described above. File names correspond to the categories of the word fields. The tool will plot the development of these word fields in multiple plots.

Warning: multi-category word clouds are not currently supported.
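The two input styles can be sketched as follows: one helper reads a single word field file, a second one derives category names from file names, and a third counts word field hits per text segment, which is the kind of data a development plot needs. All function names are hypothetical, and lowercasing the words is an assumption made for the example, not documented rCAT behavior.

```python
from pathlib import Path


def load_word_field(path):
    """Read one word per line, skipping blank lines (lowercased by assumption)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_word_fields(paths):
    """Multi-category case: the file name without extension names the category."""
    return {Path(path).stem: load_word_field(path) for path in paths}


def count_per_segment(segments, word_field):
    """Count how many tokens of each segment belong to the word field."""
    return [
        sum(1 for token in segment if token.lower() in word_field)
        for segment in segments
    ]


# Made-up example data: three short segments and one word field.
segments = [["Love", "and", "hate"], ["the", "sea"], ["love", "love", "fear"]]
emotion = {"love", "hate", "fear"}
print(count_per_segment(segments, emotion))
```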

Filter word clouds by:

You can filter the words appearing in the word cloud either by frequency (the most frequent words) or by pointwise mutual information.
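The difference between the two filters can be seen in a small sketch. It assumes one common formulation of pointwise mutual information, PMI(w, c) = log2(p(w | c) / p(w)), where p(w | c) is the relative frequency of word w in the context of a character c and p(w) is its relative frequency in the whole text. The exact variant used by rCAT may differ, and both function names are hypothetical.

```python
import math
from collections import Counter


def top_words_by_frequency(context_tokens, n):
    """Return the n most frequent words of a character's context."""
    return [word for word, _ in Counter(context_tokens).most_common(n)]


def top_words_by_pmi(context_tokens, all_tokens, n):
    """Return the n context words with the highest PMI against the whole text."""
    context_counts = Counter(context_tokens)
    corpus_counts = Counter(all_tokens)
    scores = {}
    for word, count in context_counts.items():
        if corpus_counts[word] == 0:
            continue  # context is expected to be drawn from the same text
        p_word_given_context = count / len(context_tokens)
        p_word = corpus_counts[word] / len(all_tokens)
        scores[word] = math.log2(p_word_given_context / p_word)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda word: (-scores[word], word))[:n]


# Made-up example data: "the" is frequent everywhere, "sword" only near the character.
all_tokens = ["the"] * 6 + ["sword"] * 2 + ["tea"] * 2
context_tokens = ["the", "the", "the", "sword", "sword"]
print(top_words_by_frequency(context_tokens, 1))
print(top_words_by_pmi(context_tokens, all_tokens, 1))
```

Frequency favors words that are common throughout the text, whereas pointwise mutual information favors words that occur disproportionately often near the character.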

Choose the amount of words in the word clouds:

The number of words each word cloud should contain.

Run:

Run the program.

The program will analyze the text and generate a PDF report, which you can download by clicking Download on the page you are redirected to.
