
versotym / stichometry


Stylometric analysis of poetic texts based on their versification


stichometry's Introduction

Parameters

lang :      'cs', 'de', 'es' ... 
  Language to process (subfolder in "pickle" folder)  

method :     'sticho', 'word', 'lemma', '3gram_t' ...
  Which data to use for attribution (file in pickle > lang folder)

Methods

Data filtering

reduce_features(filters)
Only for method == 'sticho'
Filter features (columns) which should be used for attribution. 
E.g. drop all statistics on rhyme, or keep only the stress profile.
  filters:     conditions to filter features (format accepted by pandas .query method)
               default: None    

mfi(n)
Only for method != 'sticho'
Select how many of the most frequent items (words, lemmata, n-grams) will be analyzed.
  n:           int
               number of most frequent items
               default: 500    
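
As an illustration of what selecting the most frequent items involves, here is a minimal sketch using only the standard library. The helper names `most_frequent_items` and `relative_frequencies` are hypothetical and are not part of this package; the package itself reads its data from the pickle folder and may compute frequencies differently.

```python
from collections import Counter


def most_frequent_items(documents, n=500):
    """Return the n most frequent items over all documents (hypothetical helper)."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    # most_common() sorts by descending count
    return [item for item, _ in counts.most_common(n)]


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary item within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[item] / total if total else 0.0 for item in vocabulary]


# Toy data: each document is a list of words (could be lemmata or n-grams)
docs = [["a", "b", "a", "c"], ["a", "b", "d"]]
vocab = most_frequent_items(docs, n=2)
features = [relative_frequencies(doc, vocab) for doc in docs]
```

Each document is then represented only by the frequencies of the selected items, so a smaller n gives a shorter feature vector.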

reduce_sets(filters, n_min, remove_singles)
Filter datasets (rows) according to specified conditions.
  filters:         conditions to filter datasets (format accepted by pandas .query method)
                   default: None
  n_min:           int
                   minimum number of all features to keep dataset
                   default: 0
  remove_singles:  boolean
                   whether to drop datasets whose author has no other dataset
                   default: True
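
The three arguments can be pictured with a small pandas sketch. It is a stand-alone function, not the package's implementation, and it assumes a table with one row per dataset and the hypothetical columns `author`, `year` and `n_features` (the real column names and the exact meaning of n_min may differ).

```python
import pandas as pd


def reduce_sets(data, filters=None, n_min=0, remove_singles=True):
    """Sketch of row filtering; column names are assumptions."""
    if filters is not None:
        # Any expression accepted by DataFrame.query, e.g. "year < 1905"
        data = data.query(filters)
    # Keep only datasets with at least n_min features (assumed column)
    data = data[data["n_features"] >= n_min]
    if remove_singles:
        # Count remaining datasets per author and drop authors with only one
        per_author = data["author"].map(data["author"].value_counts())
        data = data[per_author > 1]
    return data


data = pd.DataFrame({
    "author": ["A", "A", "B", "C", "C"],
    "year": [1850, 1860, 1855, 1900, 1910],
    "n_features": [120, 80, 200, 150, 30],
})
kept = reduce_sets(data, filters="year < 1905", n_min=50)
```

In this toy table author C loses one dataset to the year filter and then counts as a single, so only the two datasets by author A remain.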

Normalization

zscores()
Normalize data to z-scores across datasets.
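
For reference, z-scoring across datasets means that each feature (column) is centred on its mean over all datasets and divided by its standard deviation. The sketch below assumes the population standard deviation and maps constant features to 0.0; the package may make different choices.

```python
from statistics import mean, pstdev


def zscores(rows):
    """Column-wise z-scores for a list of equally long feature vectors."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std: an assumption
    return [
        # A constant feature has std 0; map it to 0.0 instead of dividing
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Three datasets, two features; the second feature is constant
rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
z = zscores(rows)
```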

Attribution

nearest_neighbour()
Classification by nearest neighbour (various distance metrics)
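
A minimal sketch of nearest-neighbour attribution follows, assuming a leave-one-out setup in which each dataset receives the author of its closest other dataset. The function names and the three metrics shown are illustrative assumptions; the README does not list which metrics the package uses.

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbour(vectors, authors, distance=manhattan):
    """Assign to each dataset the author of its closest other dataset."""
    predictions = []
    for i, target in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        best = min(others, key=lambda j: distance(target, vectors[j]))
        predictions.append(authors[best])
    return predictions


vectors = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0]]
authors = ["A", "A", "B", "B"]
predicted = nearest_neighbour(vectors, authors, distance=euclidean)
```

Since an author with a single dataset can never be matched correctly in such a setup, this is one reason to drop single-dataset authors beforehand.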

svm(multiclass, **kwargs)
Classification by support vector machine
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True' each dataset is assigned to one author
                   when 'False' a one-vs.-rest classifier is trained for every author, resulting in:
                      (a) assigning an author to the dataset if precisely one classifier
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.svm.SVC (e.g. kernel, gamma...)
  
random_forest(multiclass, **kwargs)  
Classification by random forest
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True' each dataset is assigned to one author
                   when 'False' a one-vs.-rest classifier is trained for every author, resulting in:
                      (a) assigning an author to the dataset if precisely one classifier
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.ensemble.RandomForestClassifier 
                   (e.g. n_estimators, class_weight...)
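
The decision rule described for multiclass == False (in both svm and random_forest) can be sketched independently of the classifier. The function `combine_one_vs_rest` is a hypothetical name; it takes the binary decision of each author's classifier and returns either one author or None for the "I don't know" case.

```python
def combine_one_vs_rest(decisions):
    """Combine per-author binary decisions into a single answer.

    decisions: dict mapping author -> True if that author's classifier
               claims the dataset, False if it answers 'rest'.
    Returns the author if exactly one classifier claims the dataset,
    otherwise None ("I don't know").
    """
    claimed = [author for author, is_author in decisions.items() if is_author]
    return claimed[0] if len(claimed) == 1 else None


one_claim = combine_one_vs_rest({"A": True, "B": False, "C": False})
no_claim = combine_one_vs_rest({"A": False, "B": False, "C": False})
two_claims = combine_one_vs_rest({"A": True, "B": True, "C": False})
```

Here `one_claim` is "A", while `no_claim` and `two_claims` are both None, since neither zero nor several positive decisions yield an attribution.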

Evaluation

evaluate()
Print an evaluation of each method that was applied

dendrograms()
Plot dendrograms (only if nearest_neighbour has been applied)

complete_results(pickle, filename)
Return a dictionary with the complete results
  pickle:          boolean
                   whether to pickle dict into a file (stored in 'pickle' folder)
                   default: True
  filename:        name of the pickled file
                   default: method name (e.g. sticho, word...)
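
A pickled results dictionary can be read back with the standard pickle module. The round trip below is only a sketch: the helper names, the temporary folder and the contents of the dictionary are placeholders, since the structure of the real results is not documented here.

```python
import pickle
import tempfile
from pathlib import Path


def save_results(results, folder, filename):
    """Pickle a results dictionary into folder/filename (hypothetical helper)."""
    path = Path(folder) / filename
    with path.open("wb") as handle:
        pickle.dump(results, handle)
    return path


def load_results(path):
    """Load a previously pickled results dictionary."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Placeholder structure, not actual output of the package
results = {"svm": {"predictions": ["A", None, "B"]}}

with tempfile.TemporaryDirectory() as folder:
    path = save_results(results, folder, "sticho")
    restored = load_results(path)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.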
