jj / literaturame

Measuring progress in literature

License: GNU General Public License v3.0

Languages: Perl 2.48%, R 4.98%, TeX 92.47%, Makefile 0.04%, Shell 0.03%
Topics: yapc, literature, perl, knitr, complex, complex-systems, self-organized-criticality

literaturame's Introduction

literatura.me

Measuring progress in literature and in other creative endeavours, like programming. Preparing a paper/presentation for YAPC::EU 2016.

This repo and branch contain scripts that process repositories and generate time series of the lines changed in each commit. If the repository holds a literary work that has been continuously integrated, they can also extract the number of words changed; processing it this way requires the Test::Text module.

Maybe you are looking for the YAPC::EU 2016 presentations

The talk on analyzing creativity, or progress when writing books, is in this repo and is also published as a GitHub page by means of reveal.js. The lightning talk given the next day, which focused on several famous Perl modules (Dancer2, Moose, Mojo and Catalyst), is in the same repo (this one). Take a look at it for a shorter intro.

How to use this on your repo (or any other, for that matter)

Perl needs to be installed. Do it the usual way or, better yet, using perlbrew. The instructions use cpanm, so you need that too. If you use perlbrew, which you should, you can get both (cpanm via perlbrew install-cpanm).

Scripts for processing repositories are in the appropriately named scripts directory. So:

cd scripts
cpanm --installdeps .

Then, to run the script itself, cd to the repo you want analyzed and run

/path/to/scripts/get-diffs.pl <glob including all files you want to analyze> 

The repository has to be cloned to your drive. By default, the script analyzes the current repo, but you can also analyze another one by passing its directory:

./get-diffs.pl <glob including all files you want to analyze> <repo directory>

This generates a .csv file whose name starts with the prefix "lines" and is derived from the repo name and the glob. The file contains a single column with the size of each commit, where size is the maximum of the number of lines added and the number of lines deleted.
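As an illustration of this size measure (not the Perl code in get-diffs.pl, which is not reproduced here), a minimal Python sketch. It assumes the history comes from `git log --reverse --no-merges --numstat` with a hypothetical `@@commit` marker printed at the start of each commit, and takes size to be the larger of total lines added and total lines deleted over the files matching the glob. The function names and the `size` header row are our own assumptions.

```python
import csv
import fnmatch
import io
import subprocess

# Hypothetical marker printed once per commit via --format (our assumption,
# not something get-diffs.pl is known to use).
MARKER = "@@commit"


def numstat_from_repo(repo_dir="."):
    """Return `git log --numstat` output for a local clone, oldest commit first."""
    cmd = ["git", "-C", repo_dir, "log", "--reverse", "--no-merges",
           "--numstat", "--format=" + MARKER]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def commit_sizes(numstat_text, pattern="*"):
    """One size per commit: max(total lines added, total lines deleted).

    Only files whose path matches the glob `pattern` count; commits that
    touch no matching file get size 0.
    """
    sizes = []
    added = deleted = 0
    in_commit = False
    for line in numstat_text.splitlines():
        if line == MARKER:
            if in_commit:
                sizes.append(max(added, deleted))
            added = deleted = 0
            in_commit = True
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines
        plus, minus, path = parts
        if plus == "-" or not fnmatch.fnmatchcase(path, pattern):
            continue  # binary file ("-"), or outside the glob
        added += int(plus)
        deleted += int(minus)
    if in_commit:
        sizes.append(max(added, deleted))
    return sizes


def to_csv(sizes):
    """Single-column CSV, one row per commit (the `size` header is our choice)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["size"])
    writer.writerows([s] for s in sizes)
    return buf.getvalue()
```

For example, `to_csv(commit_sizes(numstat_from_repo("."), "*.md"))` would give one row per commit, oldest first. Note that `fnmatchcase` lets `*` match across `/`, so `*.md` also picks up files in subdirectories.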

There is no fixed rule for what the glob should include, except that it should match only files that have been typed by hand, not those generated automatically by, well, code generators, nor LICENSE files, that kind of thing. The whole point of this is to analyze coding patterns as reflected by commit sizes, so files not written by humans make no sense.
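A small, purely illustrative Python sketch of that selection step: `NOT_BY_HAND` is a hypothetical list of patterns for files nobody types by hand, and `hand_written` (also a hypothetical name) keeps only the paths that match the include glob and none of the exclusions. Which patterns belong in the list depends entirely on the repo being analyzed.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical examples of files that are not typed by hand.
NOT_BY_HAND = ["LICENSE*", "*.min.js", "*.lock", "package-lock.json"]


def hand_written(paths, include="*", exclude=NOT_BY_HAND):
    """Keep paths matching `include` whose file name matches no `exclude` pattern."""
    kept = []
    for path in paths:
        name = PurePosixPath(path).name
        if not fnmatch.fnmatchcase(path, include):
            continue  # outside the glob
        if any(fnmatch.fnmatchcase(name, pat) for pat in exclude):
            continue  # generated or boilerplate file
        kept.append(path)
    return kept
```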

What to do with this file

cd to ../stats. Plug the file name into the first lines of creativity.Rmd and, if you have R and knitr, generate the report from RStudio or directly from R using knitr. Please check the knitr site for how to do this, or simply share your file in a repo, tell me on Twitter (@jjmerelo), and I'll do it for you. Of course, that file includes the author and so on, so if you want to change the conclusions, the author or anything else, feel free to do so; it has the same license as the whole repo.

You can also add a link to these results in data.md if you so desire. Take a look at CONTRIBUTING.md to do it properly (don't worry, it's just a minimal set of rules).

What kind of repos will be interesting

You will need a repo with more than a few hundred commits for any real effect to show up. And by real effect I mean power laws and maybe pink noise, all of which add up to self-organized criticality. Which is kind of cool.
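One common way to check commit sizes for a power law is the maximum-likelihood estimate of the exponent described by Clauset, Shalizi and Newman. The Python sketch below uses that swapped-in technique and is not necessarily what creativity.Rmd does; it applies the continuous-data estimator and a hand-picked `xmin`, both simplifying assumptions (commit sizes are integers, and `xmin` would normally be estimated as well). `power_law_alpha` is a hypothetical name.

```python
import math


def power_law_alpha(sizes, xmin=1):
    """MLE of alpha in p(x) ~ x**(-alpha) for x >= xmin.

    Continuous-data estimator: alpha = 1 + n / sum(ln(x / xmin)).
    Values below xmin (e.g. empty commits) are ignored.
    """
    tail = [x for x in sizes if x >= xmin]
    logs = sum(math.log(x / xmin) for x in tail)
    if not tail or logs == 0:
        raise ValueError("need some values strictly above xmin")
    return 1 + len(tail) / logs
```

A small sample gives a noisy estimate, which is one more reason to look for repos with more than a few hundred commits.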

literaturame's People

Contributors

amorag, ccottap, jj, mariosky, pacastillo

literaturame's Issues

Check this out

I have checked out IWANN and it accepts papers on self-organizing systems. This work has not been published so far except as a technical report. I have added you two to the ResearchGate project.
It's written in RMarkdown, so it would have to be converted to LaTeX and then put into the LNCS format. I can do the initial conversion, but the adaptation, checking, conversion to LNCS and maybe adding some other paper-y things would have to be done by you, @amorag @pacastillo.

Table 1

I suggest swapping the Median and Mean columns: the mean should be next to the median, but also next to the column with the dispersion measure.

I suggest changing SD to SE by dividing SD by sqrt(age). This way, the effect of the sample size is factored out.
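A minimal Python sketch of the suggested columns, in the suggested order (median, mean, then dispersion), using the standard `statistics` module. The comment divides by sqrt(age); since what `age` counts is not stated here, the sketch takes the divisor `n` as a parameter that defaults to the number of values, which gives the textbook standard error of the mean. `summary_row` is a hypothetical name.

```python
import math
import statistics


def summary_row(values, n=None):
    """Median, mean, SD and SE = SD / sqrt(n) for one table row.

    `n` defaults to the number of values; pass another count
    (e.g. whatever `age` denotes in Table 1) to use that instead.
    """
    n = len(values) if n is None else n
    sd = statistics.stdev(values)  # sample standard deviation
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "sd": sd,
        "se": sd / math.sqrt(n),
    }
```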

Typos and revisions rev#4

- the non-existence of a particular scale -> the absence of a particular scale?

- p1, col2, last line: is changes -> is changed.

- Section numbers are missing in the sentence that describes the organisation of the manuscript (p2, col1 just before State of the Art)

- Figure 1: the ticks are unreadable even when zooming. The authors may want to reduce the number of plots to make things readable. Also, the caption needs changing: "commit number" could suggest that it is the number of commits, but it isn't; it is the actual index of the commit, i.e., 1 = 1st commit, i = ith commit.

- p3, col2, lines 2-3: the authors refer to a repository hidden for double-blind review. I suspect this is a remnant of a previous version of this paper. Also, later in the manuscript, a github URL is prov

Pink noise

Some clarification on the analysis of the power spectrum should be included. Is the data shown a linear log-log fit of the coefficients found by FFT, or a log-log fit of the squares of these coefficients? The power carried by a given frequency is proportional to the square of its amplitude, so it should be the latter. I suspect the fit shown is to the raw coefficients, which would be consistent with the resulting slopes tending to be around -0.5.
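To make the point concrete, a stdlib-only Python sketch (function names are hypothetical; this is not the code used in the paper) that fits a log-log line both to the raw DFT amplitudes and to their squares, i.e. the power. Because log(a²) = 2·log(a), the power slope is exactly twice the amplitude slope, so a slope near -0.5 on amplitudes corresponds to about -1, pink noise, on power. The naive DFT and the exclusion of the Nyquist bin are simplifications chosen for the sketch.

```python
import cmath
import math


def dft_amplitudes(series):
    """|X_k| for every positive frequency k below Nyquist (naive O(N^2) DFT).

    numpy.fft.rfft would be the usual, much faster choice for long series.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # drop the zero-frequency (mean) term
    amps = []
    for k in range(1, (n - 1) // 2 + 1):  # positive frequencies, no Nyquist bin
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(abs(coeff))
    return amps


def loglog_slope(values):
    """Least-squares slope of log(values[k-1]) against log(k), k = 1, 2, ..."""
    pts = [(math.log(k), math.log(v)) for k, v in enumerate(values, start=1) if v > 0]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den


def spectral_slopes(series):
    """Return (slope fitted to raw amplitudes, slope fitted to power = amplitude**2)."""
    amps = dft_amplitudes(series)
    power = [a * a for a in amps]
    return loglog_slope(amps), loglog_slope(power)
```

For a series whose amplitude spectrum falls as 1/f^0.5, the first slope comes out at -0.5 and the second at -1, which is the pattern the comment describes.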
