jj / literaturame

Measuring progress in literature

License: GNU General Public License v3.0

Languages: Perl 2.48%, R 4.98%, TeX 92.47%, Makefile 0.04%, Shell 0.03%

Topics: yapc, literature, perl, knitr, complex, complex-systems, self-organized-criticality

literaturame's Introduction

literatura.me

Measuring progress in literature and in other creative endeavours, like programming. Preparing a paper/presentation for YAPC::EU 2016.

This repo and branch contain scripts that process repositories and generate time series of the lines changed in each commit. If the repository holds a literary work that has been continuously integrated, the scripts can also extract the number of words changed; the work needs to use the Test::Text module to be processed this way.

How to use this on your repo (or any other, for that matter)

Perl needs to be installed. Do it the usual way or, better yet, using perlbrew. The instructions below use cpanm, so that is needed too. If you use perlbrew, which you should, it can install cpanm for you (perlbrew install-cpanm).

Scripts for processing repositories are contained in the appropriately named scripts directory. So

cd scripts
cpanm --installdeps .

And then, to run the script itself, you can cd to the repo you want analyzed and

/path/to/scripts/get-diffs.pl <glob including all files you want to analyze>

The repository has to be cloned to your drive. By default, the script analyzes the repo in the current directory, but you can also analyze others:

./get-diffs.pl <glob including all files you want to analyze> <repo directory>

This will generate a .csv file whose name starts with lines, followed by a name derived from the repo name and the glob. The file contains a single column with the size of each commit, where size is the larger of the number of lines added and the number of lines deleted.
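To make the size definition concrete, here is a minimal Python sketch of the same idea. It is not the get-diffs.pl script: the function names are hypothetical, and it assumes that added and deleted lines are summed over all matched files in a commit before taking the maximum.

```python
import subprocess

def commit_sizes(numstat_text):
    """Parse `git log --numstat --format=@%H` output into one size per commit.

    Assumption: size = max(total lines added, total lines deleted) per commit.
    """
    sizes = []
    added = deleted = 0
    in_commit = False
    for line in numstat_text.splitlines():
        if line.startswith("@"):  # marker line: a new commit starts here
            if in_commit:
                sizes.append(max(added, deleted))
            added = deleted = 0
            in_commit = True
            continue
        fields = line.split("\t")
        # numstat lines are "<added>\t<deleted>\t<path>"; binary files show "-"
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            added += int(fields[0])
            deleted += int(fields[1])
    if in_commit:
        sizes.append(max(added, deleted))
    return sizes

def read_repo(pathspec, repo_dir="."):
    """Run git in repo_dir and return commit sizes, oldest commit first."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--reverse", "--numstat",
         "--format=@%H", "--", pathspec],
        capture_output=True, text=True, check=True,
    ).stdout
    return commit_sizes(out)
```

For example, read_repo("*.md", "/path/to/repo") would return a list with one integer per commit that touched a Markdown file.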

There is no rule for what the glob should include, other than that you should try to include only files that have been typed by hand, not automatically generated ones such as the output of code generators or LICENSE files, that kind of thing. The whole point of this is to analyze coding patterns as reflected by commit sizes, so non-human files make no sense.

What to do with this file

cd to ../stats. You can plug the file name into the first lines of creativity.Rmd and, if you have R and knitr, generate the document from RStudio or directly from R using knitr. Please check the knitr site for how to do this, or share your file in a repo and tell me via Twitter (@jjmerelo), and I'll do it for you. Of course, that file includes the author and so on, so if you want to change the conclusions, the author or anything else, feel free to do so; it has the same license as the whole repo.

You can also add a link to these results in data.md if you so desire. Take a look at CONTRIBUTING.md to do it properly (don't worry, just a minimal set of rules).

What kind of repos will be interesting

You will need a repo with more than a few hundred commits to have some real effect showing up. And by real effect I mean power laws, maybe pink noise, all that adding up to self-organized criticality. Which is kind of cool.

literaturame's People

Forkers

mariosky thebooort

literaturame's Issues

Add fitting lines for Zipf

In figure 3. Improve or maybe eliminate labels. They are just useless.

Justify fitting methods

Mainly regarding Zipf's law

Missing bib file

I can't compile: geneura.bib missing

Check this out

I have checked out IWANN and it admits papers on self-organizing stuff. This has not been published so far except as a technical report. I have added you two to the ResearchGate project.
It's written in RMarkdown, so it would have to be converted to LaTeX and then put in the LNCS format. I can do the initial conversion, but adaptation, checking stuff, converting to LNCS and adding maybe some other paper-y things would have to be done by you. @amorag @pacastillo.

Table 1

I suggest to swap Median and Mean columns: the mean should be next to the median but also to the column with the dispersion measure.

I suggest changing SD to SE, by dividing SD by sqrt(age). This way the effect of the sample size is factored out.
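As a small illustration of this suggestion, the standard error is the sample standard deviation divided by the square root of the number of observations. The sketch below assumes that the quantity called age in the issue plays the role of the sample size n; the function name is hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    sd = statistics.stdev(values)  # sample SD, with n - 1 in the denominator
    return sd / math.sqrt(len(values))
```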

Power law adjustments should be done in some other way

Follow @thebooort's advice after reading Newman's paper on how to fit.
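One commonly recommended alternative to a least-squares fit on a log-log plot is the maximum-likelihood estimate of the exponent. The sketch below uses the continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), with standard error (alpha - 1) / sqrt(n). It is only an illustration, not necessarily what the paper ended up using: the function name and the choice of x_min are assumptions, and since commit sizes are integers the continuous formula is itself an approximation.

```python
import math

def powerlaw_mle(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min (the assumed start of the power-law tail) are used.
    Returns (alpha, sigma), where sigma is the standard error of alpha.
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("not enough data above x_min to estimate the exponent")
    alpha = 1 + n / log_sum
    sigma = (alpha - 1) / math.sqrt(n)
    return alpha, sigma
```

With the commit sizes from the .csv file, this could be called as powerlaw_mle(sizes, 1), though the right x_min has to be chosen by looking at the data.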

Correct observations by GECCO reviewers

including "paper" instead of "project" in many cases.

Typos and revisions rev#4

- the non-existence of a particular scale -> the absence of a particular scale?

- p1, col2, last line: is changes -> is changed.

- Section numbers are missing in the sentence that describes the organisation of the manuscript (p2, col1 just before State of the Art)

- Figure 1: the ticks are unreadable even when zooming. The authors may want to reduce the number of plots but make things readable. Also, the caption needs changing: "commit number" could suggest that it is the number of commits, but it isn't. It is the actual 'index' of the commit, i.e., 1 = 1st commit, i = ith commit.

- p3, col2, line 2-3. Authors refer to a repository hidden for double blind review. I suspect this is the remnant of a previous version of this paper. Also later in the manuscript, a github URL is prov

Use new data files

Enter new data files.

Cut it down to 8 pages

Which is the maximum for ECAL.

Prepare for ECAL

Prepare for the ECAL change of format. https://project.inria.fr/ecal2017/

Please add authors

ECAL does not have a double-blind review process.

Pink noise

Some clarification on the analysis of the power spectrum should be included. Is the data shown a linear log-log fit of the coefficients found by FFT, or a log-log fit of the square of these coefficients? The power carried by a certain frequency is proportional to the square of its amplitude, so it should be the latter. I suspect the fit shown is to the raw coefficients, which would be consistent with the fact that the resulting slopes tend to be around -0.5.
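The difference between the two fits can be checked with a short sketch. It is not the analysis in creativity.Rmd: the function names are hypothetical and a plain O(n^2) discrete Fourier transform is used so that the example needs only the standard library. Because log(a^2) = 2 log(a), the slope fitted on power is exactly twice the slope fitted on amplitude, so an amplitude slope near -0.5 corresponds to a power slope near -1, the value expected for pink noise.

```python
import cmath
import math

def dft(series):
    """Plain discrete Fourier transform (O(n^2)); fine for short series."""
    n = len(series)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series))
            for k in range(n)]

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def spectrum_slopes(series):
    """Return (slope fitted on amplitudes, slope fitted on power)."""
    n = len(series)
    mean = sum(series) / n
    coeffs = dft([x - mean for x in series])
    # keep positive frequencies only; skip k = 0 and any zero amplitude
    pairs = [(k, abs(coeffs[k])) for k in range(1, n // 2) if abs(coeffs[k]) > 0]
    freqs = [k for k, _ in pairs]
    amplitude = [a for _, a in pairs]
    power = [a * a for a in amplitude]  # power is the squared amplitude
    return loglog_slope(freqs, amplitude), loglog_slope(freqs, power)
```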

Revise bibliography

I don't know whether it's correct or not.