atrisovic / dataverse-r-study



Data and code for a large-scale study on research code quality and execution at Harvard Dataverse.

License: MIT License

Languages: Jupyter Notebook 99.31%, Python 0.62%, Dockerfile 0.02%, R 0.04%, Shell 0.02%

Topics: code-quality, documentation, r-language, r-programming, reproducibility

dataverse-r-study's Issues

Incorporate suggestions 3, 4 and 5 as an example

Cannot install requirements on Debian 10 with Python 2.7

```shell
git clone https://github.com/atrisovic/dataverse-r-study
cd dataverse-r-study/
pip2.7 install -r requirements.txt
```

```text
  Downloading ipykernel-4.10.1-py2-none-any.whl (109 kB)
     |################################| 109 kB 21.0 MB/s
ERROR: Could not find a version that satisfies the requirement ipython==7.16.3 (from -r requirements.txt (line 25)) (from versions: 0.10, 0.10.1, 0.10.2, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.13.2, 1.0.0, 1.1.0, 1.2.0, 1.2.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 4.0.0b1, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.1.0rc1, 4.1.0rc2, 4.1.0, 4.1.1, 4.1.2, 4.2.0, 4.2.1, 5.0.0b1, 5.0.0b2, 5.0.0b3, 5.0.0b4, 5.0.0rc1, 5.0.0, 5.1.0, 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.4.0, 5.4.1, 5.5.0, 5.6.0, 5.7.0, 5.8.0, 5.9.0, 5.10.0)
ERROR: No matching distribution found for ipython==7.16.3 (from -r requirements.txt (line 25))
```
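This failure looks like an interpreter mismatch rather than a Debian problem. Under Python 2.7, pip only offers IPython releases up to 5.10.0, the last series that supports Python 2. The pinned `ipython==7.16.3` needs Python 3 (3.6 or later, as far as I know), so installing with Python 3 (e.g. `python3 -m pip install -r requirements.txt`) should get past this error. Below is a minimal guard that could run before installing. The 3.6 floor is an assumption inferred from the IPython pin, and other pins may need a newer interpreter. The script name `check_python.py` is hypothetical.

```python
import sys

# Assumption: Python 3.6 is the oldest interpreter that can install the pinned
# requirements, because IPython 7.16.x needs Python >= 3.6.
MIN_PYTHON = (3, 6)


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Deliberately avoids f-strings so this file still parses under Python 2.7.
    if not check_python():
        sys.exit(
            "Python %d.%d found, but requirements.txt needs Python %d.%d or newer.\n"
            "Try: python3 -m pip install -r requirements.txt"
            % (sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1])
        )
    print("Python version OK")
```

Under `python2.7 check_python.py` the script exits with the message above. Under Python 3.6 or newer it prints `Python version OK`.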

Add to READMEs

Paper draft

Overleaf:

https://www.overleaf.com/3579224233hcjdwrssrwbq

Congratulations on the article, and some questions

Excellent work 😄

Some comments and questions; I hope you don't mind me misusing the repo for that. Happy to take the conversation elsewhere (email?) if you prefer.

Installing missing packages can also fail if you use an old R version today and CRAN no longer has the package for that version, or if you use a new R version and the package only works with an older release. Did you consider using MRAN checkpoints matching each R version's release window?
Have you considered using an R package that can parse R files for the packages they load, such as automagic or similar tools?
Why use Python code to analyse R code? Personal preference, or did you find it useful in some way?
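The package-detection question can be approximated without running any R. As a minimal sketch in Python (the language the repository already uses for its analysis), a regular-expression scan can collect package names from `library()`, `require()` and `requireNamespace()` calls and from `pkg::function` references. This is a heuristic. It is not how automagic works and not the study's method. It misreads calls that pass a variable with `character.only = TRUE` and treats any `#` as the start of a comment, even inside strings. It also reports base packages such as `stats` when they are used with `::`. The function names are hypothetical.

```python
import re

# library(pkg), require("pkg"), requireNamespace("pkg") with a bare or quoted
# first argument. A regex heuristic, not an R parser.
_LOAD_CALL = re.compile(
    r"\b(?:library|require|requireNamespace)\s*\(\s*['\"]?([A-Za-z][A-Za-z0-9.]*)['\"]?"
)
# Namespace-qualified calls such as dplyr::filter or data.table:::foo.
_NAMESPACE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z.]")


def strip_comment(line):
    """Drop a trailing # comment (naive: also cuts at '#' inside strings)."""
    return line.split("#", 1)[0]


def find_r_packages(source):
    """Return the sorted list of package names an R script appears to use."""
    packages = set()
    for line in source.splitlines():
        code = strip_comment(line)
        packages.update(_LOAD_CALL.findall(code))
        packages.update(_NAMESPACE.findall(code))
    return sorted(packages)
```

The resulting list could then be installed before a script is executed, instead of relying on whatever the script itself tries to install.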

(Might add more questions as I digest... again, cool work!)

Plots

Replication package = dataset = data & code

code-stats

Lines of code per comment
Lines of code per dependency
Lines of code per function (modularity)
Lines of code per 'test' (testing)
File encoding
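Most code-stats metrics above divide lines of code by some count. The following is a minimal Python sketch for a single R file. It uses hypothetical heuristics, not the study's actual rules. A comment line starts with `#`. A function is an `<- function(` or `= function(` definition. A dependency is a `library()` or `require()` call. A test is a testthat-style `test_that()` call. The encoding is reported as UTF-8 if the bytes decode as UTF-8, and as Latin-1 otherwise. Names such as `r_code_stats` are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics; a faithful analysis would need an R parser.
_FUNCTION_DEF = re.compile(r"(?:<-|=)\s*function\s*\(")
_DEPENDENCY = re.compile(r"\b(?:library|require)\s*\(")
_TEST = re.compile(r"\btest_that\s*\(")


@dataclass
class CodeStats:
    code_lines: int
    comment_lines: int
    functions: int
    dependencies: int
    tests: int
    encoding: str

    def per(self, count):
        """Lines of code per item (e.g. per comment); None if count is 0."""
        return self.code_lines / count if count else None


def detect_encoding(raw):
    """Rough guess: 'utf-8' if the bytes decode as UTF-8, else 'latin-1'."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # Latin-1 can decode any byte sequence


def r_code_stats(raw):
    """Count code/comment lines, functions, dependencies and tests in R source bytes."""
    encoding = detect_encoding(raw)
    code = comments = functions = dependencies = tests = 0
    for line in raw.decode(encoding).splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither code nor comment
        if stripped.startswith("#"):
            comments += 1  # trailing comments on code lines are not counted
            continue
        code += 1
        functions += len(_FUNCTION_DEF.findall(stripped))
        dependencies += len(_DEPENDENCY.findall(stripped))
        tests += len(_TEST.findall(stripped))
    return CodeStats(code, comments, functions, dependencies, tests, encoding)
```

For a file read in binary mode, `stats.per(stats.comment_lines)` gives lines of code per comment, and `stats.per(stats.functions)` gives lines of code per function.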

overview

Histogram of dataset sizes
Histogram of number of files per dataset
Histogram of file name lengths
Pie chart - file name contains space?
Pie chart - dataset contains documentation?
Pie chart - dataset contains other code?
Pie chart - dataset contains testing script?
Pie chart - dataset contains R markdown?
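Most overview plots reduce to simple properties of each dataset's file list. The sketch below assumes each dataset is available as a list of file paths. The detection rules are hypothetical stand-ins for whatever the study actually uses: a README means documentation, a fixed set of suffixes marks non-R code, "test" in a file name marks a testing script, and `.Rmd` marks R Markdown.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical heuristics for illustration; the study's real rules may differ.
OTHER_CODE_SUFFIXES = {".py", ".do", ".m", ".jl", ".sas"}


def name_length_histogram(paths, bin_width=10):
    """Map each bin start (0, 10, 20, ...) to the number of file names in it."""
    counts = Counter(len(PurePosixPath(p).name) // bin_width * bin_width for p in paths)
    return dict(sorted(counts.items()))


def share_of_names_with_space(paths):
    """Fraction of file names that contain a space."""
    names = [PurePosixPath(p).name for p in paths]
    return sum(" " in n for n in names) / len(names) if names else 0.0


def dataset_flags(paths):
    """Yes/no properties of one dataset; each flag feeds one pie chart."""
    lower = [PurePosixPath(p).name.lower() for p in paths]
    return {
        "documentation": any(n.startswith("readme") for n in lower),
        "other_code": any(PurePosixPath(n).suffix in OTHER_CODE_SUFFIXES for n in lower),
        "testing_script": any("test" in n for n in lower),
        "r_markdown": any(n.endswith(".rmd") for n in lower),
    }
```

Counting `True` values of each flag across all datasets gives the pie-chart shares. `len(paths)` per dataset gives the files-per-dataset histogram.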

exe-rates

exe rate before cleaning
exe rate after cleaning
exe rate per R version
aggregated results per package
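An execution rate here is the share of runs that finished without error. The sketch below assumes a hypothetical per-file record holding the R version used and whether the file ran successfully before and after the cleaning step. The record layout and field names are illustrative, not the study's data format.

```python
from collections import defaultdict


def rate(flags):
    """Share of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def version_key(version):
    """Sort '3.10.0' after '3.6.0' by comparing numeric parts."""
    return tuple(int(part) for part in version.split("."))


def exe_rates(records):
    """Overall rates before/after cleaning, plus the after-cleaning rate per R version."""
    overall = {
        "before_cleaning": rate(r["before"] for r in records),
        "after_cleaning": rate(r["after"] for r in records),
    }
    by_version = defaultdict(list)
    for r in records:
        by_version[r["r_version"]].append(r["after"])
    per_version = {
        version: rate(flags)
        for version, flags in sorted(by_version.items(), key=lambda item: version_key(item[0]))
    }
    return overall, per_version
```

Sorting by numeric version parts rather than by string keeps R 3.10 after 3.6 on the plot axis.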

exe-stats

exe rate per year of publishing
exe rate per field of study
exe rate per publisher
exe rate per dependency count
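Each of these breakdowns is the same computation with a different grouping key. The sketch below assumes hypothetical records with `year`, `field`, `publisher`, `dependencies` and a boolean `success` field. The dependency-count bins are arbitrary choices for illustration.

```python
from collections import defaultdict


def rate_by(records, key):
    """Execution rate per group; key(record) returns the group label."""
    groups = defaultdict(lambda: [0, 0])  # label -> [successes, total runs]
    for record in records:
        counts = groups[key(record)]
        counts[0] += int(record["success"])
        counts[1] += 1
    return {label: ok / total for label, (ok, total) in groups.items()}


def dependency_bucket(record):
    """Hypothetical bins for 'exe rate per dependency count'."""
    n = record["dependencies"]
    if n == 0:
        return "0"
    if n <= 2:
        return "1-2"
    if n <= 5:
        return "3-5"
    return "6+"
```

Each plot would use its own key: `rate_by(records, lambda r: r["year"])`, `rate_by(records, lambda r: r["field"])`, `rate_by(records, lambda r: r["publisher"])` and `rate_by(records, dependency_bucket)`.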

Using `require()` is a bad idea, and so is `install.packages()`

They do not stop on errors, and that is why they are "best performing". You are basically ignoring errors:

```r
{ require("foobar") ; install.packages(tempfile()); message("\nNOT GOOD!!!\n") }
```

Note that "NOT GOOD!!!" is still printed:

```text
Loading required package: foobar
Installing package into ‘/Users/gaborcsardi/Library/R/arm64/4.2/library’
(as ‘lib’ is unspecified)

NOT GOOD!!!

Warning messages:
1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘foobar’
2: package ‘/var/folders/ph/fpcmzfd16rgbbk8mxvy9m2_h0000gn/T//Rtmph0Pd7A/file12c3c78c562f5’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
```

OTOH, you could argue that if the script finishes anyway, those packages were not really needed in the first place...
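Because `require()` and `install.packages()` only warn, an execution check that looks only at R's exit status will count such runs as successes. The sketch below assumes the R output (stdout and stderr) is captured as text. It flags those runs by scanning the log for the two warning messages shown in the output above. Only these two English message forms are handled, so translated messages from non-English locales will not match. The function names are hypothetical.

```python
import re

# R quotes names with ‘curly’ quotes in UTF-8 locales and may use plain ASCII
# quotes elsewhere, so both styles are accepted.
_QUOTED = r"[‘'\"]([^’'\"]+)[’'\"]"
_PATTERNS = {
    # warning issued by require() when the package is not installed
    "not_installed": re.compile(r"there is no package called " + _QUOTED),
    # warning issued by install.packages() when no suitable version exists
    "not_available": re.compile(r"package " + _QUOTED + r" is not available"),
}


def silent_package_failures(log_text):
    """Map each failure kind to the package names found in an R output log."""
    return {kind: pattern.findall(log_text) for kind, pattern in _PATTERNS.items()}


def ran_cleanly(exit_code, log_text):
    """Count a run as a success only if R exited with 0 and no package failed."""
    failures = silent_package_failures(log_text)
    return exit_code == 0 and not any(failures.values())
```

With this check, the example above would count as a failure even though R printed "NOT GOOD!!!" and exited normally.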
