modelsof

  1. python3 modelsof.py get_datasets jop

Scrapes Dataverse for all articles. Produces out/jop/datasets.csv with title, href, date, description, keywords.

  2. python3 modelsof.py get_files jop

Scrapes Dataverse for all files associated with each article in datasets.csv. Produces out/jop/files.csv with title, href, date, filename, file_href.
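The relationship between datasets.csv and files.csv can be sketched as a simple expansion: each dataset row becomes one row per file. This is a minimal illustration, not the project's code. `list_files` is a hypothetical stand-in for the Dataverse scraping, and only the column names come from the description above.

```python
import csv
from typing import Callable, Iterable

# Columns described for files.csv.
FILE_FIELDS = ["title", "href", "date", "filename", "file_href"]


def build_file_rows(
    datasets: Iterable[dict],
    list_files: Callable[[str], Iterable[tuple[str, str]]],
) -> list[dict]:
    """Expand each datasets.csv row into one files.csv row per file.

    list_files is hypothetical: given a dataset href it returns
    (filename, file_href) pairs, standing in for the scraper.
    """
    rows = []
    for ds in datasets:
        for filename, file_href in list_files(ds["href"]):
            rows.append({
                "title": ds["title"],
                "href": ds["href"],
                "date": ds["date"],
                "filename": filename,
                "file_href": file_href,
            })
    return rows


def write_files_csv(path: str, rows: list[dict]) -> None:
    """Write the expanded rows with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FILE_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```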

  3. python3 modelsof.py get_downloads jop [2018]

Downloads every file in files.csv with extension .do, .7z, .7zip, .gz, .rar, .tar, or .zip, optionally limited to a single year. Produces out/jop/downloads/{year}/{dataset}/{file}. Errors are logged to out/jop/downloads/errors.csv with the columns title, href, date, filename, file_href, error.
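The selection and path layout for this step can be sketched as below. Two details are assumptions for illustration only: the year is taken from the first four characters of the date column, and the {dataset} directory is a sanitized form of the title. The actual download and error logging are left out.

```python
import os
import re
from typing import Iterable, Optional

# Extensions listed for get_downloads.
DOWNLOAD_EXTS = (".do", ".7z", ".7zip", ".gz", ".rar", ".tar", ".zip")


def wanted(row: dict, year: Optional[str] = None) -> bool:
    """True if the file has a listed extension and matches the optional year.

    Assumes row["date"] starts with a four-digit year, e.g. "2018-05-01".
    """
    if not row["filename"].lower().endswith(DOWNLOAD_EXTS):
        return False
    return year is None or row["date"][:4] == year


def target_path(row: dict, root: str = "out/jop/downloads") -> str:
    """Build downloads/{year}/{dataset}/{file}.

    Deriving {dataset} from the title is a hypothetical choice.
    """
    dataset = re.sub(r"[^\w.-]+", "_", row["title"]).strip("_")
    return os.path.join(root, row["date"][:4], dataset, row["filename"])


def plan_downloads(
    rows: Iterable[dict], year: Optional[str] = None
) -> list[tuple[str, str]]:
    """Return (file_href, local_path) pairs for the files to fetch."""
    return [(r["file_href"], target_path(r)) for r in rows if wanted(r, year)]
```

Matching on the lower-cased suffix means a name like data.tar.gz is selected through its .gz extension.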

  4. python3 modelsof.py unzip jop

Recursively extracts every archive in downloads with extension .7z, .7zip, .gz, .rar, .tar, or .zip. Requires 7-Zip (p7zip-full and p7zip-rar on Ubuntu).
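One way to implement recursive extraction is to keep scanning for archives until a pass finds nothing new, so that an archive produced by extraction (for example the .tar inside a .tar.gz) is handled on the next pass. This is a sketch under assumptions, not the project's implementation: the `<archive>_unzipped` output directory is a hypothetical naming scheme. `run_7z` uses the real 7-Zip `x` command with its `-o` (output directory) and `-y` (assume yes) switches, and the extractor is a parameter so the loop can be exercised without 7-Zip installed.

```python
import os
import subprocess
from typing import Callable

# Archive extensions listed for the unzip step.
ARCHIVE_EXTS = (".7z", ".7zip", ".gz", ".rar", ".tar", ".zip")


def run_7z(archive: str, out_dir: str) -> None:
    """Extract one archive with the 7z command-line tool."""
    subprocess.run(["7z", "x", "-y", f"-o{out_dir}", archive], check=True)


def find_archives(root: str) -> list[str]:
    """Every file under root whose name ends in an archive extension."""
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
        if name.lower().endswith(ARCHIVE_EXTS)
    )


def extract_recursively(
    root: str, extract: Callable[[str, str], None] = run_7z
) -> tuple[list[str], dict[str, str]]:
    """Extract archives until a pass finds nothing new.

    Output goes to a sibling directory named <archive>_unzipped
    (hypothetical). Returns (extracted archives, {failed archive: error}).
    """
    seen: set[str] = set()
    extracted: list[str] = []
    failed: dict[str, str] = {}
    while True:
        pending = [a for a in find_archives(root) if a not in seen]
        if not pending:
            return extracted, failed
        for archive in pending:
            seen.add(archive)  # mark first so a failing archive is not retried
            out_dir = archive + "_unzipped"
            os.makedirs(out_dir, exist_ok=True)
            try:
                extract(archive, out_dir)
                extracted.append(archive)
            except (subprocess.CalledProcessError, OSError) as exc:
                failed[archive] = str(exc)
```

A missing 7z binary raises FileNotFoundError, a subclass of OSError, so it is recorded as a failure instead of aborting the whole run.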

  5. python3 modelsof.py get_all_files jop

Takes the union of the files listed in files.csv and the files present in downloads. Produces out/jop/all_files.csv with a single column, file.
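A minimal sketch of the union, assuming each file is identified by its bare filename. That key is a hypothetical simplification: it deduplicates a file that is both listed and downloaded, but it also collapses same-named files from different datasets, so a fuller version would likely key on the dataset as well.

```python
import csv
import os


def listed_files(files_csv: str) -> set[str]:
    """Filenames named in files.csv."""
    with open(files_csv, newline="") as f:
        return {row["filename"] for row in csv.DictReader(f)}


def downloaded_files(downloads: str) -> set[str]:
    """Every filename under downloads, including extracted contents."""
    return {
        name
        for _dirpath, _dirs, files in os.walk(downloads)
        for name in files
    }


def write_all_files(files_csv: str, downloads: str, out_csv: str) -> int:
    """Write the sorted union to out_csv and return how many files it has."""
    names = sorted(listed_files(files_csv) | downloaded_files(downloads))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file"])
        writer.writerows([n] for n in names)
    return len(names)
```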

  6. python3 modelsof.py plot_files

Uses out/**/all_files.csv to produce distribution counts at out/files_dist.csv and out/files_by_datasets_dist.csv, then runs plots.R to produce out/files_dist.png and out/files_by_datasets_dist.png. The by-datasets counts record whether a dataset contains a given kind of file, not how many such files it contains.
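The difference between the two distributions can be sketched with `collections.Counter`. Treating "kind of file" as the lower-cased final extension is an assumption, as is the dataset-to-filenames mapping used as input.

```python
import os
from collections import Counter
from typing import Iterable, Mapping


def ext_of(filename: str) -> str:
    """Lower-cased final extension, or "(none)" for files without one."""
    ext = os.path.splitext(filename)[1].lower()
    return ext or "(none)"


def files_dist(files: Iterable[str]) -> Counter:
    """How many files have each extension."""
    return Counter(ext_of(f) for f in files)


def files_by_datasets_dist(datasets: Mapping[str, Iterable[str]]) -> Counter:
    """How many datasets contain at least one file of each extension."""
    counts: Counter = Counter()
    for files in datasets.values():
        # A set counts each extension at most once per dataset.
        counts.update({ext_of(f) for f in files})
    return counts
```

For datasets {"d1": ["a.do", "b.do", "c.csv"], "d2": ["x.DO", "README"]}, files_dist counts .do three times, while files_by_datasets_dist counts it twice, once per dataset.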

  7. python3 stata.py jop

Parses all .do files in out/jop/downloads and writes a corresponding .do.json to out/jop/results/{year}/{dataset}/{file}, along with out/jop/files.json and out/jop/stats.json.
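The mirroring of the downloads tree into results can be sketched with `pathlib`. Only the path layout comes from the description above; `write_result` and the shape of the parsed dictionary are hypothetical, and the parsing itself is not shown.

```python
import json
from pathlib import Path
from typing import Union


def result_path(do_file: Union[str, Path], journal: str = "jop", out: str = "out") -> Path:
    """Map <out>/<journal>/downloads/... to <out>/<journal>/results/....json."""
    rel = Path(do_file).relative_to(Path(out, journal, "downloads"))
    return Path(out, journal, "results", rel.parent, rel.name + ".json")


def find_do_files(journal: str = "jop", out: str = "out") -> list[Path]:
    """All .do files anywhere under the downloads tree."""
    return sorted(Path(out, journal, "downloads").rglob("*.do"))


def write_result(do_file: Union[str, Path], parsed: dict,
                 journal: str = "jop", out: str = "out") -> Path:
    """Write one (hypothetical) parse result as JSON at its mirrored path."""
    path = result_path(do_file, journal, out)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(parsed, indent=2))
    return path
```

Because the mapping uses the path relative to downloads, .do files nested inside extracted archive directories keep their nesting under results.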

  8. python3 modelsof.py plot_commands

Uses out/**/stats.json to produce distribution counts at out/commands_dist.csv, then runs plots.R to produce out/commands_dist.png.

stats.json counts

Some prefix commands are run on their own rather than as a prefix; these are counted in len_prefix. Prefix commands used as a prefix to another command are counted in len_prefix_as_prefix and are not included in the overall count (len).
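The three counters can be illustrated with a naive statement-level sketch. The prefix set is an illustrative subset of Stata's colon prefixes (by, bysort, svy, xi, bootstrap, jackknife), not the list stata.py uses, and only leading `prefix:` segments are peeled off, so a colon inside a string does not start a prefix.

```python
import re
from collections import Counter

# Illustrative subset of Stata prefix commands (an assumption).
PREFIXES = {"by", "bys", "bysort", "svy", "xi", "bootstrap", "jackknife"}
WORD = re.compile(r"[A-Za-z_]\w*")


def command_name(segment: str) -> str:
    """First identifier in a segment, e.g. 'svy' from 'svy, subpop(x)'."""
    m = WORD.match(segment.strip())
    return m.group(0) if m else ""


def split_prefixes(statement: str) -> tuple[list[str], str]:
    """Peel off leading 'prefix:' parts; return (prefixes, command)."""
    prefixes: list[str] = []
    rest = statement
    while ":" in rest:
        head, tail = rest.split(":", 1)
        name = command_name(head)
        if name not in PREFIXES:
            break  # the colon belongs to something else, e.g. a string
        prefixes.append(name)
        rest = tail
    return prefixes, command_name(rest)


def count_statement(statement: str, stats: Counter) -> None:
    """Update len, len_prefix and len_prefix_as_prefix for one statement."""
    prefixes, command = split_prefixes(statement)
    stats["len_prefix_as_prefix"] += len(prefixes)  # excluded from len
    if not command:
        return
    stats["len"] += 1
    if command in PREFIXES:
        stats["len_prefix"] += 1  # prefix command run on its own
```

With this sketch, `xi i.region` adds to len and len_prefix, while `by id: svy: logit y x` adds two to len_prefix_as_prefix and one to len.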

The first item is a count of regression commands across all files, with any prefixes combined into the command name. Given two commands:

svy: reg ...
reg ...

the count will be:

svy:reg = 1
reg = 1
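The combined counting in the example above can be sketched by joining the prefix chain and the command into one key. Both sets below are illustrative subsets of real Stata commands, not the lists stata.py uses, and abbreviations such as reg and regress are deliberately left as separate keys.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Illustrative subsets (assumptions).
PREFIXES = {"by", "bys", "bysort", "svy", "xi", "bootstrap", "jackknife"}
REGRESSIONS = {"reg", "regress", "logit", "probit", "ologit", "xtreg", "areg"}
WORD = re.compile(r"[A-Za-z_]\w*")


def regression_key(statement: str) -> Optional[str]:
    """Return a key such as 'svy:reg' or 'reg', or None for non-regressions."""
    parts: list[str] = []
    rest = statement
    while ":" in rest:
        head, tail = rest.split(":", 1)
        m = WORD.match(head.strip())
        if not m or m.group(0) not in PREFIXES:
            break
        parts.append(m.group(0))
        rest = tail
    m = WORD.match(rest.strip())
    if not m or m.group(0) not in REGRESSIONS:
        return None
    parts.append(m.group(0))
    return ":".join(parts)


def count_regressions(statements: Iterable[str]) -> Counter:
    """Count each regression once, keyed by its full prefix chain."""
    return Counter(k for k in map(regression_key, statements) if k)
```

For the two commands above, count_regressions returns one each for svy:reg and reg.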

The remaining items (counts per file) count prefixes and commands (regression or otherwise) separately. The exception is the 'regressions' key, which combines prefix and command in the same way as the first item.

errors

Some files contain syntax errors. Missing closing delimiters are added, and a /* comment with no closing */ is assumed to extend to the end of the file.
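One way to get this behavior is to repair the source text before parsing, as sketched below. This is an assumption about approach, not how stata.py does it. The tokenizer is simplified: it recognizes only "..." strings (ended at a quote or newline), // line comments, and non-nested /* */ comments, and it ignores mismatched closers.

```python
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def repair(source: str) -> str:
    """Close unbalanced brackets and an unterminated /* comment."""
    stack: list[str] = []
    in_block = in_line = in_string = False
    i, n = 0, len(source)
    while i < n:
        two, ch = source[i:i + 2], source[i]
        if in_block:
            if two == "*/":
                in_block = False
                i += 2
                continue
        elif in_line:
            if ch == "\n":
                in_line = False
        elif in_string:
            if ch == '"' or ch == "\n":  # unterminated strings end at newline
                in_string = False
        elif two == "/*":
            in_block = True
            i += 2
            continue
        elif two == "//":
            in_line = True
            i += 2
            continue
        elif ch == '"':
            in_string = True
        elif ch in CLOSERS:
            stack.append(CLOSERS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        i += 1
    if in_block:
        source += "*/"  # the comment runs to the end of the file
    return source + "".join(reversed(stack))
```

For example, `reg y x if (a == 1` becomes `reg y x if (a == 1)`, and brackets inside strings or comments are not counted.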

Contributors

aaron-lebo, vjdorazio