edgi-govdata-archiving / archivers-harvesting-tools

ARCHIVED--Collection of scripts and code snippets for data harvesting after generating the zip starter

License: GNU General Public License v3.0

Python 93.63% Shell 0.46% Ruby 5.91%
archiving

archivers-harvesting-tools's People

Contributors

aniketaranake, artlogic, cabhishek, danielballan, dcwalk, dgkf, diafygi, elliot42, mcraig10, mhucka, titaniumbones

archivers-harvesting-tools's Issues

AWS toolchain integration

The current toolchain for data harvesting doesn't have any hooks for working with AWS, for instance running this tooling on an EC2 instance and plugging the data into S3 automatically.
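One way such a hook could look is sketched below. This is only an illustration, not code from this repository: the function names `iter_upload_plan` and `upload_harvest`, the bucket name, and the key prefix are all hypothetical. The only AWS call assumed is the `upload_file(Filename, Bucket, Key)` method of a boto3 S3 client, which is passed in so the sketch itself needs nothing beyond the standard library.

```python
from pathlib import Path


def iter_upload_plan(harvest_dir, key_prefix=""):
    """Yield (local_path, s3_key) pairs for every file under harvest_dir."""
    root = Path(harvest_dir)
    prefix = key_prefix.strip("/")
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the directory layout, using forward slashes in the key.
            rel = path.relative_to(root).as_posix()
            yield path, (f"{prefix}/{rel}" if prefix else rel)


def upload_harvest(s3_client, harvest_dir, bucket, key_prefix=""):
    """Upload every harvested file and return the list of keys written.

    s3_client is assumed to behave like boto3.client("s3"), i.e. to
    expose upload_file(Filename, Bucket, Key).
    """
    uploaded = []
    for path, key in iter_upload_plan(harvest_dir, key_prefix):
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

With boto3 installed and credentials available (for example through an EC2 instance role), a call such as `upload_harvest(boto3.client("s3"), "data", "example-harvest-bucket", "some-prefix")` would push a harvest directory into S3. Passing the client in also makes it easy to dry-run the plan with a stand-in object.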

Clean up README

Just noticing that the way we are documenting stuff is a little out of sync with all the wonderful contributions. We should do a pass for consistency across tools :)

Add a location for very short single-purpose scripts

I have someone who wrote a short script to harvest data from a specific page (at DOE, I think). The script is pretty specific to the particular page & purpose, so it (probably) doesn't merit a whole subdirectory + readme file and so on. At the same time, for the purposes of good reproducibility and documentation, it seems like such things ought to be kept somewhere, and it might serve as an example for others.

Perhaps there could be a subdirectory for such things? Something like "single-purpose-scripts" or "site-specific-scripts" or something like that? Each script could be placed as a single file there, and there could be a single readme file that contains a paragraph about each file in the directory.
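If the single readme were generated rather than hand-maintained, it could be built from each script's own docstring. The sketch below is one hypothetical way to do that, assuming the scripts are Python files with a module docstring and that the readme is Markdown; `describe_script` and `build_readme` are illustrative names, not existing tools in this repository.

```python
import ast
from pathlib import Path


def describe_script(path):
    """Return the script's module docstring, or a placeholder if it has none."""
    source = Path(path).read_text(encoding="utf-8")
    doc = ast.get_docstring(ast.parse(source))
    return doc.strip() if doc else "(no description yet)"


def build_readme(script_dir, title="Single-purpose scripts"):
    """Build readme text with one short section per script in script_dir."""
    lines = [f"# {title}", ""]
    for script in sorted(Path(script_dir).glob("*.py")):
        lines += [f"## {script.name}", "", describe_script(script), ""]
    return "\n".join(lines)
```

The scripts are parsed but never executed, so a site-specific harvester is not run just to document it. Shell or Ruby scripts would need a different convention, such as a leading comment block.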

What is the status of check-ia.py?

check-ia.py is currently orphaned in the workflow repo (once we move these tools here, no other code will live in that repo). What should we do with it?

At the very least we should indicate that it is a tool for checking seeds, not for harvesting.
