
#govpack

govpack is a tool to help download and explore CKAN datasets

###made for GovHack oz 2014

Added a DEMO at http://govpack.github.io/govpack/#68 (or #1, #2, ... #100 etc). It's a copy of a.htm added at /gh-pages/index.html, and it uses the big list of sites from http://instances.ckan.org/. The DEMO shows the hundred or so CKAN endpoints, BUT some of those are on the older v1 and v2 APIs ('api/rest/dataset', 'api/2/rest/dataset') and not the latest v3 'api/3/action/' that govpack fetches (http://docs.ckan.org/en/ckan-1.8/apiv3.html), so some don't show up.

http://hackerspace.govhack.org/content/npm-install-g-govpack-or-github-govpackgovpack

also available on https://www.npmjs.org/package/govpack

npm install -f -g govpack
status and fixes required detailed below (plenty new bugs added just now)...

[image: screenshot of a.htm]

govpack is a command line tool (and pool noodle) that seeks out the metadata for ALL available data sets on a given CKAN endpoint, namely.... (X=0|1|2)

0 http://demo.ckan.org/api/3/action/current_package_list_with_resources

1 https://data.qld.gov.au/api/3/action/current_package_list_with_resources

2 https://data.gov.au/api/3/action/current_package_list_with_resources

CLI Usage:

 govpack {fetch:X} --> makes X.js module.exports=BigPackageList
 govpack {filter:X} --> makes X.txt filtered JSONP IIII(filtered_csv_metadata)
 govpack {download:X} --> downloads ./CSV/1.csv, ./CSV/2.csv,,, ./CSV/111.csv
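
How govpack itself reads its argument is not shown in this README, so the following is only a sketch of one way a `{fetch:X}`-style argument could be turned into an object. `parseCommand` is a hypothetical helper, not part of govpack.

```javascript
// Hypothetical helper (not from govpack's source): turn a CLI argument such as
// "{filter:0 ,format:'KML'}" into a plain object like { filter: 0, format: 'KML' }.
function parseCommand(arg) {
  var body = String(arg).trim().replace(/^\{/, '').replace(/\}$/, '');
  var result = {};
  body.split(',').forEach(function (pair) {
    var idx = pair.indexOf(':');
    if (idx === -1) return; // skip empty or malformed pieces
    var key = pair.slice(0, idx).trim();
    var raw = pair.slice(idx + 1).trim().replace(/^['"]|['"]$/g, '');
    // Endpoint ids are integers; everything else stays a string.
    result[key] = /^\d+$/.test(raw) ? Number(raw) : raw;
  });
  return result;
}

console.log(parseCommand("{filter:0 ,format:'KML'}"));
```

On shells such as bash, an unquoted argument containing spaces or commas inside braces is likely to be split or expanded before node sees it, so quoting the whole argument is probably the safer habit there.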

The commands need to be run in that order because each one depends on the previous result. Results are saved in the same folder as index.js, i.e. the folder holding your global "./node_modules/govpack/index.js". The downloaded "node_modules/govpack/format/1...n.format" files match up with the metadata in X.txt.

###Output paths suck majorly

note: result paths currently dump files to unpredictable locations (as determined by chaos monkey) plus "node_modules/govpack/X/format/1...n.format", and there is no option to put the results in a directory of your choice, which would be tidier and better for more CKANs, and less lolcats etc. With the X moved up to directory level, X.js and X.txt would have a common name like a.txt and b.txt for each.

###From your node code:

var GP = require('govpack');
GP({fetch: 0}, function () { console.log('Done!!'); });
.....which returns....->
Please be patient while we fetch from API#0
Downloading:
http://demo.ckan.org/api/3/action/current_package_list_with_resources
SavingAs:
C:/A/N/node_modules/govpack/0.js
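
Because each step depends on the one before it, calls from your own code can be chained through their callbacks. This is a sketch only: it assumes `filter` and `download` take the same `(options, callback)` shape as the `fetch` call above, which this README does not confirm. `runAll` and `fakeGP` are hypothetical names.

```javascript
// Sketch only: run fetch, filter and download in order for endpoint x.
// Assumption: every step takes (options, callback) like the fetch call above.
function runAll(GP, x, done) {
  GP({ fetch: x }, function () {
    GP({ filter: x }, function () {
      GP({ download: x }, done);
    });
  });
}

// Stand-in for require('govpack') so the sketch can run without network access.
var calls = [];
function fakeGP(options, cb) {
  calls.push(Object.keys(options)[0]);
  cb();
}

runAll(fakeGP, 0, function () {
  console.log('Steps ran in order:', calls.join(' -> '));
});
```

In real use you would pass `require('govpack')` in place of `fakeGP`.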

####{filter:X ,format:'XYZ'}

As an option you may wish to set the format for the filter step, to filter for some other filetype

govpack {filter:0 ,format:'KML'}
txt|xlsx|jpg|json|html|png|pdf|xls|cvs|gif|xml|
rdf|hdf5|kml|pptx|docx|doc|odp|dat|jar|zip|shp|etc

would all be okay format:'XYZ' values to try (case insensitive), but CSV is by far the most popular and is the default.
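
A case-insensitive format filter might look roughly like the sketch below. It assumes the usual CKAN package shape, where each package has a `resources` array and each resource carries `format` and `url` fields. `filterByFormat` and the sample data are made up for illustration and are not govpack's actual code.

```javascript
// Hypothetical filter: keep only resources whose format matches, ignoring case.
// Falls back to CSV when no format is given.
function filterByFormat(packages, format) {
  var wanted = String(format || 'CSV').toLowerCase();
  var hits = [];
  packages.forEach(function (pkg) {
    (pkg.resources || []).forEach(function (res) {
      if (String(res.format || '').toLowerCase() === wanted) {
        hits.push({ title: pkg.title, url: res.url, format: res.format });
      }
    });
  });
  return hits;
}

// Invented sample records, shaped like CKAN packages.
var sample = [
  { title: 'Trees', resources: [
    { format: 'csv', url: 'http://example.org/trees.csv' },
    { format: 'KML', url: 'http://example.org/trees.kml' }
  ] },
  { title: 'Parks', resources: [{ format: 'Kml', url: 'http://example.org/parks.kml' }] }
];

console.log(filterByFormat(sample, 'kml').length, 'KML resources found');
```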

#a.htm

a.htm (shown in the image above) is the page that uses the JSONP 0.txt and displays the filtered JSONP metadata generated by govpack from the CKAN records, namely...

  • links to the actual CSV files, (right click and choose Save File As)
  • CSV file size [where available]
  • table heading/description
  • field names (hashed and colourized so all of the same fields light up in the same color)
  • field types
  • column and row counts
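
The hashed-and-colourized field names could be produced along these lines. This is a guess at the idea, not a copy of what a.htm does: `fieldColour` is a hypothetical function, and lower-casing the name before hashing is an assumption.

```javascript
// Hypothetical: map a field name to a stable pastel colour, so the same
// field name gets the same colour wherever it appears.
function fieldColour(name) {
  var key = String(name).trim().toLowerCase(); // assumption: ignore case and padding
  var hash = 0;
  for (var i = 0; i < key.length; i++) {
    // Simple rolling hash, kept in the 0..359 range to use as a hue.
    hash = (hash * 31 + key.charCodeAt(i)) % 360;
  }
  return 'hsl(' + hash + ', 70%, 80%)';
}

console.log(fieldColour('Postcode'), fieldColour('postcode'));
```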

a.htm should be useful to look at, as a sample of the final output. I wanted to do search and autocomplete on the field names, and this is now possible :-) Also, CKAN has many GET verbs (including one that does SQL queries), so with our refined JSONP metadata one could generate other ajax calls from a web page, to open up the data even further.
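
As a rough illustration of such a follow-up call, the sketch below builds a URL for CKAN's `datastore_search_sql` action. It only works on endpoints that have the DataStore enabled for the resource in question; `sqlQueryUrl` and the resource id are placeholders, not values from govpack.

```javascript
// Hypothetical helper: build a GET URL for CKAN's datastore_search_sql action.
// resourceId is a placeholder; real ids come from the resource metadata.
function sqlQueryUrl(actionBase, resourceId, limit) {
  var n = parseInt(limit, 10) || 10; // fall back to 10 rows if limit is unusable
  var sql = 'SELECT * FROM "' + resourceId + '" LIMIT ' + n;
  return actionBase + 'datastore_search_sql?sql=' + encodeURIComponent(sql);
}

console.log(sqlQueryUrl('http://demo.ckan.org/api/3/action/', 'some-resource-id', 5));
```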

###With the power of X (a simple integer as the primary key) more CKANs can be added

 govpack {filter:X,format:'XLS'}

presently in the source code they are referenced at:

CK[0]={url:'http://demo.ckan.org/api/3/action/'}  // the demo data set as used by the CKAN docs
CK[1]={url:'https://data.qld.gov.au/api/3/action/'} //the state catalog of datasets
CK[2]={url:'https://data.gov.au/api/3/action/'}    //the national catalog of datasets 
CK[99]={url:'https://some_CKAN_action_endpoint/'} // ie ADD some more
// this CK[] array will probably end up in a separate config file

objectified so we can describe them further and add more
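
One way the objectified entries might be used is sketched below. The `note` field and the `actionUrl` helper are invented for illustration; only the three URLs come from the list above.

```javascript
// Sketch of an endpoint table: each entry is an object so more fields can be added.
// The `note` field is made up here to show how entries could be described further.
var CK = [];
CK[0] = { url: 'http://demo.ckan.org/api/3/action/', note: 'demo data set' };
CK[1] = { url: 'https://data.qld.gov.au/api/3/action/', note: 'state catalog' };
CK[2] = { url: 'https://data.gov.au/api/3/action/', note: 'national catalog' };

// Hypothetical helper: build the full URL for an action on endpoint x.
function actionUrl(x, action) {
  var entry = CK[x];
  if (!entry) throw new Error('Unknown CKAN endpoint #' + x);
  return entry.url + action;
}

console.log(actionUrl(0, 'current_package_list_with_resources'));
```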

NOW #2 (data.gov.au) is big and FAILS as a single request

 the code has some in progress (INCOMPLETE) calls 
 to fetch it as several paginated sub requests (todo)
 namely GetBiggerList(x,cb){/*conglomerate page-enated package lists*/}
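
The unfinished paging idea could be sketched like this. It is not the in-progress code from index.js: the signature of `GetBiggerList` here is a guess, and `fetchPage` is a hypothetical function that you would back with real requests. CKAN's documentation lists `limit` and `offset` parameters for `current_package_list_with_resources`, but check what the version on your endpoint accepts.

```javascript
// Sketch of GetBiggerList: gather a large package list page by page.
// fetchPage(x, limit, offset, cb) is hypothetical and calls back with
// (err, arrayOfPackages); a real one might request
// current_package_list_with_resources?limit=...&offset=...
function GetBiggerList(x, fetchPage, pageSize, cb) {
  var all = [];
  function next(offset) {
    fetchPage(x, pageSize, offset, function (err, page) {
      if (err) return cb(err);
      all = all.concat(page);
      // A short (or empty) page means the end of the list has been reached.
      if (page.length < pageSize) return cb(null, all);
      next(offset + pageSize);
    });
  }
  next(0);
}

// Stand-in data source with 7 fake packages, so the sketch runs offline.
var fakePackages = [1, 2, 3, 4, 5, 6, 7].map(function (n) { return { name: 'pkg' + n }; });
function fakeFetchPage(x, limit, offset, cb) {
  cb(null, fakePackages.slice(offset, offset + limit));
}

var collected = null;
GetBiggerList(2, fakeFetchPage, 3, function (err, list) {
  collected = list;
  console.log('Fetched', list.length, 'packages');
});
```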

at one stage npm was not making the correct govpack.cmd or shell script

but as someone kindly pointed out the following 2 fixes worked!!

 1) "bin": {"govpack": "index.js"}, /*add to your package.json*/
 2)  and Add 
         #!/usr/bin/env node  
         to the top of your index.js file
funnily enough the shebang is useful on windows!!


"C:/A/N/node.exe" "C:/A/B/2/9/Ax/20/index.js" {fetch:1}
(works for me) but govpack {fetch:1} is better since your paths will vary

index.js has code that should make govpack work as both a Command Line tool AND a module

if(require.main === module){/*Use from the CommandLine*/}
else{module.exports=init/*work as a module*/}

####Finally (get me the data)

after having run govpack {fetch:0} and govpack {filter:0} you may also call

govpack {download:0} 

to download the filtered CSV file set to disc
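
The numbering of the downloaded files (./CSV/1.csv, ./CSV/2.csv and so on) could be derived as in the sketch below. `downloadTargets` is a hypothetical helper and the example URLs are invented; how govpack actually names and places its files may differ.

```javascript
// Hypothetical helper: number the filtered resource URLs and pair each with a
// local path such as ./CSV/1.csv, following the naming shown in the CLI usage.
function downloadTargets(urls, format) {
  var fmt = String(format || 'CSV');
  return urls.map(function (url, i) {
    return {
      url: url,
      path: './' + fmt.toUpperCase() + '/' + (i + 1) + '.' + fmt.toLowerCase()
    };
  });
}

console.log(downloadTargets(['http://example.org/a.csv', 'http://example.org/b.csv']));
```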

###more endpoints/fixes and additions are welcome

| CSV Tables    | Are      | Cool  |
| ------------- | -------- | ----- |
| but what's?   | inside   | $1600 |
| col 2 is      | ???????? | $12   |
| zebra stripes | are neat | $1    |

now we know

email to [email protected]
