
cadenceonefive / cleanknit

Stars: 1 · Watchers: 5 · Forks: 1 · Size: 122 KB

Take Socrata datasets and intersect them to get clean, normalized fragments

License: MIT License

Jupyter Notebook 4.23% Python 7.17% TSQL 1.10% XSLT 0.63% HTML 86.15% Shell 0.71%

cleanknit's Introduction

CleanKnit

Take Socrata datasets and intersect them to get clean, normalized fragments

cleanknit's People

Contributors

bomeejung, phrrngtn


Forkers

phrrngtn

cleanknit's Issues

demonstrate use of URL generation via SQL to download Socrata resources

-- demonstration of using an SQLite database to code-generate the download URLs
-- for the Socrata catalog metadata describing each domain's resources
create table socrata_domain
(
    domain varchar primary key,
    resource_count integer not null
);
insert into socrata_domain
    (domain, resource_count)
values
    ('data.somervillema.gov', 54),
    ('data.lacity.org', 2819),
    ('data.cityofnewyork.us', 3461),
    ('data.sfgov.org', 1085);
-- generate one curl command per domain that has at least one resource
SELECT 'curl "https://api.us.socrata.com/api/catalog/v1?domains=' || printf('%s&offset=0&limit=%d" --output %s.json', domain, resource_count,domain)
FROM socrata_domain
WHERE resource_count > 0;
-- curl "https://api.us.socrata.com/api/catalog/v1?domains=data.somervillema.gov&offset=0&limit=54" --output data.somervillema.gov.json
-- curl "https://api.us.socrata.com/api/catalog/v1?domains=data.lacity.org&offset=0&limit=2819" --output data.lacity.org.json
-- curl "https://api.us.socrata.com/api/catalog/v1?domains=data.cityofnewyork.us&offset=0&limit=3461" --output data.cityofnewyork.us.json
-- curl "https://api.us.socrata.com/api/catalog/v1?domains=data.sfgov.org&offset=0&limit=1085" --output data.sfgov.org.json
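The same URL generation can be sketched in Python against the `socrata_domain` table, using the standard-library `sqlite3` and `urllib.parse` modules instead of SQLite's `printf()`. This is a minimal sketch, not code from the repository; the helper name `catalog_download_commands` is hypothetical, and `urlencode` is used so that query values would be percent-encoded if they ever needed to be.

```python
import sqlite3
from typing import List
from urllib.parse import urlencode

# Socrata Discovery (catalog) API endpoint used by the SQL above.
CATALOG_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"


def catalog_download_commands(conn: sqlite3.Connection) -> List[str]:
    """Return one curl command per domain that has at least one resource.

    Hypothetical helper: mirrors the SELECT above, but builds the query
    string with urlencode() rather than string concatenation.
    """
    rows = conn.execute(
        "SELECT domain, resource_count FROM socrata_domain "
        "WHERE resource_count > 0 ORDER BY rowid"  # rowid follows insertion order
    )
    commands = []
    for domain, resource_count in rows:
        query = urlencode({"domains": domain, "offset": 0, "limit": resource_count})
        commands.append(f'curl "{CATALOG_ENDPOINT}?{query}" --output {domain}.json')
    return commands


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE socrata_domain ("
        "domain varchar PRIMARY KEY, resource_count integer NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO socrata_domain (domain, resource_count) VALUES (?, ?)",
        [
            ("data.somervillema.gov", 54),
            ("data.lacity.org", 2819),
            ("data.cityofnewyork.us", 3461),
            ("data.sfgov.org", 1085),
        ],
    )
    for command in catalog_download_commands(conn):
        print(command)
```

Run as a script, this prints the same four curl commands listed in the SQL comments.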

Excel UDFs for browsing Socrata

Provide a configuration mechanism for setting the default domain used by searches.

=SocrataSearch("pluto")
=SocrataResource(resource_id, column_list, 100)
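The two proposed UDFs would presumably wrap Socrata's HTTP APIs. A minimal sketch of the URLs they might fetch, assuming `SocrataSearch` maps to the Discovery API's `domains` and `q` (full-text search) parameters and `SocrataResource` maps to a SODA `/resource/{id}.json` request with `$select` and `$limit`. The function names and the `SOCRATA_DEFAULT_DOMAIN` environment variable (one possible configuration mechanism for the default domain) are hypothetical; this is not an existing Excel add-in.

```python
import os
from typing import Optional, Sequence
from urllib.parse import urlencode

CATALOG_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"

# Hypothetical configuration mechanism: an environment variable names the
# default domain, falling back to the NYC portal when it is unset.
DEFAULT_DOMAIN = os.environ.get("SOCRATA_DEFAULT_DOMAIN", "data.cityofnewyork.us")


def socrata_search_url(query: str, domain: Optional[str] = None) -> str:
    """URL a =SocrataSearch(query) UDF might fetch: Discovery API full-text search."""
    params = {"domains": domain or DEFAULT_DOMAIN, "q": query}
    return f"{CATALOG_ENDPOINT}?{urlencode(params)}"


def socrata_resource_url(
    resource_id: str,
    column_list: Sequence[str],
    limit: int,
    domain: Optional[str] = None,
) -> str:
    """URL a =SocrataResource(resource_id, column_list, limit) UDF might fetch: SODA query."""
    params = {"$select": ",".join(column_list), "$limit": limit}
    # Leave "$" and "," unescaped so the generated URL stays readable.
    query = urlencode(params, safe="$,")
    return f"https://{domain or DEFAULT_DOMAIN}/resource/{resource_id}.json?{query}"
```

For example, with no default domain configured, `socrata_resource_url("abcd-1234", ["borough", "block"], 100)` returns `https://data.cityofnewyork.us/resource/abcd-1234.json?$select=borough,block&$limit=100` (`abcd-1234` is a placeholder resource ID).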

Command line client for Socrata

A small client with a command-line wrapper for common Socrata tasks, such as:

  1. creating and maintaining a metadata database
  2. downloading/mirroring resources
  3. creating relational databases with subsets/samples of Socrata resource data
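A minimal command-line skeleton for the three tasks listed above, sketched with standard-library `argparse` sub-commands. The program name, the command names (`metadata`, `mirror`, `subset`) and all options are hypothetical; the handlers only report what they would do, since the client's actual behavior is not specified.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cleanknit", description="Work with Socrata resources.")
    sub = parser.add_subparsers(dest="command", required=True)

    # 1. create or maintain a metadata database for one or more domains
    meta = sub.add_parser("metadata", help="create/maintain the metadata database")
    meta.add_argument("domains", nargs="+", help="e.g. data.cityofnewyork.us")
    meta.add_argument("--db", default="socrata_metadata.sqlite")

    # 2. download or mirror individual resources
    mirror = sub.add_parser("mirror", help="download/mirror resources")
    mirror.add_argument("resource_ids", nargs="+")
    mirror.add_argument("--domain", required=True)
    mirror.add_argument("--output-dir", default=".")

    # 3. build a relational database from subsets/samples of resource data
    subset = sub.add_parser("subset", help="load samples of resources into a database")
    subset.add_argument("resource_ids", nargs="+")
    subset.add_argument("--domain", required=True)
    subset.add_argument("--limit", type=int, default=1000, help="rows per resource")
    subset.add_argument("--db", default="socrata_subset.sqlite")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Placeholder dispatch: a real client would call download/load code here.
    if args.command == "metadata":
        summary = f"metadata: {len(args.domains)} domain(s) -> {args.db}"
    elif args.command == "mirror":
        summary = f"mirror: {len(args.resource_ids)} resource(s) from {args.domain} -> {args.output_dir}"
    else:
        summary = f"subset: {len(args.resource_ids)} resource(s), {args.limit} rows each -> {args.db}"
    print(summary)
    return summary
```

For example, `main(["mirror", "abcd-1234", "--domain", "data.lacity.org"])` prints `mirror: 1 resource(s) from data.lacity.org -> .` (the resource ID is a placeholder). Sub-commands keep each task's options separate while sharing one entry point.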

SQLite copies of the NYC socrata tabular

SQLite copies of the NYC Socrata tabular datasets of interest, obtained by:

  1. shredding the metadata
  2. code-generating the delimited-file download URLs
  3. profiling the data
  4. manually cleaning up the schemata for a subset of the datasets
  5. manually writing a schema with foreign key constraints
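The first two steps (shredding the metadata and code-generating the download URLs) might look like the sketch below. It assumes the catalog JSON saved by the curl commands has a top-level `results` array whose entries carry `resource.id`, `resource.name`, `resource.type` and `metadata.domain`, that tabular datasets have `resource.type == "dataset"`, and that each dataset's CSV export is served at `https://{domain}/api/views/{id}/rows.csv?accessType=DOWNLOAD`. The table name `socrata_resource`, the function names and the sample record are hypothetical.

```python
import json
import sqlite3
from typing import List, Tuple


def shred_catalog(conn: sqlite3.Connection, catalog: dict) -> int:
    """Flatten catalog search results into one row per resource; return rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS socrata_resource ("
        " domain varchar NOT NULL,"
        " resource_id varchar NOT NULL,"
        " name varchar,"
        " resource_type varchar,"
        " PRIMARY KEY (domain, resource_id))"
    )
    rows = [
        (r["metadata"]["domain"], r["resource"]["id"],
         r["resource"].get("name"), r["resource"].get("type"))
        for r in catalog.get("results", [])
    ]
    # INSERT OR REPLACE keeps re-runs idempotent when a catalog is re-downloaded.
    conn.executemany("INSERT OR REPLACE INTO socrata_resource VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def csv_download_urls(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Code-generate (resource_id, CSV export URL) pairs for every tabular dataset."""
    return conn.execute(
        "SELECT resource_id,"
        " 'https://' || domain || '/api/views/' || resource_id"
        " || '/rows.csv?accessType=DOWNLOAD'"
        " FROM socrata_resource WHERE resource_type = 'dataset'"
        " ORDER BY domain, resource_id"
    ).fetchall()


# Hypothetical sample shaped like one downloaded catalog file.
sample = json.loads("""
{"results": [
  {"resource": {"id": "abcd-1234", "name": "Example dataset", "type": "dataset"},
   "metadata": {"domain": "data.cityofnewyork.us"}},
  {"resource": {"id": "wxyz-5678", "name": "Example map", "type": "map"},
   "metadata": {"domain": "data.cityofnewyork.us"}}
]}
""")
conn = sqlite3.connect(":memory:")
shred_catalog(conn, sample)
print(csv_download_urls(conn))
```

Generating the URLs in SQL keeps this step consistent with the `socrata_domain` example: once the metadata is relational, every download list is just a query.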
