timwis / jkan

A lightweight, backend-free open data portal, powered by Jekyll

Home Page: https://jkan.io

License: MIT License

HTML 62.42% Ruby 0.48% JavaScript 31.48% CSS 4.24% Dockerfile 1.39%

jkan's Introduction

JKAN

A lightweight, backend-free open data portal, powered by Jekyll

Open-source data portals can be really hard to install and maintain. But their basic purpose of providing links to download data really isn't that complicated. JKAN is a proof-of-concept that allows a small, resource-strapped government agency to stand up an open data portal by simply clicking the fork button.

Demo site

Documentation

jkan's Issues

Move categories to _config.yml

Edit them via administration page

Add UUID to core schema

With the addition of being able to swap schema(s) thanks to @timwis #60 ...

There is still one critical field missing in the common/shared JKAN/CKAN schema: a unique identifier (a GUID, globally unique identifier, or UUID, universally unique identifier, which are two names for the same thing). Without one, machine re-association or human cross-referencing of a dataset is subject to broken links/references in the future and/or duplicate entries under two names.

To avoid this, and to make it easier to build out features/use cases around dataset categories, I think it is critical to add a UUID to the core schema. A feature of UUIDs is that they can be generated in a distributed way and still be guaranteed* to be unique, because tens of trillions could be generated without a duplicate. A simple approach would be to have JKAN's edit form add a UUID if the dataset is new or has none; see identifier in this form as an example, which uses a pure JS UUID function to generate one.

This would allow for backwards compatibility, as a blank field would have a UUID added the next time a dataset is updated. So long as the UUID function only adds one if the field is blank, no existing UUID is overwritten; one is only created if the dataset is new or the field is blank. This UUID could also be used as the permalink so that the URL to the dataset wouldn't break when the title is changed/updated.
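
The blank-only rule described above could be sketched roughly as follows. This is a minimal illustration, not JKAN's code: it assumes the field is named identifier (as in the example form) and that Node's built-in crypto.randomUUID is available, and ensureIdentifier is a hypothetical helper name.

```javascript
// Node.js built-in; in a browser, crypto.randomUUID() is the rough equivalent.
const { randomUUID } = require("crypto");

// Hypothetical helper (not part of JKAN): returns the dataset unchanged when it
// already has a non-blank identifier, otherwise returns a copy with a new UUID.
function ensureIdentifier(dataset) {
  const current = dataset.identifier;
  const isBlank = typeof current !== "string" || current.trim() === "";
  return isBlank ? { ...dataset, identifier: randomUUID() } : dataset;
}
```

Because an existing value is never touched, running the helper on every save should be safe for datasets that already carry an identifier.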

Administration page

Per #19, a page to edit _config.yml

Allow editing of organizations

Add authentication steps to docs

Much easier now thanks to the admin page

JSON endpoints for each page

Categories not loading in editing form

Perhaps site.categories isn't even the right approach if we want to get away from defined category files.

Search

Would be great to use a search-as-a-service tool like Swiftype, since it's all just static HTML files

"Add dataset" functionality

Currently only editing is supported

Management buttons should require collaborator access

Edit/add/delete/admin/setup buttons/pages

Anonymous users: buttons should say "Suggest Changes" etc. and prompt them to login
Logged-in users (non-collaborators): buttons should say "Suggest Changes" etc. and submit a pull request
Logged-in Collaborators: buttons should say "Edit" etc. and make commits
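
The three cases above could be captured in one small function. This is only a sketch: the shape of the user object (null when anonymous, with an isCollaborator flag otherwise) and the action names are assumptions made for illustration.

```javascript
// Hypothetical sketch: map a visitor's access level to the button label and the
// action the button should trigger. The shape of `user` is an assumption.
function managementButton(user) {
  if (!user) {
    // Anonymous visitor: prompt a login first.
    return { label: "Suggest Changes", action: "login" };
  }
  if (!user.isCollaborator) {
    // Logged in without write access: changes go through a pull request.
    return { label: "Suggest Changes", action: "pull-request" };
  }
  // Collaborator: changes are committed directly.
  return { label: "Edit", action: "commit" };
}
```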

Add configurable google analytics

Perhaps the account number goes in _config.yml and it's configurable on the /admin/ page

Create installation wizard

Walk through Heroku deploy & credential provision
Prompt user to login via GitHub
Save auth credentials to _config.yml

Management buttons should reflect user access

Also add/delete/admin/setup.

Anonymous users: buttons should say "Suggest Changes" etc. and prompt them to login
Logged-in users (non-collaborators): buttons should say "Suggest Changes" etc. and submit a pull request
Logged-in Collaborators: buttons should say "Edit" etc. and make commits

LICENSE and CKAN reuse

Very cool work @timwis!

Important question to check on, however. It looks like you've licensed JKAN under the MIT License (one of my favorites), but CKAN itself is Affero GPL licensed.

Did you reuse any CKAN code in JKAN? I'm interested in contributing to JKAN or creating a Socrata-backed version, but I want to make sure we're treading on safe ground with the license.

Remove category.html and organization.html templates

No longer needed

Document using a custom domain

Originally suggested the admin page allow setting the CNAME file. Perhaps we can settle for documenting this since you'll need to know how to change your domain's DNS anyway, so we can assume a degree of technical aptitude.

Submit Dataset Changes

Hi!
It seems that the Submit button for dataset changes is not linked to any action.
You can see my fork here: http://iltempe.github.io/opendatagentediprato/
Can you help me?
Congratulations, great work!

Jekyll slug could mismatch JavaScript slug

datasets.js uses its own slugify() function to generate a slug for categories and organizations so that datasets.json doesn't need to contain 2 extra slugs for each dataset. But the /categories and /organizations pages link to /datasets?category=slug-here using Jekyll's slugify function.

Example:

Indego bike share stations links to ?organization=mayor-s-office-of-transportation-utilities instead of ?organization=mayors-office-of-transportation-and-utilities

Options:

Remove the /categories and /organizations pages entirely (maybe they're unnecessary)
Generate categories.json and organizations.json files and have datasets.js use those for its slugs
Look at Jekyll's slugify logic and make sure the JavaScript logic produces the same results.
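
For the third option, a JavaScript function that approximates Jekyll's default slugify mode might look like the sketch below. It assumes the default mode (every run of non-alphanumeric characters becomes a single hyphen); the function name and the organization name used as input are illustrative guesses, and Unicode edge cases may still differ from Ruby's behavior.

```javascript
// Approximation of Jekyll's default slugify mode: replace every run of
// non-alphanumeric characters with a hyphen, trim hyphens, then lowercase.
// \p{L}, \p{M} and \p{Nd} stand in for Ruby's [:alnum:]; edge cases may differ.
function slugifyLikeJekyll(text) {
  return String(text)
    .replace(/[^\p{L}\p{M}\p{Nd}]+/gu, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
}

// Assumed organization name, chosen to reproduce the slug from the example above.
console.log(slugifyLikeJekyll("Mayor's Office of Transportation & Utilities"));
// mayor-s-office-of-transportation-utilities
```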

Use JavaScript-driven listings/filters/pagination

Enough fighting with Jekyll and Liquid. Just output the datasets as JSON and build the navigation with JavaScript.

Setup page may be broken

Appears to be prepending /content/ to the file path

Disqus comments

Identify simpler JS development process & add to readme

At the moment it's a bit complicated having to run webpack watch and jekyll serve, and once the JS bundle is regenerated, you have to wait for jekyll to push it over to the _site directory. Gotta be a better way, and it needs to be documented.

Add travis tests

Use jekyll build and htmlproof as per https://jekyllrb.com/docs/continuous-integration/

Language support

Great to see an Italian JKAN from @iltempe. We should have language/dictionary files to make it easier to implement in other languages.

"Delete dataset" functionality

Geojson preview should link to geojson.io

ex. http://geojson.io/#data=data:text/x-url,http%3A%2F%2Fapi.tiles.mapbox.com%2Fv3%2Ftmcw.map-gdv4cswo%2Fmarkers.geojson

I wonder if there's a similar site for CSVs?
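
Building that link from a resource URL could be as simple as the sketch below. The helper name is hypothetical; the URL pattern is taken from the example above.

```javascript
// Hypothetical helper: builds a geojson.io link that loads a remote GeoJSON file,
// following the URL pattern in the example above.
function geojsonIoUrl(fileUrl) {
  return "http://geojson.io/#data=data:text/x-url," + encodeURIComponent(fileUrl);
}
```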

Hide login indicator until state is known

At the moment, every page you navigate flashes Login even when logged in, while it tests to see if you have an OAuth cookie and fetches your user info. Instead, Login should be hidden at page load, and only shown if there's no OAuth cookie, just to avoid the flashing.

Setup domain

Use data.json as dataset model/yml

This would allow Data.gov to register data from future forks:

Use popup-based login instead of redirect

https://github.com/MrSwitch/hello.js/blob/55b89bf5487d4f39ff9a13b52498fbc976eb9fb7/src/hello.js#L1234

Add a Data Quality Dashboard with Issues

We recently worked on a Jekyll template that uses the Bootstrap progress bar with the _data directory to generate a simple/editable dashboard...

It uses a 1-100 value split between todo/doing/done, along with a simple issue url/text/type in the YAML front matter...

items:
  - item: 'Text'
    todo: 10
    doing: 0
    done: 90
    issues:
      - issue: info #todo, doing, done
        text: Example Link
        url:
      - issue: todo #todo, doing, done
        text: Example Link
        url:
      - issue: doing #todo, doing, done
        text: Example Link
        url:
      - issue: done #todo, doing, done
        text: Example Link
        url:

https://github.com/GSA/datagovATO/blob/gh-pages/_data/alerts.yml

Redirect to setup page if auth settings are blank

If they're blank, it indicates a fresh install.

Although some forkers may not want the auth component.

Add screenshots to readme

Discussion: Ideas for file uploading

Out of the box, JKAN currently supports linking to files hosted elsewhere via their URL, so if you have an FTP server, Dropbox, Google Drive, etc., you can just paste in the URL. But I imagine we can make that experience a little more seamless. Some ideas:

Use a /files directory in the jkan repo. A drag & drop interface commits binary files to it. (Would probably increase build times)
Have users create a separate jkan-files repo and use a drag & drop interface to commit binary files to it there in the gh-pages branch so they'll be served over a CDN
Filestack stores files for you and provides a JS lib for the interface. Free tier allows 250 uploads per month and 3GB bandwidth. Not open source.

Any other ideas? The basic requirements are that there be at least a decent free tier and a way to upload via HTTP.

Organizations (Index) page

CSV previews

Options

A. Use recline.js (like CKAN)
B. Roll something new using PapaParse and a grid library that handles large datasets
C. Build a website like geojson.io for CSVs

Considerations

Should load files even if CORS not available or different protocol (http on https). Perhaps a proxy like YQL?

Set up is more complicated than forking

It turns out that GitHub doesn't deploy to GitHub Pages after a fork; it's necessary to make a commit to that fork first. (This is contrary to the setup instructions.) Unless there's a way around this, I imagine that the instructions will have to be modified to include a step in which the repo is altered in some way, e.g. a dataset is added (which I appreciate isn't yet supported).

Improve javascript organization

Started out with a single JS file, so I didn't use any of my usual tricks or frameworks, but now it's grown and there's value in code reuse. It's odd to imagine the organization in the context of a Jekyll site as opposed to a single-page application, though.

Make JKAN more ready-to-fork

At the moment, JKAN is set up as a demo site rather than a ready-to-go fork. A few things could be done to improve this:

Remove the CNAME file
Leave in only a handful of datasets (6? how many does CKAN come with?)

Not to mention, what issues will we run into when upgrading a fork?

To make up for what would be lost, we should probably have (a) a demo site, and (b) a landing page explaining what JKAN is.

Look into using oauth.io instead of gatekeeper

Discussion: Is Jekyll necessary?

This project started out as almost entirely a Jekyll site, with a /datasets page, a /categories page and /organizations page but I pulled back on that because of a couple Jekyll limitations, such as:

Pagination only works on posts on github-pages-built sites
The where filter doesn't work for arrays (PR)
The group_by filter doesn't tell you the size of each group (PR)

I realized I could build most of that functionality in JavaScript alone. As the JS footprint has increased, I'm now wondering whether Jekyll is a necessary part of the stack. What if JKAN were just a static JS single-page-app that read/modified local JSON/YAML files the same way it does now?

Pros of this approach

Simpler architecture, lower barrier to entry for collaborators
Faster build times (no build process)
GitHub-as-a-filestore could be abstracted, theoretically allowing other filestores with similar APIs
Liberation from the shortcomings of Liquid templates

Cons of this approach / benefits of Jekyll

Jekyll provides faster load times by generating static HTML pages for each dataset
Changing a dataset in a JavaScript-only architecture would require 2 HTTP requests: (1) Change the dataset's JSON, and (2) change the dataset in the full list (datasets.json) (maybe a 3rd to fetch datasets.json)
Hashbang links are kinda ugly (demo.jkan.io/#datasets/foo-bar/)
Jekyll makes it easy to add other content ("about" page, blog posts, etc.) (though this could be done with ajax)

The most important thing is that both Jekyll and JavaScript begin with J, so we're still good on the name JKAN :P

What do folks think? @jalbertbowden @JJediny @waldoj @chriswhong @mheadd

Travis test not working

It's failing because _config.yml has baseurl set to /jkan and the jekyll build command doesn't accept the --baseurl override attribute.

Add filters/params to breadcrumbs

With a remove button

Inconsistent use of repository_owner

https://github.com/timwis/jkan/search?utf8=%E2%9C%93&q=repository_owner

I thought it was there for the sole purpose of running a development server, since gh-pages provides it in the site.github namespace. But there may have been another reason...

Make auth credentials easily configurable

Putting them in _config.yml seems like the logical idea, though it'd be even nicer to put them somewhere easier to configure.

:O Perhaps JKAN needs an administration page that configures _config.yml omg

Proper, paginated categories/tags pages

jekyll-archives would be great but it only seems to support posts

Can't really change baseurl/auth settings from admin page

If baseurl isn't correct, the admin page doesn't save properly.

Ideas

Calculate baseurl dynamically by (1) checking if using custom domain or gh-pages, (2) checking if project page or org page
Document that this should be done before changing the repo name
Move all JS code to admin.html so it doesn't depend on any other resources
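
The first idea could be sketched along these lines. This is a rough illustration under stated assumptions: the hostname would come from window.location.hostname, the repository name is assumed to be known already, a custom domain is assumed to serve the site from the root, and guessBaseurl is a hypothetical name.

```javascript
// Hypothetical sketch of the first idea above. `hostname` would come from
// window.location.hostname; `repoName` is assumed to be known (e.g. from config).
function guessBaseurl(hostname, repoName) {
  const host = hostname.toLowerCase();
  const repo = repoName.toLowerCase();
  // Custom domain: the site is assumed to be served from the root.
  if (!host.endsWith(".github.io")) return "";
  // User/org page: the repo is named after the host (e.g. someorg.github.io).
  if (repo === host) return "";
  // Project page: served under /<repo>.
  return "/" + repoName;
}
```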

EDIT: Just realized this applies to auth settings too, at least on the initial install...

Perhaps an /install page that prompts for these settings? Wizard could verify client_id is no longer than x characters (otherwise it's probably the secret)

Dataset description should support markdown and long-form

Some datasets may need more description, and some potential forks could require tweaking the YAML content...

Consider adding a markdown editor to the edit form, the output of which could be combined with the YAML as a single payload to GitHub for the commit:

https://github.com/NextStepWebs/simplemde-markdown-editor

yaml + --- + markdown === dataset.md --> Github API
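
As a rough sketch of that pipeline, the two helpers below join the YAML and the markdown into one file and wrap it in a request body for GitHub's "create or update file contents" endpoint, which expects base64-encoded content. Both function names are hypothetical, and Buffer assumes a Node.js environment (a browser would need a different base64 step).

```javascript
// Hypothetical helper: joins YAML front matter and a markdown body into the
// contents of a single dataset.md file.
function buildDatasetFile(yamlText, markdownText) {
  return "---\n" + yamlText.trim() + "\n---\n\n" + markdownText.trim() + "\n";
}

// Request body for a commit through GitHub's "create or update file contents"
// endpoint, which expects the file contents base64-encoded. `sha` is only
// needed when updating an existing file.
function buildCommitPayload(fileText, message, sha) {
  const payload = {
    message: message,
    content: Buffer.from(fileText, "utf8").toString("base64"),
  };
  if (sha) payload.sha = sha;
  return payload;
}
```

Keeping the file assembly separate from the payload step means the same dataset.md text could be previewed in the form before anything is committed.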

Instead of using a hard-coded JS web form, we've been considering a web form that is generated from a JSON schema and whose output is YAML. This would allow projects/programs with their own established standards, or those using a common standard like data.json, to use that as the YAML model.