timwis / jkan

A lightweight, backend-free open data portal, powered by Jekyll

Home Page: https://jkan.io

License: MIT License

HTML 62.42% Ruby 0.48% JavaScript 31.48% CSS 4.24% Dockerfile 1.39%

jkan's People

Contributors

amercader, bryanquigley, dependabot[bot], dracos, jjediny, keeganmcbride, lukemckinstry, lxyu0405, lydiascarf, pezholio, timwis, tobinbradley, tursics, wilsaj

jkan's Issues

Add UUID to core schema

With the addition of being able to swap schema(s) thanks to @timwis #60 ...

There is still one critical field missing from the common/shared JKAN/CKAN schema: a unique identifier (a GUID or UUID, which are two names for the same thing). Without one, machine re-association or human cross-referencing of a dataset is prone to broken links and references in the future, and to duplicate entries under two names.

To avoid this, and to make it easier to build features and use cases around dataset categories, I think it is critical to add a UUID to the core schema. A useful property of UUIDs is that they can be generated in a distributed way and still be guaranteed* to be unique, because tens of trillions can be generated without a duplicate. A simple way to implement this would be to have JKAN's edit form add a UUID when the dataset is new or has none; see identifier in this form as an example, which uses a pure JS UUID function to generate one.

This would be backwards compatible, as the new blank field would get a UUID the next time a dataset is updated. So long as the UUID function only adds one when the field is blank, no existing UUID is ever overridden; one is only created for a new dataset or a blank field. The UUID could also be used as the permalink, so that the URL to the dataset wouldn't break when the title is changed.
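As a rough illustration of the "only add one if the field is blank" rule, here is a minimal sketch. The function names `generateUuid` and `ensureIdentifier` are made up for this example, the field name `identifier` is an assumption, and the generator uses Math.random(), which is fine for a sketch but is not a cryptographically strong source.

```javascript
// Hypothetical helper: build a version-4-style UUID in plain JavaScript.
// Math.random() is not cryptographically strong; this is only a sketch.
function generateUuid() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    const r = Math.floor(Math.random() * 16);
    // 'x' takes any hex digit; 'y' is restricted to 8, 9, a or b (the variant bits).
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical helper: return the dataset with an identifier set.
// An existing, non-blank identifier is never replaced.
function ensureIdentifier(dataset) {
  const existing =
    typeof dataset.identifier === 'string' ? dataset.identifier.trim() : '';
  if (existing !== '') {
    return dataset;
  }
  return Object.assign({}, dataset, { identifier: generateUuid() });
}
```

Calling `ensureIdentifier` on every save would leave older datasets untouched until their next edit, which is the backwards-compatible behaviour described above.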

Search

It would be great to use a search-as-a-service tool like Swiftype, since the site is all just static HTML files.

Management buttons should require collaborator access

Edit/add/delete/admin/setup buttons/pages

  • Anonymous users: buttons should say "Suggest Changes" etc. and prompt them to log in
  • Logged-in users (non-collaborators): buttons should say "Suggest Changes" etc. and submit a pull request
  • Logged-in Collaborators: buttons should say "Edit" etc. and make commits
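The three cases above could be captured in one small function. This is only a sketch: the name `managementButton`, the `isCollaborator` flag and the action strings are hypothetical and not taken from the JKAN codebase.

```javascript
// Hypothetical helper: decide what a management button should say and do.
// `user` is null for anonymous visitors; `isCollaborator` is an assumed flag.
function managementButton(user) {
  if (!user) {
    // Anonymous: suggest changes, but ask them to log in first.
    return { label: 'Suggest Changes', action: 'prompt-login' };
  }
  if (!user.isCollaborator) {
    // Logged in without write access: changes go through a pull request.
    return { label: 'Suggest Changes', action: 'pull-request' };
  }
  // Collaborator: edit directly and commit.
  return { label: 'Edit', action: 'commit' };
}
```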

Create installation wizard

  1. Walk through heroku deploy & credential provision
  2. Prompt user to login via GitHub
  3. Save auth credentials to _config.yml

Management buttons should reflect user access

Also add/delete/admin/setup.

  • Anonymous users: buttons should say "Suggest Changes" etc. and prompt them to log in
  • Logged-in users (non-collaborators): buttons should say "Suggest Changes" etc. and submit a pull request
  • Logged-in Collaborators: buttons should say "Edit" etc. and make commits

LICENSE and CKAN reuse

Very cool work @timwis!

Important question to check on, however. It looks like you've licensed JKAN under the MIT License (one of my favorites), but CKAN itself is Affero GPL licensed.

Did you reuse any CKAN code in JKAN? I'm interested in contributing to JKAN or creating a Socrata-backed version, but I want to make sure we're treading on safe ground with the license.

Document using a custom domain

I originally suggested that the admin page allow setting the CNAME file. Perhaps we can settle for documenting this instead: you'll need to know how to change your domain's DNS anyway, so we can assume a degree of technical aptitude.

Jekyll slug could mismatch JavaScript slug

datasets.js uses its own slugify() function to generate a slug for categories and organizations so that datasets.json doesn't need to contain 2 extra slugs for each dataset. But the /categories and /organizations pages link to /datasets?category=slug-here using Jekyll's slugify function.

Example:

  • Indego bike share stations links to ?organization=mayor-s-office-of-transportation-utilities instead of ?organization=mayors-office-of-transportation-and-utilities

Options:

  1. Remove the /categories and /organizations pages entirely (maybe they're unnecessary)
  2. Generate categories.json and organizations.json files and have datasets.js use those for its slugs
  3. Look at Jekyll's slugify logic and make sure the JavaScript logic produces the same results.
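For option 3, a JavaScript slugify along these lines might line up with Jekyll's output. It is a sketch based on my understanding of Jekyll's default slugify mode (replace each run of characters that are not letters or digits with a hyphen, trim hyphens from the ends, then lowercase) and should be checked against the Jekyll version in use. The function name is made up, and the organization name in the usage note is an assumption inferred from the slug in the example above.

```javascript
// Hypothetical helper approximating Jekyll's default slugify behaviour.
function jekyllLikeSlugify(text) {
  return String(text)
    // Replace every run of non-letter, non-mark, non-digit characters with one hyphen.
    .replace(/[^\p{M}\p{L}\p{Nd}]+/gu, '-')
    // Drop hyphens left at the start or end.
    .replace(/^-+|-+$/g, '')
    .toLowerCase();
}
```

With this, `jekyllLikeSlugify("Mayor's Office of Transportation & Utilities")` gives `mayor-s-office-of-transportation-utilities`, the same slug the /organizations page links to in the example.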

Identify simpler JS development process & add to readme

At the moment it's a bit complicated having to run webpack watch and jekyll serve, and once the JS bundle is regenerated, you have to wait for jekyll to push it over to the _site directory. Gotta be a better way, and it needs to be documented.

Hide login indicator until state is known

At the moment, every page you navigate flashes Login even when logged in, while it tests to see if you have an OAuth cookie and fetches your user info. Instead, Login should be hidden at page load, and only shown if there's no OAuth cookie, just to avoid the flashing.
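One way to think about it is as three states rather than two. The sketch below is hypothetical (the function name and the `login` property on the user object are assumptions) and leaves out the actual cookie check and DOM update.

```javascript
// Hypothetical helper: decide what the login indicator should show.
// hasOAuthCookie: whether an OAuth cookie was found at page load.
// user: the fetched user info, or null while it is still loading.
function loginIndicator(hasOAuthCookie, user) {
  if (!hasOAuthCookie) {
    // Definitely logged out: safe to show Login straight away.
    return { visible: true, text: 'Login' };
  }
  if (user) {
    // User info has arrived: show who is logged in.
    return { visible: true, text: user.login };
  }
  // Cookie present but user info not fetched yet: stay hidden to avoid the flash.
  return { visible: false, text: '' };
}
```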

Add a Data Quality Dashboard with Issues

We recently worked on a Jekyll template that uses the Bootstrap progress bar with the _data directory to generate a simple, editable dashboard...

it uses a 1-100 value split equally between todo/doing/done along with a simple issue url/text/type in the yaml frontmatter...

items:
  - item: 'Text'
    todo: 10
    doing: 0
    done: 90   
    issues:
    - issue: info #todo, doing, done
      text: Example Link
      url:     
    - issue: todo #todo, doing, done
      text: Example Link
      url: 
    - issue: doing #todo, doing, done
      text: Example Link
      url: 
    - issue: done #todo, doing, done
      text: Example Link
      url:  

https://github.com/GSA/datagovATO/blob/gh-pages/_data/alerts.yml

Discussion: Ideas for file uploading

Out of the box, JKAN currently supports linking to files hosted elsewhere via their URL, so if you have an FTP server, Dropbox, Google Drive, etc., you can just paste in the URL. But I imagine we can make that experience a little more seamless. Some ideas:

  • Use a /files directory in the jkan repo. A drag & drop interface commits binary files to it. (Would probably increase build times)
  • Have users create a separate jkan-files repo and use a drag & drop interface to commit binary files to it there in the gh-pages branch so they'll be served over a CDN
  • Filestack stores files for you and provides a JS lib for the interface. Free tier allows 250 uploads per month and 3GB bandwidth. Not open source.

Any other ideas? The basic requirements are that there be at least a decent free tier and a way to upload via HTTP.

CSV previews

Options

A. Use recline.js (like CKAN)
B. Roll something new using PapaParse and a grid library that handles large datasets
C. Build a website like geojson.io for CSVs

Considerations

  • Should load files even if CORS not available or different protocol (http on https). Perhaps a proxy like YQL?

Set up is more complicated than forking

It turns out that GitHub doesn't deploy to GitHub Pages after a fork; it's necessary to make a commit to that fork first. (This is contrary to the setup instructions.) Unless there's a way around this, I imagine that the instructions will have to be modified to include a step in which the repo is altered in some way, e.g. a dataset is added (which I appreciate isn't yet supported).

Improve javascript organization

Started out with a single JS file so I didn't use any of my usual tricks or frameworks, but now it's grown and there's value from code re-use. Odd to imagine the organization in the context of a jekyll site as opposed to a single-page application though.

Make JKAN more ready-to-fork

At the moment, JKAN is set up as a demo site rather than a ready-to-go fork. A few things could be done to improve this:

  • Remove the CNAME file
  • Leave in only a handful of datasets (6? how many does CKAN come with?)

Not to mention, what issues will we run into when upgrading a fork?

To make up for what would be lost, we should probably have (a) a demo site, and (b) a landing page explaining what JKAN is.

Discussion: Is Jekyll necessary?

This project started out as almost entirely a Jekyll site, with a /datasets page, a /categories page and /organizations page but I pulled back on that because of a couple Jekyll limitations, such as:

  • Pagination only works on posts on github-pages-built sites
  • The where filter doesn't work for arrays (PR)
  • The group_by filter doesn't tell you the size of each group (PR)

I realized I could build most of that functionality in JavaScript alone. As the JS footprint has increased, I'm now wondering whether Jekyll is a necessary part of the stack. What if JKAN were just a static JS single-page-app that read/modified local JSON/YAML files the same way it does now?

Pros of this approach

  • Simpler architecture, lower barrier to entry for collaborators
  • Faster build times (no build process)
  • GitHub-as-a-filestore could be abstracted, theoretically allowing other filestores with similar APIs
  • Liberation from the shortcomings of Liquid templates

Cons of this approach / benefits of Jekyll

  • Jekyll provides faster load times by generating static HTML pages for each dataset
  • Changing a dataset in a JavaScript-only architecture would require 2 HTTP requests: (1) Change the dataset's JSON, and (2) change the dataset in the full list (datasets.json) (maybe a 3rd to fetch datasets.json)
  • Hashbang links are kinda ugly (demo.jkan.io/#datasets/foo-bar/)
  • Jekyll makes it easy to add other content ("about" page, blog posts, etc.) (though this could be done with ajax)

The most important thing is that both Jekyll and JavaScript begin with J, so we're still good on the name JKAN :P

What do folks think? @jalbertbowden @JJediny @waldoj @chriswhong @mheadd

Travis test not working

It's failing because _config.yml has baseurl set to /jkan and the jekyll build command doesn't accept the --baseurl override attribute.

Make auth credentials easily configurable

Putting them in _config.yml seems like the logical idea, though it'd be even nicer to put them somewhere easier to configure.

:O Perhaps JKAN needs an administration page that configures _config.yml omg

Can't really change baseurl/auth settings from admin page

If baseurl isn't correct, the admin page doesn't save properly.

Ideas

  1. Calculate baseurl dynamically by (1) checking if using custom domain or gh-pages, (2) checking if project page or org page
  2. Document that this should be done before changing the repo name
  3. Move all JS code to admin.html so it doesn't depend on any other resources
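Idea 1 might look something like the sketch below. The function name is hypothetical, and it assumes the repo name is already known (for example from the stored settings); it only tells a custom domain, a user/org page and a project page apart.

```javascript
// Hypothetical helper: guess the Jekyll baseurl from the hostname and repo name.
function inferBaseurl(hostname, repoName) {
  const host = hostname.toLowerCase();
  const repo = repoName.toLowerCase();
  if (!host.endsWith('.github.io')) {
    // Custom domain: the site is served from the root.
    return '';
  }
  if (repo === host) {
    // User or org page: the repo is named like the hostname, so no sub-path.
    return '';
  }
  // Project page: served under /<repo-name>.
  return '/' + repoName;
}
```

For example, `inferBaseurl('timwis.github.io', 'jkan')` returns `/jkan`, while `inferBaseurl('demo.jkan.io', 'jkan')` returns an empty string.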

EDIT: Just realized this applies to auth settings too, at least on the initial install...

Perhaps an /install page that prompts for these settings? Wizard could verify client_id is no longer than x characters (otherwise it's probably the secret)

Dataset description should support markdown and long-form

Some datasets may need more description, and some potential forks could require tweaking the YAML content...

Consider adding a markdown editor to the edit form, the output of which could be combined with the YAML as a single payload to GitHub for the commit:

yaml + --- + markdown === dataset.md --> Github API
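A minimal sketch of that combination step, assuming the YAML has already been serialized to a string. The function name is made up, and the actual commit through the GitHub API is left out.

```javascript
// Hypothetical helper: wrap YAML in front matter delimiters and append the markdown body.
function composeDatasetFile(yamlText, markdownText) {
  const yaml = yamlText.replace(/\s+$/, '');      // drop trailing whitespace/newlines
  const body = markdownText.replace(/^\s+/, '');  // drop leading blank lines
  const ending = body.endsWith('\n') ? '' : '\n'; // make sure the file ends with a newline
  return '---\n' + yaml + '\n---\n' + body + ending;
}
```

The returned string is what would be sent as the content of dataset.md in a single commit.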

Instead of using a hard-coded JS web form, we've been considering a web form generated from a JSON schema whose output is YAML. This would allow projects or programs with their own established standards, or those using a common standard like data.json, to supply the YAML model.
