opendatacity / re-data

re:publica Data API

License: MIT License

JavaScript 73.81% CSS 1.25% HTML 24.94%

re-data's Introduction

re-data

This project is an ongoing effort to provide a simple JSON API for conferences. It is under heavy development.

Example data includes re:publica, 30C3, 31C3. Take a look at the scrapers directory.

Documentation

Documentation on the API can be found here

Contributing

The infrastructure is still very rough as this is a side project of several people. If you are interested in helping out, send pull requests or join the mailing list.

Examples

Several apps for Android and iOS for re:publica 2014
An app for AltConf based on one of the re:publica apps.
re:publica - rp15
Congress – 31C3 for the 31st Chaos Communication Congress
More…

Set it up locally

you need a CouchDB instance (for example, use a Docker container to set one up easily)
- docker run -d -p 5984:5984 fedora/couchdb
- curl -X PUT http://localhost:5984/_config/admins/user -d '"secret"' (creates user user with password secret)
copy config.js.dist to config.js and fill in the credentials for the CouchDB instance (see curl step above)
copy scraper/config/scrapers.js.example to scraper/config/scrapers.js (default config is fine for a first run)
fetch dependencies via npm (needs to be executed in scraper subdirectory):
- npm install
run the resetDB command inside the scraper subdirectory:
- NODE_PATH=node_modules node scraper.js resetDB (NODE_PATH just points Node at the locally installed modules instead of the global ones; the node_modules directory was created by the npm install step)
run the import command inside the scraper subdirectory:
- NODE_PATH=node_modules node scraper.js import

re-data's People

Contributors

Stargazers

Watchers

Forkers

toto schobiwan axxg morrisjobke astro robtranquillo meriland below 1stvamp alice-wl ocdata worldhack666

re-data's Issues

Use Accept-Language header instead of handcrafted label names

See RFC 2616, section 14.4 (Accept-Language): http://www.ietf.org/rfc/rfc2616.txt

It would be better to use the HTTP header to request only the language you want the content in, rather than shipping all languages, because that could become a lot of entries.

cc @phibos as he suggested this
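
To illustrate the suggestion, here is a minimal sketch of how a server could choose a response language from the Accept-Language header. The function name pickLanguage, the list of supported languages, and the fallback value are all assumptions made for this example and are not part of re-data. The sketch only matches on the primary language subtag and is not a full RFC-compliant negotiator.

```javascript
// Hypothetical helper, not part of re-data: pick the best supported
// language from a header value such as "de-DE,de;q=0.8,en;q=0.5".
function pickLanguage(header, supported, fallback) {
    if (!header) {
        return fallback;
    }
    const ranges = header.split(',').map(function (part) {
        const pieces = part.trim().split(';');
        let q = 1; // a missing q parameter means q=1
        pieces.slice(1).forEach(function (param) {
            const match = /^\s*q=([0-9.]+)\s*$/.exec(param);
            if (match) {
                q = parseFloat(match[1]);
            }
        });
        return { tag: pieces[0].trim().toLowerCase(), q: q };
    }).filter(function (range) {
        // drop empty tags, q=0 (explicitly refused) and unparsable q values
        return range.tag && range.q > 0;
    });

    // highest preference first
    ranges.sort(function (a, b) { return b.q - a.q; });

    for (let i = 0; i < ranges.length; i++) {
        if (ranges[i].tag === '*') {
            return fallback;
        }
        // compare only the primary subtag: "de-de" matches "de"
        const primary = ranges[i].tag.split('-')[0];
        if (supported.indexOf(primary) !== -1) {
            return primary;
        }
    }
    return fallback;
}
```

A route handler could call pickLanguage with the request's header value and then return only the labels for that language.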

Prepend /events to /<event-id> routes - to be RESTful

I will try to fix this myself. ;)

Filter API responses by last_modified

Problem setting it up

$ NODE_PATH=./node_modules node scraper.js import

module.js:340
    throw err;
          ^
Error: Cannot find module './31C3/scraper.js'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/home/mjob/Projekte/re-data/scraper/config/scrapers.js:5:19)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)

My ./config.js:

#!/usr/bin/env node

/* configuration */
module.exports = {

    /* listen on tcp or socket */
    app: {
        host: 'localhost',
        port: 9999
    },

    /* couchdb configuration */
    db: {
        database: 'rp-data',
        host: 'localhost',
        port: 5984,
        options: {
            secure: false,
            auth: {
                username: '',
                password: ''
            }
        }
    },

    /* version */
    version: 0.1

};

My scraper/config/scrapers.js:

exports.scrapers = {
    // { module:require('./rp13/scraper.js'), db:true },
    // { module:require('./rp14/scraper.js'), db:true }
    // { module:require('./altconf14/scraper.js'), db:false }
    '30C3': { module:require('./31C3/scraper.js'), db:true }
}

cc @toto - feel free to ping me on Twitter

32C3 scraper: Add new fields to database

The schedule.json now also includes URLs to the attachments, e.g. the slides as PDF (this was added with frab/frab@52a3d25#diff-39578a12c77e40b5c2175f1fc4c5a18e).

It might be a good idea to import this information into your database/dumps as well.

gzip compression for JSON and TSV

Move specification to a separate repository/organisation

At the moment the specification looks like API documentation for the tools hosted in the same repository.

For the specification to be used by other developers/users, it should be moved to a separate repository/organisation. It might also be a good idea to host the latest specification on a GitHub page.

The "TLS 1.3 Draft" is an example of a specification hosted on GitHub.

multi-language support + default value

two entries per label:

label: "Deutsche Version",
label_translated: {
   "de": "Deutsche Version",
   "en": "English Version"
}
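
A consumer of this structure might resolve a label as in the following sketch. The function name resolveLabel is hypothetical; the only thing taken from the proposal is that label carries the default value and label_translated maps language codes to translations.

```javascript
// Hypothetical helper: return the translation for `lang` if present,
// otherwise fall back to the default `label`.
function resolveLabel(entry, lang) {
    const translations = entry.label_translated || {};
    if (Object.prototype.hasOwnProperty.call(translations, lang)) {
        return translations[lang];
    }
    return entry.label;
}

// Example entry, using the values from the proposal above
const entry = {
    label: 'Deutsche Version',
    label_translated: {
        de: 'Deutsche Version',
        en: 'English Version'
    }
};

console.log(resolveLabel(entry, 'en')); // known language
console.log(resolveLabel(entry, 'fr')); // unknown language, uses the default label
```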

Don't fuck up the database when the source is down!

We need a solution here that keeps a history of the data, so that we can revert to an older state.

Scenario: the re-publica.de server is down and our scraper wrecks our database!

The scraped data arrives in scraper/lib/db.js and has to be updated in CouchDB. The old code is in importer/importer.js. Ideally, the old DB entries are not thrown away but only updated.
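
One possible shape for the "only update, never throw away" part is sketched below. It does not show the actual code in scraper/lib/db.js, it does not talk to CouchDB, and it does not implement the history/revert part of the request. The function name mergeScrapedItems and the assumption that every item has an id field are made up for this illustration.

```javascript
// Hypothetical sketch: merge freshly scraped items into the existing
// entries without deleting anything.
// `existing` is an object keyed by id, `scraped` is an array of items.
function mergeScrapedItems(existing, scraped) {
    // An empty or missing result most likely means the source is down,
    // so leave the existing data untouched.
    if (!Array.isArray(scraped) || scraped.length === 0) {
        return { items: existing, changed: false };
    }
    const merged = Object.assign({}, existing);
    scraped.forEach(function (item) {
        // update known entries field by field, add new ones
        merged[item.id] = Object.assign({}, existing[item.id], item);
    });
    return { items: merged, changed: true };
}
```

Entries that are missing from a scrape stay in the result, so a partial or failed scrape cannot wipe them. Whether stale entries should eventually be removed is a separate decision this sketch leaves open.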

Find a name for the specification

The specification should get a name. It is much easier to talk/write about something if it has a name.

Something like "OEDF" - "Open Event Data Format"

Pagination for the API

Pagination would probably make sense for the /speakers and /sessions overviews. Default 20, max 100, or whatever you think is right. Sure, the output should be cached locally anyway, but when 300 sessions and just as many speakers are returned in full with all details, that generates a fair amount of traffic, quite apart from the fact that clients would then have to process all of it locally first.
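
A minimal sketch of such a pagination step, using the suggested default of 20 and maximum of 100. The parameter names limit and offset, the shape of the returned object, and the function name paginate are assumptions for this example, not part of the existing API.

```javascript
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

// Hypothetical helper: slice a full result list according to the
// `limit` and `offset` query parameters (strings or numbers).
function paginate(items, query) {
    query = query || {};

    let limit = parseInt(query.limit, 10);
    if (isNaN(limit) || limit < 1) {
        limit = DEFAULT_LIMIT;
    }
    limit = Math.min(limit, MAX_LIMIT);

    let offset = parseInt(query.offset, 10);
    if (isNaN(offset) || offset < 0) {
        offset = 0;
    }

    return {
        total: items.length, // lets the client compute the number of pages
        offset: offset,
        limit: limit,
        data: items.slice(offset, offset + limit)
    };
}
```

Returning the total alongside the page lets a client know how many further requests it needs without fetching everything up front.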

Planning mode

One thing to consider is whether we want to allow sessions without a time/date, as a sort of planning mode.

This would enable a few use cases we have not covered and is typical for all conferences I know of:

You could do something like halfnarp with re-data
For an app I would imagine that it would go into a "planning mode" where you can pre-pick your fav sessions before the final timeslots are selected.

Technically I would make two adjustments:

Make day, begin, and end optional for sessions
Add some kind of state to the event so that an API consumer can easily tell what state the conference is in and whether it supports that state (not all apps make sense in planning mode, while some only make sense there).

What do you say?
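
The two adjustments could look roughly like the following sketch on the consumer side. The state name 'planning' and the function name validateSession are placeholders invented for this example, and requiring a full schedule outside planning mode is an assumption; the proposal itself only asks for optional day, begin, and end fields and some event state.

```javascript
// Hypothetical check: a session without day/begin/end is acceptable
// only while the event is in the (assumed) 'planning' state.
function validateSession(session, eventState) {
    const scheduled = Boolean(session.day && session.begin && session.end);
    if (eventState === 'planning') {
        return { valid: true, scheduled: scheduled };
    }
    return { valid: scheduled, scheduled: scheduled };
}
```

An app that does not support planning mode could read the event state first and show a notice instead of an empty schedule.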