w3c / w3c-api

The W3C API

Home Page: https://w3c.github.io/w3c-api/


w3c-api's Introduction

W3C API

In response to demand from developers in our community wanting to interact with W3C's data, the W3C Systems Team has developed a Web API. Through it we are making available data on Specifications, Groups, Organizations and Users.

We will be expanding what information we expose over time.

The W3C API is a read-only Web API based on the JSON format exposing only public data.

Documentation

The W3C API is documented at https://w3c.github.io/w3c-api/ and the details about the different endpoints are available at https://api.w3.org/doc.

Webhooks are documented at https://w3c.github.io/w3c-api/webhooks.

Libraries

Apiary is a simple JavaScript library to leverage the W3C API. This library is intended to be used from W3C pages: domain pages, group pages, personal pages, etc. With Apiary, you can inject data that is retrieved using the W3C API, in a declarative way using placeholders.

node-w3capi provides a client for the W3C API, which exposes information about things such as specifications, groups, users, etc. It follows a simple pattern in which one builds up a query, and then causes the data to be fetched.

For using the W3C API with other languages see the list of libraries for working with JSON HAL.

Feedback and contributions

The code of this API is not public because it is highly tied to the W3C database infrastructure and would expose sensitive information. However, we welcome feedback and strongly encourage people to submit issues here.

You can also join the W3C's IRC #webapi channel.

Historical references

w3c-api's Issues

When no "Origin" is set, ignore the domain restrictions

The rule that the Origin header is used to filter who gets access to the API is only useful for imposing the restriction on browsers, which are guaranteed to add the Origin header when they make a cross-origin request.

There is no point (that I can see) in rejecting requests that come with no Origin header; allowing them makes it much easier to view the result of an API call by just GETting the api.w3.org URL.
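The proposed rule can be sketched as follows. This is only an illustration of the policy described above, not the API's actual server code; the function name `is_request_allowed` and the `allowed_origins` set are assumptions made for the example.

```python
from typing import Mapping, Optional, Set


def is_request_allowed(headers: Mapping[str, str], allowed_origins: Set[str]) -> bool:
    """Apply the proposed policy (hypothetical helper, not the real implementation).

    A request without an Origin header is accepted; a request with one is
    accepted only if the origin is in the set tied to the API key.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    origin: Optional[str] = normalised.get("origin")
    if origin is None:
        # No Origin header: typically curl or a server-side script.
        return True
    return origin in allowed_origins
```

With this rule a plain `curl` call passes, while a browser script running on an unlisted domain is still rejected.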

Rejected proposals

All my attempts to POST on /proposals end up with a 406 (Not Acceptable) response code :(

Inconsistent use of “name” and “title”

In the API, the field names name and title seem to be used interchangeably.

An example: in the data returned by https://api-test.w3.org/domains/41381?embed=true, the name of the domain is inside name; but the name of the domain lead is inside _links.lead.title (not _links.lead.name).

Another example: in https://api-test.w3.org/users/ggdj8tciu9kwwc4o4ww888ggkwok0c8?embed=true, the name of the user is inside name; but in https://api-test.w3.org/users/ggdj8tciu9kwwc4o4ww888ggkwok0c8/specifications?embed=true, the name of the first spec is inside _embedded.specifications[0].title (not _embedded.specifications[0].name).

A consistent choice would help Apiary and other HATEOAS clients. Right now, Apiary has to check both fields when it expects a field to contain “the name of something”.

I suggest we always use name; I think it's the most generic term, and it's OK for people, groups, institutions, documents, etc.
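Until the fields are unified, a client has to do something like the following. This is a minimal sketch of the double check described above, not Apiary's actual code; `display_name` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional


def display_name(resource: Mapping[str, Any]) -> Optional[str]:
    """Return the human-readable label of an API object, whichever field holds it."""
    # Check "name" first, then fall back to "title".
    for key in ("name", "title"):
        value = resource.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```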

items parameter not taken into account on /specifications/{shortname}/versions/{date}/previous

The items parameter used to control how many items are displayed on paginated results is not taken into account on /specifications/{shortname}/versions/{date}/previous (and I believe there is a similar issue for .../next).

See for example the version below which has 4 previous versions. I requested only 2 items per page but got 4 items.
https://api-test.w3.org/specifications/WCAG20-TECHS/versions/20060427/previous?items=2

Specification next/previous modelled as lists

Currently, the JSON returned for next and previous on specifications is modelled as lists. This may be correct (though I can't find examples), but it's hard to tell since, because of #20, they show up as scalars anyway.

If there is only ever one prev/next, these should be exposed as scalars directly and not a pageable result.

If they are meant to be exposed as lists, then it would be very nice if the field name in the list results' _link were called next and previous instead of the current versions. In most of the rest of the API, there is a mapping there, but not here. That means ad hoc code to access these specific fields.

Similar questions apply to supersede(d|s).

Need schema and better defaults

I just got bitten because some data instances don't follow the same pattern as others, and cannot be predicted. Many user affiliations look like this but there are some that look like that.

This cannot be guessed because there is no schema documenting the expectation, and it can easily cause an application to malfunction (it just did).

More generally, the pattern of exposing a field as an array if there are several items but as a scalar when there is only one is a poor one. You're much better off always having an array, possibly of one item.

Documentation isn't available without an API key, and therefore not linkable

The README says

the API is documented at https://api-test.w3.org/doc

but that link is now broken.

I guess now users have to manually append their API key (one that is not restricted per domain, or at least associated to w3.org) to that URL to be able to read the docs, eg https://api-test.w3.org/doc?apikey=<your-key-here>.

As we don't want to share keys in that way, it means the documentation isn't public. It won't be indexable for search engines, nor browsable by a casual passer-by, when the GH projects are public and the API is announced. Most importantly, we can't link and share specific sections using fragment IDs.

Is this an issue?
Is there an easy way to open up the docs, while keeping the API itself restricted using the keys?

(Apart from this detail, kudos for a great job introducing, managing and documenting the keys!)

Results for users should include link to People page for W3C staff

E.g. the API result for Philipp Hoschka should ideally include a link to http://www.w3.org/People/#Hoschka

Note that the in-page anchors don't follow a set format such as the user's family name so there may need to be a re-naming of anchors (at the risk of breaking some existing links).

Only list public W3C groups for a user

The list of groups returned for a user (/users/{hash}/groups) should only include public groups (wg/ig/cg/bg/tag/ab) similar to what we have on the /groups endpoint.

See for example Robin's API result which should not list the "Alumni" group.

Remove info that is irrelevant for closed groups

I was wondering whether we should remove from the API info that is not relevant for closed groups (similarly for domains and activities).

In particular I am thinking about links that relate to group participants (as a closed group does not have any participants anymore).

E.g. for closed groups the users, chairs and team_contact links are useless, as they will always return an empty list.

New endpoint for affiliations

New affiliation endpoint (which includes all organizations):

/affiliations
/affiliations/{aff-id}

This is similar to what we have done for /domains.

Add Events to API

The idea is to expose what we have in our database regarding upcoming meetings and events (the admin interface for handling events data is restricted to the W3C Team).

This includes upcoming events like AC, TPAC, Workshops, group F2F meetings, and other events, but it was not built to handle regular group teleconferences.

Events are currently displayed on those pages:

Arrays should always be arrays

This was already mentioned in #15, but it is enough of a problem that it warrants its own bug.

Some fields are clearly meant to hold arrays, and when they contain multiple items they do. But if by chance they hold only a single item they become scalars. This is a painful antipattern to use, it constantly causes failures. I am finding myself having to write code to test for arrays pretty much everywhere.
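The kind of defensive code this forces on clients looks roughly like the sketch below. `as_list` is a hypothetical helper name, not part of any W3C library.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalise a field that may be missing, a scalar, or an array."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    # A single item that the API collapsed into a scalar.
    return [value]
```

If the API always returned arrays, every call site could iterate directly and this wrapper would be unnecessary.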

Invalid JSON returned when API key is absent or wrong

Quotes are missing from one of the properties of the returned object:

$ curl https://api-test.w3.org/domains/41381

{
    "message": Missing API key,
    "documentation_url": "https://github.com/w3c/w3c-api"
}

$ curl https://api-test.w3.org/domains/41381?apikey=foo

{
    "message": Invalid API key,
    "documentation_url": "https://github.com/w3c/w3c-api"
}
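The effect on a client can be reproduced with any strict JSON parser. The sketch below feeds the body shown above to Python's standard `json` module; `parse_json_or_none` is a hypothetical helper, and `fixed` shows what the body would look like with the missing quotes added.

```python
import json
from typing import Any, Optional


def parse_json_or_none(body: str) -> Optional[Any]:
    """Parse a response body, returning None when it is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Body as currently returned: the message value is not quoted.
broken = '{"message": Missing API key, "documentation_url": "https://github.com/w3c/w3c-api"}'
# Same body with the value quoted, which parses normally.
fixed = '{"message": "Missing API key", "documentation_url": "https://github.com/w3c/w3c-api"}'
```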

Please open the group

This group started off open, it should not be closed.

I know that the API isn't announced yet (though it's public), but this repo was created for the specific purpose of gathering feedback from early adopters whom we've pointed at the API. They can't do that now. If it's to have a closed group why not just stick to the GitLab instance?

Use specific media type for the W3C API

Following discussions started with @tripu on issue #29, I came to the conclusion that it would make things easier if the W3C API had its own media type. That way we could clearly identify when link relations point to another W3C API resource (using the type property on link objects).

@dontcallmedom also suggested we include the API version in the media type as done by GitHub.

FYI, several popular APIs are using their own media type:

SensioLabs (Symfony): application/vnd.com.sensiolabs.connect+xml
GitHub: application/vnd.github.v3+json

Proposed W3C API media type: application/vnd.org.w3.api.v1+json

Explanation:

vnd indicates it is a vendor-specific MIME type
org.w3.api indicates it's the W3C API
v1 indicates version 1.x of the API; we would change the major version number only when we introduce changes that break backward compatibility
Adding hal might be overkill and would only be a human-readable hint to indicate that we follow HAL (the only official HAL media type being application/hal+json)
+json says we follow "application/json", i.e. the API sends back JSON objects
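A client could pick the proposed media type apart as sketched below. The regular expression only covers the `vnd.<vendor>.v<N>+<suffix>` shape discussed in this issue and is an assumption for illustration, not a general media type parser.

```python
import re
from typing import Dict, Optional

# Matches e.g. application/vnd.org.w3.api.v1+json or application/vnd.github.v3+json.
MEDIA_TYPE = re.compile(
    r"^application/vnd\.(?P<vendor>[a-z0-9.]+)\.v(?P<version>\d+)\+(?P<suffix>[a-z]+)$"
)


def parse_vendor_media_type(value: str) -> Optional[Dict[str, str]]:
    """Split a versioned vendor media type into vendor, version and suffix."""
    match = MEDIA_TYPE.match(value.strip().lower())
    return match.groupdict() if match else None
```

A link object whose type parses with vendor `org.w3.api` would then be known to point at another W3C API resource.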

Collision on previous and next relation names

In /specifications/{shortname}/versions/{date}/previous the previous relation is used to link to previous version(s) of a given specification version, unfortunately that name is already used by the pagination module to indicate the previous result page.
This leads to strange results as both links are being merged together.

See for example:
https://api-test.w3.org/specifications/WCAG20-TECHS/versions/20060427/previous?page=2
(with page=2 to force the previous pagination link)

The exact same issue exists with next.

A solution would be to rename our previous rel to previous-version or, as proposed by the IANA Link Relations Registry and RFC5829 - Version Navigation Link Relations, call it predecessor-version.

3.5. 'predecessor-version'

When included on a versioned resource, this link points to a resource
containing the predecessor version in the version history.

Some systems may allow more than one of these link relations in the
case of multiple branches merging.

Remove link to services on affiliations

We do not have any service tied to affiliations or organizations so that link should be removed from the API results for those entities.

See bug on eg. https://api-test.w3.org/groups/1066

Value of query param "embed" is ignored; any value means "true"

All these return the same (append your API key to the URLs):

Provide webhook on TR publication

Since TR publications are one of the key aspects on how W3C operates, it would be great if one could register a Webhook that gets invoked each time a TR gets published.

This would for instance enable automatic notifications to mailing lists.

HTTP Error 502 Response

Sometimes I get HTTP response code 502. It appears to be load related. I get them about once every 500 queries if I'm running 50 queries per second. I haven't gotten any (after more than 5000 queries) when I slow down to 20 queries per second.
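On the client side, occasional 502 responses can be absorbed with a retry and backoff, as in this sketch. The `fetch` callable, the attempt count and the delays are all assumptions for the example; they are not values recommended by the API.

```python
import time
from typing import Callable, Tuple


def get_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it returns a status other than 502 or attempts run out.

    fetch is any zero-argument function returning (status_code, body).
    """
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 502:
            break
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
    return status, body
```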

API doc: parameters and requirements

requirements should be used to list the restrictions on the parameters. If there's a path parameter like api/reports/{report}, {report} should be listed in the parameters.

It brings another problem which is how to differentiate path params and query params such as embed. In swagger, it's possible using the ’in’ property but our bundle doesn't implement it.

Inconsistent meaning of field “href”

In the API, href sometimes contains a HATEOAS reference (another URL for the API where more data can be discovered), and sometimes it contains a regular URL intended for humans (eg, the URL of a page or image under w3.org).

For example, look at all values returned inside href properties here:
https://api-test.w3.org/groups/68239

This may not seem like a big deal, but when trying to use the API programmatically (without hard-coding field names etc), this is an issue. Apiary doesn't know if a href is a hyperlink it should simply return to the user, or a pointer to fetch more information from the API. (Yes, it could check the beginning of the URL, but I think that is a weak patch for the issue.)

I suggest we always use href only for API URLs, and url only for URLs of resources outside the API.
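The "check the beginning of the URL" workaround mentioned above would look roughly like this. It is shown only to illustrate why it is a weak patch: the client has to hard-code the API hosts. `API_HOSTS` and `is_api_link` are hypothetical names.

```python
from urllib.parse import urlsplit

# Hosts assumed to serve the API; both appear in the URLs quoted in these issues.
API_HOSTS = {"api.w3.org", "api-test.w3.org"}


def is_api_link(href: str) -> bool:
    """Guess whether an href points back into the API or to a page for humans."""
    return urlsplit(href).hostname in API_HOSTS
```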

Add Talks to API

It would be nice to integrate W3C Talks data to the API.

Talks management is currently implemented using a different system, not tied to the W3C database, which is not maintained by the W3C Systems Team (Ivan wrote it but was willing to transfer it to someone else). (For reference see project documentation and Talk submission form.)

Currently it can't easily be added to the API, as it would likely need a full rewrite, but it is still interesting for a future milestone.

Inconsistent property name for team contacts

https://api-test.w3.org/groups/68239 and https://api-test.w3.org/groups/68239/teamcontacts are inconsistent: the former has team_contacts while the latter has teamcontacts. This makes the API harder to discover, since someone who follows the link named team_contacts expects to get back something named team_contacts.

This is related to w3c/apiary#20 (much more than #28 I believe)

Get rid of discr property and merge it with type

type property is needed on:

groups (various types of groups wg / ig / bg / cg)
affiliations (to differentiate between affiliation and organization)
Not needed elsewhere as it is obvious which type of object we are getting based on the endpoint

Increase maximum number of items on paginated results

We got several requests asking to increase the number of items per page (some people even asked us to get rid of pagination, see issue #46 from @sandhawke).

While it does not seem feasible to remove pagination completely (we already ran into memory limits), we should be able to raise the current maximum of 100 to a higher number. That number would be determined by testing our various endpoints and by tuning PHP's max memory setting.

Additionally, this would reduce the number of requests needed to retrieve a full list of results, which would improve performance for users who are not local to MIT.

New API route needed to get a specification based on its shortname

Based on a request from @darobin and others, we need a way to retrieve a specification from its current shortname.

This means we would need a route that handles the redirect based on spec shortnames (this should be case insensitive ie wcag20 or WCAG20 should both work).

https://api-test.w3.org/reportshornames/WCAG20 -> https://api-test.w3.org/reports/249
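The case-insensitive lookup behind such a redirect can be sketched as below. The in-memory index and the function names are assumptions for illustration; the only data taken from this issue is that WCAG20 maps to report 249.

```python
from typing import Dict, Optional


def build_shortname_index(reports: Dict[str, int]) -> Dict[str, int]:
    """Index report ids by case-folded shortname."""
    return {shortname.casefold(): report_id for shortname, report_id in reports.items()}


def resolve_shortname(index: Dict[str, int], shortname: str) -> Optional[int]:
    """Return the report id for a shortname, ignoring case, or None if unknown."""
    return index.get(shortname.casefold())
```

The route would then answer with a redirect to the report whose id `resolve_shortname` returns, or a 404 when it returns None.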

Add links to previous and next charters on a given group charter

It would be nice to have previous/next charter links on https://api-test.w3.org/groups/43696/charters/141

Add participations information for affiliation, groups and users

As group participation is now public we can add a few more routes related to organizations and affiliations' participations.

This would make it possible to build pages like the IPP status page on top of the API, and should answer @dontcallmedom's request #37.

On affiliations add the following links and subpages:

/affiliations/{aff-id}/participations list of participations in public groups (wg/ig/cg/bg) for this affiliation
/affiliations/{aff-id}/participants list of all users participating in public groups sponsored by this affiliation

On groups add:

/groups/{group-id}/participations list of participations in that group

On users add:

/users/{user-id}/participations list of participations for that user

This depends on issues:

/affiliations endpoint (see #40)
/participations (see #42)

Add filters to retrieve only W3C Member organizations

It would be handy to be able to retrieve only the list of W3C Member organizations (useful to build a page like the Current W3C Members list).

This could be done by adding a filter like ?members=true on the /affiliations endpoint.

RDF (Turtle) output

It is very important that the W3C API supports Linked Data principles as well as providing JSON. There are two aspects to this:

being able to request the data as Turtle (as well as JSON and, presumably, other formats over time);
exposing the dataset overall as a set of Web resources in different representations including at least HTML, JSON and some serialisation of RDF (Turtle and JSON-LD are top of the list).

I can easily help work out what the Turtle should look like. Sandro might be the better person to help with a wrapper/code.

Add "latest" shortcut to /reports

It would be nice if /reports/{shortname}/latest was an alias to whatever is the latest known version of the said spec. Right now, it takes two hops from a shortname to the list of its WGs, and the "latest" alias would avoid that issue.

Alternatively, the description of the latest version might be in-lined in the report/{shortname} endpoint — it is after all likely to be relevant information for the API user.

Schema for the data model

Would it be possible to develop a schema for the data model at some point? It does not need to be very constraining, but if it could list what can be relied upon versus what's optional in the data model, it would be helpful.

The case came up with https://api-test.w3.org/groups/76983 which was added to group-hug. It breaks the interface because there is very little information, notably it is missing some things that I assumed would be available on all groups (like type).

I guess it is possible for data consumers to assume that if the type has a given value then the rest of the structure is likely to be predictable? If so, not having type specified is probably a bug?

Increase maximum number of requests per period

Currently up to 500 requests per hour to the API are allowed.
Now that we have caching widely used, we should be able to safely increase that limit by a factor of 10, allowing 5000 requests per hour.

@jean-gui I am going to do the change in the AccountsBundle, please do similar modifications on the Varnish side and then deploy it.
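For a client that spreads its calls evenly, the two limits translate into the following minimum spacing between requests. This is plain arithmetic on the numbers quoted above; `min_interval_seconds` is a hypothetical helper.

```python
def min_interval_seconds(requests_per_hour: int) -> float:
    """Smallest average gap between requests that stays within an hourly quota."""
    return 3600 / requests_per_hour


# 500 requests/hour means one request every 7.2 seconds on average;
# 5000 requests/hour means one request every 0.72 seconds.
current_gap = min_interval_seconds(500)
proposed_gap = min_interval_seconds(5000)
```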

Inconsistent name for team contacts endpoint and links

As discussed with @deniak and @tagawa the name of the team contacts endpoint /groups/*id*/teamcontacts and the link name used to point to that resource (_links.team_contacts) are not consistent (the first one does not contain an _).

The endpoint: https://api-test.w3.org/groups/40318/teamcontacts

A link to it as seen in: https://api-test.w3.org/groups/40318

"_links": {
    "id": 40318,
    "name": "HTML Working Group",
    ...
    "team_contacts": 
    {
        "href": "https://api-test.w3.org/groups/40318/teamcontacts"
    },

We could s/teamcontacts/team_contacts/g, as all multi-word data attributes use underscores (e.g. start_date), but so far we don't have any endpoint containing an underscore.

For the record:

<denis> vivien, re: s/team_contacts/teamcontacts/, I think we should be consistent. We already have other fields like the feedback dates with that naming convention (e.g. "last_call_feedback_due)
<vivien> denis, but currently we are not consistent as the endpoint (/groups/ID/teamcontacts) and the link name (_links.team_contacts) are different
...
<vivien> we could do the opposite with s/teamcontacts/team_contacts/ (just that till now we did not had any endpoint with an _)
<denis> we should create an issue vivien
* vivien creates it
<denis> GH is using a '_' in its API results
<denis> but I didn't find any endpoints composed with multiple words
<denis> http://stackoverflow.com/questions/778203/are-there-any-naming-convention-guidelines-for-rest-apis

empty collections: return 404 or empty table

Should we send a 404 when there's an empty collection or display the json with no data? /api/reports/576/versions/20090226/next

Get your current rate limit status

Similar to what GitHub is doing (https://developer.github.com/v3/rate_limit/), we should add a new endpoint GET /rate_limit so that developers can get the rate limit status of their API key.

Harmonize property and rel names

#47 suggests we use - instead of _ in rel names. We should have the same format for property names as well.

New endpoint for W3C Group participations

New participation endpoint:

/participations/{participation-id} an entity's (individual or organization) participation in a W3C Group
- a type property with value organization/individual
- created and ended properties (find good names)
- a group link
- a user link (only if type=individual)
- an organization link (only if type=organization)
- a participants link (see below only if type=organization)
/participations/{participation-id}/participants list of users participating in this group sponsored by this affiliation

Make it easy to get all the available data

If I want to get at the data in some way that's not anticipated, it's very painful. Hypothetically, if I wanted to find the users who had 6-letter last names, I'd have to traverse all the 100-item pages of all the groups, find their users, then traverse all the 100-item pages of those users. For some data, I'd then need to fetch every single one of those user pages, too.

A simple first step would be to allow all types to be fetched. Not just users in groups, but users, at the top level (while still only showing those in public groups). And don't max out ?items at 100. Max it at 100,000 if you need a max. That's still only a few megabytes.

It would also be nice to be able to inline data other than href and title, like with an ?inline=true flag, but that's lower priority.

Thanks.

Give participation affiliation in groups user list

In some groups (typically WGs, IGs, CGs), people participate as representatives of another group (typically a Member org); as far as I can tell, that (very useful) information is not currently available in the API.

(One can retrieve the list of the user's affiliations, but not the one that specifically "sponsors" the participation in the said group.)

Limit the number of items

Github restricts the number of items to 100

"Superseeded"

There is a "superseeded" field on specifications, as can be seen in https://api-test.w3.org/specifications/SVG/superseeded or in the docs.

That's not an English word. See http://dictionary.reference.com/browse/superseded.

Ditto "superseeds".

Is it still possible to change this?

Add links to pages for joining the group and see its patent policy status

Following discussion with @darobin and @r12a it would be handy if the API could have links to those 2 IPP pages.
This is only relevant for groups that are managed by IPP (which applies to all currently running WG, IG, CG, BG).

eg. for the WebRTC Working Group:
https://www.w3.org/2004/01/pp-impl/47318/join
https://www.w3.org/2004/01/pp-impl/47318/status

In API key generation, domains shouldn't be restricted to the form domain.tld

In particular, running an in-browser app from localhost when developing it is rather common :)

Harmonize format of rel names and use of URIs for our own relation names

The Web Linking RFC that HAL references indicates:

Registered relation type names MUST conform to the reg-rel-type rule,
and MUST be compared character-by-character in a case-insensitive
fashion.
(...)
reg-rel-type = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
LOALPHA = <any US-ASCII lowercase letter "a".."z">
DIGIT = <any US-ASCII digit "0".."9">

Our relation names don't follow that pattern; we should harmonize them and follow it.

Note that relation names that are not registered should be URIs (possibly shortened with CURIEs), as explained in HAL's "8.2. Link relations" section and the Web Linking RFC's "4.2 Extension Relation Types" section.

This proposed change only concerns rel names and does not affect JSON properties names which contain underscore characters.
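The reg-rel-type rule quoted above maps directly onto a regular expression. The sketch below checks a rel name against it and shows the underscore-to-dash rename discussed in this issue; the function names are hypothetical.

```python
import re

# reg-rel-type = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
REG_REL_TYPE = re.compile(r"[a-z][a-z0-9.\-]*")


def is_valid_rel(name: str) -> bool:
    """True when the name conforms to the reg-rel-type rule."""
    return REG_REL_TYPE.fullmatch(name) is not None


def harmonize_rel(name: str) -> str:
    """Lowercase a rel name and replace underscores with dashes."""
    return name.lower().replace("_", "-")
```

For example, `active_charter` fails the check while `harmonize_rel("active_charter")` yields `active-charter`, which passes.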

IRC logs of the discussion with @deniak

<vivien> BTW regarding our recent rel renames and the registered rel names out there I wonder if we should follow the pattern of only allowing lowercase and - in rel names
<vivien> this would mean replacing our rel that currently have _ by -
<denis> +1 on harmonizing all the properties but no strong opinion on the format
<denis> dashes seem fine to me
<vivien> The Web Linking RFC that HAL references says:
<vivien> https://tools.ietf.org/html/rfc5988#section-4.1
<vivien> [[
<vivien>    Registered relation type names MUST conform to the reg-rel-type rule,
<vivien>    and MUST be compared character-by-character in a case-insensitive
<vivien>    fashion.
<vivien> ]]
<vivien> [[
<vivien>    reg-rel-type   = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
<vivien> ]]
<vivien> LOALPHA        = <any US-ASCII lowercase letter "a".."z">
<vivien> DIGIT          = <any US-ASCII digit "0".."9">
<denis> so it means we are not currently HAL compatible
<vivien> not really as not all our relation names are registered ones
<vivien> but it would be better to follow that convention
<vivien> like active_charter should be called active-charter
<vivien> same for the previous/next charter link
<denis> let's discuss that with JG tomorrow
<vivien> yep
<vivien> https://tools.ietf.org/html/rfc5988#section-4.2
<vivien>    Applications that don't wish to register a relation type can use an
<vivien>    extension relation type, which is a URI [RFC3986] that uniquely
<vivien>    identifies the relation type.
<vivien>  
<vivien> in theory we should even use URIs (shorten with CURIEs) for the non registered rel names we introduced
<vivien>  
<vivien> I'll create a ticket for that

Add API to get user by GitHub ID

We're starting to build a system to track contributions made to GitHub automatically, and notably to be able to flag pull requests as "safe" (because they come from someone we know to be in the group) or "merge with caution" if we're not sure.

In order for that to work, we need to be able to handle GitHub webhooks that give us a GH user ID, and use that to get a W3C user, their affiliation, etc.

I'm not sure what the best way to expose it is; maybe just a redirection from /github/{id} to the user?

I'd be happy to discuss priorities for this issue as we are likely to need it sooner rather than later.

Low performance even on cached pages

@darobin indicated in issue #17 that he was experiencing low performance even when repeatedly fetching the same resources, which are supposed to be cached.

I am creating this new separate issue so that @jean-gui can take a closer look at it.

Forced paging breaks service

It used to be that the following worked:

https://api-test.w3.org/groups?items=500&embed=true

Now it maxes out at 100. This is breaking the group listing in the repository manager. I'm not sure what the purpose of this limitation is. The UI needs to show all groups so all that is going to happen is that instead of one request there will be several. The same applies to users in a group: I need to load them all in order to find group intersections. This will just increase the load and the code complexity.
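The extra client code this requires looks roughly like the sketch below, which follows pagination links until none is left. The page shape is an assumption for the example (items listed under a named key in `_links`, plus an optional `next` link with an `href`), and `fetch_page` stands for whatever HTTP call the client already uses.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_items(first_url: str, fetch_page: Callable[[str], Page], rel: str) -> Iterator[Any]:
    """Yield every item across all result pages, starting at first_url."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        links = page.get("_links", {})
        items = links.get(rel, [])
        # A single item may come back as a scalar rather than a list.
        if not isinstance(items, list):
            items = [items]
        yield from items
        # Follow the pagination link if there is one, otherwise stop.
        next_link = links.get("next")
        url = next_link.get("href") if isinstance(next_link, dict) else None
```

Each page still costs one request, so the total load on the server is at least as high as with a single large response.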