
w3c-api's Introduction

W3C API

In response to demand from developers in our community wanting to interact with W3C's data, the W3C Systems Team has developed a Web API. Through it we are making available data on Specifications, Groups, Organizations and Users.

We will be expanding what information we expose over time.

The W3C API is a read-only, JSON-based Web API that exposes only public data.

Documentation

The W3C API is documented at https://w3c.github.io/w3c-api/ and the details about the different endpoints are available at https://api.w3.org/doc.

Webhooks are documented at https://w3c.github.io/w3c-api/webhooks.

Libraries

Apiary is a simple JavaScript library to leverage the W3C API. This library is intended to be used from W3C pages: domain pages, group pages, personal pages, etc. With Apiary, you can inject data that is retrieved using the W3C API, in a declarative way using placeholders.

node-w3capi provides a client for the W3C API, which exposes information about things such as specifications, groups, users, etc. It follows a simple pattern in which one builds up a query, and then causes the data to be fetched.

To use the W3C API from other languages, see the list of libraries for working with JSON HAL.

Feedback and contributions

The code of this API is not public because it is highly tied to the W3C database infrastructure and would expose sensitive information. However, we welcome feedback and strongly encourage people to submit issues here.

You can also join the W3C's IRC #webapi channel.

Historical references

w3c-api's People

Contributors

deniak, jean-gui, vivienlacourba

w3c-api's Issues

When no "Origin" is set, ignore the domain restrictions

The rule that uses the Origin header to filter who gets access to the API is only useful for imposing the restriction on browsers, which are guaranteed to add the Origin header when they make a cross-origin request.

There is no point (that I can see) in rejecting requests that come with no Origin header; allowing them makes it much easier to view the result of an API call by just GETting the api.w3.org URL.
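
To make the proposal concrete, here is a minimal sketch of the check being asked for. It is not the actual server code (which is not public); the allow-list and the function name `is_request_allowed` are hypothetical.

```python
from typing import Mapping, Optional, Set

# Hypothetical allow-list of origins attached to an API key (illustration only).
ALLOWED_ORIGINS: Set[str] = {"https://www.w3.org", "https://example.org"}


def is_request_allowed(
    headers: Mapping[str, str],
    allowed_origins: Set[str] = ALLOWED_ORIGINS,
) -> bool:
    """Apply the domain restriction only when an Origin header is present."""
    origin: Optional[str] = None
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "origin":
            origin = value
            break
    # No Origin header: skip the domain restriction entirely.
    if origin is None:
        return True
    return origin in allowed_origins
```

With this rule, a plain GET from a command-line client (no Origin header) goes through, while a browser request from an unlisted origin is still rejected.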

Rejected proposals

All my attempts to POST on /proposals end up with a 406 (Not Acceptable) response code :(

Inconsistent use of “name” and “title”

In the API, the field names name and title seem to be used interchangeably.

An example: in the data returned by https://api-test.w3.org/domains/41381?embed=true, the name of the domain is inside name; but the name of the domain lead is inside _links.lead.title (not _links.lead.name).

Another example: in https://api-test.w3.org/users/ggdj8tciu9kwwc4o4ww888ggkwok0c8?embed=true, the name of the user is inside name; but in https://api-test.w3.org/users/ggdj8tciu9kwwc4o4ww888ggkwok0c8/specifications?embed=true, the name of the first spec is inside _embedded.specifications[0].title (not _embedded.specifications[0].name).

A consistent choice would help Apiary and other HATEOAS clients. Right now, Apiary has to check both fields when it expects a field to contain “the name of something”.

I suggest we always use name; I think it's the most generic term, and it's OK for people, groups, institutions, documents, etc.
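
For illustration, this is roughly the kind of workaround a client needs today. It is a sketch, not Apiary's actual code; the helper name `display_name` is made up.

```python
from typing import Any, Mapping, Optional


def display_name(resource: Mapping[str, Any]) -> Optional[str]:
    """Return the label of a resource or link object, whichever field holds it."""
    # Check "name" first, then fall back to "title".
    for key in ("name", "title"):
        value = resource.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```

If the API always used name, the loop and the fallback would disappear.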

items parameter not taken into account on /specifications/{shortname}/versions/{date}/previous

The items parameter used to control how many items are displayed on paginated results is not taken into account on /specifications/{shortname}/versions/{date}/previous (and I believe there is a similar issue for .../next).

See for example the version below which has 4 previous versions. I requested only 2 items per page but got 4 items.
https://api-test.w3.org/specifications/WCAG20-TECHS/versions/20060427/previous?items=2

Specification next/previous modelled as lists

Currently, the JSON returned for next and previous on specifications is modelled as lists. This may be correct (though I can't find examples), but it's hard to guess since, because of #20, they show up as scalars anyway.

If there is only ever one prev/next, these should be exposed as scalars directly and not as a pageable result.

If they are meant to be exposed as lists, then it would be very nice if the field name in the list results' _links were called next and previous instead of the current versions. In most of the rest of the API, there is a mapping there, but not here. That means ad hoc code to access these specific fields.

Similar questions apply to supersede(d|s).

Need schema and better defaults

I just got bitten because some data instances don't follow the same pattern as others, and cannot be predicted. Many user affiliations look like this but there are some that look like that.

This cannot be guessed because there is no schema documenting the expectation, and it can easily cause an application to malfunction (it just did).

More generally, the pattern of exposing a field as an array if there are several items but as a scalar when there is only one is a poor one. You're much better off always having an array, possibly of one item.

Documentation isn't available without an API key, and therefore not linkable

The README says

the API is documented at https://api-test.w3.org/doc

but that link is now broken.

I guess now users have to manually append their API key (one that is not restricted per domain, or at least associated to w3.org) to that URL to be able to read the docs, eg https://api-test.w3.org/doc?apikey=<your-key-here>.

As we don't want to share keys in that way, it means the documentation isn't public. It won't be indexable for search engines, nor browsable by a casual passer-by, when the GH projects are public and the API is announced. Most importantly, we can't link and share specific sections using fragment IDs.

Is this an issue?
Is there an easy way to open up the docs, while keeping the API itself restricted using the keys?

(Apart from this detail, kudos for a great job introducing, managing and documenting the keys!)

Only list public W3C groups for a user

The list of groups returned for a user (/users/{hash}/groups) should only include public groups (wg/ig/cg/bg/tag/ab) similar to what we have on the /groups endpoint.

See for example Robin's API result which should not list the "Alumni" group.

Remove info that are irrelevant for closed groups

I was wondering whether we should remove from the API information that is not relevant for closed groups (and similarly for domains and activities).

In particular, I am thinking about links that relate to group participants (as a closed group does not have any participants anymore).

E.g., for closed groups the users, chairs and team_contact links are useless, as they will always return an empty list.

New endpoint for affiliations

New affiliation endpoint (which includes all organizations):

  • /affiliations
  • /affiliations/{aff-id}

This is similar to what we have done for /domains.

Arrays should always be arrays

This was already mentioned in #15, but it is enough of a problem that it warrants its own bug.

Some fields are clearly meant to hold arrays, and when they contain multiple items they do. But if by chance they hold only a single item they become scalars. This is a painful antipattern to use, it constantly causes failures. I am finding myself having to write code to test for arrays pretty much everywhere.
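
The defensive code in question looks roughly like this (a sketch; the helper name `as_list` is made up):

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalise a field that may be missing, a scalar, or an array."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    # A single item exposed as a scalar (or object) is wrapped in a list.
    return [value]
```

Every access to such a field then has to go through `as_list(...)`, which is exactly the boilerplate that always returning arrays would remove.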

Please open the group

This group started off open, it should not be closed.

I know that the API isn't announced yet (though it's public), but this repo was created for the specific purpose of gathering feedback from early adopters whom we've pointed at the API. They can't do that now. If it's to have a closed group why not just stick to the GitLab instance?

Use specific media type for the W3C API

Following discussions started with @tripu on issue #29, I came to the conclusion that it would make things easier if the W3C API had its own media type. That way we could clearly identify when link relations point to another W3C API resource (using the type property on link objects).

@dontcallmedom also suggested we include the API version in the media type as done by GitHub.

FYI, several popular APIs are using their own media type:

  • SensioLabs (Symfony): application/vnd.com.sensiolabs.connect+xml
  • GitHub: application/vnd.github.v3+json

Proposed W3C API media type: application/vnd.org.w3.api.v1+json

Explanation:

  • vnd indicates it is a vendor-specific MIME type
  • org.w3.api indicates it's the W3C API
  • v1 indicates version 1.x of the API; we would change the major version number only when we introduce changes that break backward compatibility
  • Adding hal might be overkill and would only be a human-readable hint to indicate that we follow HAL (the only official HAL media type being application/hal+json)
  • +json indicates that we follow application/json, i.e. the API sends back JSON objects
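
As a rough illustration of how a client could take the proposed media type apart, here is a small sketch. The regular expression and the names `VendorMediaType` and `parse_vendor_media_type` are assumptions made for this example, and media type parameters (such as a charset) are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class VendorMediaType(NamedTuple):
    vendor: str   # e.g. "org.w3.api"
    version: int  # major version taken from the "vN" component
    suffix: str   # structured syntax suffix, e.g. "json"


# Matches application/vnd.<vendor>.v<major>+<suffix>
_PATTERN = re.compile(
    r"application/vnd\.(?P<vendor>[a-z0-9.]+?)\.v(?P<version>\d+)\+(?P<suffix>[a-z]+)"
)


def parse_vendor_media_type(value: str) -> Optional[VendorMediaType]:
    """Split a versioned vendor media type into its parts, or return None."""
    match = _PATTERN.fullmatch(value.strip().lower())
    if match is None:
        return None
    return VendorMediaType(
        vendor=match.group("vendor"),
        version=int(match.group("version")),
        suffix=match.group("suffix"),
    )
```

A link object whose type parses with vendor `org.w3.api` could then be treated as pointing to another W3C API resource, while `application/hal+json` or `text/html` would not match.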

Collision on previous and next relation names

In /specifications/{shortname}/versions/{date}/previous, the previous relation is used to link to the previous version(s) of a given specification version. Unfortunately, that name is already used by the pagination module to indicate the previous result page.
This leads to strange results, as both links are merged together.

See for example:
https://api-test.w3.org/specifications/WCAG20-TECHS/versions/20060427/previous?page=2
(with page=2 to force the previous pagination link)

The exact same issue exists with next.

A solution would be to rename our previous rel to previous-version or, as proposed by the IANA Link Relations Registry and RFC5829 - Version Navigation Link Relations, call it predecessor-version.

3.5. 'predecessor-version'

When included on a versioned resource, this link points to a resource
containing the predecessor version in the version history.

Some systems may allow more than one of these link relations in the
case of multiple branches merging.
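
To illustrate the renaming, here is a sketch of how version links could be merged with pagination links without clashing. It does not reflect the real server code; `merge_links` and its inputs are hypothetical. The relation names predecessor-version and successor-version both come from RFC 5829.

```python
from typing import Dict

# RFC 5829 relation names used in place of the colliding "previous"/"next".
VERSION_RELS: Dict[str, str] = {
    "previous": "predecessor-version",
    "next": "successor-version",
}


def merge_links(
    pagination: Dict[str, str],
    versions: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Build a HAL-style _links object from pagination and version links."""
    links = {rel: {"href": href} for rel, href in pagination.items()}
    for rel, href in versions.items():
        renamed = VERSION_RELS.get(rel, rel)
        # Fail loudly instead of silently merging two different links.
        if renamed in links:
            raise ValueError(f"relation name collision: {renamed}")
        links[renamed] = {"href": href}
    return links
```

With the rename in place, previous keeps its pagination meaning and the version history gets its own relation.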

Provide webhook on TR publication

Since TR publications are one of the key aspects on how W3C operates, it would be great if one could register a Webhook that gets invoked each time a TR gets published.

This would for instance enable automatic notifications to mailing lists.

HTTP Error 502 Response

Sometimes I get HTTP response code 502. It appears to be load related. I get them about once every 500 queries if I'm running 50 queries per second. I haven't gotten any (after more than 5000 queries) when I slow down to 20 queries per second.
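
Until the cause is found, a client-side workaround is to retry on 502 with an increasing delay. The sketch below assumes a caller-supplied `fetch` callable returning a `(status, body)` pair; the function name and the default values are illustrative, not recommendations from the API maintainers.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(), retrying with exponential backoff while it returns 502."""
    status, body = fetch()
    attempt = 1
    while status == 502 and attempt < max_attempts:
        # Wait 0.5 s, 1 s, 2 s, ... (with the defaults) before trying again.
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
        attempt += 1
    return status, body
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without actually waiting.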

API doc: parameters and requirements

requirements should be used to list the restrictions on the parameters. If there's a path parameter, as in api/reports/{report}, then {report} should be listed in the parameters.

This raises another problem: how to differentiate path params from query params such as embed. In Swagger, that's possible using the ’in’ property, but our bundle doesn't implement it.

Inconsistent meaning of field “href”

In the API, href sometimes contains a HATEOAS reference (another URL for the API where more data can be discovered), and sometimes it contains a regular URL intended for humans (eg, the URL of a page or image under w3.org).

For example, look at all values returned inside href properties here:
https://api-test.w3.org/groups/68239

This may not seem like a big deal, but when trying to use the API programmatically (without hard-coding field names etc), this is an issue. Apiary doesn't know if a href is a hyperlink it should simply return to the user, or a pointer to fetch more information from the API. (Yes, it could check the beginning of the URL, but I think that is a weak patch for the issue.)

I suggest we always use href only for API URLs, and url only for URLs of resources outside the API.
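
For reference, the "check the beginning of the URL" workaround mentioned above would look something like this. The host list and the function name are assumptions for this sketch.

```python
from urllib.parse import urlsplit

# Hosts assumed to serve the API (taken from URLs quoted in these issues).
API_HOSTS = {"api.w3.org", "api-test.w3.org"}


def is_api_link(href: str) -> bool:
    """Guess whether an href points back into the API or to a regular page."""
    return urlsplit(href).hostname in API_HOSTS
```

This works only as long as the API hosts are known in advance, which is why a distinct url field for non-API resources would be a cleaner fix.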

Add Talks to API

It would be nice to integrate W3C Talks data to the API.

Talks management is currently implemented using a different system, which is not tied to the W3C database and is not maintained by the W3C Systems Team (Ivan wrote it but was willing to transfer it to someone else). For reference, see the project documentation and the Talk submission form.

Currently it can't easily be added to the API, as it would likely need a full rewrite, but it is still interesting for a future milestone.

Get rid of discr property and merge it with type

type property is needed on:

  • groups (various types of groups wg / ig / bg / cg)
  • affiliations (to differentiate between affiliation and organization)
  • Not needed elsewhere as it is obvious which type of object we are getting based on the endpoint

Increase maximum number of items on paginated results

We got several requests asking to increase the number of items per page (some people even asked us to get rid of pagination, see issue #46 from @sandhawke).

While it does not seem feasible to completely remove pagination (as we already experienced when we reached memory limits), we should be able to increase the current maximum, which is 100, to a higher number. That number would be determined by testing our various endpoints and by tuning PHP's maximum memory setting.

Additionally, this would reduce the number of requests needed to retrieve a full list of results, which would be a performance improvement for users that are not local to MIT.

Add participations information for affiliation, groups and users

As group participation is now public we can add a few more routes related to organizations and affiliations' participations.

This would make it possible to build pages like the IPP status page on top of the API, and should answer @dontcallmedom's request in #37.

On affiliations add the following links and subpages:

  • /affiliations/{aff-id}/participations list of participations in public groups (wg/ig/cg/bg) for this affiliation
  • /affiliations/{aff-id}/participants list of all users participating in public groups sponsored by this affiliation

On groups add:

  • /groups/{group-id}/participations list of participations in that group

On users add:

  • /users/{user-id}/participations list of participations for that user

This depends on issues:

  • /affiliations endpoint (see #40)
  • /participations (see #42)

RDF (Turtle) output

It is very important that the W3C API supports Linked Data principles as well as providing JSON. There are two aspects to this:

  1. being able to request the data as Turtle (as well as JSON and, presumably, other formats over time);
  2. exposing the dataset overall as a set of Web resources in different representations including at least HTML, JSON and some serialisation of RDF (Turtle and JSON-LD are top of the list).

I can easily help work out what the Turtle should look like. Sandro might be the better person to help with a wrapper/code.

Add "latest" shortcut to /reports

It would be nice if /reports/{shortname}/latest was an alias to whatever is the latest known version of the said spec. Right now, it takes two hops from a shortname to the list of its WGs, and the "latest" alias would avoid that issue.

Alternatively, the description of the latest version might be inlined in the /reports/{shortname} endpoint; it is, after all, likely to be relevant information for the API user.

Schema for the data model

Would it be possible to develop a schema for the data model at some point? It does not need to be very constraining, but if it could list what can be relied upon versus what's optional in the data model, it would be helpful.

The case came up with https://api-test.w3.org/groups/76983 which was added to group-hug. It breaks the interface because there is very little information, notably it is missing some things that I assumed would be available on all groups (like type).

I guess it is possible for data consumers to assume that if the type has a given value then the rest of the structure is likely to be predictable? If so, not having type specified is probably a bug?

Increase maximum number of requests per period

Currently up to 500 requests per hour to the API are allowed.
Now that caching is widely used, we should be able to safely increase that limit by a factor of 10, allowing 5000 requests per hour.

@jean-gui I am going to do the change in the AccountsBundle, please do similar modifications on the Varnish side and then deploy it.

Inconsistent name for team contacts endpoint and links

As discussed with @deniak and @tagawa the name of the team contacts endpoint /groups/*id*/teamcontacts and the link name used to point to that resource (_links.team_contacts) are not consistent (the first one does not contain an _).

The endpoint: https://api-test.w3.org/groups/40318/teamcontacts

A link to it as seen in: https://api-test.w3.org/groups/40318

"_links": {
    "id": 40318,
    "name": "HTML Working Group",
    ...
    "team_contacts": 
    {
        "href": "https://api-test.w3.org/groups/40318/teamcontacts"
    },

We could s/teamcontacts/team_contacts/g, as all multi-word data attributes use underscores (e.g. start_date), but so far we don't have any endpoint containing an underscore.

For the record:

<denis> vivien, re: s/team_contacts/teamcontacts/, I think we should be consistent. We already have other fields like the feedback dates with that naming convention (e.g. "last_call_feedback_due)
<vivien> denis, but currently we are not consistent as the endpoint (/groups/ID/teamcontacts) and the link name (_links.team_contacts) are different
...
<vivien> we could do the opposite with s/teamcontacts/team_contacts/ (just that till now we did not had any endpoint with an _)
<denis> we should create an issue vivien
* vivien creates it
<denis> GH is using a '_' in its API results
<denis> but I didn't find any endpoints composed with multiple words
<denis> http://stackoverflow.com/questions/778203/are-there-any-naming-convention-guidelines-for-rest-apis

New endpoint for W3C Group participations

New participation endpoint:

  • /participations/{participation-id} an entity's (individual or organization) participation in a W3C Group
    • a type property with value organization/individual
    • created and ended properties (find good names)
    • a group link
    • a user link (only if type=individual)
    • an organization link (only if type=organization)
    • a participants link (see below only if type=organization)
  • /participations/{participation-id}/participants list of users participating in this group sponsored by this affiliation

Make it easy to get all the available data

If I want to get at the data in some way that's not anticipated, it's very painful. Hypothetically, if I wanted to find the users who had 6-letter last names, I'd have to traverse all the 100-item pages of all the groups, find their users, then traverse all the 100-item pages of those users. For some data, I'd then need to fetch every single one of those user pages, too.

A simple first step would be to allow all types to be fetched. Not just users in groups, but users, at the top level (while still only showing those in public groups). And don't max out ?items at 100. Max it at 100,000 if you need a max. That's still only a few megabytes.

It would also be nice to be able to inline data other than href and title, like with an ?inline=true flag, but that's lower priority.

Thanks.

Give participation affiliation in groups user list

In some groups (typically WGs, IGs, CGs), people participate as representatives of another group (typically a Member org); as far as I can tell, that (very useful) information is not currently available in the API.

(One can retrieve the list of the user's affiliations, but not the one that specifically "sponsors" the participation in the said group.)

Harmonize format of rel names and use of URIs for our own relation names

The Web Linking RFC that HAL references indicates:

Registered relation type names MUST conform to the reg-rel-type rule,
and MUST be compared character-by-character in a case-insensitive
fashion.
(...)
reg-rel-type = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
LOALPHA = <any US-ASCII lowercase letter "a".."z">
DIGIT = <any US-ASCII digit "0".."9">

Our relation names don't follow that pattern; we should harmonize them and follow it.

Note that relation names that are not registered should be URIs (possibly shortened with CURIEs), as explained in HAL's "8.2. Link relations" section and the Web Linking RFC's "4.2 Extension Relation Types" section.

This proposed change only concerns rel names and does not affect JSON property names, which contain underscore characters.
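
As a quick check of what the reg-rel-type rule quoted above allows, here is a small sketch. The function names are made up, and the underscore-to-dash conversion is just the renaming discussed in this issue, not an existing API feature.

```python
import re

# reg-rel-type = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
REG_REL_TYPE = re.compile(r"[a-z][a-z0-9.\-]*")


def is_valid_rel(name: str) -> bool:
    """Return True if name conforms to the reg-rel-type rule."""
    return REG_REL_TYPE.fullmatch(name) is not None


def harmonize_rel(name: str) -> str:
    """Lowercase a relation name and replace underscores with dashes."""
    return name.lower().replace("_", "-")
```

For example, active_charter fails the check, while its harmonized form active-charter passes.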

IRC logs of the discussion with @deniak

<vivien> BTW regarding our recent rel renames and the registered rel names out there I wonder if we should follow the pattern of only allowing lowercase and - in rel names
<vivien> this would mean replacing our rel that currently have _ by -
<denis> +1 on harmonizing all the properties but no strong opinion on the format
<denis> dashes seem fine to me
<vivien> The Web Linking RFC that HAL references says:
<vivien> https://tools.ietf.org/html/rfc5988#section-4.1
<vivien> [[
<vivien>    Registered relation type names MUST conform to the reg-rel-type rule,
<vivien>    and MUST be compared character-by-character in a case-insensitive
<vivien>    fashion.
<vivien> ]]
<vivien> [[
<vivien>    reg-rel-type   = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
<vivien> ]]
<vivien> LOALPHA        = <any US-ASCII lowercase letter "a".."z">
<vivien> DIGIT          = <any US-ASCII digit "0".."9">
<denis> so it means we are not currently HAL compatible
<vivien> not really as not all our relation names are registered ones
<vivien> but it would be better to follow that convention
<vivien> like active_charter should be called active-charter
<vivien> same for the previous/next charter link
<denis> let's discuss that with JG tomorrow
<vivien> yep
<vivien> https://tools.ietf.org/html/rfc5988#section-4.2
<vivien>    Applications that don't wish to register a relation type can use an
<vivien>    extension relation type, which is a URI [RFC3986] that uniquely
<vivien>    identifies the relation type.
<vivien>  
<vivien> in theory we should even use URIs (shorten with CURIEs) for the non registered rel names we introduced
<vivien>  
<vivien> I'll create a ticket for that

Add API to get user by GitHub ID

We're starting to build a system to track contributions made to GitHub automatically, and notably to be able to flag pull requests as "safe" (because they come from someone we know to be in the group) or "merge with caution" if we're not sure.

In order for that to work, we need to be able to handle GitHub webhooks that give us a GH user ID, and use that to get a W3C user, their affiliation, etc.

I'm not sure what the best way to expose it is; maybe just a redirection from /github/{id} to the user?

I'd be happy to discuss priorities for this issue as we are likely to need it sooner rather than later.

Forced paging breaks service

It used to be that the following worked:

https://api-test.w3.org/groups?items=500&embed=true

Now it maxes out at 100. This is breaking the group listing in the repository manager. I'm not sure what the purpose of this limitation is. The UI needs to show all groups so all that is going to happen is that instead of one request there will be several. The same applies to users in a group: I need to load them all in order to find group intersections. This will just increase the load and the code complexity.
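
For illustration, this is the kind of extra client code the cap requires. It is only a sketch: it assumes that each page carries its items under `_embedded` (when embed=true) and that the pagination module exposes the following page as `_links.next.href`. The `fetch_json` callable is supplied by the caller.

```python
from typing import Any, Callable, Dict, Iterator, Set


def iter_collection(
    first_url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    collection: str,
) -> Iterator[Dict[str, Any]]:
    """Yield every embedded item of a paginated collection, page by page."""
    url = first_url
    seen: Set[str] = set()
    # Stop when there is no next link, or if a link loops back to a seen page.
    while url and url not in seen:
        seen.add(url)
        page = fetch_json(url)
        yield from page.get("_embedded", {}).get(collection, [])
        next_link = page.get("_links", {}).get("next")
        url = next_link.get("href") if isinstance(next_link, dict) else None
```

Listing all groups then takes one request per 100 items instead of a single request, which is the extra load and complexity described above.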
