I spoke with Sebastiaan Fluitsma about Ineo and the role of software metadata today. I have also often spoken with @JanOdijk about this in the past. I believe software metadata curation fits the theme of this interest group, although it also heavily involves Linked Open Data (poking @rlzijdeman), so I'll post my thoughts on software metadata and automatic harvesting here:
Ineo is a portal for researchers that aims to present various CLARIAH resources (tools/services and data). One of the concerns we acknowledged was the need to keep the tool metadata in Ineo up-to-date: descriptions should be accurate, version numbers correct, and links valid. This may seem obvious, but it is something that often goes wrong, so I'm advocating clear update and automatic harvesting procedures for metadata.
I have been using codemeta as a solution for all my software metadata needs. Codemeta is a linked open data scheme for describing software metadata, and it is especially focused on providing so-called *crosswalks* to various other existing software metadata standards. These crosswalks link metadata description fields from, for example, the Python Package Index, CRAN, Maven, and Debian to fields that are included in schema.org. Tools like codemetapy and codemetar perform such conversions.
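To make the crosswalk idea concrete, here is a small sketch (not codemetapy itself) that maps a few PKG-INFO fields from the Python packaging ecosystem onto codemeta/schema.org terms. The mapping shown is a simplified, hand-picked subset for illustration; the real crosswalk tables cover many more fields and ecosystems:

```python
from email.parser import Parser

# Simplified subset of a PyPI -> codemeta crosswalk (illustrative only;
# the actual codemeta crosswalk covers far more fields).
PYPI_TO_CODEMETA = {
    "Name": "name",
    "Version": "version",
    "Summary": "description",
    "Home-page": "url",
    "License": "license",
}

def pkginfo_to_codemeta(pkginfo: str) -> dict:
    """Convert PKG-INFO style metadata into a codemeta (JSON-LD) dict."""
    fields = Parser().parsestr(pkginfo)  # PKG-INFO uses RFC 822 headers
    doc = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    for src, target in PYPI_TO_CODEMETA.items():
        if fields.get(src):
            doc[target] = fields[src]
    return doc

# A made-up PKG-INFO snippet, just to show the conversion:
example = """Name: mytool
Version: 1.2.0
Summary: An example NLP tool
License: GPL-3.0
"""

codemeta = pkginfo_to_codemeta(example)
print(codemeta["name"])     # mytool
print(codemeta["version"])  # 1.2.0
```

The point is that the developer only maintains the upstream metadata (here, the Python package metadata); the codemeta representation is derived from it mechanically.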
I'm a big supporter of storing the metadata as close to the software as possible, and of automatically harvesting and converting it where possible, thereby avoiding unnecessary data duplication. A certain amount of fundamental metadata can be harvested from the software repositories where the software is deposited (Python Package Index, CRAN, Maven, Debian, etc.). Alternatively, codemeta metadata can be explicitly provided in the software's source code repository by including a codemeta.json file, as for example here.
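For reference, a minimal codemeta.json might look something like this (the `@context` is the published codemeta 2.0 context; the tool name, repository, and author values are invented for illustration):

```json
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "mytool",
    "version": "1.2.0",
    "description": "An example tool description",
    "codeRepository": "https://github.com/example/mytool",
    "license": "https://spdx.org/licenses/GPL-3.0",
    "author": [
        { "@type": "Person", "givenName": "Jane", "familyName": "Doe" }
    ]
}
```

Because this lives in the source repository, it is versioned along with the code and stays under the developer's control.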
For instance, for all software installed in a LaMachine installation, a codemeta registry is automatically compiled that describes all the software it contains. This is in turn used to present a simple portal page like the one at https://webservices.cls.ru.nl (a LaMachine installation on a production server in Nijmegen). Of course Ineo is going to be more elaborate than this, but I would still be in favour of letting it pull metadata from a registry that is, as much as possible, compiled in part by automatic harvesting from other metadata sources. I want to prevent all kinds of different versions of duplicated metadata from coming into existence, especially if those are independently and manually curated.
The codemeta initiative is limited to the more fundamental metadata that describes all software, and that alone is not enough. There has been an effort by @JanOdijk to compile official CMDI metadata for various CLARIN/CLARIAH WP3 tools, which takes more elaborate domain-specific metadata into account. This has been a manual curation effort. That is great, but my main concern is that there seems to be no proper update & maintenance mechanism here; currently the raw CMDI files are put on surfdrive. I'd much rather see them maintained in a git repository here in the CLARIAH group, so we have 1) a clear update procedure, 2) proper version control, and 3) transparency & community interaction.
I think metadata collection/curation could be a layered approach in which we combine data from multiple sources when needed. We first grab the basic metadata from as close to the source as possible (converting it from whatever repository it is stored in to codemeta); this usually contains metadata provided directly by the developers. On top of that we can have a manual curation effort that adds extra CLARIAH domain-specific fields. The final result could be expressed as linked open data in some form, preferably as the JSON-LD that I use for codemeta, which I think is more flexible, but even as CMDI if that is still preferred. (I believe there are existing initiatives within CLARIAH that treat CMDI as Linked Open Data, like cmd2rdf?) Tools like Ineo can in turn pull from some kind of central CLARIAH metadata registry to always present accurate metadata.
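The layered approach could be as simple as an overlay merge in which manually curated fields take precedence over automatically harvested ones. A minimal sketch (the CLARIAH-specific field names in the curation layer are hypothetical, purely to illustrate the idea):

```python
def layer_metadata(harvested: dict, curated: dict) -> dict:
    """Merge automatically harvested codemeta with a manual curation
    layer. Curated fields win on conflict, so curators can correct or
    enrich upstream metadata without touching the source of truth."""
    merged = dict(harvested)
    for key, value in curated.items():
        merged[key] = value
    return merged

# Base layer: harvested from the software's own repository/registry
harvested = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "mytool",
    "version": "1.2.0",
    "description": "Original developer-written description",
}

# Curation layer: CLARIAH domain-specific additions and corrections
# (field names here are invented for illustration)
curated = {
    "applicationCategory": "Linguistics",
    "description": "A corrected, curator-written description",
}

merged = layer_metadata(harvested, curated)
print(merged["version"])      # 1.2.0 (from harvesting, untouched)
print(merged["description"])  # the curated value wins
```

A fresh harvest then only refreshes the base layer, so version numbers and links stay current automatically, while the curated layer persists on top.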
This is just my view on things, of course, which I want to throw out here for debate because I think there are gains to be made. The LOD crowd probably has more to say on this too.