libero / community

A place for community-wide issues, discussion, resources, files and sharing

License: MIT License

community's Introduction

Getting Started

Libero Publisher consists of a number of services, libraries and components so the best place to start is with the sample-configuration repository that contains a simple guide to cloning the latest containers and running them locally using docker-compose.

https://github.com/libero/sample-configuration

eLife use TravisCI for automated deployments to AWS instances (although currently as a single instance). The configuration and scripts for that can be found in the environments repository.

https://github.com/libero/environments

For further technical assistance please join the conversation on Slack at https://libero.pub/join-slack/

To find out more about our MVP please see below and look at the high level progress/plan on our roadmap at https://elifesci.org/roadmap

Libero Publisher MVP Hypothesis statement

eLife's mission is to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. With Libero Publisher, we want to have an impact on scholarly publishing and demonstrate that our technology is reusable. We will know if there is a potential market fit for Libero Publisher when our MVP has confirmed the following three hypotheses:

Hypothesis #1

A simple journal can adopt Libero to publish their articles once our MVP is available.

Hypothesis #2

Libero provides a platform that can be extended so that larger, more complex journals can publish their articles using the platform.

Hypothesis #3

Libero can help journals lower their infrastructure and maintenance costs compared to dominant commercial platforms.

Definitions of a simple or complex journal

To help define a simple or complex journal we have looked at the differences in the technology Libero would need in order to publish and host different types of journal.

A simple journal is defined as having a low throughput of articles (less than 1 per week), a low number of visitors (less than 50,000 a month) and only scholarly content that is uncomplicated in nature, so it supports only text, images, tables and mathematical formulas. Articles are grouped by research category and in one other way (collections, issues etc.).

Examples might include The International Journal of Microsimulation, Bioinorganic Chemistry and Applications or small society journals.

A complex journal is defined as having higher throughput (greater than 25 articles per week), a significant number of visitors (greater than 500,000 a month) and scholarly content that is more complicated, containing items like videos, IIIF images and custom sections (like plain language summaries or decision letters). Articles are supported by other editorial content, blog posts and news, and might be grouped in multiple ways including special editions, cross-discipline collections or through specific channels.

Examples might include eLife, The BMJ or large society journals.

community's Issues

RFC: Migrate away from Texture

Problem

Developing new features for Libero Editor is proving significantly more difficult than it should be, and the development team are finding that implementing new functionality in the tool takes increasing effort.

The problems that the development team have encountered are as follows:

Lack of documentation or sample code for both Texture and the Substance library that it is built upon.
Very little use of existing libraries or frameworks means large amounts of code must be written to achieve trivial things.
The code base is overly abstracted, which makes it hard to follow or understand. This is particularly an issue when trying to implement a complex new feature.
The tool cannot be configured without making changes to the code.

These problems have given rise to the following concerns:

Lack of active development on either the Substance library or Texture will result in a large maintenance burden on the development team.
We are unable to make use of existing frameworks, e.g. React, without making significant alterations to the underlying code.
The ease with which new developers can be on-boarded and make contributions to the project is reduced.
Implementation of complicated new functionality, e.g. Tracked Changes or Collaborative editing, will be extremely costly in effort to achieve.

Suggestion

After evaluation of other potential solutions, we recommend restarting development of the editor using ProseMirror.

We believe that doing so addresses the problems raised as follows:

ProseMirror has been proven in production environments, most notably in Atlassian’s Atlaskit. It is also used in several other smaller projects, such as Manuscripts.io, including many that are currently in development.
ProseMirror is still actively developed, and has an active community around it.
ProseMirror is well documented, with plenty of documentation discussing architecture and design, lots of sample code on how to implement various bits of functionality as well as TypeScript definition files.
Integrates well with existing frameworks, e.g. React.
There is plenty of existing code and plugins out there that we can potentially leverage to accelerate development and contribute back to. E.g. the Manuscripts.io developers have already developed a JATS plugin.
Complex functionality, for example tracked changes and collaborative editing, is significantly easier to implement.

Concerns

Although we have had success in rapidly prototyping with ProseMirror, there is a risk that developing a new editor from scratch will take longer than planned or than is considered ideal. We have mitigated this risk and feel confident that it is unlikely to occur, but the approach is not 100% risk free.

RFC: API

Problem

Libero needs to allow for services to read and modify data in other services.

Suggestion

Use standard HTTP methods (GET, POST, PUT, DELETE) as a mechanism to read and modify content
Adhere to HTTP, eg:
- meaningful status codes
- Content encoding negotiation
- Language negotiation
Use RAML to describe APIs
Split entities into multiple endpoints, rather than having monolithic responses
- eg separate article front matter, body, reference list
Contract first rather than implementation
- Allows replacement of implementations
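
The points above can be illustrated with a minimal sketch. It is not Libero's API: the `/articles/{id}/{part}` paths, the `ARTICLES` store and the `handle` function are all hypothetical, and the `Accept` parsing deliberately ignores q-values. It only shows one entity split across several endpoints, with meaningful status codes and basic content negotiation.

```python
from http import HTTPStatus

# Hypothetical in-memory store: each part of an article is its own resource.
ARTICLES = {
    "article-1": {
        "front": "<front><title>Example</title></front>",
        "body": "<body><p>Text</p></body>",
        "references": "<references/>",
    }
}


def accepts_xml(accept: str) -> bool:
    """Naive Accept check (assumption: q-values and ordering are ignored)."""
    offered = {item.split(";")[0].strip().lower() for item in accept.split(",")}
    return bool(offered & {"application/xml", "application/*", "*/*"})


def handle(method: str, path: str, accept: str = "application/xml"):
    """Return (status, headers, body) for a request to /articles/{id}/{part}."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "articles":
        return HTTPStatus.NOT_FOUND, {}, ""
    _, article_id, part = parts
    if method != "GET":
        # This sketch is read-only; a real service would also route POST, PUT and DELETE.
        return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": "GET"}, ""
    article = ARTICLES.get(article_id)
    if article is None or part not in article:
        return HTTPStatus.NOT_FOUND, {}, ""
    if not accepts_xml(accept):
        return HTTPStatus.NOT_ACCEPTABLE, {}, ""
    return HTTPStatus.OK, {"Content-Type": "application/xml"}, article[part]
```

Because the contract is expressed only in terms of method, path, headers and status, any implementation that honours it could replace this one.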

Concerns

RAML is an opinionated choice, but other existing options/standards don’t meet our requirements (eg OpenAPI)
- eg OpenAPI is purely JSON focused, and doesn’t allow for composing schemas
There doesn’t appear to be a way to actually use RAML during testing, which reduces its value a lot

RFC: Container Image Tagging Convention

Definition of Done:

ADR with proposed state is merged and can be applied to Libero Reviewer for evaluation

Originating discovery issue elife/5482

Tasks

define tag lifecycle from component to umbrella and from dev to release
define tag naming and versioning convention
check for obvious implementation roadblocks when using GitHub Actions, Dependabot/Renovate/Flux
collect feedback from product teams

RFC: Event bus

Problem

Libero needs to allow for disparate services to indirectly communicate with each other, so they can monitor and react to data changes.

Suggestion

Use RabbitMQ (AMQP 0.9.1 standard + publisher confirms extension) to allow for services to reliably emit notifications that something has happened
Define a standard for routing keys
- Namespacing
Services can act both in an upstream and downstream role
- Upstream services own and send messages to their ‘topic’ exchange(s)
- Downstream services bind their queue(s) to one or more exchanges
- Downstream services consume messages from their queue(s)
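
To make the routing-key idea concrete, here is a small pure-Python sketch of how an AMQP 0.9.1 topic exchange matches routing keys against binding patterns (`*` matches exactly one dot-separated word, `#` matches zero or more). It does not talk to RabbitMQ, and the key namespace and queue names in the usage note are hypothetical rather than a proposed Libero standard.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic-exchange binding pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the routing key.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def queues_for(routing_key: str, bindings: dict) -> list:
    """List the queues whose binding patterns match the routing key."""
    return sorted(
        queue
        for queue, patterns in bindings.items()
        if any(topic_matches(pattern, routing_key) for pattern in patterns)
    )
```

For example, with hypothetical bindings `{"search": ["articles.#"], "metrics": ["articles.*.published"]}`, a message sent with the key `articles.elife.published` would reach both queues, while `articles.elife.updated` would reach only `search`.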

Concerns

RabbitMQ is an opinionated choice, but other existing options/standards don’t meet our requirements (AMQP has two flavours, and doesn’t have a performant way of confirming publication)

Define RFC for Continuous Delivery

Currently the walking skeleton does not define an automated build to validate pull requests, nor does it deploy to a production-like environment. A set of tools and technologies to achieve this has to be evaluated and proposed so that it can be put in place for the August sprint.

RFC: Discovery documents

Problem

Libero product features and architectural choices often go through a period of discovery where specifications can be produced and implementation options can be evaluated.

The process and its output can benefit from the standardization of discovery documents that enable collaboration by being shared between team members, across teams, and outside of eLife.

Suggestion

Always track discovery work in at least one dedicated Github issue.
Prefix the titles of Google Docs discovery documents with "Discovery document:"
Move them into the single Libero shared drive.
Link them from the relevant Github issue.
Establish a timebox in calendar time at the start of the discovery activity.
Organize in questions and answers; some of the answers are allowed to be assumptions if clearly stated as such.
Link to a Github milestone if applicable.
Add a TL;DR section at the top before closing the document at the end of the timebox.

Concerns

A timebox helps limit analysis time to favor prototyping and actual implementation.
Replenishment meetings are a possibility for choosing the length of the timeboxes.
Timeboxes are measured in calendar time so they need to take into account the percentage of time that is allocated to other tasks from the people that work on the discovery.
Github milestones may not exist if the discovery is related to an aspect of the product or to architecture rather than to a particular feature.
There is no explicit way to track the closure of a document on the document itself.

Define RFC for repository structure

https://github.com/libero/walking-skeleton is a monorepo, but in the future it could probably be split into independent repositories. If, when and how to do this has to be proposed.

Application Architecture starting point

In the Initial discussions issue we suggested ideal general software architecture requirements. What I don't see defined yet is the starting point for our component architecture. What do we plan our components and processes to look like: workflows, dashboard, publication process? I believe we have eLife's process as a starting point, while being flexible about it. I would suggest publishing a starting software architecture diagram containing a smaller set of applications that will support an initial version of Libero.

RFC: Community and Code Basics

Problem

We need to define some basic processes and conventions early so that engagement in the community and code is easier. We have learned a lot from helping people reuse software at eLife but also from our work with Hindawi, YLD, ThinSlices and the Coko Foundation on our other products. We propose the following as a starting point based on the output of a team session here at eLife. Feedback welcome.

Suggestion

Basics

Libero is the name of the platform of applications, tools and services
Individual components, applications, tools and services will have their own name that is descriptive and relevant to the solution they are providing
The name Continuum is no longer in use
Libero and some eLife specific applications power eLife's Journal
Libero will be composed of parts of the existing system that powers eLife's Journal along with new components and changed versions of those existing components
Libero is a suite of defined interfaces, libraries and applications. The applications are built from the set of Libero libraries, and can freely be replaced
Use Github
Use the libero Github organisation
eLife are stewards of the code and community but changes are all collaborative

Decision Making

Is done in the open, on github, using issues and pull requests as a way to make suggestions and discuss solutions before making a decision
Larger decisions are done using a Request for Comments (RFC) process, using issues and a format similar to this RFC.
The community will endeavour to meet at least four times per year

Code and Deployment

MIT licensed by eLife Sciences Publications, Ltd as stewards of the community and code
Primary languages are PHP for browser-facing applications and Python for the rest. JavaScript and SASS in the browser.
Unit testing is required, TDD is preferred
All libraries, components, applications, services and APIs should be versioned
A "Container-first" approach to sharing and deployment
Deployment and tools are platform agnostic (i.e. not only AWS)

High Level Feature Considerations

Versioned content is required
Previewable content is required
Multiple content and front-end languages are going to be required
Custom content is required
Event reporting and auditing

Concerns

We are concerned that laying out these ideas now might seem like eLife want to dictate the direction of the product and community. This is not the case, and we hope that this RFC is taken in the spirit in which it is intended: to provide a useful starting point.

Initial Issues for Discussion: Technology

Problem

We have identified a number of specific areas that require discussion within the community, and probably some investigation or experimentation, before an RFC can be created. Below are the areas suggested for further discussion with some initial thoughts.

Suggestion

Components versus Software

Components are more complicated but allow for flexibility
- Everything (or most things) can be optional
- Things can be replaced if you need access to legacy content, for example
Provide ready-made apps/containers with the core modules enabled and the ability to change the basics through configuration
- Front-end probably won't be very reusable: as soon as you want your own design or an extra page, components would be needed

Choice of Languages and Frameworks

Opted for PHP and Python
PHP for web frontend, Python for backends
We don't want adopters to have to know too many languages
Open source languages that can be deployed on Linux
Relatively easy to find developers for these languages
How do we choose frameworks and libraries?
Best tool for each job versus having too many tools
Define an approach to evaluating third-party libraries

Making Everything Flexible

Flexible, composable, extensible - define these and decide which are most useful, how they overlap etc. in this context
"Build your own schemas"
- Core is very small
- Based on standards but not coupled to them
- Should be easy to add your own or replace existing with your own

Versioning of API Calls and Backwards Compatibility

SemVer everywhere including the front-end?
- Gives stability over features
- Version things together or individually?
Can API call versioning be avoided?
- Changing your data structure would break backwards compatibility
- What about upgrading an extension that has a break in backwards compatibility?
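
As a rough illustration of what "SemVer everywhere" would give consumers, the sketch below checks whether moving from one version to another stays within the same major version. It is a simplification under stated assumptions: only plain `MAJOR.MINOR.PATCH` strings are handled, pre-release and build metadata are ignored, and the function names are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_safe_upgrade(current: str, candidate: str) -> bool:
    """True if the candidate is the same major version and not older than current."""
    old, new = parse(current), parse(candidate)
    return new.major == old.major and new >= old
```

Under this rule `is_safe_upgrade("1.4.2", "1.5.0")` holds, while a move to `2.0.0` signals a break in backwards compatibility that adopters would have to plan for. Note that SemVer itself treats `0.y.z` versions as unstable, which this sketch does not model.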

Release Schedules

Should we have LTS versions
- For individual projects or as a whole?
- e.g. Node.js 6.x/8.x vs Ubuntu 18.04
Do we cut releases for components or working software?
How does Docker play a role in this?

Issue Tracking

In a single place, per repository or a mix?
- Per project keeps close to the code but harder to see an overall view
- Others have used dedicated issues repos for example https://github.com/puli/issues
- Standardised use of labels
  - An automated repository "butler" would be useful, for example https://github.com/carsonbot/carsonbot

Merge Policy

Automated Restrictions
People based restrictions - e.g. author cannot merge their own requests

Scalability

The applications will need to scale up and down for different potential users
The scaling for each application will vary based on use case
- One publisher might have low traffic but publish many articles
- One publisher might have very popular articles but publish very little
- One publisher might publish infrequently and have few visitors

Ease of Operation

How many people do you need to look after the platform when it is running?

Support Channels

A few options
- Issues
- Slack, Mattermost, Gitter
- Stack Overflow / Stack Exchange
- Google Group / Mailing list
- In person - who, how often, how are costs covered?

Concerns

There are too many issues to discuss as a single RFC, but splitting them may result in too many RFCs and low levels of engagement.
RFCs may also be interrelated in an initial stage, so it can be difficult to decide everything independently from everything else
A successful approach is to appoint "working groups" on particular topics, but this can be costly in terms of time

RFC: JATS support

Problem

Libero's data model is planned to support schemas like JATS (see #11), but developing solely based on Libero's native schema is slow (as it doesn't exist yet). We have to model all the possibilities, which is not amenable to rushing (especially as we don't want to version it). For example, libero/publisher#5 would require a lot of schema work.

Users like eLife will need non-JATS content, but IJM only have scholarly content and have already investigated converting their archive to JATS.

Suggestion

Commit to supporting JATS now and prioritise it.
Continue to build-up Libero's schema, based on the JATS support that is implemented (ie as non-blocking, possibly follow-up, work).

Concerns

What JATS to support. JATS4R has been making progress on recommendations, but they aren't comprehensive and might still be too open. DAR seems too strict.
How to support multiple versions of JATS (and flavours?), including the rumoured 2.0.
How to handle assets. eLife XML just references a TIF (without an actual URI), whereas we'd want a IIIF endpoint.
Complexity of supporting multiple schemas across all services. (This is an existing concern, but doing it now does bring it to the forefront.)

RFC: High-level architecture

Problem

Libero will be a framework for producing publication and presentation platforms. It needs to:

scale up and down
allow for a high degree of customisation and extension: allowing users to share features, as well as catering for the bespoke
be deployable anywhere
easily integrate with other systems

Suggestion

Use a (micro)service-oriented architecture
- Encapsulate features in separate applications
Overall choreography, local orchestration
- Communication between services is decoupled and processing is asynchronous
- Inside services it can be neither
Agnostic to where content originates and is owned
- Source of truth can be elsewhere
Specify communication methods and integration points, allowing for extension
- Require a small core of structure and data
- Provide modules for common features
- “Build your own”
Allow integration with external systems
- API
- Event bus
- Message schema
Use existing standards where possible

Concerns

Distributed systems are more complex to understand, develop and run
- Barrier to entry
- Observability: distributed logging and tracing
Too-high level of abstraction provides little value
Yet another data format to define and translate data into to satisfy the user experience

RFC: Node.js server side frameworks

Problem

As of January 2020, Libero Publisher uses Koa, while Libero Reviewer uses NestJS and Express. There is currently no documentation or ADR recording the reasons for those choices or whether we should continue with them.

Suggestion

The Reviewer team initially chose NestJS as it has GraphQL support as well as other useful features that ship as part of the framework. However, it seems to be proving boilerplate heavy and it might lessen the cognitive load on developers to drop it in favour of a single framework (e.g. Express). Third-party packages can provide the features that NestJS offers, such as GraphQL, logging and security.
Compare Express and Koa and determine if we should keep using both and if so highlight reasons for using one or the other. Ideally we would want to keep using one framework per product.

RFC: Workflows

Problem

Libero needs to allow for services to perform groups of related tasks based on incoming data in the form of workflows.

The key criteria for a potential workflow system are:

avoid dependency on a specific service provider
minimal required system configuration
easy to deploy
has a developer friendly API to create workflows and tasks

Suggestion

Implement workflows using Airflow DAGs
Use Airflow as a standard solution for executing DAGs and their tasks
An Airflow instance must have a parent service and will be owned by this service
An Airflow instance will only have its DAGs triggered by its parent service
Airflow will emit notifications to the event bus
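
For readers unfamiliar with the term, a DAG-based workflow is a set of tasks plus "runs after" dependencies, executed in an order that respects those dependencies. The sketch below shows the idea using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow code, and the task names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks: dict, dependencies: dict):
    """Run tasks in dependency order.

    tasks maps a task name to a callable taking the shared context dict.
    dependencies maps a task name to the set of tasks that must run first.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    context = {}
    for name in order:
        # Each task can read the results of the tasks that ran before it.
        context[name] = tasks[name](context)
    return order, context


# Hypothetical tasks for an incoming article.
tasks = {
    "fetch": lambda ctx: "<article/>",
    "validate": lambda ctx: ctx["fetch"].startswith("<"),
    "convert": lambda ctx: ctx["fetch"].upper(),
    # Stand-in for emitting a notification to the event bus.
    "notify": lambda ctx: f"workflow finished, valid={ctx['validate']}",
}
dependencies = {
    "validate": {"fetch"},
    "convert": {"validate"},
    "notify": {"convert"},
}

order, context = run_workflow(tasks, dependencies)
```

In Airflow the same shape would be declared as a DAG of operators, with the scheduler (or the parent service's trigger) deciding when it runs; a cycle in the dependencies is an error in both cases.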

Concerns

Airflow is an opinionated choice, but far less so than other existing options which in addition, fail to meet our requirements criteria in some way
Can the communication method between a service and Airflow be separately specified, allowing for Airflow to be replaced with other implementations?
Airflow is primarily geared towards scheduled workflows and currently only has an experimental REST API for external interaction, though this can be extended via Airflow’s plugin system
An understanding of Airflow’s key concepts is required to actually create and manage workflows

RFC: Message schemas

Problem

Libero needs to allow for services to communicate with each other (either through the API or the event bus), and these messages need a structure.

Suggestion

Use RELAX NG as a schema for the messages (both the HTTP API and the event bus)
Require the absolute minimum, and provide extensions for common concepts and allow custom extensions (including embedding other schemas, eg JATS)
Provide clear mappings between parts of other standards (eg JATS+JATS4R, TEI) and Libero definitions
Use XML namespaces instead of versioning schemas
- A breaking change requires a new namespace, the old method remaining available but deprecated (ie schemas are immutable)
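
A consumer-side sketch of "namespaces instead of versions" might look like the following. The namespace URIs and element names are invented for illustration and are not Libero's actual schema. The point is that a reader dispatches on the namespace of the root element, so the old structure keeps working after a breaking change is published under a new namespace. The example also shows the kind of mixed content (text interleaved with inline elements) that XML handles naturally.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the second one represents a breaking change.
NS_OLD = "http://example.org/libero/item"
NS_NEW = "http://example.org/libero/item-2"


def root_namespace(element: ET.Element) -> str:
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    if element.tag.startswith("{"):
        return element.tag[1:].partition("}")[0]
    return ""


def read_title(xml_text: str) -> str:
    """Extract the title text from either message structure."""
    root = ET.fromstring(xml_text)
    namespace = root_namespace(root)
    if namespace == NS_OLD:
        title = root.find("l:title", {"l": NS_OLD})
    elif namespace == NS_NEW:
        # In the hypothetical new structure the title moved under <front>.
        title = root.find("l:front/l:title", {"l": NS_NEW})
    else:
        raise ValueError(f"Unsupported namespace: {namespace!r}")
    if title is None:
        raise ValueError("Message has no title")
    # itertext() flattens mixed content such as 'A <i>mixed</i> title'.
    return "".join(title.itertext())


old_message = f'<item xmlns="{NS_OLD}"><title>A <i>mixed</i> title</title></item>'
new_message = f'<item xmlns="{NS_NEW}"><front><title>A <i>mixed</i> title</title></front></item>'
```

Validation against a RELAX NG schema is not shown here, since the standard library has no RELAX NG validator.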

Concerns

JSON is easier to work with, but doesn’t handle mixed content
RELAX NG is an opinionated choice, but other existing options/standards don’t meet our requirements (XML Schema is common, as are DTDs in publishing)
- XML Schema is more common than RELAX NG, but doesn’t have the same level of support for extensions
Non-eLife usage so far appears to be purely for JATS content (though this doesn’t meet eLife’s need)
Reinventing the wheel?
Is an immutable schema overcommitting for Libero at an alpha/beta stage?

RFC: Continuous Delivery

Problem

Libero needs continuous feedback on pull requests and release candidates, in terms of:

Style checks, static analysis and similar code-oriented tools being run
Containers and other artifacts being built successfully
Project, end-to-end (for components declared compatible by semantic-versioning), performance, etc. tests run successfully
Integration with multiple components versions (e.g. Python, PHP, Symfony, …)
Deployment to key target platforms being successful (e.g. container-based or not; GKE, EKS, AKS, on-premise)

The maintenance requirements are:

Being able to modify a build definition from source control (some Git repository) to foster contributions and reviews
Due to the microservice orientation, being able to extract tools and build steps from different projects to reduce duplication
If not necessary to run custom servers, avoiding operations work by outsourcing management of the infrastructure

Suggestion

Provide Travis CI builds for all projects, including both code and infrastructure needed to test it.

Concerns

Limited capability to customize the environment in which builds run, requiring tools to be installed on the fly on every build that needs them.
Duplication of build patterns and steps across different repositories is difficult to remove. Build files generation may be able to mitigate this.
The level of performance provided is not under our control, both in terms of underlying resources of a single build, reuse of resources created by previous builds, and number of concurrent builds that can be run.

Other evaluated solutions (comparison table):

Circle CI is less popular than Travis CI but has a promising model for customization of build environments (container-based). It has however a setup using remote Docker that makes it difficult to build with docker-compose e.g. mounting local volumes.
Jenkins and TeamCity are at their core unmanaged solutions oriented to private projects, which create the need for maintenance, opening and securing of a new build platform.
GoCD is subpar in the Pipeline as code area, requiring manual UI configuration or non-standard solutions to manage build definitions through source control.
Concourse is a niche product, still immature and tied to the Pivotal ecosystem.

RFC: Start development in one language

This is my personal suggestion as a community member, and not directly from eLife.

Background

eLife built their microservices using both PHP and Python. The division was based purely on who was writing each service.
For Libero, eLife proposed realigning language use so that PHP is for browser-facing services, and Python for APIs.
There’s no requirement to use the Libero code: it’s specification-first, so Libero-namespaced code doesn’t have to be used (Hindawi have mentioned using PHP/Drupal for APIs and JavaScript for browser-facing services).
Long-term Libero should at least provide adapters (client libraries) for different languages.
One of the reasons for using microservices is the ability to (responsibly) use different languages. For eLife, and the known potential uses of Libero, there is no specific reason to use PHP or Python for a particular service: neither holds clear benefits over the other.
eLife doesn’t currently have an in-house Python specialist (currently hiring); the team overall has more PHP knowledge.
Choosing any language to develop in is opinionated: developers/groups outside eLife who want to get involved with or use Libero won’t necessarily know either language.
PHP is well known in the publishing world due to OJS and Drupal usage.
A small amount of code was written in both languages during the recent sprint.

Problems

Starting Libero in two languages will see a lot of duplication of code (eg API client, Event Bus client, logging, coding standards).
Current lack of Python specialism in the eLife office.
For a group who currently don’t use either language (eg RSC and Hindawi), it’s two to learn (if not rewriting).

Suggestions

Solely use PHP for Libero-namespaced code until at least the MVP is released.
Look at creating adapters in other languages (starting with Python) afterwards.
Be willing to use other languages when there is a suitable reason.

Concerns

Might accidentally create a monoculture, or at least an impression of one.
Harder for the wider eLife team to contribute.
Unknown impact outside of eLife (might block contributions from some areas, though could help in others).
PHP has an image problem, though this isn’t justified with ‘modern’ PHP.
