community's Introduction

Libero Publisher

Getting Started

Libero Publisher consists of a number of services, libraries and components, so the best place to start is the sample-configuration repository. It contains a simple guide to cloning the latest containers and running them locally using docker-compose.

https://github.com/libero/sample-configuration

eLife use TravisCI for automated deployments to AWS instances (although currently as a single instance). The configuration and scripts for that can be found in the environments repository.

https://github.com/libero/environments

For further technical assistance please join the conversation on Slack at https://libero.pub/join-slack/

To find out more about our MVP please see below and look at the high level progress/plan on our roadmap at https://elifesci.org/roadmap

Libero Publisher MVP Hypothesis statement

eLife's mission is to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. With Libero Publisher, we want to have an impact on scholarly publishing and demonstrate that our technology is reusable. We will know if there is a potential market fit for Libero Publisher when our MVP has confirmed the following 3 hypotheses:

Hypothesis #1

A simple journal can adopt Libero to publish their articles once our MVP is available.

Hypothesis #2

Libero provides a platform that can be extended so that larger, more complex journals can publish their articles using the platform.

Hypothesis #3

Libero can help journals lower their infrastructure and maintenance costs compared to dominant commercial platforms.

Definitions of a simple or complex journal

To help define a simple or complex journal, we have looked at the differences in the technology that Libero would require to publish and host different types of journal.

A simple journal is defined as having a low throughput of articles (less than 1 per week), a low number of visitors (less than 50,000 a month) and scholarly content that is uncomplicated in nature, so it supports only text, images, tables and mathematical formulas. Articles are grouped by research category and one other way (collections, issues etc.).

Examples might include The International Journal of Microsimulation, Bioinorganic Chemistry and Applications or small society journals.

A complex journal is defined as having higher throughput (greater than 25 articles per week), a significant number of visitors (greater than 500,000 a month) and scholarly content that is more complicated, containing items like videos, IIIF images and custom sections (like plain language summaries or decision letters). Articles are supported by other editorial content, blog posts and news, and might be grouped in multiple ways including special editions, cross discipline collections or through specific channels.

Examples might include eLife, The BMJ or large society journals.

community's People

Contributors

bluerezz, davidcmoulton, diversemix, erkannt, giorgiosironi, nuclearredeye

Forkers

davidcmoulton

community's Issues

RFC: API

Problem

Libero needs to allow for services to read and modify data in other services.

Suggestion

  • Use standard HTTP methods (GET, POST, PUT, DELETE) as a mechanism to read and modify content
  • Adhere to HTTP, eg:
    • meaningful status codes
    • Content encoding negotiation
    • Language negotiation
  • Use RAML to describe APIs
  • Split entities into multiple endpoints, rather than having monolithic responses
    • eg separate article front matter, body, reference list
  • Contract first rather than implementation
    • Allows replacement of implementations
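
As an illustration of the "Language negotiation" point above, here is a minimal sketch in Python using only the standard library. The function names are hypothetical and not part of any Libero API, and the matching is deliberately simplified (exact tag, then primary language); a real service would more likely rely on its framework's negotiation support.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    choices = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        choices.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags of equal quality keep their header order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate_language(header: str, available: List[str]) -> Optional[str]:
    """Return the best available language, or None if nothing is acceptable."""
    lookup = {lang.lower(): lang for lang in available}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag == "*":
            return available[0] if available else None
        if tag in lookup:
            return lookup[tag]
        # Fall back from a regional tag such as "en-GB" to its primary "en".
        primary = tag.split("-")[0]
        if primary in lookup:
            return lookup[primary]
    return None


print(negotiate_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"]))
```

When the function returns None, the caller decides what to do, for example serving a default language or responding with 406 Not Acceptable.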

Concerns

  • RAML is an opinionated choice, but other existing options/standards don’t meet our requirements (eg OpenAPI)
    • eg OpenAPI is purely JSON focused, and doesn’t allow for composing schemas
  • There doesn’t appear to be a way to actually use RAML during testing, which reduces its value a lot

RFC: High-level architecture

Problem

Libero will be a framework for producing publication and presentation platforms. It needs to:

  • scale up and down
  • allow for a high degree of customisation and extension: allowing users to share features, as well as catering for the bespoke
  • be deployable anywhere
  • easily integrate with other systems

Suggestion

  • Use a (micro)service-oriented architecture
    • Encapsulate features in separate applications
  • Overall choreography, local orchestration
    • Communication between services is decoupled and processing is asynchronous
    • Inside services it can be neither
  • Agnostic to where content originates and is owned
    • Source of truth can be elsewhere
  • Specify communication methods and integration points, allowing for extension
    • Require a small core of structure and data
    • Provide modules for common features
    • “Build your own”
  • Allow integration with external systems
    • API
    • Event bus
    • Message schema
  • Use existing standards where possible

Concerns

  • Distributed systems are more complex to understand, develop and run
    • Barrier to entry
    • Observability: distributed logging and tracing
  • Too-high level of abstraction provides little value
  • Yet another data format to define and translate data into to satisfy the user experience

RFC: Community and Code Basics

Problem

We need to define some basic processes and conventions early so that engagement in the community and code is easier. We have learned a lot from helping people reuse software at eLife but also from our work with Hindawi, YLD, ThinSlices and the Coko Foundation on our other products. We propose the following as a starting point based on the output of a team session here at eLife. Feedback welcome.

Suggestion

Basics

  • Libero is the name of the platform of applications, tools and services
  • Individual components, applications, tools and services will have their own name that is descriptive and relevant to the solution they are providing
  • The name Continuum is no longer in use
  • Libero and some eLife specific applications power eLife's Journal
  • Libero will be composed of parts of the existing system that powers eLife's Journal along with new components and changed versions of those existing components
  • Libero is a suite of defined interfaces, libraries and applications. The applications are built from the set of Libero libraries, and can freely be replaced
  • Use Github
  • Use the libero Github organisation
  • eLife are stewards of the code and community but changes are all collaborative

Decision Making

  • Is done in the open, on github, using issues and pull requests as a way to make suggestions and discuss solutions before making a decision
  • Larger decisions are done using a Request for Comments (RFC) process, using issues and a format similar to this RFC.
  • The community will endeavour to meet at least four times per year

Code and Deployment

  • MIT licensed by eLife Sciences Publications, Ltd as stewards of the community and code
  • Primary languages are PHP for browser-facing applications and Python for the rest. JavaScript and SASS in the browser.
  • Unit testing is required, TDD is preferred
  • All libraries, components, applications, services and APIs should be versioned
  • A "Container-first" approach to sharing and deployment
  • Deployment and tools are platform agnostic (i.e. not only AWS)

High Level Feature Considerations

  • Versioned content is required
  • Previewable content is required
  • Multiple content and front-end languages are going to be required
  • Custom content is required
  • Event reporting and auditing

Concerns

We are concerned that laying out these ideas now might seem like eLife want to dictate the direction of the product and community. This is not the case, and we hope that this RFC is taken in the spirit in which it is intended: to provide a useful starting point.

RFC: Message schemas

Problem

Libero needs to allow for services to communicate with each other (either through the API or the event bus), and these messages need a structure.

Suggestion

  • Use RELAX NG as a schema for the messages (both the HTTP API and the event bus)
  • Require the absolute minimum, and provide extensions for common concepts and allow custom extensions (including embedding other schemas, eg JATS)
  • Provide clear mappings between parts of other standards (eg JATS+JATS4R, TEI) and Libero definitions
  • Use XML namespaces instead of versioning schemas
    • A breaking change requires a new namespace, the old method remaining available but deprecated (ie schemas are immutable)
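
To make the "namespaces instead of versions" idea concrete, the following is a rough sketch in Python of a consumer that accepts both a current and a deprecated message shape by looking at the root element's namespace. The namespace URIs, element names and the read_title helper are all hypothetical; the RFC does not define any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this sketch only.
NS_CURRENT = "http://example.org/schema/item-2"
NS_DEPRECATED = "http://example.org/schema/item"


def read_title(xml_text: str) -> str:
    """Read the title from a message in either the current or deprecated schema."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes qualified names as "{namespace-uri}local-name".
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace == NS_CURRENT:
        # Assumed breaking change: the title moved inside a <meta> element.
        title = root.find("m:meta/m:title", {"m": NS_CURRENT})
    elif namespace == NS_DEPRECATED:
        title = root.find("m:title", {"m": NS_DEPRECATED})
    else:
        raise ValueError("unsupported message namespace: %r" % namespace)
    if title is None or title.text is None:
        raise ValueError("message has no title")
    return title.text


old = '<item xmlns="http://example.org/schema/item"><title>Old shape</title></item>'
new = ('<item xmlns="http://example.org/schema/item-2">'
       '<meta><title>New shape</title></meta></item>')
print(read_title(old))
print(read_title(new))
```

Because each schema is immutable, the deprecated branch never needs to change; it can simply be deleted once no producer emits the old namespace.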

Concerns

  • JSON is easier to work with, but doesn’t handle mixed content
  • RELAX NG is an opinionated choice, but other existing options/standards don’t meet our requirements (XML Schema is common, as are DTDs in publishing)
    • XML Schema is more common than RELAX NG, but doesn’t have the same level of support for extensions
  • Non-eLife usage so far appears to be purely for JATS content (though this doesn’t meet eLife’s need)
  • Reinventing the wheel?
  • Is an immutable schema overcommitting for Libero at an alpha/beta stage?

RFC: Node.js server side frameworks

Problem

As of January 2020, Libero Publisher uses Koa, while Libero Reviewer uses NestJS and Express. There is currently no documentation or ADR recording the reasons for those choices, or whether we should continue with them.

Suggestion

  • The Reviewer team initially chose NestJS as it has GraphQL support as well as other useful features that ship as part of the framework. However, it seems to be proving boilerplate heavy, and it might lessen the cognitive load on developers to drop it in favour of one framework (e.g. Express). Third party packages can be used to provide the features that NestJS provides, such as GraphQL, logging and security.
  • Compare Express and Koa and determine if we should keep using both and if so highlight reasons for using one or the other. Ideally we would want to keep using one framework per product.

RFC: JATS support

Problem

Libero's data model is planned to support schemas like JATS (see #11), but developing solely based on Libero's native schema is slow (as it doesn't exist yet). We have to model all the possibilities, which is not amenable to rushing (especially as we don't want to version it). For example, libero/publisher#5 would require a lot of schema work.

Users like eLife will need non-JATS content, but IJM only have scholarly content and have already investigated converting their archive to JATS.

Suggestion

  • Commit to supporting JATS now and prioritise it.
  • Continue to build-up Libero's schema, based on the JATS support that is implemented (ie as non-blocking, possibly follow-up, work).

Concerns

  • What JATS to support. JATS4R have been making progress on recommendations but isn't comprehensive and might still be too open. DAR seems too strict.
  • How to support multiple versions of JATS (and flavours?), including the rumoured 2.0.
  • How to handle assets. eLife XML just references a TIF (without an actual URI), whereas we'd want a IIIF endpoint.
  • Complexity of supporting multiple schemas across all services. (This is an existing concern, but doing it now does bring it to the forefront.)

Application Architecture starting point

In the Initial discussions issue we suggested ideal general software architecture requirements. What I don't see defined yet is the starting point for our component architecture. What do we plan our components and processes to look like: workflows, dashboard, publication process? I believe we have eLife's process as a starting point, while being flexible about it. I would suggest publishing a starting software architecture diagram containing a smaller set of applications that will support an initial version of Libero.

RFC: Start development in one language

This is my personal suggestion as a community member, and not directly from eLife.

Background

  • eLife built their microservices using both PHP and Python. The division was based purely on who was writing each service.
  • For Libero, eLife proposed realigning language use so that PHP is for browser-facing services, and Python for APIs.
  • There’s no requirement to use the Libero code, it’s specification-first so Libero-namespaced code doesn’t have to be used (Hindawi have mentioned using PHP/Drupal for APIs and JavaScript for browser-facing services).
  • Long-term Libero should at least provide adapters (client libraries) for different languages.
  • One of the reasons for using microservices is the ability to (responsibly) use different languages. For eLife, and the known potential uses of Libero, there is no specific reason to use PHP or Python for a particular service: neither holds clear benefits over the other.
  • eLife doesn’t currently have an in-house Python specialist (currently hiring); the team overall has more PHP knowledge.
  • Choosing any language to develop in is opinionated: developers/groups outside eLife who want to get involved/use Libero won’t necessarily know either language.
  • PHP is well known in the publishing world due to OJS and Drupal usage.
  • A small amount of code was written in both languages during the recent sprint.

Problems

  • Starting Libero in two languages will see a lot of duplication of code (eg API client, Event Bus client, logging, coding standards).
  • Current lack of Python specialism in the eLife office.
  • For a group who currently don’t use either language (eg RSC and Hindawi), it’s two to learn (if not rewriting).

Suggestions

  • Solely use PHP for Libero-namespaced code until at least the MVP is released.
  • Look at creating adapters in other languages (starting with Python) afterwards.
  • Be willing to use other languages when there is a suitable reason.

Concerns

  • Might accidentally create a monoculture, or at least an impression of one.
  • Harder for the wider eLife team to contribute.
  • Unknown impact outside of eLife (might block contributions from some areas, though could help in others).
  • PHP has an image problem, though this isn’t justified with ‘modern’ PHP.

RFC: Discovery documents

Problem

Libero product features and architectural choices often go through a period of discovery where specifications can be produced and implementation options can be evaluated.

The process and its output can benefit from the standardization of discovery documents that enable collaboration by being shared between team members, across teams, and outside of eLife.

Suggestion

  • Always track discovery work in at least one dedicated Github issue.
  • Prefix Google Docs discovery documents titles with Discovery document:
  • Move them into the single Libero shared drive.
  • Link them from the relevant Github issue.
  • Establish a timebox in calendar time at the start of the discovery activity.
  • Organize in questions and answers; some of the answers are allowed to be assumptions if clearly stated as such.
  • Link to a Github milestone if applicable.
  • Add a TL;DR section at the top before closing the document at the end of the timebox.

Concerns

  • A timebox helps limit analysis time to favor prototyping and actual implementation.
  • Replenishment meetings are a possibility for choosing the length of the timeboxes.
  • Timeboxes are measured in calendar time so they need to take into account the percentage of time that is allocated to other tasks from the people that work on the discovery.
  • Github milestones may not exist if the discovery is related to an aspect of the product or to architecture rather than to a particular feature.
  • There is no explicit way to track the closure of a document on the document itself.

RFC: Migrate away from Texture

Problem

Developing new features for Libero Editor is proving significantly more difficult than it should be, and the development team are finding that implementing new functionality in the tool takes increasingly more effort.

The problems that the development team have encountered are as follows:

  • Lack of documentation or sample code for both Texture and the Substance library that it is built upon.
  • Very little use of existing libraries or frameworks means large amounts of code must be written to achieve trivial things.
  • The code base is overly abstracted, which makes it hard to follow or understand. This is particularly an issue when trying to implement a complex new feature.
  • The tool cannot be configured without making changes to the code.

These problems have given rise to the following concerns:

  • Lack of active development on either the Substance library or Texture will result in a large maintenance burden on the development team.
  • Unable to make use of existing frameworks, e.g. React, without making significant alterations to the underlying code.
  • The ease at which new developers can be on-boarded and make contributions to the project.
  • Implementation of complicated new functionality, e.g. Tracked Changes or Collaborative editing, will be extremely costly in effort to achieve.

Suggestion

After evaluation of other potential solutions, we recommend to restart development of the editor using ProseMirror.

We believe that actioning the above addresses the problems raised as follows:

  • ProseMirror has been proven in production environments, most notably in Atlassian’s Atlaskit. It is also used in several other smaller projects, such as Manuscripts.io, including many that are currently in development.
  • ProseMirror is still actively developed, and has an active community around it.
  • ProseMirror is well documented, with plenty of documentation discussing architecture and design, lots of sample code on how to implement various bits of functionality as well as TypeScript definition files.
  • Integrates well with existing frameworks, e.g. React.
  • There is plenty of existing code and plugins out there that we can potentially leverage to accelerate development and contribute back to. E.g. the Manuscripts.io developers have already developed a JATS plugin.
  • Complex functionality, for example tracked changes and collaborative editing, is significantly easier to implement.

Concerns

Although we have had success in rapidly prototyping with ProseMirror, there is a risk that developing a new editor from scratch will take longer than planned or than is considered ideal. We have mitigated this risk and feel confident that this is unlikely to occur, but it is not 100% risk free.

Initial Issues for Discussion: Technology

Problem

We have identified a number of specific areas that require discussion within the community, and probably some investigation or experimentation, before an RFC can be created. Below are the areas suggested for further discussion with some initial thoughts.

Suggestion

Components versus Software

  • Components are more complicated but allow for flexibility
    • Everything (or most things) can be optional
    • Things can be replaced if you need access to legacy content, for example
  • Provide ready-made apps/containers with the core modules enabled and the ability to change the basics through configuration
    • Front-end probably won't be very reusable: as soon as you want your own design or an extra page, components would be needed

Choice of Languages and Frameworks

  • Opted for PHP and Python
  • PHP for web frontend, Python for backends
  • We don't want adopters to have to know too many languages
  • Open source languages that can be deployed on Linux
  • Relatively easy to find developers for these languages
  • How do we choose frameworks and libraries?
  • Best tool for each job versus having too many tools
  • Define an approach to evaluating third-party libraries

Making Everything Flexible

  • Flexible, composable, extensible - define these and decide which are most useful, how they overlap etc. in this context
  • "Build your own schemas"
    • Core is very small
    • Based on standards but not coupled to them
    • Should be easy to add your own or replace existing with your own

Versioning of API Calls and Backwards Compatibility

  • SemVer everywhere including the front-end?
    • Gives stability over features
    • Version things together or individually?
  • Can API call versioning be avoided?
    • Changing your data structure would break backwards compatibility
    • What about upgrading an extension that has a break in backwards compatibility?
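
For discussion, here is a small Python sketch of what "SemVer everywhere" would let a consumer check automatically: whether moving from one version to another should be backwards compatible. The helper names are hypothetical, and the sketch ignores pre-release precedence and build metadata, which Semantic Versioning also defines.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the MAJOR.MINOR.PATCH core, ignoring pre-release and build parts."""
    core = version.split("+")[0].split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_backwards_compatible(current: str, candidate: str) -> bool:
    """True if upgrading from current to candidate should not break consumers."""
    cur = parse_version(current)
    cand = parse_version(candidate)
    if cur[0] == 0 or cand[0] == 0:
        # Under SemVer, 0.y.z is initial development: anything may change.
        return cur == cand
    # Same major version, and not a downgrade.
    return cand[0] == cur[0] and cand >= cur


print(is_backwards_compatible("1.2.3", "1.3.0"))
print(is_backwards_compatible("1.2.3", "2.0.0"))
```

A check like this only helps if a breaking change to a data structure or an extension is always released as a new major version, which is the open question raised above.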

Release Schedules

  • Should we have LTS versions?
    • For individual projects or as a whole?
    • e.g. Node.js 6.x/8.x vs Ubuntu 18.04
  • Do we cut releases for components or working software?
  • How does Docker play a role in this?

Issue Tracking

Merge Policy

  • Automated Restrictions
  • People based restrictions - e.g. author cannot merge their own requests

Scalability

  • The applications will need to scale up and down for different potential users
  • The scaling for each application will vary based on use case
    • One publisher might have low traffic but publish many articles
    • One publisher might have very popular articles but publish very little
    • One publisher might publish infrequently and have few visitors

Ease of Operation

  • How many people do you need to look after the platform when it is running?

Support Channels

  • A few options
    • Issues
    • Slack, Mattermost, Gitter
    • Stack Overflow / Stack Exchange
    • Google Group / Mailing list
    • In person - who, how often, how are costs covered?

Concerns

  • There are too many issues to discuss as an RFC, but splitting them may result in too many RFCs and low levels of engagement.
  • RFCs may also be interrelated in an initial stage, so it can be difficult to decide everything independently from everything else
  • A successful approach is to appoint "working groups" on particular topics, but this can be costly in terms of time

RFC: Continuous Delivery

Problem

Libero needs continuous feedback on pull requests and release candidates, in terms of:

  • Style checks, static analysis and similar code-oriented tools being run
  • Containers and other artifacts being built successfully
  • Project, end-to-end (for components declared compatible by semantic-versioning), performance, etc. tests run successfully
  • Integration with multiple components versions (e.g. Python, PHP, Symfony, …)
  • Deployment to key target platforms being successful (e.g. container-based or not; GKE, EKS, AKS, on-premise)

The maintenance requirements are:

  • Being able to modify a build definition from source control (some Git repository) to foster contributions and reviews
  • Due to the microservice orientation, being able to extract tools and build steps from different projects to reduce duplication
  • Where running custom servers is not necessary, avoiding operations work by outsourcing management of the infrastructure

Suggestion

Provide Travis CI builds for all projects, including both code and infrastructure needed to test it.

Concerns

  • Limited capability to customize the environment in which builds run, requiring tools to be installed on the fly on every build that needs them.
  • Duplication of build patterns and steps across different repositories is difficult to remove. Build files generation may be able to mitigate this.
  • The level of performance provided is not under our control, both in terms of underlying resources of a single build, reuse of resources created by previous builds, and number of concurrent builds that can be run.

Other evaluated solutions (comparison table):

  • Circle CI is less popular than Travis CI but has a promising model for customization of build environments (container-based). It has however a setup using remote Docker that makes it difficult to build with docker-compose e.g. mounting local volumes.
  • Jenkins and TeamCity are at their core unmanaged solutions oriented to private projects, which create the need for maintenance, opening and securing of a new build platform.
  • GoCD is subpar in the Pipeline as code area, requiring manual UI configuration or non-standard solutions to manage build definitions through source control.
  • Concourse is a niche product, still immature and tied to the Pivotal ecosystem.

RFC: Container Image Tagging Convention

Definition of Done:

  • ADR with proposed state is merged and can be applied to Libero Reviewer for evaluation

Related

Originating discovery issue elife/5482

Tasks

  • define tag lifecycle from component to umbrella and from dev to release
  • define tag naming and versioning convention
  • check for obvious implementation roadblocks when using GitHub Actions, Dependabot/Renovate/Flux
  • collect feedback from product teams

RFC: Event bus

Problem

Libero needs to allow for disparate services to indirectly communicate with each other, so they can monitor and react to data changes.

Suggestion

  • Use RabbitMQ (AMQP 0.9.1 standard + publisher confirms extension) to allow for services to reliably emit notifications that something has happened
  • Define a standard for routing keys
    • Namespacing
  • Services can act both in an upstream and downstream role
    • Upstream services own and send messages to their ‘topic’ exchange(s)
    • Downstream services bind their queue(s) to one or more exchanges
    • Downstream services consume messages from their queue(s)
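
To show how a routing key standard and 'topic' exchanges fit together, here is a minimal Python sketch of AMQP 0.9.1 topic matching, where "*" in a binding key stands for exactly one word and "#" for zero or more words. It does not talk to RabbitMQ (the broker performs this matching itself), and the routing keys used are hypothetical examples, not a proposed convention.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if a topic binding key matches a message's routing key."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            # Pattern exhausted: match only if every word was consumed too.
            return w == len(words)
        if pattern[p] == "#":
            # "#" may absorb zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


# Hypothetical namespaced routing keys: <entity>.<id>.<event>
print(topic_matches("article.*.published", "article.1234.published"))
print(topic_matches("article.#", "article.1234.version.2"))
print(topic_matches("article.*", "article.1234.published"))
```

A downstream service interested in every article event would bind its queue with a key like "article.#", while one that only reacts to publication would bind with "article.*.published".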

Concerns

  • RabbitMQ is an opinionated choice, but other existing options/standards don’t meet our requirements (AMQP has two flavours, and doesn’t have a performant way of confirming publication)

RFC: Workflows

Problem

Libero needs to allow for services to perform groups of related tasks based on incoming data in the form of workflows.

The key criteria for a potential workflow system is:

  • avoid dependency on a specific service provider
  • minimal required system configuration
  • easy to deploy
  • has a developer friendly API to create workflows and tasks

Suggestion

  • Implement workflows using Airflow DAGs
  • Use Airflow as a standard solution for executing DAGs and their tasks
  • An Airflow instance must have a parent service and will be owned by this service
  • An Airflow instance will only have its DAGs triggered by its parent service
  • Airflow will emit notifications to the event bus
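
For readers unfamiliar with the term, a DAG (directed acyclic graph) describes tasks and the tasks each one depends on, so that a task only runs after its upstream tasks have finished. The Python sketch below illustrates that ordering with Kahn's algorithm and the standard library only. It is not Airflow's API, and the task names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(upstream: Dict[str, Set[str]]) -> List[str]:
    """Order tasks so each runs after everything it depends on (Kahn's algorithm).

    `upstream` maps a task name to the set of tasks that must finish first.
    """
    tasks = set(upstream)
    for deps in upstream.values():
        tasks |= deps
    remaining = {task: set(upstream.get(task, ())) for task in tasks}
    # Start with tasks that have no dependencies; sort for a stable order.
    ready = deque(sorted(t for t, deps in remaining.items() if not deps))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        newly_ready = []
        for other, deps in remaining.items():
            if task in deps:
                deps.remove(task)
                if not deps:
                    newly_ready.append(other)
        ready.extend(sorted(newly_ready))
    if len(order) != len(tasks):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical ingestion workflow: task -> tasks it waits for.
workflow = {
    "validate": {"fetch"},
    "convert": {"validate"},
    "store-assets": {"validate"},
    "notify": {"convert", "store-assets"},
}
print(execution_order(workflow))
```

In this sketch "notify" stands in for the final step that would emit a notification to the event bus once the other tasks have completed.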

Concerns

  • Airflow is an opinionated choice, but far less so than other existing options, which in addition fail to meet our requirements in some way
  • Can the communication method between a service and Airflow be separately specified, allowing for Airflow to be replaced with other implementations?
  • Airflow is primarily geared towards scheduled workflows and currently only has an experimental REST API for external interaction, though this can be extended via Airflow’s extensible plugin system
  • An understanding of Airflow’s key concepts is required to actually create and manage workflows

Define RFC for Continuous Delivery

Currently the walking skeleton does not define an automated build to validate pull requests, nor does it deploy to a production-like environment. A set of tools and technologies to achieve this has to be evaluated and proposed, to be put in place for the August sprint.
