Keep an eye out for user information in the log output.
Running with production config
Running with simulation should be used in most situations. However, there are times when using the production config is necessary for debugging, e.g. specific Auth0 configuration issues.
With access to the Frontside 1Password vault and the 1Password CLI, you can inject the secrets into a .gitignored config file and use the yarn dev:config command to pick up these values.
op inject -i app-config.1password.yaml.tpl -o app-config-credentials.yaml
Running in minikube
TAG=whatever
yarn install
yarn tsc
yarn build
yarn build-image --tag backstage:$TAG
minikube start
eval $(minikube docker-env)
minikube image load backstage:$TAG
# TODO: modify the charts so that the following steps are automated, but for now please do the following:
* comment out ./backstage/templates/certificate.yaml
* move ./backstage/templates/secrets.yaml to another directory
* comment out `volumeMounts` and `volumes`
* modify the container command to exclude `app-config.production.yaml`
PG=whatever
helm upgrade --install min-postgres-chart ./charts/postgres \
-f ./charts/postgres/Values.yaml \
--set postgresUsername=$PG \
--set postgresPassword=$PG
POSTGRES_SERVICE_PORT=5432 POSTGRES_USER=$PG POSTGRES_PASSWORD=$PG helm upgrade --install min-backstage-chart ./charts/backstage \
-f ./charts/backstage/Values.yaml \
--set backstageImage=backstage:$TAG \
--set baseUrl=http://localhost:7007
kubectl port-forward svc/backstage 7007:80
In an effort to bring visibility into the Backstage server process, we need to be able to see the Effection Inspector in Backstage. We can do this by creating a Backstage plugin that renders the component provided by the @effection/inspector-ui package.
The Incremental Ingestion Backend should only attempt to insert marks when there are entities to mark. In our closed-source version, we added the following code
Sometimes you want to modify a value that comes from the backstage catalog. You would normally add a resolver for that specific field, but you can't: if you use the @field or @relation directives on that field, the graphql plugin adds its own resolver, and your resolver will conflict with it.
Approach
We can add pre and post hooks that duplicate schema structure:
Here you can see that in the pre.Entity hook we add a new field, id, to the entity. There is no reason to add field-specific pre-hooks, though it may be better to rename them to interfaces and fields hooks. For the post.Component.tag field, we change the type from string to { id: string, label: string }, and the new value is then used as the result of the resolver. The idea is to use pre hooks for adding/removing fields on the source object and post hooks for transforming a specific field's value.
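The hook mechanics described above can be sketched in a few lines. This is an illustrative sketch only: the `applyHooks` helper and the exact `pre`/`post` hook shapes are assumptions, not the plugin's actual API.

```typescript
// Illustrative sketch only: the hook shape and the applyHooks helper are
// assumptions, not the plugin's real API.
type Entity = Record<string, any>;

interface Hooks {
  // pre hooks reshape the whole source object before field resolution
  pre?: Record<string, (entity: Entity) => Entity>;
  // post hooks transform the resolved value of a specific field
  post?: Record<string, Record<string, (value: any) => any>>;
}

function applyHooks(kind: string, source: Entity, hooks: Hooks): Entity {
  const pre = hooks.pre?.[kind];
  const result: Entity = pre ? pre(source) : { ...source };
  const fieldHooks = hooks.post?.[kind] ?? {};
  for (const [field, transform] of Object.entries(fieldHooks)) {
    if (field in result) result[field] = transform(result[field]);
  }
  return result;
}

const hooks: Hooks = {
  // pre.Entity: add a new id field to every entity
  pre: { Entity: e => ({ ...e, id: `${e.kind}:${e.name}` }) },
  // post.Component.tag: turn a plain string into { id, label }
  post: { Component: { tag: tag => ({ id: tag, label: String(tag).toUpperCase() }) } },
};

let entity: Entity = { kind: 'Component', name: 'app', tag: 'web' };
entity = applyHooks('Entity', entity, hooks);    // pre hook adds id
entity = applyHooks('Component', entity, hooks); // post hook reshapes tag
```

The separation keeps the two concerns independent: pre hooks never see resolved values, and post hooks never change the object's shape beyond one field.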
While integrating it into the HP project, I noticed that our graphql plugin has some flaws:
It handles graphql schema directives only at runtime, but we want it to be able to generate correct types from the final schema and also provide the final schema to clients.
Under the hood it uses the catalog client for a data loader, but we have a better and more performant batch loader.
It doesn't allow providing a custom loader either, which is needed when you have multiple sources that aren't ingested into the backstage catalog.
This RFC proposes creating a Backstage plugin for distributing a CLI and a Backstage UI plugin for managing runtime environments, secret management, environment management, and observability tooling. The CLI and Backstage UI will use a pluggable Platform API.
Motivation
A Heroku-like experience in the Cloud Native ecosystem has been an elusive mirage. Those familiar with Heroku often describe it as the experience developers want from their Internal Developer Platforms. Many innovations made the Heroku experience memorable. Developers could create their applications and push code to a Git repository, where the code would be automatically built and deployed to the Heroku platform. When it took months for Ops to set up a database, Heroku gave developers the ability to add one in minutes. Developers could add a database to their application, and the Heroku platform would automatically make the connection string available to the application through environment variables. Developers could manage secrets for their applications that would be added as environment variables without storing sensitive information in the source code. Heroku made running ambitious applications easy and enjoyable. This is why it has been a lasting inspiration for developers building Internal Developer Platforms in the Cloud Native ecosystem.
Many features of the Heroku platform are now standard architecture for Internal Developer Platforms. Most IDPs allow developers to push their code to a Git repository provider such as GitHub, GitLab, or BitBucket. When a new commit is pushed to a branch, it's automatically built by a Continuous Integration system like GitHub Actions, GitLab CI, or BitBucket Pipelines. The CI workflow creates a container image and pushes it to a container registry. A Continuous Deployment service monitors the container registry and automatically deploys the container to the platform. Once the application runs on the platform, it's autoscaled by Kubernetes, which is managed by the platform engineering team. The platform is instrumented with observability tooling that allows developers to see their logs and metrics in tools like Kibana and Prometheus. Mass adoption of Git-based workflows makes it possible to provision resources like databases using configuration stored as code with standards like PAWS (Platform-Agnostic Workload Specification). Modern IDPs can now automatically provision resources based on configurations in a paws.yaml file.
The Cloud Native ecosystem has matured to the point where, in some respects, the developer experience of using the platform is superior to that of Heroku, but there is still a gap. This gap is in the interfaces that developers use to interact with their Internal Developer Platforms. Heroku allowed developers to interact with the platform using a single CLI and a unified UI. The CLI and UI provided parity for the features that were important to developers. A developer could create an application, clone environments, set environment variables, and view logs from the CLI. The CLI experience was project-code-centric: the project's working directory set the CLI's context. When a developer invoked commands from the CLI, those commands automatically assumed the Heroku application based on the code in the working directory.
The experience of interacting with the IDP from the CLI with a context set by the working directory and the ability to perform all of the same operations via a Web UI is the last remaining piece in realizing a Heroku-like experience on Internal Developer Platforms.
The emergence of Backstage as a standard developer portal makes closing this gap feasible. Before Backstage, each company created its portal using a variety of technologies and architectures. Without a consistent architecture, it was difficult to create a reusable toolkit that companies could use off the shelf. Backstage provides a UI framework that can be used to create a reusable user experience in a web application, and a pluggable server architecture that can back both the web application and a CLI tool.
Approach
Features
The most common features of a Heroku-like DX are the ability to manage environments, manage secrets, view logs and metrics, and manage releases.
Managing Environments
Runtime environments for services are an important part of a developer's everyday life. Developers need to see where the application is running, know how to access the service, manipulate secrets, and see logs for each environment. They need to be able to create ephemeral environments that automatically deploy from branches and provision new resources. In all of these use cases, the runtime environment is the context for each activity. This requires Backstage to treat the Runtime Environment as a first-class concept.
Manage Secrets
Most services require credentials and tokens to connect to external integrations. Each platform has a different way of managing these secrets. Some Internal Developer Platforms, especially those going through change, may have multiple secret stores or a different secret store for each environment. Managing these secrets can be a time-consuming and error-prone activity. To simplify the process of managing secrets and reduce errors, developers should be able to manage secrets from the Backstage portal or the CLI.
View logs and metrics
When a service is failing, or when troubleshooting, developers need to be able to find all of the information associated with a service without jumping around different systems. Logs and metrics are critical to debugging and learning about the runtime behaviour of a service. This information needs to be available at their fingertips in Backstage or in the command line interface. They should be able to switch between different components and see the observability data immediately without doing any digging.
Manage releases
Shipping software is perhaps the most important activity in the entire software development lifecycle. How that software makes it to users varies from organization to organization, from team to team, and even from software to software. Empowering developers to ship their software in a reliable and fail-safe way, without introducing unnecessary meetings and obstacles, is perhaps one of the biggest contributors to improving organizational DevOps maturity and the experience of shipping software on an Internal Developer Platform. Backstage and the CLI tool have the opportunity to make it easier for organizations to ship their software by providing a flexible interface to deployment requirements on the Internal Developer Platform.
Architecture
One of the biggest challenges of this problem is that each Internal Developer Platform is different. One platform may use AWS Secrets Manager, and another may use Hashicorp Vault. Depending on the maturity of the IDP, it's not unusual to see two different technologies being used for the same purpose on different projects. Even though the goal is to standardize, no two platforms are alike, and their implementations change over time. This requires designing the architecture in a way that maximizes reuse while remaining flexible to the decisions of each Internal Developer Platform.
We can define a clear interface between the client and the server. The clients - the Backstage UI components and the CLI, will use a clearly defined schema to communicate to the platform API. The platform API will expose an API that implements the schema, but the implementation of each Platform API will be platform specific. For example, regardless of whether the platform uses Hashicorp Vault or AWS Secrets Manager, the UI components and CLI will make the same requests to the Platform API. The Platform API will be responsible for making appropriate calls to the Hashicorp Vault or AWS Secrets Manager, depending on the platform's use.
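As a sketch of this contract, the interface below shows what a secrets slice of the Platform API could look like. The `SecretsApi` name and its methods are assumptions for illustration, with an in-memory implementation standing in for a Vault- or AWS-backed one.

```typescript
// The names below (SecretsApi, InMemorySecretsApi) are assumptions for
// illustration; they are not part of any published schema.
interface SecretsApi {
  getSecret(env: string, key: string): Promise<string | undefined>;
  setSecret(env: string, key: string, value: string): Promise<void>;
}

// Each platform ships its own implementation behind the same interface; an
// in-memory store stands in here for Hashicorp Vault or AWS Secrets Manager.
class InMemorySecretsApi implements SecretsApi {
  private store = new Map<string, string>();
  async getSecret(env: string, key: string): Promise<string | undefined> {
    return this.store.get(`${env}/${key}`);
  }
  async setSecret(env: string, key: string, value: string): Promise<void> {
    this.store.set(`${env}/${key}`, value);
  }
}

// The UI components and the CLI only ever program against SecretsApi,
// regardless of which secret store backs the platform.
async function roundTrip(api: SecretsApi): Promise<string | undefined> {
  await api.setSecret('staging', 'DATABASE_URL', 'postgres://db.internal:5432/app');
  return api.getSecret('staging', 'DATABASE_URL');
}
```

Swapping Vault for AWS Secrets Manager then means swapping the implementation class on the server, with no change to the clients.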
Adapters
Some tools in the Cloud Native ecosystem offer a lot of value when building an Internal Developer Platform. These tools typically provide their Platform APIs and accelerate the creation of a modern IDP. Humanitec is a perfect example of such a tool. It does the hard work of dynamically generating configuration, provisioning resources, and deploying applications. For many companies, platforms like Humanitec make up a big percentage of their Internal Developer Platform. The Adapter API will allow these tools to provide a shortcut to implementing the Platform API. Each Adapter will be able to provide one or more features.
TODO
Research: How do we compile the binary?
Download the deno binary from npm
Checks for existence of compiled files
If not there, compiles the files
How do we authenticate it?
Login command
Open browser
Follow authentication flow
Get token
Store it on file system
Use it to make requests
Implement Commands
version - version of the CLI, URL of Backstage instance, version of Backstage used to compile it
project
info - read catalog info entity idp info [--component]
environments
variables
releases
templates
search
Workflows
Install Plugin
Frontend
Download CLI Page
provide curl command
execute curl command from local machine
Download the binary (make it executable)
Backend
Platform REST API
CLI Commands
version - version of the CLI, URL of Backstage instance, version of Backstage used to compile it
project
info - read catalog info entity
environments
variables
releases
templates
search
Unknowns
How do we make context determination server-configurable?
As a developer working on the Backstage portal, I need to be able to configure how the CLI determines what is the component associated with the current working directory. Ideally, this would happen without having to release a new version of the CLI.
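One possible shape for this, sketched with an assumed `catalog-info.yaml` marker file (the actual lookup rule would be fetched from server configuration rather than hard-coded in the CLI): walk up from the working directory until a marker file is found.

```typescript
// Sketch only: the marker file name (catalog-info.yaml) is an assumption;
// the real rule would come from server configuration. Walk up from cwd to
// the nearest directory that contains the marker file.
function findContextDir(
  cwd: string,
  existingFiles: Set<string>,
  marker = 'catalog-info.yaml',
): string | undefined {
  let dir = cwd;
  for (;;) {
    if (existingFiles.has(`${dir}/${marker}`)) return dir;
    const parent = dir.slice(0, dir.lastIndexOf('/'));
    if (parent === dir || parent === '') return undefined; // reached the root
    dir = parent;
  }
}
```

The CLI would then resolve the component entity from the marker file in the directory it finds, so changing the lookup rule on the server changes CLI behavior without a new release.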
As a developer using the GraphQL API provided by the GraphQL plugin, I need to be able to query many entities with filtering. The standard for querying many entities via a GraphQL API is described in the GraphQL Cursor Connections Specification. We need to allow querying records using a schema and resolvers that conform to the GraphQL Cursor Connections Specification.
Approach
There is nothing stopping someone from implementing the connection specification manually. For example, if there is a type called Repositories with the following schema,
Someone familiar with the GraphQL Cursor Connections Specification could write the following types to make it possible to query repositories owned by a specific user.
Right now, the @field directive only allows mapping primitive types (strings, numbers, arrays) or objects 1:1. There is no way to implement the case above.
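For reference, the resolver-side mechanics of the Connections spec are straightforward to sketch. The helper below is an illustrative in-memory implementation of the edges/pageInfo shape; real cursors are usually opaque (e.g. base64-encoded), which is skipped here for clarity.

```typescript
// Illustrative in-memory implementation of the Connections shape; resolver
// wiring and opaque (base64) cursor encoding are omitted for clarity.
interface Edge<T> { node: T; cursor: string }
interface PageInfo { hasNextPage: boolean; hasPreviousPage: boolean; endCursor?: string }
interface Connection<T> { edges: Edge<T>[]; pageInfo: PageInfo }

const toCursor = (index: number): string => `cursor:${index}`;
const fromCursor = (cursor: string): number => Number(cursor.split(':')[1]);

function connectionFromArray<T>(items: T[], first: number, after?: string): Connection<T> {
  // start just past the `after` cursor, or at the beginning
  const start = after === undefined ? 0 : fromCursor(after) + 1;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ node, cursor: toCursor(start + i) }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      hasPreviousPage: start > 0,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : undefined,
    },
  };
}
```

A repositories(first:, after:) resolver would call something like connectionFromArray over its filtered result set; the point of the proposal is to have the plugin generate this machinery instead of everyone hand-rolling it.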
GitHub is an important source of information for a Backstage portal. It contains all project source code, which is increasingly becoming the source of truth for the configuration of assets in an engineering organization's ecosystem. A GitHub repository contains configurations for CI/CD systems, package dependencies, and deployments, to name just a few examples. The trend of treating a repository as a source of truth will only grow with the adoption of GitOps practices.
Backstage has several mechanisms for interacting with GitHub. The GithubURLReader is used to read one or more files from a repository via the GitHub REST API. Various GitHub processors in the @backstage/plugin-catalog-backend-module-github package are used to read organization information and discover repositories. GitHubEntityProvider can be used to pull groups and users for an organization into Backstage. Each of these mechanisms provides some functionality, but they don't cover all of the use cases for ingesting data into Backstage.
Furthermore, custom processors such as GithubDiscoveryProcessor are being deprecated in favour of Entity Providers. Entity Providers are replacing custom processors for ingestion because the custom-processor ingestion pipeline proved to be inefficient when processing data from large GitHub instances with hundreds of organizations and thousands of repositories. For large GitHub instances, custom processors resulted in long processing delays because the pipeline would attempt to indiscriminately ingest every location regardless of whether the location had new data. Entity Providers are a more scalable approach because they allow reacting to changes in GitHub instances without proactively looking for them.
The most efficient way of ingesting data from large GitHub instances is to trigger an entity provider as a response to a webhook. You can find an example of this in #10. In the most rudimentary form, we trigger the provider's read function when a specific event is triggered. Here is an example of triggering read on GithubOrgEntityProvider when a person is added to a team or an organization.
```ts
router.post('/github/webhook', async (req, _res) => {
  const event = req.headers['x-github-event'];
  if (event == 'membership') {
    await githubOrgEntityProvider.read();
    env.logger.info('Successfully triggered database update via github webhook event');
  }
  // TODO: we should forward requests to smee for local development
});
```
The code in this webhook will have to handle all of the entity providers that interact with GitHub. The complexity of this webhook handler will grow as the number of entity providers increases. It will become increasingly difficult to debug and will inevitably lead to confusion. I would like for us to get ahead of this by introducing an API that makes it easier to write and debug entity providers for the GitHub webhook.
The specific APIs are TBD and @cowboyd will have very good opinions on the subject. I wanted to share some thoughts to get the ball rolling.
Detailed Design
Installation of a GitHub Webhook plugin
The GitHub Webhook Plugin will allow a developer to install the webhook as a regular Backstage plugin. This plugin will mount an express route that will receive events once the webhook is added to an organization in GitHub.
Debug-ability
The goal of this plugin is to make debugging easier by giving developers a way to inspect the behavior of the entity providers that are handling events received by the webhook. Visibility into execution of the webhook will be provided by the Effection Inspector. For Effection Inspector to show execution of the entity providers, each entity provider must be written as an Effection task.
TypeScript types
We want to make it as easy as possible to write strictly typed handlers for these events. The API for extending the webhook handler should use types from https://github.com/octokit/webhooks#importing-types to guide implementors in hooking into the webhook.
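A typed registration API could look roughly like the sketch below. The payload types here are simplified stand-ins and the `WebhookRouter` shape is an assumption; a real implementation would import the event types from @octokit/webhooks as linked above.

```typescript
// The payload types are simplified stand-ins for the real types from
// @octokit/webhooks; WebhookRouter itself is a sketch, not a proposal for
// exact method names.
interface EventPayloadMap {
  membership: { action: 'added' | 'removed'; team: { name: string } };
  push: { ref: string };
}

type Handler<E extends keyof EventPayloadMap> = (
  payload: EventPayloadMap[E],
) => void | Promise<void>;

class WebhookRouter {
  private handlers = new Map<string, Handler<any>[]>();

  // registering a handler for 'membership' type-checks its payload shape
  on<E extends keyof EventPayloadMap>(event: E, handler: Handler<E>): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  async dispatch<E extends keyof EventPayloadMap>(
    event: E,
    payload: EventPayloadMap[E],
  ): Promise<void> {
    for (const handler of this.handlers.get(event) ?? []) {
      await handler(payload);
    }
  }
}

const webhooks = new WebhookRouter();
const seen: string[] = [];
webhooks.on('membership', payload => {
  seen.push(`${payload.action}:${payload.team.name}`);
});
```

Because each entity provider registers its own narrowly typed handler, the express route shrinks to a single dispatch call, and each handler can be debugged in isolation.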
Extensibility
In addition to providing the Effection Inspector, Effection provides some useful building blocks for a custom API...
Our mark and sweep strategy creates a mark for each committed entity. In the case of a large entity provider like LDAP with 100k+ users, that'll be 100k marks for every ingestion. We never clean up previous marks, which bloats the database unnecessarily.
One possible solution is to create a task that will run on an interval to delete marks from previous ingestions. Another option could be to delete older marks when computing deleted entities.
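Either way, the core of the cleanup is deciding which marks are stale. A minimal sketch, assuming marks carry a provider name and an ingestion id (an illustration, not the plugin's actual code): keep only marks from the latest ingestion per provider.

```typescript
// Illustrative sketch, not the plugin's actual code: pick the marks that
// belong to anything older than the latest ingestion per provider.
interface Mark { id: number; provider: string; ingestionId: number }

function staleMarkIds(marks: Mark[]): number[] {
  // find the newest ingestion id seen for each provider
  const latest = new Map<string, number>();
  for (const mark of marks) {
    latest.set(mark.provider, Math.max(latest.get(mark.provider) ?? 0, mark.ingestionId));
  }
  // everything older than the newest ingestion is safe to delete
  return marks
    .filter(mark => mark.ingestionId < (latest.get(mark.provider) ?? 0))
    .map(mark => mark.id);
}
```

An interval task could then issue a single DELETE for these ids, or the same predicate could run as part of computing deleted entities.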
In our client's project, we found that after applying the solution in #92, removing entities was causing Malformed Entities to be committed. We traced it to the remove query returning strings instead of JSON objects for entities. We added the following code to the remove logic.
There are lots of annotations that have dots in their keys, for example metadata.annotations.github.com/project-slug. We need to be able to retrieve this value with @field(at: "metadata.annotations.github.com/project-slug"). Currently, this doesn't work because github.com/project-slug doesn't get read correctly — the path is split on every dot, including the dots inside the key.
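One possible fix, sketched here as an assumption rather than the plugin's actual design, is to accept an array form of at where each element is treated as a literal key, so keys containing dots are never split.

```typescript
// Sketch of an assumed array form for @field's `at` argument; this is an
// illustration of the idea, not the plugin's current behavior.
function getAt(obj: any, at: string | string[]): unknown {
  // the string form splits on every dot; the array form treats each element
  // as a literal key, so 'github.com/project-slug' survives intact
  const segments = Array.isArray(at) ? at : at.split('.');
  return segments.reduce((value, key) => (value == null ? value : value[key]), obj);
}

const entity = {
  metadata: { annotations: { 'github.com/project-slug': 'frontside/backstage' } },
};

// the dotted string form breaks on the dots inside the key...
const broken = getAt(entity, 'metadata.annotations.github.com/project-slug');
// ...while the array form resolves it
const slug = getAt(entity, ['metadata', 'annotations', 'github.com/project-slug']);
```

The directive could accept either form, keeping existing string paths working while making annotation keys addressable.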
In #3 we introduced a rollback step in the release workflow for when a new deployment of backstage is not successful. This will work fine in most cases, but there will be a problem when two merge commits happen back-to-back and the resulting images of both pull requests are unhealthy.
This problem will occur if the first deployment workflow is at the kubectl rollout status step (which lasts for 60 seconds for a bad deployment) and the second deployment workflow (running concurrently) runs helm upgrade --install min-backstage-chart.
The upgrade of the second workflow run will extend/prolong the kubectl rollout status step of the first workflow run.
At this point we'll have a chart history that looks like this:
good deployment 1
bad deployment 2 (first workflow run)
bad deployment 3 (second workflow run)
The failing result of the kubectl rollout status will roll back from bad deployment 3 to bad deployment 2 and the chart history will become:
good chart 1
bad deployment 2
bad deployment 3
bad deployment 2 (as 4)
We have the workflow configured to prevent parallel runs but the same problem persists if the first workflow gets past the helm upgrade --install min-backstage-chart step.
We currently do not have any role assignment on our various endpoints, so they are all unprotected. We need not only a strategy for securing endpoints, but also a decision tree for which routes should be protected and which should be public, including a policy for when no decision has been made.
We improved performance of the Incremental Ingestion Backend by adding indexes to the database. We need to add the same to our open source version.
```js
/*
 * (C) Copyright 2022 HP Development Company, L.P.
 * Confidential computer software. Valid license from HP required for possession, use or copying.
 * Consistent with FAR 12.211 and 12.212, Commercial Computer Software,
 * Computer Software Documentation, and Technical Data for Commercial Items are licensed
 * to the U.S. Government under vendor's standard commercial license.
 */

/**
 * @param { import("knex").Knex } knex
 * @returns { Promise<void> }
 */
exports.up = async function (knex) {
  const schema = () => knex.schema.withSchema('ingestion');
  await knex.raw(
    `CREATE INDEX IF NOT EXISTS increment_ingestion_provider_name_idx
       ON public.final_entities ((final_entity::json #>> '{metadata, annotations, CHANGE_ME}'));`,
  );
  await knex.raw(`DROP VIEW IF EXISTS ingestion.current_entities`);
  await schema().alterTable('ingestions', t => {
    t.primary('id');
    t.index('provider_name', 'ingestion_provider_name_idx');
  });
  await schema().alterTable('ingestion_marks', t => {
    t.primary('id');
    t.index('ingestion_id', 'ingestion_mark_ingestion_id_idx');
  });
  await schema().alterTable('ingestion_mark_entities', t => {
    t.primary('id');
    t.index('ingestion_mark_id', 'ingestion_mark_entity_ingestion_mark_id_idx');
  });
};

/**
 * @param { import("knex").Knex } knex
 * @returns { Promise<void> }
 */
exports.down = async function (knex) {
  const schema = () => knex.schema.withSchema('ingestion');
  await schema().alterTable('ingestions', t => {
    t.dropIndex('provider_name', 'ingestion_provider_name_idx');
    t.dropPrimary('id');
  });
  await schema().alterTable('ingestion_marks', t => {
    t.dropIndex('ingestion_id', 'ingestion_mark_ingestion_id_idx');
    t.dropPrimary('id');
  });
  await schema().alterTable('ingestion_mark_entities', t => {
    t.dropIndex('ingestion_mark_id', 'ingestion_mark_entity_ingestion_mark_id_idx');
    t.dropPrimary('id');
  });
  await knex.raw(`DROP INDEX increment_ingestion_provider_name_idx;`);
};
```
The current @backstage/plugin-catalog-backend-module-github is a mix of processors that evolved gradually because existing processors didn't satisfy all of the use cases. The result is a mishmash of functionality. It takes a non-trivial effort to figure out what each processor does and what its limitations are. As a result, each organization integrating with GitHub creates its own version of the GitHub processors. Instead, we want to have a consistent, predictable, and flexible plugin.
In this issue, I will define requirements for a POC for a new Github Plugin. We will use this POC to create an RFC in Backstage to introduce a more robust Github integration for Backstage.
Detailed Design
The new plugin will use architecture principles and a new naming convention.
Architecture Principles
A location and its URL are the root of a processing pipeline
Backstage catalog's ingestion pipeline aggregates and relates information from external systems. Backstage is responsible for processing data from a growing number of external integrations. As the number of integrations grows, so does the latency in the ingestion pipeline. An efficient ingestion pipeline aims to keep data up to date with as little latency as possible. To keep processing latency down, the developers writing processors must design them to allow Backstage to optimize the processing. Backstage can optimize processing with caching and parallelization. Caching in Backstage processors is scoped to a location. Likewise, parallelization is performed by concurrently processing locations. To reduce latency in the ingestion pipeline, developers must ensure that their processors can cache and parallelize processing based on a location. One sure way to increase the performance of your ingestion pipeline is to design your ingestion to utilize locations.
Consider the following use case: we want to ingest all of the repositories of a GitHub organization and show who's contributing to these repositories. We could write a processor that fetched a list of all repositories for the organization, iterated over the returned repositories, and fetched all contributors for each repository. We would then emit each repository, the relationships between the repository and its users, followed by the inverse relationships marking which repositories a user contributes to.
This is a lot of work that needs to happen in a single processing job. If we encounter an error, the entire job can fail. If we handle the error gracefully, the entire job will get delayed. To improve the performance and resilience of this job, we can break it up into multiple smaller jobs by emitting a location for each repository.
The result is new locations in the catalog whose processing can be parallelized by the processing engine and cached per location.
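The emission step described above amounts to something like this sketch, where the emit callback and the repository list stand in for the real CatalogProcessorEmit API and a GitHub client:

```typescript
// Sketch: the emit callback and repository list stand in for the real
// CatalogProcessorEmit API and a GitHub client.
interface CatalogLocation { type: string; target: string }

function emitRepositoryLocations(
  org: string,
  repositories: string[],
  emit: (location: CatalogLocation) => void,
): void {
  for (const repo of repositories) {
    // each emitted location becomes its own cacheable, parallelizable unit
    emit({ type: 'url', target: `https://github.com/${org}/${repo}` });
  }
}

const emitted: CatalogLocation[] = [];
emitRepositoryLocations('frontside', ['effection', 'interactors'], location =>
  emitted.push(location),
);
```

Each emitted location is then processed as an independent job, so a failure fetching one repository's contributors no longer delays or fails the whole organization's ingestion.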
Naming Conventions
Discovery processors emit locations
Because locations are such an important part of an efficient processing pipeline, it's important to highlight where locations are created. Having a dedicated processor for emitting locations makes that very clear. The convention I'm proposing is to use the Discovery prefix to mean processors that emit locations. For example, GithubOrganizationDiscoveryProcessor would emit GitHub Organization locations. Likewise, GithubRepositoryDiscoveryProcessor would emit locations for repositories owned by the organization or user.
As a precursor to securing endpoints, we will need to be able to develop and test using a real authentication system. For this, we need to get an Auth0 server up and running locally and be able to authenticate through backstage.
As a developer writing features that use the GraphQL API provided by the GraphQL Plugin, I need to be able to query for data of a user who's using the application. For example, I should be able to retrieve my components using the following query.
{
viewer {
components {
id
}
}
}
This query should return components that are owned by the user who's querying.
The current GithubOrgEntityProvider fetches members and teams of an organization. There's only one function available for updating the database and it's read() - it fetches the users/teams and does a "full" mutation to the database.
There should be two ways of triggering an update - periodically and whenever there's a new change. In #10 I added a task scheduler to run read() once a day. And alongside the task scheduler, we also want to be able to trigger an update whenever a member/team is added or removed so that we don't need to potentially wait a day for the backstage instance to reflect the new updates.
Although GithubOrgEntityProvider only fetches data about members and teams, the GitHub App we're using with backstage is configured to deliver webhooks not only for member/team events but also for repository events.
What this means is if we configure our catalog builder like so:
As convenient as that is, it is an unintended side effect that starting a backstage server creates a full-blown GitHub API simulator.
We ran into this issue when we needed to write an extended GitHub API simulator and were mystified as to why it was always hanging (answer: there was already one running).
We should just remove this and let others compose it.
I've got a few use cases where I need to query specific properties of Components and Resources based on the spec.type, or to have the ability to cast a type into a sub-type. For example, I've got a Pipeline resource like below:
Have the ability to define sub-types so we can prevent the Resource and Component types from getting bloated with properties belonging to all their different sub-types.
We need a much more flexible Embeddable Scaffolder UI that supports two categories of use cases: micro-actions and customizations.
Micro Actions
Scaffolder workflows are a powerful tool that supports golden paths with reusable templates. They are commonly used to create service templates that incorporate the organization's best practices. These templates aim to provide developers with everything they need, including configuration for deployment, documentation, monitoring, and other platform features. The primary purpose of building these templates is to eliminate toil and reduce friction and cognitive load for developers. However, our current approach to providing these configurations introduces friction, making using templates more complicated and frustrating than necessary.
Let's take a few examples to see the difference in the approach:
Example 1: Relating a component to a system
A component can be part of a system. It's common to see an entity picker field in a Scaffolder Template that allows a user to choose a system that the component belongs to. The user's selection is added to catalog-info.yaml via the template. It seems intuitive, but we must ask ourselves: "Do we need this information when the component is being created? What happens when the user wants to update the system after creating the component?" The answer, in many cases, is "no."
Floating Form
Instead of designing a complex form:
A user can change a value by clicking on it. It will show a Scaffolder form.
Clicking on Save will trigger the workflow
While the PR is pending, it'll show a symbol with a tooltip
Hovering will show that the PR is pending
Clicking on the symbol will take the user to the PR
This is an example of the kind of workflow that we want to be able to implement with Embeddable Scaffolder Workflow.
Inline Actions
Some actions do not require user input. These actions could be triggered and visualized entirely inline.
Customizations
Custom chrome
chrome in this context refers to the components around the form. In the following diagram, the chrome is the blue box. It includes the form's title, the description, and the action buttons at the bottom.
We need to be able to completely replace these because some of our clients have strict accessibility requirements that we cannot retrofit onto Material UI.
Custom Form
Many of our clients have internal design systems and component libraries that implement these design systems. Their developers are familiar with their design system, making Backstage feel like an internal tool. We want our clients to be able to use their internal component library for scaffolder forms.
This can be accomplished by creating a custom RJSF form component that provides the following templates and widgets.
Templates:
ArrayFieldItemTemplate
ArrayFieldTemplate
BaseInputTemplate
AddButton
CopyButton
MoveDownButton
MoveUpButton
RemoveButton
SubmitButton
DescriptionFieldTemplate
ErrorListTemplate
FieldErrorTemplate
FieldHelpTemplate
FieldTemplate
ObjectFieldTemplate
TitleFieldTemplate
WrapIfAdditionalTemplate
Widgets:
CheckboxWidget
CheckboxesWidget
RadioWidget
RangeWidget
SelectWidget
TextareaWidget
Embedded Scaffolder Workflow should allow rendering with a custom Form component.