Keep an eye out for user information in the log output.
Running with production config
Running with simulation should be used in most situations. However, there are times when using the production config is necessary for debugging, e.g. specific Auth0 configuration issues.
With access to the Frontside 1Password vault and the 1Password CLI, you can inject the secrets into a .gitignored config file and use the yarn dev:config command to pick up these values.
op inject -i app-config.1password.yaml.tpl -o app-config-credentials.yaml
Running in minikube
TAG=whatever
yarn install
yarn tsc
yarn build
yarn build-image --tag backstage:$TAG
minikube start
eval $(minikube docker-env)
minikube image load backstage:$TAG
# TODO: modify the charts so that the following steps are automated, but for now please do the following:
* comment out ./backstage/templates/certificate.yaml
* move ./backstage/templates/secrets.yaml to another directory
* comment out `volumeMounts` and `volumes`
* modify the container command to exclude `app-config.production.yaml`
PG=whatever
helm upgrade --install min-postgres-chart ./charts/postgres \
-f ./charts/postgres/Values.yaml \
--set postgresUsername=$PG \
--set postgresPassword=$PG
POSTGRES_SERVICE_PORT=5432 POSTGRES_USER=$PG POSTGRES_PASSWORD=$PG helm upgrade --install min-backstage-chart ./charts/backstage \
-f ./charts/backstage/Values.yaml \
--set backstageImage=backstage:$TAG \
--set baseUrl=http://localhost:7007
kubectl port-forward svc/backstage 7007:80
In an effort to bring visibility into the Backstage server process, we need to be able to see the Effection Inspector in Backstage. We can do this by creating a Backstage plugin that renders the component provided by the @effection/inspector-ui package.
The Incremental Ingestion Backend should only attempt to insert marks when there are entities to mark. In our closed-source version, we added the following code
Sometimes you want to modify a value that comes from the backstage catalog. You would normally add a resolver for that specific field, but you can't: if you use the @field or @relation directives on that field, the graphql plugin adds its own resolver, and your resolver will conflict with it.
Approach
We can add pre and post hooks that duplicate schema structure:
Here you can see that in the pre.Entity hook we add a new field, id, to the entity. There is no reason to add field-specific pre-hooks, though it may be better to rename them to interfaces and fields hooks. For the post.Component.tag field, we change the type from string to { id: string, label: string }, and the new value is then used as the result of the resolver. The idea is to use pre hooks for adding/removing fields on the source object and post hooks for transforming a specific field's value.
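The hook mechanics described above can be sketched in a few lines. This is an illustrative sketch only: the `applyHooks` helper and the exact `pre`/`post` hook shapes are assumptions, not the plugin's actual API.

```typescript
// Illustrative sketch only: the hook shape and the applyHooks helper are
// assumptions, not the plugin's real API.
type Entity = Record<string, any>;

interface Hooks {
  // pre hooks reshape the whole source object before field resolution
  pre?: Record<string, (entity: Entity) => Entity>;
  // post hooks transform the resolved value of a specific field
  post?: Record<string, Record<string, (value: any) => any>>;
}

function applyHooks(kind: string, source: Entity, hooks: Hooks): Entity {
  const pre = hooks.pre?.[kind];
  const result: Entity = pre ? pre(source) : { ...source };
  const fieldHooks = hooks.post?.[kind] ?? {};
  for (const [field, transform] of Object.entries(fieldHooks)) {
    if (field in result) result[field] = transform(result[field]);
  }
  return result;
}

const hooks: Hooks = {
  // pre.Entity: add a new id field to every entity
  pre: { Entity: e => ({ ...e, id: `${e.kind}:${e.name}` }) },
  // post.Component.tag: turn a plain string into { id, label }
  post: { Component: { tag: tag => ({ id: tag, label: String(tag).toUpperCase() }) } },
};

let entity: Entity = { kind: 'Component', name: 'app', tag: 'web' };
entity = applyHooks('Entity', entity, hooks);    // pre hook adds id
entity = applyHooks('Component', entity, hooks); // post hook reshapes tag
```

The separation keeps the two concerns independent: pre hooks never see resolved values, and post hooks never change the object's shape beyond one field.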
While integrating it into the HP project, I noticed that our graphql plugin has some flaws:
It handles graphql schema directives only at runtime, but we want it to be able to generate correct types from the final schema and also provide the final schema to clients.
Under the hood it uses the catalog client for a data loader, but we have a better and more performant batch loader.
It doesn't allow providing a custom loader either, which is needed when you have multiple sources that aren't ingested into the backstage catalog.
This RFC proposes creating a Backstage plugin for distributing a CLI and a Backstage UI plugin for managing runtime environments, secret management, environment management, and observability tooling. The CLI and Backstage UI will use a pluggable Platform API.
Motivation
A Heroku-like experience in the Cloud Native ecosystem has been an elusive mirage. Those familiar with Heroku often describe it as the experience developers want from their Internal Developer Platforms. Many innovations made the Heroku experience memorable. Developers could create their applications and push code to a Git repository, where the code would be automatically built and deployed to the Heroku platform. When it took months for Ops to set up a database, Heroku gave developers the ability to add one in minutes. Developers could add a database to their application, and the Heroku platform would automatically make the connection string available to the application through environment variables. Developers could manage secrets for their applications that would be added as environment variables without storing sensitive information in the source code. Heroku made running ambitious applications easy and enjoyable. This is why it has been a lasting inspiration for developers building Internal Developer Platforms in the Cloud Native ecosystem.
Many features of the Heroku platform are now standard architecture for Internal Developer Platforms. Most IDPs allow developers to push their code to a Git repository provider such as GitHub, GitLab, or BitBucket. When a new commit is pushed to a branch, it's automatically built by a Continuous Integration system like GitHub Actions, GitLab CI, or BitBucket Pipelines. The CI workflow creates a container image and pushes it to a container registry. A Continuous Deployment service monitors the container registry and automatically deploys the container to the platform. Once the application runs on the platform, it's autoscaled by Kubernetes, which is managed by the platform engineering team. The platform is instrumented with observability tooling that allows developers to see their logs and metrics in tools like Kibana and Prometheus. Mass adoption of Git-based workflows makes it possible to provision resources like databases using configuration stored as code with standards like PAWS (Platform-Agnostic Workload Specification). Modern IDPs can now automatically provision resources based on configurations in a paws.yaml file.
The Cloud Native ecosystem has matured to the point where, in some respects, the developer experience of using the platform is superior to that of Heroku, but there is still a gap. This gap is in the interfaces that developers use to interact with their Internal Developer Platforms. Heroku allowed developers to interact with the platform using a single CLI and a unified UI. The CLI and UI provided parity for the features that were important to developers. A developer could create an application, clone environments, set environment variables, and view logs from the CLI. The CLI experience was project-code-centric: the project's working directory set the CLI's context. When a developer invoked commands from the CLI, those commands automatically assumed the Heroku application based on the code in the working directory.
The experience of interacting with the IDP from the CLI with a context set by the working directory and the ability to perform all of the same operations via a Web UI is the last remaining piece in realizing a Heroku-like experience on Internal Developer Platforms.
The emergence of Backstage as a standard developer portal makes closing this gap feasible. Before Backstage, each company created its portal using a variety of technologies and architectures. Without a consistent architecture, it was difficult to create a reusable toolkit that companies could use off the shelf. Backstage provides a UI framework that can be used to create a reusable user experience in a web application, and a pluggable server architecture that can back both the web application and a CLI tool.
Approach
Features
The most common features of a Heroku-like DX are the ability to manage environments, manage secrets, view logs and metrics, and manage releases.
Managing Environments
Runtime environments for services are an important part of a developer's everyday life. Developers need to see where the application is running, know how to access the service, manipulate secrets, and see logs for each environment. They need to be able to create ephemeral environments that automatically deploy from branches and provision new resources. In all of these use cases, the runtime environment is the context for each activity. This requires Backstage to treat the Runtime Environment as a first-class concept.
Manage Secrets
Most services require credentials and tokens to connect to external integrations. Each platform has a different way of managing these secrets. Some Internal Developer Platforms, especially those going through change, may have multiple secret stores or a different secret store for each environment. Managing these secrets can be a time-consuming and error-prone activity. To simplify the process of managing secrets and reduce errors, developers should be able to manage secrets from the Backstage portal or the CLI.
View logs and metrics
When a service is failing, or when troubleshooting, developers need to be able to find all of the information associated with a service without jumping around different systems. Logs and metrics are critical to debugging and learning about the runtime behaviour of a service. This information needs to be available at their fingertips in Backstage or in the command line interface. They should be able to switch between different components and see the observability data immediately without doing any digging.
Manage releases
Shipping software is perhaps the most important activity in the entire software development lifecycle. How that software makes it to users varies from organization to organization, from team to team, and even from software to software. Empowering developers to ship their software in a reliable and fail-safe way, without introducing unnecessary meetings and obstacles, is perhaps one of the biggest contributors to improving organizational DevOps maturity and the experience of shipping software on an Internal Developer Platform. Backstage and the CLI tool have the opportunity to make it easier for organizations to ship their software by providing a flexible interface to deployment requirements on the Internal Developer Platform.
Architecture
One of the biggest challenges of this problem is that each Internal Developer Platform is different. One platform may use AWS Secrets Manager, and another may use Hashicorp Vault. Depending on the maturity of the IDP, it's not unusual to see two different technologies being used for the same purpose on different projects. Even though the goal is to standardize, no two platforms are alike, and their implementations change over time. This requires designing the architecture in a way that maximizes reuse while remaining flexible to the decisions of each Internal Developer Platform.
We can define a clear interface between the client and the server. The clients - the Backstage UI components and the CLI, will use a clearly defined schema to communicate to the platform API. The platform API will expose an API that implements the schema, but the implementation of each Platform API will be platform specific. For example, regardless of whether the platform uses Hashicorp Vault or AWS Secrets Manager, the UI components and CLI will make the same requests to the Platform API. The Platform API will be responsible for making appropriate calls to the Hashicorp Vault or AWS Secrets Manager, depending on the platform's use.
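As a sketch of this contract, the interface below shows what a secrets slice of the Platform API could look like. The `SecretsApi` name and its methods are assumptions for illustration, with an in-memory implementation standing in for a Vault- or AWS-backed one.

```typescript
// The names below (SecretsApi, InMemorySecretsApi) are assumptions for
// illustration; they are not part of any published schema.
interface SecretsApi {
  getSecret(env: string, key: string): Promise<string | undefined>;
  setSecret(env: string, key: string, value: string): Promise<void>;
}

// Each platform ships its own implementation behind the same interface; an
// in-memory store stands in here for Hashicorp Vault or AWS Secrets Manager.
class InMemorySecretsApi implements SecretsApi {
  private store = new Map<string, string>();
  async getSecret(env: string, key: string): Promise<string | undefined> {
    return this.store.get(`${env}/${key}`);
  }
  async setSecret(env: string, key: string, value: string): Promise<void> {
    this.store.set(`${env}/${key}`, value);
  }
}

// The UI components and the CLI only ever program against SecretsApi,
// regardless of which secret store backs the platform.
async function roundTrip(api: SecretsApi): Promise<string | undefined> {
  await api.setSecret('staging', 'DATABASE_URL', 'postgres://db.internal:5432/app');
  return api.getSecret('staging', 'DATABASE_URL');
}
```

Swapping Vault for AWS Secrets Manager then means swapping the implementation class on the server, with no change to the clients.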
Adapters
Some tools in the Cloud Native ecosystem offer a lot of value when building an Internal Developer Platform. These tools typically provide their Platform APIs and accelerate the creation of a modern IDP. Humanitec is a perfect example of such a tool. It does the hard work of dynamically generating configuration, provisioning resources, and deploying applications. For many companies, platforms like Humanitec make up a big percentage of their Internal Developer Platform. The Adapter API will allow these tools to provide a shortcut to implementing the Platform API. Each Adapter will be able to provide one or more features.
TODO
Research: How do we compile the binary?
Download the deno binary from npm
Checks for existence of compiled files
If not there, compiles the files
How do we authenticate it?
Login command
Open browser
Follow authentication flow
Get token
Store it on file system
Use it to make requests
Implement Commands
version - version of the CLI, URL of Backstage instance, version of Backstage used to compile it
project
info - read catalog info entity idp info [--component]
environments
variables
releases
templates
search
Workflows
Install Plugin
Frontend
Download CLI Page
provide curl command
execute curl command from local machine
Download the binary (make it executable)
Backend
Platform REST API
CLI Commands
version - version of the CLI, URL of Backstage instance, version of Backstage used to compile it
project
info - read catalog info entity
environments
variables
releases
templates
search
Unknowns
How do we make context determination server-configurable?
As a developer working on the Backstage portal, I need to be able to configure how the CLI determines what is the component associated with the current working directory. Ideally, this would happen without having to release a new version of the CLI.
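One possible shape for this, sketched with an assumed `catalog-info.yaml` marker file (the actual lookup rule would be fetched from server configuration rather than hard-coded in the CLI): walk up from the working directory until a marker file is found.

```typescript
// Sketch only: the marker file name (catalog-info.yaml) is an assumption;
// the real rule would come from server configuration. Walk up from cwd to
// the nearest directory that contains the marker file.
function findContextDir(
  cwd: string,
  existingFiles: Set<string>,
  marker = 'catalog-info.yaml',
): string | undefined {
  let dir = cwd;
  for (;;) {
    if (existingFiles.has(`${dir}/${marker}`)) return dir;
    const parent = dir.slice(0, dir.lastIndexOf('/'));
    if (parent === dir || parent === '') return undefined; // reached the root
    dir = parent;
  }
}
```

The CLI would then resolve the component entity from the marker file in the directory it finds, so changing the lookup rule on the server changes CLI behavior without a new release.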
As a developer using the GraphQL API provided by the GraphQL plugin, I need to be able to query many entities with filtering. The standard for querying many entities via a GraphQL API is described in the GraphQL Cursor Connections Specification. We need to allow querying records using a schema and resolvers that conform to the GraphQL Cursor Connections Specification.
Approach
There is nothing stopping someone from implementing the connection specification manually. For example, if there is a type called Repositories with the following schema,
Someone familiar with the GraphQL Cursor Connections Specification could write the following types to make it possible to query repositories owned by a specific user.
Right now, the @field directive only allows mapping primitive types (strings, numbers, arrays) or objects 1:1. There is no way to implement the case above.
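For reference, the resolver-side mechanics of the Connections spec are straightforward to sketch. The helper below is an illustrative in-memory implementation of the edges/pageInfo shape; real cursors are usually opaque (e.g. base64-encoded), which is skipped here for clarity.

```typescript
// Illustrative in-memory implementation of the Connections shape; resolver
// wiring and opaque (base64) cursor encoding are omitted for clarity.
interface Edge<T> { node: T; cursor: string }
interface PageInfo { hasNextPage: boolean; hasPreviousPage: boolean; endCursor?: string }
interface Connection<T> { edges: Edge<T>[]; pageInfo: PageInfo }

const toCursor = (index: number): string => `cursor:${index}`;
const fromCursor = (cursor: string): number => Number(cursor.split(':')[1]);

function connectionFromArray<T>(items: T[], first: number, after?: string): Connection<T> {
  // start just past the `after` cursor, or at the beginning
  const start = after === undefined ? 0 : fromCursor(after) + 1;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ node, cursor: toCursor(start + i) }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      hasPreviousPage: start > 0,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : undefined,
    },
  };
}
```

A repositories(first:, after:) resolver would call something like connectionFromArray over its filtered result set; the point of the proposal is to have the plugin generate this machinery instead of everyone hand-rolling it.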
GitHub is an important source of information for a Backstage portal. It contains all project source code, which is increasingly becoming the source of truth for the configuration of assets in an engineering organization's ecosystem. A GitHub repository contains configurations for CI/CD systems, package dependencies, and deployments, to name just a few examples. The trend of treating a repository as a source of truth will only grow with the adoption of GitOps practices.
Backstage has several mechanisms for interacting with GitHub. The GithubURLReader is used to read one or more files from a repository via the GitHub REST API. Various GitHub processors in the @backstage/plugin-catalog-backend-module-github package are used to read organization information and discover repositories. GitHubEntityProvider can be used to pull groups and users for an organization into Backstage. Each of these mechanisms provides some functionality, but they don't cover all of the use cases for ingesting data into Backstage.
Furthermore, custom processors such as GithubDiscoveryProcessor are being deprecated in favour of Entity Providers. Entity Providers are replacing custom processors for ingestion because the custom-processor ingestion pipeline proved to be inefficient when processing data from large GitHub instances with hundreds of organizations and thousands of repositories. For large GitHub instances, custom processors resulted in long processing delays because the pipeline would attempt to indiscriminately ingest every location regardless of whether the location had new data. Entity Providers are a more scalable approach because they allow reacting to changes in GitHub instances without proactively looking for them.
The most efficient way of ingesting data from large GitHub instances is to trigger an entity provider as a response to a webhook. You can find an example of this in #10. In the most rudimentary form, we trigger the provider's read function when a specific event is triggered. Here is an example of triggering read on GithubOrgEntityProvider when a person is added to a team or an organization.
```ts
router.post('/github/webhook', async (req, _res) => {
  const event = req.headers['x-github-event'];
  if (event == 'membership') {
    await githubOrgEntityProvider.read();
    env.logger.info('Successfully triggered database update via github webhook event');
  }
  // TODO: we should forward requests to smee for local development
});
```
The code in this webhook will have to handle all of the entity providers that interact with GitHub. The complexity of this webhook handler will grow as the number of entity providers increases. It will become increasingly difficult to debug and will inevitably lead to confusion. I would like for us to get ahead of this by introducing an API that makes it easier to write and debug entity providers for the GitHub webhook.
The specific APIs are TBD and @cowboyd will have very good opinions on the subject. I wanted to share some thoughts to get the ball rolling.
Detailed Design
Installation of a GitHub Webhook plugin
The GitHub Webhook Plugin will allow a developer to install the webhook as a regular Backstage plugin. This plugin will mount an express route that will receive events once the webhook is added to an organization in GitHub.
Debug-ability
The goal of this plugin is to make debugging easier by giving developers a way to inspect the behavior of the entity providers that are handling events received by the webhook. Visibility into execution of the webhook will be provided by the Effection Inspector. For Effection Inspector to show execution of the entity providers, each entity provider must be written as an Effection task.
TypeScript types
We want to make it as easy as possible to write strictly typed handlers for these events. The API for extending the webhook handler should use types from https://github.com/octokit/webhooks#importing-types to guide implementors in hooking into the webhook.
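A typed registration API could look roughly like the sketch below. The payload types here are simplified stand-ins and the `WebhookRouter` shape is an assumption; a real implementation would import the event types from @octokit/webhooks as linked above.

```typescript
// The payload types are simplified stand-ins for the real types from
// @octokit/webhooks; WebhookRouter itself is a sketch, not a proposal for
// exact method names.
interface EventPayloadMap {
  membership: { action: 'added' | 'removed'; team: { name: string } };
  push: { ref: string };
}

type Handler<E extends keyof EventPayloadMap> = (
  payload: EventPayloadMap[E],
) => void | Promise<void>;

class WebhookRouter {
  private handlers = new Map<string, Handler<any>[]>();

  // registering a handler for 'membership' type-checks its payload shape
  on<E extends keyof EventPayloadMap>(event: E, handler: Handler<E>): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  async dispatch<E extends keyof EventPayloadMap>(
    event: E,
    payload: EventPayloadMap[E],
  ): Promise<void> {
    for (const handler of this.handlers.get(event) ?? []) {
      await handler(payload);
    }
  }
}

const webhooks = new WebhookRouter();
const seen: string[] = [];
webhooks.on('membership', payload => {
  seen.push(`${payload.action}:${payload.team.name}`);
});
```

Because each entity provider registers its own narrowly typed handler, the express route shrinks to a single dispatch call, and each handler can be debugged in isolation.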
Extensibility
In addition to providing the Effection Inspector, Effection provides some useful building blocks for a custom API...
Our mark and sweep strategy creates a mark for each committed entity. In the case of a large entity provider like LDAP with 100k+ users, that'll be 100k marks for every ingestion. We never clean up previous marks, which bloats the database unnecessarily.
One possible solution is to create a task that will run on an interval to delete marks from previous ingestions. Another option could be to delete older marks when computing deleted entities.
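Either way, the core of the cleanup is deciding which marks are stale. A minimal sketch, assuming marks carry a provider name and an ingestion id (an illustration, not the plugin's actual code): keep only marks from the latest ingestion per provider.

```typescript
// Illustrative sketch, not the plugin's actual code: pick the marks that
// belong to anything older than the latest ingestion per provider.
interface Mark { id: number; provider: string; ingestionId: number }

function staleMarkIds(marks: Mark[]): number[] {
  // find the newest ingestion id seen for each provider
  const latest = new Map<string, number>();
  for (const mark of marks) {
    latest.set(mark.provider, Math.max(latest.get(mark.provider) ?? 0, mark.ingestionId));
  }
  // everything older than the newest ingestion is safe to delete
  return marks
    .filter(mark => mark.ingestionId < (latest.get(mark.provider) ?? 0))
    .map(mark => mark.id);
}
```

An interval task could then issue a single DELETE for these ids, or the same predicate could run as part of computing deleted entities.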
In our client's project, we found that after applying the solution in #92, removing entities was causing Malformed Entities to be committed. We traced it to the remove query returning strings instead of JSON objects for entities. We added the following code to the remove logic.
There are lots of annotations that have dots in their keys, for example metadata.annotations.github.com/project-slug. We need to be able to retrieve this value with @field(at: "metadata.annotations.github.com/project-slug"). Currently, this doesn't work because github.com/project-slug doesn't get read correctly — the path is split on every dot, including the dots inside the key.
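One possible fix, sketched here as an assumption rather than the plugin's actual design, is to accept an array form of at where each element is treated as a literal key, so keys containing dots are never split.

```typescript
// Sketch of an assumed array form for @field's `at` argument; this is an
// illustration of the idea, not the plugin's current behavior.
function getAt(obj: any, at: string | string[]): unknown {
  // the string form splits on every dot; the array form treats each element
  // as a literal key, so 'github.com/project-slug' survives intact
  const segments = Array.isArray(at) ? at : at.split('.');
  return segments.reduce((value, key) => (value == null ? value : value[key]), obj);
}

const entity = {
  metadata: { annotations: { 'github.com/project-slug': 'frontside/backstage' } },
};

// the dotted string form breaks on the dots inside the key...
const broken = getAt(entity, 'metadata.annotations.github.com/project-slug');
// ...while the array form resolves it
const slug = getAt(entity, ['metadata', 'annotations', 'github.com/project-slug']);
```

The directive could accept either form, keeping existing string paths working while making annotation keys addressable.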
In #3 we introduced a rollback step in the release workflow for when a new deployment of backstage is not successful. This will work fine in most cases, but there will be a problem when two merge commits happen back-to-back and the resulting images of both pull requests are unhealthy.
This problem will occur if the first deployment workflow is at the kubectl rollout status step (which lasts for 60 seconds for a bad deployment) and the second deployment workflow (running concurrently) runs helm upgrade --install min-backstage-chart.
The upgrade of the second workflow run will extend/prolong the kubectl rollout status step of the first workflow run.
At this point we'll have a chart history that looks like this:
good deployment 1
bad deployment 2 (first workflow run)
bad deployment 3 (second workflow run)
The failing result of the kubectl rollout status will roll back from bad deployment 3 to bad deployment 2 and the chart history will become:
good chart 1
bad deployment 2
bad deployment 3
bad deployment 2 (as 4)
We have the workflow configured to prevent parallel runs but the same problem persists if the first workflow gets past the helm upgrade --install min-backstage-chart step.
We currently do not have any role assignment on our various endpoints, so they are all unprotected. We need not only a strategy for securing endpoints, but also a decision tree for which routes should be protected and which should be public, including a policy for when no decision has been made.
We improved performance of the Incremental Ingestion Backend by adding indexes to the database. We need to add the same to our open source version.
```js
/*
 * (C) Copyright 2022 HP Development Company, L.P.
 * Confidential computer software. Valid license from HP required for possession, use or copying.
 * Consistent with FAR 12.211 and 12.212, Commercial Computer Software,
 * Computer Software Documentation, and Technical Data for Commercial Items are licensed
 * to the U.S. Government under vendor's standard commercial license.
 */

/**
 * @param { import("knex").Knex } knex
 * @returns { Promise<void> }
 */
exports.up = async function (knex) {
  const schema = () => knex.schema.withSchema('ingestion');
  await knex.raw(
    `CREATE INDEX IF NOT EXISTS increment_ingestion_provider_name_idx
       ON public.final_entities ((final_entity::json #>> '{metadata, annotations, CHANGE_ME}'));`,
  );
  await knex.raw(`DROP VIEW IF EXISTS ingestion.current_entities`);
  await schema().alterTable('ingestions', t => {
    t.primary('id');
    t.index('provider_name', 'ingestion_provider_name_idx');
  });
  await schema().alterTable('ingestion_marks', t => {
    t.primary('id');
    t.index('ingestion_id', 'ingestion_mark_ingestion_id_idx');
  });
  await schema().alterTable('ingestion_mark_entities', t => {
    t.primary('id');
    t.index('ingestion_mark_id', 'ingestion_mark_entity_ingestion_mark_id_idx');
  });
};

/**
 * @param { import("knex").Knex } knex
 * @returns { Promise<void> }
 */
exports.down = async function (knex) {
  const schema = () => knex.schema.withSchema('ingestion');
  await schema().alterTable('ingestions', t => {
    t.dropIndex('provider_name', 'ingestion_provider_name_idx');
    t.dropPrimary('id');
  });
  await schema().alterTable('ingestion_marks', t => {
    t.dropIndex('ingestion_id', 'ingestion_mark_ingestion_id_idx');
    t.dropPrimary('id');
  });
  await schema().alterTable('ingestion_mark_entities', t => {
    t.dropIndex('ingestion_mark_id', 'ingestion_mark_entity_ingestion_mark_id_idx');
    t.dropPrimary('id');
  });
  await knex.raw(`DROP INDEX increment_ingestion_provider_name_idx;`);
};
```
The current @backstage/plugin-catalog-backend-module-github is a mix of processors that evolved gradually because existing processors didn't satisfy all of the use cases. The result is a mishmash of functionality. It takes a non-trivial effort to figure out what each processor does and what its limitations are. As a result, each organization integrating with GitHub creates its own version of the GitHub processors. Instead, we want to have a consistent, predictable, and flexible plugin.
In this issue, I will define requirements for a POC for a new Github Plugin. We will use this POC to create an RFC in Backstage to introduce a more robust Github integration for Backstage.
Detailed Design
The new plugin will use architecture principles and a new naming convention.
Architecture Principles
A location and its URL are the root of a processing pipeline
Backstage catalog's ingestion pipeline aggregates and relates information from external systems. Backstage is responsible for processing data from a growing number of external integrations. As the number of integrations grows, so does the latency in the ingestion pipeline. An efficient ingestion pipeline aims to keep data up to date with as little latency as possible. To keep processing latency down, the developers writing processors must design them to allow Backstage to optimize the processing. Backstage can optimize processing with caching and parallelization. Caching in Backstage processors is scoped to a location. Likewise, parallelization is performed by concurrently processing locations. To reduce latency in the ingestion pipeline, developers must ensure that their processors can cache and parallelize processing based on a location. One sure way to increase the performance of your ingestion pipeline is to design your ingestion to utilize locations.
Consider the following use case: we want to ingest all of the repositories of a GitHub organization and show who's contributing to these repositories. We could write a processor that fetched a list of all repositories for the organization, iterated over the returned repositories, and fetched all contributors for each repository. We would then emit each repository, the relationships between the repository and its users, followed by the inverse relationships marking which repositories a user contributes to.
This is a lot of work that needs to happen in a single processing job. If we encounter an error, the entire job can fail. If we handle the error gracefully, the entire job will get delayed. To improve the performance and resilience of this job, we can break it up into multiple smaller jobs by emitting a location for each repository.
The result is new locations in the catalog whose processing can be parallelized by the processing engine and cached per location.
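The emission step described above amounts to something like this sketch, where the emit callback and the repository list stand in for the real CatalogProcessorEmit API and a GitHub client:

```typescript
// Sketch: the emit callback and repository list stand in for the real
// CatalogProcessorEmit API and a GitHub client.
interface CatalogLocation { type: string; target: string }

function emitRepositoryLocations(
  org: string,
  repositories: string[],
  emit: (location: CatalogLocation) => void,
): void {
  for (const repo of repositories) {
    // each emitted location becomes its own cacheable, parallelizable unit
    emit({ type: 'url', target: `https://github.com/${org}/${repo}` });
  }
}

const emitted: CatalogLocation[] = [];
emitRepositoryLocations('frontside', ['effection', 'interactors'], location =>
  emitted.push(location),
);
```

Each emitted location is then processed as an independent job, so a failure fetching one repository's contributors no longer delays or fails the whole organization's ingestion.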
Naming Conventions
Discovery processors emit locations
Because locations are such an important part of an efficient processing pipeline, it's important to highlight where locations are created. Having a dedicated processor for emitting locations makes that very clear. The convention I'm proposing is to use the Discovery prefix to mean processors that emit locations. For example, GithubOrganizationDiscoveryProcessor would emit GitHub Organization locations. Likewise, GithubRepositoryDiscoveryProcessor would emit locations for repositories owned by the organization or user.
As a precursor to securing endpoints, we will need to be able to develop and test using a real authentication system. For this, we need to get an Auth0 server up and running locally and be able to authenticate through backstage.
As a developer writing features that use the GraphQL API provided by the GraphQL Plugin, I need to be able to query for data of a user who's using the application. For example, I should be able to retrieve my components using the following query.
{
viewer {
components {
id
}
}
}
This query should return components that are owned by the user who's querying.
The current GithubOrgEntityProvider fetches members and teams of an organization. There's only one function available for updating the database and it's read() - it fetches the users/teams and does a "full" mutation to the database.
There should be two ways of triggering an update - periodically and whenever there's a new change. In #10 I added a task scheduler to run read() once a day. And alongside the task scheduler, we also want to be able to trigger an update whenever a member/team is added or removed so that we don't need to potentially wait a day for the backstage instance to reflect the new updates.
Although GithubOrgEntityProvider only fetches data about members and teams, the GitHub App we're using with backstage is configured to deliver webhooks not only for member/team events but also for repository events.
What this means is if we configure our catalog builder like so:
As convenient as that is, it is an unintended side effect that starting a backstage server creates a full-blown GitHub API simulator.
We ran into this issue when we needed to write an extended GitHub API simulator and were mystified as to why it was always hanging (answer: there was already one running).
We should just remove this and let others compose it.
I've got a few use cases where I need to query specific properties of Components and Resources based on the spec.type, or to have the ability to cast a type into a sub-type. For example, I've got a Pipeline resource like below:
Have the ability to define sub-types so we can prevent the Resource and Component types from getting bloated with properties belonging to all their different sub-types.
We need a much more flexible Embeddable Scaffolder UI that supports two categories of use cases: micro-actions and customizations.
Micro Actions
Scaffolder workflows are a powerful tool that supports golden paths with reusable templates. They are commonly used to create service templates that incorporate the organization's best practices. These templates aim to provide developers with everything they need, including configuration for deployment, documentation, monitoring, and other platform features. The primary purpose of building these templates is to eliminate toil and reduce friction and cognitive load for developers. However, our current approach to providing these configurations introduces friction, making using templates more complicated and frustrating than necessary.
Let's take a few examples to see the difference in the approach:
Example 1: Relating a component to a system
A component can be part of a system. It's common to see an entity picker field in a Scaffolder Template that allows a user to choose a system that the component belongs to. The user's selection is added to catalog-info.yaml via the template. It seems intuitive, but we must ask ourselves: "Do we need this information when the component is being created? What happens when the user wants to update the system after creating the component?" The answer, in many cases, is "no."
Floating Form
Instead of designing a complex form:
A user can change a value by clicking on it. It will show a Scaffolder form.
Clicking on Save will trigger the workflow
While the PR is pending, it'll show a symbol with a tooltip
Hovering will show that the PR is pending
Clicking on the symbol will take the user to the PR
This is an example of the kind of workflow that we want to be able to implement with Embeddable Scaffolder Workflow.
Inline Actions
Some actions do not require user input. These actions could be triggered and visualized entirely inline.
Customizations
Custom chrome
chrome in this context refers to the components around the form. In the following diagram, the chrome is the blue box. It includes the form's title, the description, and the action buttons at the bottom.
We need to be able to completely replace these because some of our clients have strict accessibility requirements that we cannot retrofit onto Material UI.
Custom Form
Many of our clients have internal design systems and component libraries that implement these design systems. Their developers are familiar with their design system, making Backstage feel like an internal tool. We want our clients to be able to use their internal component library for scaffolder forms.
This can be accomplished by creating a custom RJSF form component that provides the following templates and widgets.
Templates:
ArrayFieldItemTemplate
ArrayFieldTemplate
BaseInputTemplate
AddButton
CopyButton
MoveDownButton
MoveUpButton
RemoveButton
SubmitButton
DescriptionFieldTemplate
ErrorListTemplate
FieldErrorTemplate
FieldHelpTemplate
FieldTemplate
ObjectFieldTemplate
TitleFieldTemplate
WrapIfAdditionalTemplate
Widgets:
CheckboxWidget
CheckboxesWidget
RadioWidget
RangeWidget
SelectWidget
TextareaWidget
Embedded Scaffolder Workflow should allow rendering with a custom Form component.