iver-wharf / wharf-api

Wharf backend written in Go

License: MIT License

Languages: Go 98.44%, Makefile 0.97%, Dockerfile 0.42%, Shell 0.17%
Topics: wharf, gin-gonic, gorm, golang, gin, swaggo

wharf-api's People

Contributors

alexamakans, applejag, fredx30


wharf-api's Issues

Rename EngineList.Default to EngineList.DefaultEngine

iver-wharf/wharf-web#53 (comment)

Motivation

As BranchList has its default branch as DefaultBranch we should have EngineList's default engine as DefaultEngine for consistency. It also prevents swaggo from generating the variable name _default, which our current lint settings don't like over at wharf-web.

Remember to change the JSON tag name to defaultEngine as well.

Note: As changing names of things is often a breaking change, I am putting this at a high priority. This change is not required, and the issue may be closed if decided against.
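A minimal sketch of the rename, assuming the response model roughly looks like this (the List field and Engine type are guesses for illustration):

type EngineList struct {
	// Renamed from Default (JSON tag "default"), matching BranchList.DefaultBranch
	// and avoiding the swaggo-generated "_default" name.
	DefaultEngine *Engine  `json:"defaultEngine"`
	List          []Engine `json:"list"`
}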

Deprecate status ID, use strings instead

Our build responses and requests have both the build status ID and the status string enum.

Suggest removing the ID from the REST API and only rely on the status string.

This needs a deprecation warning first, so it could be removed in v6 at the earliest if deprecated in v5.

PUT endpoints should take ID from path parameter

Based on RFC-0016 (iver-wharf/rfcs#16)

The PUT method in REST is meant to represent a "replace". Similar to the GET method, the subject (which object to get/replace) should come from the URL path.

Expected

PUT /project/{projectId}
PUT /project/{projectId}/branch
PUT /provider/{providerId}
PUT /token/{tokenId}

Actual

PUT /project/
PUT /branches
PUT /provider
PUT /token
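A rough sketch of how the expected routes could be registered with gin, mirroring the existing registration style (handler names here are placeholders):

func (m projectModule) Register(g *gin.RouterGroup) {
	project := g.Group("/project")
	{
		// The ID of the project to replace now comes from the path, not the body.
		project.PUT("/:projectId", m.updateProjectHandler)
		project.PUT("/:projectId/branch", m.updateProjectBranchListHandler)
	}
}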

Cannot delete projects

Expected

curl -X DELETE 'https://your-wharf-instance.com/api/project/5'
>>> OK

And all branches and builds, and their associated build logs and build params, are deleted as well.

As it's quite the dangerous endpoint, maybe we want to add something like ?are-you-sure=true, or is that not necessary as it's an API anyway, meant to be used by computers?

Actual

Response given:

{
  "type": "https://iver-wharf.github.io/#/prob/api/unexpected-db-read-error",
  "title": "Error writing to database.",
  "status": 502,
  "detail": "Failed deleting project with ID 5 from database.",
  "instance": "/api/project/5",
  "errors": [
    "ERROR: update or delete on table \"project\" violates foreign key constraint \"fk_branch_project\" on table \"branch\" (SQLSTATE 23503)"
  ]
}

You must delete all dependent rows from the other tables branch and build (and in turn the log and build_param tables as well) before you can delete the project.

As there are no endpoints to delete them, you must do it via SQL commands.
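Until proper cascading constraints are in place (see the association fixes further down), a workaround could delete the dependent rows in order inside one transaction. A sketch, where the model names database.Log and database.BuildParam are assumptions:

err := db.Transaction(func(tx *gorm.DB) error {
	// Subquery for all builds belonging to the project.
	buildIDs := tx.Model(&database.Build{}).
		Select("build_id").
		Where("project_id = ?", projectID)
	// Grandchildren first: logs and build params belong to builds.
	if err := tx.Where("build_id IN (?)", buildIDs).Delete(&database.Log{}).Error; err != nil {
		return err
	}
	if err := tx.Where("build_id IN (?)", buildIDs).Delete(&database.BuildParam{}).Error; err != nil {
		return err
	}
	// Then children, then the project itself.
	if err := tx.Where("project_id = ?", projectID).Delete(&database.Build{}).Error; err != nil {
		return err
	}
	if err := tx.Where("project_id = ?", projectID).Delete(&database.Branch{}).Error; err != nil {
		return err
	}
	return tx.Delete(&database.Project{}, projectID).Error
})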

Move artifact blobs to separate table

Two main issues:

  • No direct way to only load the artifact metadata without the BLOB, meaning a lot of data transfer we don't usually care about

  • No way to use an external storage tool, such as S3, to store the BLOBs. BLOBs are not really meant to be stored in databases, as you can't query them and they just bloat the database, leading to worse performance.

Suggested

  • Add new model database.ArtifactData
  • Change Data field in database.Artifact to be a pointer to database.ArtifactData
  • Only load the Data when needed, such as in the GET /build/{buildId}/artifact/{artifactId} endpoint
  • Add migration to move the data to the new table artifact_data
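A sketch of what the split could look like in pkg/model/database (field names other than Data are assumptions):

type Artifact struct {
	ArtifactID uint   `gorm:"primaryKey"`
	BuildID    uint   `gorm:"not null;index"`
	Name       string `gorm:"not null"`
	// Data moved to its own table so listing artifacts no longer loads the BLOB.
	Data *ArtifactData `gorm:"foreignKey:ArtifactID"`
}

type ArtifactData struct {
	ArtifactDataID uint   `gorm:"primaryKey"`
	ArtifactID     uint   `gorm:"not null;index"`
	Data           []byte `gorm:"not null"`
}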

Future possibilities

  • Add S3 support. Adding migrations should be easy enough. We could change the ArtifactData.Data field to contain a reference to the artifact ID in the S3 storage.

    • In our internal cluster we have built-in S3 support via Rook Ceph, so we could take good advantage of this.
    • Still needs the database support as fallback for those without S3 storage available.

Add URL slug to projects

Currently you have to use the ID to get a single project.

We want to be able to look projects up by some kind of normalized name, such as a URL slug dedicated to each project.

They would still have their IDs behind the scenes.

  1. Add indexed nameSlug and groupNameSlug columns to project
  2. Add migrations to add slugs to existing projects
  3. Ensure slug is added on project insertion
  4. Ensure slug is updated when the project name or group is updated

Maybe we want to have a normalized slug as well, so that we can make fast case-insensitive lookups.

Example project:

Column             Value
name               My Project
groupName          My Group/Stuff
nameSlug           My-Project
nameSlugNorm       my-project
groupNameSlug      My-Group-Stuff
groupNameSlugNorm  my-group-stuff
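A rough sketch of how the slug columns could be derived (the exact slug rules are not decided; this just collapses runs of non-alphanumerics into dashes and lowercases for the *Norm variants, using the regexp and strings packages):

var slugInvalidChars = regexp.MustCompile(`[^A-Za-z0-9]+`)

// toSlug turns "My Group/Stuff" into "My-Group-Stuff".
func toSlug(name string) string {
	return strings.Trim(slugInvalidChars.ReplaceAllString(name, "-"), "-")
}

// toNormSlug turns "My Group/Stuff" into "my-group-stuff",
// backing the case-insensitive lookup columns.
func toNormSlug(name string) string {
	return strings.ToLower(toSlug(name))
}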

In queries like GET /api/project?groupNameSlug=My-Group-Stuff, do the actual lookup on the groupNameSlugNorm field, but use the groupNameSlug in the response object. This way the frontend can redirect to its "canonical" slug with cases preserved.

All of this is to allow wharf-web to use project slugs in the URL instead of project IDs. So the URLs would be
http://localhost/project/My-Group-Stuff/My-Project/builds
instead of
http://localhost/project/123/builds

Need to consider how to do the project data lookups as well. We perhaps don't want to add another set of API endpoints just for using the slugs instead of the IDs. One idea is to let the wharf-web rely on querying GET /api/project?nameSlug=...&groupNameSlug=... for fetching the project metadata (including ID) via something like route resolvers and then use that in future requests for stuff that targets the project like "start a new build". (explored in iver-wharf/wharf-web#117)

Map projects per ID: DB Schema

Projects are currently mapped to their corresponding provider via name. Name can change, such as when a project is moved, but ID does not.

This is such a fundamentally impactful change that we need to be cautious and perhaps make a bunch of migration scripts.

The suggested change would be that the project in Wharf's DB is mapped to the Git server and the project ID within that server, for example "gitlab.local" and "193".


A case has come up where the current implementation caused a bug, especially since the current GitLab importer actually maps the auth token to the project display name.

See the code (redacted link, it was outdated anyway, but pointed to somewhere in: https://github.com/iver-wharf/wharf-provider-gitlab/blob/master/import.go)
It would make more sense if it used the "path" property of the repo instead of "name". To show the difference between the two, here is a screenshot of GitLab's docs with an example project called "Diaspora Client" but with the path "diaspora-client".


A project that is damaged by this is Foo Bar, which is accessed via the path "foo-bar" but has the name "Foo Bar" https://gitlab.local/default/foo-bar
When it tries to build docker images it builds them with the destination
harbor.local/default/Foo Bar:latest, which is an invalid docker image name.


There is so much auto-magic and there are so many assumed relations tied together here. We need to find a way to migrate the data in as painless and future-proof a way as possible.

Migrating to basing projects off the project ID and server domain instead is a good step in my (@jilleJr's) opinion.
Then we can also import metadata for it, such as its full path, so that we can later use that value for default Docker image names instead of these home-baked spaghetti relations.


Steps:

  • change the DB schema
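Purely as illustration (none of these names are decided), the new mapping columns could look something like this on the project model:

type Project struct {
	ProjectID uint `gorm:"primaryKey"`
	// Identity on the provider's side, e.g. "gitlab.local" + "193".
	// Hypothetical names; the real design is still open.
	RemoteServer    string `gorm:"index:project_idx_remote,unique"`
	RemoteProjectID string `gorm:"index:project_idx_remote,unique"`
	// Display name stays, but is no longer used for mapping.
	Name string
}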

Fix database model associations

We have a lot of misconfigured associations in our pkg/model/database package.

When doing just a simple test implementation of Sqlite foreign key constraints (#145), I noticed that our invalid database model relations are really obstructing this. Maybe it will mess up our Postgres integration as well?

Actual

For example, the Project type should have many builds, but instead the Build model is set up with a "has-one" relation to Project, i.e. an inverted model dependency:

type Project struct {
	TimeMetadata
	ProjectID       uint      `gorm:"primaryKey"`
	// ...
}

type Build struct {
	TimeMetadata
	BuildID             uint                `gorm:"primaryKey"`
	ProjectID           uint                `gorm:"not null;index:build_idx_project_id"`
	Project             *Project            `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
	// ...
}

Expected

The Project model should have a "has-many" relation to the Build model, like so:

 type Project struct {
 	TimeMetadata
 	ProjectID     uint    `gorm:"primaryKey"`
+	Builds        []Build `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
 	// ...
 }
 
 type Build struct {
 	TimeMetadata
 	BuildID             uint                `gorm:"primaryKey"`
 	ProjectID           uint                `gorm:"not null;index:build_idx_project_id"`
-	Project             *Project            `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
	// ...
 }

Docs: https://gorm.io/docs/has_many.html

Add cancelling build support

  • Add status BuildCancelled
  • Add endpoint to cancel build
    • if engine is wharf-cmd: tell wharf-cmd-provisioner to cancel the build
    • otherwise: say cancelling builds isn't supported

We may need to extend the engines config type to also allow declaring the engine type, with an enum of jenkins-generic-webhook-trigger and wharf-cmd.

Map projects per ID: Prepare migration scripts

Depends on #12

Projects are currently mapped to their corresponding provider via name. Name can change, such as when a project is moved, but ID does not.

This is such a fundamentally impactful change that we need to be cautious and perhaps make a bunch of migration scripts.

The suggested change would be that the project in Wharf's DB is mapped to the Git server and the project ID within that server, for example "gitlab.local" and "193".


A case has come up where the current implementation caused a bug, especially since the current GitLab importer actually maps the auth token to the project display name.

See the code (redacted link, it was outdated anyway, but pointed to somewhere in: https://github.com/iver-wharf/wharf-provider-gitlab/blob/master/import.go)
It would make more sense if it used the "path" property of the repo instead of "name". To show the difference between the two, here is a screenshot of GitLab's docs with an example project called "Diaspora Client" but with the path "diaspora-client".


A project that is damaged by this is Foo Bar, which is accessed via the path "foo-bar" but has the name "Foo Bar" https://gitlab.local/default/foo-bar
When it tries to build docker images it builds them with the destination
harbor.local/default/Foo Bar:latest, which is an invalid docker image name.


There is so much auto-magic and there are so many assumed relations tied together here. We need to find a way to migrate the data in as painless and future-proof a way as possible.

Migrating to basing projects off the project ID and server domain instead is a good step in my (@jilleJr's) opinion.
Then we can also import metadata for it, such as its full path, so that we can later use that value for default Docker image names instead of these home-baked spaghetti relations.


  • Prepare the migration scripts and test them locally (16h, 123). Applying them in production (manual vs automatic) is a separate task.

Unescape TestResultDetail messages

Created as per discussion over at iver-wharf/wharf-web#53 (comment)

The messages taken from .TRX files are currently escaped.
This is slightly annoying as we have to unescape them when we want to display them in wharf-web.

My initial thoughts are that unescaping before storing in the DB seems okay; I don't know if we would ever want the escaped version. It's also likely that if we do need it, it wouldn't be a big hassle to either re-escape the existing data on request/migration, or let unescaped messages stay unescaped.

Alternatives

  • Unescape before storing in DB.
  • Save escaped version in DB and unescape when GETting.
  • Save escaped version in DB, and add query parameter to GET /api/build/{buildId}/test-result/detail and GET /api/build/{buildId}/test-result/summary/{artifactId}/detail to retrieve unescaped or escaped.
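Since the .TRX messages are XML-escaped, the "unescape before storing" alternative could be as small as this sketch (assuming html.UnescapeString covers the entities we actually see):

import "html"

// unescapeTestMessage reverses the XML escaping from .trx files,
// e.g. "Expected &lt;nil&gt; but got &quot;error&quot;" becomes
// `Expected <nil> but got "error"`.
func unescapeTestMessage(msg string) string {
	return html.UnescapeString(msg)
}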

POST should only add, and PUT should only update/replace

Based on RFC-0016 (iver-wharf/rfcs#16)

There's a mix of functionality and code duplicated across some PUT and POST endpoints, such as PUT /project vs POST /project.

They both try to act like "add or update", while POST endpoints should only add and PUT endpoints should only update/replace.


Need to be extra careful here about backward compatibility.

As PUT /project should be changed to PUT /project/{projectId} (as declared in #70), then we can just keep the old implementation on PUT /project to keep it backwards compatible.

But for POST /project, it should not get an extra path parameter and so the existing implementation needs to be fully replaced. We cannot keep backward compatibility here.

Need to make sure in all wharf-provider-... repos that the POST endpoints are not used in a way that they assume the endpoint will update an existing element. If so, we either need to patch those providers and ignore the side-effects from the backward incompatibility and just hope that everyone upgrades the providers before or at the same time as the wharf-api; or we take some other radical approach such as choosing plural instead of singular (as denoted in #69)

Project overrides

Described by RFC-0024: iver-wharf/rfcs#24

Addition of a new table project_override, which mirrors almost all of the columns in the project table; values that are set there will be used in the HTTP response body and left untouched by HTTP PUT requests.

Add default branch to project model

Finding the default branch is a very common operation that has so far come up in both the web-ng and api repos. Instead of duplicating code for iterating through the branches every time, let's add a project.DefaultBranch property.

Implementation

Let project.DefaultBranch be a foreign key. Fetching the default branch will then be automatically handled by the ORM (gorm). Ultimately we could then remove the Branch.Default column as that's redundant.

  • Let project.DefaultBranch (in the api repo) be a foreign key of the projects table pointing to a specific branch.
  • The relational constraint should be OnUpdate=Cascade; OnDelete=Set Null (see: https://gorm.io/docs/constraints.html#Foreign-Key-Constraint)
  • The migrations should do the following: (also make sure it doesn't crash if run multiple times on an already migrated table)
    1. Add the defaultBranch column to the project table.
    2. Find the (first) branch with the default column set to true per project, and assign the project's defaultBranch column to point to that branch.
    3. Remove the default column from the branch table.
  • The /branches endpoints must return new DTOs that still have the default property. That model will later be changed in #14. (Optionally, do that task in this issue too. Balance making too many hacky workarounds for not changing the models against making the issue/PR too big. You will have to decide on your own for this one.)
  • Update the wharf-client repo to use the new project model that has the project.DefaultBranch field.
  • Update all dependent repos (github, gitlab, azuredevops, web) to use this new DefaultBranch property where appropriate.
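A sketch of the foreign key described above, assuming a nullable DefaultBranchID column so the OnDelete:SET NULL constraint can clear it:

type Project struct {
	ProjectID uint `gorm:"primaryKey"`
	// Nullable so the constraint can set it to NULL when the branch is deleted.
	DefaultBranchID *uint
	DefaultBranch   *Branch `gorm:"foreignKey:DefaultBranchID;references:BranchID;constraint:OnUpdate:CASCADE,OnDelete:SET NULL"`
	// ...
}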

Changes in endpoints

This would involve changing the JSON models given to, or received from, the following endpoints:

  • POST /branch (model would remain, but logic has to change to also update the project table.)
  • PUT /branches (either keep the model or change it so default is only specified once. Either way it needs to update the project table as well.)
  • PUT /project (allow setting default branch on update, and an automatic "create branch if it doesn't exist" on the specified default branch)
  • POST /project (allow setting default branch on creating, and an automatic "create branch if it doesn't exist" on the specified default branch)
  • GET /project/:projectid (include default branch)
  • GET /projects (include default branches per project)

Changes in repos

I.e. this is a major breaking change. Changes need to be done in the following repos:

Do a major version bump in all touched repos where the new code is not backwards compatible (such as code that relies on the new defaultBranch property).

TBD move /branches endpoint to project

depends on #13

It would make more sense to obtain the project ID from the URL path.
Potentially this means moving this entire endpoint to project.go, and moving the default parameter out so it's only specified once, which in the process eliminates the issue of the user sending more than one branch marked as default, e.g.:

PUT /projects/:project_id/branches
{
  "default_branch": "master",
  "branches": [
    { "name": "master" },
    { "name": "dev" },
    { "name": "lts" }
  ]
}

This is from the perspective of having a nice-looking API, and not just having it be easy to implement in Go. I'm quite annoyed that, for example, the create endpoints specify that you can set the IDs, while those values are fully ignored.
Having custom types for the endpoints isn't a bad thing IMO.

Allow storing logs in Elasticsearch

Depends on #176

  • Add interface to either store logs in the database or in a different store, such as Elasticsearch
  • Make Wharf default to store logs in the database.
  • Add migrations to transfer logs to Elasticsearch. This is perhaps best suited as a manually triggered job. Such as via CLI: #142

Elasticsearch supports log streaming, so this could let the wharf-api be scalable, as the logs are currently the big thing making the wharf-api unscalable. Currently, if we were to have 2 wharf-api pods, then the SSE log streaming would only send half of the logs due to the load balancing on log insertion.
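A sketch of the interface (names invented for illustration):

// LogStore abstracts where build logs live: the relational database
// by default, or an external store such as Elasticsearch.
type LogStore interface {
	AddLogs(ctx context.Context, buildID uint, logs []Log) error
	GetLogs(ctx context.Context, buildID uint) ([]Log, error)
	// StreamLogs follows new logs for a build, backing the log streaming endpoints.
	StreamLogs(ctx context.Context, buildID uint) (<-chan Log, error)
}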

Security: Add automated penetration testing, ex PurpleTeam

As we don't have dedicated red teams trying to penetrate Wharf, we should add automatic DAST (Dynamic Application Security Testing) integrated into our CI pipeline.

This needs investigation, but one proposal that comes to mind is OWASP PurpleTeam (developed partly by some folks from OWASP https://owasp.org/), which tries to exploit common and uncommon security flaws in an HTTP API. It's language agnostic, but you need to supply it a list of endpoints, which we already have via Swaggo. They allow you to self-host for free, which I think we could set up quite easily.

PurpleTeam is no substitute for real red teams, but it is better than nothing.


Maybe the security design of Wharf is too bad in its current state to even consider this. But we should consider it once the RFC iver-wharf/rfcs#13 has been implemented.

gRPC server for streaming logs insertions

For faster log insertion, we add a gRPC API for streaming logs into the wharf-api.

Each wharf-cmd-aggregator only opens 1 connection with this streaming RPC.

Ex:

syntax = "proto3";
import "google/protobuf/timestamp.proto";

service Logger {
  rpc CreateLogStream(stream CreateLogStreamRequest) returns (CreateLogStreamResponse);
}

message CreateLogStreamRequest {
  uint64 build_id = 1;
  uint64 worker_step_id = 2;
  uint64 worker_log_id = 3;
  string log_message = 4;
  google.protobuf.Timestamp timestamp = 5;
}

message CreateLogStreamResponse {
  // empty message, but left here in case of future usage
}

Details regarding worker_step_id & worker_log_id: https://iver-wharf.github.io/rfcs/published/0025-wharf-cmd-provisioning#concept-log_id-step_id-event_id-and-artifact_id
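On the server side, a client-streaming RPC like this would roughly be handled as follows (the generated pb package name and the io import are assumptions of this sketch):

func (s *loggerServer) CreateLogStream(stream pb.Logger_CreateLogStreamServer) error {
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			// The aggregator closed its side; acknowledge the whole stream.
			return stream.SendAndClose(&pb.CreateLogStreamResponse{})
		}
		if err != nil {
			return err
		}
		// Insert req into the log store here (or buffer and batch-insert).
		_ = req
	}
}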

Return latest build result in GET /projects

We want to display the latest build result in the GUI.

All project GET endpoints should return the latest build for that project.

This can then later be implemented in the frontend to show the latest status per project in the list.

It would be best to return the whole build model and not only the build status, so we could add a link in the frontend to quickly go to that build when needed.

Full validation of .wharf-ci.yml files on import

Scope down after validating wharf-ci file in Jenkins.
Split the task in the future:

  • db changes
  • logic (backend)

Warn users of validation in frontend, but let them bypass if they so choose.

Ideas:

  • turn on/off feature
  • make column in db:
    • log bypassing action
    • error list

The code for parsing .wharf-ci.yml files is stored in the cmd repo at the moment. This should be extracted to the core lib repo (https://github.com/iver-wharf/wharf-core) or added to its own repo, so this repo can then make use of it when importing.

TBD: Publish generated Angular client as NPM

Idea originates from @fredx30 over at iver-wharf/wharf-web#25 (review)

The idea is that when deploying we also push an npm package of the generated client that we get from https://github.com/swagger-api/swagger-codegen

Sample script: (add this to a Makefile, a powershell script, or maybe even a GitHub Action)

# Generating swagger.json/yaml artifacts
swag init --parseDependency --parseDepth 1
export VERSION="v4.0.0"
mkdir -p dist
yq eval '.info.version = strenv(VERSION)' docs/swagger.yaml > dist/swagger.yaml
jq '.info.version=strenv(VERSION)' docs/swagger.json > dist/swagger.json

# Generating NPM package
# copy some prepared package boilerplate that includes stuff like a package.json and such
cp -r src/wharf-api-client-angular dist/api-client

docker run --rm --tty --volume "$(pwd)/dist:/dist" \
  swaggerapi/swagger-codegen-cli:2.4.19 \
  generate \
    --input-spec /dist/swagger.json \
    --lang typescript-angular \
    --output /dist/api-client/src \
    --additional-properties ngVersion=9.0.3

cd dist/api-client
npm version "$VERSION"
npm publish

cd ../..

# Releasing with swagger.json/yaml artifacts
git tag "$VERSION" --sign --message "$VERSION"
git push --tags
gh release create "$VERSION" --title "$VERSION" --draft
gh release upload "$VERSION" dist/swagger.{json,yaml}

Concerns

Needs to be discussed (meeting?) but if we add an Angular client to this repo then we should possibly also merge in the wharf-api-client-go into this repo. Or should this code live in its own repo, like wharf-api-client-angular?

Or should we dissolve this idea and then also remove the wharf-api-client-go repo and rely on generated code for that one as well?

Add option for selecting execution engine

Add a query parameter when starting builds to select the execution engine, such as:

POST /api/project/{projectId}/build?engine=wharf-cmd

The possible values here should come from the configs, such as:

ci:
  defaultEngine: jenkins
  engines:
    jenkins:
      name: Jenkins
      triggerUrl: https://jenkins.local/whatever
      triggerToken: changeit
    wharf-cmd:
      name: wharf-cmd
      triggerUrl: http://wharf-cmd-provisioner
      triggerToken: changeit

Needs backwards compatibility, so defining ci.triggerUrl and ci.triggerToken (the existing config fields) should be equivalent to:

ci:
  # old fields:
  triggerUrl: https://jenkins.local/whatever
  triggerToken: changeit

  # are translated to the following:
  defaultEngine: unnamed
  engines:
    unnamed:
      name: Unnamed
      triggerUrl: https://jenkins.local/whatever
      triggerToken: changeit

There should also be new endpoints for listing these engines, so the frontend can display a way to select between them:

GET /api/engine

Response:

{
  "default": "jenkins",
  "engines": {
    "jenkins": {
      "name": "Jenkins",
      "triggerUrl": "https://jenkins.local/whatever"
    },
    "wharf-cmd": {
      "name": "wharf-cmd",
      "triggerUrl": "http://wharf-cmd-provisioner"
    }
  }
}

The response cannot include the tokens. But the URLs are safe to share and can be used to give more context to the user.

Add CLI subcommands and args

CLI commands are really useful for doing manual work that we don't want to expose via the web API.

This mostly includes heavy operations, such as data or database migrations.

Suggested CLI

# Start the HTTP server. This is the same as what wharf-api already did in its main before
wharf-api serve

# Migration commands. Low prio, as wharf-api has a decent enough auto-migration on boot
wharf-api migrate latest
wharf-api migrate list
wharf-api migrate rollback "2021-10-05T15:30:00Z-v5.0.0"

# Per-type migration subcommands
wharf-api migrate artifact from s3
wharf-api migrate artifact to s3
wharf-api migrate logs from elasticsearch
wharf-api migrate logs to elasticsearch

# Common flags
wharf-api --version
wharf-api --help
wharf-api --config="my-wharf-api-config.yaml"

Package files move

This means we would have to do some reorganizing in the packages, as most code lies in the main package and main packages cannot be reused by other packages.

Suggest:

main.go            # Runs cmd/root.go
cmd/*.go           # Cobra commands definitions
pkg/serve/serve.go # HTTP endpoints
pkg/data/data.go   # Data access abstraction for reading/writing from Sqlite, Elasticsearch, S3, etc.
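A minimal sketch of cmd/root.go if we go with spf13/cobra (serve.Run is a placeholder for whatever the HTTP entry point ends up being):

var rootCmd = &cobra.Command{
	Use:   "wharf-api",
	Short: "Wharf backend API",
}

var serveCmd = &cobra.Command{
	Use:   "serve",
	Short: "Start the HTTP server (what wharf-api's main did before)",
	RunE: func(cmd *cobra.Command, args []string) error {
		return serve.Run()
	},
}

func Execute() {
	rootCmd.AddCommand(serveCmd)
	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}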

Rename path parameters

Based on RFC-0016 (iver-wharf/rfcs#16)

According to the guidelines from https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#renamed-path-parameters, the path parameters should use camelCase, and not full lowercase.

Initialisms should only be Title cased, and not fully UPPERCASE, as that's more conventional in most languages except Go.

Expected

func (m projectModule) Register(g *gin.RouterGroup) {
    project := g.Group("/project")
    {
        project.GET("/:projectId", m.getProjectHandler)
    }
}

// @param projectId path int true "project ID"
func (m projectModule) getProjectHandler(c *gin.Context) {}

Actual

func (m projectModule) Register(g *gin.RouterGroup) {
    project := g.Group("/project")
    {
        project.GET("/:projectid", m.getProjectHandler)
    }
}

// @param projectid path int true "project ID"
func (m projectModule) getProjectHandler(c *gin.Context) {}

Transaction ID for requests

This is only in the idea phase, but we would want a way to make HTTP POST requests idempotent, allowing for retry logic.

There's this somewhat conventional usage of the X-Request-ID header that can be used to provide idempotency on any request, given that the server supports it. More is explained in https://stackoverflow.com/a/54356305

A basic implementation would be to add a RequestID field to builds, and when wharf-api receives a POST /api/project/{projectId}/build, it will check in the recent builds if the same request ID has already been used, and if so then just use that build in the HTTP response instead of actually starting a new build.

Same goes for the other POST endpoints.

Alternatively, the wharf-api could hold a cache of recent request IDs and their HTTP responses in memory. But to support scaling the wharf-api, we would require some distributed cache such as Redis. Maybe still worth it? The implementation would be so much simpler and wouldn't need to bloat the database.
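A sketch of the in-memory variant as gin middleware (the cache and header handling are invented for illustration; a distributed cache such as Redis would replace the map when scaling out):

var (
	requestIDCacheMu sync.Mutex
	requestIDCache   = map[string]cachedResponse{}
)

type cachedResponse struct {
	status int
	body   []byte
}

func idempotencyMiddleware(c *gin.Context) {
	reqID := c.GetHeader("X-Request-Id")
	if reqID == "" {
		c.Next()
		return
	}
	requestIDCacheMu.Lock()
	cached, ok := requestIDCache[reqID]
	requestIDCacheMu.Unlock()
	if ok {
		// Replay the previous response instead of re-running the handler.
		c.Data(cached.status, "application/json", cached.body)
		c.Abort()
		return
	}
	c.Next()
	// Capturing and caching the handler's response is left out of this sketch.
}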

Endpoints cleanup

Based on RFC-0016 iver-wharf/rfcs#16

Restructuring the wharf-api by changing the paths of the endpoints, adding and renaming path parameters, and changing the request and response models around.

While we will keep backward compatibility for at least one major version, we will be changing so much that this needs a full major version bump.

Please see the RFC for further details: RFC-0016

Test Results Summary

Right now the test results are taken for each build separately and calculated each time by parsing artifact files.

A new table needs to be created to keep the test results summary. That table should have an N:1 relation to the build table and a 1:1 relation to the artifact table.

The table should contain the number of tests run, passed, failed and skipped.

The values in this table should be calculated once, when the artifact is uploaded to the backend.

Create a separate method to upload test results apart from artifacts

Modify the get method for builds to include the total test numbers in the build entity. That number should be a sum of all entities in the test result summary table related to that build.

The get test results method should be modified to return the list of results + a link to the artifact that those results originated from (the artifact ID should be enough).

Change the frontend to use the new way of getting test results instead of making those additional calls per build.



  • Make the database changes
  • Fill out the data from existing artifacts
  • Create an Upload TestResult endpoint. This will parse the test result files (UTs)
  • Modify the existing wharf-ci configuration to use new endpoint (in our other internal projects that use Wharf with test results)
  • Extend the Build entity with TestResultSummary
  • Use new data from Build entity on the Frontend
  • Remove /testResults endpoint
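A sketch of the summary table as a GORM model (column names are assumptions):

type TestResultSummary struct {
	TestResultSummaryID uint `gorm:"primaryKey"`
	// N:1 to build: one summary per uploaded test result artifact.
	BuildID uint `gorm:"not null;index"`
	// 1:1 to the artifact the summary was parsed from.
	ArtifactID uint `gorm:"not null;uniqueIndex"`
	Total      uint
	Passed     uint
	Failed     uint
	Skipped    uint
}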

Stream using websockets instead of SSE

Server-sent events (SSE) use the same connection pool as other content. If your browser has a limit of ~4 connections per domain, then opening more than 4 logs will simply halt the entire browser page.

WebSockets have no such connection limit in practically any browser, not even mobile browsers.

While we still only want a 1-way connection where the server sends all the data and the client sits quiet, this limitation of SSE is reason enough to use WebSockets even for such a use case.

The GET /build/{buildid}/stream endpoint shall be deprecated.

Doing this right might be a more complicated story. Do we have 1 channel and let the client request the data it needs, or multiple channels that each only deliver a single type of data? Such as comparing GET /ws or GET /ws/builds/{buildid} vs GET /ws/builds/{buildid}/logs.

Taking this route may also have us dig down in the rabbit hole of AsyncAPI (https://www.asyncapi.com/) as neither Swagger nor OpenAPI v3 supports websocket specifications. With this considered, doing gRPC might not be that bad of an idea either? Just giving it a thought.

Authentication, first iteration

We need authentication. This is "easier said than done", as we need to update ALL services to propagate the authentication.

We want user logins, but for the first iteration we'll stick with a single auth token that's used in the frontend and backend.

Updates per repo:

  • web:

    • add a way to set the auth token for when making requests. The easiest and most secure way would be to have a simple login field where you paste the token, and the token is saved IN MEMORY instead of in cookies or local/session storage. This is for security's sake, as the token does not change. Users can still store the token in their password managers, so it shouldn't be that big of a hassle for them.

    • this login page should be presented when the user opens Wharf, as you cannot fetch the list of projects without being authenticated

  • api:

    • assert that the requests are authenticated in EVERY endpoint. To start with, we're doing a full lockdown of Wharf.
  • providers:

    • no need to assert the authentication, the API deals with that; but pass the key along to the Wharf Go client so it is sent onwards to the API.
  • go client:

    • this might already be implemented here, but assert that the auth is passed along in all of the endpoints.

We could do JWT-based auth, but that feels like overkill for now. Just go with a basic key that acts as a password and nothing else. The validation of the key should be a basic comparison along the lines of request.Header.Auth == os.Getenv("API_KEY").
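As a gin middleware, that first iteration could be as small as this sketch (header name and env var are placeholders):

func requireAuth(c *gin.Context) {
	want := os.Getenv("API_KEY")
	got := c.GetHeader("Authorization")
	// Single shared token compared as a plain password; no JWT for now.
	if want == "" || got != want {
		c.AbortWithStatus(http.StatusUnauthorized)
		return
	}
	c.Next()
}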

Fine-tune Codacy duplication checking

The code duplication check from Codacy is more in the way than helpful. I do not think we should disable it, because it does improve the code in most cases, but we're in danger of "alert fatigue" if we keep ignoring some of the duplication warnings.

Some short searching led me to the following resources that seem like they may be relevant:

The Codacy documentation sadly doesn't describe how to configure the duplication check:

Needs further research, and it's perhaps a good idea to contact Codacy's support to make sure. From past experience, they're quite accommodating and friendly.

Remove RabbitMQ integration

Our RabbitMQ integration is only half-finished. It was added to Wharf to enable two-way communication between wharf-api and the providers, but we've since started planning a different solution (Hide providers behind API).

The RabbitMQ solution is only bloating the wharf-api at the moment, with code scattered around and an extensive list of environment variables.

We've also planned a notification service for sending emails on events (iver-wharf/iver-wharf.github.io#33), something that could become the new home for RabbitMQ if we so decide to reintroduce it.

This needs a discussion meeting with the team to confirm.

Rework of test result handling

Based on RFC-0014 iver-wharf/rfcs#14

Changing handling of test results in wharf-api.

Parse the .trx files once and insert into the database, instead of parsing them each time they're requested.
This also gives the possibility to get more detailed summaries, as well as more detailed individual test results.

Please see the RFC for further details: RFC-0014

Need automated testing via `go test`

Based on iver-wharf/iver-wharf.github.io#75

Need to run Go tests and goimports formatting tests on commits and pull requests automatically.

As Wharf cannot do this yet, we should aim at using GitHub Actions.

Either we use the starter-workflow for Go https://github.com/actions/starter-workflows/blob/1d8891efc2151b2290b1d93e8489f9b1f41bd047/ci/go.yml which simply runs go test

Or we could look into a better integrated solution that could report failing tests as annotations inside the pull requests, such as:

Replace "POST search"+JSON body with "GET"+query params

Based on RFC-0016 (iver-wharf/rfcs#16)

As discussed in the "Alternative solutions":

While [using POST for search endpoints] can be useful, it’s not required for today’s use cases. Not banning POST searches for future use, but for these simpler search queries they do not fit well.

https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#alternative-solutions

Expected

GET /project/{projectId}/build
 NAME              PARAM    TYPE               REQUIRED?  DESCRIPTION
 projectId         (path)   integer            true
 limit             (query)  integer            false      Max number of items returned.
 offset            (query)  integer            false      Shifts the window returned.
 orderby           (query)  array[string]      false      Alphabetically, or order by ID?
 environment       (query)  string             false      Filter on environment hard match
 environmentMatch  (query)  string             false      Filter on environment soft match
 finishedAfter     (query)  string[date-time]  false      Filter on finishedOn
 finishedBefore    (query)  string[date-time]  false      Filter on finishedOn
 gitBranch         (query)  string             false      Filter on gitBranch hard match
 gitBranchMatch    (query)  string             false      Filter on gitBranch soft match
 isInvalid         (query)  boolean            false      Filter on isInvalid
 scheduledAfter    (query)  string[date-time]  false      Filter on scheduledOn
 scheduledBefore   (query)  string[date-time]  false      Filter on scheduledOn
 stage             (query)  string             false      Filter on stage hard match
 stageMatch        (query)  string             false      Filter on stage soft match
 status            (query)  string[enum]       false      Filter on status by enum string
 statusId          (query)  integer            false      Filter on status by ID

The difference between a soft match (params ending with -Match) vs hard match is that:

  • hard match: verbatim match on the whole string. Useful when other components or cronjobs want to search for items.
  • soft match: query can be a substring of the real value, or even allow typos. Useful when humans want to search for items.

The idea is that if the query parameter is not used, then it will not be searched on, while if it is used then it is searched on even when the parameter is empty (only relevant for the hard-match filters). Which means that this:

GET /project/{projectId}/build?stage=

...resolves to:

SELECT * FROM build WHERE project_id=@projectId AND stage=''

...while:

GET /project/{projectId}/build?

...resolves to:

SELECT * FROM build WHERE project_id=@projectId
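In gin, the difference between an absent and an empty query parameter can be read with GetQuery; a sketch, assuming the GORM query is built up incrementally in a query variable:

// ?stage=    -> ok == true, filter applied with the empty string
// no ?stage  -> ok == false, filter skipped entirely
if stage, ok := c.GetQuery("stage"); ok {
	query = query.Where("stage = ?", stage)
}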

The following endpoints will also have to be implemented, to fill the consistency gap:

Actual

POST /build/search

And the HTTP request body:

{
  "projectId": 123,
  "environment": "dev",
  "stage": "build",
  //etc...
}

Map projects per ID: Apply migrations

Depends on #12 & #15

Projects are currently mapped to their corresponding provider via name. Name can change, such as when a project is moved, but ID does not.

This is such a fundamentally impactful change that we need to be cautious and perhaps make a bunch of migration scripts.

The suggested change would be that the project in Wharf's DB is mapped to the Git server and the project ID within that server, for example "gitlab.local" and "193".


A case has come up where the current implementation caused a bug, especially since the current GitLab importer actually maps the auth token to the project display name.

See the code (redacted link, it was outdated anyway, but pointed to somewhere in: https://github.com/iver-wharf/wharf-provider-gitlab/blob/master/import.go)
It would make more sense if it used the "path" property of the repo instead of "name". To show the difference between the two, here is a screenshot of GitLab's docs with an example project called "Diaspora Client" but with the path "diaspora-client".


A project that is damaged by this is Foo Bar, which is accessed via the path "foo-bar" but has the name "Foo Bar" https://gitlab.local/default/foo-bar
When it tries to build docker images it builds them with the destination
harbor.local/default/Foo Bar:latest, which is an invalid docker image name.


There is so much auto-magic and there are so many assumed relations tied together here. We need to find a way to migrate the data in as painless and future-proof a way as possible.

Migrating to basing projects off the project ID and server domain instead is a good step in my (@jilleJr's) opinion.
Then we can also import metadata for it, such as its full path, so that we can later use that value for default Docker image names instead of these home-baked spaghetti relations.


Apply the migrations (automatically, or by running them through a container command) (6h, 124)

Replace `Build.IsInvalid` with `Failed` status

We have duplication of ways to represent a similar process.

A while ago we introduced Build.IsInvalid to be used when a build fails to start but has already been created in the database, such as if the Jenkins call fails.

Suggest instead to only use the Failed build status. Migration could be done easily by a single SQL query of something like:

UPDATE build SET status=3 WHERE isInvalid=1

And then drop the isInvalid column

We could introduce a FailedStatus column that is a free-text field of why it failed, and then we could use that to show to the user the reason for it failing (when we know the cause, such as connection refused on contacting Jenkins).

Remove Provider.UploadURL

Old and unused field on the provider model.

For GitHub, the API URL is https://api.github.com/, but to upload assets to a release you use https://uploads.github.com. This URL is however provided by the "Create a release" endpoint and should be able to be provided from there instead.

I've not researched the other providers, but I do know that the upload URL field is unused throughout Wharf.

Migrations are needed. GORM doesn't remove columns automatically; it only adds them automatically.

Test Results Details

In comparison to #11, this issue is for full test results, meaning each single test should be stored with its state and messages.

Should be populated when a test result is uploaded

Need endpoints to be able to fetch these.

Could probably settle for only storing non-successful tests

Import projects via config file

Declaring what to import up front, from a config file / k8s Secret.

Use case: spin up new environments that are pre-populated.

This is a feature request from a user who had to reset their postgres because KubeDB f*cked up. But "spin up a pre-populated environment of Wharf" is a promising feature.

This should get an RFC!

Concerns:

  • There will be secrets in these configs.

  • This would involve the API having to speak with the providers. We have to be careful here not to introduce a circular dependency chain, as the providers currently depend on the API. Or the providers could read the config instead.

  • Either we watch for updates in the config file, allowing hot reloading, or we only import on boot. The config could have settings such as:

    importOnStartup: true
    importOnConfigChange: true

Add Swagger operation IDs to all endpoints

Based on RFC-0016 (iver-wharf/rfcs#16)

Swagger allows for so called "operation IDs", which are string IDs used to identify endpoints.

These IDs are usually used when generating API clients, such as the swagger-codegen that we use for wharf-web.

So these IDs should be descriptive.

Expected

Names should follow the guidelines specified in https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#swaggeropenapi-endpoint-ids

Example:

// @id getProjectList
func (m projectModule) getProjectsHandler(c *gin.Context) {}

// @id getProject
func (m projectModule) getProjectHandler(c *gin.Context) {}

// @id createProject
func (m projectModule) postProjectHandler(c *gin.Context) {}

To later generate the TypeScript methods:

ProjectService.getProjectList() {}
ProjectService.getProject(projectId: number) {}
ProjectService.createProject(project: MainProject) {}

For consistency, the handler methods in Go should mimic the naming convention, so changing getProjectsHandler → getProjectListHandler.

Actual

Swagger-codegen tries to generate its own method names, as no IDs exist.

These generated names use the path segments, suffixed with the HTTP method.

Example:

ProjectService.projectGet() {}

ProjectService.projectsProjectidGet(projectid: number) {}

ProjectService.projectPost(project: MainProject) {}

Sqlite foreign keys

Depends on:

Newer versions of Sqlite support foreign key constraints, as do newer versions of the github.com/mattn/go-sqlite3 package.

Should add this in as we sometimes rely on the constraints to CASCADE DELETE rows, without which we currently get orphan rows in the local wharf-api.db database file.
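Foreign key enforcement is off by default in Sqlite and is typically switched on per connection; with mattn/go-sqlite3 this can be done through a DSN parameter. A sketch, assuming the gorm Sqlite driver passes the DSN through unchanged:

// _foreign_keys=1 asks go-sqlite3 to run PRAGMA foreign_keys = ON per connection.
db, err := gorm.Open(sqlite.Open("file:wharf-api.db?_foreign_keys=1"), &gorm.Config{})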

Commit ID in build details

We want to show the commit ID of the build.

Possibly easier said than done. Currently we don't know the commit until mid-build, when Jenkins checks it out.

Proposed solution

  1. Get the latest commit of the target branch/committish (see iver-wharf/iver-wharf.github.io#45) from the provider (needs the unidirectional communication already established, see iver-wharf/iver-wharf.github.io#51)
  2. Store target commit with the rest of the build data at the same time as everything else, like build input variables, is inserted into the database.
  3. Start the build with said commit as target, instead of target branch
