iver-wharf / wharf-api
Wharf backend written in Go
License: MIT License
As BranchList has its default branch as DefaultBranch, EngineList should have its default engine as DefaultEngine for consistency. It also prevents swaggo from generating the variable name _default, which our current lint settings over at wharf-web don't like.
Remember to change the JSON tag name to defaultEngine as well.
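A minimal sketch of the renamed response model (the Engine type and the List field are assumptions; only DefaultEngine and its defaultEngine JSON tag come from this issue):
type Engine struct {
    Name string `json:"name"`
}
type EngineList struct {
    // DefaultEngine mirrors BranchList's DefaultBranch naming, and its
    // JSON tag avoids swaggo generating the _default variable name.
    DefaultEngine *Engine  `json:"defaultEngine"`
    List          []Engine `json:"list"`
}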
Note: As changing names of things can often be a breaking change, I am putting this at a high priority. This change is not required, and the issue may be closed if decided against.
Based on iver-wharf/iver-wharf.github.io#72
Tip is to look at https://github.com/iver-wharf/wharf-api/security/code-scanning/setup and see if there are any templates.
Could probably just run make docker and inspect the result.
Our build responses and requests have both the build status ID and the string enum.
Suggest removing the ID from the REST API and relying only on the status string.
This needs a deprecation warning, so it could be removed first in v6 if deprecated in v5.
Based on RFC-0016 (iver-wharf/rfcs#16)
The PUT method in REST is meant to represent a "replace". As with the GET method, the subject (which object to get/replace) should come from the URL path:
PUT /project/{projectId}
PUT /project/{projectId}/branch
PUT /provider/{providerId}
PUT /token/{tokenId}
instead of the current:
PUT /project
PUT /branches
PUT /provider
PUT /token
curl -X DELETE 'https://your-wharf-instance.com/api/project/5'
>>> OK
And all branches and builds, and their associated build logs and build params, are deleted as well.
As it's quite the dangerous endpoint, maybe we want to add something like ?are-you-sure=true, or is that not necessary as it's an API anyway, meant to be used by computers?
Response given:
{
"type": "https://iver-wharf.github.io/#/prob/api/unexpected-db-read-error",
"title": "Error writing to database.",
"status": 502,
"detail": "Failed deleting project with ID 5 from database.",
"instance": "/api/project/5",
"errors": [
"ERROR: update or delete on table \"project\" violates foreign key constraint \"fk_branch_project\" on table \"branch\" (SQLSTATE 23503)"
]
}
You must delete all dependent rows from the other tables branch and build (and in turn the log and build_param tables as well) before you can delete the project.
As there are no endpoints to delete them, you must do it via SQL commands.
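Until such endpoints exist, the same cleanup can be scripted with gorm in dependency order (a sketch; the build_id and project_id column names are assumptions based on the tables named above):
package main

import "gorm.io/gorm"

// deleteProjectCascade manually deletes a project and all its dependent
// rows, mirroring the SQL cleanup described above: log and build_param
// first, then build and branch, and finally the project row itself.
func deleteProjectCascade(db *gorm.DB, projectID uint) error {
    return db.Transaction(func(tx *gorm.DB) error {
        if err := tx.Exec(
            `DELETE FROM log WHERE build_id IN
                (SELECT build_id FROM build WHERE project_id = ?)`,
            projectID).Error; err != nil {
            return err
        }
        if err := tx.Exec(
            `DELETE FROM build_param WHERE build_id IN
                (SELECT build_id FROM build WHERE project_id = ?)`,
            projectID).Error; err != nil {
            return err
        }
        if err := tx.Exec(`DELETE FROM build WHERE project_id = ?`, projectID).Error; err != nil {
            return err
        }
        if err := tx.Exec(`DELETE FROM branch WHERE project_id = ?`, projectID).Error; err != nil {
            return err
        }
        return tx.Exec(`DELETE FROM project WHERE project_id = ?`, projectID).Error
    })
}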
Two main issues:
No direct way to load only the artifact metadata without the BLOB, meaning a lot of data transfer we don't usually care about.
No way to use an external storage tool, such as S3, to store the BLOBs. BLOBs are not really meant to be stored in databases, as you can't query them and they just bloat the database, leading to worse performance.
Suggestions:
Add a new database.ArtifactData type, and change the Data field in database.Artifact to be a pointer to database.ArtifactData that is only loaded when needed, such as in the GET /build/{buildId}/artifact/{artifactId} endpoint, moving the BLOB out to its own artifact_data table.
Add S3 support. Adding migrations should be easy enough. We could change the ArtifactData.Data field to contain a reference to the artifact ID in the S3 storage.
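A minimal sketch of the split models under those suggestions (gorm tags and field sets are assumptions; only the type names come from this issue):
// Artifact keeps only the metadata, so artifact listings no longer
// transfer the BLOB.
type Artifact struct {
    ArtifactID uint `gorm:"primaryKey"`
    Name       string
    // Data stays nil unless explicitly preloaded, e.g. in the
    // GET /build/{buildId}/artifact/{artifactId} endpoint.
    Data *ArtifactData `gorm:"foreignKey:ArtifactID"`
}

// ArtifactData holds the BLOB in its own artifact_data table. For S3
// support, Data could instead hold a reference to the object in S3.
type ArtifactData struct {
    ArtifactID uint `gorm:"primaryKey"`
    Data       []byte
}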
Currently you have to use ID to get a single project.
Want to add so we can look them up on some kind of normalized name, such as a URL slug dedicated for each project.
They would still have their IDs behind the scenes.
Add nameSlug and groupNameSlug columns to project. Maybe we want to have a normalized slug as well, so that we can make fast case-insensitive lookups.
Example project:
Column | Value
---|---
name | My Project
groupName | My Group/Stuff
nameSlug | My-Project
nameSlugNorm | my-project
groupNameSlug | My-Group-Stuff
groupNameSlugNorm | my-group-stuff
In queries like GET /api/project?groupNameSlug=My-Group-Stuff, do the actual lookup on the groupNameSlugNorm field, but use the groupNameSlug in the response object. This way the frontend can redirect to its "canonical" slug with casing preserved.
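The exact slug rules aren't pinned down in this issue; a sketch matching the example table above could be:
package main

import (
    "regexp"
    "strings"
)

var nonSlugChars = regexp.MustCompile(`[^A-Za-z0-9]+`)

// slugify derives both slug columns from a name, keeping the original
// casing for the display slug ("My Group/Stuff" -> "My-Group-Stuff")
// and lowercasing it for the normalized lookup column ("my-group-stuff").
func slugify(name string) (slug, norm string) {
    slug = strings.Trim(nonSlugChars.ReplaceAllString(name, "-"), "-")
    norm = strings.ToLower(slug)
    return slug, norm
}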
All of this is to allow wharf-web to use project slugs in the URL instead of project IDs. So the URLs would be
http://localhost/project/My-Group-Stuff/My-Project/builds
instead of
http://localhost/project/123/builds
Need to consider how to do the project data lookups as well. We perhaps don't want to add another set of API endpoints just for using the slugs instead of the IDs. One idea is to let the wharf-web rely on querying GET /api/project?nameSlug=...&groupNameSlug=...
for fetching the project metadata (including ID) via something like route resolvers and then use that in future requests for stuff that targets the project like "start a new build". (explored in iver-wharf/wharf-web#117)
Projects are currently mapped to their corresponding provider via name. Names can change, such as when a project is moved, but IDs do not.
This is such a fundamentally impactful change that we need to be cautious and perhaps write a bunch of migration scripts.
The suggested change is that the project in Wharf's DB is mapped to the Git server and the project ID within that server. For example "gitlab.local" and "193".
A case has come up where the current implementation caused a bug, especially since the current GitLab importer actually maps the auth token to the project display name.
See the code (redacted link; it was outdated anyway, but pointed to somewhere in: https://github.com/iver-wharf/wharf-provider-gitlab/blob/master/import.go)
It would make more sense if it used the "path" property of the repo instead of "name". To show the difference between the two, here is a screenshot of GitLab's docs with an example project named "Diaspora Client" but with the path "diaspora-client".
A project that is damaged by this is Foo Bar, which is accessed via the path "foo-bar" but has the name "Foo Bar": https://gitlab.local/default/foo-bar
When it tries to build Docker images, it builds them with the destination
harbor.local/default/Foo Bar:latest
, which is an invalid Docker image name.
There's so much auto-magic and assumed relations tied together here. We need to find a way to migrate the data in as painless and future-proof a way as possible.
Migrating to basing projects off the project ID and server domain instead is a good step in my (@jilleJr's) opinion.
Then we can also import metadata for it, such as its full path, so that we can later use that value for default Docker image names instead of these home-baked spaghetti relations.
Steps:
We have a lot of misconfigured associations in our pkg/model/database package.
When doing just a simple test implementation of Sqlite foreign key constraints (#145), I noticed that our invalid database model relations are really obstructing this. Maybe it will mess up our Postgres integration as well?
For example, the Project type should have many builds, but instead the Build model is set up with a "has-one" relation to Project. I.e. an inverted model dependency:
type Project struct {
TimeMetadata
ProjectID uint `gorm:"primaryKey"`
// ...
}
type Build struct {
TimeMetadata
BuildID uint `gorm:"primaryKey"`
ProjectID uint `gorm:"not null;index:build_idx_project_id"`
Project *Project `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
// ...
}
The Project model should have a "has-many" relation to the Build model, like so:
type Project struct {
TimeMetadata
ProjectID uint `gorm:"primaryKey"`
+ Builds []Build `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
// ...
}
type Build struct {
TimeMetadata
BuildID uint `gorm:"primaryKey"`
ProjectID uint `gorm:"not null;index:build_idx_project_id"`
- Project *Project `gorm:"foreignKey:ProjectID;constraint:OnUpdate:CASCADE,OnDelete:CASCADE"`
// ...
}
Maybe we need to extend the engines type to also allow declaring the engine type, with an enum of jenkins-generic-webhook-trigger and wharf-cmd.
Based on RFC-0029: iver-wharf/rfcs#29, published: https://iver-wharf.github.io/rfcs/published/0029-wharf-api-migrations
Add a migrations table to track which migrations have been applied, and add the ability to skip migrations if the database is already up to date.
Sample data:
migration_id |
---|
2016-08-30T14:00:00Z-v5.0.0 |
2016-08-30T14:15:00Z-v5.0.0 |
2016-08-30T14:30:00Z-v5.1.0 |
Depends on #12
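A minimal sketch of the tracking table and the skip check in gorm (the model and helper names are assumptions based on the sample data above):
package main

import "gorm.io/gorm"

// Migration records one applied migration, keyed on IDs like
// "2016-08-30T14:00:00Z-v5.0.0".
type Migration struct {
    MigrationID string `gorm:"primaryKey"`
}

// hasBeenApplied reports whether a migration is already recorded, so
// boot-time migration can skip it when the database is up to date.
func hasBeenApplied(db *gorm.DB, migrationID string) (bool, error) {
    var count int64
    err := db.Model(&Migration{}).
        Where("migration_id = ?", migrationID).
        Count(&count).Error
    return count > 0, err
}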
Created as per discussion over at iver-wharf/wharf-web#53 (comment)
The messages taken from .TRX files are currently escaped.
This is slightly annoying, as we have to unescape them when we want to display them in wharf-web.
My initial thought is that unescaping before storing in the DB seems okay; I don't know if we would ever want the escaped version. It's also likely that if we do need it, it wouldn't be a big hassle to either re-escape the existing data on request/migration, or let unescaped messages stay unescaped.
We could let GET /api/build/{buildId}/test-result/detail and GET /api/build/{buildId}/test-result/summary/{artifactId}/detail retrieve unescaped or escaped messages.
Based on RFC-0016 (iver-wharf/rfcs#16)
There's a mix of functionality and code duplicated across some PUT and POST endpoints, such as PUT /project vs POST /project.
They both try to act like "add or update", while POST endpoints should only add and PUT endpoints should only update/replace.
We need to be extra careful here about backward compatibility.
As PUT /project should be changed to PUT /project/{projectId} (as declared in #70), we can just keep the old implementation on PUT /project to keep it backwards compatible.
But POST /project should not get an extra path parameter, so the existing implementation needs to be fully replaced. We cannot keep backward compatibility here.
We need to make sure in all wharf-provider-... repos that the POST endpoints are not used in a way that assumes the endpoint will update an existing element. If so, we either need to patch those providers, ignore the side-effects from the backward incompatibility, and just hope that everyone upgrades the providers before or at the same time as the wharf-api; or we take some other radical approach, such as choosing plural instead of singular (as denoted in #69).
Follow example from iver-wharf/wharf-core#2 once it has been merged
Based on RFC-0016 (iver-wharf/rfcs#16)
See note from RFC:
Deprecated. Has not been moved, but instead planned to be removed. Was not implemented in v4, and its usage is replaced instead by the GET /project/{projectId}/branch endpoint.
https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#get-branchbranchid
Described by RFC-0024: iver-wharf/rfcs#24
Addition of a new table project_override, which defines almost all of the columns in the project table that, if set, will be used in the HTTP response body and left untouched by HTTP PUT requests.
Finding the default branch is a very common practice that has so far come up in both the web-ng and api repos. Instead of duplicating code for iterating through the branches every time, let's instead add a project.DefaultBranch property.
Let project.DefaultBranch be a foreign key. Fetching the default branch will then be automatically handled by the ORM (gorm). Ultimately we could then remove the Branch.Default column, as that's redundant.
Tasks:
Let project.DefaultBranch (in the api repo) be a foreign key of the projects table pointing to a specific branch, with OnUpdate=Cascade; OnDelete=Set Null (see: https://gorm.io/docs/constraints.html#Foreign-Key-Constraint).
Add a defaultBranch column to the project table.
In a migration, find the branch with its default column set to true per project, and assign the project's defaultBranch column to point to that branch.
Remove the default column from the branch table.
Remove the branch model's default property. That model will later be changed in #14. (Optionally, do that task in this issue too. Balance making too many hacky solutions for not changing the models vs too big of an issue/PRs. You will have to decide on your own for this one.)
Add the project.DefaultBranch field.
Use the DefaultBranch property where appropriate.
This would involve changing the JSON models given to or received from the following endpoints:
PUT /branch (model would remain, but logic has to change to also update the project table.)
PUT /branches (either keep the model or change it so default is only specified once. Either way it needs to update the project table as well.)
PUT /project (allow setting the default branch on update, and an automatic "create branch if it doesn't exist" on the specified default branch)
POST /project (allow setting the default branch on creation, and an automatic "create branch if it doesn't exist" on the specified default branch)
GET /project/:projectid (include the default branch)
GET /projects (include the default branches per project)
I.e. this is a major breaking change. Changes need to be done in the following repos:
Do a major version bump where the new code is not backwards compatible (such as relies on the new defaultBranch property) in all touched repos.
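A minimal sketch of the foreign key setup described in the tasks above (the DefaultBranchID column name and its nullability are assumptions):
type Project struct {
    ProjectID uint `gorm:"primaryKey"`
    // Nullable so that OnDelete:SET NULL has something to set.
    DefaultBranchID *uint
    DefaultBranch   *Branch `gorm:"foreignKey:DefaultBranchID;constraint:OnUpdate:CASCADE,OnDelete:SET NULL"`
}

type Branch struct {
    BranchID  uint `gorm:"primaryKey"`
    ProjectID uint
    Name      string
}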
depends on #13
It would make more sense to obtain the project ID from the URL path.
Potentially move this entire endpoint to project.go, moving the default parameter out so it's only specified once, and in the process invalidating the issue of the user sending more than 1 branch marked as default, e.g.:
PUT /projects/:project_id/branches
{
"default_branch": "master",
"branches": [
{ "name": "master" },
{ "name": "dev" },
{ "name": "lts" }
]
}
This is from the perspective of having a nice-looking API, and not just having it be easy to implement in Go. I'm quite annoyed that, for example, the create endpoints specify that you can set the IDs, while those values are fully ignored.
Having custom types for the endpoints isn't a bad thing IMO.
Depends on #176
Elasticsearch supports log streaming, so this could let the wharf-api be scalable, as the logs are currently the big thing making the wharf-api unscalable. Currently, if we were to have 2 wharf-api pods, then the SSE log streaming would only send half of the logs due to the load balancing on log insertion.
As we don't have dedicated red teams trying to penetrate Wharf, we should add automatic DAST (Dynamic Application Security Testing) integrated into our CI pipeline.
This needs investigation, but one proposal that comes to mind is OWASP PurpleTeam (developed partly by some folks from OWASP https://owasp.org/), which tries to apply common and uncommon security flaws to an HTTP API. It's language agnostic, but you need to supply it a list of endpoints, which we already have via Swaggo. They allow you to self-host for free, which I think we could set up quite easily.
PurpleTeam does not substitute real red teams, but it is better than nothing.
Maybe the security design of Wharf is too bad in its current state to even consider this. But we should consider it once the RFC iver-wharf/rfcs#13 has been implemented.
Based on RFC-0016 (iver-wharf/rfcs#16)
Explained in the RFC:
Starting with v5, the wharf-api no longer accepts nor returns database models from ANY of its endpoints. Instead, there are now 3 new packages of models inside the wharf-api:
pkg/models/database: /.../
pkg/models/request: /.../
pkg/models/response: /.../
For faster log insertions, we add a gRPC API for streaming logs into the wharf-api.
Each wharf-cmd-aggregator only opens up 1 connection with this streaming RPC.
Ex:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
service Logger {
rpc CreateLogStream(stream CreateLogStreamRequest) returns (CreateLogStreamResponse);
}
message CreateLogStreamRequest {
// field numbers start at 1, as 0 is not a valid field number in proto3
uint64 build_id = 1;
uint64 worker_step_id = 2;
uint64 worker_log_id = 3;
string log_message = 4;
google.protobuf.Timestamp timestamp = 5;
}
message CreateLogStreamResponse {
// empty message, but left here in case of future usage
}
Details regarding worker_step_id & worker_log_id: https://iver-wharf.github.io/rfcs/published/0025-wharf-cmd-provisioning#concept-log_id-step_id-event_id-and-artifact_id
Based on RFC-0016 (iver-wharf/rfcs#16)
For the sake of consistency, all endpoints should follow a more defined format. Currently (in v4.2.0) the endpoints mix between singular and plural.
All endpoints should be in singular.
The full list of proposed changes can be found at https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#renamed-path-parameters
We want to display the latest build result in the GUI.
All project GET endpoints should return the latest build for that project.
This can then later be implemented in the frontend to show the latest status per project in the list.
Best would be to return the build model inside it and not only the build status, so we could add a link in the frontend to quickly go to that build when needed.
Scope down after validating wharf-ci file in Jenkins.
Split the task in the future:
Warn users of validation in frontend, but let them bypass if they so choose.
Ideas:
The code for parsing .wharf-ci.yml files is stored in the cmd repo atm. This should be extracted to the core lib repo (https://github.com/iver-wharf/wharf-core) or added to its own repo, just so this repo can then make use of it when importing.
Add new feature to send notifications, ex via email, about build results. And include test results in that notification.
This component should be done first: iver-wharf/iver-wharf.github.io#33
Idea originates from @fredx30 over at iver-wharf/wharf-web#25 (review)
The idea is that when deploying, we also push an npm package of the generated client that we get from https://github.com/swagger-api/swagger-codegen
Sample script (add this to a Makefile, a PowerShell script, or maybe even a GitHub Action):
# Generating swagger.json/yaml artifacts
swag init --parseDependency --parseDepth 1
export VERSION="v4.0.0"
mkdir -p dist
# yq (v4) reads environment variables via strenv; --arg is jq syntax
yq eval '.info.version = strenv(VERSION)' docs/swagger.yaml > dist/swagger.yaml
jq '.info.version=strenv(VERSION)' docs/swagger.json > dist/swagger.json
# Generating NPM package
# copy some prepared package boilerplate that includes stuff like a package.json and such
cp -r src/wharf-api-client-angular dist/api-client
docker run --rm --tty --volume "$(pwd)/dist:/dist" \
swaggerapi/swagger-codegen-cli:2.4.19 \
generate \
--input-spec /dist/swagger.json \
--lang typescript-angular \
--output /dist/api-client/src \
--additional-properties ngVersion=9.0.3
cd dist/api-client
npm version "$VERSION"
npm publish
cd ../..
# Releasing with swagger.json/yaml artifacts
git tag "$VERSION" --sign --message "$VERSION"
git push --tags
gh release create "$VERSION" --title "$VERSION" --draft
gh release upload "$VERSION" dist/swagger.{json,yaml}
This needs to be discussed (meeting?), but if we add an Angular client to this repo then we should possibly also merge the wharf-api-client-go into this repo. Or should this code live in its own repo, like wharf-api-client-angular?
Or should we dissolve this idea and then also remove the wharf-api-client-go repo and rely on generated code for that one as well?
Add a query parameter when starting builds to select the execution engine, such as:
POST /api/project/{projectId}/build?engine=wharf-cmd
The possible values here should come from the configs, such as:
ci:
defaultEngine: jenkins
engines:
jenkins:
name: Jenkins
triggerUrl: https://jenkins.local/whatever
triggerToken: changeit
wharf-cmd:
name: wharf-cmd
triggerUrl: http://wharf-cmd-provisioner
triggerToken: changeit
This needs backwards compatibility, so defining ci.triggerUrl and ci.triggerToken (the existing config fields) should be equivalent to:
ci:
# old fields:
triggerUrl: https://jenkins.local/whatever
triggerToken: changeit
# are translated to the following:
defaultEngine: unnamed
engines:
unnamed:
name: Unnamed
triggerUrl: https://jenkins.local/whatever
triggerToken: changeit
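A sketch of that translation in Go (the config struct and field names are assumptions based on the YAML above):
type Engine struct {
    Name         string `yaml:"name"`
    TriggerURL   string `yaml:"triggerUrl"`
    TriggerToken string `yaml:"triggerToken"`
}

type CIConfig struct {
    TriggerURL    string            `yaml:"triggerUrl"`   // old field
    TriggerToken  string            `yaml:"triggerToken"` // old field
    DefaultEngine string            `yaml:"defaultEngine"`
    Engines       map[string]Engine `yaml:"engines"`
}

// applyLegacyEngineConfig translates the old ci.triggerUrl/ci.triggerToken
// fields into an "unnamed" default engine, keeping old configs working.
func applyLegacyEngineConfig(ci *CIConfig) {
    if ci.TriggerURL == "" || len(ci.Engines) > 0 {
        return
    }
    ci.Engines = map[string]Engine{
        "unnamed": {
            Name:         "Unnamed",
            TriggerURL:   ci.TriggerURL,
            TriggerToken: ci.TriggerToken,
        },
    }
    ci.DefaultEngine = "unnamed"
}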
There should also be new endpoints for accessing these engines, so the frontend can display a way to select between them:
GET /api/engine
Response:
{
"default": "jenkins",
"engines": {
"jenkins": {
"name": "Jenkins",
"triggerUrl": "https://jenkins.local/whatever"
},
"wharf-cmd": {
"name": "wharf-cmd",
"triggerUrl": "http://wharf-cmd-provisioner"
}
}
}
The response cannot include the tokens. But the URLs are safe to share and can be used to give more context to the user.
CLI commands are really useful for doing manual work that we don't want to expose via the web API.
This mostly includes heavy operations, such as data or database migrations.
# Start the HTTP server. This is the same as what wharf-api already did in its main before
wharf-api serve
# Migration commands. Low prio, as wharf-api has a decent enough auto-migration on boot
wharf-api migrate latest
wharf-api migrate list
wharf-api migrate rollback "2021-10-05T15:30:00Z-v5.0.0"
# Per-type migration subcommands
wharf-api migrate artifact from s3
wharf-api migrate artifact to s3
wharf-api migrate logs from elasticsearch
wharf-api migrate logs to elasticsearch
# Common flags
wharf-api --version
wharf-api --help
wharf-api --config="my-wharf-api-config.yaml"
This means we would have to do some reorganizing of the packages, as most code lies in the main package, and main packages cannot be imported by other packages.
Suggest:
main.go # Runs cmd/root.go
cmd/*.go # Cobra commands definitions
pkg/serve/serve.go # HTTP endpoints
pkg/data/data.go # Data access abstraction for reading/writing from Sqlite, Elasticsearch, S3, etc.
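A minimal Cobra sketch of that layout (just the wiring; command bodies and most flag handling omitted):
package cmd

import "github.com/spf13/cobra"

var rootCmd = &cobra.Command{
    Use:   "wharf-api",
    Short: "Wharf backend written in Go",
}

var serveCmd = &cobra.Command{
    Use:   "serve",
    Short: "Start the HTTP server (what main already does today)",
    RunE: func(cmd *cobra.Command, args []string) error {
        return nil // call into pkg/serve here
    },
}

// Execute is called from main.go, which then contains nothing but the
// cmd.Execute() call.
func Execute() error {
    rootCmd.PersistentFlags().String("config", "", "path to config file")
    rootCmd.AddCommand(serveCmd)
    return rootCmd.Execute()
}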
Based on RFC-0016 (iver-wharf/rfcs#16)
According to the guidelines from https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#renamed-path-parameters, the path parameters should use camelCase, and not full lowercase.
Initialisms should only be Title cased, and not fully UPPERCASE, as that's more conventional in most languages except Go.
Suggested:
func (m projectModule) Register(g *gin.RouterGroup) {
project := g.Group("/project")
{
project.GET("/:projectId", m.getProjectHandler)
}
}
// @param projectId path int true "project ID"
func (m projectModule) getProjectHandler(c *gin.Context) {}
Current, for comparison:
func (m projectModule) Register(g *gin.RouterGroup) {
project := g.Group("/project")
{
project.GET("/:projectid", m.getProjectHandler)
}
}
// @param projectid path int true "project ID"
func (m projectModule) getProjectHandler(c *gin.Context) {}
This is only in the idea phase, but we would want a way to make HTTP POST requests idempotent, allowing for retry logic.
There's this somewhat conventional usage of the X-Request-ID header that can be used to provide idempotency on any request, given that the server supports it. More is explained in https://stackoverflow.com/a/54356305
A basic implementation would be to add a RequestID field to builds; when wharf-api receives a POST /api/project/{projectId}/build, it checks the recent builds for whether the same request ID has already been used, and if so, just uses that build in the HTTP response instead of actually starting a new build.
The same goes for the other POST endpoints.
Alternatively, the wharf-api could hold a cache of recent request IDs and their HTTP responses in memory. But to support scaling the wharf-api, we would require some distributed cache such as Redis. Maybe worth still? The implementation would be so much simpler and wouldn't need to bloat the database.
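A sketch of the build-endpoint variant (the handler shape mimics the gin modules used elsewhere in this repo; the lookup helper is hypothetical):
package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

type buildModule struct{}

// findRecentBuildByRequestID is a hypothetical lookup against recently
// created builds; a real implementation would query the builds table.
func (m buildModule) findRecentBuildByRequestID(reqID string) (uint, bool) {
    return 0, false
}

func (m buildModule) postBuildHandler(c *gin.Context) {
    if reqID := c.GetHeader("X-Request-ID"); reqID != "" {
        if buildID, ok := m.findRecentBuildByRequestID(reqID); ok {
            // Replay the earlier response instead of starting a new build.
            c.JSON(http.StatusOK, gin.H{"buildId": buildID})
            return
        }
    }
    // ...otherwise start a new build, storing reqID on the new build row.
}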
Based on RFC-0016 iver-wharf/rfcs#16
Restructuring the wharf-api by changing the paths of the endpoints, adding and renaming path parameters, and changing the request and response models around.
While we will keep backward compatibility for at least one major version, we will be changing so much that this needs a full major version bump.
Please see the RFC for further details: RFC-0016
Right now the test results are taken for each build separately and calculated each time by parsing artifact files.
A new table needs to be created to keep the test result summary. That table should have an N:1 relation to the build table and a 1:1 relation to the artifact table.
The table should contain the number of tests run, passed, failed, and skipped.
The values in this table should be calculated once, when the artifact is uploaded to the backend.
Create a separate method to upload test results apart from artifacts.
Modify the GET method for builds to include the total test numbers in the build entity. That number should be a sum of all entities in the test result summary table related to that build.
The get-test-results method should be modified to return the list of results + a link to the artifact that those results originated from (the artifact ID should be enough).
Change the frontend to use the new way of getting test results instead of making those additional calls per build.
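A minimal sketch of the new table as a gorm model (field names are assumptions; the relations and counts come from this issue):
// TestResultSummary has an N:1 relation to build and a 1:1 relation to
// artifact, holding counts calculated once at artifact upload time.
type TestResultSummary struct {
    TestResultSummaryID uint `gorm:"primaryKey"`
    BuildID             uint `gorm:"index"`       // N:1 to build
    ArtifactID          uint `gorm:"uniqueIndex"` // 1:1 to artifact
    Total               uint
    Passed              uint
    Failed              uint
    Skipped             uint
}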
Server-sent events (SSE) use the same connection pool as other content. If your browser has a limit of ~4 connections per domain, then opening more than 4 logs will simply halt the entire browser page.
Websockets do not have such a connection limit in practically any browser, not even in mobile browsers.
While we still only want a 1-way connection, where the server sends all the data and the client sits quiet, the limitation of SSE is reason enough to use websockets for such a use case.
The GET /build/{buildid}/stream endpoint shall be deprecated.
Doing this right might be a more complicated story. Do we have 1 channel and let the client request the data it needs, or multiple channels that only deliver a single type of data? Such as comparing GET /ws vs GET /ws/builds/{buildid} vs GET /ws/builds/{buildid}/logs.
Taking this route may also have us dig down the rabbit hole of AsyncAPI (https://www.asyncapi.com/), as neither Swagger nor OpenAPI v3 supports websocket specifications. With this considered, doing gRPC might not be that bad of an idea either? Just giving it a thought.
We need authentication. This is "easier said than done", as we need to update ALL services to propagate the authentication.
We want user logins, but for first iteration we'll stick with a single auth token that's used in frontend and backend.
Updates per repo:
web:
add a way to set the auth token for when making requests. The easiest and most secure way would be to have a simple login field where you paste the token, and the token is saved IN MEMORY instead of in cookies or local/session storage. This is for security's sake, as the token does not change. Users can still store the token in their password managers, so it shouldn't be that big of a hassle for them.
this login page should be presented when the user opens Wharf, as you cannot fetch the list of projects without being authenticated
api:
providers:
go client:
We could do JWT-based auth, but that feels like overkill for now. Just go with a basic key that acts as just a password and nothing else. The validation of the key should be a basic request.Header.Auth == os.Getenv("API_KEY")
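As gin middleware, that check could look like this (a sketch; the header and env var names are assumptions based on the pseudo-comparison above):
package main

import (
    "net/http"
    "os"

    "github.com/gin-gonic/gin"
)

// requireAPIKey rejects any request whose Authorization header does not
// match the single configured key.
func requireAPIKey() gin.HandlerFunc {
    apiKey := os.Getenv("API_KEY")
    return func(c *gin.Context) {
        if c.GetHeader("Authorization") != apiKey {
            c.AbortWithStatus(http.StatusUnauthorized)
            return
        }
        c.Next()
    }
}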
The code duplication check from Codacy is more in the way than helpful. I do not think we should disable it, because it does improve the code in most cases, but we're in danger of "alert fatigue" if we keep ignoring some of the duplication warnings.
Some short searching led me to the following resources that seem like they may be relevant:
The Codacy documentation sadly doesn't describe how to configure the duplication check:
Needs further research, and it's perhaps a good idea to contact Codacy's support to make sure. From past experience, they're quite accommodating and friendly.
Our RabbitMQ integration is half-finished. It was added to Wharf to enable two-way communication between wharf-api and the providers, but we've since started planning a different solution (hiding providers behind the API).
The RabbitMQ solution is currently only bloating the wharf-api, with code scattered around and an extensive list of environment variables.
We've also planned a notification service for sending emails on events (iver-wharf/iver-wharf.github.io#33), something that could become the new home for RabbitMQ if we so decide to reintroduce it.
This needs a discussion meeting with the team to confirm.
The problem type was moved to wharf-core in iver-wharf/wharf-core#9, so this repo needs to be updated to reference that one instead.
Based on RFC-0014 iver-wharf/rfcs#14
Changing handling of test results in wharf-api.
Parse the .trx files once and insert into the database, instead of parsing each time they're requested.
This also gives the possibility of more detailed summaries, as well as more detailed individual test results.
Please see the RFC for further details: RFC-0014
Based on iver-wharf/iver-wharf.github.io#75
We need to run Go tests and goimports formatting checks on commits and pull requests automatically.
As Wharf cannot do this yet, we should aim at using GitHub Actions.
Either we use the starter workflow for Go, https://github.com/actions/starter-workflows/blob/1d8891efc2151b2290b1d93e8489f9b1f41bd047/ci/go.yml, which simply runs go test,
or we could look into a better-integrated solution that could report failing tests as annotations inside the pull requests, such as:
Based on RFC-0016 (iver-wharf/rfcs#16)
As discussed in the "Alternative solutions":
While [using POST for search endpoints] can be useful, it’s not required for today’s use cases. Not banning POST searches for future use, but for these simpler search queries they do not fit well.
https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#alternative-solutions
GET /project/{projectId}/build
NAME PARAM TYPE REQUIRED? DESCRIPTION
projectId (path) integer true
limit (query) integer false Max number of items returned.
offset (query) integer false Shifts the window returned.
orderby (query) array[string] false Alphabetically, or order by ID?
environment (query) string false Filter on environment hard match
environmentMatch (query) string false Filter on environment soft match
finishedAfter (query) string[date-time] false Filter on finishedOn
finishedBefore (query) string[date-time] false Filter on finishedOn
gitBranch (query) string false Filter on gitBranch hard match
gitBranchMatch (query) string false Filter on gitBranch soft match
isInvalid (query) boolean false Filter on isInvalid
scheduledAfter (query) string[date-time] false Filter on scheduledOn
scheduledBefore (query) string[date-time] false Filter on scheduledOn
stage (query) string false Filter on stage hard match
stageMatch (query) string false Filter on stage soft match
status (query) string[enum] false Filter on status by enum string
statusId (query) integer false Filter on status by ID
The difference between a soft match (params ending with -Match) and a hard match is that a hard match filters on the exact value, while a soft match also allows partial matches.
The idea is that if the query parameter is not used, then it will not be searched on, while if it is used then it's searched on even if the parameter is empty (only relevant for the hard-match filters). Which means that this:
GET /project/{projectId}/build?stage=
...resolves to:
SELECT * FROM build WHERE project_id=@projectId AND stage=''
...while:
GET /project/{projectId}/build?
...resolves to:
SELECT * FROM build WHERE project_id=@projectId
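In Go, that used-but-empty vs not-used distinction maps naturally onto pointer-typed query bindings (a sketch; whether the query binder leaves absent params nil is an assumption about the binding library in use):
package main

import "gorm.io/gorm"

// buildSearchQuery uses pointers so nil means "param absent", while a
// pointer to "" means "?stage= was given" and should filter on stage=''.
type buildSearchQuery struct {
    Stage       *string `form:"stage"`
    Environment *string `form:"environment"`
}

func (q buildSearchQuery) applyTo(db *gorm.DB) *gorm.DB {
    if q.Stage != nil {
        db = db.Where("stage = ?", *q.Stage)
    }
    if q.Environment != nil {
        db = db.Where("environment = ?", *q.Environment)
    }
    return db
}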
The following endpoints will also have to be implemented, to fill the consistency gap:
GET /build (not to be confused with GET /project/{projectId}/build)
POST /build/search
And the HTTP request body:
{
"projectId": 123,
"environment": "dev",
"stage": "build",
//etc...
}
Apply migrations (automatic or running through container command) (6h, 124)
We have a duplication of ways to represent a similar process.
A while ago we introduced Build.IsInvalid to be used when a build fails to start but has already been created in the database, such as if the Jenkins call fails.
Suggest instead only using the Failed build status. Migration could be done easily by a single SQL query, something like:
UPDATE build SET status=3 WHERE isInvalid=1
And then drop the isInvalid column.
We could introduce a FailedStatus column that is a free-text field of why it failed, and then we could use that to show the user the reason for the failure (when we know the cause, such as connection refused on contacting Jenkins).
Old and unused field on the provider model.
For GitHub, the API URL is https://api.github.com/, but to upload assets to a release you use https://uploads.github.com. This URL is however provided by the "Create a release" endpoint and should be able to be provided from there instead.
I've not researched the other providers, but I do know that the upload URL field is unused throughout Wharf.
Migrations are needed. GORM doesn't remove columns automatically; it only adds them automatically.
In comparison to #11, this issue is for full test results, meaning each single test should be stored with its state and messages.
Should be populated when a test result is uploaded
Need endpoints to be able to fetch these.
Could probably settle for only storing non-successful tests
Declare what to import already from a config file / k8s Secret.
Use case: spin up new environments that are pre-populated.
This is a feature request from a user who had to reset their postgres because KubeDB f*cked up. But the "spin up pre-populated environment of Wharf" is a promising feature.
This should get an RFC!
Concerns:
There will be secrets in these configs.
This would involve the API having to speak with the providers. We have to be careful here not to introduce a circular dependency chain, as the providers currently depend on the API. Or the providers could read the config instead.
Either we watch for updates to the config file, allowing hot reloading, or we only import on boot. The config could have settings such as:
importOnStartup: true
importOnConfigChange: true
Based on RFC-0016 (iver-wharf/rfcs#16)
Swagger allows for so-called "operation IDs", which are string IDs used to identify endpoints.
These IDs are usually used when generating API clients, such as by the swagger-codegen that we use for wharf-web.
So these IDs should be descriptive.
Names should follow the guidelines specified in https://iver-wharf.github.io/rfcs/published/0016-wharf-api-endpoints-cleanup#swaggeropenapi-endpoint-ids
Example:
// @id getProjectList
func (m projectModule) getProjectsHandler(c *gin.Context) {}
// @id getProject
func (m projectModule) getProjectHandler(c *gin.Context) {}
// @id createProject
func (m projectModule) postProjectHandler(c *gin.Context) {}
To later generate the TypeScript methods:
ProjectService.getProjectList() {}
ProjectService.getProject(projectId: number) {}
ProjectService.createProject(project: MainProject) {}
For consistency, the handler methods in Go should mimic the naming convention, so change getProjectsHandler → getProjectListHandler.
Without IDs, swagger-codegen currently tries to generate its own method names.
These generated names use the path segments, suffixed with the HTTP method.
Example:
ProjectService.projectGet() {}
ProjectService.projectsProjectidGet(projectid: number) {}
ProjectService.projectPost(project: MainProject) {}
Depends on:
Newer versions of Sqlite support foreign key constraints, as do newer versions of the github.com/mattn/go-sqlite3 package.
We should add this in, as we sometimes rely on the constraints to CASCADE DELETE rows, without which we currently get orphan rows in the local wharf-api.db database file.
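With github.com/mattn/go-sqlite3 this is a DSN flag, which turns into PRAGMA foreign_keys = ON per connection (a sketch using the gorm sqlite driver, assuming it wraps go-sqlite3):
package main

import (
    "gorm.io/driver/sqlite"
    "gorm.io/gorm"
)

func openDB() (*gorm.DB, error) {
    // _foreign_keys=on makes go-sqlite3 enforce the FK constraints,
    // so CASCADE DELETE works and we stop accumulating orphan rows.
    return gorm.Open(sqlite.Open("wharf-api.db?_foreign_keys=on"), &gorm.Config{})
}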
We want to show the commit ID of the build.
This is possibly easier said than done. Currently we don't know the commit until mid-build, when Jenkins checks it out.