
thoth-station / core

Using Artificial Intelligence to analyse and recommend Software Stacks for Artificial Intelligence applications.

Home Page: https://thoth-station.github.io/

License: GNU General Public License v3.0

Languages: Go 96.81%, Makefile 3.19%
Topics: aistacks, artificial-intelligence, hacktoberfest, thoth

core's People

Contributors

ace2107, bissenbay, bjoernh2000, codificat, fridex, gkrumbach07, goern, gregory-pereira, harshad16, hemajv, humairak, kpostoffice, mayacostantini, oindrillac, pacospace, schwesig, sesheta, shreekarss, tlegen-k, tumido, vannten, xtuchyna


core's Issues

Unable to re-deploy thoth

Describe the bug

When running deprovision and provision again, I get the following error:

TASK [thoth-infra-buildconfigs : make sure to use project fpokorny-thoth-dev] ****************************
ok: [localhost]

TASK [thoth-infra-buildconfigs : create Secret for Zuul's incoming build triggers] ***********************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "create", "secret", "generic", "zuul-incoming-webhook", "--from-literal=WebHookSecretKey="], "delta": "0:00:01.638268", "end": "2018-09-16 21:59:06.478950", "msg": "non-zero return code", "rc": 1, "start": "2018-09-16 21:59:04.840682", "stderr": "Error from server (AlreadyExists): secrets \"zuul-incoming-webhook\" already exists", "stderr_lines": ["Error from server (AlreadyExists): secrets \"zuul-incoming-webhook\" already exists"], "stdout": "", "stdout_lines": []}
	to retry, use: --limit @/home/fpokorny/git/thoth-station/core/playbooks/provision.retry

PLAY RECAP ***********************************************************************************************
localhost                  : ok=44   changed=39   unreachable=0    failed=1   

To Reproduce
Steps to reproduce the behavior:

  1. Provision Thoth in a clean namespace using ansible playbooks
  2. Deprovision Thoth using ansible playbooks
  3. Provision Thoth into the namespace
  4. See error

Expected behavior

Re-provisioning should complete without errors.
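The failure comes from a plain oc create, which is not idempotent. A sketch of one common workaround, assuming an oc client that supports --dry-run -o yaml; the WEBHOOK_SECRET variable is a placeholder:

```shell
# Render the secret definition client-side, then apply it; `oc apply`
# succeeds whether or not the object already exists on the cluster.
oc create secret generic zuul-incoming-webhook \
    --from-literal=WebHookSecretKey="$WEBHOOK_SECRET" \
    --dry-run -o yaml | oc apply -f -
```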

Introduce caching of results

Is your feature request related to a problem? Please describe.

As a user can submit multiple requests and ask for the same result multiple times, it would be nice to introduce a caching layer so that, unless stated otherwise, results are picked from the cache instead of being computed again. The cache will be invalidated after a certain period of time.

Describe the solution you'd like

We can use Ceph as a cache, linking computed digests of Pipfile and Pipfile.lock and container image hashes to results of analyses. The cache of container image analyses does not need to be invalidated (unless explicitly requested on code change), as these scans do not change over time (assuming the analyzed code does not change).

Describe alternatives you've considered

We could also use other databases such as Redis for this, but it is easier to start with Ceph as we already have a deployment configured for it.
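To make the keying concrete, here is a minimal sketch of the lookup logic, with a local directory standing in for the Ceph/S3 bucket; the file contents and the "computed" result are placeholders:

```shell
#!/bin/sh
# Sketch: derive a cache key from the digests of Pipfile and Pipfile.lock.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p cache

# Sample dependency files (stand-ins for a user-submitted Pipfile pair).
printf 'requests = "*"\n' > Pipfile
printf '{"default": {}}\n' > Pipfile.lock

# The key covers both files, so a change to either invalidates the entry.
key="$(cat Pipfile Pipfile.lock | sha256sum | cut -d' ' -f1)"

if [ -f "cache/$key" ]; then
    result="$(cat "cache/$key")"              # cache hit: reuse stored result
else
    result="computed-analysis-result"         # placeholder for a real analysis
    printf '%s\n' "$result" > "cache/$key"    # store for subsequent requests
fi
echo "$result"
```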

Do not ignore namespace creation failure in Ansible playbooks

Currently, we use ignore: true to guard against failures when namespace creation fails (e.g. when deploying to the same namespace). To be more precise, we should run the given Ansible task only if the requested namespace (for each tier) does not exist, and remove the ignore: true statement.
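A sketch of the guard in shell form (the NAMESPACE variable is a placeholder; in the playbook this would become an existence-check task plus a when: condition):

```shell
# Create the project only when it does not already exist, instead of
# blanket-ignoring all failures of `oc new-project`.
if ! oc get project "$NAMESPACE" >/dev/null 2>&1; then
    oc new-project "$NAMESPACE"
fi
```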

result-api does not return solver results

  Scenario: Get one of the for currently available Solver Results  # features/result_api.feature:25
    Given I am using the TEST environement                         # features/steps/result_api_steps.py:29 0.000s
    When I query the Solver API for a list of solver results       # features/steps/result_api_steps.py:47 0.780s
    And I get one of the solver results                            # features/steps/result_api_steps.py:75 0.493s
      Assertion Failed: 
      Expected: <HTTPStatus.OK>
           but: was <404>

configmap template has defaults for rsyslog which should be empty

Sentry Issue: THOTH-STAGE-1

gaierror: [Errno -2] Name or service not known
  File "logging.py", line 107, in init_logging
    address=(_RSYSLOG_HOST, int(_RSYSLOG_PORT)))
  File "handler.py", line 124, in __init__
    super(Rfc5424SysLogHandler, self).__init__(address, facility, socktype)
  File "handlers.py", line 831, in __init__
    ress = socket.getaddrinfo(host, port, 0, socktype)
  File "socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

RSYSLOG_HOST and RSYSLOG_PORT have been set, but rsyslog:10514 cannot be reached.

Timing of graph-refresh, package-releases and graph-sync cronjobs

It's worth setting up the cronjobs correctly, so here's a summary of how we should do that:

  • the package-releases cronjob has to be scheduled so that it correctly samples new releases on PyPI based on the RSS feed, which has a limited number of entries (the last 40). The current setup runs package-releases every 6 minutes, which seems to be OK given the frequency of PyPI package releases.

  • the graph-refresh job and graph-sync job should be scheduled so that graph-refresh does not schedule duplicate solver analyses - that means at least one graph-sync run has to happen between every two graph-refresh runs. We can run these two at the same frequency. The lower limit for scheduling is that possibly slow graph syncs have to finish between runs.
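Assuming the cronjob names above, the schedules might be pinned like this (the hourly frequency and the 30-minute offset for graph-sync are illustrative, not measured values):

```shell
# package-releases: every 6 minutes, to keep up with the 40-entry RSS feed.
oc patch cronjob package-releases -p '{"spec": {"schedule": "*/6 * * * *"}}'
# graph-refresh and graph-sync at the same frequency, offset so that one
# graph-sync run lands between any two consecutive graph-refresh runs.
oc patch cronjob graph-refresh -p '{"spec": {"schedule": "0 * * * *"}}'
oc patch cronjob graph-sync -p '{"spec": {"schedule": "30 * * * *"}}'
```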

Revisit structure of ansible playbooks

This is mostly a design decision on how we would like to handle deployment of the application.

We could keep a single OpenShift template holding all object definitions per repository, so when we make changes in the application, there is one template that states how to deploy that part of the application. This would also reduce the size of the playbooks we need to maintain and ensure we do not miss an object that needs to be created in the playbooks (templates will act like a black box - the Ansible playbooks will know only about the applications that need to be deployed, not about the objects themselves).

@goern any thoughts?

ApiException: (403)

Sentry Issue: THOTH-3SX

ForbiddenError: 403
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fef1982b-a52e-41e1-86e2-4ea40f4d12d6', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Sat, 16 Mar 2019 07:56:56 GMT', 'Content-Length': '477'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in the namespace \\"thoth-backend-stage\\": User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in project \\"thoth-backend-stage\\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}\n'
Original traceback: 
  File "/opt/app-root/lib/python3.6/site-packages/openshift/dynamic/client.py", line 71, in inner
    resp = func(self, resource, *args, **kwargs)

  File "/opt/app-root/lib/python3.6/site-packages/openshift/...
(3 additional frame(s) were not displayed)
...
  File "click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "app.py", line 177, in cli
    _do_cleanup(cleanup_namespace)
  File "app.py", line 87, in _do_cleanup
    for item in resources.get(label_selector=_CLEANUP_LABEL_SELECTOR, namespace=cleanup_namespace).items:
  File "openshift/dynamic/client.py", line 73, in inner
    raise api_exception(e)

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fef1982b-a52e-41e1-86e2-4ea40f4d12d6', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Sat, 16 Mar 2019 07:56:56 GMT', 'Content-Length': '477'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in the namespace \\"thoth-backend-stage\\": User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in project \\"thoth-backend-stage\\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}\n'

(3 additional frame(s) were not displayed)
...
  File "kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)

Use oc process | oc apply instead of oc new-app in playbooks

Is your feature request related to a problem? Please describe.

As we are currently using oc new-app, we cannot easily update a deployment without first deleting templates from OpenShift.

Describe the solution you'd like

If we used oc process | oc apply instead, we could run the playbooks multiple times without failing because a given object already exists in OpenShift.

Describe alternatives you've considered

Create an update playbook that would do basically the same. But I think we could substitute oc process | oc apply into the playbooks and do it only once.
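The substitution would look roughly like this (template path and parameter name are placeholders):

```shell
# Render the template to plain objects and apply them; re-running this is
# safe because `oc apply` updates objects that already exist.
oc process -f template.yaml -p SOME_PARAM=value | oc apply -f -
```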

Revisit templates - check labels section

In most of the templates we have, there is a copy-paste error where labels are not part of the metadata section. Let's check which templates are affected and fix this so we can use label selectors correctly.

Revisit service account for services

Currently, the edit cluster role is assigned to all services that have app: thoth-core. We should revisit this configuration and check that only the necessary services have this cluster role, so that no security risks are introduced. Namely, only the following should have the edit cluster role:

  • user-api
  • cleanup-job
  • graph-refresh-job

Use save-config for oc create

Describe the solution you'd like

As we are using OpenShift as a naming service, we should consider passing --save-config to oc create, as we will most probably use oc apply later to propagate changes made in the template to the cluster.
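For illustration (the file name is hypothetical):

```shell
# --save-config records the supplied configuration in an annotation, which
# later lets `oc apply` compute a proper three-way merge.
oc create -f user-api-objects.yaml --save-config
# Later, after changing the template:
oc apply -f user-api-objects.yaml
```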

Create janusgraph configuration entry in ConfigMap

Is your feature request related to a problem? Please describe.

If we decide to host the graph database elsewhere, we should be able to configure it for all the services and pods of Thoth in one place. Currently, we hardcode THOTH_JANUSGRAPH_HOST and THOTH_JANUSGRAPH_PORT into each template, which is not nice at all.

Describe the solution you'd like

We should move it to a ConfigMap and ideally make it parametrizable from the Ansible playbooks.
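A sketch of the ConfigMap; the key names follow the environment variables used in the templates, and the host/port values are examples, not the actual deployment's:

```shell
# Define the JanusGraph connection info once; templates then reference this
# ConfigMap instead of hardcoding the values.
oc create configmap janusgraph \
    --from-literal=THOTH_JANUSGRAPH_HOST=janusgraph.example.com \
    --from-literal=THOTH_JANUSGRAPH_PORT=8182
```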

Fix job skews we are observing in the test deployment

$ oc describe cronjobs/package-releases
<snip>
  Warning  FailedNeedsStart  3m (x64068 over 7d)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
</snip>
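One way to act on the controller's hint, as a sketch (the 300-second deadline is an arbitrary example value):

```shell
# Bound how late a run may start; misses beyond this window are simply
# skipped instead of accumulating past the 100-missed-starts check.
oc patch cronjob package-releases \
    -p '{"spec": {"startingDeadlineSeconds": 300}}'
```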

Do not ignore errors on secrets/configmaps creation

We shouldn't ignore errors on secrets/ConfigMaps creation. The ignoring was introduced due to possible collisions with already existing secrets/ConfigMaps (due to deployment to the same namespace). We should avoid this logic - there could be genuine errors when ConfigMaps/secrets couldn't be created (e.g. missing parameters).

implement auto_approver bot component

Implement an auto_approver bot component that will /approve a PR if:

  • it is mergeable
  • the latest Jenkins CI tests are green

It could be implemented as a cronjob.

Create a base image for API services

As we are using Swagger to document all the endpoints, it could be good to consider creating our own s2i image for API services. The base image would be based on Python 3's s2i image and would look for a swagger.yaml that acts as the entrypoint. The Swagger definition (in the case of connexion) states where the implementation sits, so the base image can directly load swagger.yaml and run the server. All the API-server-related configuration (setting up logging, listing endpoints, liveness/readiness probes) can be abstracted away into the base image.
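Hypothetical usage of such a base image (the image and repository names are made up for illustration):

```shell
# Build an API service whose only contract with the base image is a
# swagger.yaml at the repository root describing the endpoints.
s2i build https://github.com/thoth-station/some-api-service \
    thoth-api-base some-api-service
```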

Create roles for analyzer service account

Is your feature request related to a problem? Please describe.

As of now, we assign cluster admin to the analyzer service account. It would be better to create fine-grained access control for this service account.

Describe the solution you'd like

Have roles defined on deployment (off the top of my head, some might be missing):

  • create jobs (later CRDs) in the middletier and backend namespaces
  • create ConfigMaps (later CRDs) in the middletier and backend namespaces
  • list templates in the infra namespace

Describe alternatives you've considered

Stick with the current solution - having an edit cluster role.

Additional context

https://kubernetes.io/docs/reference/access-authn-authz/rbac/
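A sketch of what the fine-grained setup could look like (namespace and service-account names are assumptions):

```shell
# Namespaced role allowing job/ConfigMap creation in the middletier namespace.
oc create role analyzer-edit --verb=create --resource=jobs,configmaps \
    -n thoth-middletier
# Bind it to the analyzer service account.
oc policy add-role-to-user analyzer-edit -z analyzer \
    --role-namespace=thoth-middletier -n thoth-middletier
# Separate role for listing templates in the infra namespace.
oc create role template-reader --verb=list --resource=templates \
    -n thoth-infra
```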

the following tasks could be moved into their own roles....

# TODO the following tasks could be moved into their own roles....
- name: "create Graph Sync Scheduler ImageStream"
  shell: oc process --namespace "{{ infra_namespace }}" "{{ item }}" | oc apply --namespace "{{ frontend_namespace }}" -f -
  with_items:
    - "graph-sync-scheduler-imagestream"


This issue was generated by todo based on a TODO comment in 5302fe4 when #199 was merged. cc @goern.

review BuildConfigs, ImageStreams, and CronJobs to use the same naming scheme

Some CronJobs and their ImageStreams and BuildConfigs use different naming schemes for the images.

fridolin: btw we should design naming in cronjobs - sometimes there are -cronjobs suffixes, sometimes -jobs suffixes
goern: ja, I know, I stumbled over that a few times last week. I guess the image should be -job as we can run it manually or via cronjob; the BC should be -job, and the CronJob should have no postfix.

use JanusGraph Service

Is your feature request related to a problem? Please describe.
We want JanusGraph to be exposed as a Service, so that all components can use it via environment variables or DNS. At the same time, it should be transparent whether the service is deployed in the same project or runs as an external service.

Describe the solution you'd like
JanusGraph service in thoth project

Describe alternatives you've considered
hard coded use of janusgraph:80 -> to be removed

Additional context
n/a

Introduce multiple configmaps respecting semantics

As of now we have one large ConfigMap. We should create multiple ConfigMaps respecting their semantics and use the envFrom directive to insert all the configuration options from a given ConfigMap. This way, if we change something, for example in logging (e.g. introduce new functionality), we do not need to manually change each and every OpenShift template.

Candidates are:

  • thoth-logging - all configuration related to logging (rsyslog host/port, Sentry, ELK host...)
  • thoth-ceph - configuration related to Ceph (S3 adapter)
  • thoth-namespaces - names of namespaces (there is a possibility to introduce new ones in the future)
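A pod template would then pull in a whole ConfigMap at once; the fragment below is a sketch written to a scratch file just to show the shape:

```shell
# Each configMapRef imports every key of the named ConfigMap as an
# environment variable, so templates no longer list keys one by one.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
envFrom:
  - configMapRef:
      name: thoth-logging
  - configMapRef:
      name: thoth-ceph
EOF
grep -c configMapRef "$fragment"   # prints 2: one reference per ConfigMap
```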
