
thoth-station / core

Using Artificial Intelligence to analyse and recommend Software Stacks for Artificial Intelligence applications.

Home Page: https://thoth-station.github.io/

License: GNU General Public License v3.0

Languages: Go 96.81%, Makefile 3.19%
Topics: aistacks, artificial-intelligence, hacktoberfest, thoth

core's People

Contributors

ace2107, bissenbay, bjoernh2000, codificat, fridex, gkrumbach07, goern, gregory-pereira, harshad16, hemajv, humairak, kpostoffice, mayacostantini, oindrillac, pacospace, schwesig, sesheta, shreekarss, tlegen-k, tumido, vannten, xtuchyna


core's Issues

Unable to re-deploy thoth

Describe the bug

When running deprovision and provision again, I get the following error:

TASK [thoth-infra-buildconfigs : make sure to use project fpokorny-thoth-dev] ****************************
ok: [localhost]

TASK [thoth-infra-buildconfigs : create Secret for Zuul's incoming build triggers] ***********************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "create", "secret", "generic", "zuul-incoming-webhook", "--from-literal=WebHookSecretKey="], "delta": "0:00:01.638268", "end": "2018-09-16 21:59:06.478950", "msg": "non-zero return code", "rc": 1, "start": "2018-09-16 21:59:04.840682", "stderr": "Error from server (AlreadyExists): secrets \"zuul-incoming-webhook\" already exists", "stderr_lines": ["Error from server (AlreadyExists): secrets \"zuul-incoming-webhook\" already exists"], "stdout": "", "stdout_lines": []}
	to retry, use: --limit @/home/fpokorny/git/thoth-station/core/playbooks/provision.retry

PLAY RECAP ***********************************************************************************************
localhost                  : ok=44   changed=39   unreachable=0    failed=1   

To Reproduce
Steps to reproduce the behavior:

  1. Provision Thoth in a clean namespace using ansible playbooks
  2. Deprovision Thoth using ansible playbooks
  3. Provision Thoth into the namespace
  4. See error

Expected behavior

Re-provisioning should complete without errors.
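The failure comes from a plain oc create, which is not idempotent. A sketch of one common workaround, assuming an oc client that supports --dry-run -o yaml; the WEBHOOK_SECRET variable is a placeholder:

```shell
# Render the secret definition client-side, then apply it; `oc apply`
# succeeds whether or not the object already exists on the cluster.
oc create secret generic zuul-incoming-webhook \
    --from-literal=WebHookSecretKey="$WEBHOOK_SECRET" \
    --dry-run -o yaml | oc apply -f -
```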

Introduce caching of results

Is your feature request related to a problem? Please describe.

As a user can submit multiple requests and ask for the same result multiple times, it would be nice to introduce a caching layer so that, unless stated otherwise, results are picked from the cache instead of being computed again. The cache will be invalidated after a certain period of time.

Describe the solution you'd like

We can use Ceph as a cache, linking computed digests of Pipfile and Pipfile.lock and container image hashes to results of analyses. The cache of container image analyses does not need to be invalidated (unless explicitly requested on code change), as these scans do not change over time (assuming the analyzed code does not change).

Describe alternatives you've considered

We could also use other databases such as Redis for this, but it is easier to start with Ceph as we already have a deployment configured for it.
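To make the keying concrete, here is a minimal sketch of the lookup logic, with a local directory standing in for the Ceph/S3 bucket; the file contents and the "computed" result are placeholders:

```shell
#!/bin/sh
# Sketch: derive a cache key from the digests of Pipfile and Pipfile.lock.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p cache

# Sample dependency files (stand-ins for a user-submitted Pipfile pair).
printf 'requests = "*"\n' > Pipfile
printf '{"default": {}}\n' > Pipfile.lock

# The key covers both files, so a change to either invalidates the entry.
key="$(cat Pipfile Pipfile.lock | sha256sum | cut -d' ' -f1)"

if [ -f "cache/$key" ]; then
    result="$(cat "cache/$key")"              # cache hit: reuse stored result
else
    result="computed-analysis-result"         # placeholder for a real analysis
    printf '%s\n' "$result" > "cache/$key"    # store for subsequent requests
fi
echo "$result"
```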

Do not ignore namespace creation failure in Ansible playbooks

Currently, we use ignore: true to guard against failures when namespace creation fails (e.g. when deploying to the same namespace). To be more precise, we should run the given Ansible task only if the requested namespace (for each tier) does not exist, and remove the ignore: true statement.
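A sketch of the guard in shell form (the NAMESPACE variable is a placeholder; in the playbook this would become an existence-check task plus a when: condition):

```shell
# Create the project only when it does not already exist, instead of
# blanket-ignoring all failures of `oc new-project`.
if ! oc get project "$NAMESPACE" >/dev/null 2>&1; then
    oc new-project "$NAMESPACE"
fi
```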

result-api does not return solver results

  Scenario: Get one of the for currently available Solver Results  # features/result_api.feature:25
    Given I am using the TEST environement                         # features/steps/result_api_steps.py:29 0.000s
    When I query the Solver API for a list of solver results       # features/steps/result_api_steps.py:47 0.780s
    And I get one of the solver results                            # features/steps/result_api_steps.py:75 0.493s
      Assertion Failed: 
      Expected: <HTTPStatus.OK>
           but: was <404>

configmap template has defaults for rsyslog which should be empty

Sentry Issue: THOTH-STAGE-1

gaierror: [Errno -2] Name or service not known
  File "logging.py", line 107, in init_logging
    address=(_RSYSLOG_HOST, int(_RSYSLOG_PORT)))
  File "handler.py", line 124, in __init__
    super(Rfc5424SysLogHandler, self).__init__(address, facility, socktype)
  File "handlers.py", line 831, in __init__
    ress = socket.getaddrinfo(host, port, 0, socktype)
  File "socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

RSYSLOG_HOST and RSYSLOG_PORT have been set, but rsyslog:10514 cannot be reached.

Timing of graph-refresh, package-releases and graph-sync cronjobs

It's worth setting up the cronjobs correctly, so here's a summary of how we should do that:

  • the package-releases cronjob has to be scheduled so that it correctly samples new releases on PyPI based on the RSS feed, which has a limited number of entries (the last 40). The current setup runs package-releases every 6 minutes, which seems to be OK given the frequency of PyPI package releases.

  • the graph-refresh job and graph-sync job should be scheduled so that graph-refresh does not schedule duplicate solver analyses - that means at least one graph-sync run has to happen between every two graph-refresh runs. We can run these two at the same frequency. The lower limit for scheduling is that possibly slow graph syncs have to finish between runs.
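Assuming the cronjob names above, the schedules might be pinned like this (the hourly frequency and the 30-minute offset for graph-sync are illustrative, not measured values):

```shell
# package-releases: every 6 minutes, to keep up with the 40-entry RSS feed.
oc patch cronjob package-releases -p '{"spec": {"schedule": "*/6 * * * *"}}'
# graph-refresh and graph-sync at the same frequency, offset so that one
# graph-sync run lands between any two consecutive graph-refresh runs.
oc patch cronjob graph-refresh -p '{"spec": {"schedule": "0 * * * *"}}'
oc patch cronjob graph-sync -p '{"spec": {"schedule": "30 * * * *"}}'
```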

Revisit structure of ansible playbooks

This is mostly a design decision on how we would like to handle deployment of the application.

We could keep a single OpenShift template holding all object definitions per repository, so when we make changes in the application, there is one template that states how to deploy that part of the application. This would also reduce the size of the playbooks we need to maintain and ensure we do not miss an object that needs to be created in the playbooks (templates will act like a black box - the Ansible playbooks will know only about the applications that need to be deployed, not about the objects themselves).

@goern any thoughts?

ApiException: (403)

Sentry Issue: THOTH-3SX

ForbiddenError: 403
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fef1982b-a52e-41e1-86e2-4ea40f4d12d6', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Sat, 16 Mar 2019 07:56:56 GMT', 'Content-Length': '477'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in the namespace \\"thoth-backend-stage\\": User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in project \\"thoth-backend-stage\\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}\n'
Original traceback: 
  File "/opt/app-root/lib/python3.6/site-packages/openshift/dynamic/client.py", line 71, in inner
    resp = func(self, resource, *args, **kwargs)

  File "/opt/app-root/lib/python3.6/site-packages/openshift/...
(3 additional frame(s) were not displayed)
...
  File "click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "app.py", line 177, in cli
    _do_cleanup(cleanup_namespace)
  File "app.py", line 87, in _do_cleanup
    for item in resources.get(label_selector=_CLEANUP_LABEL_SELECTOR, namespace=cleanup_namespace).items:
  File "openshift/dynamic/client.py", line 73, in inner
    raise api_exception(e)

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fef1982b-a52e-41e1-86e2-4ea40f4d12d6', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Sat, 16 Mar 2019 07:56:56 GMT', 'Content-Length': '477'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in the namespace \\"thoth-backend-stage\\": User \\"system:serviceaccount:thoth-frontend-stage:cleanup-job-thoth-middletier-stage\\" cannot list jobs.batch in project \\"thoth-backend-stage\\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}\n'

(3 additional frame(s) were not displayed)
...
  File "kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)

Use oc process | oc apply instead of oc new-app in playbooks

Is your feature request related to a problem? Please describe.

As we are currently using oc new-app, we cannot easily update a deployment without first deleting templates from OpenShift.

Describe the solution you'd like

If we used oc process | oc apply instead, we could run the playbooks multiple times without failing because a given object already exists in OpenShift.

Describe alternatives you've considered

Create an update playbook that would do basically the same. But I think we could substitute oc process | oc apply into the playbooks and do it only once.
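The substitution would look roughly like this (template path and parameter name are placeholders):

```shell
# Render the template to plain objects and apply them; re-running this is
# safe because `oc apply` updates objects that already exist.
oc process -f template.yaml -p SOME_PARAM=value | oc apply -f -
```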

Revisit templates - check labels section

In most of the templates we have, there is a copy-paste error where labels are not part of the metadata section. Let's check which templates are affected and fix this so we can use label selectors correctly.

Revisit service account for services

Currently, the edit cluster role is assigned to all services that have app: thoth-core. We should revisit this configuration and check that only the necessary services have this cluster role, so that no security risks are introduced. Namely, only the following should have the edit cluster role:

  • user-api
  • cleanup-job
  • graph-refresh-job

Use save-config for oc create

Describe the solution you'd like

As we are using OpenShift as a naming service, we should consider passing --save-config to oc create, as we will most probably use oc apply later to propagate changes made in the template to the cluster.
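For illustration (the file name is hypothetical):

```shell
# --save-config records the supplied configuration in an annotation, which
# later lets `oc apply` compute a proper three-way merge.
oc create -f user-api-objects.yaml --save-config
# Later, after changing the template:
oc apply -f user-api-objects.yaml
```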

Create janusgraph configuration entry in ConfigMap

Is your feature request related to a problem? Please describe.

If we decide to host the graph database elsewhere, we should be able to configure it for all the services and pods of Thoth in one place. Currently, we hardcode THOTH_JANUSGRAPH_HOST and THOTH_JANUSGRAPH_PORT into each template, which is not nice at all.

Describe the solution you'd like

We should move it to a ConfigMap and ideally make it parametrizable from the Ansible playbooks.
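A sketch of the ConfigMap; the key names follow the environment variables used in the templates, and the host/port values are examples, not the actual deployment's:

```shell
# Define the JanusGraph connection info once; templates then reference this
# ConfigMap instead of hardcoding the values.
oc create configmap janusgraph \
    --from-literal=THOTH_JANUSGRAPH_HOST=janusgraph.example.com \
    --from-literal=THOTH_JANUSGRAPH_PORT=8182
```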

Fix job skews we are observing in the test deployment

$ oc describe cronjobs/package-releases
<snip>
  Warning  FailedNeedsStart  3m (x64068 over 7d)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
</snip>
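One way to act on the controller's hint, as a sketch (the 300-second deadline is an arbitrary example value):

```shell
# Bound how late a run may start; misses beyond this window are simply
# skipped instead of accumulating past the 100-missed-starts check.
oc patch cronjob package-releases \
    -p '{"spec": {"startingDeadlineSeconds": 300}}'
```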

Do not ignore errors on secrets/configmaps creation

We shouldn't ignore errors on secrets/ConfigMaps creation. The ignoring was introduced due to possible collisions with already existing secrets/ConfigMaps (due to deployment to the same namespace). We should avoid this logic - there could be genuine errors when ConfigMaps/secrets couldn't be created (e.g. missing parameters).

implement auto_approver bot component

Implement an auto_approver bot component that will /approve a PR if:

  • it is mergeable
  • the latest Jenkins CI tests are green

It could be implemented as a cronjob.

Create a base image for API services

As we are using Swagger to document all the endpoints, it could be good to consider creating our own s2i image for API services. The base image would be based on Python 3's s2i image and would look for a swagger.yaml that acts as the entrypoint. The Swagger definition (in the case of connexion) states where the implementation sits, so the base image can directly load swagger.yaml and run the server. All the API-server-related configuration (setting up logging, listing endpoints, liveness/readiness probes) can be abstracted away into the base image.
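Hypothetical usage of such a base image (the image and repository names are made up for illustration):

```shell
# Build an API service whose only contract with the base image is a
# swagger.yaml at the repository root describing the endpoints.
s2i build https://github.com/thoth-station/some-api-service \
    thoth-api-base some-api-service
```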

Create roles for analyzer service account

Is your feature request related to a problem? Please describe.

As of now, we assign cluster admin to the analyzer service account. It would be better to create fine-grained access control for this service account.

Describe the solution you'd like

Have roles defined on deployment (off the top of my head, some might be missing):

  • create jobs (later CRDs) in the middletier and backend namespaces
  • create ConfigMaps (later CRDs) in the middletier and backend namespaces
  • list templates in the infra namespace

Describe alternatives you've considered

Stick with the current solution - having an edit cluster role.

Additional context

https://kubernetes.io/docs/reference/access-authn-authz/rbac/
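A sketch of what the fine-grained setup could look like (namespace and service-account names are assumptions):

```shell
# Namespaced role allowing job/ConfigMap creation in the middletier namespace.
oc create role analyzer-edit --verb=create --resource=jobs,configmaps \
    -n thoth-middletier
# Bind it to the analyzer service account.
oc policy add-role-to-user analyzer-edit -z analyzer \
    --role-namespace=thoth-middletier -n thoth-middletier
# Separate role for listing templates in the infra namespace.
oc create role template-reader --verb=list --resource=templates \
    -n thoth-infra
```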

the following tasks could be moved into their own roles....

# TODO the following tasks could be moved into their own roles....
- name: "create Graph Sync Scheduler ImageStream"
  shell: oc process --namespace "{{ infra_namespace }}" "{{ item }}" | oc apply --namespace "{{ frontend_namespace }}" -f -
  with_items:
    - "graph-sync-scheduler-imagestream"


This issue was generated by todo based on a TODO comment in 5302fe4 when #199 was merged. cc @goern.

review BuildConfigs, ImageStreams, and CronJobs to use the same naming scheme

Some CronJobs and their ImageStreams and BuildConfigs use different naming schemes for the images.

fridolin: btw we should design naming in cronjobs - sometimes there are -cronjobs suffixes, sometimes -jobs suffixes
goern: ja, I know, I stumbled over that a few times last week. I guess the image should be -job as we can run it manually or via cronjob; the BC should be -job, and the CronJob should have no postfix.

use JanusGraph Service

Is your feature request related to a problem? Please describe.
We want JanusGraph to be exposed as a Service, so that all components can use it via environment variables or DNS. At the same time, it should be transparent whether the service is deployed in the same project or runs as an external service.

Describe the solution you'd like
JanusGraph service in thoth project

Describe alternatives you've considered
hard coded use of janusgraph:80 -> to be removed

Additional context
n/a

Introduce multiple configmaps respecting semantics

As of now we have one large ConfigMap. We should create multiple ConfigMaps respecting their semantics and use the envFrom directive to insert all the configuration options from a given ConfigMap. This way, if we change something, for example in logging (e.g. introduce new functionality), we do not need to manually change each and every OpenShift template.

Candidates are:

  • thoth-logging - all configuration related to logging (rsyslog host/port, Sentry, ELK host...)
  • thoth-ceph - configuration related to Ceph (S3 adapter)
  • thoth-namespaces - names of namespaces (there is a possibility to introduce new ones in the future)
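A pod template would then pull in a whole ConfigMap at once; the fragment below is a sketch written to a scratch file just to show the shape:

```shell
# Each configMapRef imports every key of the named ConfigMap as an
# environment variable, so templates no longer list keys one by one.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
envFrom:
  - configMapRef:
      name: thoth-logging
  - configMapRef:
      name: thoth-ceph
EOF
grep -c configMapRef "$fragment"   # prints 2: one reference per ConfigMap
```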
