
azimuth's Issues

UI: server active tasks make the table too wide

When you reboot servers, the table width goes wrong: on Chrome there are no scroll bars or other obvious clues, just buttons hidden off the right-hand side.

We probably need to constrain the width of something here.

Enable per appliance filtering or other specification of applicable flavours

It would be most useful to have the capability of filtering the list of flavours, or otherwise specifying a list of applicable flavours, per appliance type. Two trivial examples:

  • Present only "Intel" flavours in the dropdown for appliances dedicated to Intel workloads
  • Present only flavours with RAM > 128 GB for appliances requiring large allocations
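A filter along these lines could be sketched as follows. This is purely illustrative: the `Flavour` shape and the `namePattern`/`minRamGb` spec fields are assumptions, not part of any real Azimuth ui-meta schema.

```typescript
// Hypothetical sketch: filter a tenancy's flavour list against a per-appliance
// spec. The field names (namePattern, minRamGb) are illustrative only.
interface Flavour { name: string; ramGb: number; }
interface FlavourSpec { namePattern?: string; minRamGb?: number; }

function applicableFlavours(flavours: Flavour[], spec: FlavourSpec): Flavour[] {
  const re = spec.namePattern ? new RegExp(spec.namePattern) : null;
  return flavours.filter(f =>
    (!re || re.test(f.name)) &&
    (spec.minRamGb === undefined || f.ramGb >= spec.minRamGb)
  );
}
```

With `{ namePattern: "^intel" }` only the Intel flavours would reach the dropdown, and `{ minRamGb: 128 }` would cover the large-allocation case.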

Updating Slurm with a changed number of compute nodes doesn't deploy the additional node

  • Create a cluster with 1x compute node
  • Press Update and add another compute node.
  • No additional node was created: the stackhpc.terraform.infra : Provision infrastructure using Terraform step reported "OK" (not changed).
  • From K8s:
$ kubectl -n az-rcp-cloud-portal-demo describe clusters.caas.azimuth.stackhpc.com slurm-v1

Status:
  Applied Extra Vars:
  ...
    compute_count:                      1
    ...

so it looks like compute_count was not updated.

This used to work.

Azimuth can present FIPs for selection which aren't actually available

Selected an IP on Arcus when creating a platform with a FIP; creation failed with a popup:
{"parameter_values":{"cluster_floating_ip":"External IP is not available."}}

Horizon showed the FIP as Active but not mapped to a fixed IP. However, the CLI showed it was mapped to a port on the ilab-60 network, which was not owned by the project in use.
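A more defensive selection could check both attachment and ownership before offering a FIP. The sketch below is an assumption about the fix, using Neutron-style floating IP fields (`port_id`, `project_id`), not the actual Azimuth code path.

```typescript
// Sketch of a defensive availability check, assuming Neutron-style floating
// IP records: only offer a FIP that has no port attachment and belongs to
// the current project. Field names mirror the Neutron API.
interface FloatingIp { address: string; portId: string | null; projectId: string; }

function selectableFips(fips: FloatingIp[], projectId: string): FloatingIp[] {
  return fips.filter(f => f.portId === null && f.projectId === projectId);
}
```

This would have excluded the Arcus FIP above, since it was attached to a port owned by another project.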

Some errors make the screen go blank, need to be more defensive?

For example, with the CRD driver you can delete a cluster type while a cluster of that type still exists, and the whole screen goes blank. The React error messages suggest we are missing error handlers that would show a friendlier message and let us fail more gracefully. These are mostly unexpected cases, such as a malformed choices parameter in the ui meta causing the screen to go blank.
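For the malformed-choices case specifically, one defensive option is to validate the untrusted value before rendering and fall back to an empty list instead of letting the render crash. This is a sketch only; the `{label, value}` option shape is an assumption.

```typescript
// Illustrative guard: coerce an untrusted ui-meta "choices" value into a safe
// list of {label, value} options, returning [] instead of crashing the render
// when the shape is wrong.
interface Choice { label: string; value: string; }

function safeChoices(raw: unknown): Choice[] {
  if (!Array.isArray(raw)) return [];
  return raw.flatMap(item => {
    if (typeof item === "string") return [{ label: item, value: item }];
    if (item && typeof item === "object" &&
        typeof (item as any).label === "string" &&
        typeof (item as any).value === "string") {
      return [item as Choice];
    }
    return [];  // silently drop malformed entries instead of throwing
  });
}
```

The same "never throw from render" idea generalises: an error boundary around each tenancy page would catch the remaining unexpected cases with a friendly message.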

Feature: event handler for irreconcilable (k8s) clusters

Description

There are situations where a kubernetes cluster administrator may use the UI to
create / update a cluster that is beyond the scope of the underlying cloud
infrastructure. Such a request may be considered "irreconcilable". Like certain
types of romantic poetry, it is full of subtle longing for things that will
likely never come to pass.
One such situation is where the user submits a request to update the cluster,
and in doing so specifies a set of resources that exceed the given quota for the
tenancy where the cluster is deployed. After such a request has been issued, the
update button for that cluster becomes unusable, and the only ways back to a
good state are to resolve the issue on the OpenStack side, delete the
cluster, or intervene using the Azimuth CLI.
Note that the k8s cluster under management and the applications it runs remain
functional during this blip.

Desired behaviour

The user has the option to roll back to a previously valid cluster configuration after a timeout.
For instance, after the timeout has been exceeded, the user would be presented with a modal
that gives a choice of either to continue to wait or to revert the cluster to the last known good state.

One possible implementation suggestion

  • Add a representation for "previously good k8s cluster form state" to the store
  • Also add a state to represent "irreconcilable k8s cluster"
  • Modify KubernetesClusterModalForm and useKubernetesClusterFormState to
    use a "copy-on-write" style pattern to store the result of
    initialState(kubernetesCluster) before attempting the update.
  • In the case that we enter the "irreconcilable k8s cluster", allow the user to
    re-apply the stored "previously good k8s cluster form state"
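The copy-on-write idea above could be sketched as below. All the state and function names here are hypothetical, not the actual Azimuth store shape.

```typescript
// Sketch of the copy-on-write suggestion: snapshot the known-good form state
// before an update, and re-apply it if the cluster becomes irreconcilable.
interface ClusterFormState { nodeCount: number; flavour: string; }

interface ClusterStore {
  current: ClusterFormState;
  lastGood: ClusterFormState | null;   // snapshot taken before an update
  irreconcilable: boolean;
}

function beginUpdate(s: ClusterStore, next: ClusterFormState): ClusterStore {
  // Copy-on-write: store the result of the pre-update state before applying
  return { current: next, lastGood: { ...s.current }, irreconcilable: false };
}

function markIrreconcilable(s: ClusterStore): ClusterStore {
  return { ...s, irreconcilable: true };
}

function rollback(s: ClusterStore): ClusterStore {
  // Re-apply the stored "previously good" form state, if we have one
  return s.lastGood
    ? { current: s.lastGood, lastGood: null, irreconcilable: false }
    : s;
}
```

The modal described under "Desired behaviour" would simply dispatch `rollback` when the user chooses to revert.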

Nunjucks templated values don't update in UI

Appliance properties displayed in the UI via nunjucks templates don't update when the appliance is updated. For example, we allow the user to update the number of IPUs their machine has available. If the user changes the number, and updates the appliance, the new IPU count isn't displayed in the UI (the IPU count is also populated into the instance metadata, and is correctly updated there).

There appears to be nothing the user can do to force an update: refreshing the details pane doesn't help; refreshing the main dashboard page doesn't help; logging out and back in doesn't help.
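The behaviour suggests the rendered value is cached somewhere that never invalidates. Whatever the real cause, the rendered output needs to be keyed on the current variable values so a metadata update forces a re-render. A minimal sketch, with `renderTemplate` as a toy stand-in for real nunjucks rendering:

```typescript
// Minimal sketch: re-render a templated display value whenever its source
// data changes, rather than caching the first render forever.
function renderTemplate(tpl: string, vars: Record<string, string>): string {
  // Toy substitution of {{ name }} placeholders; nunjucks would do this in reality
  return tpl.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => vars[name] ?? "");
}

const cache = new Map<string, string>();

function displayValue(tpl: string, vars: Record<string, string>): string {
  // Key the cache on the template AND the current variable values, so an
  // appliance update (e.g. a new IPU count) invalidates the stale render.
  const key = tpl + "\u0000" + JSON.stringify(vars);
  let out = cache.get(key);
  if (out === undefined) {
    out = renderTemplate(tpl, vars);
    cache.set(key, out);
  }
  return out;
}
```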

Advance create platform dialog as soon as platform has been selected

We've had a question from our Azimuth users: when an appliance has been selected in the "Create a new platform / Pick a platform type" dialog, could it automatically progress to the "Configure platform" page, rather than requiring a click on the "Next" button as the current workflow does? Thanks
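The change amounts to making the selection action also advance the wizard step. A hypothetical sketch (the step names and state shape are illustrative, not the real component API):

```typescript
// Hypothetical sketch: advance the create-platform wizard as soon as a
// platform type is picked, instead of waiting for "Next".
type WizardStep = "pick-type" | "configure" | "confirm";

interface WizardState { step: WizardStep; platformType: string | null; }

function selectPlatformType(s: WizardState, platformType: string): WizardState {
  // Selecting a type both records it and moves straight to configuration
  return { step: "configure", platformType };
}
```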

Create cluster fails with a 500

[INFO] [2023-02-23 14:10:37,613] [azimuth.cluster_engine.drivers.awx.driver:108] [ThreadPoolExecutor-0_1] [[email protected]] [[email protected]] Found 2 inventories

Log labels
  | app | azimuth
  | component | api
  | container | api
  | filename | /var/log/pods/azimuth_azimuth-api-665cccdc4b-m2c7z_c6dc809a-f1b3-4400-9e0a-5a9c276cc828/api/0.log
  | instance | azimuth
  | job | azimuth/azimuth
  | namespace | azimuth
  | node_name | azimuth-cl1-md-0-5e2cfe39-sbz78
  | pod | azimuth-api-665cccdc4b-m2c7z
  | stream | stdout
Detected fields
  | Time | 1677161437613
  | tsNs | 1677161437613671193
[ERROR] [2023-02-23 14:10:37,106] [django.request:241] [ThreadPoolExecutor-0_0] Internal Server Error: /api/tenancies/3a7dd6b6832a4dc2bf0d1cf3784f943b/clusters/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/generic/base.py", line 84, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/decorators.py", line 50, in handler
    return func(*args, **kwargs)
  File "/application/azimuth/views.py", line 141, in wrapper
    return view(*args, **kwargs)
  File "/application/azimuth/views.py", line 183, in wrapper
    return view(*args, **kwargs)
  File "/application/azimuth/views.py", line 113, in wrapper
    return view(*args, **kwargs)
  File "/application/azimuth/views.py", line 61, in wrapper
    return view(*args, **kwargs)
  File "/application/azimuth/views.py", line 971, in clusters
    cluster = cluster_manager.create_cluster(
  File "/application/azimuth/cluster_engine/engine.py", line 280, in create_cluster
    cluster = self._driver.create_cluster(
  File "/application/azimuth/cluster_engine/drivers/awx/driver.py", line 37, in wrapper
    return f(*args, **kwargs)
  File "/application/azimuth/cluster_engine/drivers/awx/driver.py", line 550, in create_cluster
    _ = self._from_inventory(inventory, ctx)
  File "/application/azimuth/cluster_engine/drivers/awx/driver.py", line 383, in _from_inventory
    name = params.pop("cluster_name")
KeyError: 'cluster_name'

172.21.36.128 - - [23/Feb/2023:14:10:37 +0000] "POST /api/tenancies/3a7dd6b6832a4dc2bf0d1cf3784f943b/clusters/ HTTP/1.1" 500 145 "https://portal.apps.gbnwp-cl1.ipu.graphcore.ai/tenancies/3a7dd6b6832a4dc2bf0d1cf3784f943b/platforms" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0"

Contributors guide?

Hey folks. Are there any contribution guidelines and/or Azimuth development getting started docs?

Servers with multiple IPs only show up with one internal IP

While we don't support creating servers with two networks attached, servers created outside of Azimuth only seem to show as having one IP. It might be nice to show a list of IPs in that case? Total edge case though.
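Displaying every address would mean flattening the whole Nova-style `addresses` mapping (network name to a list of address entries) rather than picking the first internal IP. A sketch of that flattening, with a simplified entry shape:

```typescript
// Sketch: flatten a Nova-style server "addresses" mapping into one list for
// display, instead of showing a single internal IP. The entry shape here is
// simplified from the real API response.
interface AddressEntry { addr: string; type: "fixed" | "floating"; }

function allAddresses(addresses: Record<string, AddressEntry[]>): string[] {
  return Object.entries(addresses).flatMap(([network, entries]) =>
    entries.map(e => `${network}: ${e.addr} (${e.type})`)
  );
}
```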

Helm chart fails when adding awx with helm upgrade

If you don't read the section about adding clusters until quite late, you end up trying to add AWX via helm upgrade. When you do, this happens:
Error: UPGRADE FAILED: unable to recognize "": no matches for kind "AWX" in version "awx.ansible.com/v1beta1"
