sapcc / openstack-nannies

nannies to keep nova, cinder and the vcenter healthy

License: Apache License 2.0

Shell 8.68% Python 91.19% Makefile 0.13%

openstack-nannies's Introduction

Openstack Nannies

This repository contains so-called nannies, which keep nova, cinder and the vcenter clean and healthy. They contain a growing number of jobs that find and clean up inconsistencies in the nova, manila and cinder databases, sync quota values, spot orphaned resources and so on. They are still in their early stages, and some of the more disruptive functions still run in a "reporting only" mode.

The provided Dockerfiles used to build the corresponding containers contain a placeholder for the image they are based on. The placeholders have to be replaced by a corresponding image. The list below is what we are using:

openstack-nannies's People

Contributors

carthaca, chuan137, chuangowork, dependabot[bot], developer-abhi, galkindmitrii, grandchild, joker-at-work, kpawar-sap, kuckkuck, majewsky, mariusleu, mblo, notandy, rajivmucheli, thgrs, viennaa

openstack-nannies's Issues

[manila] manila-share-sync share instances should respect replicas

shares.join(instances, shares.c.id == instances.c.share_id))\
joins share instances on share_id and implicitly assumes a 1:1 relation. However, a share can have multiple share instances: in a replication setup, one instance exists with replica_state active, and one or more further instances may exist with a non-active replica_state (in_sync, out_of_sync, ...). This can lead to surprises in the returned share_instance_id.
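
A minimal sketch of the effect and of one possible way around it. It uses an in-memory SQLite database rather than the nanny's real SQLAlchemy query so that it is self-contained. The tables are reduced to the columns mentioned above and the sample rows are invented. Selecting the instance with replica_state 'active' (or NULL for non-replicated shares) is an assumption about what the sync should look at, not a decided fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shares (id TEXT PRIMARY KEY);
    CREATE TABLE share_instances (
        id TEXT PRIMARY KEY, share_id TEXT, replica_state TEXT);
    INSERT INTO shares VALUES ('share-1'), ('share-2');
    INSERT INTO share_instances VALUES
        ('inst-1a', 'share-1', 'active'),
        ('inst-1b', 'share-1', 'in_sync'),
        ('inst-2a', 'share-2', NULL);
""")

# Plain join on share_id: a replicated share appears once per instance.
naive = conn.execute("""
    SELECT s.id, i.id FROM shares s
    JOIN share_instances i ON s.id = i.share_id
    ORDER BY s.id, i.id
""").fetchall()

# Assumed fix: keep only the active replica, or the single instance of a
# share that is not replicated at all (replica_state is NULL).
filtered = conn.execute("""
    SELECT s.id, i.id FROM shares s
    JOIN share_instances i ON s.id = i.share_id
    WHERE i.replica_state = 'active' OR i.replica_state IS NULL
    ORDER BY s.id, i.id
""").fetchall()

print(naive)
print(filtered)
```

In the naive result share-1 is listed twice, so whichever row the caller happens to read decides the share_instance_id.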

[manila] api limit 1000

The API returns at most 1000 entities. We have places where we list objects for all_projects and may hit that limit.

e.g.

I suggest solving this with pagination (sort + offset params) or by querying the database directly instead of going through the API.
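
The pagination variant could look roughly like the sketch below. list_all and fake_fetch_page are hypothetical names. A real fetch_page would pass the limit and offset, together with a stable sort key, to the list call of the API client. The fake list of 2500 entries only stands in for that call.

```python
from typing import Callable, Iterator, List


def list_all(fetch_page: Callable[[int, int], List[dict]],
             page_size: int = 1000) -> Iterator[dict]:
    """Yield every item by requesting pages until one comes back short."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break
        offset += len(page)


# Stand-in for the API: more entries than fit into a single response.
_FAKE_SHARES = [{"id": n} for n in range(2500)]


def fake_fetch_page(limit: int, offset: int) -> List[dict]:
    return _FAKE_SHARES[offset:offset + limit]


all_shares = list(list_all(fake_fetch_page))
print(len(all_shares))
```

Offset-based paging only gives a consistent result if the sort order is stable between requests. Objects created or deleted while paging can still shift entries across page boundaries.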

[manila] inconsistency cleanup should avoid working on recent items

One example:

The nanny is cleaning up inconsistencies, but apparently a bit too eagerly 😄 We hit a race condition. The chances are quite low, but we got (un-)lucky.
What happened:
Customer deletes share network -> manila does a cascading delete of share servers -> manila deletes all related network allocations, aka neutron ports.
The last step fails: the network allocation cannot be found in manila because it had already been deleted.
Reason: at the same time, the hourly manila db consistency check runs, sees network allocations whose share server has been (soft-)deleted, and (soft-)deletes them.

Proposed fix:
Only touch inconsistencies of items that have been deleted for at least 2 hours.
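
A minimal sketch of such an age check, assuming each candidate carries a timezone-aware deleted_at timestamp of the item it depends on. The function name, the dictionary layout and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=2)


def old_enough(deleted_at, now=None, min_age=MIN_AGE):
    """Return True only if the item was deleted at least min_age ago."""
    if deleted_at is None:  # not deleted at all: never touch it
        return False
    now = now or datetime.now(timezone.utc)
    return now - deleted_at >= min_age


now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
candidates = [
    {"id": "alloc-old", "deleted_at": now - timedelta(hours=3)},
    {"id": "alloc-recent", "deleted_at": now - timedelta(minutes=5)},
]
to_clean = [c["id"] for c in candidates if old_enough(c["deleted_at"], now)]
print(to_clean)
```

The recently deleted item is left alone, which gives manila's own cascading delete time to finish before the nanny steps in.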

[manila] snapshots: renew client

copy the snippet to the snapshot nanny:

# TODO: implement proper re-auth after token lifetime ended
# Need to recreate manila client each run, because of session timeout
self.renew_manila_client()

Otherwise, if the token times out, the nanny stops working until the next container restart.

Also check whether other scripts need the same workaround.

[manila] fix snapshot instance inconsistencies

Share snapshot instances can end up with a NULL value in the share_instance_id column, which is inconsistent and leads to internal server errors.

The current workaround is the SQL below:

update share_snapshot_instances as ssi \
join share_snapshots as ss on ssi.snapshot_id=ss.id \
join share_instances as si on si.share_id=ss.share_id \
set share_instance_id=si.id \
where ssi.share_instance_id is NULL and ssi.deleted='False';

add this to the inconsistency nanny

bonus points: find the cause - it occurred quite recently, so it may be related to sapcc/manila#114

[manila] share sync TASK_SHARE_STATE does not respect some 'creating' shares

Looking at the following example of a share that is pending in 'creating':

$ manila share-instance-show <share_instance_id>
+------------------------+--------------------------------------+
| Property               | Value                                |
+------------------------+--------------------------------------+
| availability_zone      | None                                 |
| created_at             | 2022-11-24T07:57:53.591416           |
| updated_at             | None                                 |
| host                   |                                      |
| status                 | creating                             |
| share_server_id        | None                                 |
| access_rules_status    | active                               |
| replica_state          | None                                 |
| progress               | None                                 |
| export_locations       | []                                   |
...
+------------------------+--------------------------------------+

[manila] share sync - sync_share_size: spam on shares in state != 'available'

Rework 04029b4 changed _query_shares to include all shares instead of only shares in state 'available'.

Therefore all shares are watched for size changes, but the fixing method set_share_size only operates on available shares.
Shares in other states that have size differences are logged forever but never corrected. The log spam only ends if the size or state is fixed externally.
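
One possible way to stop the spam is to separate mismatches that can be fixed from those that cannot before anything is logged. The sketch below is an illustration only: split_size_mismatches and the keys db_size, backend_size and status are hypothetical and are not taken from the nanny's code.

```python
def split_size_mismatches(shares):
    """Separate fixable size mismatches from ones that can only be skipped."""
    fixable, skipped = [], []
    for share in shares:
        if share["db_size"] == share["backend_size"]:
            continue  # sizes agree, nothing to do
        if share["status"] == "available":
            fixable.append(share["id"])
        else:
            skipped.append(share["id"])
    return fixable, skipped


shares = [
    {"id": "a", "status": "available", "db_size": 10, "backend_size": 20},
    {"id": "b", "status": "creating", "db_size": 10, "backend_size": 20},
    {"id": "c", "status": "available", "db_size": 5, "backend_size": 5},
]
fixable, skipped = split_size_mismatches(shares)
print(fixable, skipped)
```

The skipped list could then be logged once, or at a lower log level, instead of on every run.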

vm_balance_nanny* metrics not initialized

Hi, most of the metrics on

mymetrics.set_metrics('vm_balance_nanny_host_size_bytes', 'des:vm_balance_nanny_host_size_bytes', ['nodename'])
mymetrics.set_metrics('vm_balance_nanny_host_size_consume_all_vm_bytes',
'des:vm_balance_nanny_host_size_consume_all_vm_bytes', ['nodename'])
mymetrics.set_metrics('vm_balance_nanny_host_size_consume_big_vm_bytes',
'des:vm_balance_nanny_host_size_consume_big_vm_bytes', ['nodename'])
mymetrics.set_metrics('vm_balance_nanny_suggestion_bytes', 'des:vm_balance_nanny_suggestion_bytes',
['source_node', 'target_node', 'big_vm_name', 'big_vm_size'])
mymetrics.set_metrics('vm_balance_nanny_manual_suggestion_bytes', 'des:vm_balance_nanny_manual_suggestion_bytes',
['source_node', 'target_node', 'big_vm_name', 'big_vm_size'])
mymetrics.set_metrics('vm_balance_building_block_consume_all_vm_bytes',
'des:vm_balance_building_block_consume_all_vm_bytes',
['Building_block'])
mymetrics.set_metrics('vm_balance_building_block_consume_big_vm_bytes',
'des:vm_balance_building_block_consume_big_vm_bytes',
['Building_block'])
mymetrics.set_metrics('vm_balance_building_block_total_size_bytes',
'des:vm_balance_building_block_total_siz_bytes',
['Building_block'])
mymetrics.set_metrics('vm_balance_error_count','des:vm_balance_error_count',['error_type'])
mymetrics.set_metrics('vm_balance_too_full_building_block','des:vm_balance_too_full_building_block', ['consolidated_needed'])
are only present if the nanny was working on something or if an error occurred.

E.g. if big_vm_to_move_list is empty, so that the check

if len(big_vm_to_move_list) > 0:

is false, vm_balance_error_count will not be present, but I would expect it to be there with value '0'.
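
The effect can be illustrated with a small stand-in for a labeled metric. This assumes the usual behaviour that a labeled time series is only exported once its label values have been touched. LabeledCounter, its init method and the error type names are hypothetical and are not the nanny's mymetrics wrapper.

```python
class LabeledCounter:
    """Tiny stand-in for a labeled metric: a label set only shows up in the
    output after it has been touched at least once."""

    def __init__(self, name):
        self.name = name
        self.samples = {}

    def init(self, *label_values):
        # Create the sample with value 0 without counting anything.
        self.samples.setdefault(label_values, 0)

    def inc(self, *label_values):
        self.samples[label_values] = self.samples.get(label_values, 0) + 1

    def expose(self):
        return [f'{self.name}{{error_type="{lv[0]}"}} {v}'
                for lv, v in sorted(self.samples.items())]


errors = LabeledCounter("vm_balance_error_count")
before = errors.expose()  # nothing exported yet

# Pre-initialise every known label value at startup (names are made up).
for error_type in ("hypothetical_error_a", "hypothetical_error_b"):
    errors.init(error_type)
after = errors.expose()

print(before)
print(after)
```

Initialising every known label value at startup makes the series visible with value 0, so alerts and dashboards can tell "no errors" apart from "metric missing".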

[manila] reset reserved quota

Reserved quota is set while a share is being created or deleted. Its value should be reset when the process finishes. If the process is interrupted, the values may stay non-zero. The nanny should reset them, taking shares that are still in an -ing state (e.g. creating, deleting) into account while doing so.

[nova] nanny fails reading the nova config

After Nova was upgraded to the Xena release, the nova-nanny fails when reading the nova configuration, either with the error

2023-05-23 06:56:54,498 ERROR: Check Nova configuration file.

or with

sed: can't read /etc/nova/nova.conf.d/db.conf: No such file or directory

[manila] snapshot reset state error

2023-06-21 07:15:26,027 share_snapshot_instance_reset_state(snapshot_instance_id=<ID>, state=error): API version '2.7' is not supported on 'manilaclient.v2.share_snapshot_instances.ShareSnapshotInstanceManager.reset_state' method.
Traceback (most recent call last):
  File "/scripts/manilananny.py", line 112, in share_snapshot_instance_reset_state
    self.manilaclient.share_snapshot_instances.reset_state(snapshot_instance_id, state)
  File "/var/lib/openstack/lib/python3.8/site-packages/manilaclient/api_versions.py", line 390, in substitution
    raise exceptions.UnsupportedVersion(
manilaclient.common.apiclient.exceptions.UnsupportedVersion: API version '2.7' is not supported on 'manilaclient.v2.share_snapshot_instances.ShareSnapshotInstanceManager.reset_state' method.
