Comments (3)
Did you see this behavior with CentOS 7?
Can you please add the YAML output of one of the machines? There should be more info than just "Failed".
from metal3-dev-env.
This is independent of the OS. Here are the machines from a failed CI run:
apiVersion: v1
items:
- apiVersion: cluster.x-k8s.io/v1alpha3
  kind: Machine
  metadata:
    creationTimestamp: "2020-03-22T08:26:40Z"
    finalizers:
    - machine.cluster.x-k8s.io
    generation: 3
    labels:
      cluster.x-k8s.io/cluster-name: test1
      cluster.x-k8s.io/control-plane: ""
      kubeadm.controlplane.cluster.x-k8s.io/hash: "3288331389"
    name: test1-controlplane-lpnxq
    namespace: metal3
    ownerReferences:
    - apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: KubeadmControlPlane
      name: test1-controlplane
      uid: 834ec5bd-3cbf-4b87-9ccc-f8b5b07140d4
    resourceVersion: "6836"
    selfLink: /apis/cluster.x-k8s.io/v1alpha3/namespaces/metal3/machines/test1-controlplane-lpnxq
    uid: 6be1adb7-c185-4779-be4c-d04b0e396f2e
  spec:
    bootstrap:
      configRef:
        apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
        kind: KubeadmConfig
        name: test1-controlplane-bzpmb
        namespace: metal3
        uid: 5c3bbc07-b8f3-4ccb-98f5-be299edf226e
      dataSecretName: test1-controlplane-bzpmb
    clusterName: test1
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
      kind: Metal3Machine
      name: test1-controlplane-jfnpf
      namespace: metal3
      uid: 1cc69570-8c3d-4fbd-a8d6-c7d25118cb31
    providerID: metal3://d99b9c2b-5aee-45e6-9249-e7b42f7ccc76
    version: v1.17.0
  status:
    addresses:
    - address: 192.168.111.21
      type: InternalIP
    - address: 172.22.0.91
      type: InternalIP
    - address: node-1
      type: Hostname
    - address: node-1
      type: InternalDNS
    bootstrapReady: true
    failureMessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1alpha3,
      Kind=Metal3Machine with name "test1-controlplane-jfnpf": Failed to associate
      the BaremetalHost to the Metal3Machine'
    failureReason: CreateError
    infrastructureReady: true
    lastUpdated: "2020-03-22T08:34:41Z"
    nodeRef:
      name: node-1
      uid: 49aa2fb7-b645-4da3-9600-64e31074bf81
    phase: Failed
- apiVersion: cluster.x-k8s.io/v1alpha3
  kind: Machine
  metadata:
    creationTimestamp: "2020-03-22T08:26:21Z"
    finalizers:
    - machine.cluster.x-k8s.io
    generateName: test1-md-0-78d55dc456-
    generation: 3
    labels:
      cluster.x-k8s.io/cluster-name: test1
      machine-template-hash: "3481187012"
      nodepool: nodepool-0
    name: test1-md-0-78d55dc456-5b86v
    namespace: metal3
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: MachineSet
      name: test1-md-0-78d55dc456
      uid: 7b2e3dbc-8af2-4f43-8945-6968bc12bd76
    resourceVersion: "9982"
    selfLink: /apis/cluster.x-k8s.io/v1alpha3/namespaces/metal3/machines/test1-md-0-78d55dc456-5b86v
    uid: 3826ad1f-b1ed-4504-b265-2dde934055e9
  spec:
    bootstrap:
      configRef:
        apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
        kind: KubeadmConfig
        name: test1-md-0-5hhsj
        namespace: metal3
        uid: 2d396de3-c3f4-45cc-baff-d915ebcfb20d
      dataSecretName: test1-md-0-5hhsj
    clusterName: test1
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
      kind: Metal3Machine
      name: test1-md-0-hr5jw
      namespace: metal3
      uid: 3477a996-d531-4901-a20e-bd9c93b1a8e6
    providerID: metal3://6fc41e7b-cc2b-4a06-a9d4-883c563d0c47
    version: v1.17.0
  status:
    addresses:
    - address: 172.22.0.87
      type: InternalIP
    - address: 192.168.111.20
      type: InternalIP
    - address: node-0
      type: Hostname
    - address: node-0
      type: InternalDNS
    bootstrapReady: true
    failureMessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1alpha3,
      Kind=Metal3Machine with name "test1-md-0-hr5jw": Failed to associate the BaremetalHost
      to the Metal3Machine'
    failureReason: CreateError
    infrastructureReady: true
    lastUpdated: "2020-03-22T08:45:24Z"
    nodeRef:
      name: node-0
      uid: 8d146172-c593-455e-bf3c-c5c1dcb1bb73
    phase: Failed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
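The problem is that CAPM3 sets an error on the Metal3Machine whenever it fails to associate a host (i.e. when it requeues). CAPI picks that error up and marks the Machine as Failed, but it does not pick up the later change back to a normal state, so the Machine stays in the Failed phase even though the node eventually comes up (note that nodeRef is set on both machines above). CAPM3 needs to be modified so that it does not set an error on the Metal3Machine when the failure is transient.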
This will be tackled in metal3-io/cluster-api-provider-metal3#30
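To illustrate the idea only, here is a minimal Go sketch of a reconciler helper that requeues on a transient error without touching the failure fields, and sets them only for a terminal error. This is not the CAPM3 implementation: `Metal3MachineStatus`, `TransientError`, `Result`, `handleAssociateError`, `newTransient` and `newTerminal` are all hypothetical, simplified names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Metal3MachineStatus is a hypothetical, trimmed-down stand-in for the
// status of a Metal3Machine. Only the two failure fields matter here.
type Metal3MachineStatus struct {
	FailureReason  *string
	FailureMessage *string
}

// TransientError (hypothetical) marks a failure that is expected to
// resolve on its own and should be retried instead of reported.
type TransientError struct {
	Err        error
	RetryAfter time.Duration
}

func (e *TransientError) Error() string { return e.Err.Error() }
func (e *TransientError) Unwrap() error { return e.Err }

// Result (hypothetical) carries the requeue delay, similar in spirit to
// a controller reconcile result.
type Result struct {
	RequeueAfter time.Duration
}

// newTransient and newTerminal are small helpers for building the two
// kinds of error used in this sketch.
func newTransient(msg string, seconds int) error {
	return &TransientError{
		Err:        errors.New(msg),
		RetryAfter: time.Duration(seconds) * time.Second,
	}
}

func newTerminal(msg string) error { return errors.New(msg) }

// handleAssociateError decides how a failed host association is reported.
// Transient errors only schedule a retry; terminal errors set the failure
// fields, which is what an owning controller would then surface.
func handleAssociateError(status *Metal3MachineStatus, err error) (Result, error) {
	if err == nil {
		return Result{}, nil
	}
	var transient *TransientError
	if errors.As(err, &transient) {
		// Leave FailureReason and FailureMessage untouched and retry later.
		return Result{RequeueAfter: transient.RetryAfter}, nil
	}
	reason := "CreateError"
	msg := err.Error()
	status.FailureReason = &reason
	status.FailureMessage = &msg
	return Result{}, err
}

func main() {
	status := &Metal3MachineStatus{}

	// Transient case: a requeue is requested and no failure is recorded.
	res, err := handleAssociateError(status, newTransient("no BareMetalHost available yet", 30))
	fmt.Println("requeue after:", res.RequeueAfter, "error:", err, "failure set:", status.FailureReason != nil)

	// Terminal case: the failure fields are set and the error is returned.
	_, err = handleAssociateError(status, newTerminal("host selector is invalid"))
	fmt.Println("error:", err, "failure set:", status.FailureReason != nil)
}
```

The point of the split is that the failure fields are treated as sticky by the consumer, so anything that may recover on a later reconcile should never be written there in the first place.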
/close
@maelk: Closing this issue.