f5networks / f5-cloud-failover-extension

F5 Cloud Failover Extension

License: Apache License 2.0

Dockerfile 0.03% Makefile 1.36% HTML 0.25% CSS 0.01% Batchfile 0.99% Shell 0.85% JavaScript 96.49% Mermaid 0.03%

f5-cloud-failover-extension's People

Contributors

andreykashcheev, crosbygw, f5-gasingh, f5debbie, garrettdieckmann, jsevedge, mikeshimkus, nmenant, tuanf5, tuann15, vtrippel, wduongf5

f5-cloud-failover-extension's Issues

A POST on /trigger with the body {"action": "execute"} does not trigger CFE or DSC failover operation

Abstract

A POST on /trigger with the body {"action": "execute"} does not trigger failover

Description

When you POST { "action": "execute" } to https://{{bigip1}}/mgmt/shared/cloud-failover/trigger, CFE returns "message": "Failover Complete", but no failover operation happens and nothing changes in DSC.

{
    "taskState": "SUCCEEDED",
    "message": "Failover Complete",
    "timestamp": "2020-06-04T15:04:10.306Z",
    "instance": "AL-P2-F51-test.com",
    "failoverOperations": {
        "addresses": {},
        "routes": {}
    }
}

The only action that triggers CFE is a DSC failover (via the GUI, CLI, or iControl API).
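To make the symptom easier to spot from a script, here is a minimal sketch that checks whether a /trigger response actually lists any operations. The helper name `hasFailoverOperations` is hypothetical and not part of CFE; it only inspects the response body quoted above and assumes nothing about CFE internals.

```javascript
// Hypothetical helper (not part of CFE): returns true only when the
// failoverOperations object of a /trigger response contains at least one
// non-empty array or primitive value, i.e. CFE reported something to move.
function hasFailoverOperations(response) {
  const hasContent = (value) => {
    if (Array.isArray(value)) return value.length > 0;
    if (value !== null && typeof value === 'object') {
      return Object.values(value).some(hasContent);
    }
    return value !== undefined && value !== null;
  };
  return hasContent((response && response.failoverOperations) || {});
}

// The response body quoted in this issue.
const reportedResponse = {
  taskState: 'SUCCEEDED',
  message: 'Failover Complete',
  timestamp: '2020-06-04T15:04:10.306Z',
  instance: 'AL-P2-F51-test.com',
  failoverOperations: { addresses: {}, routes: {} }
};

console.log(hasFailoverOperations(reportedResponse)); // false
```

A result of false alongside "taskState": "SUCCEEDED" is the situation described here: the task reports success, but there was nothing for it to do.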

ASK

Please clarify the behavior of the POST /trigger function, or fix it.

Severity

3

documentation azure - manual installation procedure - gateway for metadata service

Description

I would like confirmation of the following steps in the Azure manual procedure for CFE installation:
https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/azure.html#using-tmsh

The documentation says:
"to configure the route on BIG-IP to talk to Azure’s Instance Metadata Services, use either of the following commands:"
It does not say that the gateway listed in the tmsh command has to be adapted to one's own environment.
The gateway in the documentation is 192.0.2.1.
Is this change required, or should the tmsh command be copied as is?

I tried this configuration twice, and even though my templates had the DHCP routes, I had to add that specific route for the metadata service. Using the gateway mentioned in the documentation, strictly as written, gave me an ECONNREFUSED error in restnoded.log (which, by the way, is not easy to interpret: does it mean the service was reached but the connection was refused, or that the service was not reached?). Changing the gateway to the appropriate gateway for my own VNet did the trick.

Please confirm, and update the documentation accordingly if needed.

Thanks,
Best regards

GCE API based failover template issue

The GCE API-based failover templates download the failover plugin from GitHub releases, whereas the rest of the downloads come from cdn.f5.com. This makes it hard to open up only the required access in environments with locked-down internet access.

'curl -s -f --retry 20 -o /config/cloud/f5-cloud-libs.tar.gz https://cdn.f5.com/product/cloudsolutions/f5-cloud-libs/v4.22.0/f5-cloud-libs.tar.gz',
'curl -s -f --retry 20 -o /config/cloud/f5-cloud-libs-gce.tar.gz https://cdn.f5.com/product/cloudsolutions/f5-cloud-libs-gce/v2.6.0/f5-cloud-libs-gce.tar.gz',
'curl -s -f --retry 20 -o /config/cloud/f5-appsvcs-3.20.0-3.noarch.rpm https://cdn.f5.com/product/cloudsolutions/f5-appsvcs-extension/v3.20.0/f5-appsvcs-3.20.0-3.noarch.rpm',
'curl -s -f -L --retry 20 -o /config/cloud/f5-cloud-failover-1.4.0-0.noarch.rpm https://github.com/F5Networks/f5-cloud-failover-extension/releases/download/v1.4.0/f5-cloud-failover-1.4.0-0.noarch.rpm',

The script has -L on the last "curl" for a reason: to follow the 302 from GitHub to wherever it may lead. There are two problems with that:

  1. Someone would have to allow access to all of GitHub just to get the link to follow, which is pretty wide open; and
  2. The Location header can change: sometimes GitHub serves from its own domain, sometimes from an S3 bucket, and likely sometimes from somewhere else.

When these scripts fail to download one of the files during the initial boot, they get stuck in a never-ending loop.

Any chance of getting the release tarballs published to the F5 CDN?
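As an illustration of the allow-list problem, the sketch below flags which of the four download URLs from the template fall outside a given set of hosts. The function name `urlsOutsideAllowList` and the single-entry allow-list are assumptions made for this example; the check covers only the initial URL, not the redirect targets that -L would follow, which is exactly the part that cannot be predicted.

```javascript
// Hypothetical helper: returns the URLs whose hostname is not on the allow-list.
// It inspects only the URL as written, not any redirect it may lead to.
function urlsOutsideAllowList(urls, allowedHosts) {
  const allowed = new Set(allowedHosts.map((host) => host.toLowerCase()));
  return urls.filter((url) => !allowed.has(new URL(url).hostname.toLowerCase()));
}

// The four download URLs quoted from the template above.
const templateDownloads = [
  'https://cdn.f5.com/product/cloudsolutions/f5-cloud-libs/v4.22.0/f5-cloud-libs.tar.gz',
  'https://cdn.f5.com/product/cloudsolutions/f5-cloud-libs-gce/v2.6.0/f5-cloud-libs-gce.tar.gz',
  'https://cdn.f5.com/product/cloudsolutions/f5-appsvcs-extension/v3.20.0/f5-appsvcs-3.20.0-3.noarch.rpm',
  'https://github.com/F5Networks/f5-cloud-failover-extension/releases/download/v1.4.0/f5-cloud-failover-1.4.0-0.noarch.rpm'
];

// Assumed firewall policy for the example: only the F5 CDN is reachable.
const blocked = urlsOutsideAllowList(templateDownloads, ['cdn.f5.com']);
console.log(blocked); // only the github.com release URL is listed
```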

Cannot read property name of undefined fwdRules.forEach

Hitting an error when performing failover in GCP. I stood up my environment with Terraform, with the proper labels, prerequisites, etc. I installed CFE, and the POST of the declaration returns 200 OK. When I perform a failover, nothing moves: routes, alias IPs, and forwarding rules all stay where they are. From the restnoded logs, it appears that the forwarding rule logic is causing the issue.

specific error from unit 1...

Sun, 14 Jun 2020 18:39:43 GMT - severe: [f5-cloud-failover] Cannot read property 'name' of undefined TypeError: Cannot read property 'name' of undefined
    at fwdRules.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:827:80)
    at Array.forEach (<anonymous>)
    at Promise.all.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:808:31)
    at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
    at PromiseArray._resolve (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:126:19)
    at PromiseArray._promiseFulfilled (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:144:14)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:574:26)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)
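
One possible reading of the full logs below, not confirmed as the root cause: the "Cloud provider found targetInstances" entry lists two target instances, and neither references giroux5-f5vm01, so the lookup on unit 1 yields undefined ("Discovered our target instance  undefined") and the later access to its name throws. The sketch below is not the actual cloud.js code; `findTargetInstanceForVm` and `describeTargetInstance` are hypothetical names used to show a guarded lookup that fails with a descriptive message instead of a TypeError.

```javascript
// Hypothetical sketch, not the CFE implementation: find the target instance
// whose "instance" URL ends with the given VM name.
function findTargetInstanceForVm(targetInstances, vmName) {
  return targetInstances.find((ti) => {
    const instanceName = String(ti.instance || '').split('/').pop();
    return instanceName === vmName;
  });
}

// Guarded access: raise a descriptive error when no target instance matches,
// rather than reading .name from undefined.
function describeTargetInstance(targetInstances, vmName) {
  const match = findTargetInstanceForVm(targetInstances, vmName);
  if (!match) {
    throw new Error(`No target instance references VM "${vmName}"`);
  }
  return { name: match.name, selfLink: match.selfLink };
}

// Names and URLs taken from the "found targetInstances" log entry.
const zoneUrl = 'https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b';
const loggedTargetInstances = [
  {
    name: 'bigip2-jchambers-failover-ti',
    instance: `${zoneUrl}/instances/bigip2-jchambers-failover`,
    selfLink: `${zoneUrl}/targetInstances/bigip2-jchambers-failover-ti`
  },
  {
    name: 'giroux5-ti',
    instance: `${zoneUrl}/instances/giroux5-f5vm02`,
    selfLink: `${zoneUrl}/targetInstances/giroux5-ti`
  }
];

console.log(describeTargetInstance(loggedTargetInstances, 'giroux5-f5vm02').name); // giroux5-ti
```

Calling `describeTargetInstance(loggedTargetInstances, 'giroux5-f5vm01')` throws the descriptive error, which would point directly at a missing target instance for that VM.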

Full Logs

unit 1 during failover to become active...

Sun, 14 Jun 2020 18:39:40 GMT - fine: [f5-cloud-failover] HTTP Request - POST /trigger
Sun, 14 Jun 2020 18:39:40 GMT - fine: [f5-cloud-failover] Performing failover - initialization
Sun, 14 Jun 2020 18:39:40 GMT - fine: [f5-cloud-failover] config: {"class":"Cloud_Failover","environment":"gcp","externalStorage":{"scopingTags":{"f5_cloud_failover_label":"mydeployment"}},"failoverAddresses":{"enabled":true,"scopingTags":{"f5_cloud_failover_label":"mydeployment"}},"failoverRoutes":{"enabled":true,"scopingTags":{"f5_cloud_failover_label":"mydeployment"},"scopingAddressRanges":[{"range":"0.0.0.0/0"}],"defaultNextHopAddresses":{"discoveryType":"static","items":["10.1.10.31","10.1.10.30"]}},"controls":{"class":"Controls","logLevel":"silly"},"schemaVersion":"1.3.0"}
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] bucket name: f5-bigip-storage-jchambers-failover bucket labels: [{"f5_cloud_failover_label":"jchambers-failover","goog-dm":"jchambers-failover"}]
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] bucket name: giroux77-storage bucket labels: [{"f5_cloud_failover_label":"mydeployment"}]
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] bucket name: jgiroux123 bucket labels: [{"f5_cloud_failover_label":"jg-f5-tf-ha"}]
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] deployment bucket name: giroux77-storage
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] Getting GCP resources
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] getFwdRules called with nextPageToken 
Sun, 14 Jun 2020 18:39:41 GMT - fine: [f5-cloud-failover] Cloud provider found vms: {"0":{"id":"2631417376537304696","creationTimestamp":"2020-06-14T10:38:31.698-07:00","name":"giroux5-f5vm01","tags":{"items":["appfw-giroux5","mgmtfw-giroux5"],"fingerprint":"EzXXUUv52l0="},"machineType":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/machineTypes/n1-standard-8","status":"RUNNING","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","canIpForward":true,"networkInterfaces":[{"network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/giroux77-net-ext","subnetwork":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/subnetworks/giroux77-subnet-ext","networkIP":"10.1.10.30","name":"nic0","accessConfigs":[{"type":"ONE_TO_ONE_NAT","name":"external-nat","natIP":"34.83.60.181","networkTier":"PREMIUM","kind":"compute#accessConfig"}],"fingerprint":"BeZpU0Gaeeg=","kind":"compute#networkInterface"},{"network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/giroux77-net-mgmt","subnetwork":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/subnetworks/giroux77-subnet-mgmt","networkIP":"10.1.1.30","name":"nic1","accessConfigs":[{"type":"ONE_TO_ONE_NAT","name":"external-nat","natIP":"34.82.153.142","networkTier":"PREMIUM","kind":"compute#accessConfig"}],"fingerprint":"NRO7HIAH7ug=","kind":"compute#networkInterface"}],"disks":[{"type":"PERSISTENT","mode":"READ_WRITE","source":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/disks/giroux5-f5vm01","deviceName":"persistent-disk-0","index":0,"boot":true,"autoDelete":true,"licenses":["https://www.googleapis.com/compute/v1/projects/f5-7626-networks-public/global/licenses/f5-big-ip-adc-hourly-best-1gbps-updated"],"interface":"SCSI","diskSizeGb":"128","kind":"compute#attachedDisk"}],"metadata":{
"fingerprint":"6syWP22fb64=","items":[{"key":"startup-script",
--snippet...lots of data from VM metadata script--
Sun, 14 Jun 2020 18:39:41 GMT - fine: [f5-cloud-failover] Cloud provider found fwdRules: {"0":{"id":"8978073301247843922","creationTimestamp":"2020-06-14T10:39:09.863-07:00","name":"giroux5-forwarding-rule","description":"","region":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1","IPAddress":"34.105.114.14","IPProtocol":"TCP","portRange":"1-65535","target":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/forwardingRules/giroux5-forwarding-rule","loadBalancingScheme":"EXTERNAL","networkTier":"PREMIUM","fingerprint":"7Val-dXL4oM=","kind":"compute#forwardingRule"}}
Sun, 14 Jun 2020 18:39:41 GMT - fine: [f5-cloud-failover] Cloud provider found targetInstances: {"0":{"id":"8868401732253947533","creationTimestamp":"2020-06-04T09:59:14.883-07:00","name":"bigip2-jchambers-failover-ti","description":"bigip2-jchambers-failover","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","natPolicy":"NO_NAT","instance":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/instances/bigip2-jchambers-failover","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/bigip2-jchambers-failover-ti","kind":"compute#targetInstance"},"1":{"id":"1074036512448568926","creationTimestamp":"2020-06-14T10:38:57.734-07:00","name":"giroux5-ti","description":"","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","natPolicy":"NO_NAT","instance":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/instances/giroux5-f5vm02","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","kind":"compute#targetInstance"}}
Sun, 14 Jun 2020 18:39:41 GMT - finest: [f5-cloud-failover] Cloud Provider initialization complete
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] Device initialization complete
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] Failover initialization complete
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] Download stateFile: {"taskState":"SUCCEEDED","message":"Failover Complete","timestamp":"2020-06-14T18:38:22.345Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] taskState: {"taskState":"SUCCEEDED","message":"Failover Complete","timestamp":"2020-06-14T18:38:22.345Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}
Sun, 14 Jun 2020 18:39:42 GMT - fine: [f5-cloud-failover] Address operations enabled?  true
Sun, 14 Jun 2020 18:39:42 GMT - fine: [f5-cloud-failover] Route operations enabled?  true
Sun, 14 Jun 2020 18:39:42 GMT - info: [f5-cloud-failover] Performing failover - execute
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] State file data:  {"taskState":"SUCCEEDED","message":"Failover Complete","timestamp":"2020-06-14T18:38:22.345Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] Data will be uploaded to f5cloudfailoverstate.json:  {"taskState":"RUNNING","message":"Failover running","timestamp":"2020-06-14T18:39:42.616Z","instance":"giroux5-f5vm01.c.f5-4136-mspteam-dev.internal","failoverOperations":{}}
Sun, 14 Jun 2020 18:39:42 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Sun, 14 Jun 2020 18:39:42 GMT - fine: [f5-cloud-failover] Getting failover addresses using selfAddresses  {"0":{"address":"10.1.10.30","trafficGroup":"/Common/traffic-group-local-only","trafficGroupMatch":false}}  and floatingAddresses  {"0":{"address":"10.1.10.105"},"1":{"address":"34.105.114.14"}}
Sun, 14 Jun 2020 18:39:42 GMT - fine: [f5-cloud-failover] Retrieved local addresses {"0":"10.1.10.30"}
Sun, 14 Jun 2020 18:39:42 GMT - fine: [f5-cloud-failover] Retrieved failover addresses  {"0":"10.1.10.105","1":"34.105.114.14"}
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] updateAddresses:  {"localAddresses":["10.1.10.30"],"failoverAddresses":["10.1.10.105","34.105.114.14"],"discoverOnly":true}
Sun, 14 Jun 2020 18:39:42 GMT - finest: [f5-cloud-failover] getRoutes called with nextPageToken 
Sun, 14 Jun 2020 18:39:43 GMT - fine: [f5-cloud-failover] Filtered Routes [object Object]
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] Discovered routes:  {"0":{"id":"1937764816610122535","creationTimestamp":"2020-06-14T11:00:40.677-07:00","name":"jg-route2-external","description":"f5_cloud_failover_labels={\"f5_cloud_failover_label\":\"mydeployment\"}","network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext","destRange":"0.0.0.0/0","priority":1000,"nextHopIp":"10.1.10.31","warnings":[{"code":"NEXT_HOP_ADDRESS_NOT_ASSIGNED","message":"Next hop ip address '10.1.10.31' is not assigned to the primary IP address of any instance on 'https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext'.  Please ensure that the address is assigned to the primary IP address of an instance on the route's network.","data":[{"key":"ip_address","value":"10.1.10.31"},{"key":"route_network","value":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext"}]}],"selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/routes/jg-route2-external","kind":"compute#route"}}
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] Next hop address: 10.1.10.30
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] Route to be updated {"id":"1937764816610122535","creationTimestamp":"2020-06-14T11:00:40.677-07:00","name":"jg-route2-external","description":"f5_cloud_failover_labels={\"f5_cloud_failover_label\":\"mydeployment\"}","network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext","destRange":"0.0.0.0/0","priority":1000,"nextHopIp":"10.1.10.31","warnings":[{"code":"NEXT_HOP_ADDRESS_NOT_ASSIGNED","message":"Next hop ip address '10.1.10.31' is not assigned to the primary IP address of any instance on 'https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext'.  Please ensure that the address is assigned to the primary IP address of an instance on the route's network.","data":[{"key":"ip_address","value":"10.1.10.31"},{"key":"route_network","value":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/jgiroux-net-ext"}]}],"selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/routes/jg-route2-external","kind":"compute#route"}
Sun, 14 Jun 2020 18:39:43 GMT - fine: [f5-cloud-failover] Failover addresses to discover {"0":"10.1.10.105","1":"34.105.114.14"}
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] VM name: giroux5-f5vm02
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] getFwdRules called with nextPageToken 
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] updateFwdRules matched rule: {"id":"8978073301247843922","creationTimestamp":"2020-06-14T10:39:09.863-07:00","name":"giroux5-forwarding-rule","description":"","region":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1","IPAddress":"34.105.114.14","IPProtocol":"TCP","portRange":"1-65535","target":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/forwardingRules/giroux5-forwarding-rule","loadBalancingScheme":"EXTERNAL","networkTier":"PREMIUM","fingerprint":"7Val-dXL4oM=","kind":"compute#forwardingRule"}
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] Discovered our target instance  undefined
Sun, 14 Jun 2020 18:39:43 GMT - severe: [f5-cloud-failover] Cannot read property 'name' of undefined TypeError: Cannot read property 'name' of undefined
    at fwdRules.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:827:80)
    at Array.forEach (<anonymous>)
    at Promise.all.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:808:31)
    at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
    at PromiseArray._resolve (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:126:19)
    at PromiseArray._promiseFulfilled (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:144:14)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:574:26)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)
Sun, 14 Jun 2020 18:39:43 GMT - finest: [f5-cloud-failover] Data will be uploaded to f5cloudfailoverstate.json:  {"taskState":"FAILED","message":"Failover failed because Cannot read property 'name' of undefined","timestamp":"2020-06-14T18:39:43.499Z","instance":"giroux5-f5vm01.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":null,"routes":null}}

unit 2 during failover to become active...

Sun, 14 Jun 2020 19:04:36 GMT - finest: socket 240 opened
Sun, 14 Jun 2020 19:04:36 GMT - fine: [f5-cloud-failover] HTTP Request - POST /trigger
Sun, 14 Jun 2020 19:04:36 GMT - fine: [f5-cloud-failover] Performing failover - initialization
Sun, 14 Jun 2020 19:04:36 GMT - fine: [f5-cloud-failover] config: {"class":"Cloud_Failover","environment":"gcp","externalStorage":{"scopingTags":{"f5_cloud_failover_label":"mydeployment"}},"failoverAddresses":{"enabled":true,"scopingTags":{"f5_cloud_failover_label":"mydeployment"}},"failoverRoutes":{"enabled":true,"scopingTags":{"f5_cloud_failover_label":"mydeployment"},"scopingAddressRanges":[{"range":"192.0.2.0/24"}],"defaultNextHopAddresses":{"discoveryType":"static","items":["10.1.10.31","10.1.10.30"]}},"controls":{"class":"Controls","logLevel":"silly"},"schemaVersion":"1.3.0"}
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] bucket name: f5-bigip-storage-jchambers-failover bucket labels: [{"f5_cloud_failover_label":"jchambers-failover","goog-dm":"jchambers-failover"}]
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] bucket name: giroux77-storage bucket labels: [{"f5_cloud_failover_label":"mydeployment"}]
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] bucket name: jgiroux123 bucket labels: [{"f5_cloud_failover_label":"jg-f5-tf-ha"}]
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] deployment bucket name: giroux77-storage
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] Getting GCP resources
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] getFwdRules called with nextPageToken 
Sun, 14 Jun 2020 19:04:37 GMT - fine: [f5-cloud-failover] Cloud provider found vms: {"0":{"id":"2631417376537304696","creationTimestamp":"2020-06-14T10:38:31.698-07:00","name":"giroux5-f5vm01","tags":{"items":["appfw-giroux5","mgmtfw-giroux5"],"fingerprint":"EzXXUUv52l0="},"machineType":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/machineTypes/n1-standard-8","status":"RUNNING","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","canIpForward":true,"networkInterfaces":[{"network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/giroux77-net-ext","subnetwork":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/subnetworks/giroux77-subnet-ext","networkIP":"10.1.10.30","name":"nic0","accessConfigs":[{"type":"ONE_TO_ONE_NAT","name":"external-nat","natIP":"34.83.60.181","networkTier":"PREMIUM","kind":"compute#accessConfig"}],"fingerprint":"BeZpU0Gaeeg=","kind":"compute#networkInterface"},{"network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/giroux77-net-mgmt","subnetwork":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/subnetworks/giroux77-subnet-mgmt","networkIP":"10.1.1.30","name":"nic1","accessConfigs":[{"type":"ONE_TO_ONE_NAT","name":"external-nat","natIP":"34.82.153.142","networkTier":"PREMIUM","kind":"compute#accessConfig"}],"fingerprint":"NRO7HIAH7ug=","kind":"compute#networkInterface"}],"disks":[{"type":"PERSISTENT","mode":"READ_WRITE","source":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/disks/giroux5-f5vm01","deviceName":"persistent-disk-0","index":0,"boot":true,"autoDelete":true,"licenses":["https://www.googleapis.com/compute/v1/projects/f5-7626-networks-public/global/licenses/f5-big-ip-adc-hourly-best-1gbps-updated"],"interface":"SCSI","diskSizeGb":"128","kind":"compute#attachedDisk"}],"metadata":{
"fingerprint":"6syWP22fb64=","items":[{"key":"startup-script"
--snippet...lots of data from VM metadata script--
Sun, 14 Jun 2020 19:04:37 GMT - fine: [f5-cloud-failover] Cloud provider found fwdRules: {"0":{"id":"8978073301247843922","creationTimestamp":"2020-06-14T10:39:09.863-07:00","name":"giroux5-forwarding-rule","description":"","region":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1","IPAddress":"34.105.114.14","IPProtocol":"TCP","portRange":"1-65535","target":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/forwardingRules/giroux5-forwarding-rule","loadBalancingScheme":"EXTERNAL","networkTier":"PREMIUM","fingerprint":"7Val-dXL4oM=","kind":"compute#forwardingRule"}}
Sun, 14 Jun 2020 19:04:37 GMT - fine: [f5-cloud-failover] Cloud provider found targetInstances: {"0":{"id":"8868401732253947533","creationTimestamp":"2020-06-04T09:59:14.883-07:00","name":"bigip2-jchambers-failover-ti","description":"bigip2-jchambers-failover","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","natPolicy":"NO_NAT","instance":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/instances/bigip2-jchambers-failover","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/bigip2-jchambers-failover-ti","kind":"compute#targetInstance"},"1":{"id":"1074036512448568926","creationTimestamp":"2020-06-14T10:38:57.734-07:00","name":"giroux5-ti","description":"","zone":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b","natPolicy":"NO_NAT","instance":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/instances/giroux5-f5vm02","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","kind":"compute#targetInstance"}}
Sun, 14 Jun 2020 19:04:37 GMT - finest: [f5-cloud-failover] Cloud Provider initialization complete
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Device initialization complete
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Failover initialization complete
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Download stateFile: {"taskState":"SUCCEEDED","message":"Failover state file was reset","timestamp":"2020-06-14T19:04:28.230Z","instance":"none","failoverOperations":{}}
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] taskState: {"taskState":"SUCCEEDED","message":"Failover state file was reset","timestamp":"2020-06-14T19:04:28.230Z","instance":"none","failoverOperations":{}}
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Address operations enabled?  true
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Route operations enabled?  true
Sun, 14 Jun 2020 19:04:38 GMT - info: [f5-cloud-failover] Performing failover - execute
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] State file data:  {"taskState":"SUCCEEDED","message":"Failover state file was reset","timestamp":"2020-06-14T19:04:28.230Z","instance":"none","failoverOperations":{}}
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Data will be uploaded to f5cloudfailoverstate.json:  {"taskState":"RUNNING","message":"Failover running","timestamp":"2020-06-14T19:04:38.555Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{}}
Sun, 14 Jun 2020 19:04:38 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Getting failover addresses using selfAddresses  {"0":{"address":"10.1.10.31","trafficGroup":"/Common/traffic-group-local-only","trafficGroupMatch":false}}  and floatingAddresses  {"0":{"address":"10.1.10.105"},"1":{"address":"34.105.114.14"}}
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Retrieved local addresses {"0":"10.1.10.31"}
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Retrieved failover addresses  {"0":"10.1.10.105","1":"34.105.114.14"}
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] updateAddresses:  {"localAddresses":["10.1.10.31"],"failoverAddresses":["10.1.10.105","34.105.114.14"],"discoverOnly":true}
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] getRoutes called with nextPageToken 
Sun, 14 Jun 2020 19:04:38 GMT - fine: [f5-cloud-failover] Filtered Routes [object Object]
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Discovered routes:  {"0":{"id":"266466546359716558","creationTimestamp":"2020-06-14T12:01:53.239-07:00","name":"jg-route2-external","description":"f5_cloud_failover_labels={\"f5_cloud_failover_label\":\"mydeployment\"}","network":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/networks/giroux77-net-ext","destRange":"192.0.2.0/24","priority":1000,"nextHopIp":"10.1.10.31","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/global/routes/jg-route2-external","kind":"compute#route"}}
Sun, 14 Jun 2020 19:04:38 GMT - finest: [f5-cloud-failover] Next hop address: 10.1.10.31
Sun, 14 Jun 2020 19:04:39 GMT - fine: [f5-cloud-failover] Failover addresses to discover {"0":"10.1.10.105","1":"34.105.114.14"}
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] VM name: giroux5-f5vm01
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] getFwdRules called with nextPageToken 
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] updateFwdRules matched rule: {"id":"8978073301247843922","creationTimestamp":"2020-06-14T10:39:09.863-07:00","name":"giroux5-forwarding-rule","description":"","region":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1","IPAddress":"34.105.114.14","IPProtocol":"TCP","portRange":"1-65535","target":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/regions/us-west1/forwardingRules/giroux5-forwarding-rule","loadBalancingScheme":"EXTERNAL","networkTier":"PREMIUM","fingerprint":"7Val-dXL4oM=","kind":"compute#forwardingRule"}
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] Discovered our target instance  {"name":"giroux5-ti","selfLink":"https://www.googleapis.com/compute/v1/projects/f5-4136-mspteam-dev/zones/us-west1-b/targetInstances/giroux5-ti"}
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] fwdRulesToUpdate:  {}
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] Data will be uploaded to f5cloudfailoverstate.json:  {"taskState":"RUNNING","message":"Failover running","timestamp":"2020-06-14T19:04:39.458Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}
Sun, 14 Jun 2020 19:04:39 GMT - info: [f5-cloud-failover] Performing Failover - update
Sun, 14 Jun 2020 19:04:39 GMT - fine: [f5-cloud-failover] Address discovery: {"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}}
Sun, 14 Jun 2020 19:04:39 GMT - finest: [f5-cloud-failover] updateAddresses:  {"updateOperations":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}}}
Sun, 14 Jun 2020 19:04:39 GMT - fine: [f5-cloud-failover] Route discovery: {"operations":[]}
Sun, 14 Jun 2020 19:04:39 GMT - fine: [f5-cloud-failover] updateRoutes operations:  {}
Sun, 14 Jun 2020 19:04:39 GMT - info: [f5-cloud-failover] No route operations to run
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] _updateAddresses interface operations:  {"disassociate":[],"associate":[]}
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] _updateAddresses forwarding rules operations:  {"operations":[]}
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] updateAddresses disassociate operations:  {}
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] updateAddresses associate operations:  {}
Sun, 14 Jun 2020 19:04:40 GMT - info: [f5-cloud-failover] Disassociate NICs successful.
Sun, 14 Jun 2020 19:04:40 GMT - info: [f5-cloud-failover] Updated forwarding rules successfully
Sun, 14 Jun 2020 19:04:40 GMT - info: [f5-cloud-failover] Associate NICs successful.
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] Data will be uploaded to f5cloudfailoverstate.json:  {"taskState":"SUCCEEDED","message":"Failover Complete","timestamp":"2020-06-14T19:04:40.191Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}
Sun, 14 Jun 2020 19:04:40 GMT - info: [f5-cloud-failover] Failover Complete
Sun, 14 Jun 2020 19:04:40 GMT - finest: [f5-cloud-failover] Download stateFile: {"taskState":"SUCCEEDED","message":"Failover Complete","timestamp":"2020-06-14T19:04:40.191Z","instance":"giroux5-f5vm02.c.f5-4136-mspteam-dev.internal","failoverOperations":{"addresses":{"publicAddresses":{},"interfaces":{"disassociate":[],"associate":[]},"loadBalancerAddresses":{"operations":[]}},"routes":{"operations":[]}}}

Error: No valid S3 Buckets found

Hi,

I’m trying to use CFE in AWS and I get the error below in the CFE logs (Mode Silly).

severe: [f5-cloud-failover] Failover initialization failed: No valid S3 Buckets found! Error: No valid S3 Buckets found!

The S3 bucket exists and has the correct tag.

A VPC gateway endpoint for the service com.amazonaws.us-east-1.s3 also exists. That gateway endpoint is attached to the main route table of the VPC where the BIG-IP instances are deployed.

I’ve also created the required IAM role for my BIG-IP instances (at least, it appears to be the right role with the correct policy attached).

Can you help me understand and debug what’s wrong with my setup?

The setup is:
Region: us-east-1
BIGIP: v15.1.0.2
CFE: 1.3.0

Complete Logs from CFE are below:

Sun, 31 May 2020 10:47:33 GMT - finest: socket 233 opened
Sun, 31 May 2020 10:47:33 GMT - fine: [f5-cloud-failover] HTTP Request - POST /declare
Sun, 31 May 2020 10:47:33 GMT - fine: [f5-cloud-failover] Successfully validated declaration
Sun, 31 May 2020 10:47:33 GMT - info: [f5-cloud-failover] Global logLevel set to 'silly'
Sun, 31 May 2020 10:47:33 GMT - finest: [f5-cloud-failover] Modifying existing data group f5-cloud-failover-state with body {"name":"f5-cloud-failover-state","type":"string","records":[{"name":"state","data":"eyJjb25maWciOnsiY2xhc3MiOiJDbG91ZF9GYWlsb3ZlciIsImVudmlyb25tZW50IjoiYXdzIiwiZXh0ZXJuYWxTdG9yYWdlIjp7InNjb3BpbmdUYWdzIjp7ImY1X2Nsb3VkX2ZhaWxvdmVyX2xhYmVsIjoiaGFycnlrLWNmZSJ9fSwiZmFpbG92ZXJBZGRyZXNzZXMiOnsiZW5hYmxlZCI6ZmFsc2UsInNjb3BpbmdUYWdzIjp7ImY1X2Nsb3VkX2ZhaWxvdmVyX2xhYmVsIjoiaGFycnlrLWNmZSJ9fSwiZmFpbG92ZXJSb3V0ZXMiOnsiZW5hYmxlZCI6dHJ1ZSwic2NvcGluZ1RhZ3MiOnsiZjVfY2xvdWRfZmFpbG92ZXJfbGFiZWwiOiJoYXJyeWstY2ZlIn0sInNjb3BpbmdBZGRyZXNzUmFuZ2VzIjpbeyJyYW5nZSI6IjEwLjAuMC4wLzI0In1dLCJkZWZhdWx0TmV4dEhvcEFkZHJlc3NlcyI6eyJkaXNjb3ZlcnlUeXBlIjoic3RhdGljIiwiaXRlbXMiOlsiMTcyLjQyLjMwLjExIiwiMTcyLjQyLjMwLjEyIl19fSwiY29udHJvbHMiOnsiY2xhc3MiOiJDb250cm9scyIsImxvZ0xldmVsIjoic2lsbHkifSwic2NoZW1hVmVyc2lvbiI6IjEuMy4wIn19"}]}
Sun, 31 May 2020 10:47:36 GMT - info: [f5-cloud-failover] Successfully wrote Failover trigger scripts to filesystem
Sun, 31 May 2020 10:47:36 GMT - fine: [f5-cloud-failover] Performing failover - initialization
Sun, 31 May 2020 10:47:36 GMT - fine: [f5-cloud-failover] config: {"class":"Cloud_Failover","environment":"aws","externalStorage":{"scopingTags":{"f5_cloud_failover_label":"harryk-cfe"}},"failoverAddresses":{"enabled":false,"scopingTags":{"f5_cloud_failover_label":"harryk-cfe"}},"failoverRoutes":{"enabled":true,"scopingTags":{"f5_cloud_failover_label":"harryk-cfe"},"scopingAddressRanges":[{"range":"10.0.0.0/24"}],"defaultNextHopAddresses":{"discoveryType":"static","items":["172.42.30.11","172.42.30.12"]}},"controls":{"class":"Controls","logLevel":"silly"},"schemaVersion":"1.3.0"}
Sun, 31 May 2020 10:47:38 GMT - fine: [f5-cloud-failover] Filtered Buckets: {}
Sun, 31 May 2020 10:47:38 GMT - severe: [f5-cloud-failover] Failover initialization failed: No valid S3 Buckets found! Error: No valid S3 Buckets found!
at _getAllS3Buckets.then.then.then.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/aws/cloud.js:1113:43)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
at Promise._resolveCallback (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:454:14)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:524:17)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
at Promise._resolveCallback (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:454:14)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:524:17)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
at PromiseArray._resolve (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:126:19)
at PromiseArray._promiseFulfilled (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:144:14)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:574:26)
Sun, 31 May 2020 10:47:38 GMT - severe: [f5-cloud-failover] Sending telemetry failed: Digital asset id of ff423876-1d37-504b-ab92-8f277c36465d is already registered
Sun, 31 May 2020 10:47:43 GMT - finest: socket 233 closed

Thanks

Harry
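
For readers debugging the same "Filtered Buckets: {}" message, the tag comparison can be reproduced offline. The following is a minimal sketch, not CFE's actual filter: it assumes the bucket tags are available in the TagSet shape ([{Key, Value}]) that the AWS S3 tagging API returns, and the bucket names are hypothetical. The scoping tag is taken from the config logged above.

```javascript
// Returns true when every scoping tag key/value pair appears in the bucket's TagSet.
function bucketMatchesScopingTags(tagSet, scopingTags) {
    return Object.keys(scopingTags).every((key) =>
        tagSet.some((tag) => tag.Key === key && tag.Value === scopingTags[key]));
}

// Scoping tag from the externalStorage section of the declaration above.
const scopingTags = { f5_cloud_failover_label: 'harryk-cfe' };

// Hypothetical buckets and their tags, for illustration only.
const buckets = {
    'bucket-a': [{ Key: 'f5_cloud_failover_label', Value: 'harryk-cfe' }],
    'bucket-b': [{ Key: 'f5_cloud_failover_label', Value: 'other' }],
    'bucket-c': []
};

const matching = Object.keys(buckets)
    .filter((name) => bucketMatchesScopingTags(buckets[name], scopingTags));
console.log(matching);
```

If a check like this passes for the real bucket tags, the remaining suspects are whether the instance role is allowed to list buckets and read their tags, and whether the bucket is reachable through the endpoint.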

RPM not available after BIG-IP upgrade

The Cloud Failover Extension disappears after you upgrade the BIG-IP.

Workaround

You must reinstall the Cloud Failover RPM for failover to work properly. The failover declaration JSON is still intact. However, without the RPM/extension installed, requests return a 404 error because the endpoint is missing.

[f5-cloud-failover] Recovery operations are empty, advise reset via the API "failoverOperations":{"addresses":null,"routes":null}}

https://devcentral.f5.com/s/articles/Using-VPC-Endpoints-with-Cloud-Failover-Extension

We created an S3 endpoint and added it to both the F5 DMZ and Trust AWS subnets with a full-access policy.

We created an EC2 endpoint, enabled DNS for the endpoint, and attached the BIG-IP internal security group.

Getting a "Recovery operations are empty" error.
[root@ip-10-10-8-28:Standby:In Sync] config # tail -f /var/log/restnoded/restnoded.log
Mon, 04 May 2020 21:04:17 GMT - info: [f5-cloud-failover] Performing failover - execute
Mon, 04 May 2020 21:04:17 GMT - warning: [f5-cloud-failover] Performing Failover - recovery
Mon, 04 May 2020 21:04:17 GMT - severe: [f5-cloud-failover] Recovery operations are empty, advise reset via the API Error: Recovery operations are empty, advise reset via the API
at FailoverClient._getRecoveryOperations (/var/config/rest/iapps/f5-cloud-failover/nodejs/failover.js:373:19)
at _getDeviceObjects.then.then.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/failover.js:124:33)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:133:16)
at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:794:20)
at tryOnImmediate (timers.js:752:5)
at processImmediate [as _immediateCallback] (timers.js:729:5)

==================================================================
The following JSON declaration is successful from both F51 and F52.

{
"message": "success",
"declaration": {
"class": "Cloud_Failover",
"environment": "aws",
"externalStorage": {
"scopingTags": {
"f5_cloud_failover_label": "bigip-nonprod"
}
},
"failoverAddresses": {
"enabled": true,
"scopingTags": {
"f5_cloud_failover_nic_map_eth1": "NonProd-eth1-external",
"f5_cloud_failover_nic_map_eth2": "NonProd-eth2-internal",
"f5_cloud_failover_nic_map_eth3": "NonProd-eth3-internal2"
},
"failoverRoutes": {
"enabled": true,
"scopingTags": {
"f5_cloud_failover_label": "bigip-nonprod-prod"
},
"scopingAddressRanges": [
{
"range": "10.10.116.0/24, 10.10.117.0/24, 10.200.116.0/24, "
}
],
"defaultNextHopAddresses": {
"discoveryType": "static",
"items": [
"10.200.116.105",
"10.200.116.116",
"10.10.116.88",
"10.10.116.56",
"10.10.117.94",
"10.10.117.232"
]
}
},
"controls": {
"class": "Controls",
"logLevel": "silly"
}
},
"schemaVersion": "1.2.0"
}
}

And dig does return a valid internal IP for the EC2 endpoint:
[root@ip-10-10-8-10:Standby:In Sync] config # dig ec2.us-east-2.amazonaws.com
hmac_link.c:350: FIPS mode is 1: MD5 is only supported if the value is 0.
Please disable either FIPS mode or MD5.
; <<>> DiG 9.11.8 <<>> ec2.us-east-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63858
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;ec2.us-east-2.amazonaws.com. IN A
;; ANSWER SECTION:
ec2.us-east-2.amazonaws.com. 60 IN A 10.200.116.158
;; Query time: 3 msec
;; SERVER: 10.10.112.2#53(10.10.112.2)
;; WHEN: Mon May 04 16:05:47 CDT 2020
;; MSG SIZE rcvd: 61

#Check the metadata service on the F52 instance so the IAM role can be applied
[root@ip-10-10-8-28:Active:In Sync] config # curl http://169.254.169.254/latest/meta-data/iam/info
{
"Code" : "Success",
"LastUpdated" : "2020-05-05T03:05:01Z",
"InstanceProfileArn" : "arn:aws:iam::471729006672:instance-profile/bigip-nonprod-bigipServiceDiscoveryProfile-7C6M9RR6HBAJ",
"InstanceProfileId" : "AIPAW3VJ5NBIKL4YLXK7Y"

#AWS API is reachable from F52:
[root@ip-10-10-8-28:Standby:In Sync] config # curl -sI https://ec2.us-east-2.amazonaws.com | grep Server
Server: AmazonEC2

#Check the metadata service on the F51 instance so the IAM role can be applied
[root@ip-10-10-8-10:Active:In Sync] config # curl http://169.254.169.254/latest/meta-data/iam/info
{
"Code" : "Success",
"LastUpdated" : "2020-05-05T03:01:26Z",
"InstanceProfileArn" : "arn:aws:iam::471729006672:instance-profile/bigip-nonprod-bigipServiceDiscoveryProfile-7C6M9RR6HBAJ",
"InstanceProfileId" : "AIPAW3VJ5NBIKL4YLXK7Y"

#AWS API is reachable from F51:
[root@ip-10-10-8-10:Active:In Sync] config # curl -sI https://ec2.us-east-2.amazonaws.com | grep Server
Server: AmazonEC2

Dynamically created management routes

RFE idea: it would be helpful if CFE dynamically created a management route when ECONNREFUSED is encountered and then retried the connection. This could be controlled by a simple flag in the config such as "dynamic_mgmt_route_creation":"enabled". The new route would use the same gateway as the default management route, with a /32 network for the IP in the error message.
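
To make the idea concrete, here is a minimal sketch of the two steps involved: extracting the refused address from the error message and building a /32 management route command. It is an illustration only, not an existing CFE feature. The route name prefix and the gateway value are hypothetical, and the tmsh syntax is assumed to follow the usual `create sys management-route <name> network <cidr> gateway <ip>` form.

```javascript
const net = require('net');

// Extract the IPv4 address from an error message such as
// "connect ECONNREFUSED 40.71.13.226:443". Returns null if none is found.
function refusedAddress(message) {
    const match = /ECONNREFUSED\s+(\d{1,3}(?:\.\d{1,3}){3}):\d+/.exec(message);
    return match && net.isIPv4(match[1]) ? match[1] : null;
}

// Build a tmsh command for a /32 management route via the given gateway.
// The "cfe-dynamic-" name prefix is a hypothetical naming choice.
function managementRouteCommand(ip, gateway) {
    const name = `cfe-dynamic-${ip.replace(/\./g, '-')}`;
    return `tmsh create sys management-route ${name} network ${ip}/32 gateway ${gateway}`;
}

const ip = refusedAddress('connect ECONNREFUSED 40.71.13.226:443');
// 10.0.0.1 stands in for the gateway of the default management route.
const command = managementRouteCommand(ip, '10.0.0.1');
console.log(command);
```

A real implementation would also need to read the default management route's gateway from the device and decide when such routes should be removed again.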

documentation for AWS instructions for tagging NICs

The current clouddocs instructions for AWS say "Choose the set of instructions to follow based on whether you are provisioning for same network or across network." However, the section called For Across Network Topology only says you need to tag Elastic IPs. In my experience, you must also tag NICs. For example, I am updating a route only (no EIP remapping), so the NIC tagging that is described for "Same Network Topology" is also required for Across Network.

intended use of f5_cloud_failover_nic_map tags is not documented

The CFE clouddocs state that the f5_cloud_failover_nic_map tags are required:

NIC mapping tag: the name is static but the value is user-provided (f5_cloud_failover_nic_map:) and must match the
corresponding NIC on the secondary BIG-IP. The example below uses f5_cloud_failover_nic_map:external.
This name/value tag will correspond to the name/value tag you use in the failoverAddresses.scopingTags section of
the CFE declaration.

However, the purpose and intended use of these tags are not documented. Furthermore, the current version (v5.7.0) of the AWS CFTs does not set these tags, and API-based failover seems to operate successfully without them.

Note that the Azure templates do set these additional tags, see F5Networks/f5-azure-arm-templates#172 (comment)

Routes failover with CFE needs the FailoverAddresses set to True even if only routes need to be updated

This was tested in Azure and in AWS, with both static and dynamic (routeTag) route declarations.
For route failover (i.e. a route update) to happen, CFE requires
"failoverAddresses": {
to be set to
"enabled": true,

If only "failoverRoutes" is set to true and "failoverAddresses" is set to false, the routes are not updated.
The logs show the following message:
info: [f5-cloud-failover] No route operations to run

As soon as "failoverAddresses" is set to true (with no secondary IPs or public IPs to move over), the routes are updated. No other setting in the declaration is changed.

Can you explain why this setting is needed and, if it is mandatory, update the documentation accordingly?
Thanks
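
Until the behavior is clarified, a declaration can be checked for this combination before it is posted. The sketch below encodes only the observation reported in this issue (routes not updated while failoverAddresses is disabled); it is an assumption drawn from that report, not documented CFE behavior, and the function name is hypothetical.

```javascript
// Returns a list of warnings for a CFE declaration object.
// Flags the combination reported above: failoverRoutes enabled
// while failoverAddresses is explicitly disabled.
function routeFailoverWarnings(declaration) {
    const routes = declaration.failoverRoutes || {};
    const addresses = declaration.failoverAddresses || {};
    const warnings = [];
    if (routes.enabled === true && addresses.enabled === false) {
        warnings.push('failoverRoutes is enabled but failoverAddresses is disabled; '
            + 'route updates were reported not to run in this configuration');
    }
    return warnings;
}

const declaration = {
    class: 'Cloud_Failover',
    environment: 'azure',
    failoverAddresses: { enabled: false },
    failoverRoutes: { enabled: true }
};
const warnings = routeFailoverWarnings(declaration);
console.log(warnings);
```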

Using management NIC to connect to Cloud API

By default, the AWS multi-NIC templates (e.g. https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/failover/across-net/via-api/3nic/existing-stack/payg) cause CFE connectivity to the AWS APIs to be established via the external interface (eth1) rather than the management interface (eth0). This leads to the following issues with the CFE implementation:

  • EIPs (public IPs) corresponding to the external self IPs cannot be removed from the stack when EIPs for VIPs are required (without resorting to a more complex design based on AWS endpoints). Removing them would improve security and reduce AWS subscription costs.
  • CFE initiation could occur more quickly over the management interface, improving the chances that it completes by the time a BIG-IP instance within the HA cluster needs to assume the active role (see #36 (comment)).

It is not clear from the CFE documentation whether it is an inherent limitation of CFE that dictates using the external interface for API calls. If it is the case, please consider this an "enhancement request". If it is not the case, the CFE documentation should be updated to show additional steps required to use the management interface for this purpose.

GCP: Remove requirement for TARGET-INSTANCE object in AliasIP only use case

(Issue moved from: f5devcentral/f5-cloud-failover-extension#15)
Reported by: @f5-applebaum

The use case is failing over only alias IPs (as opposed to forwarding rules).

Currently, failover requires TARGET_INSTANCE objects to be created, which are only needed for forwarding rules.

The workaround is to create target instance objects:

https://cloud.google.com/sdk/gcloud/reference/compute/target-instances/create

gcloud compute target-instances create NAME --instance=INSTANCE [--description=DESCRIPTION] [--instance-zone=INSTANCE_ZONE] [--zone=ZONE] [GCLOUD_WIDE_FLAG …]

Some problems when using in AWS China

(Issue moved from: f5devcentral/f5-cloud-failover-extension#14)
Reported by: @guqingyuan

Here are two problems I encountered in AWS China.
In my case, when failover occurs, I just want to move the Elastic IP to the other device.
So my declaration is:

{
"class": "Cloud_Failover",
"environment": "aws",
"externalStorage": {
"scopingTags": {
"f5_cloud_failover_label": "testdeploy"
}
},
"failoverAddresses": {
"scopingTags": {
"f5_cloud_failover_label": "testdeploy"
}
},
"controls": {
"class": "Controls",
"logLevel": "info"
}
}
First problem: an error appears in the log
When I click Force to Standby in the BIG-IP GUI, I see an error in /var/log/restnoded/restnoded.log on the new active device.

Sun, 16 Feb 2020 11:45:57 GMT - finest: socket 185 opened
Sun, 16 Feb 2020 11:46:07 GMT - info: [f5-cloud-failover] Performing failover - execute
Sun, 16 Feb 2020 11:46:07 GMT - info: [f5-cloud-failover] Performing Failover - recovery
Sun, 16 Feb 2020 11:46:07 GMT - warning: [f5-cloud-failover] Recovering previous task: {"addresses":null,"routes":null}
Sun, 16 Feb 2020 11:46:07 GMT - info: [f5-cloud-failover] Performing Failover - update
Sun, 16 Feb 2020 11:46:07 GMT - severe: [f5-cloud-failover] failover.execute() error: The filter 'null' is invalid InvalidParameterValue: The filter 'null' is invalid
at Request.extractError (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/services/ec2.js:50:35)
at Request.callListeners (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:685:12)
at Request.callListeners (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
at Request.emit (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/request.js:685:12)
at Request.callListeners (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
at callNextListener (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
at IncomingMessage.onEnd (/var/config/rest/iapps/f5-cloud-failover/node_modules/aws-sdk/lib/event_listeners.js:307:13)
at emitNone (events.js:111:20)
at IncomingMessage.emit (events.js:208:7)
Sun, 16 Feb 2020 11:46:09 GMT - info: [f5-cloud-failover] Association of Elastic IP addresses successful
Sun, 16 Feb 2020 11:46:09 GMT - info: [f5-cloud-failover] Addresses reassociated successfully
Sun, 16 Feb 2020 11:46:12 GMT - finest: socket 185 closed

And in my S3 bucket, there is a file named f5cloudfailoverstate.json:

UnauthorizedAccess You are not authorized to perform this operation A7E92938BD4B20C5 BZiIvpBWaYsTwU8+dLiHNAR0wRQYYZ9LB0wh57qrhj8O5ciPQnBXJ5omhGrrZfRywIxEVlkaDYU=

But in the AWS console, I can see the Elastic IP is already bound to another private IP. Does this error have an effect?

Second problem: it takes too long to switch the Elastic IP
I click Force to Standby in the BIG-IP GUI, and after about two minutes, I can see the Elastic IP bound to another private IP.
I think that is too long. Is this normal?

Documentation: example GCP route has extra 24 mask x.x.x.x-24-24

The clouddocs have an example that creates a GCP route, but the CIDR mask is shown twice, so the command will fail.

https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/gcp.html#label-the-user-defined-routes-in-gcp

currently (wrong)

 gcloud compute routes create example-route --destination-range=192.168.1.0/24/24 --network=example-network --next-hop-address=192.0.2.10 --description='f5_cloud_failover_labels={"f5_cloud_failover_label":"mydeployment"}'

should be

 gcloud compute routes create example-route --destination-range=192.168.1.0/24 --network=example-network --next-hop-address=192.0.2.10 --description='f5_cloud_failover_labels={"f5_cloud_failover_label":"mydeployment"}'

GCP: Failover is broken when dst is 'ANY' in a VS

(Issue moved from f5devcentral/f5-cloud-failover-extension#25)
Reported By: @abhishek-batapati

We have multiple virtual servers (VS) in our configuration, a few of which have the destination 'ANY' (0.0.0.0/0).
The VS with destination 'ANY' are used for source NAT.

When we trigger a failover, it fails because of the VS with destination 'ANY'. We would have expected CFE to skip these VS during failover.

Error log:
Mon, 06 Apr 2020 04:17:41 GMT - info: [f5-cloud-failover] Performing failover - execute
Mon, 06 Apr 2020 04:17:41 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Mon, 06 Apr 2020 04:17:42 GMT - severe: [f5-cloud-failover] failover.execute() error: ipaddr: the address has neither IPv6 nor IPv4 format Error: ipaddr: the address has neither IPv6 nor IPv4 format
at Object.ipaddr.parse (/var/config/rest/iapps/f5-cloud-failover/node_modules/ipaddr.js/lib/ipaddr.js:632:13)
at ipsFilter.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:516:51)
at Array.forEach (<anonymous>)
at ips.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:512:23)
at Array.forEach (<anonymous>)
at Cloud._matchIps (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:505:13)
at vm.networkInterfaces.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:640:51)
at Array.forEach (<anonymous>)
at theirVms.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:636:34)
at Array.forEach (<anonymous>)
at Cloud._discoverNicOperations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:634:18)
at Cloud._discoverAddressOperations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:590:18)
at _getVmsByTags.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:173:33)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:582:21)
at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
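
The skip behavior requested above could look roughly like the following sketch, which drops wildcard and unparsable destinations before any IP matching is attempted. This is not CFE's code. The exact value that reached ipaddr.parse is not shown in the log, so the sample inputs 'any' and '0.0.0.0/0' are assumptions, and the function name is hypothetical.

```javascript
const net = require('net');

// Addresses that represent "any destination" and cannot be failed over.
const WILDCARDS = ['0.0.0.0', '::'];

// Keep only entries that are concrete IPv4/IPv6 addresses
// (an optional /prefix suffix is ignored for the check).
function usableAddresses(addresses) {
    return addresses.filter((entry) => {
        const address = String(entry).split('/')[0];
        if (net.isIP(address) === 0) return false;        // not an IP, e.g. "any"
        if (WILDCARDS.includes(address)) return false;    // wildcard destination
        return true;
    });
}

const filtered = usableAddresses(['10.1.10.105', 'any', '0.0.0.0/0', '34.105.114.14']);
console.log(filtered);
```

Filtering early means a single wildcard virtual server no longer aborts discovery for the remaining addresses.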

ECONNREFUSED in logs showing unknown public IP addresses

I'm trying to make the initial configuration declaration using CFE v1.2 and I'm seeing this error in the logs:

Thu, 30 Apr 2020 17:25:11 GMT - finest: socket 200 closed
Thu, 30 Apr 2020 17:25:20 GMT - severe: [f5-cloud-failover] Failover initialization failed: connect ECONNREFUSED 40.71.13.226:443 Error: connect ECONNREFUSED 40.71.13.226:443

Thu, 30 Apr 2020 17:23:56 GMT - severe: [f5-cloud-failover] Sending telemetry failed: Unable to register device: connect ECONNREFUSED 35.199.173.84:443

I've confirmed that the metadata-route management route exists. Perhaps it's redirecting to these public IPs?

Also getting the error on 52.239.155.132.

Adding multiple routes to scopingAddressRanges

The documentation implies that it is possible to add multiple destination routes to the declaration. For example, the page https://github.com/F5Networks/f5-cloud-failover-extension/blob/master/docs/userguide/configuration.rst refers to "A list of the destination routes to update in the event of failover".

I tried to add the second value 192.0.6.0/24 to the declaration file as follows:
"scopingAddressRanges":[
{
"range":"192.0.2.0/24, 192.0.6.0/24"
}
],
but received the following error message after using "declare":
{"message":"Invalid declaration: [{"keyword":"format","dataPath":".failoverRoutes.scopingAddressRanges[0].range","schemaPath":"#/properties/failoverRoutes/properties/scopingAddressRanges/items/properties/range/format","params":{"format":"ipAddressWithCidr"},"message":"should match format \"ipAddressWithCidr\""}]"}500

I also tried without the separating space (i.e., "192.0.2.0/24,192.0.6.0/24") but received the same error message.

The document quoted above does not describe how "range" (under scopingAddressRanges) can hold multiple values in the way that "items" under defaultNextHopAddresses does.
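
The schema path in the error (scopingAddressRanges/items/properties/range with format ipAddressWithCidr) suggests that each array item holds exactly one CIDR, so multiple ranges would presumably be written as separate items. That reading is inferred from the validation error and is not confirmed by the documentation quoted above. A minimal sketch that converts a comma-separated string into that shape (the function name is hypothetical):

```javascript
// Split "a/24, b/24" into [{ range: "a/24" }, { range: "b/24" }].
// Empty entries (for example from a trailing comma) are dropped.
function toScopingAddressRanges(commaSeparated) {
    return commaSeparated
        .split(',')
        .map((entry) => entry.trim())
        .filter((entry) => entry.length > 0)
        .map((range) => ({ range }));
}

const ranges = toScopingAddressRanges('192.0.2.0/24, 192.0.6.0/24');
console.log(JSON.stringify(ranges));
```

The resulting array can be used as the value of "scopingAddressRanges" in the declaration.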

documentation on creating route to metadata service in Azure is unclear

Description

The documentation here:

https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/azure.html#using-tmsh

should be updated to clearly state that the gateway address in the example should be changed to the actual management interface gateway address on the deployed BIG-IP. For example, after the code snippet we should add something like the following:

Please ensure that the gateway address in the example above has been changed to the actual gateway you have configured on your BIG-IPs for the management interface. You can get that address by running the following tmsh command:

tmsh list sys management-route

Environment information

  • Cloud Failover Extension Version: all
  • BIG-IP version: any
  • Cloud provider: Azure

Severity: 4

Clarification needed on AWS CFE examples

(Issue moved from: f5devcentral/f5-cloud-failover-extension#17)
Reported By: @C0missar

The AWS section of the CFE user guide leaves a lot of questions open, or if the answers are there, I didn't understand them.

https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/aws.html
The drawing doesn't match either the text or the example declarations. It would help to have a single drawing, with the declarations matching it exactly, and discussion using that scenario and IPs - preferably two examples, a same-AZ and an across-AZ case. The routing considerations are substantially different.

• What route(s) are to be updated? The Big-IPs can be in different subnets.
• The examples talk about both the default route and RFC 1918 routes being updated.
• Must the web servers' default route be pointed at the Big-IPs internal interface?
• Is iLX installation required? It appears so.
• Can CFE share the same S3 bucket as the one created by the CFT? It appears so.
• The failover drawing shows that VIPs must be in traffic group 'none' – why?
• Using addresses like '10.0.1.10' and '10.0.11.10' is confusing and hard to read. Why not '10.0.20.x' and '10.0.30.x' so the differences stand out?

When it comes to operations, I haven't been able to make CFE do anything. Although it accepted my declaration and responds appropriately to status and failover triggers, nothing is actually happening. Not terribly surprising, as I still don't understand it, but I should get some indication back.

• How do you troubleshoot CFE?
• Why does a call to Trigger Failover return "SUCCEEDED" when nothing happened?
• The Across-AZ CFT creates an EIP and a private VIP on bigip1, but no private VIP on bigip2, so there is nothing to associate the EIP with on failover.
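
On the "SUCCEEDED when nothing happened" question: the trigger response includes a failoverOperations object, and when it contains no operations the task appears to have been a no-op. A minimal sketch for detecting that case follows. It assumes only the response fields shown in the examples in these issues (taskState, failoverOperations); the helper names are hypothetical.

```javascript
// Count leaf values anywhere inside failoverOperations.
// Empty objects, empty arrays and null contribute nothing.
function countOperations(node) {
    if (node === null || node === undefined) return 0;
    if (Array.isArray(node)) {
        return node.reduce((sum, item) => sum + countOperations(item), 0);
    }
    if (typeof node === 'object') {
        return Object.values(node).reduce((sum, value) => sum + countOperations(value), 0);
    }
    return 1;
}

// True when the task succeeded but performed no address or route operations.
function isNoOpFailover(state) {
    return state.taskState === 'SUCCEEDED' && countOperations(state.failoverOperations) === 0;
}

const response = {
    taskState: 'SUCCEEDED',
    message: 'Failover Complete',
    failoverOperations: { addresses: {}, routes: {} }
};
console.log(isNoOpFailover(response));
```

A no-op result usually points at discovery finding nothing to move, so the tags and the declaration's scoping settings are the first things to check.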

Thanks,
Stan

Azure: CFE deletes ipconfigs but fails to create ipconfigs if Public IPs exist in another Resource Group

Issue Description

My customer found that ipconfigs were successfully deleted from Device1 but failed to be created on Device2 due to permission errors. CFE logs are below.

The customer deployed a supported ARM template, then created additional VIPs using public IP addresses from a different RG.

Since the Managed Identity for the VMs only has permissions on the RG in which the BIG-IP is deployed, it did not have write permissions over the public IP.

I noticed that in issue #31 there was a reference to AUTOSDK-376 that would target a feature that might check for missing dependencies. My main question here is:

a) could we have some kind of permissions check prior to, or at the time of, failover, so that we avoid hitting permission errors after the ipconfigs have been deleted from one device but before they are created on the other, or

b) could we document the requirement that public IPs must be in the appropriate RGs or have appropriate permissions in place in order to fail over correctly? Currently there is a reference to RGs and permissions in the FAQ, but it may help to call out public IPs specifically, as these may be created much later than the BIG-IPs, potentially by different teams.
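
As a rough illustration of option (a), a pre-check could at least flag public IPs that live outside the resource groups where the Managed Identity is known to have access. The sketch below only compares resource group names parsed from the standard Azure resource ID format; it does not query actual role assignments. The function names and all sample IDs are hypothetical.

```javascript
// Extract the resource group name from an Azure resource ID such as
// /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/publicIPAddresses/<name>
function resourceGroupOf(resourceId) {
    const match = /\/resourceGroups\/([^/]+)/i.exec(resourceId);
    return match ? match[1] : null;
}

// Return the public IP resource IDs that are not in any permitted resource group.
// Resource group names are compared case-insensitively.
function publicIpsOutsideGroups(publicIpIds, permittedGroups) {
    const permitted = permittedGroups.map((group) => group.toLowerCase());
    return publicIpIds.filter((id) => {
        const group = resourceGroupOf(id);
        return group === null || !permitted.includes(group.toLowerCase());
    });
}

// Hypothetical IDs for illustration.
const sub = '/subscriptions/00000000-0000-0000-0000-000000000000';
const publicIps = [
    `${sub}/resourceGroups/bigip-rg/providers/Microsoft.Network/publicIPAddresses/vip1-pip`,
    `${sub}/resourceGroups/shared-rg/providers/Microsoft.Network/publicIPAddresses/vip2-pip`
];
const outside = publicIpsOutsideGroups(publicIps, ['BIGIP-RG']);
console.log(outside);
```

Running a check like this before any ipconfig is deleted would let failover abort early instead of leaving the addresses unassigned.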

Workaround

  1. I asked the customer to delete the public IP and recreate it in an RG in which the ManagedIdentity is a Contributor.
  2. I also suggested the customer could give the ManagedIdentity Contributor permissions in all other RGs where public IPs may be created.

Steps to Recreate

  1. Deploy supported HA ARM template, failover via API.
  2. After deployment, create a new VIP. Allocate a private IP and then associate a Public IP that already exists in a different RG from where the BIG-IP was deployed.

CFE log file

I can provide the full log file but I have copied the lines from this failover event below, and manually removed any GUID's and object names:

Fri, 31 Jul 2020 04:25:57 GMT - finest: socket 206 opened
Fri, 31 Jul 2020 04:26:02 GMT - info: [f5-cloud-failover] Performing failover - execute
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Discover Address operations using localAddresses {"0":"x.x.x.x","1":"x.x.x.x"} failoverAddresses {"0":"x.x.x.x","1":"x.x.x.x"} to discover
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Performing Failover - update
Fri, 31 Jul 2020 04:26:57 GMT - finest: socket 206 closed
Fri, 31 Jul 2020 04:26:58 GMT - finest: socket 207 opened
Fri, 31 Jul 2020 04:27:04 GMT - finest: socket 207 closed
Fri, 31 Jul 2020 04:30:37 GMT - info: [f5-cloud-failover] Disassociate NICs successful.
Fri, 31 Jul 2020 04:34:01 GMT - severe: [f5-cloud-failover] The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope '[ResourceId of NIC]'; however, it does not have permission to perform action 'Microsoft.Network/publicIPAddresses/join/action' on the linked scope(s) '[ResourceId of PublicIP1],[ResourceId of PublicIP2]' or the linked scope(s) are invalid. Error: The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope '[ResourceId of NIC]'; however, it does not have permission to perform action 'Microsoft.Network/publicIPAddresses/join/action' on the linked scope(s) '[ResourceId of PublicIP1],[ResourceId of PublicIP2]' or the linked scope(s) are invalid.
    at client.pipeline (/var/config/rest/iapps/f5-cloud-failover/node_modules/azure-arm-network/lib/operations/networkInterfaces.js:2037:19)
    at retryCallback (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/systemErrorRetryPolicyFilter.js:89:9)
    at retryCallback (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/exponentialRetryPolicyFilter.js:140:9)
    at /var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/rpRegistrationFilter.js:59:14
    at handleRedirect (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/redirectFilter.js:39:9)
    at /var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/formDataFilter.js:23:14
    at Request.defaultRequest [as _callback] (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/requestPipeline.js:125:16)
    at Request.self.callback (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:185:22)
    at emitTwo (events.js:126:13)
    at Request.emit (events.js:214:7)
    at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:1154:10)
    at emitOne (events.js:121:20)
    at Request.emit (events.js:211:7)
    at IncomingMessage.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:1076:12)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
Fri, 31 Jul 2020 19:18:24 GMT - finest: socket 208 opened

error "Cannot read property 'disassociate' of undefined" when triggering Failover

(Issue moved from f5devcentral/f5-cloud-failover-extension#22)

Reported By: @tewfikm

CFE version: v1.1
TITLE: When triggering failover, the severe error "severe: [f5-cloud-failover] failover.execute() error: Cannot read property 'disassociate' of undefined TypeError: Cannot read property 'disassociate' of undefined" is generated.

version impacted:

{
"version": "1.1.0",
"release": "0",
"schemaCurrent": "1.1.0",
"schemaMinimum": "0.9.1"
}

MSI:
OK

Details:
on Unit1:
Wed, 25 Mar 2020 15:39:29 GMT - finest: socket 206 closed
Wed, 25 Mar 2020 15:53:59 GMT - finest: socket 207 opened
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing failover - execute
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing Failover - recovery
Wed, 25 Mar 2020 15:54:02 GMT - warning: [f5-cloud-failover] Recovering previous task: {"routes":{"operations":[]}}
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing Failover - update
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] No localAddresses/failoverAddresses to discover
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] No route operations to run
Wed, 25 Mar 2020 15:54:02 GMT - severe: [f5-cloud-failover] failover.execute() error: Cannot read property 'disassociate' of undefined TypeError: Cannot read property 'disassociate' of undefined
at _discoverAddressOperations.then.operations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/azure/cloud.js:129:66)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromiseCtx (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:606:10)
at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:138:12)
at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:794:20)
at tryOnImmediate (timers.js:752:5)
at processImmediate [as _immediateCallback] (timers.js:729:5)
Wed, 25 Mar 2020 15:54:07 GMT - finest: socket 207 closed

on Unit2:
Wed, 25 Mar 2020 15:39:29 GMT - finest: socket 206 closed
Wed, 25 Mar 2020 15:53:59 GMT - finest: socket 207 opened
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing failover - execute
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing Failover - recovery
Wed, 25 Mar 2020 15:54:02 GMT - warning: [f5-cloud-failover] Recovering previous task: {"routes":{"operations":[]}}
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] Performing Failover - update
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] No localAddresses/failoverAddresses to discover
Wed, 25 Mar 2020 15:54:02 GMT - info: [f5-cloud-failover] No route operations to run
Wed, 25 Mar 2020 15:54:02 GMT - severe: [f5-cloud-failover] failover.execute() error: Cannot read property 'disassociate' of undefined TypeError: Cannot read property 'disassociate' of undefined
at _discoverAddressOperations.then.operations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/azure/cloud.js:129:66)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromiseCtx (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:606:10)
at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:138:12)
at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:794:20)
at tryOnImmediate (timers.js:752:5)
at processImmediate [as _immediateCallback] (timers.js:729:5)
Wed, 25 Mar 2020 15:54:07 GMT - finest: socket 207 closed
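The recovered task in both logs is `{"routes":{"operations":[]}}`, with no `addresses` key, which would be consistent with code reading `.disassociate` on an undefined value. The following is a minimal sketch of a defensive normalization step, not the actual CFE fix. The shape of the address operations object (`disassociate` and `associate` lists) is an assumption inferred from the error message, and `normalizeOperations` is a hypothetical name.

```javascript
// Hypothetical helper: fill in any missing parts of a recovered failover task
// so later code can read addresses.disassociate without a TypeError.
// The field names are assumptions based on the log output, not CFE's schema.
function normalizeOperations(task) {
    const source = task || {};
    const addresses = source.addresses || {};
    const routes = source.routes || {};
    return {
        addresses: {
            disassociate: addresses.disassociate || [],
            associate: addresses.associate || []
        },
        routes: {
            operations: routes.operations || []
        }
    };
}
```

With this in place, a task recovered as `{"routes":{"operations":[]}}` or `{"addresses":null,"routes":null}` would yield empty lists instead of raising an exception.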

Failover is not triggered when it should be after both F5 VMs are stopped or rebooted

If both of the F5 VMs are stopped or rebooted, failover of AWS objects is not triggered by the script, even though the newly elected Active F5 VM is not the same as before the reboot. The following sequence of events reproduces the error consistently:

  1. bigip1 is Active and AWS PIPs/routes are pointing to it (bigip2 is Standby).
  2. Both systems are stopped at the AWS-instance level.
  3. bigip2 is started and becomes Active. Failover is not triggered and AWS keeps trying to use bigip1, which is still unavailable. The outage continues until bigip1 is brought up as well.
  4. bigip1 is started and becomes Standby. Failover is still not triggered and AWS will send traffic to bigip1. If VSs are configured in traffic-group None, the requests will be processed by bigip1, but persistence tables may not work correctly, and the discrepancy in Active/Standby state at the F5 vs AWS level will cause confusion.

I have not tested the behavior of the cluster when both VMs are "shutdown" at the F5 level (rather than stopped at the AWS level), but I suspect it will be as described above.

Multiple Next Hop Addresses - failover operation fails

Description

I am trying to use the declaration from the following CFE CloudDocs article:

https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/example-declarations.html#multiple-next-hop-addresses

The declaration is successfully loaded, but when failover is triggered, the following error message is generated:

Tue, 11 Aug 2020 06:15:56 GMT - severe: [f5-cloud-failover] Cannot read property 'NetworkInterfaceId' of undefined TypeError: Cannot read property 'NetworkInterfaceId' of undefined at _listNics.then.nics (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/aws/cloud.js:485:51)

I also tested a similar declaration using "discoveryType":"routeTag" and obtained exactly the same error message.

If I remove the "192.168.12.0/24" clause, corresponding to the internal selfIPs, from the declaration failover is successful.

I am using an AWS CloudFormation template based on the following F5-supported template:

https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/failover/across-net/via-api/3nic/existing-stack/payg

Declaration code

My declaration code is below (the only differences with the example are the "debug" log level; "mydeployment" tags replaced with "slb1ha"; selfIPs are replaced with my external and internal selfIP pairs: 10.120.2.200,10.120.3.86 and 10.120.4.152, 10.120.5.12 respectively).

{
    "class": "Cloud_Failover",
    "controls": {
      "class": "Controls",
      "logLevel": "debug"
    },
    "environment": "aws",
    "externalStorage": {
      "scopingTags": {
        "f5_cloud_failover_label": "slb1ha"
      }
    },
    "failoverAddresses": {
      "enabled": true,
      "scopingTags": {
        "f5_cloud_failover_label": "slb1ha"
      }
    },
    "failoverRoutes": {
      "enabled": true,
      "scopingTags": {
        "f5_cloud_failover_label": "slb1ha"
      },
      "scopingAddressRanges": [
        {
          "range": "192.168.11.0\/24",
          "nextHopAddresses": {
            "discoveryType": "static",
            "items": [
              "10.120.2.200",
              "10.120.3.86"
            ]
          }
        },
        {
          "range": "192.168.12.0\/24",
          "nextHopAddresses": {
            "discoveryType": "static",
            "items": [
              "10.120.4.152",
              "10.120.5.12"
            ]
          }
        }
      ]
    }
}
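For readers debugging a declaration like this, it can help to check which next hop each unit would be expected to claim for each range. The sketch below assumes that a static `items` list holds one self IP per BIG-IP and that a unit picks the item matching one of its own addresses. `pickNextHops` is a hypothetical helper, not CFE code. It returns `null` for a range with no local match instead of throwing, which is a case worth ruling out when investigating the `NetworkInterfaceId` error.

```javascript
// Hypothetical helper: for each scoped range, find the next hop address that
// belongs to this BIG-IP (assumes one item per unit in each "items" list).
function pickNextHops(scopingAddressRanges, localAddresses) {
    const local = new Set(localAddresses);
    return scopingAddressRanges.map((entry) => {
        const items = (entry.nextHopAddresses && entry.nextHopAddresses.items) || [];
        const match = items.find((address) => local.has(address));
        return { range: entry.range, nextHop: match || null };
    });
}
```

Calling it with the two ranges above and the self IPs of one unit (for example `10.120.2.200` and `10.120.4.152`) should give one next hop per range. A `null` entry would mean that the unit owns none of the listed addresses for that range.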

IPv6 support needed for failover of EIP's in AWS

Summary

We need support for failover of IPv6 addresses in AWS. According to the FAQ in the current documentation, that is not supported yet.

Detail

We have both IPv4 and IPv6 addresses today on-prem on our F5 devices, and are required to migrate the application and F5 services to AWS. We have a requirement to continue to support both IPv4 and IPv6 in AWS, and we would like to use CFE for high availability because we need fast failover times to meet our strict uptime SLA. We have compliance enforced by a 3rd party so changing these requirements may be complex.

With an internet-facing deployment of F5 VE in AWS that supports IPv6 VIPs, we need the CFE to support failover of these VIPs. Can you suggest any workaround or timelines so we can plan for this? Many thanks.

DHCP default route does not catch Azure metadata 169.254.169.254 traffic: ECONNREFUSED

(Issue moved from f5devcentral/f5-cloud-failover-extension#21)
Reported By: @JeffGiroux

A default route created by Azure DHCP does not catch traffic going to the Azure metadata service (169.254.169.254), which the CFE prerequisites require to be reachable. If I hit certain API URLs for CFE, then I get ECONNREFUSED.

Example endpoint = /reset
Hitting above endpoint without 169.254.169.254 specifically configured as a route will result in unreachable.

My Azure deployment creates DHCP routes like this...

sys management-route default {
description configured-by-dhcp
gateway 10.90.1.1
network default
}
sys management-route dhclient_route1 {
description configured-by-dhcp
gateway 10.90.1.1
network 168.63.129.16/32
}

The note in documentation states this...
"Certain BIG-IP versions and/or topologies may use DHCP to create the management routes (for example: dhclient_route1), if that is the case the below steps are not required."

However, my dhclient_route1 does not contain the network address required by CFE. Therefore, I have to manually add an additional route according to CFE documentation. My example...

tmsh modify sys db config.allow.rfc3927 value enable
tmsh create sys management-route metadata-route network 169.254.169.254/32 gateway 10.90.1.1

If you do not add the config.allow.rfc3927 ahead of time, then F5 will not allow you to add the 169.x.x.x route. Error = 01020062:3: IP Address 169.254.169.254 is invalid, link-local address not allowed.
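A quick way to confirm whether the DHCP-created routes already cover the metadata address is to scan the management-route listing for a matching `network` line. The sketch below is a hypothetical helper (`hasManagementRoute` is not part of CFE or tmsh) and assumes the listing format shown earlier in this issue. It does an exact string match on the network value.

```javascript
// Hypothetical helper: report whether a management-route listing contains
// an entry whose "network" value equals the given network exactly.
function hasManagementRoute(tmshOutput, network) {
    return tmshOutput
        .split('\n')
        .map((line) => line.trim())
        .some((line) => line === `network ${network}`);
}

// The two DHCP-created routes from this deployment.
const dhcpRoutes = `sys management-route default {
description configured-by-dhcp
gateway 10.90.1.1
network default
}
sys management-route dhclient_route1 {
description configured-by-dhcp
gateway 10.90.1.1
network 168.63.129.16/32
}`;

console.log(hasManagementRoute(dhcpRoutes, '168.63.129.16/32'));   // true
console.log(hasManagementRoute(dhcpRoutes, '169.254.169.254/32')); // false
```

The second result reflects the situation described here: the DHCP routes exist, but none of them targets the metadata address.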

Can you validate and/or update the documentation if needed?

RFE: CFE to support Azure Private Endpoints for Storage Account

(Issue recreated from f5devcentral/f5-cloud-failover-extension#26)
Reported By: @tewfikm

Title : Request for Enhancement : CFE to support Azure Private Endpoints for Storage Account

Details:
For security reasons, customer is asking CFE to avoid performing API calls to Azure Public Endpoints.

Current CFE Azure provider is performing DNS resolution to Azure Storage account using connection String to Storage Account Public endpoint.

ASK:
CFE to Support Private Endpoint for storage accounts, by:

  • giving an option to configure the storage account to switch to a Private Endpoint on the VNET where CFE is installed
  • denying public access using the storage FW

Azure: CFE modifies routes whose nextHop doesn't belong to BIG-IP

CFE v.1.2.0

CFE will modify a route, changing its next hop address to one of the BIG-IP local addresses even if the current next hop address on the route doesn't belong to BIG-IP, i.e. the address isn't in f5_self_ips.

For instance

  • Route table tagged with "f5_self_ips":"192.168.56.37,192.168.56.38"

  • Route in route table with prefix 10.0.0.0/8 and with next hop 192.168.56.36

192.168.56.36 is not BIG-IP and is not in f5_self_ips

CFE will change next hop of route to either 192.168.56.37 or 192.168.56.38.
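The expected behavior can be sketched as a filter that only selects routes whose current next hop is already one of the addresses in `f5_self_ips`. This is an illustration, not CFE's code: `selectRoutesToUpdate` is a hypothetical name, and the route objects are assumed to use the Azure route properties `addressPrefix` and `nextHopIpAddress`.

```javascript
// Hypothetical helper: keep only routes whose current next hop belongs to BIG-IP,
// i.e. appears in the comma-separated f5_self_ips tag value.
function selectRoutesToUpdate(routes, selfIpsTag) {
    const selfIps = new Set(selfIpsTag.split(',').map((ip) => ip.trim()));
    return routes.filter((route) => selfIps.has(route.nextHopIpAddress));
}
```

With the tag value `192.168.56.37,192.168.56.38`, the 10.0.0.0/8 route pointing at 192.168.56.36 would be left untouched.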

POST request to http://localhost:8100/mgmt/shared/cloud-failover/trigger fails with 500 error

(Issue moved from: f5devcentral/f5-cloud-failover-extension#20)
Reported by: @y-myk

BIG-IP HA cluster in Azure deployed from https://github.com/F5Networks/f5-azure-arm-templates/tree/v7.4.0.0/supported/failover/same-net/via-api/n-nic/existing-stack/byol

Deployment completed with no errors. However failover fails.

/var/log/cloud/azure/onboard.log

2020-03-17T14:21:54.300Z info: Onboard starting.
...
2020-03-17T14:22:10.411Z info: Licensing.
2020-03-17T14:22:34.619Z info: Provisioning modules {"ltm":"nominal"}
2020-03-17T14:23:07.305Z info: Installing package at path: /var/config/rest/downloads/f5-cloud-failover-1.1.0-0.noarch.rpm
2020-03-17T14:23:07.306Z info: Installing package at path: /var/config/rest/downloads/f5-appsvcs-3.5.1-5.noarch.rpm
2020-03-17T14:23:23.368Z info: Saving config.
2020-03-17T14:23:35.766Z info: Waiting for device to be active.
2020-03-17T14:23:40.228Z info: Sending metrics
2020-03-17T14:23:41.126Z info: Device onboard complete.
2020-03-17T14:23:44.433Z info: Onboard finished.

/var/log/cloud/azure/cluster.log

2020-03-17T14:24:08.961Z info: /config/cloud/azure/node_modules/@f5devcentral/f5-cloud-libs/scripts/cluster.js called with /usr/bin/f5-rest-node /config/cloud/azure/node_modules/@f5devcentral/f5-cloud-libs/scripts/cluster.js --output /var/log/cloud/azure/cluster.log --log-level info --host 10.10.0.11 --port 443 -u svc_user --password-url file:///config/cloud/.passwd --password-encrypted --config-sync-ip 10.10.2.11 --join-group --device-group Sync --sync --remote-host 10.10.0.10 --remote-user svc_user --remote-password-url file:///config/cloud/.passwd
2020-03-17T14:24:08.970Z info: Cluster starting.
...
2020-03-17T14:24:48.624Z info: Adding to remote trust.
2020-03-17T14:25:04.010Z info: Adding to remote device group.
2020-03-17T14:25:10.712Z info: Checking for datasync-global-dg.
2020-03-17T14:25:21.554Z info: Telling remote to sync.
2020-03-17T14:25:53.117Z info: Waiting for sync to complete.
2020-03-17T14:25:54.571Z info: Sync complete.
2020-03-17T14:25:54.573Z info: Waiting for BIG-IP to be active.
2020-03-17T14:25:56.619Z info: Cluster finished.

/var/log/ltm

Mar 18 10:40:54 f5jp010.eastus.cloudapp.azure.com notice sod[7364]: 010c006d:5: Leaving Standby for Active: Next Active, peers agree on config.
Mar 18 10:40:54 f5jp010.eastus.cloudapp.azure.com notice sod[7364]: 010c0053:5: Active for traffic group traffic-group-1.
Mar 18 10:40:54 f5jp010.eastus.cloudapp.azure.com notice sod[7364]: 010c0019:5: Active

/var/log/restnoded/restnoded.log

Wed, 18 Mar 2020 10:40:54 GMT - finest: socket 202 opened
Wed, 18 Mar 2020 10:40:59 GMT - info: [f5-cloud-failover] Performing failover - execute
Wed, 18 Mar 2020 10:40:59 GMT - info: [f5-cloud-failover] Performing Failover - recovery
Wed, 18 Mar 2020 10:40:59 GMT - warning: [f5-cloud-failover] Recovering previous task: {"addresses":null,"routes":null}
Wed, 18 Mar 2020 10:40:59 GMT - info: [f5-cloud-failover] Performing Failover - update
Wed, 18 Mar 2020 10:40:59 GMT - info: [f5-cloud-failover] No localAddresses/failoverAddresses to discover
Wed, 18 Mar 2020 10:40:59 GMT - severe: [f5-cloud-failover] failover.execute() error: Cannot read property 'disassociate' of undefined TypeError: Cannot read property 'disassociate' of undefined
at _discoverAddressOperations.then.operations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/azure/cloud.js:129:66)
at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromiseCtx (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:606:10)
at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:138:12)
at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:794:20)
at tryOnImmediate (timers.js:752:5)
at processImmediate [as _immediateCallback] (timers.js:729:5)
Wed, 18 Mar 2020 10:40:59 GMT - info: [f5-cloud-failover] No route operations to run
Wed, 18 Mar 2020 10:41:04 GMT - finest: socket 202 closed

/var/log/restjavad-audit.0.log

[I][243][18 Mar 2020 10:40:59 UTC][ForwarderPassThroughWorker] {"user":"local/admin","method":"POST","uri":"http://localhost:8100/mgmt/shared/cloud-failover/trigger","status":500,"from":"Unknown"}

CFE endpoint https://localhost/mgmt/shared/cloud-failover/info is responding on each HA peer, however GET request to https://localhost/mgmt/shared/cloud-failover/trigger fails:

[admin@f5jp011:Active:In Sync] ~ # curl -ik -u admin https://localhost/mgmt/shared/cloud-failover/info
HTTP/1.1 200 OK
Date: Wed, 18 Mar 2020 10:56:05 GMT
Server: Jetty(9.2.22.v20170606)
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=16070400; includeSubDomains
Content-Type: application/json; charset=UTF-8
X-Powered-By: Express
Pragma: no-cache
Cache-Control: no-store
Cache-Control: no-cache
Cache-Control: must-revalidate
Expires: -1
Content-Length: 81
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data: blob:; img-src 'self' data: http://127.4.1.1 http://127.4.2.1

[admin@f5jp011:Active:In Sync] ~ # curl -ik -u admin https://localhost/mgmt/shared/cloud-failover/trigger
HTTP/1.1 400 Bad Request
Date: Wed, 18 Mar 2020 10:56:32 GMT
Server: Jetty(9.2.22.v20170606)
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=16070400; includeSubDomains
Content-Type: application/json; charset=UTF-8
Pragma: no-cache
Cache-Control: no-store
Cache-Control: no-cache
Cache-Control: must-revalidate
Expires: -1
Content-Length: 1349
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data: blob:; img-src 'self' data: http://127.4.1.1 http://127.4.2.1
Connection: close

{"taskState":"FAILED","message":"Failover failed because of failover.execute() error: Cannot read property 'disassociate' of undefined TypeError: Cannot read property 'disassociate' of undefined\n at _discoverAddressOperations.then.operations (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/azure/cloud.js:129:66)\n at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)\n at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)\n at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)\n at Promise._settlePromiseCtx (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:606:10)\n at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:138:12)\n at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)\n at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)\n at runCallback (timers.js:794:20)\n at tryOnImmediate (timers.js:752:5)\n at processImmediate [as _immediateCallback] (timers.js:729:5)","timestamp":"2020-03-18T10:54:18.040Z","instance":"f5jp011.eastus.cloudapp.azure.com","failoverOperations":{"addresses":null,"routes":null},"code":400}[admin@f5jp011:Active:In Sync] ~ #

CFE uses hostname rather than device name when identifying traffic group

(Issue moved from f5devcentral/f5-cloud-failover-extension#23)
Reported By: @C0missar

If the host name of the device does not match the device name in the device group, CFE is unable to identify itself and determine which traffic groups are associated. The result is that both devices think they are in standby, both report themselves as being in standby in response to a call to /mgmt/shared/cloud-failover/inspect, and no EIP re-association ever occurs.

The host name is written to the S3 f5cloudfailoverstate.json file, and the failover script searches for this name in the device group.

The CFE should only use the device name for identification of the traffic groups.
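As a rough illustration of the requested behavior, the local device could be identified by a self marker in the device list rather than by comparing host names. The sketch below assumes each device entry carries `name`, `hostname`, and `selfDevice` fields, as iControl REST device listings are understood to do. Treat those field names as assumptions, and `resolveLocalDeviceName` as a hypothetical helper.

```javascript
// Hypothetical helper: return the device name of the local unit, using the
// selfDevice marker instead of matching on hostname. selfDevice is compared as
// a string because it is assumed to arrive as "true"/"false".
function resolveLocalDeviceName(devices) {
    const self = devices.find((device) => String(device.selfDevice) === 'true');
    return self ? self.name : null;
}
```

The returned device name could then be used to look up traffic group ownership, even when the host name differs from the device name.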

feature request: AWS route added if it does not exist?

This is not a bug but a feature request for your consideration: A colleague was configuring CFE to update a route in AWS, and his declaration was correct. However, the route he wanted to update did not exist in the route table, so the update failed. Once he created this route within AWS, CFE failover worked great.

He had assumed that CFE would create the route for him. While we know that the route must exist to be updated, is this a reasonable assumption a customer might make? If yes, would you consider having CFE create the route if it is not found in the tagged AWS route table? Or, perhaps, log messages to indicate that the route table was found, but the specified route does not exist? If not, please close issue, but I thought to log for your consideration.
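The logging variant of this request could look roughly like the following sketch, which reports the CIDR blocks that were expected but are not present in a tagged route table. `findMissingRoutes` is a hypothetical name, not a CFE function. The route table object is assumed to follow the shape returned by the EC2 DescribeRouteTables API (`RouteTableId`, `Routes[].DestinationCidrBlock`).

```javascript
// Hypothetical helper: list the expected destination CIDRs that have no
// matching route in the given AWS route table.
function findMissingRoutes(routeTable, expectedCidrs) {
    const existing = new Set(
        (routeTable.Routes || []).map((route) => route.DestinationCidrBlock)
    );
    return expectedCidrs.filter((cidr) => !existing.has(cidr));
}

// Example: the table was found, but the route to update does not exist in it.
const table = {
    RouteTableId: 'rtb-example',
    Routes: [{ DestinationCidrBlock: '10.0.0.0/16' }]
};
findMissingRoutes(table, ['192.168.11.0/24']).forEach((cidr) => {
    console.log(`Route table ${table.RouteTableId} found, but route ${cidr} does not exist`);
});
```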

GCP: On Failover CFE updates forwarding rules based on IP only

(Issue moved from: f5devcentral/f5-cloud-failover-extension#18)
Reported By: @abhishek-batapati

Issue:
On failover, CFE updates forwarding rules based on IP only. We have a scenario where the same IP is used in forwarding rules targeting 2 sets of F5s, but on different port numbers.

Scenario:
2 sets of F5s using the same IP, with one set using ports 80 & 443 and the other set using ports 8080 & 8443

Forwarding rules are defined with IP + Port as mentioned above.

f5-cluster-1-tcp-443 us-west1 35.xx.xx.108 TCP us-west1-a/targetInstances/bigip1-cluster1-ti
f5-cluster-1-tcp-80 us-west1 35.xx.xx.108 TCP us-west1-a/targetInstances/bigip1-cluster1-ti
f5-cluster-2-tcp-8080 us-west1 35.xx.xx.108 TCP us-west1-b/targetInstances/bigip1-cluster2-ti
f5-cluster-2-tcp-8443 us-west1 35.xx.xx.108 TCP us-west1-b/targetInstances/bigip1-cluster2-ti

Virtual servers are defined in both clusters, listening on IP+Port as defined above.

On triggering a failover on cluster-1, F5 in cluster-1 made changes to all 4 forwarding rules described above.

Requirement:
We have a requirement where F5 CFE should look into IP+Port instead of IP only to make changes on the forwarding rules.
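A minimal sketch of IP+Port matching is shown below. It assumes forwarding rule objects expose `IPAddress` plus either a `portRange` string (such as `443-443`) or a `ports` list, as in the Compute Engine forwarding rule resource, and that the cluster's virtual servers are available as `{ address, port }` pairs. `ruleCoversPort` and `selectForwardingRules` are hypothetical names, and rules that declare no ports are conservatively skipped.

```javascript
// Hypothetical helper: does this forwarding rule cover the given port?
function ruleCoversPort(rule, port) {
    if (Array.isArray(rule.ports) && rule.ports.length > 0) {
        return rule.ports.map(Number).includes(port);
    }
    if (typeof rule.portRange === 'string' && rule.portRange.length > 0) {
        // "80" and "80-80" are both treated as the single port 80.
        const [low, high = low] = rule.portRange.split('-').map(Number);
        return port >= low && port <= high;
    }
    return false; // no port information: leave the rule alone
}

// Hypothetical helper: select only rules that match one of this cluster's
// virtual servers on both address and port.
function selectForwardingRules(rules, virtualServers) {
    return rules.filter((rule) => virtualServers.some(
        (vs) => vs.address === rule.IPAddress && ruleCoversPort(rule, vs.port)
    ));
}
```

In the scenario above, cluster-1 with virtual servers on ports 80 and 443 would select only its own two rules and leave the 8080 and 8443 rules of cluster-2 unchanged.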

GCP: CFE fails to update routes on a failover when a project has more than 500 routes (globally)

Issue: F5 CFE fails to update the user defined route on a failover

In our project we have more than 1500 global routes.

This might be because CFE makes a GET API call to fetch all routes and then searches for the one with the right key:value pair in the description field.

However, according to the link below, GCP returns at most 500 results per page.
https://cloud.google.com/compute/docs/reference/rest/v1/routes/list
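If the page size limit is the cause, the usual remedy is to keep requesting pages until the response no longer carries a `nextPageToken`. The sketch below shows that loop in isolation. `listAllRoutes` and the `fetchPage` callback are hypothetical: `fetchPage(pageToken)` stands in for whatever call CFE makes to the routes list endpoint and is assumed to resolve to an object with `items` and, when more pages exist, `nextPageToken`.

```javascript
// Hypothetical helper: collect every route by following nextPageToken until
// the API stops returning one.
async function listAllRoutes(fetchPage) {
    const routes = [];
    let pageToken;
    do {
        const page = await fetchPage(pageToken);
        routes.push(...(page.items || []));
        pageToken = page.nextPageToken;
    } while (pageToken);
    return routes;
}
```

With a project of more than 1500 routes, a loop like this would make at least four requests instead of stopping after the first 500 results.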

This issue is currently a blocker for us as we have seen multiple failovers in our env and all of them have failed.

[DOC] AWS: GIF uses incorrect terms for AWS objects

(Issue moved from:
Reported by: @ajsq

The GIF for the Failover Event Diagram on the AWS page has the following errors:

  • each BIG-IP is shown as existing in its own VPC - it should say AZ, since all BIG-IPs must exist in a single VPC (Elastic IPs are scoped to VPC)
  • the AZs in us-west-1 are named us-west-1a, us-west-1b, etc., not us-west-1-a, etc.

CFE giving back 400 on /inspect and /reset - Azure

Description

BIG-IP 15.1.0.4 and CFE 1.5: both /inspect and /reset are returning 400 errors. Other API endpoints are working as expected, and the behavior is the same on both units (active/standby).

Environment information

For bugs, enter the following information:

  • Cloud Failover Extension Version: 1.5
  • BIG-IP version: 15.1.0.4
  • Cloud provider: Azure
  • Azure Region: North Central

Severity Level

For bugs, enter the bug severity level. Do not set any labels.

Severity: 3

Severity level definitions:

  1. Severity 1 (Critical) : Defect is causing systems to be offline and/or nonfunctional. immediate attention is required.
  2. Severity 2 (High) : Defect is causing major obstruction of system operations.
  3. Severity 3 (Medium) : Defect is causing intermittent errors in system operations.
  4. Severity 4 (Low) : Defect is causing infrequent interruptions in system operations.
  5. Severity 5 (Trivial) : Defect is not causing any interruptions to system operations, but none-the-less is a bug.

Testing

Units have access to metadata and they can query it, so the MSI is working as expected. All tags have been in place and verified using: curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2020-06-01"

In /var/log/restnoded we can see the CFE update come in, but there is never an event for /inspect or /reset.

[Image: restnoded]

Example of CFE:

{
    "class": "Cloud_Failover",
    "environment": "azure",
    "externalStorage": {
        "scopingTags": {
            "f5_cloud_failover_label": "resource_group_name"
        }
    },
    "failoverAddresses": {
        "enabled": true,
        "scopingTags": {
            "f5_cloud_failover_label": "resource_group_name"
        }
    },
    "failoverRoutes": {
        "enabled": true,
        "scopingTags": {
            "f5_cloud_failover_label": "resource_group_name"
        },
        "scopingAddressRanges": [
            {
                "range": "192.0.2.0/24"
            }
        ],
        "defaultNextHopAddresses": {
            "discoveryType": "static",
            "items": [
                "192.0.2.10",
                "192.0.2.11"
            ]
        }
    },
    "schemaVersion": "1.5.0"
}

Restjavad logs look like they are throwing exceptions:

[Image: restjavadwhole]

Cannot read property IPAddress of undefined fwdRules.forEach

Using terraform to deploy my environment. I have target instance groups created, a forwarding rule, but I do NOT use the forwarding rule address as a virtual server. For unrelated reasons, I'm simply using 0.0.0.0/0 as a test VS. But...the terraform resources for target instance and forwarding rules are still created.

When this happens...AND the BIG-IP is NOT using a virtual address from a forwarding rule, you get an error.

Sun, 14 Jun 2020 17:27:22 GMT - info: [f5-cloud-failover] Performing failover - execute
Sun, 14 Jun 2020 17:27:22 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Sun, 14 Jun 2020 17:27:23 GMT - severe: [f5-cloud-failover] Cannot read property 'IPAddress' of undefined TypeError: Cannot read property 'IPAddress' of undefined
    at fwdRules.forEach (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:809:56)
    at Array.forEach (<anonymous>)
    at Promise.all.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/gcp/cloud.js:808:31)
    at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Promise._fulfill (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:638:18)
    at PromiseArray._resolve (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:126:19)
    at PromiseArray._promiseFulfilled (/usr/share/rest/node/node_modules/bluebird/js/release/promise_array.js:144:14)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:574:26)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)

This is related to GCP and forwarding rules in general...see my other issue #26.

There needs to be better handling of the error and also possibly a "pass through" feature where...if error, still failover the remaining objects. I say this because...my alias IP and my routes did NOT failover due to the forwarding rule error. Once I fixed my VS to use an address from the forwarding rule, no error.

Suggestions:

  1. Look for VS address that matches forwarding rule
  2. If found, then search for target instances
  3. Then update forwarding rule with other target instance from unit2
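
The three suggested steps could look roughly like the sketch below. This is not CFE's actual code: `selectForwardingRules` and its arguments are hypothetical, and the rule objects only mimic the `name`, `IPAddress` and `target` fields of a GCP forwarding rule resource. Rules without a matching VS address are skipped instead of raising an error.

```javascript
// Hypothetical helper (assumption, not CFE code): keep only the forwarding
// rules whose IP address is also a virtual server address on BIG-IP.
function selectForwardingRules(forwardingRules, virtualAddresses, newTarget) {
    const known = new Set(virtualAddresses);
    const updates = [];
    const skipped = [];
    forwardingRules.forEach((rule) => {
        if (known.has(rule.IPAddress)) {
            // Matching VS address found: repoint the rule at the other unit's target instance.
            updates.push({ name: rule.name, target: newTarget });
        } else {
            // No matching VS address: record the rule and move on instead of throwing.
            skipped.push(rule.name);
        }
    });
    return { updates, skipped };
}
```

Returning the skipped rule names, rather than discarding them, would let the caller log them so an unused forwarding rule is still visible to the user.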

Behavior now...

  1. Seems that the script will query for vm instances and find target instance groups
  2. Then, script seems to expect remaining stuff of a target instance to be there too (forwarding rule, matching VS IP)
  3. If step 2 fails to find a matching rule for a VM that lives in a target instance...fail
  4. No other cloud objects move

Workaround

Workaround for user...

  1. If not planning to use a forwarding rule, then delete the target instance groups and the forwarding rule
  2. Or...if not planning to use the IP address of the forwarding rule as your BIG-IP virtual server address, then delete the target instance groups and the forwarding rule

Workaround for CFE...

  1. fail through...meaning, if forwarding rules fail is it possible to still move the other objects (aliasIP, routes)?
  2. Update logic to not fail on forwarding rules if VS IP doesn't match. If latter, then that means user must have accidentally created a target instance in the past.
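
A minimal sketch of the "fail through" idea, assuming each object type (forwarding rules, alias IPs, routes) can be wrapped as an independent step. `runFailoverSteps` and the `{ name, run }` step shape are hypothetical and only illustrate the control flow, not how CFE is structured internally.

```javascript
// Hypothetical sketch: run each failover step on its own and record the
// outcome, so an error in one step does not prevent the others from running.
async function runFailoverSteps(steps) {
    const results = [];
    for (const step of steps) {
        try {
            await step.run();
            results.push({ name: step.name, ok: true });
        } catch (err) {
            // Keep going; the failure is reported in the results instead.
            results.push({ name: step.name, ok: false, error: err.message });
        }
    }
    return results;
}
```

The caller can then decide whether any failed step should mark the whole failover as failed, while alias IPs and routes have still been moved.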

GCP aliasIP only moves on NIC0, need aliasIP to move on other NICs too

Request

Currently, CFE in GCP only moves aliasIP ranges associated with NIC0. I have a use case for aliasIPs on NIC2 (aka "internal network") to move as well. This request is to have CFE look for all data plane NICs and move aliasIPs if found.

Current Behavior

  1. deploy HA via API 3nic template in GCP with aliasIP (this gets assigned to NIC0)
  2. once deployed, manually go to active BIG-IP VM and add new aliasIP range to NIC2
  3. test failover
  4. Only NIC0 aliasIP ranges move to new active unit
  5. NIC2 aliasIP ranges remain on old active unit

Expected Behavior

  1. failover should move all aliasIP ranges associated with all data plane NICs
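
The expected behavior could be sketched as below. The input mimics the `networkInterfaces` array of a GCP instance resource (`name`, `aliasIpRanges`, `ipCidrRange`); `planAliasIpMoves` is a hypothetical name, and matching NICs between the two units by `name` is an assumption.

```javascript
// Hypothetical sketch: plan alias IP moves for every NIC that carries alias
// ranges, instead of looking at nic0 only.
function planAliasIpMoves(oldActiveNics, newActiveNics) {
    const moves = [];
    oldActiveNics.forEach((nic) => {
        const ranges = nic.aliasIpRanges || [];
        // Assumption: the same NIC name (nic0, nic2, ...) exists on both units.
        const peer = newActiveNics.find((candidate) => candidate.name === nic.name);
        if (ranges.length === 0 || !peer) {
            return; // nothing to move, or no matching NIC on the peer
        }
        moves.push({
            nic: nic.name,
            // Ranges the new active unit should end up with on this NIC.
            aliasIpRanges: (peer.aliasIpRanges || []).concat(ranges)
        });
    });
    return moves;
}
```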

Use Case

Outbound SNAT needs to have floater SNAT. BIG-IP cluster sending egress traffic needs to have consistent floater IP during failover events. Without an aliasIP assigned to the NIC, traffic will not leave on the additional IP assigned to the SNAT if the VM only has one self IP assigned.

CFE v1.4.0 not working after IOS Upgrade to version 15.1.0.5

After upgrading BIG-IP (hosted in Azure) to version 15.1.0.5, cloud failover stopped working. On checking, I found that the f5-cloud-failover RPM was still listed under iApps in the GUI, but its folder was missing from /var/config/rest/iapps when checked from the CLI.

Workaround: Uninstall the f5-cloud-failover RPM and import it again.

BEFORE

[user@Big-IP:Standby:In Sync] ~ # cd /var/config/rest/iapps/
[user@Big-IP:Standby:In Sync] iapps # ls -lrt
total 12
drwxr-xr-x. 2 root root 4096 Aug 29 02:36 RPMS
drwxr-xr-x. 9 restnoded restnoded 4096 Aug 29 02:36 f5-appsvcs

AFTER RE-IMPORT

[user@Big-IP:Standby:In Sync] ~ # cd /var/config/rest/iapps/
[user@Big-IP:Standby:In Sync] iapps # ls -lrt
total 12
drwxr-xr-x. 2 root root 4096 Aug 29 02:36 RPMS
drwxr-xr-x. 9 restnoded restnoded 4096 Aug 29 02:36 f5-appsvcs
drwxr-xr-x. 4 restnoded restnoded 4096 Aug 29 04:23 f5-cloud-failover ==> This folder was missing

Route tables not updated if they are in a different subscription than BIG-IP

Issue description

Using CFE 1.2, I can only update route tables that are in the same subscription as BIG-IP. This is a problem for 2 reasons:

  1. In a recommended hub and spoke model, the peered hub and spoke VNETs will often be in different subscriptions. In this case, BIG-IP is in hub VNET, and I need to update a UDR for response traffic in the spoke VNET. Both subscriptions are in same tenant.
  2. In the Readme docs for the failover ARM templates, we call out that UDR's will be updated across multiple subscriptions.

When failover occurs, UDRs will be updated across multiple Azure subscriptions.

Troubleshooting done

  • I have tested failover with CFE and can only get UDR's to update when they are in the same subscription as BIG-IP VM's.
  • I tested CFE 1.1 and 1.2
  • I ensured that the BIG-IP ManagedIdentities had Contributor role in the RG from the subscription in which the UDR was created, ie. BIG-IP in hub has Contributor role in RG in spoke.
  • I ensured that UDR's were correctly tagged.
  • restnoded.log shows that only the UDR in the hub subscription is discovered, whether there is 1 UDR or several. UDRs with the same tags in the spoke subscription are not discovered or updated.

Allow CFE route failover for 0.0.0.0/0

I would like CFE to fail over a route in one specific route table when the same route appears in two different route tables. I spun up a pair of BIG-IPs (3-NIC). There is an external route table and an internal route table. Both route tables have a 0.0.0.0/0 route, and both are tagged for route failover because of other routes.

External route table

10.2.0.0/16 --> local
0.0.0.0/0 --> igw-xxxx
192.0.2.0/24 --> point to Active BIG-IP ENI-1234

Internal route table

10.2.0.0/16 --> local
0.0.0.0/0 --> point to Active BIG-IP ENI-1234

Excerpt of the CFE declaration

"range": "0.0.0.0/0",
"nextHopAddresses": {
    "discoveryType": "static",
    "items": [
        "10.2.1.108",
        "10.2.2.228"
    ]
}

Because both route tables are tagged, CFE changes the 0.0.0.0/0 route in both tables to point to ENI-1234 during failover, which is not the desired outcome. For the 0.0.0.0/0 route, only the internal route table should fail over, not both.
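
One possible way to get this behavior, sketched under assumptions: only re-point a route if it currently targets one of the cluster's own network interfaces, so a 0.0.0.0/0 route that points at an internet gateway is left alone. `routesToUpdate` is a hypothetical name, the route objects mimic the shape returned by the EC2 DescribeRouteTables API, and the ENI IDs in any example data are made up. This is not how CFE currently decides.

```javascript
// Hypothetical sketch: collect the route updates for one declared range,
// skipping routes that are not currently owned by the BIG-IP cluster.
function routesToUpdate(routeTables, range, clusterEniIds, activeEniId) {
    const updates = [];
    routeTables.forEach((table) => {
        table.Routes.forEach((route) => {
            const ownedByCluster = clusterEniIds.includes(route.NetworkInterfaceId);
            if (route.DestinationCidrBlock === range
                && ownedByCluster
                && route.NetworkInterfaceId !== activeEniId) {
                updates.push({
                    RouteTableId: table.RouteTableId,
                    DestinationCidrBlock: range,
                    NetworkInterfaceId: activeEniId
                });
            }
        });
    });
    return updates;
}
```

With the two route tables above, the external 0.0.0.0/0 route (next hop igw-xxxx) would be skipped and only the internal one would be updated.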

CFE attempts to connect to AWS S3 buckets in different regions

When you initiate a failover, CFE attempts to use S3 buckets in a different region. This leads to performance issues in isolated environments, where users must rely on VPC Endpoint services. Because these endpoints are region specific, updates take nearly a minute.

The correct behavior is that the system isolates the API calls to the region that it is located in. For example a system in us-west-2 should not be calling us-west-1, ca-central, us-east-1 etc, but this can be seen in a packet capture.

CFE Version 1.3 validated as having this incorrect behavior.

document f5_cloud_failover_vips attribute for AWS

Hi,
Clouddoc states that for AWS:

For Across Network Topology
If provisioning Across Network Topology, you will need to:

Create two sets of tags for Elastic IP addresses:

a special key called f5_cloud_failover_vips that contains a comma-separated list of addresses mapping to a private IP address on each instance in the cluster that the Elastic IP is associated with. For example: 10.0.0.10,10.0.0.11

ASK:

Could you please update the API docs with this parameter?

Could you also provide a sample for 2 EIPs and 4 private IPs?

Thanks
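
For illustration only, here is one reading of the quoted documentation for 2 EIPs and 4 private IPs: each Elastic IP carries its own `f5_cloud_failover_vips` tag listing one private address per instance. The addresses, `sampleElasticIps` and `mapVipsToInstance` are assumptions made up for this sketch, not an official sample; the `PublicIp` and `Tags` fields mimic the EC2 DescribeAddresses response.

```javascript
// Hypothetical data: two Elastic IPs, each tagged with one private address
// per BIG-IP instance (four private IPs in total).
const sampleElasticIps = [
    { PublicIp: '203.0.113.10', Tags: [{ Key: 'f5_cloud_failover_vips', Value: '10.0.0.10,10.0.0.11' }] },
    { PublicIp: '203.0.113.11', Tags: [{ Key: 'f5_cloud_failover_vips', Value: '10.0.0.20,10.0.0.21' }] }
];

// For each Elastic IP, pick the tagged private address that lives on the
// instance becoming active (localPrivateIps), or null if none matches.
function mapVipsToInstance(eips, localPrivateIps) {
    return eips.map((eip) => {
        const tag = (eip.Tags || []).find((t) => t.Key === 'f5_cloud_failover_vips');
        const candidates = tag ? tag.Value.split(',').map((ip) => ip.trim()) : [];
        const target = candidates.find((ip) => localPrivateIps.includes(ip));
        return { PublicIp: eip.PublicIp, PrivateIpAddress: target || null };
    });
}
```

In this reading, an instance owning 10.0.0.11 and 10.0.0.21 would receive both Elastic IPs when it becomes active.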

Implications of not being able to tag individual routes in AWS

The "Tag the User-Defined routes in AWS" section in
https://github.com/F5Networks/f5-cloud-failover-extension/blob/master/docs/userguide/aws.rst
implies that a route (or routes) in AWS can be tagged individually:
"... you need to tag the route(s) in a route table with a key-value pair that will correspond to the key-value pair in the failoverRoutes.scopingTags section of the CFE declaration".

It turns out, in AWS only a route table object can be assigned tags and individual routes cannot.

We also need to take into account that in AWS "...each tag key must be unique, and each tag key can have only one value" (see https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_multi-value-conditions.html).
As a result, to support implementation of multiple F5 HA clusters within the same VPC, the value assigned to the route table for the key "f5_cloud_failover_label" should not be f5-cluster-specific. In other words, this key/value pair should be considered a switch to enable/disable the CFE-based failover of routes for any HA clusters of F5s within the VPC (e.g. the recommended key:value should be "f5_cloud_failover_label:enabled" rather than, say "f5_cloud_failover_label:bigip_cluster1").

In my opinion this is an important consideration in the context of tagging-strategy part of CFE-based failover implementations and should be documented.
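
To make the consequence concrete, here is a rough sketch of selection under this tagging strategy: the shared tag only picks route tables, and each cluster then limits itself to the ranges in its own declaration. `routesForCluster` and its arguments are hypothetical; the `Tags` and `Routes` fields mimic the EC2 DescribeRouteTables response.

```javascript
// Hypothetical sketch: filter route tables by a shared (not cluster-specific)
// tag, then keep only the destination ranges this cluster has declared.
function routesForCluster(routeTables, scopingTag, clusterRanges) {
    return routeTables
        .filter((table) => (table.Tags || []).some(
            (tag) => tag.Key === scopingTag.key && tag.Value === scopingTag.value
        ))
        .map((table) => ({
            RouteTableId: table.RouteTableId,
            ranges: table.Routes
                .map((route) => route.DestinationCidrBlock)
                .filter((cidr) => clusterRanges.includes(cidr))
        }))
        .filter((entry) => entry.ranges.length > 0);
}
```

Two clusters sharing the same tagged route table would then pass different `clusterRanges` and never touch each other's routes, provided their declared ranges do not overlap.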

Improved visibility/telemetry data of F5 CFE operations

Enhancement request:

Request to improve visibility/telemetry data for F5 CFE operations, whether in a location in AWS or in /var/log/restnoded/restnoded.log on the F5
(even at the “silly” log level, only the following is displayed when failover works).

Tue, 26 May 2020 23:25:08 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Tue, 26 May 2020 23:25:09 GMT - info: [f5-cloud-failover] Performing Failover - update
Tue, 26 May 2020 23:25:09 GMT - info: [f5-cloud-failover] Association of Elastic IP addresses not required
Tue, 26 May 2020 23:25:10 GMT - info: [f5-cloud-failover] Addresses reassociated successfully
Tue, 26 May 2020 23:25:10 GMT - info: [f5-cloud-failover] Failover Complete
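
Until richer telemetry exists, the lines above can at least be turned into structured records for an external tool. A minimal sketch, assuming the line format stays exactly as shown in this log excerpt; `parseCfeLogLine` is a hypothetical helper, not part of CFE.

```javascript
// Matches lines such as:
// "Tue, 26 May 2020 23:25:10 GMT - info: [f5-cloud-failover] Failover Complete"
const CFE_LOG_LINE = /^(.+ GMT) - (\w+): \[f5-cloud-failover\] (.*)$/;

// Returns { timestamp, level, message } for a CFE log line, or null for
// lines that do not belong to CFE.
function parseCfeLogLine(line) {
    const match = CFE_LOG_LINE.exec(line);
    if (!match) {
        return null;
    }
    return {
        timestamp: new Date(match[1]).toISOString(),
        level: match[2],
        message: match[3]
    };
}
```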
