In iscsi: open_iscsi_discovery is always executed.

iscsi script: suggestions for state check about resource-agents HOT 8 CLOSED

clusterlabs commented on July 19, 2024

iscsi script: suggestions for state check

from resource-agents.

Comments (8)

dmuhamedagic commented on July 19, 2024

Sounds plausible. Under which circumstances did you observe this? How do you test the iscsi session recovery?

As for the patch, I'd rather stuff the big if/fi code into a function, named say discover_portal. Further, the $portal will need to be removed from the status/monitor functions as it's not going to be available.

liandros commented on July 19, 2024

2012/6/26 Dejan Muhamedagic
[email protected]:

Sounds plausible. Under which circumstances did you observe this? How do you test the iscsi session recovery?

Treating the resource as failed because the "discovery" operation
could not be performed would be a mistake, because that conclusion
is not determined by the iSCSI layer.
You can check the status of the iSCSI session using the iscsiadm
command (as shown by the patch).

Under what circumstances would I see this?
Consider a virtual machine that depends on an iSCSI resource. The
iSCSI resource fails temporarily (overload, or a switch to other
high-availability hardware). (Note that in these circumstances the
"discovery" operation may fail.) The iSCSI layer pauses I/O
operations temporarily, but applications in the virtual machine do
not see the fault, apart from an extended timeout. If the iSCSI
resource recovers within the next few seconds (the limit is
configurable by the administrator in /etc/iscsi/iscsid.conf), then
nothing happens to the applications in the virtual machine.
Conversely, if the session is considered dead because the
"discovery" operation failed, the machine may be configured to power
off (which fails for lack of disk) or to hot migrate (losing the
iSCSI commands in the queue). Both result in data corruption.
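
To make the recovery window above concrete: in open-iscsi, the setting that governs how long the initiator waits for a session to come back before failing I/O is, as far as I know, `node.session.timeo.replacement_timeout` (in seconds). That this is the setting meant here is an assumption, and the helper name `get_replacement_timeout` is hypothetical. A minimal sketch that reads the configured value:

```sh
#!/bin/sh
# Print the last uncommented value of node.session.timeo.replacement_timeout
# found in an iscsid.conf-style file. The path defaults to the usual location;
# both the helper name and the choice of setting are assumptions.
get_replacement_timeout() {
    conf=${1:-/etc/iscsi/iscsid.conf}
    sed -n 's/^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1
}
```

Calling `get_replacement_timeout` with no argument reads /etc/iscsi/iscsid.conf; commented-out lines are ignored because the pattern only matches lines that start with the key.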

As for the patch, I'd rather stuff the big if/fi code into a function, named say discover_portal. Further, the $portal will need to be removed from the status/monitor functions as it's not going to be available.

Yes, true, but looking closely I see the following in the big if/fi code:

The block contains code related to the "status" and "monitor"
operations that is no longer needed.

When the operation is "stop", it returns $OCF_SUCCESS regardless
of the value returned by $discovery.

Only when the operation is "start" is it necessary to consider the
value returned by $discovery.
What do you think about my suggestion in this new patch?

I do not understand why $portal should be removed from the
status/monitor functions. It could be done using the variable,
as shown in this new patch.
Sorry about the patch format; the GitHub web interface is not useful for this.

--- iscsi.orig  2012-06-27 12:21:09.330289384 -0300
+++ iscsi       2012-06-27 12:21:57.714286736 -0300
@@ -257,7 +257,21 @@
        $iscsiadm -m node -p $1 -T $2 -u
 }
 open_iscsi_status() {
-       $iscsiadm -m session 2>/dev/null | grep -qs "$2$"
+       # According to RFC 3720 and open-iscsi, the session transitions are:
+       # FREE -> ACTIVE -> LOGGED_IN -> FAILED -> FREE
+       # There are various configurable timeouts between the state transitions
+       # "LOGGED_IN <-> FAILED" and "FAILED <-> FREE"
+       # We consider the disk lost if it reaches the state "FREE"
+       local session_id=`$iscsiadm -m session 2>/dev/null| \
+               grep -E -s "\b$1,[0-9]+ +$2$"|sed -rn -e 's/.*\[([0-9]+)\].*/\1/p'`
+       [ -z "$session_id" ] && return 1
+       local session_state=`$iscsiadm -m session -r $session_id -P 1| \
+               sed -rn -e 's/.*iSCSI +Session +State *: *([A-Z_]+)/\1/ip'`
+       case $session_state in
+         ACTIVE|LOGGED_IN) return 0;; #All ok
+         FAILED) return 0;; #still there's a chance to recover
+         FREE|*) return 1;;
+       esac
 }

 #
@@ -376,40 +390,30 @@
        exit $OCF_ERR_PERM
 fi

-discovery_type=${OCF_RESKEY_discovery_type}
-udev=${OCF_RESKEY_udev}
-$discovery  # discover and setup the real portal string (address)
-case $? in
-0) ;;
-1) [ "$1" = stop ] && exit $OCF_SUCCESS
-   [ "$1" = monitor ] && exit $OCF_NOT_RUNNING
-   [ "$1" = status ] && exit $LSB_STATUS_STOPPED
-   exit $OCF_ERR_GENERIC
-;;
-2) [ "$1" = stop ] && {
-       iscsi_monitor || exit $OCF_SUCCESS
-   }
-   ocf_is_probe && {
-       iscsi_monitor; exit
-   }
-   exit $OCF_ERR_GENERIC
-;;
-3) ocf_is_probe && exit $OCF_NOT_RUNNING
-   if ! is_iscsid_running; then
-               [ $setup_rc -eq 1 ] &&
-                       ocf_log warning "iscsid.startup probably not correctly set in /etc/iscsi/iscsid.conf"
-               [ "$1" = stop ] && exit $OCF_SUCCESS
-               exit $OCF_ERR_INSTALLED
-   fi
-   exit $OCF_ERR_GENERIC
-;;
-esac
-
 # which method was invoked?
 case "$1" in
-       start)  iscsi_start
+       start)  
+               # discover and setup the real portal string (address)
+               discovery_type=${OCF_RESKEY_discovery_type}
+               udev=${OCF_RESKEY_udev}
+               $discovery || case $? in
+                  1)  exit $OCF_ERR_GENERIC;; #target not found
+                  2)  exit $OCF_ERR_GENERIC;; #target found but can't connect to it unambiguously
+                  3) # iscsiadm returned error  
+                     if ! is_iscsid_running; then
+                       [ $setup_rc -eq 1 ] &&
+                         ocf_log warning "iscsid.startup probably not correctly set in /etc/iscsi/iscsid.conf"
+                       exit $OCF_ERR_INSTALLED
+                     fi
+                     exit $OCF_ERR_GENERIC
+                     ;;
+               esac
+               iscsi_start
        ;;
-       stop)   iscsi_stop
+       stop)   
+               # discover and setup the real portal string (address)
+               $discovery || exit $OCF_SUCCESS 
+               iscsi_stop
        ;;
        status) if iscsi_status
                then

dmuhamedagic commented on July 19, 2024

On Wed, Jun 27, 2012 at 08:51:53AM -0700, liandros wrote:

[...]

case $session_state in
ACTIVE|LOGGED_IN) return 0;; #All ok
FAILED) return 0;; #still there's a chance to recover
FREE|*) return 1;;
esac

Even if the session gets into the FREE state it may recover. At
least that's what I observed here on SLE11SP2.

Returning success in case the session is in the FAILED state
won't do, because the RA cannot lie. What it can do is wait a
bit and see if the state changes.

Now, this may be considered an improvement, but it is certainly
a considerable change in behaviour. I'd suggest to introduce a
try_recovery parameter, which would modify the monitor action
appropriately. The change would be to loop indefinitely in case
the session state is different from "ACTIVE|LOGGED_IN". The loop
would be stopped on monitor action timeout.
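
The loop described above might look roughly like the following. This is only a sketch under stated assumptions, not the code that ended up in the agent: `wait_for_session` is a hypothetical name, and the command that reports the session state is supplied by the caller. It would only be entered when `try_recovery` is set.

```sh
#!/bin/sh
# wait_for_session STATE_CMD [INTERVAL]
# Polls STATE_CMD (a caller-supplied command that prints the session state)
# until it reports ACTIVE or LOGGED_IN. There is deliberately no failure
# exit: the cluster's monitor action timeout is expected to terminate the
# agent if the session never recovers.
wait_for_session() {
    state_cmd=$1
    interval=${2:-1}
    while :; do
        case "$($state_cmd)" in
            ACTIVE|LOGGED_IN) return 0 ;;
        esac
        sleep "$interval"
    done
}
```

Without `try_recovery`, the monitor action would check the state once and report failure immediately, which keeps the existing behaviour as the default.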

As for the discovery, we can move it out of the monitor path.

This should be two different patches.

dmuhamedagic commented on July 19, 2024

I did some patches in the meantime, the road to recovery implementation should be clear now. I'll write the patch for that too, but would like to give you credit. Can you please send me whatever you want to be put into the commit message (name, email address, etc). Thanks!

liandros commented on July 19, 2024

thanks, very polite.
My identity: Leandro Santinelli [email protected]
I do not know which part of my suggestion was accepted; please suggest
the commit message.

dmuhamedagic commented on July 19, 2024

All of it accepted, just with some modifications. Discovery is
now only in the start operation. The status is expanded to check
the connection status. The recovery will be tried if the
try_recovery parameter is set. If you'll take a look and give it
a try, I'd appreciate. Thanks!
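
For readers who want to try the expanded status check by hand, here is a rough sketch of what checking the connection state could look like. It is not the code from the agent: the function names `parse_conn_state` and `conn_state_ok` are hypothetical, and the assumed input is the text printed by `iscsiadm -m session -r SID -P 1`, which contains an "iSCSI Connection State" line.

```sh
#!/bin/sh
# Read `iscsiadm -m session -r SID -P 1`-style text on stdin and print the
# value of the first "iSCSI Connection State" line (may be empty).
parse_conn_state() {
    sed -n 's/^[[:space:]]*iSCSI Connection State:[[:space:]]*//p' | head -n 1
}

# Return 0 if the given connection state should count as connected.
# An empty or "Unknown" state is treated as connected, on the assumption
# that some drivers do not report a connection state at all.
conn_state_ok() {
    case "$1" in
        "LOGGED IN"|"Unknown"|"") return 0 ;;
        *) return 1 ;;
    esac
}
```

A manual check would then be something like `conn_state_ok "$(iscsiadm -m session -r "$sid" -P 1 | parse_conn_state)"`, where `$sid` is the session id.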

liandros commented on July 19, 2024

I have checked the new code, and it works perfectly.
For cosmetic reasons, I think it would be nice to log a message when the session is restored:

```
@@ -299,8 +299,14 @@
        # some drivers don't return connection state, in that case
        # we'll assume that we're still connected
        case "$conn_state" in
-               "LOGGED IN") return 0;;
-               "Unknown"|"") return 0;; # this is also probably OK
+               "LOGGED IN")
+                       [ -n "$msg_logged" ] &&
+                               ocf_log info "connection state $conn_state. Session restored."
+                       return 0;;
+               "Unknown"|"") # this is also probably OK
+                       [ -n "$msg_logged" ] &&
+                               ocf_log info "connection state $conn_state. Session restored."
+                       return 0;;
                *) # failed
                        if [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov; then
                                if [ -z "$msg_logged" ]; then
```

dmuhamedagic commented on July 19, 2024

Patch applied. Many thanks for the contribution! Closing.
