
Comments (8)

dmuhamedagic commented on July 19, 2024

Sounds plausible. Under which circumstances did you observe this? How do you test the iscsi session recovery?

As for the patch, I'd rather stuff the big if/fi code into a function, named say discover_portal. Further, the $portal will need to be removed from the status/monitor functions as it's not going to be available.

from resource-agents.

liandros commented on July 19, 2024

2012/6/26 Dejan Muhamedagic
[email protected]:

Sounds plausible. Under which circumstances did you observe this? How do you test the iscsi session recovery?

Declaring the resource failed because the "discovery" operation could
not be performed would be a mistake, because that is not a conclusion
reached by the iSCSI layer.
You can check the status of the iSCSI session with the iscsiadm
command (as shown by the patch).

Under what circumstances would you see this?
Consider a virtual machine that depends on an iSCSI resource. The
iSCSI resource fails temporarily (overload, or a switch to other
high-availability hardware). (Note that in these circumstances the
"discovery" operation may fail.) The iSCSI layer pauses I/O
operations temporarily, but applications in the virtual machine will
not receive the fault, except after an extended timeout. If the iSCSI
resource recovers within the next few seconds (a period the
administrator can configure in /etc/iscsi/iscsid.conf), nothing
happens to the applications in the virtual machine.
Conversely, if the session is considered dead because the "discovery"
operation failed, the machine may be configured to shut down (which
fails for lack of a disk) or to live-migrate (losing the iSCSI
commands in the queue). Both result in data corruption.

As for the patch, I'd rather stuff the big if/fi code into a function, named say discover_portal. Further, the $portal will need to be removed from the status/monitor functions as it's not going to be available.

Yes, true, but looking closely at the big if/fi code I see the following:

  • The block contains code for the "status" and "monitor" operations
    that is no longer needed.
  • When the operation is "stop", it returns $OCF_SUCCESS regardless
    of the value returned by $discovery.
  • Only when the operation is "start" is it necessary to consider the
    value returned by $discovery.

What do you think about my suggestion in this new patch?

I do not understand why $portal should be removed from the
status/monitor functions. It could be kept by using the variable as
shown in this new patch.
Sorry about the patch format; the GitHub web interface is not helpful here.

--- iscsi.orig  2012-06-27 12:21:09.330289384 -0300
+++ iscsi       2012-06-27 12:21:57.714286736 -0300
@@ -257,7 +257,21 @@
        $iscsiadm -m node -p $1 -T $2 -u
 }
 open_iscsi_status() {
-       $iscsiadm -m session 2>/dev/null | grep -qs "$2$"
+       # According to RFC 3720 and open-iscsi, the session transitions are:
+       # FREE -> ACTIVE -> LOGGED_IN -> FAILED -> FREE
+       # There are various configurable timeouts between the state transitions
+       # "LOGGED_IN <-> FAILED" and "FAILED <-> FREE"
+       # We consider the disk lost if it reaches the state "FREE"
+       local session_id=`$iscsiadm -m session 2>/dev/null| \
+               grep -E -s "\b$1,[0-9]+ +$2$"|sed -rn -e 's/.*\[([0-9]+)\].*/\1/p'`
+       [ -z "$session_id" ] && return 1
+       local session_state=`$iscsiadm -m session -r $session_id -P 1| \
+               sed -rn -e 's/.*iSCSI +Session +State *: *([A-Z_]+)/\1/ip'`
+       case $session_state in
+         ACTIVE|LOGGED_IN) return 0;; #All ok
+         FAILED) return 0;; #still there's a chance to recover
+         FREE|*) return 1;;
+       esac
 }

 #
@@ -376,40 +390,30 @@
        exit $OCF_ERR_PERM
 fi

-discovery_type=${OCF_RESKEY_discovery_type}
-udev=${OCF_RESKEY_udev}
-$discovery  # discover and setup the real portal string (address)
-case $? in
-0) ;;
-1) [ "$1" = stop ] && exit $OCF_SUCCESS
-   [ "$1" = monitor ] && exit $OCF_NOT_RUNNING
-   [ "$1" = status ] && exit $LSB_STATUS_STOPPED
-   exit $OCF_ERR_GENERIC
-;;
-2) [ "$1" = stop ] && {
-       iscsi_monitor || exit $OCF_SUCCESS
-   }
-   ocf_is_probe && {
-       iscsi_monitor; exit
-   }
-   exit $OCF_ERR_GENERIC
-;;
-3) ocf_is_probe && exit $OCF_NOT_RUNNING
-   if ! is_iscsid_running; then
-               [ $setup_rc -eq 1 ] &&
-                       ocf_log warning "iscsid.startup probably not correctly set in /etc/iscsi/iscsid.conf"
-               [ "$1" = stop ] && exit $OCF_SUCCESS
-               exit $OCF_ERR_INSTALLED
-   fi
-   exit $OCF_ERR_GENERIC
-;;
-esac
-
 # which method was invoked?
 case "$1" in
-       start)  iscsi_start
+       start)  
+               # discover and setup the real portal string (address)
+               discovery_type=${OCF_RESKEY_discovery_type}
+               udev=${OCF_RESKEY_udev}
+               $discovery || case $? in
+                  1)  exit $OCF_ERR_GENERIC;; #target not found
+                  2)  exit $OCF_ERR_GENERIC;; #target found but can't connect to it unambiguously
+                  3) # iscsiadm returned error  
+                     if ! is_iscsid_running; then
+                       [ $setup_rc -eq 1 ] &&
+                         ocf_log warning "iscsid.startup probably not correctly set in /etc/iscsi/iscsid.conf"
+                       exit $OCF_ERR_INSTALLED
+                     fi
+                     exit $OCF_ERR_GENERIC
+                     ;;
+               esac
+               iscsi_start
        ;;
-       stop)   iscsi_stop
+       stop)   
+               # discover and setup the real portal string (address)
+               $discovery || exit $OCF_SUCCESS 
+               iscsi_stop
        ;;
        status) if iscsi_status
                then
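
The state check added to open_iscsi_status in the patch above can be tried without a live target. A minimal sketch, assuming the `-P 1` output contains a line of the form `iSCSI Session State: <STATE>` (which is what the patch's sed expression expects). `classify_session_state` is a hypothetical name, not part of the resource agent, and the sample text in the usage is hand-written rather than captured from a real session. Unlike the patch, it gives FAILED its own exit code so the caller can decide how to treat it.

```shell
#!/bin/sh
# Hypothetical helper, not part of the iscsi resource agent.
# Reads "iscsiadm -m session -r <sid> -P 1"-style text on stdin and maps the
# session state to an exit code:
#   0 = ACTIVE or LOGGED_IN, 2 = FAILED (may still recover), 1 = anything else
classify_session_state() {
    # Extract the first "iSCSI Session State: XXX" value (assumed line format).
    state=$(sed -n 's/.*iSCSI Session State: *\([A-Z_][A-Z_]*\).*/\1/p' | head -n 1)
    case "$state" in
        ACTIVE|LOGGED_IN) return 0 ;;
        FAILED)           return 2 ;;
        *)                return 1 ;;   # FREE, unknown, or no session at all
    esac
}

# Usage with hand-written sample text (not real iscsiadm output):
if printf 'iSCSI Session State: LOGGED_IN\n' | classify_session_state; then
    echo "session usable"
fi
```

Keeping FAILED distinct from both success and "not running" leaves room for the caller to wait and re-check instead of reporting a state the agent cannot vouch for.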


dmuhamedagic commented on July 19, 2024

On Wed, Jun 27, 2012 at 08:51:53AM -0700, liandros wrote:

2012/6/26 Dejan Muhamedagic
[email protected]:

Sounds plausible. Under which circumstances did you observe this? How do you test the iscsi session recovery?

Declaring the resource failed because the "discovery" operation could
not be performed would be a mistake, because that is not a conclusion
reached by the iSCSI layer.
You can check the status of the iSCSI session with the iscsiadm
command (as shown by the patch).

[...]

case $session_state in
ACTIVE|LOGGED_IN) return 0;; #All ok
FAILED) return 0;; #still there's a chance to recover
FREE|*) return 1;;
esac

Even if the session gets into the FREE state it may recover. At
least that's what I observed here on SLE11SP2.

Returning success in case the session is in the FAILED state
won't do, because the RA cannot lie. What it can do is wait a
bit and see if the state changes.

Now, this may be considered an improvement, but it is certainly
a considerable change in behaviour. I'd suggest introducing a
try_recovery parameter, which would modify the monitor action
appropriately. The change would be to loop indefinitely in case
the session state is different from "ACTIVE|LOGGED_IN". The loop
would be stopped on monitor action timeout.

As for the discovery, we can move it out of the monitor path.

This should be two different patches.
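
The waiting behaviour proposed above could look roughly like this. A minimal sketch, not the patch that was eventually applied: `wait_for_session`, its arguments, and the `demo_state` stand-in are hypothetical, and the `max_tries` bound is added only so the sketch terminates by itself. In the proposal the loop is unbounded and is cut off by the monitor action timeout.

```shell
#!/bin/sh
# Hypothetical sketch of "wait a bit and see if the state changes".
#   $1 = command that prints the current session state (stand-in for an
#        iscsiadm query)
#   $2 = maximum number of polls (assumption: the real proposal loops until
#        the monitor action timeout stops it)
#   $3 = seconds to sleep between polls
wait_for_session() {
    get_state=$1
    max_tries=$2
    interval=$3
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        case "$($get_state)" in
            ACTIVE|LOGGED_IN) return 0 ;;   # session is usable again
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1   # still not usable after max_tries polls
}

# Usage: a fake state source that is always healthy.
demo_state() { echo LOGGED_IN; }
wait_for_session demo_state 3 0 && echo "session recovered"
```

The function only returns success once the state really is ACTIVE or LOGGED_IN, so the agent never reports a healthy resource while the session is still FAILED.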


dmuhamedagic commented on July 19, 2024

I did some patches in the meantime, the road to recovery implementation should be clear now. I'll write the patch for that too, but would like to give you credit. Can you please send me whatever you want to be put into the commit message (name, email address, etc). Thanks!


liandros commented on July 19, 2024

2012/7/17 Dejan Muhamedagic
[email protected]:

I did some patches in the meantime, the road to recovery implementation should be clear now. I'll write the patch for that too, but would like to give you credit. Can you please send me whatever you want to be put into the commit message (name, email address, etc). Thanks!

Thanks, very polite.
My identity: Leandro Santinelli [email protected]
I do not know which part of my suggestion was accepted, so please
suggest the commit message.


dmuhamedagic commented on July 19, 2024

On Wed, Jul 18, 2012 at 05:35:41AM -0700, liandros wrote:

2012/7/17 Dejan Muhamedagic
[email protected]:

I did some patches in the meantime, the road to recovery implementation should be clear now. I'll write the patch for that too, but would like to give you credit. Can you please send me whatever you want to be put into the commit message (name, email address, etc). Thanks!

Thanks, very polite.
My identity: Leandro Santinelli [email protected]
I do not know which part of my suggestion was accepted, so please
suggest the commit message.

All of it accepted, just with some modifications. Discovery is
now only in the start operation. The status is expanded to check
the connection status. The recovery will be tried if the
try_recovery parameter is set. If you'll take a look and give it
a try, I'd appreciate it. Thanks!


liandros commented on July 19, 2024

2012/7/18 Dejan Muhamedagic
[email protected]:

All of it accepted, just with some modifications. Discovery is
now only in the start operation. The status is expanded to check
the connection status. The recovery will be tried if the
try_recovery parameter is set. If you'll take a look and give it
a try, I'd appreciate it. Thanks!

I have checked the new code, and it works perfectly.
For cosmetic reasons, I think a "session restored" message in the logs would be nice:

@@ -299,8 +299,14 @@
 # some drivers don't return connection state, in that case
 # we'll assume that we're still connected
 case "$conn_state" in
-       "LOGGED IN") return 0;;
-       "Unknown"|"") return 0;; # this is also probably OK
+       "LOGGED IN")
+           [ -n "$msg_logged" ] &&
+               ocf_log info "connection state $conn_state. Session restored."
+           return 0;;
+       "Unknown"|"") # this is also probably OK
+           [ -n "$msg_logged" ] &&
+               ocf_log info "connection state $conn_state. Session restored."
+           return 0;;
        *) # failed
            if [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov; then
                if [ -z "$msg_logged" ]; then


dmuhamedagic commented on July 19, 2024

Patch applied. Many thanks for the contribution! Closing.

