Comments (13)

dmuhamedagic commented on June 16, 2024

Hmm, I recall testing this and deliberately choosing the connection state instead of the session state. Do you have a URL where the difference between the two is explained?

from resource-agents.

liandros commented on June 16, 2024

I apologize for my lack of understanding of the subject.
Trying to understand it, I read the following:

The iSCSI RFC 3720 describes session states and connection states as follows:
Connection states for the initiator (section 7.1.3) are:
FREE, XPT_WAIT, IN_LOGIN, LOGGED_IN, IN_LOGOUT, LOGOUT_REQUESTED, CLEANUP_WAIT
Session states for the initiator (section 7.3.1) are:
FREE, LOGGED_IN, FAILED

Playing with open-iscsi and simulating a network connection error, I can see
the following sequential state changes:

1) With the session established:
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN

2) Simulating a network error, after the first timers time out:
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED

3) After the replacement_timeout timer expires:
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FREE

4) After re-establishing the network connection:
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN

Only the session states seem to follow RFC 3720.
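As a minimal sketch of how the two values can be read side by side, the snippet below extracts both states from text shaped like the lines quoted in step 2 above. The helper name `get_state` and the `sample` variable are assumptions made for illustration; they are not part of the iscsi resource agent, and the sample is a literal string rather than live `iscsiadm` output.

```shell
# get_state is a hypothetical helper: it prints the value that follows
# "<label>: " for every line on stdin that contains the given label.
get_state() {
    sed -n "/$1/s/.*: //p"
}

# Sample text copied from step 2 of the observations above (network error,
# first timers expired). Not live iscsiadm output.
sample='iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED'

conn_state=$(printf '%s\n' "$sample" | get_state 'iSCSI Connection State')
sess_state=$(printf '%s\n' "$sample" | get_state 'iSCSI Session State')

echo "connection=$conn_state session=$sess_state"
```

The same `sed` expression shape is what the function further down in this thread uses to pick out the session state.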

As for the iscsi resource monitor, there is no difference between checking
the connection state and checking the session state: an iscsi resource is
considered healthy if the connection state is "LOGGED IN" or the session state
is "LOGGED_IN", and bad if the value differs.

In this sense, my initial comment makes no sense. Sorry for taking up your time.
But, as I mentioned, maybe we can make $OCF_RESKEY_try_recovery = true more useful:
the value of replacement_timeout, properly configured by each user (and, going by
the meaning the RFC gives it, the last timer), can be used to end the loop caused
by $OCF_RESKEY_try_recovery.

In this case, because the connection state does not change when the
replacement_timeout timer expires, we can only use the session state, rather than
the connection state.
What do you think?

dmuhamedagic commented on June 16, 2024

On Fri, Nov 09, 2012 at 04:47:18AM -0800, Leandro Santinelli wrote:

> Apologize for my lack of understanding about the subject.

No need to apologize, it's always good to discuss matters which
are not clear.

> the resource is bad if the value differs.

Right.

> maybe as I mentioned, we can think of to give a better utility when using
> $OCF_RESKEY_try_recovery = true, and according as mentioned, you can use the value
> of replacement_timeout, properly configured for each user, ( and based on the meaning
> that gives the RFC: last timer) to end the loop caused when using $OCF_RESKEY_try_recovery.

So, you think that we should read the replacement_timeout from
the configuration and then terminate the loop once
replacement_timeout has expired? That may make the code rather complex.
I'm not sure if it's worth it.

> In this case, because the absence of a connection state change when the timer replacement_timeout
> expires, we can only use the session state, rather than the connection state value.
> What do you think?

I didn't experiment with replacement_timeout. If it expires, is
the state (whichever of the two) then set to something from which
one can recognize that recovery will not be attempted anymore?

dmuhamedagic commented on June 16, 2024

If the user wants to use replacement_timeout why would they want a longer timeout on the same iscsi disk monitor?

liandros commented on June 16, 2024

>> maybe as I mentioned, we can think of to give a better utility when using
>> $OCF_RESKEY_try_recovery = true, and according as mentioned, you can use
>> the value of replacement_timeout, properly configured for each user, ( and based
>> on the meaning that gives the RFC: last timer) to end the loop caused when using
>> $OCF_RESKEY_try_recovery.
>
> So, you think that we should read the replacement_timeout from
> the configuration and then terminate the loop once
> replacement_timeout expired? That may make code rather complex.
> I'm not sure if it's worth it.

No. We don't need to check the value of replacement_timeout. That is the job
of the iscsi daemon.
Our task is only to check for the session state change.

>> In this case, because the absence of a connection state change when the
>> timer replacement_timeout expires, we can only use the session state, rather
>> than the connection state value.
>> What do you think?
>
> I didn't experiment with replacement_timeout. If it expires, is
> the state (whichever of the two) then set to something from which
> one can recognize that recovery will not be attempted anymore?

Yes, that's the idea.
When replacement_timeout expires, the "iscsi" layer reports to the "scsi"
layer that the disk is in the failed state. At this point it makes no sense
to continue with the checks.

dmuhamedagic commented on June 16, 2024

How does iscsi know that the disk is definitely in the failed state? It could recover again later. And, as I already mentioned above, if it depends on this particular timeout, it doesn't make sense to have two timeouts for the same concept. BTW, if you can express this in code, we can take a look again, but so far I'm inclined not to do any further processing in this part. It is already quite complex and it is uncertain what the benefit would be.

liandros commented on June 16, 2024

iscsi can recover from the "FREE" state, but for practical purposes we can consider that, once the replacement_timeout timer has expired, too much time has passed for a recovery to be useful.
I don't want to insist on this; discard it if you think it is not useful. Please examine the following code:

open_iscsi_status() {
        # Returns 0 when the session is logged in (or the state is unknown),
        # 1 when there is no session, it failed without recovery, or it is FREE,
        # and 2 on iscsiadm errors or unexpected output.
        local target="$1"
        local session_id sess_state outp
        local prev_state msg_logged
        local recov recov_loop

        recov=${2:-$OCF_RESKEY_try_recovery}
        # wait for recovery only outside of stop and probe actions
        [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true "$recov" && recov_loop=true
        session_id=`open_iscsi_get_session_id "$target"`
        prev_state=""
        [ -z "$session_id" ] &&
                return 1
        while :; do
                outp=`$iscsiadm -m session -r "$session_id" -P 1` ||
                        return 2
                sess_state=`echo "$outp" | sed -n '/iSCSI Session State/s/.*: //p'`
                # some drivers don't return session state, in that case
                # we'll assume that we're still connected
                case "$sess_state" in
                        "LOGGED_IN")
                                [ -n "$msg_logged" ] &&
                                        ocf_log info "session state $sess_state. Session restored."
                                return 0;;
                        "FAILED") # the iscsi replacement_timeout timer is the last chance
                                [ -z "$recov_loop" ] && return 1 # do not wait for recovery
                                [ -z "$msg_logged" ] && msg_logged=1 &&
                                        ocf_log info "session state $sess_state. Waiting for session recovery.";;
                        "FREE") # iscsi replacement_timeout expired
                                [ -n "$msg_logged" ] &&
                                        ocf_log info "session state $sess_state. Session not connected."
                                return 1;;
                        "Unknown"|"") # this is also probably OK
                                [ -n "$msg_logged" ] &&
                                        ocf_log info "session state $sess_state. Session restored."
                                return 0;;
                        *) # any other, unexpected session state
                                [ -z "$recov_loop" ] &&
                                        ocf_log err "iscsiadm output: $outp" &&
                                        return 2
                        ;;
                esac
                sleep 1
        done
}

It is based on the ee1837b version, but I think it will not work with $OCF_RESKEY_try_recovery = true when $__OCF_ACTION = monitor.
Patch to my code:

    local recov recov_loop

    recov=${2:-$OCF_RESKEY_try_recovery}
-   [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov && recov_loop=true
+   ocf_is_probe && ocf_is_true $recov && recov_loop=true
    session_id=`open_iscsi_get_session_id "$target"`
    prev_state=""
    [ -z "$session_id" ] &&

Patch against the ee1837b version:

@@ -309,7 +309,7 @@
                    ocf_log info "connection state $conn_state. Session restored."
                return 0;;
            *) # failed
-               if [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov; then
+               if ocf_is_probe && ocf_is_true $recov; then
                    if [ "$conn_state" != "$prev_state" ]; then
                        ocf_log warning "connection state $conn_state, waiting for recovery..."
                        prev_state="$conn_state"

dmuhamedagic commented on June 16, 2024

Before I delve into that code, it looks like you never considered my comment about two different timeouts: #168 (comment)

liandros commented on June 16, 2024

2012/11/21 Dejan Muhamedagic:

> Before I delve into that code, it looks like you never considered my
> comment about two different timeouts: #168 (comment)

The two different timeouts are already part of the iscsi layer and are
represented by the session state changes LOGGED_IN -> FAILED and FAILED
-> FREE. These timeouts are set directly in open-iscsi.
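To make the point about the two timers concrete, here is a small sketch that replays the session state sequence observed earlier in this thread and labels each transition. The helper name `describe_transition` and the wording of its messages are assumptions made for illustration; the mapping of transitions to timers follows the observations reported above, not open-iscsi documentation.

```shell
# describe_transition is a hypothetical helper: given the previous and the
# current session state, it prints what the change suggests, following the
# sequence reported earlier in this thread.
describe_transition() {
    case "$1->$2" in
        "LOGGED_IN->FAILED")
            echo "first timers expired, recovery in progress" ;;
        "FAILED->FREE")
            echo "replacement_timeout expired" ;;
        "FAILED->LOGGED_IN"|"FREE->LOGGED_IN")
            echo "session restored" ;;
        *)
            echo "no timeout-related change" ;;
    esac
}

# Replay the observed sequence: established, network error,
# replacement_timeout expiry, network restored.
prev=""
for cur in LOGGED_IN FAILED FREE LOGGED_IN; do
    if [ -n "$prev" ]; then
        echo "$prev -> $cur: $(describe_transition "$prev" "$cur")"
    fi
    prev=$cur
done
```

Under this reading, a monitor that only looks at the session state sees both timers without reading any open-iscsi configuration value itself.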

dmuhamedagic commented on June 16, 2024

Yes, the timeouts are set elsewhere, but please consider the logic. Do we need two timeouts? How is the timeout in open-iscsi going to help the RA? Finally, we have only one exit code for the timeout.

dmuhamedagic commented on June 16, 2024

Leandro, I think that we can close this. What do you say?

liandros commented on June 16, 2024

Yes please... sorry

2013/7/3 Dejan Muhamedagic:

> Leandro, I think that we can close this. What do you say?


dmuhamedagic commented on June 16, 2024

No problem. Thanks!
