Comments (13)
Hmm, I can recall testing this and deliberately choosing the connection state instead of the session state. Do you have a URL where the difference between the two is explained?
from resource-agents.
I apologize for my lack of understanding of the subject.
Trying to understand it, I read the following:
The iSCSI RFC 3720 describes session states and connection states as follows:
Connection states for the initiator (section 7.1.3) are:
FREE, XPT_WAIT, IN_LOGIN, LOGGED_IN, IN_LOGOUT, LOGOUT_REQUESTED, CLEANUP_WAIT
Session states for the initiator (section 7.3.1) are:
FREE, LOGGED_IN, FAILED
Playing with open-iscsi and simulating a network connection error, I can see
the following sequential changes of state:
1) With the session established:
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
2) Simulating a network error, after the first timers expire:
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
3) After the replacement_timeout timer expires:
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FREE
4) After re-establishing the network connection:
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Only the session states seem to follow RFC 3720.
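For reference, both values can be pulled out of the `iscsiadm -m session -P 1` output with the same kind of sed filter the agent uses. This is only a minimal sketch: `get_state` is a hypothetical helper name, and the sample text is a hand-written fragment modelled on the two lines quoted in step 2, not a captured log.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "iSCSI ... State" line.
# $1 = label ("Connection" or "Session"); the text is read from stdin.
get_state() {
	sed -n "/iSCSI $1 State/s/.*: //p"
}

# Hand-written sample modelled on step 2 (an assumption, not real output).
sample='iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED'

conn_state=$(printf '%s\n' "$sample" | get_state Connection)
sess_state=$(printf '%s\n' "$sample" | get_state Session)
echo "connection=$conn_state session=$sess_state"
```

On a live system the sample would presumably be replaced by the real `iscsiadm` output for the session being monitored.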
As for the iscsi resource monitor, there is no difference between checking
the connection state and checking the session state: an iscsi resource is healthy
if the connection state is "LOGGED IN" or the session state is "LOGGED_IN", and
the resource is bad if the value differs.
In this sense, my initial comment makes no sense. Sorry for taking up your time. But
maybe, as I mentioned, we can make $OCF_RESKEY_try_recovery = true more useful:
the value of replacement_timeout, properly configured by each user (and based on the meaning
the RFC gives it: the last timer), can be used to end the loop caused by $OCF_RESKEY_try_recovery.
In that case, because the connection state does not change when the replacement_timeout timer
expires, we can only use the session state rather than the connection state.
What do you think?
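The idea can be sketched as a mapping from session state to what the monitor should do next. This is only an illustration under the observations above: `classify_session_state` and the words it prints are hypothetical, and the state strings are the ones seen in the open-iscsi output.

```shell
#!/bin/sh
# Hypothetical helper: decide what a monitor loop should do next.
# $1 = session state, $2 = "true" if try_recovery is enabled.
classify_session_state() {
	case "$1" in
	LOGGED_IN) echo ok ;;
	FAILED)
		# replacement_timeout is still running, recovery may succeed
		if [ "$2" = true ]; then echo wait; else echo fail; fi ;;
	FREE) echo fail ;;   # replacement_timeout expired, stop waiting
	*) echo unknown ;;   # anything else is left to the caller
	esac
}

for s in LOGGED_IN FAILED FREE; do
	echo "$s -> $(classify_session_state "$s" true)"
done
```

The point of the sketch is that the FAILED to FREE transition gives the loop an exit condition that the connection state never provides.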
from resource-agents.
On Fri, Nov 09, 2012 at 04:47:18AM -0800, Leandro Santinelli wrote:
> I apologize for my lack of understanding of the subject.
No need to apologize, it's always good to discuss matters which
are not clear.
> the resource is bad if the value differs.
Right.
> maybe, as I mentioned, we can make $OCF_RESKEY_try_recovery = true more useful:
> the value of replacement_timeout, properly configured by each user (and based on the meaning
> the RFC gives it: the last timer), can be used to end the loop caused by $OCF_RESKEY_try_recovery.
So, you think that we should read the replacement_timeout from
the configuration and then terminate the loop once
replacement_timeout has expired? That may make the code rather complex.
I'm not sure if it's worth it.
> In that case, because the connection state does not change when the replacement_timeout timer
> expires, we can only use the session state rather than the connection state.
> What do you think?
I didn't experiment with replacement_timeout. If it expires, is
the state (whichever of the two) then set to something from which
one can recognize that recovery will not be attempted anymore?
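If the value were to be read at all, one possible approach is to parse it from the node record that `iscsiadm -m node` prints. This is a rough sketch only, assuming the record contains a `node.session.timeo.replacement_timeout = N` line; `get_replacement_timeout` is a hypothetical helper, the sample record is hand-written, and `$target` and `$portal` in the final comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: extract replacement_timeout (in seconds) from a
# node record read on stdin. Prints nothing if the line is absent.
get_replacement_timeout() {
	sed -n 's/^node\.session\.timeo\.replacement_timeout *= *\([0-9][0-9]*\).*/\1/p'
}

# Hand-written sample record (assumed format, not captured output).
record='node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5'

tmo=$(printf '%s\n' "$record" | get_replacement_timeout)
echo "replacement_timeout=${tmo:-unknown}"

# With a real target the record might come from something like:
#   iscsiadm -m node -T "$target" -p "$portal" | get_replacement_timeout
```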
from resource-agents.
If the user wants to use replacement_timeout why would they want a longer timeout on the same iscsi disk monitor?
from resource-agents.
>> maybe, as I mentioned, we can make $OCF_RESKEY_try_recovery = true more useful:
>> the value of replacement_timeout, properly configured by each user (and based on the meaning
>> the RFC gives it: the last timer), can be used to end the loop caused by $OCF_RESKEY_try_recovery.
> So, you think that we should read the replacement_timeout from
> the configuration and then terminate the loop once
> replacement_timeout has expired? That may make the code rather complex.
> I'm not sure if it's worth it.
No. We don't need to check the value of replacement_timeout. That is the job
of the iscsi daemon.
Our task is only to check the session state change.
>> In that case, because the connection state does not change when the replacement_timeout timer
>> expires, we can only use the session state rather than the connection state.
>> What do you think?
> I didn't experiment with replacement_timeout. If it expires, is
> the state (whichever of the two) then set to something from which
> one can recognize that recovery will not be attempted anymore?
Yes, that's the idea.
When replacement_timeout expires, the "iscsi" layer
reports to the "scsi" layer that the disk is in the failed state. At this
point it makes no sense to continue with the checks.
from resource-agents.
How does iscsi know that the disk is definitely in the failed state? It could recover again later. And, as I already mentioned above, if it depends on this particular timeout, it doesn't make sense to have two timeouts for the same concept. BTW, if you can express this in code, we can take a look again, but so far I'm inclined not to do any further processing in this part. It is already quite complex and it is uncertain what the benefit would be.
from resource-agents.
iscsi can recover from the "FREE" state, but for practical purposes we can consider that too much time has passed for that to be useful, because the replacement_timeout timer has expired.
I don't want to insist on this, so please discard it if you think it is not useful. Please examine the following code:
open_iscsi_status() {
	local target="$1"
	local session_id sess_state outp
	local prev_state msg_logged
	local recov recov_loop
	recov=${2:-$OCF_RESKEY_try_recovery}
	# wait for recovery only if requested, and never on stop or probe
	[ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov && recov_loop=true
	session_id=`open_iscsi_get_session_id "$target"`
	prev_state=""
	[ -z "$session_id" ] &&
		return 1
	while :; do
		outp=`$iscsiadm -m session -r $session_id -P 1` ||
			return 2
		sess_state=`echo "$outp" | sed -n '/iSCSI Session State/s/.*: //p'`
		# some drivers don't return session state, in that case
		# we'll assume that we're still connected
		case "$sess_state" in
			"LOGGED_IN")
				[ -n "$msg_logged" ] &&
					ocf_log info "session state $sess_state. Session restored."
				return 0;;
			"FAILED") # replacement_timeout still running: the last chance to recover
				[ -z "$recov_loop" ] && return 1 # not waiting for recovery
				[ -z "$msg_logged" ] && msg_logged=1 &&
					ocf_log info "session state $sess_state. Waiting for session recovery.";;
			"FREE") # iscsi replacement_timeout expired
				[ -n "$msg_logged" ] &&
					ocf_log info "session state $sess_state. Session not connected."
				return 1;;
			"Unknown"|"") # this is also probably OK
				[ -n "$msg_logged" ] &&
					ocf_log info "session state $sess_state. Session restored."
				return 0;;
			*) # any other session state
				[ -z "$recov_loop" ] &&
					ocf_log err "iscsiadm output: $outp" &&
					return 2
				;;
		esac
		sleep 1
	done
}
It is based on the ee1837b version, but I think it will not work with $OCF_RESKEY_try_recovery = true when $__OCF_ACTION = monitor.
Patch to my code:
 	local recov recov_loop
 	recov=${2:-$OCF_RESKEY_try_recovery}
-	[ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov && recov_loop=true
+	ocf_is_probe && ocf_is_true $recov && recov_loop=true
 	session_id=`open_iscsi_get_session_id "$target"`
 	prev_state=""
 	[ -z "$session_id" ] &&
ee1837b patch version:
@@ -309,7 +309,7 @@
 				ocf_log info "connection state $conn_state. Session restored."
 				return 0;;
 			*) # failed
-				if [ "$__OCF_ACTION" != stop ] && ! ocf_is_probe && ocf_is_true $recov; then
+				if ocf_is_probe && ocf_is_true $recov; then
 					if [ "$conn_state" != "$prev_state" ]; then
 						ocf_log warning "connection state $conn_state, waiting for recovery..."
 						prev_state="$conn_state"
from resource-agents.
Before I delve into that code, it looks like you never considered my comment about two different timeouts: #168 (comment)
from resource-agents.
2012/11/21 Dejan Muhamedagic:
> Before I delve into that code, it looks like you never considered my
> comment about two different timeouts: #168 (comment)
The two different timeouts are already included in the iscsi layer and are
represented by the session state changes LOGGED_IN -> FAILED and FAILED
-> FREE. These timeouts are set directly in open-iscsi.
from resource-agents.
Yes, the timeouts are set elsewhere, but please consider the logic. Do we need two timeouts? How is the timeout in the open-iscsi going to help the RA? Finally, we have only one exit code for the timeout.
from resource-agents.
Leandro, I think that we can close this. What do you say?
from resource-agents.
Yes please... sorry
from resource-agents.
No problem. Thanks!
from resource-agents.