mysql-master-ha
bmildren / mysql-master-ha
Automatically exported from code.google.com/p/mysql-master-ha
What steps will reproduce the problem?
1. Stop the original master.
2. While the MHA manager is promoting slave A to master, power down slave A.
3. Check the log.
-------------------
What is the expected output? What do you see instead?
I'm not sure whether this behavior is by design, but I would expect the manager,
when it detects that the slave is not reachable via SSH, to try another slave
(my test environment has 1 master and 3 slaves).
What version of the product are you using? On what operating system?
Ubuntu Linux 12.04, MHA 0.53
Please provide any additional information below.
Please see the log below. At the line "Fri Dec 7 17:05:33 2012 - [warning]
HealthCheck: SSH to ip-10-0-1-248 is NOT reachable." the manager knows that the
elected master is not reachable, and it aborts the failover. (Would it make sense
to make a second check?)
Thanks
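A minimal sketch of the fallback the reporter describes, under stated assumptions: `pick_new_master` is a hypothetical helper, not part of MHA, and the caller supplies both the candidate list and the health-check command. Falling back to another slave is only reasonable when the remaining slaves hold the same relay log position, which the log below does report ("All slaves received relay logs to the same position").

```sh
#!/bin/sh
# Hypothetical helper (not part of MHA): return the first candidate that
# passes a health check instead of aborting when the preferred one is gone.
#   $1     name of a check command/function that takes a host name
#   $2...  candidate hosts, in order of preference
pick_new_master() {
  check=$1
  shift
  for host in "$@"; do
    if "$check" "$host"; then
      printf '%s\n' "$host"   # first healthy candidate wins
      return 0
    fi
    echo "candidate $host failed the health check, trying the next one" >&2
  done
  return 1                    # no candidate passed: give up
}
```

For example, with an SSH check such as `ssh_ok() { ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true; }`, calling `pick_new_master ssh_ok ip-10-0-1-248 ip-10-0-1-49 ip-10-0-1-171` would skip ip-10-0-1-248 in this test and return ip-10-0-1-49.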
root@ip-10-0-1-45:/var/log/masterha# tail -f app1.log
Fri Dec 7 17:03:44 2012 - [debug] Disconnected from
ip-10-0-1-248(10.0.1.248:3306)
Fri Dec 7 17:03:44 2012 - [debug] Disconnected from
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec 7 17:03:44 2012 - [debug] Disconnected from
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec 7 17:03:44 2012 - [debug] SSH check command: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/var/log/mysql
--output_file=/var/log/masterha/app1/save_binary_logs_test
--manager_version=0.53 --binlog_prefix=mysql-bin --debug
Fri Dec 7 17:03:44 2012 - [info] Set master ping interval 3 seconds.
Fri Dec 7 17:03:44 2012 - [warning] secondary_check_script is not defined. It
is highly recommended setting it to check master reachability from two or more
routes.
Fri Dec 7 17:03:44 2012 - [info] Starting ping health check on
ip-10-0-1-149(10.0.1.149:3306)..
Fri Dec 7 17:03:44 2012 - [debug] Connected on master.
Fri Dec 7 17:03:44 2012 - [debug] Set short wait_timeout on master: 6 seconds
Fri Dec 7 17:03:44 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL
doesn't respond..
Fri Dec 7 17:05:17 2012 - [warning] Got error on MySQL select ping: 2006
(MySQL server has gone away)
Fri Dec 7 17:05:17 2012 - [info] Executing SSH check script: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/var/log/mysql
--output_file=/var/log/masterha/app1/save_binary_logs_test
--manager_version=0.53 --binlog_prefix=mysql-bin --debug
Fri Dec 7 17:05:18 2012 - [info] HealthCheck: SSH to ip-10-0-1-149 is
reachable.
Fri Dec 7 17:05:20 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.0.1.149' (111))
Fri Dec 7 17:05:20 2012 - [warning] Connection failed 1 time(s)..
Fri Dec 7 17:05:23 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.0.1.149' (111))
Fri Dec 7 17:05:23 2012 - [warning] Connection failed 2 time(s)..
Fri Dec 7 17:05:26 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.0.1.149' (111))
Fri Dec 7 17:05:26 2012 - [warning] Connection failed 3 time(s)..
Fri Dec 7 17:05:26 2012 - [warning] Master is not reachable from health
checker!
Fri Dec 7 17:05:26 2012 - [warning] Master ip-10-0-1-149(10.0.1.149:3306) is
not reachable!
Fri Dec 7 17:05:26 2012 - [warning] SSH is reachable.
Fri Dec 7 17:05:26 2012 - [info] Connecting to a master server failed. Reading
configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and
trying to connect to all servers to check server status..
Fri Dec 7 17:05:26 2012 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Fri Dec 7 17:05:26 2012 - [info] Reading application default configurations
from /etc/app1.cnf..
Fri Dec 7 17:05:26 2012 - [info] Reading server configurations from
/etc/app1.cnf..
Fri Dec 7 17:05:26 2012 - [debug] Skipping connecting to dead master
ip-10-0-1-149(10.0.1.149:3306).
Fri Dec 7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec 7 17:05:26 2012 - [debug] Connected to:
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec 7 17:05:26 2012 - [debug] Connected to: ip-10-0-1-49(10.0.1.49:3306),
user=root
Fri Dec 7 17:05:26 2012 - [debug] Connected to:
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec 7 17:05:26 2012 - [debug] Comparing MySQL versions..
Fri Dec 7 17:05:26 2012 - [debug] Comparing MySQL versions done.
Fri Dec 7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec 7 17:05:26 2012 - [info] Dead Servers:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] Alive Servers:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Fri Dec 7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] Checking slave configurations..
Fri Dec 7 17:05:26 2012 - [info] read_only=1 is not set on slave
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec 7 17:05:26 2012 - [warning] relay_log_purge=0 is not set on slave
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec 7 17:05:26 2012 - [info] read_only=1 is not set on slave
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec 7 17:05:26 2012 - [warning] relay_log_purge=0 is not set on slave
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec 7 17:05:26 2012 - [info] read_only=1 is not set on slave
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec 7 17:05:26 2012 - [warning] relay_log_purge=0 is not set on slave
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec 7 17:05:26 2012 - [info] Checking replication filtering settings..
Fri Dec 7 17:05:26 2012 - [info] Replication filtering check ok.
Fri Dec 7 17:05:26 2012 - [info] Master is down!
Fri Dec 7 17:05:26 2012 - [info] Terminating monitoring script.
Fri Dec 7 17:05:26 2012 - [info] Got exit code 20 (Master dead).
Fri Dec 7 17:05:26 2012 - [info] MHA::MasterFailover version 0.53.
Fri Dec 7 17:05:26 2012 - [info] Starting master failover.
Fri Dec 7 17:05:26 2012 - [info]
Fri Dec 7 17:05:26 2012 - [info] * Phase 1: Configuration Check Phase..
Fri Dec 7 17:05:26 2012 - [info]
Fri Dec 7 17:05:26 2012 - [debug] Skipping connecting to dead master
ip-10-0-1-149.
Fri Dec 7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec 7 17:05:26 2012 - [debug] Connected to:
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec 7 17:05:26 2012 - [debug] Connected to: ip-10-0-1-49(10.0.1.49:3306),
user=root
Fri Dec 7 17:05:26 2012 - [debug] Connected to:
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec 7 17:05:26 2012 - [debug] Comparing MySQL versions..
Fri Dec 7 17:05:26 2012 - [debug] Comparing MySQL versions done.
Fri Dec 7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec 7 17:05:26 2012 - [info] Dead Servers:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] Checking master reachability via mysql(double
check)..
Fri Dec 7 17:05:26 2012 - [info] ok.
Fri Dec 7 17:05:26 2012 - [info] Alive Servers:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Fri Dec 7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:26 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:26 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:26 2012 - [info] ** Phase 1: Configuration Check Phase
completed.
Fri Dec 7 17:05:26 2012 - [info]
Fri Dec 7 17:05:26 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Dec 7 17:05:26 2012 - [info]
Fri Dec 7 17:05:26 2012 - [debug] Stopping IO thread on
ip-10-0-1-248(10.0.1.248:3306)..
Fri Dec 7 17:05:26 2012 - [debug] Stopping IO thread on
ip-10-0-1-49(10.0.1.49:3306)..
Fri Dec 7 17:05:26 2012 - [debug] Stop IO thread on
ip-10-0-1-248(10.0.1.248:3306) done.
Fri Dec 7 17:05:26 2012 - [info] Forcing shutdown so that applications never
connect to the current master..
Fri Dec 7 17:05:26 2012 - [debug] Stopping IO thread on
ip-10-0-1-171(10.0.1.171:3306)..
Fri Dec 7 17:05:26 2012 - [debug] Stop IO thread on
ip-10-0-1-49(10.0.1.49:3306) done.
Fri Dec 7 17:05:26 2012 - [info] Executing master IP deactivatation script:
Fri Dec 7 17:05:26 2012 - [info] /opt/scripts/master_ip_failover
--orig_master_host=ip-10-0-1-149 --orig_master_ip=10.0.1.149
--orig_master_port=3306 --command=stopssh --ssh_user=root
Fri Dec 7 17:05:26 2012 - [debug] Stop IO thread on
ip-10-0-1-171(10.0.1.171:3306) done.
Fri Dec 7 17:05:27 2012 - [info] done.
Fri Dec 7 17:05:27 2012 - [warning] shutdown_script is not set. Skipping
explicit shutting down of the dead master.
Fri Dec 7 17:05:27 2012 - [info] * Phase 2: Dead Master Shutdown Phase
completed.
Fri Dec 7 17:05:27 2012 - [info]
Fri Dec 7 17:05:27 2012 - [info] * Phase 3: Master Recovery Phase..
Fri Dec 7 17:05:27 2012 - [info]
Fri Dec 7 17:05:27 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Dec 7 17:05:27 2012 - [info]
Fri Dec 7 17:05:27 2012 - [debug] Fetching current slave status..
Fri Dec 7 17:05:27 2012 - [debug] Fetching current slave status done.
Fri Dec 7 17:05:27 2012 - [info] The latest binary log file/position on all
slaves is mysql-bin.000009:82776781
Fri Dec 7 17:05:27 2012 - [info] Latest slaves (Slaves that received relay log
files to the latest):
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info] The oldest binary log file/position on all
slaves is mysql-bin.000009:82776781
Fri Dec 7 17:05:27 2012 - [info] Oldest slaves:
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-248(10.0.1.248:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:27 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:27 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:27 2012 - [info]
Fri Dec 7 17:05:27 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog
Phase..
Fri Dec 7 17:05:27 2012 - [info]
Fri Dec 7 17:05:27 2012 - [info] Fetching dead master's binary logs..
Fri Dec 7 17:05:27 2012 - [info] Executing command on the dead master
ip-10-0-1-149(10.0.1.149:3306): save_binary_logs --command=save
--start_file=mysql-bin.000009 --start_pos=82776781 --binlog_dir=/var/log/mysql
--output_file=/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog --handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.53 --debug
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from mysql-bin.000009 pos 82776781 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog ..
parse_init_headers: file=mysql-bin.000009 event_type=15 server_id=10 length=103
nextmpos=107 prevrelay=4 cur(post)relay=107
parse_init_headers: file=mysql-bin.000009 event_type=2 server_id=10 length=78
nextmpos=185 prevrelay=107 cur(post)relay=185
Dumping binlog format description event, from position 0 to 107.. ok.
Dumping effective binlog data from /var/log/mysql/mysql-bin.000009 position 82776781 to tail(82777069).. ok.
parse_init_headers:
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog
event_type=15 server_id=10 length=103 nextmpos=107 prevrelay=4
cur(post)relay=107
parse_init_headers:
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog
event_type=2 server_id=10 length=78 nextmpos=82776859 prevrelay=107
cur(post)relay=185
Concat succeeded.
Fri Dec 7 17:05:29 2012 - [info] scp from
[email protected]:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog to
local:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog succeeded.
Fri Dec 7 17:05:33 2012 - [warning] HealthCheck: SSH to ip-10-0-1-248 is NOT
reachable.
Fri Dec 7 17:05:34 2012 - [info] HealthCheck: SSH to ip-10-0-1-49 is reachable.
Fri Dec 7 17:05:35 2012 - [info] HealthCheck: SSH to ip-10-0-1-171 is
reachable.
Fri Dec 7 17:05:35 2012 - [info]
Fri Dec 7 17:05:35 2012 - [info] * Phase 3.3: Determining New Master Phase..
Fri Dec 7 17:05:35 2012 - [info]
Fri Dec 7 17:05:35 2012 - [info] Finding the latest slave that has all relay
logs for recovering other slaves..
Fri Dec 7 17:05:35 2012 - [info] All slaves received relay logs to the same
position. No need to resync each other.
Fri Dec 7 17:05:35 2012 - [info] Dead Servers:
Fri Dec 7 17:05:35 2012 - [info] ip-10-0-1-149(10.0.1.149:3306)
Fri Dec 7 17:05:35 2012 - [info] ip-10-0-1-248(10.0.1.248:3306) Not
reachable via SSH Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version
between slaves) log-bin:enabled
Fri Dec 7 17:05:35 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:35 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:35 2012 - [info] Alive Slaves:
Fri Dec 7 17:05:35 2012 - [info] ip-10-0-1-49(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:35 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:35 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:35 2012 - [info] ip-10-0-1-171(10.0.1.171:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Fri Dec 7 17:05:35 2012 - [debug] Relay log info repository: FILE
Fri Dec 7 17:05:35 2012 - [info] Replicating from
10.0.1.149(10.0.1.149:3306)
Fri Dec 7 17:05:35 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln443] Server
ip-10-0-1-248(10.0.1.248:3306) is dead, but must be alive! Check server
settings.
Fri Dec 7 17:05:35 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/ManagerUtil.pm, ln178] Got ERROR: at
/usr/local/share/perl/5.14.2/MHA/MasterFailover.pm line 1456
Fri Dec 7 17:05:35 2012 - [debug] Disconnected from
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec 7 17:05:35 2012 - [debug] Disconnected from
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec 7 17:05:35 2012 - [info]
----- Failover Report -----
app1: MySQL Master failover ip-10-0-1-149
Master ip-10-0-1-149 is down!
Check MHA Manager logs at ip-10-0-1-45:/var/log/masterha/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on ip-10-0-1-149.
The latest slave ip-10-0-1-248(10.0.1.248:3306) has all relay logs for recovery.
Got Error so couldn't continue failover from here.
Andrea Ceresoni
Original issue reported on code.google.com by [email protected]
on 7 Dec 2012 at 5:26
Could you please recommend/share the best my.cnf settings for your scripts,
such as log-bin, etc.?
Original issue reported on code.google.com by [email protected]
on 29 Feb 2012 at 11:54
The example code on
http://code.google.com/p/mysql-master-ha/wiki/Using_With_Clustering_Software
currently reads:
rc=`masterha_master_switch --master_state=dead --interactive=0 --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2`
exit $rc
The backtick (``) operator doesn't return the exit code of the enclosed command
but its standard output. As long as that output is empty, the code above works
as expected: "exit $rc" effectively becomes just "exit", which returns the exit
status of the previous command (the rc=... assignment, whose status is that of
the command in backticks).
As soon as the command in backticks prints any text, the result becomes this
instead:
bash: exit: some_text: numeric argument required
and the exit status will always be "2" ("misuse of shell builtins"), even if the
command in backticks executed successfully.
So the correct code should be just:
start)
`...`
exit
to return the exit status, or perhaps "exit $?" instead of just "exit" to make
it more explicit. The rc=... assignment, on the other hand, can be removed
completely.
Or am I missing something in the original code?
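A sketch of the fix described above, assuming a POSIX shell resource script: the command is run directly rather than inside backticks. Even a bare `...` line in backticks is not quite right, because the shell would try to execute whatever the command prints as a new command. `run_and_propagate` is a hypothetical helper name; the `masterha_master_switch` options are copied unchanged from the wiki example, and host1/host2 are its placeholders.

```sh
#!/bin/sh
# Run a command and hand back its exit status; its stdout is passed
# through to the caller instead of being captured into a variable.
run_and_propagate() {
  "$@"
  return $?   # explicit: the function's status is the command's status
}

case "$1" in
  start)
    # Options copied from the wiki example; host1/host2 are placeholders.
    run_and_propagate masterha_master_switch --master_state=dead \
      --interactive=0 --wait_on_failover_error=0 \
      --dead_master_host=host1 --new_master_host=host2
    exit $?   # status of masterha_master_switch, whatever it printed
    ;;
esac
```

Capturing the status with `rc=$?` right after the command and ending with `exit "$rc"` is equivalent, and is handy when other cleanup commands must run before the script exits.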
Original issue reported on code.google.com by [email protected]
on 19 Apr 2012 at 12:45
What steps will reproduce the problem?
1. Create an MHA config (/etc/mha_cluster_config) with some nodes specified with
no_master=1.
2. Run the SSH check as follows: masterha_check_ssh --conf=/etc/mha_cluster_config
3. Check the output of masterha_check_ssh: it also tests SSH connections from the
nodes that have no_master=1.
What is the expected output? What do you see instead?
Given how MHA works, SSH connections need to work only from hosts that could
become a master, because they need to transfer differential relay logs to the
other slaves. When a node is defined with no_master=1, we are explicitly asking
MHA never to consider that node for the master role, so there is no need to
check whether no_master=1 nodes can connect to the other hosts. I suggest that
when MHA checks SSH connections, it should only test that the candidate master
nodes can connect to all the other nodes.
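A sketch of the proposed rule, not of what masterha_check_ssh does today: it lists the source -> destination pairs that would be tested if hosts with no_master=1 were skipped as SSH sources. The `host:no_master` argument format and the function name are hypothetical.

```sh
#!/bin/sh
# Each argument is "hostname:no_master", e.g. "db1:0" or "db3:1".
# Prints one "src -> dst" line per SSH connection that would be tested.
ssh_pairs_to_check() {
  for src in "$@"; do
    src_host=${src%%:*}
    src_no_master=${src##*:}
    # Proposed rule: a no_master=1 host never becomes master, so it is
    # not tested as an SSH source (it is still tested as a destination).
    if [ "$src_no_master" = "1" ]; then
      continue
    fi
    for dst in "$@"; do
      dst_host=${dst%%:*}
      if [ "$dst_host" != "$src_host" ]; then
        printf '%s -> %s\n' "$src_host" "$dst_host"
      fi
    done
  done
}
```

For example, `ssh_pairs_to_check db1:0 db2:0 db3:1` prints four pairs (db1 -> db2, db1 -> db3, db2 -> db1, db2 -> db3) instead of the six a full mesh of three hosts would need.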
What version of the product are you using? On what operating system?
# rpm -qa | grep -i mha
mha4mysql-node-0.53-0.el6.noarch
mha4mysql-manager-0.53-0.el6.noarch
# uname -r
2.6.32-279.9.1.el6.x86_64
# cat /etc/redhat-release
CentOS release 6.3 (Final)
Original issue reported on code.google.com by [email protected]
on 19 Oct 2012 at 6:50
I tested the case where one of the slaves is dead and masterha_manager should
still start. I added "ignore_fail=1" to the config, as described in the wiki.
Part of my config:
[server1]
hostname=172.16.50.11
candidate_master=1
[server2]
hostname=172.16.50.14
candidate_master=1
[server3]
ignore_fail=1
hostname=172.16.50.13
server1 is the master, and server2 is a slave of server1.
MySQL on server3 was switched off.
Next, I tried to start masterha_manager:
masterha_manager --conf=/etc/mha_manager/app1.cnf
It could not start, and wrote these messages:
###########
###########
Tue Nov 20 17:39:12 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Tue Nov 20 17:39:12 2012 - [info] Reading application default configurations
from /etc/mha_manager/app1.cnf..
Tue Nov 20 17:39:12 2012 - [info] Reading server configurations from
/etc/mha_manager/app1.cnf..
Tue Nov 20 17:39:12 2012 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 20 17:39:12 2012 - [info] Dead Servers:
Tue Nov 20 17:39:12 2012 - [info] 172.16.50.13(172.16.50.13:3306)
Tue Nov 20 17:39:12 2012 - [info] Alive Servers:
Tue Nov 20 17:39:12 2012 - [info] 172.16.50.11(172.16.50.11:3306)
Tue Nov 20 17:39:12 2012 - [info] 172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info] Alive Slaves:
Tue Nov 20 17:39:12 2012 - [info] 172.16.50.11(172.16.50.11:3306)
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Tue Nov 20 17:39:12 2012 - [info] Replicating from
172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info] Primary candidate for the new Master
(candidate_master is set)
Tue Nov 20 17:39:12 2012 - [info] Current Alive Master:
172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info] Checking slave configurations..
Tue Nov 20 17:39:12 2012 - [info] read_only=1 is not set on slave
172.16.50.11(172.16.50.11:3306).
Tue Nov 20 17:39:12 2012 - [warning] relay_log_purge=0 is not set on slave
172.16.50.11(172.16.50.11:3306).
Tue Nov 20 17:39:12 2012 - [info] Checking replication filtering settings..
Tue Nov 20 17:39:12 2012 - [info] binlog_do_db= testdb, binlog_ignore_db=
Tue Nov 20 17:39:12 2012 - [info] Replication filtering check ok.
Tue Nov 20 17:39:12 2012 - [info] Starting SSH connection tests..
Tue Nov 20 17:39:13 2012 - [info] All SSH connection tests passed successfully.
Tue Nov 20 17:39:13 2012 - [info] Checking MHA Node version..
Tue Nov 20 17:39:13 2012 - [info] Version check ok.
Tue Nov 20 17:39:13 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444] Server
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server
settings.
Tue Nov 20 17:39:13 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln384] Error happend
on checking configurations. at
/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm line 362
Tue Nov 20 17:39:13 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln480] Error
happened on monitoring servers.
Tue Nov 20 17:39:13 2012 - [info] Got exit code 1 (Not master dead).
###########
###########
masterha_check_repl prints the same errors:
###########
###########
Tue Nov 20 18:19:29 2012 - [info] All SSH connection tests passed successfully.
Tue Nov 20 18:19:29 2012 - [info] Checking MHA Node version..
Tue Nov 20 18:19:30 2012 - [info] Version check ok.
Tue Nov 20 18:19:30 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444] Server
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server
settings.
Tue Nov 20 18:19:30 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln384] Error happend
on checking configurations. at
/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm line 362
Tue Nov 20 18:19:30 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln480] Error
happened on monitoring servers.
Tue Nov 20 18:19:30 2012 - [info] Got exit code 1 (Not master dead).
###########
###########
It seems that masterha_* doesn't see the "ignore_fail=1" option in the config.
I tried to debug it and added a print statement:
#more +440 ServerManager.pm | head
foreach (@dead_servers) {
next if ( $_->{id} eq $current_master->{id} );
next if ( $ignore_fail_check && $_->{ignore_fail} );
print "\n" . $_->{ignore_fail} . $ignore_fail_check . "\n";
$log->error(
sprintf( " Server %s is dead, but must be alive! Check server settings.",
$_->get_hostinfo() )
);
croak;
}
and the messages show:
####
####
Tue Nov 20 18:32:59 2012 - [info] Version check ok.
10
Tue Nov 20 18:32:59 2012 -
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444] Server
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server
settings
####
####
So it's strange: I expected masterha_* to start with one dead slave when the
"ignore_fail=1" option is set.
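The debug output "10" reads as ignore_fail=1 followed by $ignore_fail_check=0 (the print is only reached when the `next if` did not fire). The sketch below mirrors the skip condition from the quoted ServerManager.pm loop to show why that combination still aborts: the per-server option only takes effect when $ignore_fail_check is also true. Which command-line option sets $ignore_fail_check is not shown in the quoted code, and `dead_server_action` is a hypothetical name.

```sh
#!/bin/sh
# Mirrors: next if ( $ignore_fail_check && $_->{ignore_fail} );
# A dead slave is skipped only if BOTH flags are true; otherwise MHA logs
# "is dead, but must be alive!" and croaks.
# Simplification: only the value 1 counts as true here, unlike Perl,
# where any non-empty, non-zero value is true.
#   $1  per-server ignore_fail value from the config
#   $2  $ignore_fail_check value seen by the loop
dead_server_action() {
  if [ "$2" = "1" ] && [ "$1" = "1" ]; then
    echo skip
  else
    echo abort
  fi
}
```

For example, `dead_server_action 1 0` prints `abort`, which matches the "10" line above.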
Original issue reported on code.google.com by [email protected]
on 20 Nov 2012 at 2:37
What steps will reproduce the problem?
1. Log in to an el5 system (CentOS 5.5 in this case).
2. rpm -ivh mha4mysql-node-0.54-0.el5.noarch.rpm
3.
What is the expected output? What do you see instead?
EXPECTED:
Preparing... ########################################### [100%]
1:mha4mysql-node ########################################### [100%]
ACTUAL:
error: Failed dependencies:
rpmlib(FileDigests) <= 4.6.0-1 is needed by mha4mysql-node-0.54-0.el5.noarch
rpmlib(PayloadIsXz) <= 5.2-1 is needed by mha4mysql-node-0.54-0.el5.noarch
What version of the product are you using? On what operating system?
MHA 0.54 on CentOS 5.5
Please provide any additional information below.
The error output is the same when I try to install the el6 package, so I assume
the el5 package was actually built for el6 rather than el5.
Original issue reported on code.google.com by [email protected]
on 18 Dec 2012 at 6:56
First of all, great project!
I wonder what happens when the manager fails.
For example, MMM needs one writer plus readers; when the writer is down you have
a problem, so it is a SPOF.
In this case, the manager is also a SPOF: when it goes down, the servers are no
longer checked, and if the master fails, no slave can be promoted to master.
Wouldn't it be an idea to make every slave node act as a kind of manager that
determines which slave is the most up to date and promotes it to master when the
master goes down? In that case, the master itself never needs to be a manager.
I'm thinking about this because I don't want to have a SPOF. Master<>Master
replication should have no SPOF, but a split brain can occur instead, which is
not nice at all.
Original issue reported on code.google.com by [email protected]
on 22 Feb 2012 at 10:35
In our production environment, the masterha scripts cannot reach the MySQL hosts
over SSH using the same IP as for MySQL access, because we use a multi-VLAN
configuration.
For security reasons, SSH is not permitted on the data VLAN.
Could you extend the product so that a separate SSH IP can be set per host in the
configuration file?
It's for a client with SkySQL support.
Original issue reported on code.google.com by [email protected]
on 25 Oct 2011 at 4:47
What steps will reproduce the problem?
1. Add to the MySQL config:
[mysqld]
chroot=/var/lib/mysql
2. Try to run masterha_check_repl --conf=/etc/app1.cnf
3. Get this error:
Tue Oct 25 21:16:45 2011 - [info] Checking SSH publickey authentication and
checking recovery script configurations on the current master..
Tue Oct 25 21:16:45 2011 - [info] Executing command: save_binary_logs
--command=test --start_file=mysql-bin.000003 --start_pos=4
--binlog_dir=/var/lib/mysql/binlog
--output_file=/var/lib/mysql/tmp/save_binary_logs_test --manager_version=0.52
Tue Oct 25 21:16:45 2011 - [info] Connecting to root@db1(db1)..
Creating /var/lib/mysql/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000003
Tue Oct 25 21:16:46 2011 - [info] Master setting check done.
Tue Oct 25 21:16:46 2011 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Tue Oct 25 21:16:46 2011 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user=root --slave_host=db2 --slave_ip=192.168.10.4
--slave_port=3306 --workdir=/var/lib/mysql/tmp --target_version=5.1.59-log
--manager_version=0.52 --relay_log_info=/db/relay-log.info --slave_pass=xxx
Tue Oct 25 21:16:46 2011 - [info] Connecting to [email protected](db2)..
Checking slave recovery environment settings..
Opening /db/relay-log.info ...Could not open relay-log-info file /db/relay-log.info.
at /usr/bin/apply_diff_relay_logs line 274
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln129] Slaves settings check failed!
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln304] Slave configuration failed.
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln315] Error happend on checking configurations. at
/usr/bin/masterha_check_repl line 48
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln396] Error happened on monitoring servers.
Tue Oct 25 21:16:46 2011 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
What version of the product are you using? On what operating system?
$ rpm -qa | grep -i mysql
libmysqlclient16-5.1.59-alt1
perl-DBD-mysql-4.020-alt2
mha4mysql-node-0.52-alt1
MySQL-client-5.1.59-alt1
MySQL-server-5.1.59-alt1
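One possible explanation, offered as an assumption rather than a confirmed diagnosis: with chroot=/var/lib/mysql, paths reported by mysqld (such as /db/relay-log.info) are relative to the chroot, while apply_diff_relay_logs runs outside it and opens the path as given. The hypothetical helper below only illustrates the mapping from a chroot-relative path to the host path.

```sh
#!/bin/sh
# Hypothetical helper: translate a path as seen inside a mysqld chroot
# into the corresponding path on the host filesystem.
#   $1  chroot directory from my.cnf (may be empty or "/")
#   $2  absolute path reported by the chrooted mysqld
host_path_for() {
  chroot_dir=${1%/}          # drop one trailing slash; "/" becomes ""
  if [ -z "$chroot_dir" ]; then
    printf '%s\n' "$2"       # no chroot: the path is already a host path
  else
    printf '%s%s\n' "$chroot_dir" "$2"
  fi
}
```

For example, `host_path_for /var/lib/mysql /db/relay-log.info` prints `/var/lib/mysql/db/relay-log.info`, the file apply_diff_relay_logs would need to open under this assumption.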
Original issue reported on code.google.com by [email protected]
on 25 Oct 2011 at 9:51
Configure the parameter file /etc/section1.cnf:
[server default]
# mysql user and password
user=root
password=rootpass
# working directory on the manager
manager_workdir=/apps/mha4mysql-manager-0.53/workdir/section1
# manager log file
manager_log=/apps/mha4mysql-manager-0.53/workdir/section1/section1.log
# working directory on MySQL servers
remote_workdir=/apps/mha4mysql-node-0.53/section1
# master_binlog_dir
master_binlog_dir=/apps/mysql-5.5.16/data
# master_ip_failover_script
master_ip_failover_script=/usr/local/samples/bin/master_ip_failover
# shutdown_script
shutdown_script=/usr/local/samples/bin/power_manager
# master_ip_online_change_script
master_ip_online_change_script=/usr/local/samples/bin/master_ip_online_change
[server1]
hostname=192.168.167.71
[server2]
hostname=192.168.167.47
candidate_master=1
[server3]
hostname=192.168.167.46
When I execute masterha_check_repl, I get the following error:
Fri Feb 3 15:00:45 2012 - [info] /usr/local/samples/bin/master_ip_failover
--command=status --ssh_user=root --orig_master_host=192.168.167.47
--orig_master_ip=192.168.167.47 --orig_master_port=3306
Bareword "FIXME_xxx" not allowed while "strict subs" in use at
/usr/local/samples/bin/master_ip_failover line 88.
Execution of /usr/local/samples/bin/master_ip_failover aborted due to
compilation errors.
Fri Feb 3 15:00:45 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln214] Failed to
get master_ip_failover_script status with return code 255:0.
Fri Feb 3 15:00:45 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln383] Error
happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Fri Feb 3 15:00:45 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln478] Error
happened on monitoring servers.
Fri Feb 3 15:00:45 2012 - [info] Got exit code 1 (Not master dead).
If I comment out the parameters "master_ip_failover_script", "shutdown_script"
and "master_ip_online_change_script", the repl check is OK.
Can you tell me why?
I am using version 0.53 on Linux Enterprise 5.
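The compile error names the cause: line 88 of /usr/local/samples/bin/master_ip_failover still contains the literal placeholder `FIXME_xxx`, so Perl refuses to compile the script and MHA's `--command=status` call returns 255. Commenting the script parameters out only stops MHA from calling the script. A minimal sketch for listing leftover placeholders before running masterha_check_repl, assuming site-specific values in the samples are marked with `FIXME` (only `FIXME_xxx` is confirmed by the error above); `check_no_fixme` is a hypothetical helper, not part of MHA:

```shell
# check_no_fixme FILE...: print every line that still contains a FIXME
# placeholder (as file:line:text). Returns 1 if any remain, 0 if none,
# or 2 if a file could not be read.
check_no_fixme() {
  fixme_rc=0
  # Passing /dev/null as an extra file makes grep prefix each match with
  # its file name even when only one script is given.
  grep -n 'FIXME' "$@" /dev/null || fixme_rc=$?
  case "$fixme_rc" in
    0) return 1 ;;   # grep found placeholders
    1) return 0 ;;   # grep found nothing
    *) return 2 ;;   # missing or unreadable file
  esac
}
```

For example, `check_no_fixme /usr/local/samples/bin/master_ip_failover` should point at line 88. The other two sample scripts configured above can be checked the same way. Once the placeholders are replaced, `perl -c /usr/local/samples/bin/master_ip_failover` only compiles the script and should report that the syntax is OK.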
Original issue reported on code.google.com by [email protected]
on 3 Feb 2012 at 7:23
What steps will reproduce the problem?
1. Get the latest code (as of 17.12.2012) via git from
https://github.com/yoshinorim/mha4mysql-manager.git
2. Build and install:
perl Makefile.PL PREFIX=/usr
make
make install
3. My MHA conf file:
##
# init users and dirs
##
# list of servers
[server1]
hostname=us1
ip=172.16.50.11
candidate_master=1
ignore_fail=1
[server2]
hostname=us4
ip=172.16.50.14
candidate_master=1
ignore_fail=1
[server3]
ignore_fail=1
hostname=us3
ip=172.16.50.13
candidate_master=1
4. Make sure MySQL is stopped on server 172.16.50.13 and running on the others.
Run mha-manager:
masterha_manager --ignore_fail_on_start=1 --conf=/home/mha4mysql/etc/app1.cnf
What is the expected output?
MHA should start, notice that us3 is dead, and continue working.
What do you see instead?
MHA exits with an error.
############
############
Mon Dec 17 10:58:08 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Mon Dec 17 10:58:08 2012 - [info] Reading application default configurations
from /home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] Reading server configurations from
/home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] MHA::MasterMonitor version 0.55.
Mon Dec 17 10:58:08 2012 - [info] Dead Servers:
Mon Dec 17 10:58:08 2012 - [info] us3(172.16.50.13:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Servers:
Mon Dec 17 10:58:08 2012 - [info] funky(172.16.50.11:3306)
Mon Dec 17 10:58:08 2012 - [info] us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Slaves:
Mon Dec 17 10:58:08 2012 - [info] funky(172.16.50.11:3306)
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Mon Dec 17 10:58:08 2012 - [info] Replicating from
172.16.50.14(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Primary candidate for the new Master
(candidate_master is set)
Mon Dec 17 10:58:08 2012 - [info] Current Alive Master: us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Checking slave configurations..
Mon Dec 17 10:58:08 2012 - [info] read_only=1 is not set on slave
funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [warning] relay_log_purge=0 is not set on slave
funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [info] Checking replication filtering settings..
Mon Dec 17 10:58:08 2012 - [info] binlog_do_db= testdb, binlog_ignore_db=
Mon Dec 17 10:58:08 2012 - [info] Replication filtering check ok.
Mon Dec 17 10:58:08 2012 - [info] Starting SSH connection tests..
Mon Dec 17 10:58:09 2012 - [info] All SSH connection tests passed successfully.
Mon Dec 17 10:58:09 2012 - [info] Checking MHA Node version..
Mon Dec 17 10:58:09 2012 - [info] Version check ok.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/ServerManager.pm,
ln443] Server us3(172.16.50.13:3306) is dead, but must be alive! Check server
settings.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm,
ln386] Error happend on checking configurations. at
/usr/share/perl/5.14/MHA/MasterMonitor.pm line 363
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm,
ln482] Error happened on monitoring servers.
Mon Dec 17 10:58:09 2012 - [info] Got exit code 1 (Not master dead).
###########
###########
Please provide any additional information below.
I am not an expert in Perl, but I tried to debug the error.
I added a printf in MasterMonitor.pm after the "GetOptions(" section in "sub main".
It prints 0 every time, regardless of the value I pass when executing
masterha_manager --ignore_fail_on_start=1 --conf=/home/mha4mysql/etc/app1.cnf
or
masterha_manager --ignore_fail_on_start=0 --conf=/home/mha4mysql/etc/app1.cnf
I found what needs to change:
diff --git a/lib/MHA/MasterMonitor.pm b/lib/MHA/MasterMonitor.pm
index 71945de..ff80c89 100644
--- a/lib/MHA/MasterMonitor.pm
+++ b/lib/MHA/MasterMonitor.pm
@@ -636,7 +636,7 @@ sub main {
'manager_log=s' => \$g_logfile,
'skip_ssh_check' => \$g_skip_ssh_check, # for testing
'skip_check_ssh' => \$g_skip_ssh_check,
- 'ignore_fail_on_start' => \$g_ignore_fail_on_start,
+ 'ignore_fail_on_start=i' => \$g_ignore_fail_on_start,
);
setpgrp( 0, $$ ) unless ($g_interactive);
After that change, mha-manager handles the argument correctly and works as expected.
Please check it.
Original issue reported on code.google.com by [email protected]
on 17 Dec 2012 at 11:05
On the default perl install on RHEL6, the vendor dir is
/usr/share/perl5/vendor_lib. The spec files hardcode /usr/lib/perl5/vendor_lib
and make the packages unusable on RHEL6.
Original issue reported on code.google.com by petefbsd
on 18 Nov 2011 at 2:01
What steps will reproduce the problem?
1. Configure ssh keys on all servers
2. Test ssh connection from the command line => OK
3. Test ssh connection with masterha_check_ssh => fails
What is the expected output? What do you see instead?
Wed Jan 4 16:47:30 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Jan 4 16:47:30 2012 - [info] Reading application default configurations
from /etc/myha.cnf..
Wed Jan 4 16:47:30 2012 - [info] Reading server configurations from
/etc/myha.cnf..
Wed Jan 4 16:47:30 2012 - [info] Starting SSH connection tests..
Wed Jan 4 16:47:30 2012 - [error][/usr/local/share/perl5/MHA/SSHCheck.pm, ln63]
Wed Jan 4 16:47:30 2012 - [debug] Connecting via SSH from root@node1 to
root@node2..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Wed Jan 4 16:47:30 2012 - [error][/usr/local/share/perl5/MHA/SSHCheck.pm,
ln106] SSH connection from root@node1 to root@node2 failed!
Wed Jan 4 16:47:31 2012 - [debug]
Wed Jan 4 16:47:30 2012 - [debug] Connecting via SSH from root@node2 to
root@node1..
Wed Jan 4 16:47:30 2012 - [debug] ok.
SSH Configuration Check Failed!
at ./masterha_check_ssh line 44
What version of the product are you using? On what operating system?
0.52 on redhat 6
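The log shows that masterha_check_ssh does not only test the manager's own connections. It connects from each node to every other node: root@node1 to root@node2 is refused while root@node2 to root@node1 succeeds. So each node's key has to be authorized on every other node, not just on the manager. A minimal sketch that repeats that all-pairs test by hand, assuming root logins and that the manager can already reach every node. `check_ssh_mesh`, `SSH_CMD` and `SSH_USER` are hypothetical names for this sketch, not MHA options:

```shell
# check_ssh_mesh HOST...: from the manager, log in to each HOST and, from
# there, try a non-interactive SSH hop to every other HOST. This mirrors the
# "Connecting via SSH from X to Y" pairs that masterha_check_ssh prints.
# SSH_CMD (default: ssh) is left unquoted on purpose so it may carry extra
# options, e.g. SSH_CMD="ssh -p 2222". SSH_USER defaults to root.
check_ssh_mesh() {
  mesh_ssh=${SSH_CMD:-ssh}
  mesh_user=${SSH_USER:-root}
  mesh_failed=0
  for src in "$@"; do
    for dst in "$@"; do
      [ "$src" = "$dst" ] && continue
      # BatchMode=yes makes ssh fail instead of prompting for a password,
      # so a missing or unauthorized key shows up as an error. A FAIL line
      # can also mean the manager itself could not reach $src.
      if $mesh_ssh -o BatchMode=yes "$mesh_user@$src" \
          "ssh -o BatchMode=yes $mesh_user@$dst exit" >/dev/null 2>&1; then
        echo "ok   $src -> $dst"
      else
        echo "FAIL $src -> $dst"
        mesh_failed=1
      fi
    done
  done
  return "$mesh_failed"
}
```

For example, `check_ssh_mesh node1 node2` should print `FAIL node1 -> node2` for the setup in this report. The hop is run from the source node rather than from the manager because that is the direction the failing log line reports.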
Original issue reported on code.google.com by [email protected]
on 4 Jan 2012 at 3:58
1. Try to start masterha_check_repl --conf=/etc/masterha/app1.cnf
2. Get this error:
[root@EGSNS-49-2 bin]# ./masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Jun 1 17:37:44 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Fri Jun 1 17:37:44 2012 - [info] Reading application default configurations
from /etc/masterha/app1.cnf..
Fri Jun 1 17:37:44 2012 - [info] Reading server configurations from
/etc/masterha/app1.cnf..
Fri Jun 1 17:37:44 2012 - [info] MHA::MasterMonitor version 0.53.
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.
at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.
at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.
at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.
at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.
at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm,
ln263] Got fatal error, stopping operations
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln383] Error happend on checking configurations. at
/usr/local/share/perl5/MHA/MasterMonitor.pm line 298
Fri Jun 1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln478] Error happened on monitoring servers.
Fri Jun 1 17:37:44 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Please help me understand why this error happens.
I have installed the perl-DBD-MySQL package on every node.
3. This is my /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/
user=root
password=emag@234
ssh_user=root
repl_user=rep
repl_password=rep
shutdown_script=""
master_ip_failover_script="/apps/mha4mysql-manager-0.53/samples/scripts/master_ip_failover"
report_script=""
remote_workdir=/apps/mha4mysql-node-0.53/section1
[server1]
hostname=192.168.49.9
[server2]
hostname=192.168.49.2
candidate_master=1
[server3]
hostname=192.168.49.1
[server4]
hostname=192.168.49.3
[server5]
hostname=192.168.49.4
OS:redhat 6.2 64bit
Mysql: 5.5.22( build from source )
basedir: /apps/mysql
datadir: /apps/mysql/data
current master:192.168.49.9
standby master:192.168.49.1
masterha_check_ssh:OK
Please help me check this and give me advice as soon as possible.
Thanks,
[email protected]
Original issue reported on code.google.com by [email protected]
on 1 Jun 2012 at 10:08
What steps will reproduce the problem?
1. On the MHA manager box, run the following command:
masterha_manager --conf=/vol/mapi_qa.cnf
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
CentOS 5 (Amazon ami image)
Please provide any additional information below.
Fri Feb 17 05:59:15 2012 - [info] Connecting to
[email protected](xx.xx.xxx.xxx:22)..
Can't exec "mysqlbinlog": No such file or directory at
/usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
at /usr/local/bin/apply_diff_relay_logs line 463
Original issue reported on code.google.com by [email protected]
on 17 Feb 2012 at 6:02
1. Try to start masterha_check_repl --conf=/etc/app1.cnf
2. Get this error:
[root@Manager ~]# masterha_check_repl --conf=/etc/masterha_default.cnf
Thu May 24 14:32:05 2012 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] Reading application default configurations
from /etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] Reading server configurations from
/etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] MHA::MasterMonitor version 0.52.
Thu May 24 14:32:05 2012 - [info] Dead Servers:
Thu May 24 14:32:05 2012 - [info] Alive Servers:
Thu May 24 14:32:05 2012 - [info] Master(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Slave(192.168.114.131:3306)
Thu May 24 14:32:05 2012 - [info] Slave2(192.168.114.134:3306)
Thu May 24 14:32:05 2012 - [info] Alive Slaves:
Thu May 24 14:32:05 2012 - [info] Slave(192.168.114.131:3306)
Version=5.5.14-log (oldest major version between slaves) log-bin:enabled
Thu May 24 14:32:05 2012 - [info] Replicating from
192.168.114.132(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Slave2(192.168.114.134:3306)
Version=5.5.14-log (oldest major version between slaves) log-bin:enabled
Thu May 24 14:32:05 2012 - [info] Replicating from
192.168.114.132(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Current Alive Master:
Master(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Checking slave configurations..
Thu May 24 14:32:05 2012 - [warning] read_only=1 is not set on slave
Slave(192.168.114.131:3306).
Thu May 24 14:32:05 2012 - [warning] relay_log_purge=0 is not set on slave
Slave(192.168.114.131:3306).
Thu May 24 14:32:05 2012 - [warning] read_only=1 is not set on slave
Slave2(192.168.114.134:3306).
Thu May 24 14:32:05 2012 - [warning] relay_log_purge=0 is not set on slave
Slave2(192.168.114.134:3306).
Thu May 24 14:32:05 2012 - [info] Checking replication filtering settings..
Thu May 24 14:32:05 2012 - [info] binlog_do_db= EcommerceDB, binlog_ignore_db=
information_schema,mysql,performance_schema,test
Thu May 24 14:32:05 2012 - [info] Replication filtering check ok.
Thu May 24 14:32:05 2012 - [info] Starting SSH connection tests..
Thu May 24 14:32:07 2012 - [info] All SSH connection tests passed successfully.
Thu May 24 14:32:07 2012 - [info] Checking MHA Node version..
Thu May 24 14:32:08 2012 - [info] Version check ok.
Thu May 24 14:32:08 2012 - [info] Checking SSH publickey authentication and
checking recovery script configurations on the current master..
Thu May 24 14:32:08 2012 - [info] Executing command: save_binary_logs
--command=test --start_file=ecommerce-bin.000001 --start_pos=4
--binlog_dir=/data/ecommerce_bin_log
--output_file=/var/log/masterha/app1/save_binary_logs_test
--manager_version=0.52
Thu May 24 14:32:08 2012 - [info] Connecting to root@Master(Master)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/ecommerce_bin_log, up to ecommerce-bin.000001
Thu May 24 14:32:09 2012 - [info] Master setting check done.
Thu May 24 14:32:09 2012 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Thu May 24 14:32:09 2012 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user=root --slave_host=Slave --slave_ip=192.168.114.131
--slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.14-log
--manager_version=0.52 --relay_log_info=/usr/local/mysql/data/relay-log.info
--slave_pass=xxx
Thu May 24 14:32:09 2012 - [info] Connecting to [email protected](Slave)..
Checking slave recovery environment settings..
Opening /usr/local/mysql/data/relay-log.info ... ok.
Relay log found at /data/ecommerce_relay_log, up to ecommerce-relay-bin.000003
Temporary relay log file is /data/ecommerce_relay_log/ecommerce-relay-bin.000003
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
at /usr/bin/apply_diff_relay_logs line 315
main::check() called at /usr/bin/apply_diff_relay_logs line 429
eval {...} called at /usr/bin/apply_diff_relay_logs line 409
main::main() called at /usr/bin/apply_diff_relay_logs line 97
Thu May 24 14:32:09 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln129] Slaves
settings check failed!
Thu May 24 14:32:09 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln304] Slave
configuration failed.
Thu May 24 14:32:09 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln315] Error
happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Thu May 24 14:32:09 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln396] Error
happened on monitoring servers.
Thu May 24 14:32:09 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
3. This is my /etc/app1.cnf
manager_log=/var/log/masterha/app1/app1.log
manager_workdir=/var/log/masterha/app1
user=root
password=123456
remote_workdir=/data/ecommerce_bin_log
[server1]
hostname=Master
[server2]
hostname=Slave
candidate_master=1
[server3]
hostname=Slave2
Please help me understand why this error happens.
OS: CentOS Release 5.2
Mysql: 5.5.14( build from source )
basedir: /usr/local/mysql
datadir: /usr/local/mysql/data
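Both this report and the earlier `Can't exec "mysqlbinlog"` one point the same way. The MHA Node tools run the MySQL client programs by name: the logs show `sh: mysql: command not found` from apply_diff_relay_logs and `Can't exec "mysqlbinlog"` from BinlogManager.pm. So `mysql` and `mysqlbinlog` must be on the PATH of the shell that MHA's SSH session starts. With MySQL built from source under /usr/local/mysql, they may only exist in its bin directory. A minimal sketch of a check to run on each node; `check_node_binaries` is a hypothetical helper, not part of MHA:

```shell
# check_node_binaries PROG...: report which of the named programs the
# current shell can find on its PATH. Returns 1 if any are missing.
check_node_binaries() {
  missing=0
  for prog in "$@"; do
    # command -v prints the resolved path and fails if nothing is found.
    if found_at=$(command -v "$prog"); then
      echo "found   $prog -> $found_at"
    else
      echo "missing $prog"
      missing=1
    fi
  done
  return "$missing"
}
```

What matters is the PATH seen by a non-interactive SSH command, not by an interactive login. A quick ad hoc equivalent from the manager is `ssh root@Slave 'command -v mysql; command -v mysqlbinlog'`, which prints a path only for the programs that shell can find.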
Please help me check this and give me advice as soon as possible.
Thanks,
[email protected]
Original issue reported on code.google.com by [email protected]
on 24 May 2012 at 7:50
1. I have set up masterha_manager and masterha_node, but when I kill
the master's mysqld process, masterha_manager quits
and the failover does not happen.
2. Below is the architecture I am testing with:
master candidate_master
10.1.200.216 --------> 10.1.200.215 10.1.200.27
masterha_node masterha_node masterha_manager & masterha_node
\
\
\
slave
10.1.200.217
--------------------------------------------
Expected result:
after killall -9 mysqld on 10.1.200.216, it should look like this:
master
10.1.200.215 10.1.200.27
masterha_node masterha_manager
\
\
\
slave
10.1.200.217
masterha_node
Actual result:
after killall -9 mysqld on 10.1.200.216, masterha_manager quits and nothing changes.
Some more info:
1. Install the MySQL packages on 10.1.200.215, 10.1.200.216, 10.1.200.217 and
10.1.200.27:
rpm -ivh MySQL-server-5.5.16-1.linux2.6.x86_64.rpm
rpm -ivh MySQL-devel-5.5.16-1.linux2.6.x86_64.rpm
rpm -ivh MySQL-client-5.5.16-1.linux2.6.x86_64.rpm
2. Install mha4mysql-node-0.52 on all MySQL nodes and on 10.1.200.27:
cd mha4mysql-node-0.52;
perl Makefile.PL && make install
(cut some of the steps that are not related)
3. Install masterha_manager on 10.1.200.27:
cd mha4mysql-manager-0.52
perl Makefile.PL
(cut some of the steps that are not related)
make install
4. The configuration on 10.1.200.27:
cat /etc/app1.cnf
[server default]
user=root
password=
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1
[server1]
hostname=10.1.200.215
candidate_master=1
master_binlog_dir=/var/lib/mysql
[server2]
hostname=10.1.200.216
master_binlog_dir=/var/lib/mysql
[server3]
hostname=10.1.200.217
master_binlog_dir=/var/lib/mysql
cat /etc/masterha_default.cnf
[server default]
user=root
password=
ssh_user=root
repl_user=slave
repl_password= mysqlsalve
master_binlog_dir= /var/lib/mysql
remote_workdir=/data/log/masterha
manager_log=/data/log/masterha/manager.log
secondary_check_script= masterha_secondary_check -s 10.1.200.217 -s 10.1.200.215 --user=root --master_host=10.1.200.216
ping_interval=3
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
report_script=/usr/local/bin/send_report
5. The output of masterha_check_ssh:
masterha_check_ssh --conf=/etc/app1.cnf
Tue Dec 27 22:14:06 2011 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading application default configurations
from /etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading server configurations from
/etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:14:07 2011 - [debug]
Tue Dec 27 22:14:06 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.215) to [email protected](10.1.200.216)..
Tue Dec 27 22:14:06 2011 - [debug] ok.
Tue Dec 27 22:14:06 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.215) to [email protected](10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug] ok.
Tue Dec 27 22:14:07 2011 - [debug]
Tue Dec 27 22:14:06 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.216) to [email protected](10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug] ok.
Tue Dec 27 22:14:07 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.216) to [email protected](10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug] ok.
Tue Dec 27 22:14:08 2011 - [debug]
Tue Dec 27 22:14:07 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.217) to [email protected](10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug] ok.
Tue Dec 27 22:14:07 2011 - [debug] Connecting via SSH from
[email protected](10.1.200.217) to [email protected](10.1.200.216)..
Tue Dec 27 22:14:08 2011 - [debug] ok.
Tue Dec 27 22:14:08 2011 - [info] All SSH connection tests passed successfully.
The output of masterha_check_repl --conf=/etc/app1.cnf:
Tue Dec 27 22:16:10 2011 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading application default configurations
from /etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading server configurations from
/etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] MHA::MasterMonitor version 0.52.
Tue Dec 27 22:16:10 2011 - [info] Dead Servers:
Tue Dec 27 22:16:10 2011 - [info] Alive Servers:
Tue Dec 27 22:16:10 2011 - [info] 10.1.200.215(10.1.200.215:3306)
Tue Dec 27 22:16:10 2011 - [info] 10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] 10.1.200.217(10.1.200.217:3306)
Tue Dec 27 22:16:10 2011 - [info] Alive Slaves:
Tue Dec 27 22:16:10 2011 - [info] 10.1.200.215(10.1.200.215:3306)
Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info] Replicating from
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Primary candidate for the new Master
(candidate_master is set)
Tue Dec 27 22:16:10 2011 - [info] 10.1.200.217(10.1.200.217:3306)
Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info] Replicating from
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Current Alive Master:
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Checking slave configurations..
Tue Dec 27 22:16:10 2011 - [warning] read_only=1 is not set on slave
10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning] relay_log_purge=0 is not set on slave
10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning] read_only=1 is not set on slave
10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [warning] relay_log_purge=0 is not set on slave
10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [info] Checking replication filtering settings..
Tue Dec 27 22:16:10 2011 - [info] binlog_do_db= , binlog_ignore_db=
Tue Dec 27 22:16:10 2011 - [info] Replication filtering check ok.
Tue Dec 27 22:16:10 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:16:12 2011 - [info] All SSH connection tests passed successfully.
Tue Dec 27 22:16:12 2011 - [info] Checking MHA Node version..
Tue Dec 27 22:16:13 2011 - [info] Version check ok.
Tue Dec 27 22:16:13 2011 - [info] Checking SSH publickey authentication and
checking recovery script configurations on the current master..
Tue Dec 27 22:16:13 2011 - [info] Executing command: save_binary_logs
--command=test --start_file=mysql-bin.000007 --start_pos=4
--binlog_dir=/var/lib/mysql
--output_file=/var/log/masterha/app1/save_binary_logs_test
--manager_version=0.52
Tue Dec 27 22:16:13 2011 - [info] Connecting to
[email protected](10.1.200.216)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000007
Tue Dec 27 22:16:14 2011 - [info] Master setting check done.
Tue Dec 27 22:16:14 2011 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Tue Dec 27 22:16:14 2011 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user=root --slave_host=10.1.200.215
--slave_ip=10.1.200.215 --slave_port=3306 --workdir=/var/log/masterha/app1
--target_version=5.5.16-log --manager_version=0.52
--relay_log_info=/var/lib/mysql/relay-log.info --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info] Connecting to
[email protected](10.1.200.215)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to yl-hyper-15-relay-bin.000014
Temporary relay log file is /var/lib/mysql/yl-hyper-15-relay-bin.000014
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user=root --slave_host=10.1.200.217
--slave_ip=10.1.200.217 --slave_port=3306 --workdir=/var/log/masterha/app1
--target_version=5.5.16-log --manager_version=0.52
--relay_log_info=/var/lib/mysql/relay-log.info --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info] Connecting to
[email protected](10.1.200.217)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to yl-hyper-17-relay-bin.000013
Temporary relay log file is /var/lib/mysql/yl-hyper-17-relay-bin.000013
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info] Slaves settings check done.
Tue Dec 27 22:16:14 2011 - [info]
10.1.200.216 (current master)
+--10.1.200.215
+--10.1.200.217
Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.215..
Tue Dec 27 22:16:14 2011 - [info] ok.
Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.217..
Tue Dec 27 22:16:14 2011 - [info] ok.
Tue Dec 27 22:16:14 2011 - [info] Checking master_ip_failvoer_script status:
Tue Dec 27 22:16:14 2011 - [info] /usr/local/bin/master_ip_failover
--command=status --ssh_user=root --orig_master_host=10.1.200.216
--orig_master_ip=10.1.200.216 --orig_master_port=3306
Tue Dec 27 22:16:14 2011 - [info] OK.
Tue Dec 27 22:16:14 2011 - [warning] shutdown_script is not defined.
Tue Dec 27 22:16:14 2011 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
I ran the following commands to start masterha_manager on 10.1.200.27:
mkdir -p /data/log/masterha;
nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /data/log/masterha/manager.log 2>&1 &
tail -f /data/log/masterha/manager.log
Tue Dec 27 21:39:59 2011 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading application default configurations
from /etc/app1.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading server configurations from
/etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading application default configurations
from /etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading server configurations from
/etc/app1.cnf..
Original issue reported on code.google.com by [email protected]
on 27 Dec 2011 at 2:43
Hello!
I found a bug in how MHA handles its config file.
I have a strong MySQL password that contains symbols such as
'$', '\', '%', '$' and others, not just letters and digits.
For example, I write this in the MHA cnf:
password=%DE&T^GF1
and then execute
masterha_check_repl --conf=/etc/mha_manager/app1.cnf
I get messages saying the program couldn't connect to MySQL.
############################
############################
############################
Wed Nov 14 17:31:17 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Nov 14 17:31:17 2012 - [info] Reading application default configurations
from /etc/mha_manager/app1.cnf..
Wed Nov 14 17:31:17 2012 - [info] Reading server configurations from
/etc/mha_manager/app1.cnf..
Wed Nov 14 17:31:17 2012 - [info] MHA::MasterMonitor version 0.53.
Wed Nov 14 17:31:17 2012 - [info] Dead Servers:
Wed Nov 14 17:31:17 2012 - [info] Alive Servers:
Wed Nov 14 17:31:17 2012 - [info] 172.16.50.11(172.16.50.11:3306)
Wed Nov 14 17:31:17 2012 - [info] 172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Alive Slaves:
Wed Nov 14 17:31:17 2012 - [info] 172.16.50.11(172.16.50.11:3306)
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Nov 14 17:31:17 2012 - [info] Replicating from
172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Current Alive Master:
172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Checking slave configurations..
Wed Nov 14 17:31:17 2012 - [info] Checking replication filtering settings..
Wed Nov 14 17:31:17 2012 - [info] binlog_do_db= testdb, binlog_ignore_db=
Wed Nov 14 17:31:17 2012 - [info] Replication filtering check ok.
Wed Nov 14 17:31:17 2012 - [info] Starting SSH connection tests..
Wed Nov 14 17:31:18 2012 - [info] All SSH connection tests passed successfully.
Wed Nov 14 17:31:18 2012 - [info] Checking MHA Node version..
Wed Nov 14 17:31:18 2012 - [info] Version check ok.
Wed Nov 14 17:31:18 2012 - [info] Checking SSH publickey authentication
settings on the current master..
Wed Nov 14 17:31:18 2012 - [info] HealthCheck: SSH to 172.16.50.14 is reachable.
Wed Nov 14 17:31:18 2012 - [info] Master MHA Node version is 0.53.
Wed Nov 14 17:31:18 2012 - [info] Checking recovery script configurations on
the current master..
Wed Nov 14 17:31:18 2012 - [info] Executing command: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/home/mysqldata/
--output_file=/home/mha_manager_data/app1/save_binary_logs_test
--manager_version=0.53 --start_file=mysql-bin.000009
Wed Nov 14 17:31:18 2012 - [info] Connecting to
[email protected](172.16.50.14)..
Creating /home/mha_manager_data/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /home/mysqldata/, up to mysql-bin.000009
Wed Nov 14 17:31:19 2012 - [info] Master setting check done.
Wed Nov 14 17:31:19 2012 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Wed Nov 14 17:31:19 2012 - [info] Connecting to
[email protected](172.16.50.11:22)..
Checking slave recovery environment settings..
Opening /home/mysqldata/relay-log.info ... ok.
Relay log found at /home/mysqldata, up to mysql-relay-bin.000002
Temporary relay log file is /home/mysqldata/mysql-relay-bin.000002
Testing mysql connection and privileges..ERROR 1045 (28000): Access denied for user 'root'@'172.16.50.11' (using password: YES)
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 351
main::check() called at /usr/bin/apply_diff_relay_logs line 470
eval {...} called at /usr/bin/apply_diff_relay_logs line 450
main::main() called at /usr/bin/apply_diff_relay_logs line 110
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln194] Slaves settings check failed!
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln373] Slave configuration failed.
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln384] Error happend on checking configurations. at
/usr/bin/masterha_check_repl line 48
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln479] Error happened on monitoring servers.
Wed Nov 14 17:31:19 2012 - [info] Got exit code 1 (Not master dead).
############################
############################
############################
well, and there is very interesting moment: in the begin it could connect to
mysql and it could get values of global variables.
but then it couldnt.
i've never written perl scripts (i use c++ and bash usually) but i tried find
where is problem;
i found that problem in incorrect passing parameters without escaping.
In MasterMonitor.pm in line 185: when construction $command and concatenate
--slave_pass it should be escaped and placed in quotes. Because the script
looks on '$' symbol like on control character and miss it.
I tried changing MasterMonitor.pm like this:
$command .= " --slave_pass='$s->{password}' ";
But this did not help.
I tried running apply_diff_relay_logs manually with these parameters and found
that special characters in the password must be escaped.
For example,
this does not work:
--slave_pass='%DE&T^GF1'
this works:
--slave_pass='%DE\&T^GF1'
and if I remove the single quotes, it does not work either:
--slave_pass=%DE\&T^GF1
So please fix this bug, or explain how to use such characters in a password
(maybe there is a correct way to write it in the cnf file).
I would have written a patch if I knew Perl.
Original issue reported on code.google.com by [email protected]
on 14 Nov 2012 at 2:12
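For readers hitting the same problem, here is a minimal POSIX shell sketch of the general quoting technique (not MHA's actual fix): wrap the value in single quotes and rewrite each embedded `'` as `'\''`. Every shell that parses the command string strips one layer of quoting, so a string that is re-parsed by a second shell (ssh hands its command string to the remote user's shell) needs one extra layer. The function name `shell_quote` is hypothetical, the password is the example from the report, and `sh -c` stands in for the remote shell instead of a real ssh hop.

```shell
#!/bin/sh
# shell_quote: print $1 wrapped in single quotes so a POSIX shell reads it
# literally. Each embedded ' is rewritten as '\'' (close, escaped quote, reopen).
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

pass='%DE&T^GF1'

# One layer of quoting: enough for a command string parsed by ONE shell,
# e.g. the remote shell that ssh passes its command string to.
remote_cmd="printf '%s\n' $(shell_quote "$pass")"
sh -c "$remote_cmd"        # prints: %DE&T^GF1

# If the string goes through another shell first (e.g. a local sh -c that
# then runs ssh), quote the whole command again for that outer shell.
local_cmd="sh -c $(shell_quote "$remote_cmd")"
sh -c "$local_cmd"         # prints: %DE&T^GF1
```

The design choice is single quotes rather than backslashes because, inside single quotes, no character except `'` itself is special, so only one case needs handling.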
Just a feature request that would help me out. I'd like the location of "ssh"
to be specifiable, rather than always derived from the PATH.
Original issue reported on code.google.com by [email protected]
on 5 Jan 2012 at 12:26
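Until such an option exists, one possible workaround, assuming MHA finds `ssh` through an ordinary PATH lookup as the request describes, is to put a directory containing the preferred `ssh` first in PATH for the manager process. A minimal sketch; the temporary directory and the stub script stand in for a real installation (normally a symlink to the desired binary), and the paths in the comments are hypothetical.

```shell
#!/bin/sh
# Directory that will hold the preferred ssh (a temp dir here for illustration).
bindir=$(mktemp -d)

# Stand-in for the real binary. In practice this would be a symlink, e.g.
#   ln -s /opt/openssh/bin/ssh "$bindir/ssh"   (hypothetical path)
printf '#!/bin/sh\necho "custom ssh: $*"\n' > "$bindir/ssh"
chmod +x "$bindir/ssh"

# Prepend the directory so PATH lookups of "ssh" find it first.
PATH="$bindir:$PATH"
export PATH

command -v ssh      # prints the stub's path, i.e. "$bindir/ssh"
# Start the manager from this environment so it inherits the modified PATH:
#   masterha_manager --conf=/etc/mha/app1.cnf   (hypothetical config path)
```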
What steps will reproduce the problem?
1. masterha_check_ssh: OK (no error)
2. masterha_check_repl: OK (no error)
3. masterha_manager: OK (no error; the log ends with "Ping succeeded, ...")
4. Master node goes down (shutdown -r now)
5. Failover fails at "Starting master failover."
Output below:
---------------------
Wed Jun 20 12:08:44 2012 - [info] Starting ping health check on
10.1.10.80(10.1.10.80:3306)..
Wed Jun 20 12:08:44 2012 - [info] Ping succeeded, sleeping until it doesn't
respond..
:
: Master node is down(shutdown -r now)
:
Wed Jun 20 12:10:02 2012 - [warning] Got error on MySQL ping: 2006 (MySQL
server has gone away)
ssh: connect to host 10.1.10.80 port 22: Connection refused
Wed Jun 20 12:10:02 2012 - [warning] HealthCheck: SSH to 10.1.10.80 is NOT
reachable.
Wed Jun 20 12:10:08 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:08 2012 - [warning] Connection failed 1 time(s)..
Wed Jun 20 12:10:11 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:11 2012 - [warning] Connection failed 2 time(s)..
Wed Jun 20 12:10:14 2012 - [warning] Got error on MySQL connect: 2003 (Can't
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:14 2012 - [warning] Connection failed 3 time(s)..
Wed Jun 20 12:10:14 2012 - [warning] Master is not reachable from health
checker!
Wed Jun 20 12:10:14 2012 - [warning] Master 10.1.10.80(10.1.10.80:3306) is not
reachable!
Wed Jun 20 12:10:14 2012 - [warning] SSH is NOT reachable.
Wed Jun 20 12:10:14 2012 - [info] Connecting to a master server failed. Reading
configuration file /etc/masterha_default.cnf and /etc/mha_manager/app1.cnf
again, and trying to connect to all servers to check server status..
Wed Jun 20 12:10:14 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Jun 20 12:10:14 2012 - [info] Reading application default configurations
from /etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Reading server configurations from
/etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Dead Servers:
Wed Jun 20 12:10:14 2012 - [info] 10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Alive Servers:
Wed Jun 20 12:10:14 2012 - [info] 10.1.10.81(10.1.10.81:3306)
Wed Jun 20 12:10:14 2012 - [info] 10.1.20.80(10.1.20.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Alive Slaves:
Wed Jun 20 12:10:14 2012 - [info] 10.1.10.81(10.1.10.81:3306)
Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 20 12:10:14 2012 - [info] Replicating from
10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info] 10.1.20.80(10.1.20.80:3306)
Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 20 12:10:14 2012 - [info] Replicating from
10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Checking slave configurations..
Wed Jun 20 12:10:14 2012 - [warning] read_only=1 is not set on slave
10.1.10.81(10.1.10.81:3306).
Wed Jun 20 12:10:14 2012 - [warning] relay_log_purge=0 is not set on slave
10.1.10.81(10.1.10.81:3306).
Wed Jun 20 12:10:14 2012 - [warning] read_only=1 is not set on slave
10.1.20.80(10.1.20.80:3306).
Wed Jun 20 12:10:14 2012 - [warning] relay_log_purge=0 is not set on slave
10.1.20.80(10.1.20.80:3306).
Wed Jun 20 12:10:14 2012 - [info] Checking replication filtering settings..
Wed Jun 20 12:10:14 2012 - [info] Replication filtering check ok.
Wed Jun 20 12:10:14 2012 - [info] Master is down!
Wed Jun 20 12:10:14 2012 - [info] Terminating monitoring script.
Wed Jun 20 12:10:14 2012 - [info] Got exit code 20 (Master dead).
Wed Jun 20 12:10:14 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Jun 20 12:10:14 2012 - [info] Reading application default configurations
from /etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Reading server configurations from
/etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] MHA::MasterFailover version 0.52.
Wed Jun 20 12:10:14 2012 - [info] Starting master failover.
Wed Jun 20 12:10:14 2012 -
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerUtil.pm, ln158] Got ERROR:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerConst.pm line 84.
---------------------
What is the expected output? What do you see instead?
1. What does this problem mean?
2. Is there a workaround?
What version of the product are you using? On what operating system?
Manager:
- OS: RHEL5.7 (2.6.18-274.el5)
- MHA Manager: 0.52 (this issue also occurred with 0.53)
- MHA Node: 0.52
Node:(10.1.10.8[01], 10.1.20.80)
- OS: RHEL5.7 (2.6.18-274.el5)
- MHA Node: 0.52
- MySQL 5.5.25
Original issue reported on code.google.com by [email protected]
on 20 Jun 2012 at 10:30
What steps will reproduce the problem?
1. masterha_master_switch --master_state=alive --conf=/vol/mha/mapi_qa.cnf
--new_master_host=xx.xxx.xxx.xxx --orig_master_is_new_slave
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
0.53
Please provide any additional information below.
Trying to do a manual failover and am getting the following error:
Starting master switch from xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306) to
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)? (yes/NO): yes
Tue Feb 21 19:29:19 2012 - [info] Checking whether
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306) is ok for the new master..
Tue Feb 21 19:29:19 2012 - [info] ok.
Tue Feb 21 19:29:19 2012 - [info] xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306): SHOW
SLAVE STATUS returned empty result. To check replication filtering rules,
temporarily executing CHANGE MASTER to a dummy host.
Tue Feb 21 19:29:19 2012 - [info] xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306):
Resetting slave pointing to the dummy host.
Tue Feb 21 19:29:19 2012 - [info] ** Phase 1: Configuration Check Phase
completed.
Tue Feb 21 19:29:19 2012 - [info]
Tue Feb 21 19:29:19 2012 - [debug] Disconnected from
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:19 2012 - [info] * Phase 2: Rejecting updates Phase..
Tue Feb 21 19:29:19 2012 - [info]
Tue Feb 21 19:29:19 2012 - [info] Executing master ip online change script to
disable write on the current master:
Tue Feb 21 19:29:19 2012 - [info] /vol/mha/master_ip_online_change
--command=stop --orig_master_host=xx.xxx.xxx.xxx
--orig_master_ip=xx.xxx.xxx.xxx --orig_master_port=3306
--new_master_host=xx.xxx.xxx.xxx --new_master_ip=xx.xxx.xxx.xxx
--new_master_port=3306
Got Error: DBI
connect(';host=xx.xxx.xxx.xxx;port=3306;mysql_connect_timeout=4','root',...)
failed: Access denied for user 'root'@'xx.xxx.xxx.xxx' (using password: YES) at
/usr/local/share/perl5/MHA/DBHelper.pm line 181
at /vol/mha/master_ip_online_change line 122
Tue Feb 21 19:29:20 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm,
ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
Tue Feb 21 19:29:20 2012 - [debug] Already disconnected from
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:20 2012 - [debug] Disconnected from
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:20 2012 - [debug] Disconnected from
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Am I doing something wrong? Automatic failover with masterha_manager running
seems to work fine, and I can also connect to each MySQL host from all the
other databases in the cluster.
Original issue reported on code.google.com by [email protected]
on 21 Feb 2012 at 7:33
What steps will reproduce the problem?
1. Define relay-log in /etc/my.cnf as: relay-log = /data/relaylogs/relay-bin
2. Define datadir in /etc/my.cnf as: datadir = /data
3. From MHA manager, run masterha_check_repl --conf=/etc/MHA.cnf
What is the expected output? What do you see instead?
The relay_log_info path should be /data/relaylogs/relay-log.info, but MHA passes
"/data/relay-log.info", so the check fails.
What version of the product are you using? On what operating system?
0.53 of Node and Manager on CentOS 5.8
Please provide any additional information below.
Mon May 14 10:23:54 2012 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user=mhadmin --slave_host=db2 --slave_ip=99.99.99.239
--slave_port=3306 --workdir=/var/log/masterha --target_version=5.0.92-50-log
--manager_version=0.53 --relay_log_info=/data/relay-log.info
--relay_dir=/data/ --slave_pass=xxx
Mon May 14 10:23:54 2012 - [info] Connecting to [email protected](db2:22)..
Checking slave recovery environment settings..
Opening /data/relay-log.info ...Could not open relay-log-info file /data/relay-log.info.
at /usr/bin/apply_diff_relay_logs line 306
Original issue reported on code.google.com by [email protected]
on 14 May 2012 at 5:34
What steps will reproduce the problem?
1. Blank mysql password (not sure this is required)
2. Run masterha_check_repl
3. See error:
Tue Dec 11 10:07:59 2012 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Tue Dec 11 10:07:59 2012 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user='root' --slave_host=33.33.33.12
--slave_ip=33.33.33.12 --slave_port=3306 --workdir=/var/log/masterha/app1
--target_version=5.1.66-0ubuntu0.11.10.3-log --manager_version=0.54
--relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/
--slave_pass=xxx
Tue Dec 11 10:07:59 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln386] Error happend on checking configurations. Use of uninitialized value in
string ne at /usr/share/perl5/MHA/MasterMonitor.pm line 186.
Tue Dec 11 10:07:59 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln482] Error happened on monitoring servers.
Tue Dec 11 10:07:59 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
What is the expected output? What do you see instead?
Replication health OK, no errors.
What version of the product are you using? On what operating system?
MHA 0.54_0, Ubuntu Oneiric, this vagrant install:
https://github.com/jayjanssen/vagrant-mysql-mha
Please provide any additional information below.
If I replace every instance of 'escaped_password' in /usr/share/perl5/MHA with
'password', it works fine. I could not see anywhere in the code where
'escaped_password' was actually set.
Original issue reported on code.google.com by [email protected]
on 11 Dec 2012 at 4:22
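The reporter's workaround (replacing `escaped_password` with `password` in the installed MHA modules) could be scripted roughly as follows. This is a sketch, not an official fix: it edits installed Perl modules in place while keeping `.orig` backups, and since `escaped_password` presumably exists to shell-escape the password, undoing it may bring back the special-character problem from the 14 Nov 2012 report. The function name `patch_escaped_password` is hypothetical; try it on a copy of the directory first.

```shell
#!/bin/sh
# patch_escaped_password DIR: in every *.pm file under DIR that mentions
# escaped_password, replace it with password, keeping FILE.orig as a backup.
patch_escaped_password() {
  find "$1" -name '*.pm' -exec grep -l 'escaped_password' {} + |
  while IFS= read -r f; do
    cp "$f" "$f.orig"                                      # backup first
    sed 's/escaped_password/password/g' "$f.orig" > "$f"   # rewrite in place
  done
}

# Usage, with the module path from the report:
#   patch_escaped_password /usr/share/perl5/MHA
```

Only `*.pm` files are matched, so the `.orig` backups are not patched again on a second run.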
Hi.
Testing mysql-master-ha (with 3 slaves and one master), I discovered that the
new master will still be seen as a slave and masterha_manager then refuses to
start.
It also won't remove the failed master from the config when I run:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf
This is the part of the log showing that mysql-master-ha failed to remove the
slave settings from the new master, which still runs as a slave:
Tue Sep 25 14:25:45 2012 - [info] * Phase 5: New master cleanup phease..
Tue Sep 25 14:25:45 2012 - [info]
Tue Sep 25 14:25:45 2012 - [info] Resetting slave info on the new master..
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm,
ln674] SHOW SLAVE STATUS shows new master replicates from somewhere. Check for
details!
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm,
ln688] db02.db.cert.fronter.net: Resetting slave info failed.
Tue Sep 25 14:25:45 2012 -
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1537] Master
failover to db02.mynetwork.net(11.22.33.2:3306) done, but recovery on slave
partially failed.
Tue Sep 25 14:25:45 2012 - [info]
This is the output of SHOW SLAVE STATUS:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: db01.mynetwork.net
Master_User: replica
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000049
Read_Master_Log_Pos: 107
Relay_Log_File: mysqld-relay-bin.000004
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql-bin.000049
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 107
Relay_Log_Space: 839
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master '[email protected]:3306' - retry-time: 10 retries: 86400
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
And finally, this is the error I get when running:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf
Tue Sep 25 15:28:10 2012 - [warning] SQL Thread is stopped(no error) on
db02.mynetwork.net(11.22.33.2:3306)
Tue Sep 25 15:28:10 2012 -
[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln732] Multi-master
configuration is detected, but two or more masters are either writable
(read-only is not set) or dead! Check configurations for details. Master
configurations are as below:
Master db01.mynetwork.net(11.22.33.1:3306), dead
Master db02.db.cert.fronter.net(11.22.33.2:3306), replicating from
db01.mynetwork.net(11.22.33.1:3306)
Tue Sep 25 15:28:10 2012 -
[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend
on checking configurations. at
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Tue Sep 25 15:28:10 2012 -
[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error
happened on monitoring servers.
Tue Sep 25 15:28:10 2012 - [info] Got exit code 1 (Not master dead).
Is it a known issue? Any idea why this fails?
Original issue reported on code.google.com by [email protected]
on 25 Sep 2012 at 1:31
What steps will reproduce the problem?
1. Add a user without SELECT privileges.
2. Run masterha_check_repl.
3. The error will be "User repLAN does not exist or does not have REPLICATION SLAVE
privilege"
The user did have the REPLICATION SLAVE privilege. I added code to DBHelper.pm
to die on execute() and print the DBI error string.
Output with debug:
Thu Nov 10 16:52:46 2011 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Thu Nov 10 16:52:46 2011 - [info] Reading application default configurations
from /etc/wcdb_mha.cnf..
Thu Nov 10 16:52:46 2011 - [info] Reading server configurations from
/etc/wcdb_mha.cnf..
Thu Nov 10 16:52:46 2011 - [info] MHA::MasterMonitor version 0.52.
Thu Nov 10 16:52:46 2011 - [info] Dead Servers:
Thu Nov 10 16:52:46 2011 - [info] Alive Servers:
... REMOVED ...
Thu Nov 10 16:52:46 2011 - [info] Checking replication filtering settings..
Thu Nov 10 16:52:46 2011 - [info] binlog_do_db= , binlog_ignore_db=
Thu Nov 10 16:52:46 2011 - [info] Replication filtering check ok.
repl_user: repLAN
user: repLAN
Repl_User_SQL: SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = ?
Thu Nov 10 16:52:46 2011 -
[error][/usr/local/lib/perl5/site_perl/5.10.0/MHA/MasterMonitor.pm, ln315]
Error happend on checking configurations. SELECT command denied to user
'mha'@'XXX.XX.XXX.XX' for table 'user' at
/usr/local/lib/perl5/site_perl/5.10.0/MHA/DBHelper.pm line 212.
Thu Nov 10 16:52:46 2011 -
[error][/usr/local/lib/perl5/site_perl/5.10.0/MHA/MasterMonitor.pm, ln396]
Error happened on monitoring servers.
Thu Nov 10 16:52:46 2011 - [info] Got exit code 1 (Not master dead).
The problem is that the mha user does not have permission to query the
mysql.user table!
Not a bug, but it would be useful to display the real error.
Original issue reported on code.google.com by [email protected]
on 10 Nov 2011 at 9:56
What steps will reproduce the problem?
1. Configure two instances, both on port 3331
2. Set up instance B as a slave of instance A
3. masterha_master_switch --conf=/etc/mha.cnf --master_state=alive
--new_master_host="hostb" --orig_master_is_new_slave
What is the expected output? What do you see instead?
masterha_master_switch --conf=/etc/mha.cnf --master_state=alive
--new_master_host=10.30.70.54 --orig_master_is_new_slave
Wed Apr 11 11:27:26 2012 - [info] MHA::MasterRotate version 0.53.
Wed Apr 11 11:27:26 2012 - [info] Starting online master switch..
Wed Apr 11 11:27:26 2012 - [info]
Wed Apr 11 11:27:26 2012 - [info] * Phase 1: Configuration Check Phase..
Wed Apr 11 11:27:26 2012 - [info]
Wed Apr 11 11:27:26 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Apr 11 11:27:26 2012 - [info] Reading application default configurations
from /etc/mha.cnf..
Wed Apr 11 11:27:26 2012 - [info] Reading server configurations from
/etc/mha.cnf..
Wed Apr 11 11:27:26 2012 - [info] Current Alive Master:
10.30.36.132(10.30.36.132:3331)
Wed Apr 11 11:27:26 2012 - [info] Alive Slaves:
Wed Apr 11 11:27:26 2012 - [info] 10.30.70.54(10.30.70.54:3331)
Version=5.3.6-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Apr 11 11:27:26 2012 - [info] Replicating from
10.30.36.132(10.30.36.132:3331)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before
switching. Is it ok to execute on 10.30.36.132(10.30.36.132:3331)? (YES/no):
YES
Wed Apr 11 11:27:35 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES.
This may take long time..
Wed Apr 11 11:27:35 2012 - [info] ok.
Wed Apr 11 11:27:35 2012 - [info] Checking MHA is not monitoring or doing
failover..
Wed Apr 11 11:27:35 2012 - [info] Checking replication health on 10.30.70.54..
Wed Apr 11 11:27:35 2012 - [info] ok.
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/ServerManager.pm,
ln1145] 10.30.70.54 is not alive!
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/MasterRotate.pm, ln232]
Failed to get new master!
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/ManagerUtil.pm, ln178]
Got ERROR: at /usr/bin/masterha_master_switch line 53
What version of the product are you using? On what operating system?
ii mha4mysql-manager 0.53 Master
High Availability Manager and Tools for MySQL, Manager Package
ii mha4mysql-node 0.53 Master
High Availability Manager and Tools for MySQL, Node Package
Debian 6.0.3
Please provide any additional information below.
When I reconfigure the MySQL instances to use port 3306 and update mha.cnf
accordingly, everything works fine.
Broken config:
[server default]
# mysql user and password
user=xxxxx
password=xxxxx
repl_user=xxxx
repl_password=xxxxx
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# manager log file
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=10.30.36.132
port=3331
[server2]
hostname=10.30.70.54
port=3331
Original issue reported on code.google.com by [email protected]
on 11 Apr 2012 at 7:59
Hi,
MHA worked fine on my two servers (Debian) with 0.53. After upgrading the
package to 0.54, I get an error when I start the manager:
masterha_manager --conf=/etc/masterha/app1.cnf
-------
Wed Dec 12 14:56:57 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Dec 12 14:56:57 2012 - [info] Reading application default configurations
from /etc/masterha/app1.cnf..
Wed Dec 12 14:56:57 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln386] Error happend on checking configurations. Undefined subroutine
&MHA::NodeUtil::escape_for_shell called at /usr/share/perl5/MHA/Config.pm line
285.
Wed Dec 12 14:56:57 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm,
ln482] Error happened on monitoring servers.
Wed Dec 12 14:56:57 2012 - [info] Got exit code 1 (Not master dead).
----------
This is my app1.cnf:
----------
[server default]
# mysql user and password
user=root
password=rootpass
ssh_user=mysql
# working directory on the manager
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
master_binlog_dir=/data/mysql/log_binaire
ping_interval=5
master_ip_failover_script=/home/mysql/master_ip_failover
[server1]
hostname=10.0.0.1
[server2]
hostname=10.0.0.2
----------
If I test SSH with masterha_check_ssh --conf=/etc/masterha/app1.cnf, I get this:
Wed Dec 12 14:59:42 2012 - [warning] Global configuration
file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 12 14:59:42 2012 - [info] Reading application default configurations
from /etc/masterha/app1.cnf..
Undefined subroutine &MHA::NodeUtil::escape_for_shell called at
/usr/share/perl5/MHA/Config.pm line 285.
Thanks for your help,
Sebastien
Original issue reported on code.google.com by [email protected]
on 12 Dec 2012 at 2:01
Hi.
I have a setup where I have one master database and multiple slaves.
Our application sends writes only to the master database and reads to the
slaves.
The slaves sit behind a MySQL proxy, so the read-only load is divided among
them.
What I need is a setup where, whenever the master MySQL server goes down, its
virtual IP is assigned to the new master (a former slave).
I was thinking about using a script and just letting mysql-master-ha assign
the IP to the new master.
But this may fail badly if the old master went down because, for example,
someone pulled out its network cable. In that case I could suddenly have two
servers with the same IP.
Therefore I need to use clustering software like Pacemaker together with
mysql-master-ha.
Is it possible to do that?
The document at
http://code.google.com/p/mysql-master-ha/wiki/Using_With_Clustering_Software
describes how to do that in a simple two-node scenario, but that would not work
for me.
Can mysql-master-ha run a command to tell Pacemaker to move the virtual IP
whenever it decides to switch the master database?
That way Pacemaker would make sure there are never two servers with the same
virtual IP, while mysql-master-ha would keep control over promoting slaves to
master.
Original issue reported on code.google.com by [email protected]
on 20 Sep 2012 at 12:32
I am writing a script to start/stop masterha_manager as a system service
(Ubuntu).
You advise using nohup or daemontools:
http://code.google.com/p/mysql-master-ha/wiki/Runnning_Background
I decided to use nohup.
My script is still very simple:
#############
#############
#############
#!/bin/bash
RETVAL=0
do_start() {
echo "Starting"
nohup masterha_manager --conf=/etc/mha_manager/app1.cnf < /dev/null > /home/mha4mysql/app1.log 2>&1 &
RETVAL=$?
echo
return $RETVAL
}
do_stop() {
echo "Stopping"
masterha_stop --conf=/etc/mha_manager/app1.cnf
RETVAL=$?
echo
return $RETVAL
}
do_stop_force() {
echo "Stopping"
masterha_stop --abort --conf=/etc/mha_manager/app1.cnf
RETVAL=$?
echo
return $RETVAL
}
case "$1" in
start)
do_start
;;
stop)
do_stop
;;
stop_force)
do_stop_force
;;
*)
echo "usage: $0 {start|stop|stop_force}" >&2
exit 1
;;
esac
exit $RETVAL
#############
#############
#############
When I start masterha_manager, I cannot tell whether it started successfully.
Usually a daemon performs all its checks at startup, then forks and returns 0
to signal that everything is fine, or a non-zero code if there is an error.
That tells you whether the start succeeded.
masterha_manager does not fork; it runs in the foreground as a single process,
so I cannot get a status code from it. If I start it with nohup, I get 0, but
that is only nohup's status: it says the process was launched and nothing more.
The manager may exit with an error a little later, once it has run its checks
and found a problem.
I would like to get the return code from masterha_manager, but with nohup I
cannot.
I think masterha_manager should be able to start as a normal forking daemon: it
should run all its checks and fork only if everything is fine, returning 0.
If any check fails, it should not fork and should return an error code.
Or how should I write a service script?
P.S. The same goes for masterha_stop: it always returns 0. Even when it cannot
find the manager process, it prints a message but still returns 0.
For example:
root@:/home/mha4mysql# ./mha4mysql.servise stop
Stopping
MHA Manager is not running on app1(2:NOT_RUNNING).
root@:/home/mha4mysql# echo $?
0
Original issue reported on code.google.com by [email protected]
on 4 Dec 2012 at 2:04
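On the "how should I write a service script" question, one common pattern, sketched here under assumptions, is to launch the manager in the background, wait a short grace period, and report failure if the process has already exited, propagating its exit code. The grace period is a guess and only catches errors that happen within that window; the names `start_checked`, `GRACE` and `LOG` are hypothetical.

```shell
#!/bin/sh
# start_checked CMD [ARGS...]: run CMD under nohup in the background, wait
# $GRACE seconds (default 5), then:
#   still running  -> print its PID and return 0
#   already exited -> return its exit code (or 1 if it exited with 0)
start_checked() {
  nohup "$@" < /dev/null > "${LOG:-/dev/null}" 2>&1 &
  pid=$!
  sleep "${GRACE:-5}"
  if kill -0 "$pid" 2>/dev/null; then
    echo "$pid"
    return 0
  fi
  wait "$pid"                 # collect the real exit status of the early exit
  rc=$?
  [ "$rc" -ne 0 ] || rc=1     # exiting at all this early counts as a failure
  return "$rc"
}

# Possible use in do_start (paths from the script above):
#   LOG=/home/mha4mysql/app1.log start_checked \
#       masterha_manager --conf=/etc/mha_manager/app1.cnf
#   RETVAL=$?
```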
What steps will reproduce the problem?
1. Run masterha_check_ssh --conf=/etc/app1.cnf
What is the expected output? What do you see instead?
I expected output similar to the example on this page:
http://code.google.com/p/mysql-master-ha/wiki/Requirements#SSH_public_key_authentication
What version of the product are you using? On what operating system?
mha4mysql-node-0.53-0.el6
mha4mysql-manager-0.53-0.el6
on CentOS6
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Jun 2012 at 1:48
What steps will reproduce the problem?
1. I have testdb01 (master) and testdb02 (slave). I am switching the master to
testdb02.
2. masterha_master_switch --conf=cluster.conf --master_state=alive
--new_master_host=testdb02
What is the expected output? What do you see instead?
The script should configure testdb02 as master and testdb01 as a slave of testdb02. Instead it spews the following output.
===================================
Wed Sep 7 00:30:49 2011 - [info] MHA::MasterRotate version 0.51.
Wed Sep 7 00:30:49 2011 - [info] Starting online master switch..
Wed Sep 7 00:30:49 2011 - [info]
Wed Sep 7 00:30:49 2011 - [info] * Phase 1: Configuration Check Phase..
Wed Sep 7 00:30:49 2011 - [info]
Wed Sep 7 00:30:49 2011 - [warn] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Sep 7 00:30:49 2011 - [info] Reading application default configurations
from cluster.conf..
Wed Sep 7 00:30:49 2011 - [info] Reading server configurations from
cluster.conf..
Wed Sep 7 00:30:49 2011 - [info] Current Master: testdb01(192.168.12.10:3306)
Wed Sep 7 00:30:49 2011 - [info] Alive Slaves:
Wed Sep 7 00:30:49 2011 - [info] testdb02(192.168.12.11:3306)
Version=5.1.52-community-log (oldest major version between slaves)
log-bin:enabled
Wed Sep 7 00:30:49 2011 - [info] Replicating from
testdb01(192.168.12.10:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before
switching. Is it ok to execute on testdb01(192.168.12.10:3306)? (YES/no): yes
Wed Sep 7 00:31:01 2011 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES.
This may take long time..
Wed Sep 7 00:31:01 2011 - [info] ok.
Wed Sep 7 00:31:01 2011 - [info] Checking MHA is not monitoring or doing
failover..
Wed Sep 7 00:31:01 2011 - [info] Checking replication health on testdb02..
Wed Sep 7 00:31:01 2011 - [info] ok.
Wed Sep 7 00:31:01 2011 - [info] testdb02 can be new master.
Wed Sep 7 00:31:01 2011 - [info]
From:
testdb01 (current master)
+--testdb02
To:
testdb02 (new master)
Starting master switch from testdb01(192.168.12.10:3306) to
testdb02(192.168.12.11:3306)? (yes/NO): yes
Wed Sep 7 00:31:19 2011 - [info] ** Phase 1: Configuration Check Phase
completed.
Wed Sep 7 00:31:19 2011 - [info]
Wed Sep 7 00:31:19 2011 - [info] * Phase 2: Rejecting updates Phase..
Wed Sep 7 00:31:19 2011 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on
the current master manually, applications keep writing on the current master.
Is it ok to proceed? (yes/NO): yes
Wed Sep 7 00:31:40 2011 - [info] Locking all tables on the orig master to
reject updates from everybody (including root):
Wed Sep 7 00:31:40 2011 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Sep 7 00:31:40 2011 - [info] ok.
Wed Sep 7 00:31:40 2011 - [info] Orig master binlog:pos is
mysql-bin.000005:519.
Wed Sep 7 00:31:40 2011 - [info] Waiting to execute all relay logs on
testdb02(192.168.12.11:3306)..
Wed Sep 7 00:31:40 2011 - [info] master_pos_wait(mysql-bin.000005:519)
completed on testdb02(192.168.12.11:3306). Executed 0 events.
Wed Sep 7 00:31:40 2011 - [info] done.
Wed Sep 7 00:31:40 2011 - [info] Getting new master's binlog name and
position..
Wed Sep 7 00:31:40 2011 - [info] mysql-bin.000004:106
Wed Sep 7 00:31:40 2011 - [info] All other slaves should start replication
from here. Statement should be: CHANGE MASTER TO MASTER_HOST='testdb02 or
192.168.12.11', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000004',
MASTER_LOG_POS=106, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Wed Sep 7 00:31:40 2011 - [info]
Wed Sep 7 00:31:40 2011 - [info] * Switching slaves in parallel..
Wed Sep 7 00:31:40 2011 - [info]
Wed Sep 7 00:31:40 2011 - [info] Unlocking all tables on the orig master:
Wed Sep 7 00:31:40 2011 - [info] Executing UNLOCK TABLES..
Wed Sep 7 00:31:40 2011 - [info] ok.
Wed Sep 7 00:31:40 2011 - [info] All new slave servers switched successfully.
Wed Sep 7 00:31:40 2011 - [info]
Wed Sep 7 00:31:40 2011 - [info] * Phase 5: New master cleanup phease..
Wed Sep 7 00:31:40 2011 - [info]
Wed Sep 7 00:31:40 2011 - [info] Switching master to
testdb02(192.168.12.11:3306) completed successfully.
===============================
I want the script to configure the old master as a new slave, so that it connects to and gets updates from the new master.
What version of the product are you using? On what operating system?
CentOS 5.6.
Please provide any additional information below.
I have only two hosts defined in cluster.conf.
testdb01 and testdb02
Original issue reported on code.google.com by [email protected]
on 7 Sep 2011 at 6:04
This is not a problem report, rather a question or request for more
documentation.
Do you know the minimum required privileges for the mysql user which will be
used by mha to do the failover? I'd rather not use a full dba account if
possible.
Of course, I'm also setting mode 600 (chmod 600) on the cnf file containing the password.
Original issue reported on code.google.com by [email protected]
on 2 Jan 2013 at 9:30
1. Have one master and one slave.
2. When I run the command masterha_master_switch --master_state=alive
--conf=/etc/app1.cnf --new_master_host=db1, I get the following output. It
seems that the manager connects to the MySQL DB as root with no password:
Mon Nov 26 00:46:38 2012 - [info] MHA::MasterRotate version 0.53.
Mon Nov 26 00:46:38 2012 - [info] Starting online master switch..
Mon Nov 26 00:46:38 2012 - [info]
Mon Nov 26 00:46:38 2012 - [info] * Phase 1: Configuration Check Phase..
Mon Nov 26 00:46:38 2012 - [info]
Mon Nov 26 00:46:38 2012 - [info] Reading default configuratoins from
/etc/masterha_default.cnf..
Mon Nov 26 00:46:38 2012 - [info] Reading application default configurations
from /etc/app1.cnf..
Mon Nov 26 00:46:38 2012 - [info] Reading server configurations from
/etc/app1.cnf..
Mon Nov 26 00:46:38 2012 - [info] Current Alive Master: db1(10.0.1.248:3306)
Mon Nov 26 00:46:38 2012 - [info] Alive Slaves:
Mon Nov 26 00:46:38 2012 - [info] db3(10.0.1.49:3306)
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves)
log-bin:enabled
Mon Nov 26 00:46:38 2012 - [info] Replicating from
10.0.1.248(10.0.1.248:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before
switching. Is it ok to execute on db1(10.0.1.248:3306)? (YES/no):
Mon Nov 26 00:46:40 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES.
This may take long time..
Mon Nov 26 00:46:40 2012 - [info] ok.
Mon Nov 26 00:46:40 2012 - [info] Checking MHA is not monitoring or doing
failover..
Mon Nov 26 00:46:40 2012 - [info] Checking replication health on db3..
Mon Nov 26 00:46:40 2012 - [info] ok.
Mon Nov 26 00:46:40 2012 - [info] db3 can be new master.
Mon Nov 26 00:46:40 2012 - [info]
From:
db1 (current master)
+--db3
To:
db3 (new master)
Starting master switch from db1(10.0.1.248:3306) to db3(10.0.1.49:3306)?
(yes/NO): no
Continue? (yes/NO): yes
Enter new master host name: db3
Master switch to db3(10.0.1.49:3306). OK? (yes/NO): yes
Mon Nov 26 00:47:03 2012 - [info] Checking whether db3(10.0.1.49:3306) is ok
for the new master..
Mon Nov 26 00:47:03 2012 - [info] ok.
Mon Nov 26 00:47:03 2012 - [info] ** Phase 1: Configuration Check Phase
completed.
Mon Nov 26 00:47:03 2012 - [info]
Mon Nov 26 00:47:03 2012 - [info] * Phase 2: Rejecting updates Phase..
Mon Nov 26 00:47:03 2012 - [info]
Mon Nov 26 00:47:03 2012 - [info] Executing master ip online change script to
disable write on the current master:
Mon Nov 26 00:47:03 2012 - [info] /opt/scripts/master_ip_online_change
--command=stop --orig_master_host=db1 --orig_master_ip=10.0.1.248
--orig_master_port=3306 --new_master_host=db3 --new_master_ip=10.0.1.49
--new_master_port=3306
Got Error: DBI
connect(';host=10.0.1.49;port=3306;mysql_connect_timeout=4','',...) failed:
Access denied for user 'root'@'10.0.1.45' (using password: NO) at
/usr/share/perl5/MHA/DBHelper.pm line 181
at /opt/scripts/master_ip_online_change line 128
What is the expected output? What do you see instead?
It should connect to the DB with the username and password from the configuration file.
What version of the product are you using? On what operating system?
0.5.3
Please provide any additional information below.
Below are my conf files:
# mysql user and password
user=replication
repl_user=replication
repl_password=xxx
password=xxx
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1.log
[server1]
hostname=db1
ignore_fail=1
[server3]
hostname=db3
ignore_fail=1
--------------------
[server default]
user=replication
password=xxx
repl_password=xxx
ssh_user=root
master_binlog_dir= /var/log/mysql
remote_workdir=/data/log/masterha
ping_interval=3
master_ip_failover_script= /opt/scripts/master_ip_failover
master_ip_online_change_script= /opt/scripts/master_ip_online_change
Regards
Original issue reported on code.google.com by [email protected]
on 26 Nov 2012 at 1:06
The current documentation does not include the "$conf" parameter, which causes an
error when sending email notifications.
Proposed code for the (MHA Manager package)/samples/scripts/send_report script:
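use strict;
use warnings;
use Getopt::Long;

# MHA also passes --conf to the report script. If it is not declared here,
# GetOptions warns "Unknown option: conf" and returns false.
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $conf, $subject,
  $body );
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'conf=s'             => \$conf,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);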
Original issue reported on code.google.com by [email protected]
on 13 Oct 2011 at 6:47
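A rough Python analogue of the proposed option list, using argparse, illustrates the point: once "--conf" is declared, an invocation that includes it parses cleanly instead of tripping over an unknown option. This is only an illustration, assuming the report script receives long options named as in the Perl snippet above; build_parser is a hypothetical name and this is not the sample script itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare every option the report script may receive, including --conf."""
    parser = argparse.ArgumentParser(prog="send_report")
    parser.add_argument("--orig_master_host", dest="dead_master_host")
    parser.add_argument("--new_master_host")
    parser.add_argument("--new_slave_hosts")
    parser.add_argument("--conf")  # without this line, --conf is rejected as unknown
    parser.add_argument("--subject")
    parser.add_argument("--body")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--orig_master_host=db1", "--new_master_host=db3", "--conf=/etc/app1.cnf"]
    )
    print(args.dead_master_host, args.new_master_host, args.conf)
```

Running it prints "db1 db3 /etc/app1.cnf".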
What steps will reproduce the problem?
When the event scheduler is enabled, its thread can block the failover because
the manager treats it as a long-running update.
What is the expected output?
The event scheduler thread should be ignored during the failover process.
It's possible to work around this issue by running "SET GLOBAL event_scheduler=OFF"
on the MySQL servers (master and slaves).
What do you see instead?
Tue Jun 12 11:27:56 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES.
This may take long time..
Tue Jun 12 11:27:56 2012 - [info] ok.
Tue Jun 12 11:27:56 2012 - [info] Checking MHA is not monitoring or doing
failover..
Tue Jun 12 11:27:56 2012 - [info] Checking replication health on XXX.XX.XX.XX..
Tue Jun 12 11:27:56 2012 - [info] ok.
Tue Jun 12 11:27:56 2012 - [error][/.../MasterRotate.pm, ln161] We should not
start online master switch when one of connections are running long updates on
the current master. Currently 1 update thread(s) are running.
Details:
{'Time' => '48476','Command' => 'Daemon','db' => undef,'Id' => '1','Info' =>
undef,'User' => 'event_scheduler','State' => 'Waiting for next
activation','Host' => 'localhost'}
Tue Jun 12 11:27:56 2012 - [error][/.../ManagerUtil.pm, ln178] Got ERROR: at
/.../masterha_master_switch line 53
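The "update thread" in the error details is the event scheduler daemon itself (User 'event_scheduler', Command 'Daemon', State 'Waiting for next activation'), whose Time value here (48476 s) reflects waiting rather than a running write. A minimal sketch of a filter that leaves it out, assuming PROCESSLIST rows arrive as dicts keyed by the column names shown in the log; the function name and the Sleep exclusion are illustrative assumptions, not MHA's MasterRotate.pm logic.

```python
from typing import Iterable, List, Mapping


def long_update_threads(
    processlist: Iterable[Mapping[str, object]],
    min_seconds: int = 1,
) -> List[Mapping[str, object]]:
    """Return PROCESSLIST rows that look like client work running at least min_seconds."""
    busy = []
    for row in processlist:
        # The event scheduler daemon idles between activations; its Time is not a write.
        if row.get("User") == "event_scheduler" and row.get("Command") == "Daemon":
            continue
        # Idle client connections are not updates either (illustrative assumption).
        if row.get("Command") == "Sleep":
            continue
        # Values may arrive as strings, e.g. 'Time' => '48476' in the log above.
        seconds = int(row.get("Time") or 0)
        if seconds >= min_seconds:
            busy.append(row)
    return busy
```

With the row from the log as the only input, the function returns an empty list, so in this sketch the switch would not be blocked.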
What version of the product are you using? On what operating system?
0.53
Debian 6
Original issue reported on code.google.com by [email protected]
on 12 Jun 2012 at 4:36
What steps will reproduce the problem?
1. Enable the shutdown script
2. masterha_check_repl --conf=/etc/app1.cnf
What is the expected output? What do you see instead?
Checking shutdown script status:
What version of the product are you using? On what operating system?
0.53
Please provide any additional information below.
Can't locate Net/Telnet.pm in @INC (@INC contains: /usr/local/lib64/perl5
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at
/usr/local/sample/bin/power_manager line 27
I tried to find this Telnet.pm file in the older version 0.52, but I
could not. :(
Original issue reported on code.google.com by [email protected]
on 16 Feb 2012 at 11:09
What steps will reproduce the problem?
1. git clone the sources of MHA node (or manager)
2. Follow the steps in the wiki:
$ perl Makefile.PL
$ make
$ sudo make install
3. It installs the "bin" files in /usr/local/bin, and my system (Ubuntu) doesn't see
them.
These files are: apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs
save_binary_logs
If I use the deb package, dpkg installs these files in /usr/bin and everything works.
So how can I change the installation directory for the 'bin' files?
Thanks in advance.
PS: as a temporary solution I created symbolic links to these files in /usr/bin/,
but that is neither clean nor easy when deploying on many machines.
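Before adding symlinks, it may help to confirm which of the four tools are actually visible on a given PATH. A small sketch using Python's shutil.which; locate_node_tools is a hypothetical name. The PATH seen by a non-interactive ssh session may differ from a login shell's, so it is worth running the check the same way the manager will connect.

```python
import os
import shutil
from typing import Dict, Optional

# The four MHA Node tools named in the report.
NODE_TOOLS = ("apply_diff_relay_logs", "filter_mysqlbinlog",
              "purge_relay_logs", "save_binary_logs")


def locate_node_tools(path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool to the first executable found on `path` (None if missing).

    When `path` is None, the current process's PATH is searched.
    """
    search = path if path is not None else os.environ.get("PATH", "")
    return {tool: shutil.which(tool, path=search) for tool in NODE_TOOLS}


if __name__ == "__main__":
    for tool, location in locate_node_tools().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```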
Original issue reported on code.google.com by [email protected]
on 15 Nov 2012 at 2:19
> What steps will reproduce the problem?
1. Configure passwordless ssh access on all servers with user 'root'
2. Use virtual IP on current master as server1
3. Run masterha_check_ssh on manager host
> What is the expected output? What do you see instead?
All SSH checks work manually, so the script is expected to confirm this,
but instead I get the output below:
Wed Jul 18 08:50:38 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Jul 18 08:50:38 2012 - [info] Reading application default configurations
from /etc/app1.cnf..
Wed Jul 18 08:50:38 2012 - [info] Reading server configurations from
/etc/app1.cnf..
Wed Jul 18 08:50:38 2012 - [info] Starting SSH connection tests..
Wed Jul 18 08:50:38 2012 - [debug]
Wed Jul 18 08:50:38 2012 - [debug] Connecting via SSH from
[email protected](10.0.0.50:22) to [email protected](10.0.0.14:22)..
Wed Jul 18 08:50:38 2012 - [debug] ok.
Wed Jul 18 08:50:38 2012 - [debug] Connecting via SSH from
[email protected](10.0.0.50:22) to [email protected](10.0.0.12:22)..
Wed Jul 18 08:50:38 2012 - [debug] ok.
Wed Jul 18 08:50:39 2012 - [debug]
Wed Jul 18 08:50:38 2012 - [debug] Connecting via SSH from
[email protected](10.0.0.14:22) to [email protected](10.0.0.50:22)..
Wed Jul 18 08:50:38 2012 - [debug] ok.
Wed Jul 18 08:50:38 2012 - [debug] Connecting via SSH from
[email protected](10.0.0.14:22) to [email protected](10.0.0.12:22)..
Wed Jul 18 08:50:39 2012 - [debug] ok.
Wed Jul 18 08:50:39 2012 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln63]
Wed Jul 18 08:50:39 2012 - [debug] Connecting via SSH from
[email protected](10.0.0.12:22) to [email protected](10.0.0.50:22)..
Permission denied (publickey,password).
Wed Jul 18 08:50:39 2012 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln107] SSH
connection from [email protected](10.0.0.12:22) to [email protected](10.0.0.50:22)
failed!
SSH Configuration Check Failed!
at /usr/bin/masterha_check_ssh line 44
(The following is a manual test run immediately after the failed check.)
root@staging:~# ssh -b 10.0.0.12 -l root 10.0.0.50
Linux live1 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
You have new mail.
Last login: Wed Jul 18 14:03:09 2012 from staging
> What version of the product are you using? On what operating system?
I am using masterha 0.53 on debian 6
> Please provide any additional information below.
I strace'd the run and looked at the logs, and it seems this has something to do
with the temp file created in $workdir for each check. But the script also
removes these temp files, so we cannot tell what in the log for this check
makes the script report it as a failed connection attempt.
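The log shows the check runs for every ordered pair of hosts, so a hop such as 10.0.0.12 to 10.0.0.50 has to work non-interactively from 10.0.0.12 itself. A hypothetical helper that builds one nested, non-interactive command per ordered pair, so each hop can be replayed from the manager; BatchMode=yes makes ssh fail instead of falling back to a password prompt. The command shape is an assumption, not what SSHCheck.pm runs internally.

```python
from itertools import permutations
from typing import List, Sequence

# Non-interactive options: fail instead of prompting for a password or host key.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no"]


def pairwise_ssh_commands(hosts: Sequence[str], user: str = "root") -> List[List[str]]:
    """For every ordered (src, dst) pair, build an argv that logs into src and,
    from there, opens a second non-interactive ssh session to dst."""
    commands = []
    for src, dst in permutations(hosts, 2):
        inner = " ".join(["ssh", *SSH_OPTS, f"{user}@{dst}", "exit", "0"])
        commands.append(["ssh", *SSH_OPTS, f"{user}@{src}", inner])
    return commands


if __name__ == "__main__":
    for argv in pairwise_ssh_commands(["10.0.0.50", "10.0.0.14", "10.0.0.12"]):
        print(argv)
```

Each argv can be passed to subprocess.run; a non-zero exit code points at the failing hop.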
Original issue reported on code.google.com by [email protected]
on 18 Jul 2012 at 1:06
What steps will reproduce the problem?
I have set up 2 different versions of MySQL on the same machine, made one the
master of the other,
and followed all the steps given in
http://code.google.com/p/mysql-master-ha/wiki/Tutorial#Installing_MHA_Manager_on_host4%28manager_host%29
But when I test master failover by killing the master, I get the following output:
Tue Nov 8 21:24:28 2011 - [info] Ping succeeded, sleeping until it doesn't
respond..
Tue Nov 8 21:24:49 2011 - [warning] Got error on MySQL ping: 2006 (MySQL
server has gone away)
Tue Nov 8 21:24:49 2011 - [info] Executing seconary network check script:
masterha_secondary_check -s remote_host1 -s remote_host2 --user=root
--master_host=127.0.0.1 --master_ip=127.0.0.1 --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov 8 21:24:49 2011 - [warning] At least one of monitoring servers is not
reachable from this script. This is likely network problem. Failover should not
happen.
Tue Nov 8 21:24:49 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov 8 21:24:52 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:24:52 2011 - [warning] Connection failed 1 time(s)..
Tue Nov 8 21:24:55 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:24:55 2011 - [warning] Connection failed 2 time(s)..
Tue Nov 8 21:24:58 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:24:58 2011 - [warning] Connection failed 3 time(s)..
Tue Nov 8 21:24:58 2011 - [warning] Secondary network check script returned
errors. Failover should not start so checking server status again. Check
network settings for details.
Tue Nov 8 21:25:01 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:25:01 2011 - [warning] Connection failed 1 time(s)..
Tue Nov 8 21:25:01 2011 - [info] Executing seconary network check script:
masterha_secondary_check -s remote_host1 -s remote_host2 --user=root
--master_host=127.0.0.1 --master_ip=127.0.0.1 --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov 8 21:25:01 2011 - [warning] At least one of monitoring servers is not
reachable from this script. This is likely network problem. Failover should not
happen.
Tue Nov 8 21:25:01 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov 8 21:25:04 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:25:04 2011 - [warning] Connection failed 2 time(s)..
Tue Nov 8 21:25:07 2011 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Tue Nov 8 21:25:07 2011 - [warning] Connection failed 3 time(s)..
Tue Nov 8 21:25:07 2011 - [warning] Secondary network check script returned
errors. Failover should not start so checking server status again. Check
network settings for details.
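Here "remote_host1" and "remote_host2" appear to be example names that do not resolve ("Could not resolve hostname remote_host1"), so every round ends with "At least one of monitoring servers is not reachable from this script ... Failover should not happen." The rule those messages describe can be sketched as follows; RouteResult and failover_allowed are hypothetical names, and the sketch assumes each monitoring server reports whether it could reach the master.

```python
from enum import Enum
from typing import Mapping


class RouteResult(Enum):
    MONITOR_UNREACHABLE = "monitor_unreachable"  # could not ssh to the monitoring server
    MASTER_REACHABLE = "master_reachable"        # master port answered from that route
    MASTER_UNREACHABLE = "master_unreachable"    # master port did not answer


def failover_allowed(results: Mapping[str, RouteResult]) -> bool:
    """Allow failover only when every monitoring server answered and none of
    them could reach the master. An empty result set never allows failover."""
    if not results:
        return False
    return all(r is RouteResult.MASTER_UNREACHABLE for r in results.values())
```

With an unresolvable monitoring host in the list, failover_allowed can never return True, which matches the repeating loop in this log.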
What version of the product are you using? On what operating system?
On Linux
Original issue reported on code.google.com by [email protected]
on 8 Nov 2011 at 4:00
I'd like to specify the "identity_file" or ssh "options" inside the conf files,
because I'm trying to use "ssh_user=mysql", but the "id_rsa" file is in mysql's
home directory.
So when I run "masterha_check_ssh" with root privileges, ssh uses
"/root/.ssh/id_rsa".
Original issue reported on code.google.com by [email protected]
on 23 Nov 2011 at 4:09
What steps will reproduce the problem?
Install a later version of Log::Dispatch
What is the expected output? What do you see instead?
The error we get when we run the master switch is:
Sun Nov 11 05:15:09 2012 - [info] MHA::MasterRotate version 0.53.
Sun Nov 11 05:15:09 2012 - [info] Starting online master switch..
Sun Nov 11 05:15:09 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR: Use of uninitialized value in scalar chomp at /usr/lib/perl5/vendor_perl/MHA/ManagerConst.pm line 90.
What version of the product are you using? On what operating system?
0.53 Redhat linux 5.6
Please provide any additional information below.
When we try to fail over and a newer version of the module (Log::Dispatch) is installed, the error message is not helpful.
Original issue reported on code.google.com by [email protected]
on 11 Nov 2012 at 12:16
What steps will reproduce the problem?
1. define relay_log with an absolute path in my.cnf (eg:
relay_log=/var/lib/mysql/logs/relay-log)
2. define datadir in my.cnf (eg: datadir=/var/lib/mysql/data)
What is the expected output? What do you see instead?
The relay_log_file should be "/var/lib/mysql/logs/relay-log", instead of
"/var/lib/mysql/data//var/lib/mysql/logs/relay-log.info"
What version of the product are you using? On what operating system?
0.52 on CentOS 5.7 (using RPM dist)
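The reported path looks like datadir concatenated in front of an already absolute relay_log path. A minimal sketch of the expected resolution, using a hypothetical helper rather than MHA's SlaveUtil code: an absolute relay_log is used as-is, and only a relative one is placed under datadir.

```python
import os.path


def resolve_relay_log(relay_log: str, datadir: str) -> str:
    """Return the effective relay log base path for a relay_log setting."""
    if os.path.isabs(relay_log):
        # Absolute setting: datadir must not be prepended.
        return relay_log
    # Relative setting: interpreted relative to datadir.
    return os.path.join(datadir, relay_log)
```

For example, resolve_relay_log("/var/lib/mysql/logs/relay-log", "/var/lib/mysql/data") returns "/var/lib/mysql/logs/relay-log".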
Original issue reported on code.google.com by [email protected]
on 23 Nov 2011 at 5:18
What steps will reproduce the problem?
There is a simple master-slave replication setup.
For the test:
172.16.50.11 - master
172.16.50.14 - slave
I am testing the functionality of secondary_check_script.
In the MHA conf I added:
secondary_check_script = masterha_secondary_check -s 172.16.50.14
On 172.16.50.11 I run masterha_master_monitor to see how failover proceeds
and how masterha_secondary_check works:
masterha_master_monitor --conf=/etc/mha_manager/app1.cnf
After starting the manager, I shut down the master 172.16.50.11 from another terminal,
but unfortunately I got the following error messages:
#############
#############
Fri Nov 16 16:30:10 2012 - [info]
172.16.50.11 (current master)
+--172.16.50.14
Fri Nov 16 16:30:10 2012 - [warning] master_ip_failover_script is not defined.
Fri Nov 16 16:30:10 2012 - [warning] shutdown_script is not defined.
Fri Nov 16 16:30:10 2012 - [info] Set master ping interval 3 seconds.
Fri Nov 16 16:30:10 2012 - [info] Set secondary check script:
masterha_secondary_check -s 172.16.50.14
Fri Nov 16 16:30:10 2012 - [info] Starting ping health check on
172.16.50.11(172.16.50.11:3306)..
Fri Nov 16 16:30:10 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL
doesn't respond..
Fri Nov 16 16:30:22 2012 - [warning] Got error on MySQL select ping: 2006
(MySQL server has gone away)
Fri Nov 16 16:30:22 2012 - [info] Executing SSH check script: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/home/mysqldata/
--output_file=/home/mha_manager_data/app1/save_binary_logs_test
--manager_version=0.54 --binlog_prefix=mysql-bin
Fri Nov 16 16:30:22 2012 - [info] Executing seconary network check script:
masterha_secondary_check -s 172.16.50.14 --user=mha4mysql
--master_host=172.16.50.11 --master_ip=172.16.50.11 --master_port=3306
command-line line 0: invalid time value.
Monitoring server 172.16.50.14 is NOT reachable!
Fri Nov 16 16:30:22 2012 - [warning] At least one of monitoring servers is not
reachable from this script. This is likely network problem. Failover should not
happen.
Creating /home/mha_manager_data/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /home/mysqldata/, up to mysql-bin.000016
Fri Nov 16 16:30:23 2012 - [info] HealthCheck: SSH to 172.16.50.11 is reachable.
#############
#############
So I started debugging it.
In masterha_secondary_check, at line 78, I found the place where $command is constructed.
I added "print $command" and got the constructed command:
ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes
-o ConnectTimeout=VAR_CONNECT_TIMEOUT -p 22 [email protected] "perl -e
\"use IO::Socket::INET; my \\\$sock = IO::Socket::INET->new(PeerAddr =>
\\\"172.16.50.11\\\", PeerPort=> 3306, Proto =>'tcp', Timeout => 4);
if(\\\$sock) { close(\\\$sock); exit 3; } exit 0;\" "
For some reason, the literal VAR_CONNECT_TIMEOUT placeholder appears here.
If I comment out (or erase) the part with VAR_CONNECT_TIMEOUT, it works: it can
connect to 172.16.50.14, and the MHA manager can use this check correctly.
Is this a bug, or did I forget to configure something?
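In the printed command, the remote part opens a TCP connection to the master and exits 3 if it succeeds and 0 otherwise, while the unexpanded "ConnectTimeout=VAR_CONNECT_TIMEOUT" is presumably what ssh rejects with "command-line line 0: invalid time value". A minimal Python sketch of both pieces, with hypothetical function names; it is not a patch to masterha_secondary_check.

```python
import socket

# Literal placeholder that was left unexpanded in the printed ssh command.
PLACEHOLDER = "VAR_CONNECT_TIMEOUT"


def fill_connect_timeout(template: str, seconds: int) -> str:
    """Replace the placeholder with a concrete ConnectTimeout value in seconds."""
    if seconds <= 0:
        raise ValueError("ConnectTimeout must be a positive number of seconds")
    return template.replace(PLACEHOLDER, str(seconds))


def probe_master(host: str, port: int, timeout: float = 4.0) -> int:
    """Mirror the remote perl one-liner: return 3 if a TCP connection to
    host:port succeeds, 0 if it fails (refused, unreachable or timed out)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 3
    except OSError:
        return 0


if __name__ == "__main__":
    template = "ssh -o BatchMode=yes -o ConnectTimeout=VAR_CONNECT_TIMEOUT -p 22 host"
    print(fill_connect_timeout(template, 5))
    print(probe_master("127.0.0.1", 3306))
```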
Original issue reported on code.google.com by [email protected]
on 16 Nov 2012 at 1:09
Consider the following situation:
There are 4 machines:
m1 - current master
m2 - slave, candidate to master
m3 - slave, candidate to master
m4 - slave, cannot be master
They are using asynchronous replication.
When m1 is completely dead (powered off, for example), the MHA manager starts a failover.
One step of the failover is synchronizing binlogs between slaves: MHA finds the
slave with the newest replication position, downloads the binlog from it and applies
the diffs to the other slaves.
There is a chance that m4 will have the newest data.
Will MHA copy the binlog from m4 and apply the diffs to m2 and m3, if m4 is defined as
"cannot be master"?
Or will MHA compare logs only between m2 and m3?
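Whether MHA uses m4 as the diff source is exactly the open question here; the sketch below only illustrates how "newest replication position" can be compared across slaves, using the Master_Log_File and Read_Master_Log_Pos columns of SHOW SLAVE STATUS. It assumes all slaves replicate from the same master and that binlog names follow the usual prefix.number pattern; the function names are hypothetical.

```python
from typing import Mapping, Tuple


def binlog_position_key(master_log_file: str, read_pos: int) -> Tuple[int, int]:
    """Turn (Master_Log_File, Read_Master_Log_Pos) into a sortable key.

    Assumes names like mysql-bin.000016, so the numeric suffix orders the files.
    """
    suffix = master_log_file.rsplit(".", 1)[-1]
    return int(suffix), int(read_pos)


def most_advanced(slaves: Mapping[str, Tuple[str, int]]) -> str:
    """Return the name of the slave that has received the most binlog data.

    Whether a slave may become master plays no role here: only positions are compared.
    """
    return max(slaves, key=lambda name: binlog_position_key(*slaves[name]))
```

With m2 and m3 at mysql-bin.000016 and m4 already reading mysql-bin.000017, most_advanced returns "m4" even though m4 cannot become master.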
Original issue reported on code.google.com by [email protected]
on 20 Dec 2012 at 2:19
Could mysql-master-ha add an option to set ssh_port? Many SSH servers do not
run on the standard port 22!
Original issue reported on code.google.com by unix114
on 21 Oct 2011 at 8:08
What steps will reproduce the problem?
1. Execute sudo ./masterha_check_repl --conf=/etc/app1.cnf
2. Expected output:
MySQL Replication Health
3. Output got:
Mon Oct 31 16:58:54 2011 - [info] Reading default configuratoins
from /etc/masterha_default.cnf..
Mon Oct 31 16:58:54 2011 - [info] Reading application default configurations
from /etc/app1.cnf..
Mon Oct 31 16:58:54 2011 - [info] Reading server configurations from
/etc/app1.cnf..
Mon Oct 31 16:58:54 2011 - [info] MHA::MasterMonitor version 0.52.
Mon Oct 31 16:58:54 2011 -
[error][/usr/local/share/perl/5.10.1/MHA/MasterMonitor.pm, ln315] Error happend
on checking configurations. Use of uninitialized value $datadir in
concatenation (.) or string at /usr/local/share/perl/5.10.1/MHA/SlaveUtil.pm
line 123.
Mon Oct 31 16:58:54 2011 -
[error][/usr/local/share/perl/5.10.1/MHA/MasterMonitor.pm, ln396] Error
happened on monitoring servers.
Mon Oct 31 16:58:54 2011 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 31 Oct 2011 at 11:33
What steps will reproduce the problem?
MHA manager uses ssh to manage nodes with an ssh_user defined in the configuration
file. Some commands need admin rights on the nodes, meaning that by
default, if you do not set ssh_user to root, it will not work.
In a production environment, ssh as root is not allowed.
ssh_user should be a sudoer with no password, and on the manager, commands should use
sudo (apply_diff_relay_logs, save_binary_logs).
Thanks for your work. This is a very nice project.
Original issue reported on code.google.com by [email protected]
on 19 Apr 2012 at 9:56
What steps will reproduce the problem?
1. Run masterha_check_ssh
What is the expected output? What do you see instead?
SSH connection errors, because my servers listen for SSH on a non-standard port.
What version of the product are you using? On what operating system?
manager is 0.55
Please provide any additional information below.
Adding a configuration option for ssh_port would be very useful. The nuisance of
it is that ssh uses "-p" for the port and scp uses "-P".
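A minimal sketch of how a single port setting could feed both tools, assuming a hypothetical ssh_port value; the helper names are illustrative only. The asymmetry is real: ssh takes the port with lowercase -p, while scp uses uppercase -P because its lowercase -p preserves modification times and modes.

```python
from typing import List


def ssh_argv(user: str, host: str, port: int, command: str) -> List[str]:
    """Build an ssh command line; ssh takes the port with lowercase -p."""
    return ["ssh", "-p", str(port), f"{user}@{host}", command]


def scp_argv(user: str, host: str, port: int, local: str, remote: str) -> List[str]:
    """Build an scp command line; scp takes the port with uppercase -P."""
    return ["scp", "-P", str(port), local, f"{user}@{host}:{remote}"]
```

Keeping the flag choice inside two small builders means the port is configured once and the -p/-P difference cannot leak into callers.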
I'm just getting started but this project looks to have very nicely addressed a
complicated process in an elegant way. Thank you.
Original issue reported on code.google.com by [email protected]
on 20 Dec 2012 at 10:48
Hi,
An error occurs when I switch, either manually or automatically:
masterha_master_switch --master_state=dead --conf=/etc/app1.conf --dead_master_host=10.58.99.69 --new_master_host=10.58.99.71
The error is as follows:
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerUtil.pm, ln178] Got ERROR:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerConst.pm line 90.
I cannot switch!
Could you advise what causes this?
thanks...
Original issue reported on code.google.com by [email protected]
on 27 Jul 2012 at 9:41
1. Run masterha_check_repl --conf=/etc/masterha/app1.cnf
2. Get this error:
[root@EGSNS-49-2 bin]# ./masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Jun 1 14:13:46 2012 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Fri Jun 1 14:13:46 2012 - [info] Reading application default configurations
from /etc/masterha/app1.cnf..
Fri Jun 1 14:13:46 2012 - [info] Reading server configurations from
/etc/masterha/app1.cnf..
Fri Jun 1 14:13:46 2012 - [info] MHA::MasterMonitor version 0.53.
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm,
ln151] Can't exec "apply_diff_relay_logs": No such file or directory at
/usr/local/share/perl5/MHA/ManagerUtil.pm line 116.
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln383] Error happend on checking configurations. Died at
/usr/local/share/perl5/MHA/ManagerUtil.pm line 152.
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln478] Error happened on monitoring servers.
Fri Jun 1 14:13:46 2012 - [info] Got exit code 1 (Not master dead).
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm,
ln122] Got error when getting node version. Error:
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm,
ln123]
MySQL Replication Health is NOT OK!
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm,
ln151] Use of uninitialized value $host in concatenation (.) or string at
/usr/local/share/perl5/MHA/ManagerUtil.pm line 139.
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln383] Error happend on checking configurations. Died at
/usr/local/share/perl5/MHA/ManagerUtil.pm line 152.
Fri Jun 1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm,
ln478] Error happened on monitoring servers.
Fri Jun 1 14:13:46 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NO
3. This is my /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/
user=root
password=emag@234
ssh_user=root
repl_user=rep
repl_password=rep
shutdown_script=""
master_ip_failover_script="/apps/mha4mysql-manager-0.53/samples/scripts/master_ip_failover"
report_script=""
remote_workdir=/apps/mha4mysql-node-0.53/section1
[server1]
hostname=192.168.49.9
[server2]
hostname=192.168.49.2
candidate_master=1
[server3]
hostname=192.168.49.1
[server4]
hostname=192.168.49.3
[server5]
hostname=192.168.49.4
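Separately from the "Can't exec apply_diff_relay_logs" error, remote_workdir is assigned twice under [server default] in this config (/ and /apps/mha4mysql-node-0.53/section1), so at most one of the two values can take effect. A small, hypothetical linter (not part of MHA) that flags repeated keys per section:

```python
from typing import List, Tuple


def duplicate_keys(cnf_text: str) -> List[Tuple[str, str]]:
    """Return (section, key) pairs assigned more than once in an INI-style file.

    Keys that appear before the first [section] are reported under ''.
    """
    seen = set()
    dups = []
    section = ""
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if (section, key) in seen:
            dups.append((section, key))
        else:
            seen.add((section, key))
    return dups
```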
Please help me understand why this error happens.
OS:redhat 6.2 64bit
Mysql: 5.5.22( build from source )
basedir: /apps/mysql
datadir: /apps/mysql/data
current master:192.168.49.9
standby master:192.168.49.1
masterha_check_ssh:OK
Please help me check this and give me advice as soon as possible.
Thanks,
[email protected]
Original issue reported on code.google.com by [email protected]
on 1 Jun 2012 at 6:28