twitter-archive / mysos Goto Github PK

View Code? Open in Web Editor NEW

592.0 592.0 70.0 720 KB

Cotton (formerly known as Mysos)

Home Page: https://incubator.apache.org/projects/cotton.html

mysos's People

Contributors

Stargazers

Watchers

mysos's Issues

slave error: Failed to fetch URIs for container

I'm trying to run mysos on Openstack. Since I'm not using vagrant ( unable to start a VM inside an opentack instance) I had to change the scripts. here is the list of modifications:
• changed the hardcoded ip address to the host private ip address on config and script files.
• changed the username from vagrant to ubuntu on config and script files.
I was able to install and start zookeeper, mesos and mysos-scheduler. they are all connected through zookeeper.
here is how I run different services:

mesos-master:


sudo mesos-master \
--zk=zk://1.125.1.5:2181/mesos/master \
--ip=1.125.1.5 \
--work_dir=/home/ubuntu/var/local/mesos/master/db \
--quorum=1 \
--roles=mysos \
--credentials=/home/ubuntu/mysos/vagrant/etc/framework_keys.txt \
--log_dir=/home/ubuntu/log-mysos/master \
 --no-authenticate_slave

mesos-slave:


sudo mesos-slave \ 
--master=zk://1.125.1.5.1:2181/mesos/master \ 
--ip=1.125.1.5 \ 
--hostname=1.125.1.5 \ 
--resources="cpus(mysos):4;mem(mysos):1024;disk(mysos):20000;ports(mysos):[31000-32000]" \ 
--isolation="cgroups/cpu,cgroups/mem" \ 
--cgroups_enable_cfs \ 
--log_dir=/home/ubuntu/log-mysos/slave  \ 
--frameworks_home=/home/ubuntu/mysos/vagrant/bin

mysos-scheduler:


mysos_scheduler \
    --port=55001 \
    --framework_user=ubuntu \
    --mesos_master=zk://1.125.1.5:2181/mesos/master \
    --executor_uri=/home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zip \
    --executor_cmd=/home/ubuntu/mysos/vagrant/bin/mysos_executor.sh \
    --zk_url=zk://1.125.1.5:2181/mysos \
    --admin_keypath=/home/ubuntu/mysos/vagrant/etc/admin_keyfile.yml \
    --framework_failover_timeout=1m \
    --framework_role=mysos \
    --framework_authentication_file=/home/ubuntu/mysos/vagrant/etc/fw_auth_keyfile.yml \
    --scheduler_keypath=/home/ubuntu/mysos/vagrant/etc/scheduler_keyfile.txt \
    --executor_source_prefix='vagrant.devcluster' \
    --executor_environ='[{"name": "MYSOS_DEFAULTS_FILE", "value": "/etc/mysql/conf.d/my5.6.cnf"}]'

now, when I try to create a cluster using the following command:

curl -X POST mysos_host_ip:55001/clusters/test_cluster3 --form "cluster_user=mysos"

on mysos scheduler:


I0707 14:12:27.885504 13297 connection.py:276] Sending request(xid=119): Exists(path='/mysos/state/clusters', watcher=None)
I0707 14:12:27.886852 13297 connection.py:360] Received response(xid=119): ZnodeStat(czxid=9180, mzxid=9180, ctime=1436278298268, mtime=1436278298268, version=0, cversion=1, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=9181)
I0707 14:12:27.887795 13297 connection.py:276] Sending request(xid=120): Exists(path='/mysos/state/clusters/test_cluster2', watcher=None)
I0707 14:12:27.888803 13297 connection.py:360] Received response(xid=120): ZnodeStat(czxid=9181, mzxid=9214, ctime=1436278298278, mtime=1436278345754, version=29, cversion=0, aversion=0, ephemeralOwner=0, dataLength=1400, numChildren=0, pzxid=9181)
I0707 14:12:27.889123 13297 connection.py:276] Sending request(xid=121): SetData(path='/mysos/state/clusters/test_cluster2', data="ccopy_reg\n_reconstructor\np1\n(cmysos.scheduler.state\nMySQLCluster\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'encrypted_password'\np6\ng1\n(cnacl.utils\nEncryptedMessage\np7\nc__b  uiltin__\nstr\np8\nS'A\\xdeI!J\\xef\\x86\\xae\\\\\\xd3\\x92!\\xef0\\xab\\x91\\xf1\\xab\\xbcP\\x95\\xd1\\x18<\\xcf\\xe5}Gu\\xc9\\nK&\\xd3\\x0eAF\\x80D\\x89T\\x06s\\xc1w\\xf4\\x1c\\xe9\\xa4\\xe5\\x10$\\xa2\\x94\\r\\x86\\x00=8&\\xff'\ntRp9\n(dp10\nS'_ciphertext'\np11\nS'\\xcf\\xe5}Gu\\xc9\\nK&\\xd3\\x0eAF\\x80D\\x89T  \\x06s\\xc1w\\xf4\\x1c\\xe9\\xa4\\xe5\\x10$\\xa2\\x94\\r\\x86\\x00=8&\\xff'\np12\nsS'_nonce'\np13\nS'A\\xdeI!J\\xef\\x86\\xae\\\\\\xd3\\x92!\\xef0\\xab\\x91\\xf1\\xab\\xbcP\\x95\\xd1\\x18<'\np14\nsbsS'backup_id'\np15\nNsS'name'\np16\nS'test_cluster2'\np17\nsS'mem'\np18\ng1\n(ctwitter.common.quantity\nAmount\np19\n  g3\nNtRp20\n(dp21\nS'_unit'\np22\ng1\n(ctwitter.common.quantity\nData\np23\ng3\nNtRp24\n(dp25\nS'_multiplier'\np26\nI1048576\nsS'_display'\np27\nS'MB'\np28\nsbsS'_amount'\np29\nI512\nsbsS'cpus'\np30\nF1\nsS'num_nodes'\np31\nI1\nsS'tasks'\np32\n(dp33\nsS'user'\np34\nS'mysos'\np35\nsS'members'\np36\n(dp37\nsS'master  _id'\np38\nNsS'next_epoch'\np39\nI0\nsS'next_id'\np40\nI15\nsS'disk'\np41\ng1\n(g19\ng3\nNtRp42\n(dp43\ng22\ng1\n(g23\ng3\nNtRp44\n(dp45\ng26\nI1073741824\nsg27\nS'GB'\np46\nsbsg29\nI2\nsbsb.", version=-1)
I0707 14:12:27.909009 13297 connection.py:360] Received response(xid=121): ZnodeStat(czxid=9181, mzxid=9215, ctime=1436278298278, mtime=1436278347889, version=30, cversion=0, aversion=0, ephemeralOwner=0, dataLength=1133, numChildren=0, pzxid=9181)
I0707 14:12:27.909272 13297 launcher.py:484] Checkpointed the status update for task mysos-test_cluster2-14 of cluster test_cluster2
I0707 14:12:28.751266 13297 launcher.py:185] Launcher test_cluster2 accepted offer 20150707-140838-83983617-5050-13042-22 on Mesos slave 20150707-140838-83983617-5050-13042-0 (1.125.1.5)
I0707 14:12:28.751960 13297 launcher.py:305] Executor will use environment variable: {u'name': u'MYSOS_DEFAULTS_FILE', u'value': u'/etc/mysql/conf.d/my5.6.cnf'}
I0707 14:12:28.752923 13297 connection.py:276] Sending request(xid=122): Exists(path='/mysos/state/clusters', watcher=None)
I0707 14:12:28.754126 13297 connection.py:360] Received response(xid=122): ZnodeStat(czxid=9180, mzxid=9180, ctime=1436278298268, mtime=1436278298268, version=0, cversion=1, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=9181)
I0707 14:12:28.754930 13297 connection.py:276] Sending request(xid=123): Exists(path='/mysos/state/clusters/test_cluster2', watcher=None)
I0707 14:12:28.755887 13297 connection.py:360] Received response(xid=123): ZnodeStat(czxid=9181, mzxid=9215, ctime=1436278298278, mtime=1436278347889, version=30, cversion=0, aversion=0, ephemeralOwner=0, dataLength=1133, numChildren=0, pzxid=9181)
I0707 14:12:28.756345 13297 connection.py:276] Sending request(xid=124): SetData(path='/mysos/state/clusters/test_cluster2', data="ccopy_reg\n_reconstructor\np1\n(cmysos.scheduler.state\nMySQLCluster\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'encrypted_password'\np6\ng1\n(cnacl.utils\nEncryptedMessage\np7\nc__b  uiltin__\nstr\np8\nS'A\\xdeI!J\\xef\\x86\\xae\\\\\\xd3\\x92!\\xef0\\xab\\x91\\xf1\\xab\\xbcP\\x95\\xd1\\x18<\\xcf\\xe5}Gu\\xc9\\nK&\\xd3\\x0eAF\\x80D\\x89T\\x06s\\xc1w\\xf4\\x1c\\xe9\\xa4\\xe5\\x10$\\xa2\\x94\\r\\x86\\x00=8&\\xff'\ntRp9\n(dp10\nS'_ciphertext'\np11\nS'\\xcf\\xe5}Gu\\xc9\\nK&\\xd3\\x0eAF\\x80D\\x89T  \\x06s\\xc1w\\xf4\\x1c\\xe9\\xa4\\xe5\\x10$\\xa2\\x94\\r\\x86\\x00=8&\\xff'\np12\nsS'_nonce'\np13\nS'A\\xdeI!J\\xef\\x86\\xae\\\\\\xd3\\x92!\\xef0\\xab\\x91\\xf1\\xab\\xbcP\\x95\\xd1\\x18<'\np14\nsbsS'backup_id'\np15\nNsS'name'\np16\nS'test_cluster2'\np17\nsS'mem'\np18\ng1\n(ctwitter.common.quantity\nAmount\np19\n  g3\nNtRp20\n(dp21\nS'_unit'\np22\ng1\n(ctwitter.common.quantity\nData\np23\ng3\nNtRp24\n(dp25\nS'_multiplier'\np26\nI1048576\nsS'_display'\np27\nS'MB'\np28\nsbsS'_amount'\np29\nI512\nsbsS'cpus'\np30\nF1\nsS'num_nodes'\np31\nI1\nsS'tasks'\np32\n(dp33\nVmysos-test_cluster2-15\np34\ng1\n(cmysos.scheduler.state\nMySQL  Task\np35\ng3\nNtRp36\n(dp37\nS'hostname'\np38\nV1.125.1.5\np39\nsS'task_id'\np40\ng34\nsS'mesos_slave_id'\np41\nV20150707-140838-83983617-5050-13042-0\np42\nsS'cluster_name'\np43\ng17\nsS'state'\np44\nI6\nsS'port'\np45\nI31400\nsbssS'user'\np46\nS'mysos'\np47\nsS'members'\np48\n(dp49\nsS'master_id'\np50\nNsS'next  _epoch'\np51\nI0\nsS'next_id'\np52\nI16\nsS'disk'\np53\ng1\n(g19\ng3\nNtRp54\n(dp55\ng22\ng1\n(g23\ng3\nNtRp56\n(dp57\ng26\nI1073741824\nsg27\nS'GB'\np58\nsbsg29\nI2\nsbsb.", version=-1)
I0707 14:12:28.763336 13297 connection.py:360] Received response(xid=124): ZnodeStat(czxid=9181, mzxid=9216, ctime=1436278298278, mtime=1436278348756, version=31, cversion=0, aversion=0, ephemeralOwner=0, dataLength=1400, numChildren=0, pzxid=9181)
I0707 14:12:28.763725 13297 launcher.py:202] Launching task mysos-test_cluster2-15 on Mesos slave 20150707-140838-83983617-5050-13042-0 (1.125.1.5)
I0707 14:12:29.879532 13297 launcher.py:395] Updating state of task mysos-test_cluster2-15 of cluster test_cluster2 from TASK_STAGING to TASK_LOST
E0707 14:12:29.879740 13297 launcher.py:443] Task mysos-test_cluster2-15 is now in terminal state TASK_LOST with message 'Executor terminated'
W0707 14:12:29.879869 13297 launcher.py:474] Slave mysos-test_cluster2-15 of cluster test_cluster2 failed to start running

on mesos-master:


I0707 14:12:19.743690 13045 master.cpp:3559] Sending 1 offers to framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:19.778714 13043 master.cpp:2169] Processing reply for offers: [ 20150707-140838-83983617-5050-13042-19 ] on slave 20150707-140838-83983617-5050-13042-0 at slave(1)@1.125.1.5:5051 (1.125.1.5) for framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:19.779111 13043 master.hpp:829] Adding task mysos-test_cluster2-12 with resources cpus(mysos):0.99; mem(mysos):480; disk(mysos):2047; ports(mysos):[31531-31531] on slave 20150707-140838-83983617-5050-13042-0 (1.125.1.5)
I0707 14:12:19.779166 13043 master.cpp:2318] Launching task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 with resources cpus(mysos):0.99; mem(mysos):480; disk(mysos):2047; ports(mysos):[31531-31531] on slave 20150707-140838-83983617-5050-13042-0 at slave(1)@1.125.1.5:5051 (1.125.1.5)
I0707 14:12:19.779368 13043 hierarchical_allocator_process.hpp:563] Recovered cpus(mysos):3; mem(mysos):512; disk(mysos):17952; ports(mysos):[31000-31530, 31532-32000](total allocatable: cpus%28mysos%29:3; mem%28mysos%29:512; disk%28mysos%29:17952; ports%28mysos%29:[31000-31530, 31532-32000]) on slave 20150707-140838-83983617-5050-13042-0 from framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.833107 13049 master.cpp:3229] Executor mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 on slave 20150707-140838-83983617-5050-13042-0 at slave(1)@1.125.1.5:5051 (1.125.1.5) exited with status 1
I0707 14:12:21.833279 13049 hierarchical_allocator_process.hpp:563] Recovered cpus(mysos):0.01; mem(mysos):32; disk(mysos):1 (total allocatable: cpus(mysos):3.01; mem(mysos):544; disk(mysos):17953; ports(mysos):[31000-31530, 31532-32000]) on slave 20150707-140838-83983617-5050-13042-0 from framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.873121 13050 master.cpp:3180] Forwarding status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.873198 13050 master.cpp:3146] Status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 from slave 20150707-140838-83983617-5050-13042-0 at slave(1)@1.125.1.5:5051 (1.125.1.5)
I0707 14:12:21.873272 13050 master.hpp:847] Removing task mysos-test_cluster2-12 with resources cpus(mysos):0.99; mem(mysos):480; disk(mysos):2047; ports(mysos):[31531-31531] on slave 20150707-140838-83983617-5050-13042-0 (1.125.1.5)
I0707 14:12:21.873401 13050 hierarchical_allocator_process.hpp:563] Recovered cpus(mysos):0.99; mem(mysos):480; disk(mysos):2047; ports(mysos):[31531-31531](total allocatable: cpus%28mysos%29:4; mem%28mysos%29:1024; disk%28mysos%29:20000; ports%28mysos%29:[31000-32000]) on slave 20150707-140838-83983617-5050-13042-0 from framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.887816 13046 master.cpp:2661] Forwarding status update acknowledgement bdc5ce90-70c6-4ac4-bf7a-edde3b20c791 for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 to slave 20150707-140838-83983617-5050-13042-0 at slave(1)@1.125.1.5:5051 (1.125.1.5)

on mesos-slave:


I0707 14:12:19.779917 13068 slave.cpp:1002] Got assigned task mysos-test_cluster2-12 for framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:19.780154 13068 slave.cpp:3536] Checkpointing FrameworkInfo to '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/framework.info'
I0707 14:12:19.780479 13068 slave.cpp:3543] Checkpointing framework pid '[email protected]:50160' to '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/framework.pid'
I0707 14:12:19.780894 13068 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000' from gc
I0707 14:12:19.781018 13068 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000' from gc
I0707 14:12:19.781136 13068 slave.cpp:1112] Launching task mysos-test_cluster2-12 for framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:19.782254 13068 slave.cpp:3857] Checkpointing ExecutorInfo to '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12/executor.info'
I0707 14:12:19.782737 13068 slave.cpp:3972] Checkpointing TaskInfo to '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12/runs/e89291b4-cf6e-43ad-9423-e2740beb9f4d/tasks/mysos-test_cluster2-12/task.info'
I0707 14:12:19.782922 13064 containerizer.cpp:394] Starting container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d' for executor 'mysos-test_cluster2-12' of framework '20150707-140838-83983617-5050-13042-0000'
I0707 14:12:19.782939 13068 slave.cpp:1222] Queuing task 'mysos-test_cluster2-12' for executor mysos-test_cluster2-12 of framework '20150707-140838-83983617-5050-13042-0000
I0707 14:12:19.785490 13064 mem.cpp:479] Started listening for OOM events for container e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:19.786022 13064 mem.cpp:293] Updated 'memory.soft_limit_in_bytes' to 512MB for container e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:19.786504 13068 cpushare.cpp:338] Updated 'cpu.shares' to 1024 (cpus 1) for container e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:19.787155 13064 mem.cpp:358] Updated 'memory.limit_in_bytes' to 512MB for container e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:19.787747 13068 cpushare.cpp:359] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 100ms (cpus 1) for container e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:19.789216 13068 linux_launcher.cpp:191] Cloning child process with flags = 0
I0707 14:12:19.790909 13068 containerizer.cpp:678] Checkpointing executor's forked pid 13451 to '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12/runs/e89291b4-cf6e-43ad-9423-e2740beb9f4d/pids/forked.pid'
I0707 14:12:19.793015 13068 containerizer.cpp:510] Fetching URIs for container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d' using command '/usr/local/libexec/mesos/mesos-fetcher'
I0707 14:12:20.824784 13070 containerizer.cpp:882] Destroying container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d'
E0707 14:12:20.825043 13067 slave.cpp:2485] Container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d' for executor 'mysos-test_cluster2-12' of framework '20150707-140838-83983617-5050-13042-0000' failed to start: Failed to fetch URIs for container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d': exit status 256
I0707 14:12:20.826287 13070 cgroups.cpp:2208] Freezing cgroup /sys/fs/cgroup/freezer/mesos/e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:20.827714 13063 cgroups.cpp:1375] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/e89291b4-cf6e-43ad-9423-e2740beb9f4d after 1.239808ms
I0707 14:12:20.828982 13063 cgroups.cpp:2225] Thawing cgroup /sys/fs/cgroup/freezer/mesos/e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:20.830205 13063 cgroups.cpp:1404] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/e89291b4-cf6e-43ad-9423-e2740beb9f4d after 1.078016ms
I0707 14:12:21.826225 13070 containerizer.cpp:997] Executor for container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d' has exited
I0707 14:12:21.831550 13067 slave.cpp:2596] Executor 'mysos-test_cluster2-12' of framework 20150707-140838-83983617-5050-13042-0000 exited with status 1
E0707 14:12:21.831750 13065 slave.cpp:2866] Failed to unmonitor container for executor mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000: Not monitored
I0707 14:12:21.832567 13067 slave.cpp:2088] Handling status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 from @0.0.0.0:0
W0707 14:12:21.832794 13064 containerizer.cpp:788] Ignoring update for unknown container: e89291b4-cf6e-43ad-9423-e2740beb9f4d
I0707 14:12:21.833605 13064 status_update_manager.cpp:320] Received status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.833945 13064 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.872320 13064 status_update_manager.cpp:373] Forwarding status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000 to [email protected]:5050
I0707 14:12:21.888372 13064 status_update_manager.cpp:398] Received status update acknowledgement (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.888582 13064 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_LOST (UUID: bdc5ce90-70c6-4ac4-bf7a-edde3b20c791) for task mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.927053 13064 slave.cpp:2732] Cleaning up executor 'mysos-test_cluster2-12' of framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.927609 13064 slave.cpp:2807] Cleaning up framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.927803 13067 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12/runs/e89291b4-cf6e-43ad-9423-e2740beb9f4d' for gc 6.99998926540444days in the future
I0707 14:12:21.927835 13068 status_update_manager.cpp:282] Closing status update streams for framework 20150707-140838-83983617-5050-13042-0000
I0707 14:12:21.927999 13067 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12' for gc 6.99998926446815days in the future
I0707 14:12:21.928270 13067 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12/runs/e89291b4-cf6e-43ad-9423-e2740beb9f4d' for gc 6.99998926416593days in the future
I0707 14:12:21.928318 13067 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000/executors/mysos-test_cluster2-12' for gc 6.99998926392296days in the future
I0707 14:12:21.928351 13067 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000' for gc 6.99998926238222days in the future
I0707 14:12:21.928390 13067 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150707-140838-83983617-5050-13042-0/frameworks/20150707-140838-83983617-5050-13042-0000' for gc 6.99998926207111days in the future

when I check the mesos-slave logs. I see this:


E0707 14:12:20.825043 13067 slave.cpp:2485] Container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d' for executor 'mysos-test_cluster2-12' of framework '20150707-140838-83983617-5050-13042-0000' failed to start: Failed to fetch URIs for container 'e89291b4-cf6e-43ad-9423-e2740beb9f4d': exit status 256
E0707 14:12:21.831750 13065 slave.cpp:2866] Failed to unmonitor container for executor mysos-test_cluster2-12 of framework 20150707-140838-83983617-5050-13042-0000: Not monitored

and here is my sandbox stderr which explains more:

I0707 15:47:32.898041 15795 fetcher.cpp:76] Fetching URI '/home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zip'
I0707 15:47:32.898449 15795 fetcher.cpp:179] Copying resource from '/home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zip' to '/tmp/mesos/slaves/20150707-154245-83983617-5050-14919-0/frameworks/20150707-154245-83983617-5050-14919-0000/executors/mysos-test_cluster2-78/runs/fe5ddc90-5cdf-49fd-8013-e8eb0e451e3f'
cp: cannot stat â/home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zipâ: No such file or directory
E0707 15:47:32.909354 15795 fetcher.cpp:184] Failed to copy '/home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zip' : Exit status 256
Failed to fetch: /home/ubuntu/mysos/dist/mysos-0.1.0-dev0.zip
Failed to synchronize with slave (it's probably exited)

when I check my dist dir mysos-0.1.0_dev0-py2.7.egg is there. but no .zip files!!
What have I missed during the installation? It must have made it at some point!

A few notes about my setup:
- all of the services run on the same node, but they use the private-ip(1.125.1.5), not localhost.
- my network has a Man-in-the-Middle proxy

Any idea what is wrong here?

Interrupt the backup restore process if the user deletes the cluster while the instance is still restoring...

Current When you delete a cluster Mysos doesn’t interrupt the restore process… it only deletes it after it’s done. This means if your restore takes a hour you have to wait that long for the resources to be released...

We should instead interrupt any executor subprocess and kill it right away.

Export scheduler stats.

Cannot build package

Or it might just be me you cannot figure out how.

Steps to reproduce:

First, start with a clean Ubuntu 14.04 x64 server (e.g. from digitalocean.com)
Then, run these commands:

# ===========
# Follow instructions from https://docs.mesosphere.com/getting-started/datacenter/install/
# Setup
apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)

# Add the repository
echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
  sudo tee /etc/apt/sources.list.d/mesosphere.list
apt-get update

apt-get install mesos marathon
# ===========

apt-get install git
git clone https://github.com/twitter/mysos.git
cd mysos
apt-get install python-pip
pip install virtualenv
virtualenv venv
source venv/bin/activate
cd 3rdparty
wget http://downloads.mesosphere.io/master/ubuntu/14.04/mesos-0.22.1-py2.7-linux-x86_64.egg
wheel convert mesos-0.22.1-py2.7-linux-x86_64.egg
cd ..

python setup.py install
mysos_scheduler

Expected: The scheduler should start?
Actual:

Traceback (most recent call last):
  File "/root/mysos/venv/bin/mysos_scheduler", line 9, in <module>
    load_entry_point('mysos==0.1.0.dev0', 'console_scripts', 'mysos_scheduler')()
  File "/root/mysos/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 552, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/root/mysos/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2672, in load_entry_point
    return ep.load()
  File "/root/mysos/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2344, in load
    self.require(*args, **kwargs)
  File "/root/mysos/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2361, in require
    items = working_set.resolve(reqs, env, installer)
  File "/root/mysos/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 833, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pynacl<1,>=0.3.0' distribution was not found and is required by the application

Production executor entry point broken

The mysos_executor entry point references mysos.executor.mysos_executor:proxy_main, which does not exist. What is the proper way to get an executor for use outside of vagrant?

Set `ExecutorInfo.source` for task stats tracking.

finding the MySQL Cluster port

I'm trying to find an easy way to get the host+port of the cluster that gets created with curl POST.
I tried writing a simple client that just gets the host and port from zokeeper by using the utilities provided at https://github.com/twitter/commons/tree/master/src/java/com/twitter/common/zookeeper, and/or
https://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper
however the porblem is:

i dont know which class to use/call since there is no javadoc/ doc available.
I tried running a few of the classes (e,g, cli.py) but they all have dependency to the twitter commons, which I cant manage to build (pants does not work behind the proxy of my server)

Can you help me with my two questions:

is there a ready built client that I can just run? if so, can you tell me where is it?
if I have to write a client, which class (on java or python) should I use?

I appreciate any help.

Provide reference/demo implementations for APIs without a public implementation.

Such as PackageInstaller and BackupStore.

Master and slave have equal Mysql server ids

Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).

don't start replication.

about mysos-0.1.0-dev0.zip

Hello, sorry for the stupid question.

What is mysos-0.1.0-dev0.zip and where can i find it or how i can build it?

cluster.py should properly handle retryable ZK failures.

e.g. sessions expirations, timeouts, etc.

Move off twitter.common.app and use conventional main methods and optparse

Using twitter.common.app package is a bit awkward with setuptools. main() methods have to be wrapped in a proxy_main().

Rename mysos_scheduler flag --framework_authentication_file to --framework_authentication_credential

For better clarity on the meaning of the flag.

Allow operators to define MySQL instance sizes (e.g. in T-shirt sizes)

The operator should be able to specify a map between size names and (mem, cpus, disk) sizes and the user can specify the cluster size by name (SMALL, MEDIUM, LARGE).

Could you provide a docker environment including all mysos package? we use my mesos master and zookeeper.

Support long-running MySQL clusters

To efficiently support long-running MySQL clusters we need at least the following things:
#48
#49
#41

This is an epic issue in JIRA's terminology.

Upgrade Mesos python bindings dependency to the latest version

i.e. 0.22.0

Current 0.20.1 binding is susceptible to issues such as:
https://issues.apache.org/jira/browse/MESOS-1392

Running MySQL on shared Mesos slave pool

Currently Mysos requires Mesos slaves to dedicate all resources to it. MySQL usually requires special boxes with large disks. However for certain test use cases this may not be true and it would be nice if Mysos can be configured to support colocating its tasks with tasks from other Mesos frameworks on shared slaves.

Allow users to specify the MySQL package version/distribution to use.

The admins can use a flag to map the version and distribution to the location of the package.

Handle missing Hadoop config dir.

Currently Mysos doesn't gracefully handle the case where the directory referenced in $HADOOP_CONF_DIR doesn't exist.

Leverage GTID to properly failover MySQL master instances.

The current Mysos master failover code is capable of reliably detecting the dead instance, sending queries to find the "most current" slave and sending the commands to promote the new master and reparent the slaves. However without GTID our current scripts that Mysos invokes to do these things aren't sufficient because some files need to be copied out of band by tools such as MHA.

If we leverage GTID in MySQL 5.6 we can make failover really work without relying on external tools.

Look into disk IO isolation

As a storage framework running in a multi-tenant environment Mysos may require support from Mesos to properly isolate disk traffic.

The Mesos ticket to track: https://issues.apache.org/jira/browse/MESOS-1979

"Mysos scheduler is still connecting...".

run up the mesos scheduler and listener on 55001 port. The page of http://myhost:55001 always display "Mysos scheduler is still connecting...". Below is scheduler's logs. Is somethings wrong?
My command is:
1 #!/bin/sh
2
3 ZK_HOST=10.175.100.231
4 API_PORT=55001
5
6 # NOTE: In --executor_environ we are pointing MYSOS_DEFAULTS_FILE to an empty MySQL defaults file.
7 # The file 'my5.6.cnf' is pre-installed by the 'mysql-server-5.6' package on the VM.
8 mysos_scheduler
9 --port=$API_PORT
10 --framework_user=vagrant
11 --mesos_master=zk://$ZK_HOST:2184/mesos
12 --executor_uri=/home/huajianfeng/.tox/distshare/mysos-0.1.0-dev0.zip
13 --executor_cmd=/home/huajianfeng/incubator-cotton-master/vagrant/bin/mysos_executor.sh
14 --zk_url=zk://$ZK_HOST:2184/mysos
15 --admin_keypath=/home/huajianfeng/incubator-cotton-master/vagrant/etc/admin_keyfile.yml
16 --framework_failover_timeout=1m
17 --framework_role=*
18 --scheduler_keypath=/home/huajianfeng/incubator-cotton-master/vagrant/etc/scheduler_keyfile.txt
19 --executor_source_prefix='vagrant.devcluster' \
20 --executor_environ='[{"name": "MYSOS_DEFAULTS_FILE", "value": "/etc/mysql/conf.d/my5.6.cnf"}]'

my output is:
I1124 00:13:35.048789 178455 mysos_scheduler.py:219] Extracted web assets into /tmp/mysos
I1124 00:13:35.048928 178455 mysos_scheduler.py:244] Starting Mysos scheduler
I1124 00:13:35.050659 178455 connection.py:566] Connecting to 10.175.100.231:2184
I1124 00:13:35.051512 178455 connection.py:276] Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=10000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
I1124 00:13:35.055763 178455 client.py:378] Zookeeper connection established, state: CONNECTED
I1124 00:13:35.058011 178455 mysos_scheduler.py:250] Using ZooKeeper (path: /mysos) for state storage
I1124 00:13:35.058290 178455 connection.py:276] Sending request(xid=1): GetData(path='/mysos/state/scheduler', watcher=None)
I1124 00:13:35.059406 178455 connection.py:360] Received response(xid=1): ("ccopy_reg\n_reconstructor\np1\n(cmysos.scheduler.state\nScheduler\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'framework_info'\np6\ncmesos.interface.mesos_pb2\nFrameworkInfo\np7\n(tRp8\n(dp9\nS'serialized'\np10\nS'\n\x07vagrant\x12\x05mysos!\x00\x00\x00\x00\x00\x00N@(\x012\x01*'\np11\nsbsS'clusters'\np12\ng1\n(ctwitter.common.collections.orderedset\nOrderedSet\np13\ng3\nNtRp14\n(dp15\nS'map'\np16\n(dp17\nsS'end'\np18\n(lp19\nNag19\nag19\nasbsb.", ZnodeStat(czxid=259437, mzxid=259437, ctime=1448290870564, mtime=1448290870564, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=410, numChildren=0, pzxid=259437))
I1124 00:13:35.059720 178455 mysos_scheduler.py:262] Successfully restored scheduler state
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@716: Client environment:host.name=hare229
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@724: Client environment:os.arch=3.16.0-30-generic
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@725: Client environment:os.version=#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015
I1124 00:13:35.066289 178455 sched.cpp:164] Version: 0.25.0
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@733: Client environment:user.name=pangbingqiang
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/huajianfeng/incubator-cotton-master
2015-11-24 00:13:35,066:178455(0x7f36717f2700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.175.100.231:2184 sessionTimeout=10000 watcher=0x7f3690e18692 sessionId=0 sessionPasswd= context=0x7f3630000e90 flags=0
2015-11-24 00:13:35,070:178455(0x7f365ffff700):ZOO_INFO@check_events@1703: initiated connection to server [10.175.100.231:2184]
2015-11-24 00:13:35,073:178455(0x7f365ffff700):ZOO_INFO@check_events@1750: session establishment complete on server [10.175.100.231:2184], sessionId=0x15134d48dc00014, negotiated timeout=10000
I1124 00:13:35.073601 178493 group.cpp:331] Group process (group(1)@10.175.102.229:58721) connected to ZooKeeper
I1124 00:13:35.073673 178493 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1124 00:13:35.073709 178493 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I1124 00:13:35.076272 178487 detector.cpp:156] Detected a new leader: (id='56')
I1124 00:13:35.076438 178492 group.cpp:674] Trying to get '/mesos/info_0000000056' in ZooKeeper
W1124 00:13:35.077445 178479 detector.cpp:444] Leading master [email protected]:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
I1124 00:13:35.077500 178479 detector.cpp:481] A new leading master ([email protected]:5050) is detected
I1124 00:13:35.077623 178476 sched.cpp:262] New master detected at [email protected]:5050
I1124 00:13:35.077847 178476 sched.cpp:272] No credentials provided. Attempting to register without authentication
Bottle v0.11.6 server starting up (using CherryPyServer())...
Listening on http://0.0.0.0:55001/
Hit Ctrl-C to quit.

Collapse all history up to this point.

@davelester @caniszczyk ship?

Leverage Mesos persistent resources primitives to retain the state of failed instances.

With this feature Mysos scheduler can ask Mesos to run the a new instance of mysqld and recover the work directory left by the previously failed instance. This requires the host that the work dir is on remains online. If it is not, the new instance needs to be launched onto a new host and recover from the HDFS snapshot.

Document the Vagrant demo VM setup.

Mysos CLI.

It would be nice to have a CLI that wraps around the RESTful APIs.

is mysos runable?

I run up the mesos scheduler and listener on 5500 port. The page of http://myhost:5500 always display "Mysos scheduler is still connecting...". Below is scheduler's logs. Is somethings wrong?

My command is:

python ./scheduler.py \
    --mesos_master=zk://172.31.15.246:2181/mesos \
    --port=5500 \
    --framework_user=mysos \
    --executor_uri=/home/vagrant/mysos/dist/mysos-0.1.0-dev0.zip \
    --executor_cmd=/home/vagrant/mysos/vagrant/bin/mysos_executor.sh \
    --zk_url=zk://172.31.15.246:2181/mysos \
    --admin_keypath=/home/ubuntu/mysos/vagrant/etc/admin_keyfile.yml \
    --framework_failover_timeout=1m \
    --framework_role=mysos \
    --scheduler_keypath=/home/ubuntu/mysos/vagrant/etc/scheduler_keyfile.txt \
    --executor_source_prefix='vagrant.devcluster' \
    --executor_environ='[{"name": "MYSOS_DEFAULTS_FILE", "value": "/etc/mysql/conf.d/my5.6.cnf"}]'

and the scheduler.py content is:

#!/usr/bin/python

from os.path import join, abspath
import sys

ROOT_DIR = abspath('./')
sys.path.insert(0, ROOT_DIR)

from mysos.scheduler.mysos_scheduler import proxy_main

if __name__ == '__main__':
    proxy_main()

output is:

root@mingqi-dev:~/mysos# ./scheduler.sh
I0603 15:34:57.013078 17733 mysos_scheduler.py:177] Options in use: {'framework_failover_timeout': '1m', 'twitter_common_log_simple'
: False, 'verbose': None, 'twitter_common_app_daemon_stdout': '/dev/null', 'twitter_common_log_scribe_category': 'python_default', '
api_port': 5500, 'twitter_common_log_log_dir': '/var/tmp', 'twitter_common_app_daemonize': False, 'twitter_common_app_ignore_rc_file
': False, 'twitter_common_app_profiling': False, 'work_dir': '/tmp/mysos', 'twitter_common_app_pidfile': None, 'twitter_common_log_s
cribe_buffer': False, 'executor_source_prefix': 'vagrant.devcluster', 'election_timeout': '60s', 'twitter_common_app_rc_filename': False, 'framework_role': 'mysos', 'executor_environ': '[{"name": "MYSOS_DEFAULTS_FILE", "value": "/etc/mysql/conf.d/my5.6.cnf"}]', 'twitter_common_log_scribe_log_level': 'NONE', 'executor_uri': '/home/vagrant/mysos/dist/mysos-0.1.0-dev0.zip', 'twitter_common_log_disk_log_level': 'NONE', 'twitter_common_log_stderr_log_level': 'ERROR', 'framework_authentication_file': None, 'state_storage': 'zk', 'executor_cmd': '/home/vagrant/mysos/vagrant/bin/mysos_executor.sh', 'twitter_common_app_profile_output': None, 'framework_user': 'mysos', 'zk_url': 'zk://172.31.15.246:2181/mysos', 'twitter_common_app_debug': False, 'twitter_common_log_scribe_port': 1463, 'twitter_common_log_scribe_host': 'localhost', 'scheduler_keypath': '/home/ubuntu/mysos/vagrant/etc/scheduler_keyfile.txt', 'installer_args': None, 'backup_store_args': None, 'mesos_master': 'zk://172.31.15.246:2181/mesos', 'admin_keypath': '/home/ubuntu/mysos/vagrant/etc/admin_keyfile.yml', 'twitter_common_app_daemon_stderr': '/dev/null'}
I0603 15:34:57.016001 17733 mysos_scheduler.py:219] Extracted web assets into /tmp/mysos
I0603 15:34:57.016411 17733 mysos_scheduler.py:244] Starting Mysos scheduler
I0603 15:34:57.027303 17733 connection.py:566] Connecting to 172.31.15.246:2181
I0603 15:34:57.028554 17733 connection.py:276] Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=10000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
I0603 15:34:57.031117 17733 client.py:378] Zookeeper connection established, state: CONNECTED
I0603 15:34:57.034564 17733 mysos_scheduler.py:250] Using ZooKeeper (path: /mysos) for state storage
I0603 15:34:57.034933 17733 connection.py:276] Sending request(xid=1): GetData(path='/mysos/state/scheduler', watcher=None)
I0603 15:34:57.036081 17733 connection.py:360] Received response(xid=1): ("ccopy_reg\n_reconstructor\np1\n(cmysos.scheduler.state\nScheduler\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'framework_info'\np6\ncmesos.interface.mesos_pb2\nFrameworkInfo\np7\n(tRp8\n(dp9\nS'serialized'\np10\nS'\\n\\x05mysos\\x12\\x05mysos!\\x00\\x00\\x00\\x00\\x00\\x00N@(\\x012\\x05mysosB\\x05mysos'\np11\nsbsS'clusters'\np12\ng1\n(ctwitter.common.collections.orderedset\nOrderedSet\np13\ng3\nNtRp14\n(dp15\nS'map'\np16\n(dp17\nsS'end'\np18\n(lp19\nNag19\nag19\nasbsb.", ZnodeStat(czxid=17, mzxid=17, ctime=1433316463490, mtime=1433316463490, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=422, numChildren=0, pzxid=17))
I0603 15:34:57.036864 17733 mysos_scheduler.py:262] Successfully restored scheduler state
I0603 15:34:57.039196 17733 sched.cpp:139] Version: 0.20.1
2015-06-03 15:34:57,041:17733(0x7f90d37fe700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-06-03 15:34:57,053:17733(0x7f90d37fe700):ZOO_INFO@log_env@716: Client environment:host.name=mingqi-dev
2015-06-03 15:34:57,053:17733(0x7f90d37fe700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-06-03 15:34:57,054:17733(0x7f90d37fe700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-44-generic
2015-06-03 15:34:57,054:17733(0x7f90d37fe700):ZOO_INFO@log_env@725: Client environment:os.version=#73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014
2015-06-03 15:34:57,054:17733(0x7f90d37fe700):ZOO_INFO@log_env@733: Client environment:user.name=ubuntu
2015-06-03 15:34:57,055:17733(0x7f90d37fe700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2015-06-03 15:34:57,055:17733(0x7f90d37fe700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ubuntu/mysos
2015-06-03 15:34:57,055:17733(0x7f90d37fe700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=172.31.15.246:2181 sessionTimeout=10000 watcher=0x7f90e525bcc0 sessionId=0 sessionPasswd=<null> context=0x7f90cc0013a0 flags=0
2015-06-03 15:34:57,057:17733(0x7f90d09c9700):ZOO_INFO@check_events@1703: initiated connection to server [172.31.15.246:2181]
2015-06-03 15:34:57,059:17733(0x7f90d09c9700):ZOO_INFO@check_events@1750: session establishment complete on server [172.31.15.246:2181], sessionId=0x14db7918a73000e, negotiated timeout=10000
I0603 15:34:57.060551 17745 group.cpp:313] Group process (group(1)@127.0.0.1:38704) connected to ZooKeeper
I0603 15:34:57.060619 17745 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0603 15:34:57.060670 17745 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
Bottle v0.11.6 server starting up (using CherryPyServer())...
Listening on http://0.0.0.0:5500/
I0603 15:34:57.076939 17745 detector.cpp:138] Detected a new leader: (id='1')
I0603 15:34:57.077203 17745 group.cpp:658] Trying to get '/mesos/info_0000000001' in ZooKeeper
Hit Ctrl-C to quit.

I0603 15:34:57.085907 17745 detector.cpp:426] A new leading master ([email protected]:5050) is detected
I0603 15:34:57.086035 17745 sched.cpp:235] New master detected at [email protected]:5050
I0603 15:34:57.086138 17745 sched.cpp:243] No credentials provided. Attempting to register without authentication

Allow the user to supply a cluster password when issuing the 'create_cluster' request.

We currently generate a random password for each cluster and return it to the user but this is less convenient for users to get started because they have to remember the password for each cluster.

Cross DC/Mesos pool replication support.

Organizations often enable cross DC replication for MySQL clusters but Mysos currently has tacit assumptions that all hosts are in the same DC, as a Mesos cluster is typically running in one DC. We would need to implement a higher level scheduler that can manage replication across DCs.

Export executor stats.

Use 'disk' resource in Mesos offers

We haven't been using it because it has not been enforced by the Mesos slave and we are planning on leveraging the persistent resource primitives soon.

However with disk not being accounted for, we sometimes run into issues where the disk space has been a resource bottleneck for hosts (e.g. the instance restores from a HUGE backup). We should have a simple implementation first that specifies the disk sources as if they are enforced.

Add Travis CI support (.travis.yml)

Support backing up MySQL master

Mysos currently supports restoring a MySQL cluster from a backup (See BackupStore). We also need to support creating backups on a regular basis. We can add a BackupStore.backup() method in the interface.

Encrypt MySQL cluster passwords in scheduler state.

So the password can be safely persisted along with other scheduler state variables in the state storage.

Assign some resources to the executor

Currently no resources are given to the executor itself. Mesos doesn't like this. Mysos executor should charge the task for some resources.

Questions about the state of mysos within twitter?

hello @xujyan

Not sure what the best medium to post these questions, whether it is github issue or one of the following mailing list. let me know
[email protected]
[email protected]

I have some general questions about the state of mysos/cotton within twitter. My guess is that each of the features currently available within twitter will be slowly open sourced and make generic so that different implementation can be contributed by the community. I am curious what they look like at twitter right now

I am curious about how twitter current using mysos to do failover, it seems like it is using the reparent.sh as you mentioned here that without gtid "some files need to be copied out of band by tools such as MHA", does the version running inside twitter already have integration with MHA? or is ther other mechanism that bring the bin log to new master.

Also i see that there is the concept of BackupStore, does twitter current using HDFS to backup the files? i see some sample code related to it here but not sure if it is just an idea or twitter already have one implementation of it.

Thanks again for open sourcing cotton, we look forward to more news

Better MySQL master failure detection

Currently Mysos executor monitoring the health of MySQL master only by making sure the process is alive.

More subtle failure cases can be captured by more sophisticated health checking (e.g., have executors regularly attempt sql queries against the master and reporting to the scheduler if they fail). The scheduler can then determine a point at which a master election is required.

An elastic solution that allows users to easily scale up and down a MySQL cluster by changing the number of slave instances

But this is neither documented in the user-guide nor part of the http.py, so I assume it is missing. We should add it then :)

twitter-archive / mysos Goto Github PK

mysos's People

Contributors

Stargazers

Watchers

Forkers

mysos's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs