thecodeteam / mesos-module-dvdi

Mesos Docker Volume Driver Isolator module

License: Apache License 2.0

Makefile 2.11% Shell 11.19% C++ 50.91% Protocol Buffer 1.00% M4 34.80%

mesos-module-dvdi's People

Contributors

akutz, branden, cantbewong, clintkitson, eastsidegeek, jdef, pradeepchhetri

mesos-module-dvdi's Issues

Support pre-emptive mounts of an external storage volume as an option

Some storage platforms, and some or most file systems, prevent concurrent mounts of a volume by multiple users.

If a Mesos agent mounts a volume and then terminates abnormally, and the same or an alternate workload subsequently tries to mount the volume without a dismount from the original location, the second mount can fail. Some storage platforms support a "pre-emptive mount" option that forcibly removes the first mount when the second mount is attempted.

Resolution of this issue is dependent on resolution of a corresponding issue in the RexRay project. See: rexray/rexray#30

Does this module support Docker containers?

Hi, I installed this module with Mesos 0.25.0 and rexray 0.3.0 on OpenStack.

I tested the example in the README, and the Cinder volume was created.

But when I test with a Docker container in Mesos, the Cinder volume is not created at all. Here is my JSON file, deployed by Marathon 0.13.0. I want to know whether this module supports Docker containers at this moment.

{
    "id": "demo",
    "mem": 512,
    "cpus": 0.5,
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {
            "network": "BRIDGE",
            "image": "nginx:1.9.6",
            "portMappings": [{
                "containerPort": 80,
                "protocol": "tcp"
            }]
        },
        "volumes": [{
            "containerPath": "/usr/share/nginx/html",
            "hostPath": "/var/lib/rexray/volumes/demo",
            "mode": "RO"
        }]
    },
    "env": {
        "DVDI_VOLUME_NAME": "demo",
        "DVDI_VOLUME_DRIVER": "rexray",
        "DVDI_VOLUME_OPTS": "size=5,iops=150,newfstype=xfs,overwritefs=true",
        "DVDI_VOLUME_CONTAINERPATH": "/var/lib/rexray/volumes/demo"
    }
}

Can't load lib on Ubuntu 16.04

When I try to add the module to Mesos, mesos-slave fails to start and spits out:

Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: WARNING: Logging before InitGoogleLogging() is written to STDERR
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: I0113 03:00:01.719745 29605 main.cpp:243] Build: 2016-11-16 01:30:49 by ubuntu
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: I0113 03:00:01.719815 29605 main.cpp:244] Version: 1.1.0
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: I0113 03:00:01.719827 29605 main.cpp:247] Git tag: 1.1.0
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: I0113 03:00:01.719835 29605 main.cpp:251] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: I0113 03:00:01.723484 29605 logging.cpp:194] INFO level logging started!
Jan 13 03:00:01 ip-10-0-4-73 mesos-slave[29629]: Error loading modules: Error opening library: 'libmesos_dvdi_isolator-1.1.0.so': Could not load library 'libmesos_dvdi_isolator-1.1.0.so': /usr/lib/libmesos_dvdi_isolator-1.1.0.so: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv
Jan 13 03:00:01 ip-10-0-4-73 systemd[1]: mesos-slave.service: Main process exited, code=exited, status=1/FAILURE 

refactor mount list file IO to use Mesos checkpoint code/template

The code currently performs mount-list disk I/O through an isolator-specific implementation based on the C++ standard library. Using the existing Mesos slave checkpoint implementation is expected to eliminate or reduce this code by reusing an existing, proven internal Mesos component.
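
For reference, a rough sketch of what the refactor could look like, assuming the slave's internal state::checkpoint helper (declared in src/slave/state.hpp, which the module already builds against) takes over the file I/O; the wrapper name below is illustrative, not the module's actual API:

#include <string>

#include <stout/nothing.hpp>
#include <stout/try.hpp>

#include <slave/state.hpp> // Internal Mesos header used by the slave itself.

// Illustrative wrapper: persist the serialized mount list through the slave's
// checkpoint helper instead of isolator-specific std::ofstream code. The
// helper writes to a temporary file and renames it into place, so a crash
// mid-write cannot leave a truncated mount list behind.
Try<Nothing> checkpointMountList(
    const std::string& path,
    const std::string& serializedMounts)
{
  return mesos::internal::slave::state::checkpoint(path, serializedMounts);
}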

how to debug

Hi,
how can I debug the fact that the Mesos slave does not call the module?
Using dvdcli is fine.

I launched my slave with:

/usr/sbin/mesos-slave --master=zk://a.b.c.d:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --modules=file:///usr/lib/dvdi-mod.json

I can see at startup:

.... --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://a.b.c.d:2181/mesos" --modules="libraries {
  file: "/usr/lib/libmesos_dvdi_isolator-0.23.0.so"
  modules {
    name: "com_emccode_mesos_DockerVolumeDriverIsolator"
  }
}

When I call my Docker plugin with dvdcli, I can see that my plugin is activated, etc. However, when I execute a job, nothing happens.
I have set the environment variables in the Mesos offer protobuf, in CommandInfo, though I cannot check that they are correctly set. The job is executed "as usual", without calling the Docker plugin.

I am using Mesos 0.23.0.

Thanks

incorrect path to dvdcli on CoreOS

Hello, the library will not work on CoreOS (e.g. DC/OS) due to the following:

isolator/isolator/docker_volume_driver_isolator.hpp:
static constexpr char DVDCLI_MOUNT_CMD[] = "/usr/bin/dvdcli mount";
static constexpr char DVDCLI_UNMOUNT_CMD[] = "/usr/bin/dvdcli unmount";

whereas dvdcli will install in /opt/bin on CoreOS:
if [ -n "$IS_COREOS" ]; then
BIN_DIR=/opt/bin
BIN_FILE=$BIN_DIR/$BIN_NAME

isolator should invoke potentially blocking operations async from module API handlers

Related to #88: if calls to os::shell to execute dvdcli hang or block for significant amounts of time, the task launch pipeline breaks down and tasks become stuck in STAGING. Part of the reason this happens is that the isolator module invokes potentially blocking operations synchronously from within the Mesos module API handlers.

A better approach would be to invoke such commands asynchronously, for example by using Subprocess. The HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
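
A minimal sketch of that asynchronous approach using libprocess's Subprocess; the runDvdcli name and the exact error handling are illustrative, not the module's current code:

#include <string>

#include <process/future.hpp>
#include <process/subprocess.hpp>

#include <stout/nothing.hpp>
#include <stout/option.hpp>
#include <stout/try.hpp>

using process::Failure;
using process::Future;
using process::Subprocess;

// Launch a dvdcli command without blocking the calling libprocess actor; the
// returned Future completes when the child process exits.
Future<Nothing> runDvdcli(const std::string& command)
{
  Try<Subprocess> s = process::subprocess(command);

  if (s.isError()) {
    return Failure("Failed to launch '" + command + "': " + s.error());
  }

  return s->status()
    .then([command](const Option<int>& status) -> Future<Nothing> {
      // 'status' is the raw wait status; 0 means the command exited cleanly.
      if (status.isNone() || status.get() != 0) {
        return Failure("'" + command + "' did not exit cleanly");
      }

      return Nothing();
    });
}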

Isolator does not seem to acknowledge --work_dir option in Mesos

When testing out mesos-module-dvdi in the context of DC/OS 1.7, a user reported this abort upon integrating the module and restarting the agent:

Jun 05 06:33:23 dcostest-gra1-slavepub01 mesos-slave[4942]: I0605 06:33:23.179944  4951 state.cpp:58] Recovering state from '/tmp/mesos'
Jun 05 06:33:23 dcostest-gra1-slavepub01 mesos-slave[4942]: ABORT: (/pkg/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:114): Result::get() but state == NONE
Jun 05 06:33:23 dcostest-gra1-slavepub01 mesos-slave[4942]: *** Aborted at 1465108403 (unix time) try "date -d @1465108403" if you are using GNU date ***

It looks like the module did not acknowledge the MESOS_WORK_DIR=/var/lib/mesos default in DC/OS, and attempted to read from a missing folder. This was confirmed by creating the folder with appropriate permissions, which unblocked mesos-slave.

It's possible that this is the correct location for that path, in which case the module may want to create it if it doesn't exist.
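
A small sketch of that guard, creating the configured directory before the isolator reads checkpointed state from it; ensureWorkDir is an illustrative name and workDir stands for whatever path the module is configured with:

#include <string>

#include <stout/nothing.hpp>
#include <stout/os.hpp>
#include <stout/try.hpp>

// Create the directory the mount list lives in before recover() tries to read
// checkpointed state from it, instead of aborting on a missing folder.
Try<Nothing> ensureWorkDir(const std::string& workDir)
{
  if (!os::exists(workDir)) {
    return os::mkdir(workDir); // Recursive by default.
  }

  return Nothing();
}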

0.4.3 on mesos-0.28.2 crashes immediately on startup

ABORT: (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:114): Result::get() but state == NONE
Jun 13 19:17:13 slave-1 mesos-slave[22560]: *** Aborted at 1465845433 (unix time) try "date -d @1465845433" if you are using GNU date ***
Jun 13 19:17:13 slave-1 mesos-slave[22560]: PC: @     0x7f5180a0f5f7 __GI_raise
Jun 13 19:17:13 slave-1 mesos-slave[22560]: *** SIGABRT (@0x57f8) received by PID 22520 (TID 0x7f517a2dd700) from PID 22520; stack trace: ***
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f51812c8100 (unknown)
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f5180a0f5f7 __GI_raise
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f5180a10ce8 __GI_abort
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @           0x40b71c _Abort()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @           0x40b75c _Abort()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f518218a2db Result<>::get()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f517aaf724f mesos::slave::DockerVolumeDriverIsolator::recover()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f51822a6a55 mesos::internal::slave::MesosContainerizerProcess::recoverIsolators()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f51822b0207 mesos::internal::slave::MesosContainerizerProcess::_recover()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f51822cd797 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKSt4listINS6_5slave14ContainerStateESaISC_EERK7hashsetINS6_11ContainerIDESt4hashISI_ESt8equal_toISI_EESE_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSU_FSS_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f518276e8a1 process::ProcessManager::resume()
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f518276eba7 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f5181066220 (unknown)
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f51812c0dc5 start_thread
Jun 13 19:17:13 slave-1 mesos-slave[22560]:     @     0x7f5180ad021d __clone
Jun 13 19:17:13 slave-1 systemd[1]: mesos-slave.service: main process exited, code=killed, status=6/ABRT

Command used:

Jun 13 19:28:13 slave-1 mesos-slave[27764]: I0613 19:28:13.036092 27723 slave.cpp:194] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --attributes="flavor:m1-slave;java:1.8.0;os:centos7" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="15secs" --docker_store_dir="/tmp/mesos/store/docker" --enforce_container_disk_quota="true" --executor_environment_variables="{"DATACENTER":"foo","JAVA_HOME":"\/usr\/jdk1.8.0_31"}" --executor_registration_timeout="5mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size=
Jun 13 19:28:13 slave-1 mesos-slave[27764]: "2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="/usr/hadoop-2.6.3" --help="false" --hostname="slave-1.redacted.fqdn" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="com_emccode_mesos_DockerVolumeDriverIsolator" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://10.211.194.118:2181,10.211.194.121:2181,10.211.194.122:2181/mesos" --modules="libraries {
Jun 13 19:28:13 slave-1 mesos-slave[27764]:   file: "/usr/lib/libmesos_dvdi_isolator-0.28.2.so"
Jun 13 19:28:13 slave-1 mesos-slave[27764]:   modules {
Jun 13 19:28:13 slave-1 mesos-slave[27764]:     name: "com_emccode_mesos_DockerVolumeDriverIsolator"
Jun 13 19:28:13 slave-1 mesos-slave[27764]:   }
Jun 13 19:28:13 slave-1 mesos-slave[27764]: }
Jun 13 19:28:13 slave-1 mesos-slave[27764]: " --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resources="ports:[1025-8999,9011-65535]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"```

Works great on 0.28.1.

utilize slave work_dir for mount list file location, instead of a dedicated isolator parameter

In the v0.23.0 version of Mesos, the slave work directory was not readily available to an isolator implemented as a module. It is expected that the slave work directory will be available in an upcoming version of Mesos. When it is, the existing isolator parameter will be deprecated and the slave work directory will automatically be used instead. This will simplify configuration of the isolator.

Isolator recovery is problematic with legacy mounts.

The isolator determines the 'legacy mounts' (those that no active container is using) during slave recovery by looking only at the checkpointed state, and it will unmount those legacy mounts immediately. However, known orphans might not have been killed yet (known orphans are those containers that are known to the launcher). The Mesos agent will do an asynchronous cleanup of those known orphan containers.

Unmounting the volumes while those orphan containers are still using them might be problematic. The correct approach is to wait for 'cleanup' to be called for those known orphan containers (i.e., still create an Info struct for those orphan containers, and do proper cleanup in the 'cleanup' function).
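
A rough sketch of that recovery flow; the checkpointedMounts map, the unmountVolume helper, and recoverSketch itself are placeholders rather than the module's actual members:

#include <list>
#include <string>

#include <mesos/mesos.hpp>
#include <mesos/type_utils.hpp>
#include <mesos/slave/isolator.hpp>

#include <process/future.hpp>

#include <stout/foreach.hpp>
#include <stout/hashmap.hpp>
#include <stout/hashset.hpp>
#include <stout/nothing.hpp>

using mesos::ContainerID;
using mesos::slave::ContainerState;

// Placeholder bookkeeping: the volume name checkpointed per container, and a
// helper that would shell out to dvdcli unmount.
static hashmap<ContainerID, std::string> checkpointedMounts;
static void unmountVolume(const std::string& volumeName) {}

// Keep tracking running containers AND known orphans; only unmount volumes
// whose container appears in neither set. Orphan volumes are then released
// when the agent eventually calls cleanup() for those containers.
process::Future<Nothing> recoverSketch(
    const std::list<ContainerState>& states,
    const hashset<ContainerID>& orphans)
{
  hashset<ContainerID> alive;

  foreach (const ContainerState& state, states) {
    alive.insert(state.container_id());
  }

  foreach (const ContainerID& containerId, orphans) {
    alive.insert(containerId); // Known orphan: defer the unmount to cleanup().
  }

  foreachpair (const ContainerID& containerId,
               const std::string& volumeName,
               checkpointedMounts) {
    if (!alive.contains(containerId)) {
      unmountVolume(volumeName); // True legacy mount: nothing can be using it.
    }
  }

  return Nothing();
}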

resolve container isolation behavior for special use cases

This describes desired behavior:

  1. What happens if the container path is not specified or is empty?
    No isolation; the mount is visible to all tasks on the agent node.
  2. What happens if the container path is relative (no / prefix)?
    A failure is returned from prepare().
  3. Currently the isolator allows multiple tasks to ask for a mount of the same volume. When this happens, all tasks share the volume.
    What happens if MULTIPLE tasks ask for the same volume AND also specify a container path to ask for isolation?
    The first task succeeds; any subsequent task asking for isolation gets a failure on prepare(), since specifying a container path implies an expectation of isolation.
  4. When a container path is specified and does not exist at prepare() time on the agent node, should the container path directory be automatically created on the agent node?
    Or should this result in a failure?
    If the container path is auto-created, should it be removed when the unmount occurs?
    If it should be removed, what happens if a second task arrives, bringing the use count of the container path above 1?
    If auto-created container paths are not removed, will their proliferation over time cause problems such as inode exhaustion?
    Answer:
    Auto-create only if the container path is under /tmp. Never remove it (although OS /tmp processing may remove it on reboot).
    If the container path is not under /tmp and does not exist, fail on prepare().
  5. When a mount is isolated with a container path, should the ownership and permissions be copied from the container path to the mount path?
    Answer: always copy ownership and permissions from the container path to the mount point.

Implement these behaviors.
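
A rough sketch of how those container path rules could look in the prepare() path, using stout helpers; validateContainerPath is an illustrative name, not the module's actual function:

#include <string>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/os.hpp>
#include <stout/strings.hpp>
#include <stout/try.hpp>

Try<Nothing> validateContainerPath(const std::string& containerPath)
{
  if (containerPath.empty()) {
    // Rule 1: no isolation; the mount is visible to all tasks on the agent.
    return Nothing();
  }

  if (!strings::startsWith(containerPath, "/")) {
    // Rule 2: a relative container path fails prepare().
    return Error("Container path must be absolute");
  }

  if (!os::exists(containerPath)) {
    // Rule 4: auto-create only under /tmp, and never remove it afterwards.
    if (strings::startsWith(containerPath, "/tmp")) {
      Try<Nothing> mkdir = os::mkdir(containerPath);
      if (mkdir.isError()) {
        return Error("Failed to create container path: " + mkdir.error());
      }
    } else {
      return Error("Container path does not exist");
    }
  }

  // Rule 5 (copying ownership and permissions from the container path to the
  // mount point) would be applied by the caller after a successful mount.
  return Nothing();
}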

Support explicitCreate flag from dvdcli

@cantbewong @dvonthenen

rexray/dvdcli#20

There is a 0.2.0 release of dvdcli coming out shortly. I reviewed the current isolator code, and I believe there should not be any net effect as is.

There was a change in dvdcli to update it against the Docker 1.11.0 volume API. The biggest change is that we now have the ability to check whether a volume exists prior to any operation. Previously, we would implicitly create volumes no matter what when commands were run with dvdcli.

There is a new flag, --explicitCreate=true, that needs to be set to enable this new functionality. Without it, implicit creation will still work for mesos-module-dvdi as before, and new volumes can be specified with the task. Let's figure out how to get this into the next release of the module, where a parameter can be set on the agent that defines whether explicitCreate is used or not.
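
As a sketch, the module could append the flag to its existing dvdcli invocation when a (hypothetical) agent-level module parameter opts in; buildMountCommand is illustrative only:

#include <string>

// Compose the dvdcli mount command, adding --explicitCreate=true only when the
// agent-level parameter enables it (the flag ships with dvdcli 0.2.0).
std::string buildMountCommand(
    const std::string& driver,
    const std::string& volumeName,
    bool explicitCreate)
{
  std::string command =
    "/usr/bin/dvdcli mount --volumedriver=" + driver +
    " --volumename=" + volumeName;

  if (explicitCreate) {
    command += " --explicitCreate=true";
  }

  return command;
}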

question: what is the volume life cycle

The Docker volume plugin API has Create/Mount/Unmount/Delete operations.

Will all of those methods be called on container startup/destroy? At which step of the Mesos container life cycle?

Thanks

Using DVDI Module with Docker Containers

Greetings,

Wondering if someone can supply an example of a JSON file where a Docker container would mount a DVDI-provided volume? Let's say I want the elasticsearch container to mount a volume called elkvol001 as /data, provided by the DVDI module. Something along the lines of this:

{
    "id": "elasticsearch",
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "elasticsearch",
            "network": "BRIDGE"
        }
    },
    "volumes": [{
        "containerPath": "/data",
        "hostPath": "/var/lib/rexray/volumes/dbdata",
        "mode": "RW"
    }],
    "cpus": 0.2,
    "mem": 512.0,
    "env": {
        "DVDI_VOLUME_NAME": "elkvol001",
        "DVDI_VOLUME_DRIVER": "rexray",
        "DVDI_VOLUME_OPTS": "size=5,iops=150,volumetype=io1,newfstype=xfs,overwritefs=true"
    },
    "instances": 1
}

[Question] How can I check the bind mount info via namespace after "dvdcli mount"

Are there any files that would enable me to do this check?

The reason is that while I was working on https://issues.apache.org/jira/browse/MESOS-4355, the shepherd had a question: he thinks that we do not need to do a dvdcli unmount on cleanup or recover, and we want to make sure there is no impact if we do not call unmount on cleanup or recover.

Another point from the shepherd is that we run the dvdcli mount in the container's namespace, so we do not need to unmount on cleanup, as the mount point will be managed by the container: once the container goes away, the mount point also goes away. So I was verifying these points and want to get some help from you. Thanks.

/etc/mesos-slave/modules

** Environment **

OS: RHEL 7.2
Apache Mesos: 0.26.0
Rexray: 0.3.1
DVDI: 0.4.1-dev

Built the DVDI isolator module for Mesos 0.26.0 as per the documentation. Trying to load the module on mesos-slave at startup:

File /etc/mesos-slave/modules:

file:///usr/lib/dvdi-mod.json

File /etc/mesos-slave/isolation:

com_emccode_mesos_DockerVolumeDriverIsolator

Trying to start mesos-slave:

systemctl daemon-reload && systemctl restart mesos-slave

Unfortunately it is failing on startup:

systemctl -l status mesos-slave
● mesos-slave.service - Mesos Slave
   Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mesos-slave.service.d
           └─mesos-slave-containerizer.conf
   Active: activating (auto-restart) (Result: exit-code) since Fri 2016-01-29 15:08:47 EST; 5s ago
  Process: 2202 ExecStart=/usr/bin/mesos-init-wrapper slave (code=exited, status=1/FAILURE)
 Main PID: 2202 (code=exited, status=1/FAILURE)

Jan 29 15:08:47 node1.local.net systemd[1]: Unit mesos-slave.service entered failed state.
Jan 29 15:08:47 node1.local.net systemd[1]: mesos-slave.service failed.
[root@node1 mesos-slave]# 

Error log entries:

Jan 29 15:00:00 node1.local.net mesos-slave[1741]: Error loading modules: Error opening library: '/usr/lib/dvdi_isolator-0.26.0.so': Could not load library '/usr/lib/dvdi_isolator-0.26.0.so': /usr/lib/dvdi_isolator-0.26.0.so: cannot dynamically load executable
Jan 29 15:00:00 node1.local.net systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Jan 29 15:00:00 node1.local.net systemd: Unit mesos-slave.service entered failed state.
Jan 29 15:00:00 node1.local.net systemd: mesos-slave.service failed.

Any clues as to where I should be focusing?

Thanks,

Alex

Failure scenarios

We need to ensure that we have covered the recovery scenarios considering the information we have in 0.23.0.

Some of these are:

  • Agent crashes, recovers, tasks still running
  • Agent crashes, recovers, some tasks finished
    ..

hung on unmount operation

I created a task in Marathon, watched it come up, let it run for a few minutes, then suspended the app (which sets instances to zero). I ended up with a hung unmount operation on the slave the task was running on. The task appears as KILLED in Mesos. I have a single-slave-node cluster. Now I can't launch a new task that tries to use a (different) rexray volume; attempting to do so results in a task stuck in STAGING. So this hung unmount operation is blocking somewhere.

logs:

Apr 02 19:23:11 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:11.981403  1134 slave.cpp:3528] executor(1)@10.0.0.249:53756 exited
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.070981  1136 containerizer.cpp:1608] Executor for container '05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9' has exited
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.071041  1136 containerizer.cpp:1392] Destroying container '05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9'
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.072430  1133 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.073606  1132 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9 after 1.126912ms
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.074779  1139 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.075850  1139 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/05ca518e-1fe1-4fee-bc2f-abfbb0cb9fa9 after 1.031936ms
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.076619  1133 docker_volume_driver_isolator.cpp:356] rexray/test is being unmounted on cleanup()
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal mesos-slave[1129]: I0402 19:23:12.078660  1133 docker_volume_driver_isolator.cpp:362] Invoking /opt/mesosphere/bin/dvdcli unmount --volumedriver=rexray --volumename=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg=vdm.Create driverName=docker moduleName=default-docker opts=map[] volumeName=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=test
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg="creating volume" driverName=docker moduleName=default-docker volumeName=test volumeOpts=map[]
Apr 02 19:23:12 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:12Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=vdm.Unmount driverName=docker moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="initialized count" count=0 moduleName=default-docker volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="unmounting volume" driverName=docker moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.GetVolume driverName=ec2 moduleName=default-docker volumeID= volumeName=test
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.GetVolumeAttach driverName=ec2 instanceID=i-8677785c moduleName=default-docker volumeID=vol-23fe
679a
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=odm.GetMounts deviceName="/dev/xvdc" driverName=linux moduleName=default-docker mountPoint=
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=odm.Unmount driverName=linux moduleName=default-docker mountPoint="/var/lib/rexray/volumes/test"
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg=sdm.DetachVolume driverName=ec2 instanceID= moduleName=default-docker runAsync=false volumeID=vol-23
fe679a
Apr 02 19:23:13 ip-10-0-0-249.us-west-2.compute.internal rexray[931]: time="2016-04-02T19:23:13Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker run
Async=false volumeID=vol-23fe679a
Apr 02 19:23:14 ip-10-0-0-249.us-west-2.compute.internal kernel: vbd vbd-51744: 16 Device in use; refusing to close

/proc/mounts:

core@ip-10-0-0-249 ~ $ cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=7688856k,nr_inodes=1922214,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
/dev/xvda9 / ext4 rw,relatime,data=ordered 0 0
/dev/xvda3 /usr ext4 ro,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
tmpfs /media tmpfs rw,nosuid,nodev,noexec,relatime 0 0
systemd-1 /boot autofs rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct 0 0
tmpfs /tmp tmpfs rw 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
xenfs /proc/xen xenfs rw,relatime 0 0
/dev/xvda6 /usr/share/oem ext4 rw,nodev,relatime,commit=600,data=ordered 0 0
/dev/xvda1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro 0 0
/dev/xvdb /var/lib ext4 rw,relatime,data=ordered 0 0
/dev/xvdc /tmp/test-rexray-volume ext4 rw,relatime,data=ordered 0 0

Create container failed because DVDI_VOLUME_NAME is not valid

I was using the 0.27.0 DVDI driver and the env JSON is as follows:
root@mesos002:~/src/mesos/27/mesos/build# cat /opt/env.json
{
"DVDI_VOLUME_NAME": "123",
"DVDI_VOLUME_DRIVER": "convoy"
}

I was always getting this error when launching a task:
E0314 20:17:41.391485 2008 docker_volume_driver_isolator.cpp:485] Environment variable DVDI_VOLUME_NAME rejected because it's value contains prohibited characters

Checkpointed data should not be put in agent's metadata workdir by default.

We should avoid that since that directory is owned by the Mesos agent. Mesos does not expect an isolator to put checkpointed data there. We should put it under /var/run/mesos/isolators/.

The directory in which the checkpointed data is put should be cleaned up on reboot. Otherwise, the isolator will try to recover the checkpointed data after the reboot, thinking that the mounts are still there. Therefore, putting checkpointed data under /var/run is recommended, as it will get cleaned up on reboot.
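
A small sketch of the suggested layout using stout's path helpers; the exact subdirectory names are illustrative:

#include <string>

#include <stout/nothing.hpp>
#include <stout/os.hpp>
#include <stout/path.hpp>
#include <stout/try.hpp>

// /var/run is typically tmpfs, so mount state checkpointed under it disappears
// on reboot together with the mounts it describes.
Try<Nothing> ensureCheckpointDir()
{
  const std::string checkpointDir =
    path::join("/var/run/mesos/isolators", "dvdi", "mounts");

  return os::mkdir(checkpointDir); // Recursive by default.
}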

Apply style recommendations from code review

Limit code to 80 characters per line.

All comments should end with a punctuation mark.

Capitalize the beginning of comments.

Error messages should generally begin with capital letters and should not end with punctuation marks.

Make a bit more liberal use of carriage returns to provide vertical spacing. I would have a look at the Mesos code and try to emulate the patterns used there: often a blank line before/after an if statement (unless a preceding declaration/assignment is directly related to the if), separating declarations & assignments into logical groupings using blank lines, etc...

I notice in your code that you show a general preference for working with stringstreams rather than manipulating strings; I'm not sure how much it matters, but the Mesos codebase relies primarily on string manipulation.

Make sure that you're taking advantage of all the using statements at the beginning of the code, using std::string for example.

Check out the indent rules in the style guide; there are a couple blocks in the isolator code that are indented four spaces rather than two.

The includes at the beginning of your code should be divided into logical groupings with newlines; you can check out the Mesos code for examples of this.

Avoid using-directives (using namespace ...) in Mesos; instead, use individual using-declarations.

Good use of the stout library's foreach helper, but you revert to the standard C++11 range-based for loop in several places; for consistency, you could stick to the foreach helper exclusively.
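
For illustration, the two loop styles being compared (stout's foreach helper versus the C++11 range-based for), either of which could be used, but consistently:

#include <iostream>
#include <string>
#include <vector>

#include <stout/foreach.hpp>

void printNames(const std::vector<std::string>& names)
{
  // stout's foreach helper, as used elsewhere in the isolator.
  foreach (const std::string& name, names) {
    std::cout << name << std::endl;
  }

  // The equivalent C++11 range-based for loop.
  for (const std::string& name : names) {
    std::cout << name << std::endl;
  }
}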
