Marathon / MARATHON-7153

no failover on mesos/marathon multimaster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: API
    • Labels:

      Description

      Hey there,

      I'm having trouble setting up failover for Docker containers: it seems that when a Docker host goes down, its containers are never rebalanced onto another host.

      I'm trying a simple topology with 3 masters (only one active) and a quorum of 1. My master args are:

       /usr/sbin/mesos-master --zk=zk://foo.bar.0.1:2181,foo.bar.0.2:2181,foo.bar.0.3:2181/mesos --ip=foo.bar.0.1 --cluster=mesos_cluster --log_dir=/var/log/mesos --work_dir=/var/mesos --advertise_ip=foo.bar.0.1 --hostname=foobar --quorum=1

      and my slaves are all configured following the same pattern:
      /usr/sbin/mesos-slave --ip=foo.bar.0.1 --master=zk://foo.bar.0.1:2181,foo.bar.0.2:2181,foo.bar.0.3:2181/mesos --containerizers=mesos,docker --log_dir=/var/log/mesos --work_dir=/var/mesos --docker_config=/root/.dockercfg --hostname=foobar --resources=file:///etc/mesos-resources.txt

      I'm rolling out 10 instances of the attached JSON, changing the mock-producer-x value and the PARTITION value to assign 10 different numbers.
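
      The attached JSON isn't inlined here; the sketch below only shows the general shape of one such app definition (the image name and all values are illustrative placeholders, not the actual attachment):

      {
        "id": "/mock-producer-1",
        "cpus": 0.5,
        "mem": 256,
        "instances": 1,
        "container": {
          "type": "DOCKER",
          "docker": {
            "image": "registry.example.com/mock-producer:latest",
            "network": "BRIDGE"
          }
        },
        "env": {
          "PARTITION": "1"
        }
      }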

      In this context, my application is properly deployed and runs smoothly on my 3 nodes.

      My problem occurs when one of my nodes goes down (here, I'm simulating this with a reboot or ifconfig eth1 down): the containers assigned to it are no longer seen as "running" by Marathon, BUT if I try to restart them, they are properly restarted elsewhere. When the missing node pops back into the cluster, its containers are still up (if it was just a simple network failure), and my issue is that they are never killed nor rebalanced in any way.
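
      For reference (this depends on the Marathon version in use, so treat it as an assumption rather than something from my setup): Marathon 1.4+ exposes a per-application unreachableStrategy that is supposed to control how quickly tasks on an unreachable agent are replaced and later expunged. A minimal sketch with purely illustrative timings:

      "unreachableStrategy": {
        "inactiveAfterSeconds": 60,
        "expungeAfterSeconds": 300
      }

      With something like this, Marathon should start a replacement once the agent has been unreachable for inactiveAfterSeconds, and kill the old task if the agent reappears after expungeAfterSeconds.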

      [EDIT]: I also tried the --recover=cleanup option, which failed all my agents.

        Attachments

          Activity

            People

            • Assignee:
              theonlydoo
            • Reporter:
              theonlydoo
            • Team:
              Orchestration Team
            • Watchers:
              Karsten Jeschkies, Matthias Eichstedt, theonlydoo

              Dates

              • Created:
                Updated:
                Resolved: