      Description

      Hi!

      I'm having an issue where, after losing connectivity to the ZooKeeper servers, the Marathon cluster stays inoperable until I intervene manually.

      Here are the steps to reproduce and what I observe:

      • Restart a ZooKeeper instance
      • Marathon's leader is automatically killed (and restarted), and another instance takes over as leader
      • I still have access to Marathon's UI, but any task operation stays in Waiting/Deploying forever
      • If I restart the two remaining Marathon services, everything is back to normal: I can deploy and scale my apps again
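
      While reproducing the steps above, it can help to ask every Marathon instance which leader it currently sees. The sketch below is only a diagnostic aid, not a fix. It assumes the instances expose Marathon's `GET /v2/leader` endpoint over plain HTTP, and the host names in the usage section are hypothetical placeholders. The helper names `fetch_leader` and `summarize_leaders` are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_leader(base_url, timeout=5.0):
    """Return the leader ("host:port") reported by one Marathon instance.

    Returns None if the instance is unreachable, answers with an HTTP error
    (for example when it knows of no leader), or sends an unexpected body.
    """
    try:
        with urlopen(f"{base_url}/v2/leader", timeout=timeout) as resp:
            return json.load(resp).get("leader")
    except (OSError, ValueError, AttributeError):
        # URLError/HTTPError are OSError subclasses; bad JSON is a ValueError.
        return None


def summarize_leaders(reports):
    """Classify a {instance_url: leader_or_None} mapping.

    "ok"        every instance reports the same leader
    "partial"   some instances report a leader, others report none
    "split"     instances disagree about who the leader is
    "no-leader" no instance reports a leader
    """
    leaders = {leader for leader in reports.values() if leader is not None}
    if not leaders:
        return "no-leader"
    if len(leaders) > 1:
        return "split"
    if any(leader is None for leader in reports.values()):
        return "partial"
    return "ok"


if __name__ == "__main__":
    # Hypothetical instance URLs; replace with the real Marathon hosts.
    instances = [
        "http://marathon-1.example:8080",
        "http://marathon-2.example:8080",
        "http://marathon-3.example:8080",
    ]
    reports = {url: fetch_leader(url) for url in instances}
    for url, leader in reports.items():
        print(f"{url} -> {leader}")
    print("status:", summarize_leaders(reports))
```

      Running it once before restarting ZooKeeper and once after the failover shows whether the surviving instances agree on the new leader while deployments are stuck.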

      I get that the leader might be killed if ZK connectivity is lost, but I thought the cluster would survive without my having to restart all Marathon instances.

      I am running ZooKeeper 3.4.5, Mesos 1.0.0 and Marathon 1.1.2.

      Regards,
      Antoine.

              People

              • Assignee: Unassigned
              • Reporter: GitHub_apognu Antoine POPINEAU (Inactive)
              • Team: Orchestration Team
              • Watchers: Jason Gilanfarr (Inactive), Matthias Eichstedt
