MARATHON-6975

Added abdication of leadership on disconnect from the Mesos master (plus other leadership defeats/elections rework)

    Details

    • Type: Task
    • Status: Resolved
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      This PR includes several changes.
      1. It adds abdication of leadership on disconnect from the Mesos master. When the Scheduler's disconnected() callback is invoked, Marathon no longer commits suicide; instead, it abdicates its leadership position. The same Marathon instance may be re-elected, which effectively means "wait until the master comes back", but under normal circumstances a different Marathon instance will be elected.
      2. I reworked the MarathonSchedulerService class to properly handle leadership defeats and elections.
      While doing so I learnt that MesosSchedulerDriver objects cannot be "restarted": once stop() has been called on a driver, it cannot be started again via join(), run() or start(). I therefore instantiate a new driver after every stop(). This ensures that a new driver is ready and waiting to be started when this instance of Marathon is elected.
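
The lifecycle described above can be sketched roughly as follows. This is an illustrative Python model, not the code from the PR: `SingleUseDriver`, `SchedulerService`, and the `abdicate` callback are hypothetical names that stand in for the real driver, MarathonSchedulerService, and the election mechanism.

```python
class SingleUseDriver:
    """Hypothetical stand-in for a driver that cannot be restarted after stop()."""

    def __init__(self):
        self.started = False
        self.stopped = False

    def start(self):
        if self.stopped:
            raise RuntimeError("a stopped driver cannot be started again")
        self.started = True

    def stop(self):
        self.stopped = True


class SchedulerService:
    """Hypothetical service that keeps a fresh driver ready for the next election."""

    def __init__(self, driver_factory, abdicate):
        self._driver_factory = driver_factory
        self._abdicate = abdicate  # assumed callback that gives up leadership
        self.driver = driver_factory()
        self.is_leader = False

    def on_elected(self):
        self.is_leader = True
        self.driver.start()

    def on_defeated(self):
        self.is_leader = False
        self.driver.stop()
        # Replace the stopped driver so the next election has something to start.
        self.driver = self._driver_factory()

    def on_disconnected(self):
        # Abdicate instead of exiting the process. In this sketch defeat handling
        # is invoked directly; a real election service would deliver it as a callback.
        self._abdicate()
        self.on_defeated()


events = []
service = SchedulerService(SingleUseDriver, abdicate=lambda: events.append("abdicated"))
service.on_elected()
first_driver = service.driver
service.on_disconnected()   # master connection lost: abdicate, stop, replace driver
service.on_elected()        # re-election starts the fresh driver, not the old one
```

The point of the sketch is that the stopped driver is never reused: a re-election after a disconnect always starts a newly created driver.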

      I added comments in the code to help with understanding.

      Note that I am using the term "leadership" loosely, as Marathon can be elected leader via ZooKeeper or, when HA is not configured, by default. With my changes, a single-instance non-HA configuration results in the Marathon instance assuming leadership from its internal perspective.
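
As a rough illustration of the two paths to leadership, the following Python sketch treats the non-HA case as an immediate election. The function name `start_leadership` and its parameters are assumptions made for this example and do not come from the Marathon code base.

```python
def start_leadership(ha_enabled, offer_leadership, on_elected):
    """Enter the election when HA is on; otherwise assume leadership directly."""
    if ha_enabled:
        # Hypothetical hook: register as a candidate and wait to be elected.
        offer_leadership(on_elected)
        return "candidate"
    # No election takes place: the single instance becomes leader by default.
    on_elected()
    return "leader"
```

Either way the same `on_elected` handler runs, so the rest of the service does not need to know whether an election actually took place.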

            People

            • Assignee:
              Unassigned
              Reporter:
              GitHub_marc-barry Marc Barry (Inactive)
              Team:
              Orchestration Team