  1. DC/OS
  2. DCOS_OSS-4193

packages/marathon: marathon bootstrap relies on zk-1.zk node to be available

    Details

    • Sprint:
      Marathon 2018-30
    • Story Points:
      1

      Description

      Overview

      The marathon systemd unit requires the zk-1.zk node to be available in order to launch the marathon service. This breaks the HA pattern.

      https://github.com/dcos/dcos/blob/738e738bfd367b9e0786af27938a0f9a40989414/packages/marathon/build#L47

      ...
      EnvironmentFile=-/var/lib/dcos/marathon/environment
      Environment=JAVA_HOME=${JAVA_HOME}
      ExecStartPre=/bin/ping -c1 leader.mesos
      ExecStartPre=/bin/ping -c1 zk-1.zk
      ExecStartPre=/opt/mesosphere/bin/bootstrap dcos-marathon
      ExecStart=/opt/mesosphere/bin/marathon.sh
      

      If the zk-1.zk node is unavailable for any reason (e.g. because of a network partition), the marathon service won't launch, even if the ZK ensemble is still healthy with one node down.

      CC Dominik Dary

      Affected versions

      The only released DC/OS versions with this bug so far are 1.11.5 and 1.11.6; however, the change that causes it has already been merged and is en route for release in 1.10, 1.12, and 1.13.
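
      To check whether a given master is affected, one option is to look for the offending pre-check line in the unit file. The following is a minimal sketch only: the function name has_zk1_ping is hypothetical, and the default path is assumed to be the unit file path named in the workaround.

```shell
# Hypothetical helper: succeed (exit 0) if the unit file still contains the
# zk-1.zk ping pre-check, fail (exit 1) otherwise.
# Assumption: the unit file lives at the default path below unless overridden.
has_zk1_ping() {
    unit_file="${1:-/etc/systemd/system/dcos-marathon.service}"
    # -F matches the string literally, -q suppresses output.
    grep -qF -- 'ExecStartPre=/bin/ping -c1 zk-1.zk' "$unit_file"
}

# Usage (on a master):
#   if has_zk1_ping; then echo "affected"; else echo "not affected"; fi
```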

      Workaround

      As a workaround, customers running an affected DC/OS version (1.11.5, 1.11.6, and potentially future versions if the fix is not merged before the next release) can do the following:

      1. Edit /etc/systemd/system/dcos-marathon.service
      2. Remove the line ExecStartPre=/bin/ping -c1 zk-1.zk
      3. Run systemctl daemon-reload
      4. Repeat for all masters
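
      The steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure: the function name remove_zk1_ping and the .bak backup suffix are hypothetical, and it has to be run as root on each master.

```shell
# Hypothetical helper: delete the zk-1.zk ping pre-check from a unit file.
# sed -i.bak edits in place and keeps the original as <file>.bak
# (the backup is an addition of this sketch, not part of the workaround).
remove_zk1_ping() {
    unit_file="${1:-/etc/systemd/system/dcos-marathon.service}"
    # Delete only the line that consists of the zk-1.zk pre-check,
    # tolerating leading or trailing whitespace.
    sed -i.bak \
        '/^[[:space:]]*ExecStartPre=\/bin\/ping -c1 zk-1\.zk[[:space:]]*$/d' \
        "$unit_file"
}

# Usage (as root, repeated on every master):
#   remove_zk1_ping && systemctl daemon-reload
```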

              People

              • Assignee:
                ken Ken Sipe
                Reporter:
                mhrabovcin.c Martin Hrabovcin
                Team:
                Orchestration Team
                Watchers:
                Arthur Johnson, Craig Neth, Jan-Philip Gehrcke, Ken Sipe, Lisa Gunn, Marcus Alvarez, Martin Hrabovcin, Matthias Eichstedt, Mergebot, Tim Harper