  1. Marathon
  2. MARATHON-1713

Parent issue: Marathon does not reuse reserved resources associated with a lost task

    Details

    • Sprint:
      Marathon Sprint 7-2017

      Description

      This is a parent issue to aggregate the handful of sub-issues related to resident tasks.

      (A check mark indicates that the fix is merged to master. See https://github.com/mesosphere/marathon/issues/5206 for the status of the backport to 1.4.)

      • [x] MGI-5141 - Marathon fails to release reserved resources for deleted apps
      • [x] MGI-5154 - "Kill and wipe" does not actually kill task
      • [x] MGI-5162 - UnreachableStrategy configuration has no effect; should not be used by resident tasks
      • [x] MGI-5164 - localVolumes Instance property is lost with Marathon restart
      • [x] MGI-5206 - Killing a lost resident task results in expunge
      • [ ] MGI-5283 - Workaround required for unreachable resident tasks

      – original –

      I've recorded a video to show the problem:

      http://screencast.com/t/Lkgdi6tIEG6

      In effect, Mesos tells Marathon during a reconciliation that a task was lost. This can happen for a variety of reasons; in the demonstrated case, the task is lost because the mesos-slave ID is forcibly changed and a new ID comes up on the same mesos-slave IP address. Marathon responds by reserving a new set of resources and a new persistent volume, and launching a new task.

      The expected behavior is that Marathon reuses the reserved resources. It currently cannot, because it thinks a task is still running there (status.state == Unknown, judging from the protobuf hexdump in ZooKeeper). If it cannot use the reserved resources because something might be running, then it should not create additional persistent volumes: when push comes to shove and it cannot satisfy both the 0% over-capacity and 0% under-capacity thresholds, it should heed the 0% over-capacity limit.
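
      The expected behavior described above can be sketched as a small decision function. This is only an illustration of the requested policy, not Marathon's implementation: the names `Reservation`, `Action`, `decide`, and `max_over_capacity` are hypothetical, and the task states are simplified to plain strings.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REUSE_RESERVATION = "reuse existing reservation"
    RESERVE_NEW = "reserve new resources and volume"
    WAIT = "wait; launch nothing new"


@dataclass
class Reservation:
    # Hypothetical model of one reserved resource set plus its persistent volume.
    agent_ip: str
    task_state: str  # simplified: "Running", "Lost", or "Unknown"


def decide(reservations, target_instances, max_over_capacity):
    """Pick the next action for a resident app (illustrative policy only)."""
    # A reservation whose task is known to be lost should be reused first.
    for reservation in reservations:
        if reservation.task_state == "Lost":
            return Action.REUSE_RESERVATION

    # Otherwise a new reservation is only acceptable while the total stays
    # within the over-capacity limit (0.0 means no extra reservations at all).
    allowed = int(target_instances * (1 + max_over_capacity))
    if len(reservations) < allowed:
        return Action.RESERVE_NEW

    # The task might still be running (for example, state "Unknown"), and the
    # limit is reached: do not create another persistent volume.
    return Action.WAIT
```

      With one target instance and a 0% over-capacity limit, a reservation in state "Unknown" yields `Action.WAIT` under this sketch, rather than a second reservation and volume as shown in the video.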

            People

            • Assignee:
              Tim Harper (tharper)
            • Reporter:
              Tim Harper (tharper)
            • Team:
              Orchestration Team