  Marathon / MARATHON-7429

Resident Tasks lose resources and hog offers.

    Details

      Description

      If I specify mesos_role with Marathon 1.4.1 and Mesos 1.2.0 and create an application definition with persistent volumes, the resources are reserved under that role. Future resource offers then carry that role as well, even if I specify resource roles of * in the Marathon app definition.

      Marathon has a failure mode, described in the comments, in which it hogs offers and never lets them be used again for that node. Recovering requires completely wiping the Mesos machine so that Marathon can re-acquire the resources correctly.

      This particular setup runs multiple resident tasks on a single Mesos agent. Because of CLUSTER constraints, this app ID receives valid resources from only one agent, and nominally 16 of these resident tasks fit on that agent.

      I can tell Marathon is hogging offers because I can see the outstanding offers on the #/offers endpoint on the Mesos master.
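
      The same check can be scripted against a snapshot of the master's state. The following is a minimal sketch, assuming the snapshot is a parsed JSON document with a top-level "frameworks" list in which each framework carries a "name" and an "offers" list whose entries have "id" and "slave_id" fields. The helper name and the sample data are hypothetical and are not taken from the attached logs.

```python
from collections import defaultdict


def outstanding_offers_by_framework(state):
    """Group outstanding offers by framework name.

    `state` is assumed to be a parsed master state snapshot (a dict).
    The field names used here ("frameworks", "offers", "id", "slave_id")
    are assumptions about that snapshot's layout.
    """
    grouped = defaultdict(list)
    for framework in state.get("frameworks", []):
        name = framework.get("name", "<unknown>")
        for offer in framework.get("offers", []):
            grouped[name].append(
                {"offer_id": offer.get("id"), "agent_id": offer.get("slave_id")}
            )
    return dict(grouped)


# Hypothetical snapshot: one framework holding two offers from one agent.
sample_state = {
    "frameworks": [
        {
            "name": "marathon",
            "offers": [
                {"id": "offer-1", "slave_id": "agent-A"},
                {"id": "offer-2", "slave_id": "agent-A"},
            ],
        },
        {"name": "other-framework", "offers": []},
    ]
}

held = outstanding_offers_by_framework(sample_state)
```

      A framework that keeps appearing in this result across repeated snapshots, always with offers from the same agent, matches the hogging behaviour described above.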
       

      Note that this seems to be triggered primarily when we change the minimum health capacity (minimumHealthCapacity in upgradeStrategy). We have not encountered this bug when making "normal" fetch URI changes.

       

      For example, changing

       

        "upgradeStrategy": {
          "minimumHealthCapacity": 0.9,
          "maximumOverCapacity": 0
        }
      

      to

        "upgradeStrategy": {
          "minimumHealthCapacity": 0.5,
          "maximumOverCapacity": 0
        }
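
      To illustrate why this change matters for a deployment: with maximumOverCapacity set to 0, no additional instances may be started during an upgrade, so old tasks have to be stopped before their replacements launch. The sketch below assumes that the minimum number of healthy instances is the instance count multiplied by minimumHealthCapacity, rounded up, and it assumes a hypothetical app with 16 instances (the report only states that 16 tasks fit on one agent). The function name is illustrative.

```python
import math


def max_stoppable_at_once(instances, minimum_health_capacity):
    """Return how many instances may be stopped at the same time.

    Assumption: the minimum healthy count is
    ceil(instances * minimum_health_capacity), and no over-capacity
    is allowed (maximumOverCapacity = 0).
    """
    min_healthy = math.ceil(instances * minimum_health_capacity)
    return instances - min_healthy


# Hypothetical app with 16 instances.
before = max_stoppable_at_once(16, 0.9)  # 1 instance at a time
after = max_stoppable_at_once(16, 0.5)   # 8 instances at a time
```

      Under these assumptions, lowering the value from 0.9 to 0.5 lets a deployment stop eight resident tasks at once instead of one, so many more reservations on the same agent are released and re-offered concurrently.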
      

       

        Attachments

        1. mesoslog_after_start.log
          18 kB
        2. serv.py
          1 kB
        3. test.app
          0.7 kB

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              drcrallen
              Team:
              Orchestration Team
              Watchers:
              brugidou, drcrallen, Egor Ryashin, egor-ryashin, Ken Sipe, Matthias Eichstedt, Roman Leventov
