• Type: Task
    • Status: Resolved
    • Priority: High
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: DC/OS 1.12.0, RI-2
    • Component/s: Orchestration
    • Labels:


      Currently, the condition of a Marathon instance is inferred from the combined states of its tasks. This has several downsides, especially for resident or unreachable tasks: it is unclear what the goal for an instance is.

      Scenario 1: Unreachable task
      When a task is unreachable and a user requests it to be killed, Marathon cannot fulfill that request: a call to Mesos will have no effect because the task is unreachable. The only way Marathon can currently provide the wanted functionality is by expunging metadata about this instance and its associated tasks, and killing the task if it is seen again. However, this is hard to debug and to read from the logs, since the task is killed without any recorded information about why. Handling this situation by setting a goal state on the instance and then retrying would be easier to comprehend and debug. If the task is seen again, Marathon could kill it knowing that the instance's goal state requires this action.
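      A minimal sketch of the goal-state handling proposed in Scenario 1, written in Python purely for illustration. The names `Goal`, `Instance`, `request_kill`, and `on_task_seen_again` are hypothetical and not part of Marathon's code base or API; the returned strings stand in for calls to Mesos.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Goal(Enum):
    # Hypothetical goal names, assumed for this sketch only.
    RUNNING = "running"
    DECOMMISSIONED = "decommissioned"


@dataclass
class Instance:
    instance_id: str
    goal: Goal = Goal.RUNNING
    reachable: bool = True


def request_kill(instance: Instance) -> List[str]:
    """Record the user's intent; only issue a kill if the task can be reached."""
    instance.goal = Goal.DECOMMISSIONED
    return [f"kill {instance.instance_id}"] if instance.reachable else []


def on_task_seen_again(instance: Instance) -> List[str]:
    """Handle a previously unreachable task that reports in again."""
    instance.reachable = True
    if instance.goal is Goal.DECOMMISSIONED:
        # The reason for the kill is explicit: the stored goal requires it.
        return [f"kill {instance.instance_id}"]
    return []
```

      Because the goal is stored on the instance instead of the instance being expunged, the later kill can be logged together with the goal that caused it.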

      Scenario 2: Resident task
      When a task using persistent volumes terminates, Marathon keeps the instance around in order to retain the reservation info. This is for at least two reasons:
      a) When upgrading a service, the instance's reservation is needed again to launch a new task on.
      b) When a user requests to kill a task and scale down the number of instances, Marathon doesn't know whether the reservation shall be kept or not, since the API does not clearly specify that semantic. The default is to keep the instance and reservation in case the user wants to scale up again later and retain access to the persistent volume data.
      There is a workaround in place for when a user wants to get rid of an instance (with or without a running task) that uses a reservation and persistent volume: kill?wipe=true. This workaround wipes the instance metadata from the repository and kills the associated tasks. As with unreachable tasks, this makes debugging edge cases harder.

      Especially for MARATHON-8167, it is crucial to have a clear way to express one of three goals: this instance should be running; the tasks associated with this instance shall be killed and the instance expunged; or the tasks associated with this instance shall be killed but the instance retained.
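      The three goals listed above could be modelled as an explicit enumeration, sketched here in Python. This is an assumption-laden illustration, not Marathon's implementation: the goal names `RUNNING`, `STOPPED`, and `DECOMMISSIONED` and the helper `planned_actions` are hypothetical, and the action strings are placeholders for real scheduler operations.

```python
from enum import Enum
from typing import List


class Goal(Enum):
    # Hypothetical names for the three goals described in the ticket.
    RUNNING = "running"                # the instance should be running
    STOPPED = "stopped"                # kill tasks, retain the instance
    DECOMMISSIONED = "decommissioned"  # kill tasks, expunge the instance


def planned_actions(goal: Goal, has_active_tasks: bool) -> List[str]:
    """Derive the next steps for an instance from its goal alone."""
    if goal is Goal.RUNNING:
        return [] if has_active_tasks else ["launch task"]

    actions: List[str] = []
    if has_active_tasks:
        actions.append("kill tasks")
    if goal is Goal.DECOMMISSIONED:
        actions.append("expunge instance")
    else:
        # STOPPED: keep the instance so its reservation info is retained.
        actions.append("retain instance")
    return actions
```

      For example, `planned_actions(Goal.STOPPED, True)` yields `["kill tasks", "retain instance"]`, which corresponds to scaling down a resident task while keeping its reservation.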



              • Assignee:
                alenavarkockova Alena Varkockova
                matthias.eichstedt Matthias Eichstedt
                Orchestration Team
                Aleksey Dukhovniy, Alena Varkockova, daltonmatos, Karsten Jeschkies, Matthias Eichstedt, Tim Harper
                Karsten Jeschkies, Matthias Eichstedt, Tim Harper
              • Watchers:
                6


                • Created: