The scope of this ticket is to better document the unexpected edge cases that occur when canceling deployments.
See COPS-3012 for more information, discussion, and examples. Description from that customer issue:
The customer deploys sets of Marathon apps via group deployments, where one or more of the apps depend on other apps.
Sometimes one of the dependencies fails (e.g., because of an invalid image tag), so they have to manually suspend the dependency, fix it, and start it again.
When they do this, the dependent app reports that it has been updated, but it is not actually updated.
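For example, deploy a group containing the apps /a/1 and /a/2, where /a/2 depends on /a/1 (see COPS-3012 for the example definitions).
Then try to do a group update to new versions of both apps, again with the same dependency, where the new version of /a/1 cannot start.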
/a/1 fails, as expected, and /a/2 hangs waiting for its dependency, also as expected.
However, if you suspend /a/1 (for example, in order to fix it), /a/2 is displayed as if the upgrade to the target version had completed. This shows up both in the API and in the DC/OS and Marathon UIs. If /a/1 is then fixed and redeployed, /a/2 is not restarted, because Marathon assumes the target version is already fulfilled. The only workaround is to force a change to /a/2, e.g. by adding a label, so that Marathon restarts all of its tasks.
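The mismatch can be illustrated with a toy model. This is a hypothetical sketch, not Marathon's implementation: the `App` class, its `target`/`recorded`/`running` fields, `reconcile`, and `suspend_failed_dependency` are invented for illustration. It assumes that Marathon decides whether to restart tasks by comparing the target definition with the definition it has recorded as deployed, and that suspending the failing dependency updates that record for /a/2 without touching its tasks.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical, simplified view of one app (not Marathon's real model)."""
    app_id: str
    target: dict    # definition the user asked for
    recorded: dict  # definition reported as deployed (API and UIs)
    running: dict   # definition the live tasks were actually started from
    restarts: int = 0


def reconcile(app: App) -> None:
    # Assumption: tasks are restarted only when the target differs from the
    # recorded definition; the running tasks themselves are never inspected.
    if app.target != app.recorded:
        app.running = dict(app.target)
        app.recorded = dict(app.target)
        app.restarts += 1


def suspend_failed_dependency(dependent: App) -> None:
    # Models the reported behavior: suspending the failing dependency ends the
    # stalled step and marks the dependent app as already at its target,
    # without restarting its tasks.
    dependent.recorded = dict(dependent.target)


def demo() -> App:
    old = {"image": "example/app2:1"}  # hypothetical app definitions
    new = {"image": "example/app2:2"}
    a2 = App("/a/2", target=new, recorded=old, running=old)

    suspend_failed_dependency(a2)  # /a/2 now *looks* upgraded
    reconcile(a2)                  # /a/1 fixed and redeployed: no restart
    assert a2.restarts == 0 and a2.running == old

    # Workaround from the ticket: change /a/2 (here, by adding a label) so the
    # target no longer matches the recorded definition.
    a2.target = {**new, "labels": {"force-restart": "1"}}
    reconcile(a2)
    return a2


if __name__ == "__main__":
    result = demo()
    print(result.restarts, result.running)
```

In this model the label value is irrelevant; any change to the definition of /a/2 makes the target differ from the recorded version and triggers the restart.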