Given an application with a GROUP_BY:2 constraint, and two instances:
If instance 2 becomes unreachable and is marked inactive, Marathon will spin up a new instance, instance 3. However, the placement decision is made without regard to the fact that instance 2 is unreachable. This means that the following scenario is possible:
At this point, the instances do meet the specified constraints. However, a problem arises when instance 2 is expunged.
We have two instances running, as is specified by the app definition, but the instances violate the specified constraints.
We could potentially make the logic more intelligent by having placement constraints evaluated under the assumption that unreachable inactive tasks will eventually be gone.
Therefore, in the above situation, the offer matcher would have the following instances as an input to the placement constraints:
The next logical placement would then be:
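The filtering idea above can be sketched roughly as follows. This is a minimal illustration and not Marathon's offer matcher: the `Instance` type, the `unreachable_inactive` flag, the group values `"a"` and `"b"`, and the simplified "fewest instances wins" reading of GROUP_BY are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical, simplified stand-in for a Marathon instance.
    instance_id: str
    group_value: str  # value of the field the GROUP_BY constraint applies to
    unreachable_inactive: bool = False


def instances_for_placement(instances):
    """Drop unreachable inactive instances, assuming they will eventually be expunged."""
    return [i for i in instances if not i.unreachable_inactive]


def preferred_group_values(instances, known_values):
    """Return the group values holding the fewest counted instances (simplified GROUP_BY)."""
    counts = Counter(i.group_value for i in instances_for_placement(instances))
    fewest = min(counts[v] for v in known_values)
    return sorted(v for v in known_values if counts[v] == fewest)


# Instance 2 is unreachable and inactive, so it is not counted for its group.
instances = [
    Instance("instance-1", "a"),
    Instance("instance-2", "b", unreachable_inactive=True),
]
print(preferred_group_values(instances, ["a", "b"]))
```

With the unreachable instance filtered out, the group it occupied looks empty, so the replacement is steered there and the distribution remains balanced once instance 2 is expunged.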
However, there are consequences to such an approach. Suppose that we have a UNIQUE constraint, or perhaps a MAX_PER:1 constraint, and we overscale in such a way that the constraint will be violated if the inactive instance does come back.
Now, let's also assume that the kill strategy is specified to delete the oldest instance first, and that instance 1 is the oldest.
We will need to modify the kill strategy for constraints that are evaluated relative to the values of other instances, preferring to kill the instances that violate constraints first, before applying the specified kill strategy.
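One way the modified kill ordering might look is sketched below for a UNIQUE constraint only. The `RunningInstance` type, the `started_at` field, and the rule that every instance sharing a value with another counts as a violator are assumptions for illustration; MAX_PER and GROUP_BY would need their own violation checks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningInstance:
    # Hypothetical, simplified model; not Marathon's internal instance type.
    instance_id: str
    started_at: int    # larger means younger
    unique_value: str  # attribute value the UNIQUE constraint applies to


def kill_order(instances, oldest_first=True):
    """Order kill candidates: constraint violators first, then everyone else.

    Within each group the configured kill strategy (oldest or youngest first)
    decides the order.
    """
    counts = Counter(i.unique_value for i in instances)

    def by_strategy(group):
        return sorted(group, key=lambda i: i.started_at, reverse=not oldest_first)

    violators = [i for i in instances if counts[i.unique_value] > 1]
    compliant = [i for i in instances if counts[i.unique_value] == 1]
    return by_strategy(violators) + by_strategy(compliant)


# The oldest instance is alone on RED; two instances share BLUE after the
# unreachable one has come back.
instances = [
    RunningInstance("red-1", 10, "RED"),
    RunningInstance("blue-1", 20, "BLUE"),
    RunningInstance("blue-2", 30, "BLUE"),
]
print([i.instance_id for i in kill_order(instances)])
```

A plain oldest-first strategy would pick `red-1` and leave both BLUE instances running. Ranking violators first selects one of the BLUE instances instead, which restores the constraint after a single kill.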
Given a Mesos cluster with the following nodes and agent attributes:
and a Marathon app definition with a kill policy of kill-oldest, a placement constraint of color:UNIQUE, an unreachableStrategy of 5:300, and a target instance count of 2
and I manually kill-and-scale the instance placed on a color:BLUE node (making the instance on node color:RED the oldest)
and I scale the app back up to two instances
When I kill the Mesos agent on the node with attribute color:BLUE (on which one of the instances is running)
Then Marathon should scale up another instance on the other agent with color:BLUE
When I restart the Mesos agent previously killed, and the instance running on it becomes reachable again
Then Marathon should prefer to kill that instance, rather than the oldest, and the unique constraint should be satisfied
(Similar scenario for MAX_PER:1, and GROUP_BY:2)
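For reference, the app definition used in the scenario above might look roughly like the following, built here as a Python dictionary and printed as JSON. The app id, command, and resource values are hypothetical. Reading "5:300" as `inactiveAfterSeconds: 5` and `expungeAfterSeconds: 300`, and "kill-oldest" as `killSelection: OLDEST_FIRST`, is an assumption that should be checked against the Marathon version under test.

```python
import json

app_definition = {
    "id": "/constraint-demo",  # hypothetical app id
    "cmd": "sleep 3600",       # hypothetical workload
    "cpus": 0.1,               # hypothetical resource values
    "mem": 32,
    "instances": 2,
    # Place at most one instance per distinct value of the "color" attribute.
    "constraints": [["color", "UNIQUE"]],
    # Assumed mapping of the scenario's "kill-oldest" policy.
    "killSelection": "OLDEST_FIRST",
    # Assumed mapping of "5:300": inactive after 5 s, expunged after 300 s.
    "unreachableStrategy": {
        "inactiveAfterSeconds": 5,
        "expungeAfterSeconds": 300,
    },
}

print(json.dumps(app_definition, indent=2))
```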