When a resident task is lost, Marathon does not try to replace it until it receives an offer containing the related reservations/volumes. Unfortunately, Mesos does not tell us that the task is gone when the agent comes back; to Mesos, the task remains unreachable indefinitely. This prompted us to implement a workaround that uses the offer stream as a surrogate signal for task-gone: if we are offered our reservation again, we conclude that the instance marked as unreachable is in fact gone.
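The surrogate signal described above can be sketched roughly as follows. This is an illustrative sketch only, not Marathon's actual code: the `Instance` type, its field names, the condition strings, and the function `instances_proven_gone` are all hypothetical, and it assumes each reservation can be matched to an instance by some identifier.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Instance:
    # All fields are hypothetical names chosen for this sketch.
    instance_id: str
    reservation_id: str  # identifies the reservation/volume held by this resident instance
    condition: str       # e.g. "Running", "Unreachable"


def instances_proven_gone(
    instances: Iterable[Instance], offered_reservation_ids: Set[str]
) -> List[str]:
    """Return ids of unreachable instances whose reservation shows up in an offer.

    Mesos only offers a reservation while no task is using it, so seeing the
    reservation offered again is taken as evidence that the task is gone.
    """
    return [
        i.instance_id
        for i in instances
        if i.condition == "Unreachable" and i.reservation_id in offered_reservation_ids
    ]
```

For example, with one unreachable instance holding reservation `r1` and one running instance holding `r2`, an offer containing both `r1` and `r2` would flag only the first instance.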
However, if Marathon is not trying to launch anything else, it suppresses offers and therefore receives none. So even if the agent has come back, Marathon never sees the reservation offered and cannot launch a replacement task.
As a workaround, the scope of this issue is to periodically revive offers whenever there are unreachable resident tasks in state.
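One way the periodic revive could be structured is sketched below. It is a minimal illustration under stated assumptions, not the implementation: `should_revive`, `ReviveTicker`, the `(is_resident, condition)` tuple shape, and the interval value are all hypothetical. The sketch only decides when a revive is due; actually issuing the revive to Mesos is left to the caller.

```python
from typing import Iterable, Optional, Tuple


def should_revive(instances: Iterable[Tuple[bool, str]]) -> bool:
    """True if any resident instance is currently unreachable.

    `instances` is a hypothetical view of state as (is_resident, condition) pairs.
    """
    return any(
        is_resident and condition == "Unreachable"
        for is_resident, condition in instances
    )


class ReviveTicker:
    """Rate-limits revive requests to at most one per interval (hypothetical helper)."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.last_revive: Optional[float] = None

    def tick(self, now: float, instances: Iterable[Tuple[bool, str]]) -> bool:
        """Return True when the caller should revive offers now."""
        if not should_revive(instances):
            return False
        if self.last_revive is not None and now - self.last_revive < self.interval:
            return False
        self.last_revive = now
        return True
```

Keeping the decision separate from the call that revives offers means the revive only fires while unreachable resident tasks exist, so offers can stay suppressed the rest of the time.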