The current task reconciliation has several known design flaws:
- Marathon always initiates an implicit reconciliation along with an explicit one. For every task known to both Marathon and Mesos, this means that two similar status updates are received (one for the implicit call, one for the explicit one).
- The process has no notion of completion. Marathon initiates a reconciliation but does not wait for the status updates to arrive.
- If no status updates are received for tasks, Marathon acts on the state loaded from ZooKeeper, even if that state is outdated. As seen in related issues, Marathon will, for example, perform health checks against unreachable tasks. These checks fail and result in kill requests that Mesos may never answer if the agent has been disconnected and has not yet re-registered.
It would be preferable for Marathon to be explicit about the unknown status of tasks and not to act on them until it receives a status update from Mesos.
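
A minimal sketch of what this could look like, written in Python for brevity rather than in Marathon's own code base. All names here (`TaskState`, `Reconciliation`, `may_act_on`, and so on) are hypothetical and only illustrate the idea: tasks loaded from persistent storage start out as unknown, duplicate status updates are harmless, and a reconciliation round only counts as finished once no task is left unconfirmed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, Set


class TaskState(Enum):
    UNKNOWN = "unknown"  # loaded from storage, not yet confirmed by Mesos
    RUNNING = "running"
    LOST = "lost"


@dataclass
class Reconciliation:
    """Tracks one reconciliation round (hypothetical, not Marathon's API)."""

    states: Dict[str, TaskState] = field(default_factory=dict)

    def start(self, persisted_task_ids: Iterable[str]) -> None:
        # Every task loaded from persistent storage starts out unconfirmed.
        self.states = {task_id: TaskState.UNKNOWN for task_id in persisted_task_ids}

    def on_status_update(self, task_id: str, state: TaskState) -> None:
        # Applying the same update twice (implicit plus explicit
        # reconciliation) leaves the result unchanged.
        self.states[task_id] = state

    def may_act_on(self, task_id: str) -> bool:
        # Health checks, kills, etc. are only allowed for confirmed tasks.
        return self.states.get(task_id, TaskState.UNKNOWN) is not TaskState.UNKNOWN

    def pending(self) -> Set[str]:
        # Tasks for which no status update has arrived yet.
        return {t for t, s in self.states.items() if s is TaskState.UNKNOWN}

    def is_finished(self) -> bool:
        # The round is complete once no task is left unconfirmed.
        return not self.pending()
```

A real implementation would presumably also need a timeout after which still-unknown tasks are handled explicitly (for example by re-requesting reconciliation); that policy is left open here.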