Details

    • Story Points:
      3

      Description

      When a resident task gets lost, Marathon does not try to replace it until it receives an offer containing the related reservations/volumes. Unfortunately, Mesos will not tell us that the task is gone when the agent comes back; to Mesos, the task is perpetually unreachable. This prompted us to implement a workaround in which we use the offer stream as a surrogate signal for task-gone: if we see our reservation offered again, we conclude that the instance marked as unreachable is in fact definitely gone.
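
      The surrogate signal can be sketched as follows. This is a minimal Python illustration, not Marathon's actual (Scala) code. `Instance`, `Offer`, and `instances_confirmed_gone` are hypothetical names. The sketch assumes that each resident instance records the ID of its reservation and that reservation IDs can be read from an offer's resources.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Instance:
    # Hypothetical model of an instance; not Marathon's real class.
    instance_id: str
    state: str                            # e.g. "Running", "Unreachable"
    reservation_id: Optional[str] = None  # set only for resident tasks


@dataclass(frozen=True)
class Offer:
    # Hypothetical offer exposing the reservation IDs found in its resources.
    agent_id: str
    reservation_ids: FrozenSet[str] = frozenset()


def instances_confirmed_gone(instances: Iterable[Instance], offer: Offer) -> List[str]:
    """Return IDs of unreachable resident instances whose reservation is offered again.

    Seeing the reservation in an offer means nothing is running on it,
    so the unreachable instance can be treated as definitely gone.
    """
    return [
        inst.instance_id
        for inst in instances
        if inst.state == "Unreachable"
        and inst.reservation_id is not None
        and inst.reservation_id in offer.reservation_ids
    ]


# Usage: only app.1 is both unreachable and resident with an offered reservation.
instances = [
    Instance("app.1", "Unreachable", "res-a"),
    Instance("app.2", "Running", "res-b"),
    Instance("app.3", "Unreachable", None),
]
offer = Offer("agent-1", frozenset({"res-a", "res-b"}))
print(instances_confirmed_gone(instances, offer))  # ['app.1']
```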

      However, if Marathon is not trying to launch anything else, it suppresses offers and consequently receives none. So even after the agent has come back, Marathon never sees the reservation again and cannot launch a replacement task.

      As a workaround, this issue is scoped to periodically reviving offers whenever the state contains unreachable resident tasks.
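
      A minimal sketch of the periodic revive, again in Python with hypothetical names (`PeriodicReviver`, `needs_periodic_revive`, `interval_s`). It assumes that some timer calls `tick()` regularly and that the injected `revive` callback sends Mesos a REVIVE call. The interval value is illustrative and is not taken from Marathon.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Instance:
    # Hypothetical model of an instance; not Marathon's real class.
    instance_id: str
    state: str                            # e.g. "Running", "Unreachable"
    reservation_id: Optional[str] = None  # set only for resident tasks


def needs_periodic_revive(instances: Iterable[Instance]) -> bool:
    """True if any resident (reservation-holding) instance is unreachable."""
    return any(
        i.reservation_id is not None and i.state == "Unreachable" for i in instances
    )


class PeriodicReviver:
    """Revives offers at most once per interval, and only while needed."""

    def __init__(self, revive: Callable[[], None], interval_s: float) -> None:
        self._revive = revive  # assumed to send a REVIVE call to Mesos
        self._interval_s = interval_s
        self._last_revive: Optional[float] = None

    def tick(self, now: float, instances: Iterable[Instance]) -> bool:
        """Call on every timer tick; returns True if a revive was issued."""
        if not needs_periodic_revive(instances):
            return False
        if self._last_revive is not None and now - self._last_revive < self._interval_s:
            return False  # revived recently; wait for the interval to elapse
        self._revive()
        self._last_revive = now
        return True


# Usage: a 30 s interval with one unreachable resident instance in state.
calls: List[str] = []
reviver = PeriodicReviver(lambda: calls.append("revive"), interval_s=30.0)
state = [Instance("app.1", "Unreachable", "res-a")]
results = [
    reviver.tick(0.0, state),   # True: first revive
    reviver.tick(10.0, state),  # False: still within the interval
    reviver.tick(31.0, state),  # True: interval elapsed
    reviver.tick(40.0, []),     # False: nothing unreachable any more
]
```

      Rate-limiting the revive keeps Marathon from flooding Mesos with REVIVE calls while an agent stays away, at the cost of up to one interval of delay between the agent returning and Marathon seeing the reservation again.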

              People

              • Assignee:
                kjeschkies Karsten Jeschkies
                Reporter:
                matthias.eichstedt Matthias Eichstedt
                Team:
                Orchestration Team
                Watchers:
                Karsten Jeschkies, Matthias Eichstedt
                Reviewers:
                Karsten Jeschkies, Tim Harper