Details

    • Type: Task
    • Status: Resolved
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: DC/OS 1.13.0
    • Component/s: dcos-cli, marathon
    • Labels:
      None
    • Epic Link:
    • Sprint:
      Orchestration 2018-34, Orchestration 2018-35
    • Story Points:
      2

      Description

      Background

      Marathon reduces the rate at which it relaunches tasks after consecutive failures. This feature is called "back off"; by default, the delay starts at 1 second and grows up to 1 hour.
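      As a rough illustration of how such a delay could grow, here is a minimal sketch. The 1 second start and 1 hour cap come from the description above; the growth factor of 1.15 and the function name launch_delays are assumptions made for this example and are not stated in this ticket.

```python
def launch_delays(initial=1.0, factor=1.15, maximum=3600.0, failures=10):
    """Return the delay (in seconds) applied after each consecutive failure.

    initial and maximum mirror the 1 second / 1 hour figures in the ticket.
    factor is an assumed multiplier, used here only for illustration.
    """
    delays = []
    delay = initial
    for _ in range(failures):
        # Never wait longer than the configured maximum.
        delays.append(min(delay, maximum))
        # Grow the delay geometrically for the next failure.
        delay *= factor
    return delays


if __name__ == "__main__":
    for attempt, seconds in enumerate(launch_delays(failures=5), start=1):
        print(f"failure {attempt}: wait {seconds:.2f}s")
```

      Under these assumptions the delay only reaches the cap after several dozen consecutive failures, which is why a task blocked on a broken dependency can end up waiting a long time after the dependency recovers.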

      Currently, this delay is reset when anything service-related is modified (e.g., the command being launched, the Docker image, or the resource requirements). It is not reset when a deployment-only parameter is updated (such as constraints or the upgrade strategy). A task may be failing because it depends on a database that is in a bad state, not because anything is wrong with the task itself. In such cases, the operator needs a way to manually reset or override the task launch delay once the issue in the dependency is corrected. Marathon already has a documented API call for this (DELETE /v2/queue/{app-id/pod-id}/delay), and the open source Marathon UI exposes control over it for apps.
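      For reference, a call to that endpoint could be constructed roughly as follows. This is a sketch using only the Python standard library; the helper name build_reset_request and the base URL in the usage line are placeholders, and any authentication headers a cluster requires are left to the caller.

```python
import urllib.request
from urllib.parse import quote


def build_reset_request(marathon_url, run_spec_id, headers=None):
    """Build (but do not send) a DELETE /v2/queue/{id}/delay request.

    marathon_url is the base URL of the Marathon API (placeholder value
    in the example below); run_spec_id is an app or pod id such as
    "/crunchy". headers is an optional dict, e.g. for authentication.
    """
    # Ids start with "/", so strip it to avoid a double slash in the path.
    path_id = quote(run_spec_id.strip("/"))
    url = f"{marathon_url.rstrip('/')}/v2/queue/{path_id}/delay"
    return urllib.request.Request(url, method="DELETE", headers=headers or {})


if __name__ == "__main__":
    req = build_reset_request("http://marathon.example:8080", "/crunchy")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

      Building the request separately from sending it keeps the example easy to inspect without a running cluster.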

      There does not currently appear to be a way to reset the task launch delay using the CLI.

      Implementation proposal

      (TODO)

      Some potential ideas for ways to control this:

      • dcos marathon task reset-delay <pod-id/app-id>
      • dcos marathon delay reset <pod-id/app-id>
        • Could be paired with a command such as dcos marathon delay show <pod-id/app-id>?
      • (some design required here)
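
      To make the second idea concrete, the command shape could be prototyped as below. This is a hypothetical sketch using argparse purely to illustrate the "delay reset" / "delay show" layout; it is not the dcos-cli implementation, and the function name build_parser is invented for this example.

```python
import argparse


def build_parser():
    """Prototype parser for: marathon delay {reset,show} <id> (hypothetical)."""
    parser = argparse.ArgumentParser(prog="dcos marathon")
    groups = parser.add_subparsers(dest="group", required=True)

    # "delay" groups the launch-delay operations under one noun.
    delay = groups.add_parser("delay", help="inspect or reset the launch delay")
    actions = delay.add_subparsers(dest="action", required=True)

    for name, text in (("reset", "reset the launch delay"),
                       ("show", "show the current launch delay")):
        action = actions.add_parser(name, help=text)
        action.add_argument("id", help="app id or pod id, e.g. /crunchy")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["delay", "reset", "/crunchy"])
    print(args.group, args.action, args.id)
```

      Grouping both actions under a single "delay" noun leaves room for the "show" companion command without adding another top-level verb.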

      Acceptance Criteria

      As a user,
      Given I launch a pod named "/crunchy" that fails if "/mysql" isn't running
      And the task launch delay accumulates up to several minutes
      And I read the existing documentation and understand how this works
      When I launch "/mysql"
      And I run a DC/OS CLI command to reset the delay for "/crunchy"
      Then "/crunchy" launches immediately

              People

              • Assignee:
                tarunguptaakirala Tarun Gupta Akirala
                Reporter:
                tharper Tim Harper
                Team:
                Orchestration Team
                Watchers:
                Armand Grillet (Inactive), Ken Sipe, Mergebot, Tarun Gupta Akirala, Tim Harper

                Dates

                • Created:
                  Updated:
                  Resolved: