Details

    • Type: Task
    • Status: Resolved
    • Priority: Medium
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: State Handling
    • Labels:

      Description

      I've run into a situation where I can't delete a Marathon app or kill the tasks associated with it.

      Symptoms:

      • The app does not appear in the list of apps returned by the `/v2/apps` endpoint.
      • The app does appear when you go to the `/v2/apps/{appId}` endpoint.
      • The app is in the list of apps in the zookeeper `/marathon/state` node.
      • If I kill the only task associated with the app, then Marathon restarts it.
      • If I try to delete the app via the API, Marathon returns a 404 with the message `"App '/apollo.canary.git54607924.configa53dc013' does not exist"`.
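
For anyone wanting to check for the first two symptoms, here is a minimal sketch in Python. It assumes the standard Marathon response shape for `GET /v2/apps` (a JSON object with an `apps` array whose entries carry an `id`); the helper names `http_get_json` and `is_ghost_app` and the base URL in the usage note are hypothetical, and this is not the monitoring script mentioned in this ticket.

```python
import json
import urllib.error
import urllib.request


def http_get_json(url):
    """Return (status_code, parsed_json_or_None) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; only the status code is needed.
        return err.code, None


def is_ghost_app(base_url, app_id, get=http_get_json):
    """True if app_id is absent from /v2/apps but served by /v2/apps/{appId}.

    `get` is injectable so the check can be exercised without a live cluster.
    """
    status, body = get(base_url + "/v2/apps")
    if status != 200 or body is None:
        raise RuntimeError("could not list apps, HTTP status %s" % status)
    listed = {app["id"] for app in body.get("apps", [])}

    # App ids start with "/", so strip it to avoid a double slash in the URL.
    status, _ = get(base_url + "/v2/apps/" + app_id.lstrip("/"))
    return status == 200 and app_id not in listed
```

Usage would look something like `is_ghost_app("http://marathon.example:8080", "/apollo.canary.git54607924.configa53dc013")`, where the host and port are placeholders. The sketch does not look at ZooKeeper or attempt the delete, so it only covers the list/detail mismatch.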

      This was picked up by our monitoring scripts that look for 'lost' tasks, as described in MGI-4580 (which we've since updated to check for this failure mode).

      I know that this is 1.1, and I can't be confident whether it affects newer versions, but I'm kind of reluctant to upgrade to a newer version:

      • MGI-4580 has meant that we've rolled back 1.3 deployments to 1.1 everywhere at Yelp.
      • MGI-4768 means it's scary to move forward to 1.4

      I've also tried a rolling restart of the Marathon cluster to see if it helps, but to no avail.

      I've got lots of logs for analysis, so let me know which specific parts can help.

      Thanks!

            People

            • Assignee: Unassigned
            • Reporter: Rob Johnson
            • Team: Orchestration Team
            • Watchers: Jason Gilanfarr (Inactive), Karsten Jeschkies