Marathon / MARATHON-2340

Delete (rollback) deployment is dangerous

    Details

    • Type: Task
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Deployments
    • Labels:

      Description

      Consider an application that is running and healthy. A new deployment is scheduled, but the new version of the app cannot be started (for whatever reason). The user then deletes the deployment (by clicking rollback in the UI or by sending DELETE to the deployment endpoint without force). Assume that the app cannot be redeployed either (e.g. its artifact is unavailable). After this procedure the user ends up with 0 running instances!

      Example:
      1. The resource we can control is a URL on an HTTP server that is accessible from the Mesos agents and that we can start/stop on demand (e.g. python -m SimpleHTTPServer on a local machine).
      2. Deploy application

       json
         {
           "id": "/rollbacktest",
           "cmd": "sleep 30 && wget <resource we can control> && python -m SimpleHTTPServer $PORT0",
           "cpus": 0.1,
           "mem": 32,
           "disk": 0,
           "instances": 5,
           "healthChecks": [
             {
               "path": "/",
               "protocol": "HTTP",
               "portIndex": 0,
               "gracePeriodSeconds": 5,
               "intervalSeconds": 60,
               "timeoutSeconds": 20,
               "maxConsecutiveFailures": 3,
               "ignoreHttp1xx": false
             }
           ],
           "portDefinitions": [
             {
               "port": 10486,
               "protocol": "tcp",
               "labels": {}
             }
           ]
         }
         

      3. Update this application

       json
         {
           "id": "/rollbacktest",
           "cmd": "sleep 30 && wget <resource we can control> && fail",
           "cpus": 0.1,
           "mem": 32,
           "disk": 0,
           "instances": 5,
           "healthChecks": [
             {
               "path": "/",
               "protocol": "HTTP",
               "portIndex": 0,
               "gracePeriodSeconds": 5,
               "intervalSeconds": 60,
               "timeoutSeconds": 20,
               "maxConsecutiveFailures": 3,
               "ignoreHttp1xx": false
             }
           ],
           "portDefinitions": [
             {
               "port": 10486,
               "protocol": "tcp",
               "labels": {}
             }
           ]
         }
         

      4. Application is failing.
      5. Disable the resource we can control.
      6. Roll back to the previous version.
      7. We end up with 0 running tasks.
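
      Steps 2, 3 and 6 above can also be driven through the Marathon REST API instead of the UI. The following is a minimal sketch that only builds the requests and does not send them. The base URL, the helper names update_app_request and rollback_request, and the choice of PUT /v2/apps/{id} for both the initial deploy and the update are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: Marathon is reachable at this base URL; adjust for your cluster.
MARATHON = "http://localhost:8080"


def update_app_request(app: dict, base: str = MARATHON) -> urllib.request.Request:
    """Build the PUT /v2/apps/{id} request that deploys or updates an app (steps 2 and 3)."""
    return urllib.request.Request(
        url=f"{base}/v2/apps{app['id']}",  # app ids start with "/", e.g. "/rollbacktest"
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def rollback_request(
    deployment_id: str, base: str = MARATHON, force: bool = False
) -> urllib.request.Request:
    """Build the DELETE /v2/deployments/{id} request (step 6).

    Without force, Marathon schedules a new deployment that reverts the change,
    which is the case this issue is about. With force=True the deployment is
    only cancelled and no reverting deployment is created.
    """
    suffix = "?force=true" if force else ""
    return urllib.request.Request(
        url=f"{base}/v2/deployments/{deployment_id}{suffix}",
        method="DELETE",
    )
```

      To run this against a real cluster, pass each request to urllib.request.urlopen and read the deployment id from the deploymentId field of the PUT response before building the DELETE request.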

      IMO, when a user wants to roll back a deployment, the expected behavior is to not restart the currently running application if it already has the version we want to roll back to. Alternatively, the rollback should behave like a normal deployment and take the upgradeStrategy settings into account.
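
      The two suggestions above can be sketched as a small planning model. This is not Marathon's implementation: the Instance type, the function names, the version labels "v1"/"v2" and the ceil rounding of minimumHealthCapacity are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    task_id: str
    version: str  # hypothetical label; real Marathon versions are timestamps
    healthy: bool


def instances_to_replace(instances, target_version):
    """First suggestion: leave instances that already run the target version alone."""
    return [i for i in instances if i.version != target_version]


def max_concurrent_kills(instances, total, minimum_health_capacity=1.0):
    """Second suggestion: honor upgradeStrategy as a normal deployment would.

    Returns how many healthy instances may be stopped right now without
    dropping below ceil(minimum_health_capacity * total) healthy instances.
    """
    must_stay_healthy = math.ceil(minimum_health_capacity * total)
    healthy_now = sum(1 for i in instances if i.healthy)
    return max(0, healthy_now - must_stay_healthy)
```

      In the scenario from this issue (5 healthy instances of the old version, failing tasks of the new version, rollback to the old version), instances_to_replace selects only the failing new tasks, and max_concurrent_kills with minimum_health_capacity=1.0 allows no healthy instance to be stopped, so the app would keep its 5 running instances.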

      Related to: MGI-1554

              People

              • Assignee: Unassigned
              • Reporter: janisz
              • Team: Marathon Team