  1. DC/OS
  2. DCOS_OSS-4353

Marathon health check API change breaks marathon-lb frontend-to-backend mapping and causes service downtime

    Details

    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: DC/OS 1.11.2
    • Fix Version/s: None
    • Component/s: marathon-lb
    • Labels:

      Description

      A contract change in the Marathon API call http://marathon.mesos:8080/v2/apps?embed=apps.tasks breaks the marathon-lb frontend-to-backend mapping for apps. An instance that is still being deployed can be treated as healthy and written into haproxy.cfg with its new host:port. This causes an haproxy Layer4 error and blocks access to the service until the next health check, as specified in the Marathon service definition.

      Working Version (DC/OS 1.8.6, Marathon 1.3.2, marathon-lb v1.12.1)

      Breaking Version (DC/OS 1.11.2, Marathon 1.6.392, marathon-lb v1.12.1)

      For an unhealthy instance, the API entry "tasks/task/healthCheckResults" changed from

      "healthCheckResults": [
        {
          "alive": false,
          "consecutiveFailures": 0,
          "firstSuccess": null,
          "lastFailure": null,
          "lastSuccess": null,
          "lastFailureCause": null,
          "taskId": "enterpriseengineering_dev_java-rest-template-sample.c6dac998-d30e-11e8-952b-70b3d5800003"
        }
      ]

      in marathon 1.3.2

      to 

      "healthCheckResults": []

      in marathon 1.6.392

       

      This contract change breaks the following check in marathon-lb's marathon_lb.py, at lines 1524 and 1638, and causes an unhealthy instance to be treated as healthy:

      if 'healthCheckResults' not in task:

      A possible fix is to change the line to

      if 'healthCheckResults' not in task or not task['healthCheckResults']:

      so that it handles both the missing-key case and the empty-list case.
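
      The difference between the two checks can be illustrated with a minimal, standalone sketch. It is not the marathon_lb.py code; the helper names key_only_check and key_or_empty_check and the sample task dictionaries are hypothetical, and only the two conditions are taken from this report.

      ```python
      # Hypothetical helpers that isolate the two conditions discussed above.
      # They are illustrative only and do not exist in marathon_lb.py.

      def key_only_check(task):
          """Original condition: true only when the key is absent."""
          return 'healthCheckResults' not in task


      def key_or_empty_check(task):
          """Proposed condition: true when the key is absent or the list is empty."""
          return 'healthCheckResults' not in task or not task['healthCheckResults']


      # Sample task payloads (assumed shapes, trimmed to the relevant field).
      task_without_key = {'id': 'task-a'}
      task_marathon_1_3_2 = {
          'id': 'task-b',
          'healthCheckResults': [{'alive': False, 'consecutiveFailures': 0}],
      }
      task_marathon_1_6_392 = {'id': 'task-c', 'healthCheckResults': []}

      for name, task in [
          ('no key', task_without_key),
          ('Marathon 1.3.2 style', task_marathon_1_3_2),
          ('Marathon 1.6.392 style', task_marathon_1_6_392),
      ]:
          print(name, key_only_check(task), key_or_empty_check(task))
      ```

      With the empty list returned by Marathon 1.6.392, the original condition is false, so the task is not handled as "no results yet", and a loop over the empty list finds no failing result. The proposed condition is true for that payload, so it takes the same path as a task with no key at all.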

       

      Please let us know whether this is a new bug that we should fix in the mentioned version.

            People

            • Assignee:
              jasonkoelker Jason Koelker
              Reporter:
              justinzhou088 justinzhou088
              Team:
              Networking Team
              Watchers:
              Deepak Goel, justinzhou088