Details

    • Sprint:
      Marathon Sprint 1.10-6
    • Story Points:
      2

      Description

      Health checks appear to be broken. This was discovered by the SI test `test_health_failed_check`:

      https://github.com/mesosphere/marathon/blob/master/tests/system/marathon_common_tests.py#L512

       

      It launches an app with the following:

      {'id': 'healthy', 'mem': 128, 'healthChecks': [{'portIndex': 0, 'protocol': 'HTTP', 'maxConsecutiveFailures': 1, 'timeoutSeconds': 2, 'path': '/', 'intervalSeconds': 2}], 'cpus': 0.5, 'disk': 0, 'instances': 1, 'constraints': [['hostname', 'LIKE', '10.0.0.105']], 'cmd': '/opt/mesosphere/bin/python -m http.server $PORT0'}

       

      The test detects the port assigned as PORT0, then blocks inbound traffic to that port on the node where the task is running for 7 seconds (longer than the 2-second health check interval and timeout):

      10.0.0.105 $ sudo iptables -I INPUT -p tcp --dport 11843 -j DROP
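The blocking step above can be sketched as a pair of commands: one to insert the DROP rule and one to delete it again after the 7 seconds have passed. This is a minimal illustration, not the code used by the SI test; the helper name `block_port_commands` and the `chain` parameter are hypothetical. Only the `iptables -I` form appears in this ticket; the matching `-D` (delete rule) form is the standard iptables counterpart.

```python
def block_port_commands(port, chain="INPUT"):
    """Build the iptables commands that block and unblock inbound TCP traffic.

    Hypothetical helper for illustration only; the real SI test may
    construct and run these commands differently.
    """
    rule = ["-p", "tcp", "--dport", str(port), "-j", "DROP"]
    block = ["sudo", "iptables", "-I", chain] + rule    # insert the DROP rule
    unblock = ["sudo", "iptables", "-D", chain] + rule  # delete the same rule
    return block, unblock


block, unblock = block_port_commands(11843)
print(" ".join(block))    # sudo iptables -I INPUT -p tcp --dport 11843 -j DROP
print(" ".join(unblock))  # sudo iptables -D INPUT -p tcp --dport 11843 -j DROP
```

Returning argument lists rather than shell strings keeps the port number from being interpreted by a shell if the commands are later passed to something like `subprocess.run`.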

      Normally the task is detected as unhealthy, then killed and relaunched. In the latest Marathon, the unhealthy state is never detected.
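To see why a 7-second block should be enough to trigger the kill, the timing can be worked through with the values from the app definition. The sketch below uses a simplified model that is an assumption, not Marathon's actual scheduling logic: a check starts every `intervalSeconds`, a blocked check fails after `timeoutSeconds`, and the block begins just after a successful check (the worst case). The function name is hypothetical.

```python
def worst_case_detection_seconds(check):
    """Upper bound on the time from the start of the block until
    maxConsecutiveFailures failed checks have been recorded.

    Simplified model (assumption): one check starts every intervalSeconds
    and a blocked check fails after timeoutSeconds.
    """
    failures = check["maxConsecutiveFailures"]
    return check["intervalSeconds"] * failures + check["timeoutSeconds"]


# Values copied from the health check in the app definition.
check = {"maxConsecutiveFailures": 1, "timeoutSeconds": 2, "intervalSeconds": 2}
block_seconds = 7

needed = worst_case_detection_seconds(check)
print(needed)                   # 4
print(needed <= block_seconds)  # True
```

Under this model the failure should be recorded within about 4 seconds, so a 7-second block leaves a margin of roughly 3 seconds, provided the configured timeout is actually applied.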

      There appear to be recent changes in this area of the code:

      https://github.com/mesosphere/marathon/commit/1982caa0347c9f0f4bc811e40f4b168682c77884#diff-073c60620ab193bf799ffe432ed7a314

      https://mesosphere.slack.com/archives/C1U6FPSTT/p1497474660291188

      It looks like custom timeouts may not be implemented in the latest changes.

      For comparison, here is the readiness check implementation: https://github.com/mesosphere/marathon/blob/83a5e99a72b42fffd28e649869ec0617652b2d84/src/main/scala/mesosphere/marathon/core/readiness/impl/ReadinessCheckExecutorImpl.scala#L82

      Timeouts for health checks are not set:

      https://github.com/mesosphere/marathon/blob/83a5e99a72b42fffd28e649869ec0617652b2d84/src/main/scala/mesosphere/marathon/core/health/impl/HealthCheckWorkerActor.scala#L93
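The effect of a missing timeout can be illustrated outside Marathon. When iptables drops packets, the client gets no response at all, so a request without an explicit timeout can hang far longer than the configured `timeoutSeconds` and the failure is never recorded in time. The following is a minimal Python sketch of an HTTP check that applies the timeout per request; it is not Marathon's Scala implementation, the function name `http_check` is hypothetical, and treating status codes 200 to 399 as healthy is an assumption made here.

```python
import http.client


def http_check(host, port, path="/", timeout_seconds=2.0):
    """Return True if the endpoint answers with a 2xx/3xx status in time.

    Illustrative sketch only. The timeout covers both connecting and
    waiting for the response, so a silently dropped connection counts
    as a failed check instead of hanging.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and malformed responses.
        return False
    finally:
        conn.close()
```

For the app in this ticket, a call such as `http_check("10.0.0.105", 11843, "/", timeout_seconds=2)` would be expected to return `False` while the DROP rule is in place.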

       

       


            People

            • Assignee:
              Johannes Unterstein (junterstein)
            • Reporter:
              Ken Sipe (ken)
            • Team:
              Orchestration Team
            • Watchers:
              Johannes Unterstein, Ken Sipe
