  1. Marathon
  2. MARATHON-8188

Marathon health checks suddenly become abnormal after a long period of normalcy

    Details

    • Type: Task
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      After running normally for a long time, Marathon health checks suddenly become abnormal. The Marathon UI still shows everything as normal, and haproxy.cfg is not flushed. We have to restart Marathon to recover.

      The logs show:

      ```
      {"log":"[2018-04-27 18:31:23,952] INFO Received health result for app [/mbank/capp-mbank-config] version [2018-01-30T16:13:36.117Z]: [Healthy(task [mbank_capp-mbank-config.8685a40d-05d8-11e8-b6bb-024201b801c8],2018-01-30T16:13:36.117Z,2018-04-27T18:31:23.952Z)] (mesosphere.marathon.health.HealthCheckActor:marathon-akka.actor.default-dispatcher-690)\n","stream":"stdout","time":"2018-04-27T18:31:23.953177066Z"}
      {"log":"[2018-04-27 18:39:30,236] INFO reconcile [/jdpt/jdpt-query-service] with latest version [2018-04-24T16:48:41.536Z] (mesosphere.marathon.health.MarathonHealthCheckManager$$EnhancerByGuice$$941a6eec:ForkJoinPool-2-worker-333)\n","stream":"stdout","time":"2018-04-27T18:39:30.236457083Z"}
      {"log":"[2018-04-27 00:40:13,888] WARN ErrorClosed(Connection reset by peer) in response to GET request to /info with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-721)\n","stream":"stdout","time":"2018-04-27T00:40:13.888629355Z"}
      {"log":"[2018-04-27 01:45:23,107] WARN ErrorClosed(Connection reset by peer) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-637)\n","stream":"stdout","time":"2018-04-27T01:45:23.10816488Z"}
      {"log":"[2018-04-27 02:10:23,607] WARN ErrorClosed(Connection reset by peer) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-702)\n","stream":"stdout","time":"2018-04-27T02:10:23.60777922Z"}
      {"log":"[2018-04-27 06:17:51,300] WARN Premature connection close (the server doesn't appear to support request pipelining) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-660)\n","stream":"stdout","time":"2018-04-27T06:17:51.30100404Z"}
      {"log":"[2018-04-27 07:38:25,353] WARN ErrorClosed(Connection reset by peer) in response to GET request to /info with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-670)\n","stream":"stdout","time":"2018-04-27T07:38:25.353336065Z"}
      {"log":"[2018-04-27 08:43:23,998] WARN ErrorClosed(Connection reset by peer) in response to GET request to /info with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-691)\n","stream":"stdout","time":"2018-04-27T08:43:23.99921881Z"}
      {"log":"[2018-04-27 11:59:35,257] WARN ErrorClosed(Connection reset by peer) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-677)\n","stream":"stdout","time":"2018-04-27T11:59:35.257747314Z"}
      {"log":"[2018-04-27 16:00:35,250] WARN ErrorClosed(Connection reset by peer) in response to GET request to /info with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-686)\n","stream":"stdout","time":"2018-04-27T16:00:35.250493551Z"}
      {"log":"[2018-04-27 16:26:15,428] WARN Premature connection close (the server doesn't appear to support request pipelining) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-643)\n","stream":"stdout","time":"2018-04-27T16:26:15.428131582Z"}
      {"log":"[2018-04-27 16:26:15,428] WARN Premature connection close (the server doesn't appear to support request pipelining) in response to GET request to /health with 5 retries left, retrying... (spray.can.client.HttpHostConnectionSlot:marathon-akka.actor.default-dispatcher-643)\n","stream":"stdout","time":"2018-04-27T16:26:15.428283473Z"}
      ```
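
To see which probe paths produce these warnings and how often, a small script can tally them. This is only a sketch: it assumes the log file is in Docker's json-file format (one JSON object per line with a "log" field, as in the excerpt), and the function name `count_probe_warnings` and the regular expression are illustrative, not part of Marathon.

```python
import json
import re
from collections import Counter

# Assumption: the warning text has the shape seen in the excerpt,
# "WARN <reason> in response to GET request to <path> with N retries left".
WARN_RE = re.compile(
    r"WARN (?P<reason>.+?) in response to GET request to (?P<path>\S+) with"
)


def count_probe_warnings(lines):
    """Count retry warnings per (path, reason) in Docker json-file log lines."""
    counts = Counter()
    for line in lines:
        try:
            message = json.loads(line)["log"]
        except (ValueError, KeyError, TypeError):
            continue  # not a Docker JSON log record, skip it
        if not isinstance(message, str):
            continue
        match = WARN_RE.search(message)
        if match:
            counts[(match["path"], match["reason"])] += 1
    return counts
```

Passing an open log file (for example `count_probe_warnings(open(path))`, where `path` is wherever the container log lives) yields a `Counter` keyed by path and failure reason.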

      After that, log lines like the following stop appearing for a long time:
      ```
      INFO Received health result for app [/mbank/capp-mbank-config]...
      ```
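
One way to spot this condition from the logs is to record the last time each app reported a health result and flag apps that have been silent for too long. The sketch below assumes Docker json-file log lines as in the excerpt above; the helper names and the 600-second threshold are hypothetical and should be tuned to the configured health check interval.

```python
import json
import re
from datetime import datetime, timedelta

# Assumption: health results are logged in the format shown in the excerpt.
HEALTH_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] INFO "
    r"Received health result for app \[(?P<app>[^\]]+)\]"
)


def last_health_results(lines):
    """Return {app_id: datetime of its most recent health-result log line}."""
    last_seen = {}
    for line in lines:
        try:
            message = json.loads(line)["log"]
        except (ValueError, KeyError, TypeError):
            continue  # not a Docker JSON log record, skip it
        match = HEALTH_RE.search(message) if isinstance(message, str) else None
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        app = match["app"]
        # Log lines are not guaranteed to be in time order, so keep the maximum.
        if app not in last_seen or ts > last_seen[app]:
            last_seen[app] = ts
    return last_seen


def silent_apps(last_seen, now, max_silence_seconds=600):
    """List apps whose last health result is older than the threshold."""
    limit = timedelta(seconds=max_silence_seconds)
    return sorted(app for app, ts in last_seen.items() if now - ts > limit)
```

An app that shows up in `silent_apps` while the UI still reports it as healthy matches the symptom described here.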

      This issue was created automatically from Marathon GitHub Issue 6188.

            People

            • Assignee: Unassigned
            • Reporter: Marathon Bot (marathon-bot)
            • Team: Orchestration Team
            • Watchers: 1 (Marathon Bot)
