Uploaded image for project: 'Marathon'
  1. Marathon
  2. MARATHON-7731

Marathon 1.4.6 - five minutes before marathon starts healthchecks after re-deployments

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Medium
    • Resolution: Cannot Reproduce
    • Affects Version/s: Marathon 1.4.6
    • Fix Version/s: None
    • Component/s: Health & Readiness
    • Labels:

      Description

      We were running marathon version 1.3.9, and have since upgraded to 1.4.6. Now that we are on 1.4.6, when we re-deploy applications (and it happens in single or ha-mode), the newly launched applications are stuck in an 'unknown' state.

      It appears that is takes about 5 minutes for marathon to add and start the healthcheck after the application is deployed and running.

      [2017-08-28 13:08:27,959] INFO Processing LaunchEphemeral(Instance(instance [autopay-payfile-adapter.marathon-fb61822e-8bf1-11e7-8f6c-0242ac110003],AgentInfo(devops-app-biojsrgw.us-east-1.aws.piadp.com,Some(0eaa99cd-f760-43d9-8d7a-f2b8ceb778ab-S10),Vector(name: "node_type"
      type: TEXT
      text

      { value: "app" }

      )),InstanceState(Created,2017-08-28T13:08:27.895Z,None,None),Map(task [example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003] -> LaunchedEphemeral(task [example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003],2017-08-28T13:08:21.684Z,Status(2017-08-28T13:08:27.895Z,None,None,Created,NetworkInfo(devops-app-biojsrgw.us-east-1.aws.piadp.com,Vector(31745),List())))),2017-08-28T13:08:21.684Z,UnreachableEnabled(5 minutes,10 minutes))) for instance [example-app.marathon-fb61822e-8bf1-11e7-8f6c-0242ac110003] (mesosphere.marathon.core.launcher.impl.OfferProcessorImpl:ForkJoinPool-2-worker-7)
      [2017-08-28 13:08:27,963] INFO POSTing to all endpoints. (mesosphere.marathon.core.event.impl.callback.HttpEventActor:marathon-akka.actor.default-dispatcher-4)
      [2017-08-28 13:08:27,963] INFO Processing LaunchEphemeral(Instance(instance [pi-api-pcc.marathon-fb61d050-8bf1-11e7-8f6c-0242ac110003],AgentInfo(devops-app-dnlugosy.us-east-1.aws.piadp.com,Some(0eaa99cd-f760-43d9-8d7a-f2b8ceb778ab-S13),Vector(name: "node_type"
      type: TEXT
      text

      { value: "app" }

      [2017-08-28 13:08:29,576] INFO Acknowledge status update for task example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003: TASK_RUNNING () (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$849fff9e:ForkJoinPool-2-worker-19)

      (...a number of these repeat over and over, until finally...)

      [2017-08-28 13:13:51,705] INFO 10.204.64.20 - - [28/Aug/2017:13:13:51 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1740 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-1031)
      [2017-08-28 13:13:56,710] INFO 10.204.64.20 - - [28/Aug/2017:13:13:56 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1740 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-373)
      [2017-08-28 13:14:01,705] INFO 10.204.64.20 - - [28/Aug/2017:13:14:01 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1740 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-1033)
      [2017-08-28 13:14:06,709] INFO 10.204.64.20 - - [28/Aug/2017:13:14:06 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1741 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-42)
      [2017-08-28 13:14:11,683] INFO 10.204.64.20 - - [28/Aug/2017:13:14:11 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1739 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-765)
      [2017-08-28 13:14:16,697] INFO 10.204.64.20 - - [28/Aug/2017:13:14:16 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1741 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-370)
      [2017-08-28 13:14:21,714] INFO 10.204.64.20 - - [28/Aug/2017:13:14:21 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1739 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-1035)
      [2017-08-28 13:14:24,995] INFO Already running 1 instances of /example-app. Not scaling. (mesosphere.marathon.SchedulerActions:ForkJoinPool-2-worker-29)
      [2017-08-28 13:14:24,998] INFO reconcile [/example-app] with latest version [2017-08-28T13:08:21.684Z] (mesosphere.marathon.core.health.impl.MarathonHealthCheckManager:ForkJoinPool-2-worker-19)
      [2017-08-28 13:14:24,999] INFO addAllFor [/example-app] version [2017-08-28T13:08:21.684Z] (mesosphere.marathon.core.health.impl.MarathonHealthCheckManager:ForkJoinPool-2-worker-19)
      [2017-08-28 13:14:24,999] INFO Adding health check for app [/example-app] and version [2017-08-28T13:08:21.684Z]: [MarathonHttpHealthCheck(180 seconds,20 seconds,5 seconds,3,Some(ByIndex(0)),None,Some(/health),HTTP,false)] (mesosphere.marathon.core.health.impl.MarathonHealthCheckManager:ForkJoinPool-2-worker-19)
      [2017-08-28 13:14:25,000] INFO Starting health check actor for app [/example-app] version [2017-08-28T13:08:21.684Z] and healthCheck [MarathonHttpHealthCheck(180 seconds,20 seconds,5 seconds,3,Some(ByIndex(0)),None,Some(/health),HTTP,false)] (mesosphere.marathon.core.health.impl.HealthCheckActor:marathon-akka.actor.default-dispatcher-4)
      [2017-08-28 13:14:25,007] INFO Received status update for task example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003: TASK_RUNNING (Reconciliation: Latest task state) (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$28802605:Thread-550)
      [2017-08-28 13:14:25,010] INFO Acknowledge status update for task example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003: TASK_RUNNING (Reconciliation: Latest task state) (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$849fff9e:ForkJoinPool-2-worker-29)
      [2017-08-28 13:14:25,033] INFO Received status update for task example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003: TASK_RUNNING (Reconciliation: Latest task state) (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$28802605:Thread-584)
      [2017-08-28 13:14:25,050] INFO Acknowledge status update for task example-app.fb61822e-8bf1-11e7-8f6c-0242ac110003: TASK_RUNNING (Reconciliation: Latest task state) (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$849fff9e:ForkJoinPool-2-worker-23)
      [2017-08-28 13:14:26,707] INFO 10.204.64.20 - - [28/Aug/2017:13:14:26 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1741 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-1037)
      [2017-08-28 13:14:30,029] INFO Received health result for app [/example-app] version [2017-08-28T13:08:21.684Z]: [Healthy(instance [example-app.marathon-fb61822e-8bf1-11e7-8f6c-0242ac110003],2017-08-28T13:08:21.684Z,2017-08-28T13:14:30.029Z,true)] (mesosphere.marathon.core.health.impl.HealthCheckActor:marathon-akka.actor.default-dispatcher-2)
      [2017-08-28 13:14:31,685] INFO 10.204.64.20 - - [28/Aug/2017:13:14:31 +0000] "GET //devops.internal.elb.us-east-1.aws.piadp.com:8080/v2/apps//example-app?embed=app.taskStats&embed=app.readiness HTTP/1.1" 200 1815 "http://devops.internal.elb.us-east-1.aws.piadp.com:8080/ui/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$28f774fb:qtp1879219501-765)

      This isssue has been created automatically from Marathon GitHub Issue 5492.

        Attachments

          Activity

            People

            • Assignee:
              tharper Tim Harper
              Reporter:
              marathon-bot Marathon Bot
              Team:
              Orchestration Team
              Watchers:
              Ken Sipe, Marathon Bot
            • Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: