Marathon / MARATHON-8043

Document application state monitoring best practices

    Details

      Description

      Can we create a documentation page on how to monitor a specific Marathon application? We need to monitor applications in an environment shared among multiple development teams, each with different monitoring requirements. For example, they want to be notified if a deployment is stuck in a deploying or waiting state, or if the running instances are not all on the same version for an extended period.
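
      The two checks mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: it assumes the deployment objects from `/v2/deployments` and the task objects from `/v2/apps/{appId}/tasks` each carry a `version` timestamp with fractional seconds, and the function names and the 15-minute threshold are hypothetical. The functions take already-fetched JSON so they can be tried without a running cluster.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    # Assumed timestamp shape: "2018-01-24T17:38:15.000Z" (UTC, with fraction).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def stuck_deployments(deployments, now, max_age=timedelta(minutes=15)):
    """Return (deployment id, affected apps) for deployments older than max_age.

    `deployments` is the list assumed to be returned by GET /v2/deployments.
    The threshold is an arbitrary example value, not a recommendation.
    """
    stuck = []
    for d in deployments:
        started = parse_ts(d["version"])
        if now - started > max_age:
            stuck.append((d["id"], d.get("affectedApps", [])))
    return stuck


def has_mixed_versions(tasks):
    """True if the given tasks do not all report the same `version`.

    `tasks` is the task list assumed to be returned for a single app.
    """
    return len({t["version"] for t in tasks}) > 1
```

      Detecting that instances stay on mixed versions "for an extended period" needs state over time, so a monitor would have to evaluate `has_mixed_versions` repeatedly and alert only when it stays true across several checks.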

      I created this issue as a result of discussions in https://mesos.slackarchive.io/marathon/page-56/ts-1516815495000201

      As far as I can see, the UI components add a higher-level abstraction on top of the states that can be queried from the Marathon endpoints, mostly documented here: https://mesosphere.github.io/marathon/docs/marathon-ui.html#application-status-reference . There used to be states like Running, Deploying, Suspended, Delayed, Waiting, Unknown, Overcapacity, and Unscheduled, so users who want to monitor the state of their applications have to replicate the logic behind them.
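
      To make the "replicate the logic" problem concrete, here is a rough approximation of how some of these states might be derived from API data. It is an assumption, not the UI's verified logic: it assumes the app object from `/v2/apps/{appId}` has `deployments`, `instances` and `tasksRunning` fields, and that entries in `/v2/queue` have an `app.id` and a `delay.overdue` flag. The function names are hypothetical, and Overcapacity and Unscheduled are not covered.

```python
def find_queue_entry(queue, app_id):
    """Return the launch-queue entry for app_id, or None if it is not queued.

    `queue` is the JSON body assumed to be returned by GET /v2/queue.
    """
    for item in queue.get("queue", []):
        if item.get("app", {}).get("id") == app_id:
            return item
    return None


def app_status(app, queue_entry=None):
    """Approximate a UI-style status from an app object and its queue entry."""
    # Any in-flight deployment takes precedence over everything else.
    if app.get("deployments"):
        return "Deploying"
    # Queued apps: assumed to be Waiting once the launch delay is overdue,
    # Delayed while the backoff delay is still running.
    if queue_entry is not None:
        overdue = queue_entry.get("delay", {}).get("overdue", True)
        return "Waiting" if overdue else "Delayed"
    running = app.get("tasksRunning", 0)
    if app.get("instances", 0) == 0 and running == 0:
        return "Suspended"
    if running > 0:
        return "Running"
    return "Unknown"
```

      A monitor could call `app_status(app, find_queue_entry(queue, app["id"]))` on each poll and alert when an app stays in Deploying, Waiting or Delayed longer than a team-specific limit.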

      I can also see a troubleshooting section in the Marathon docs, but it doesn't help much with implementing a monitor that catches such cases without a human involved: https://mesosphere.github.io/marathon/docs/waiting.html

      About two years ago I implemented some application state checks here, though some may no longer be relevant: https://github.com/bergerx/prom_marathon_app_exporter/blob/master/README.md#alerts-on-prometheus

       

            People

            • Assignee:
              joshearlenbaugh Josh Earlenbaugh
              Reporter:
              bergerx Bekir Dogan
              Team:
              Orchestration Team
              Watchers:
              zemmet