  1. Marathon
  2. MARATHON-8043

Document application state monitoring best practices



      Can we create a page about how to monitor a specific Marathon application? We need to monitor applications in an environment shared among multiple development teams, and each team has different monitoring requirements. For example, they want to be notified if a deployment is stuck in a deploying or waiting state, or if the running instances are not all on the same version for an extended period.
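
      To make the two example requirements concrete, here is a minimal sketch in Python. It assumes the app definition has already been fetched as JSON (for example from `/v2/apps/{appId}` with tasks embedded) and that it carries a `deployments` list and a `tasks` list whose entries have a `version` field; those field names should be verified against the Marathon version in use. `ConditionTimer` and its threshold are hypothetical helpers, not part of Marathon.

```python
import time


def has_pending_deployment(app):
    """True if the app JSON lists at least one in-flight deployment."""
    return len(app.get("deployments", [])) > 0


def has_mixed_versions(app):
    """True if the app's tasks report more than one distinct version."""
    versions = {t.get("version") for t in app.get("tasks", []) if t.get("version")}
    return len(versions) > 1


class ConditionTimer:
    """Hypothetical helper: reports a condition only after it has held
    continuously for at least `threshold_s` seconds."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self._since = {}  # key -> timestamp at which the condition first held

    def update(self, key, active, now=None):
        now = time.time() if now is None else now
        if not active:
            # Condition cleared: forget when it started.
            self._since.pop(key, None)
            return False
        start = self._since.setdefault(key, now)
        return now - start >= self.threshold_s
```

      A poller could call `timer.update(app["id"], has_mixed_versions(app))` on every scrape and alert only when it returns `True`, which covers the "for an extended period" part without alerting on every normal rolling upgrade.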

      I created this issue as a result of discussions in https://mesos.slackarchive.io/marathon/page-56/ts-1516815495000201

      As far as I can see, the UI components add a higher-level abstraction on top of the states that are queryable from the Marathon endpoints, mostly documented here: https://mesosphere.github.io/marathon/docs/marathon-ui.html#application-status-reference . There used to be states like Running, Deploying, Suspended, Delayed, Waiting, Unknown, Overcapacity, and Unscheduled; users who want to monitor the state of their applications have to replicate the logic behind those states themselves.
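
      As an illustration of what "replicating the logic" could look like, here is a rough Python sketch that derives a UI-like status from API data. This is an approximation under assumptions, not the UI's actual implementation: it assumes the app JSON exposes `deployments`, `instances`, and `tasksRunning`, and that the matching `/v2/queue` entry (if any) carries a `delay.overdue` flag. `derive_status` is a hypothetical name, and the Overcapacity and Unscheduled states are deliberately left out because their rules are not clear from the docs.

```python
def derive_status(app, queue_entry=None):
    """Approximate a UI-style status string from raw API data.

    app:         dict for one application (assumed fields: deployments,
                 instances, tasksRunning).
    queue_entry: dict for the app's launch-queue entry, or None if the
                 app is not in the queue (assumed field: delay.overdue).
    """
    # Any in-flight deployment takes precedence over everything else.
    if app.get("deployments"):
        return "Deploying"

    if queue_entry is not None:
        delay = queue_entry.get("delay", {})
        # Assumption: overdue == False means launches are still being
        # held back by the backoff delay; otherwise the app is waiting
        # for resource offers.
        return "Waiting" if delay.get("overdue", True) else "Delayed"

    instances = app.get("instances", 0)
    running = app.get("tasksRunning", 0)
    if instances == 0 and running == 0:
        return "Suspended"
    if running > 0:
        return "Running"
    return "Unknown"
```

      Having the exact precedence of these rules written down in the docs is precisely what this issue asks for; the ordering above is a guess.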

      There is also a troubleshooting section in the Marathon docs, but it doesn't help much with implementing a monitor that catches such cases without human involvement: https://mesosphere.github.io/marathon/docs/waiting.html

      About two years ago I implemented some application state checks here; some of them may no longer be relevant: https://github.com/bergerx/prom_marathon_app_exporter/blob/master/README.md#alerts-on-prometheus





            • Assignee:
              joshearlenbaugh Josh Earlenbaugh
              bergerx Bekir Dogan
              Orchestration Team
              Bekir Dogan, zemmet
            • Watchers:
              2


              • Created: