DC/OS - 1.9.6
Metronome - 0.2.4
In our production environment, mission-critical jobs such as backup and repair for Cassandra framework nodes are scheduled via Metronome.
We recently discovered that these jobs have not been launching on schedule for some time.
The logs (Metronome leader, Mesos master leader, and the Mesos slave on the relevant private agent) show nothing at the time the job is expected to launch according to its schedule.
Other calls to Metronome complete successfully, such as removing and recreating a scheduled job, or running it as a one-off.
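
For anyone trying to reproduce this, here is a minimal Python sketch of how the schedule state could be inspected and a one-off run triggered through the Metronome v1 REST API. It assumes Metronome is reachable through the DC/OS Admin Router under `/service/metronome` and that schedule objects carry an `enabled` flag; the cluster URL, the job id `cassandra-backup`, and the helper names are hypothetical placeholders, not values taken from our cluster.

```python
import json
import urllib.request


def schedules_url(cluster_url, job_id):
    # Assumed path: GET here lists the schedules attached to a job.
    return f"{cluster_url.rstrip('/')}/service/metronome/v1/jobs/{job_id}/schedules"


def runs_url(cluster_url, job_id):
    # Assumed path: POST here starts a one-off run of the job.
    return f"{cluster_url.rstrip('/')}/service/metronome/v1/jobs/{job_id}/runs"


def disabled_schedules(schedules):
    """Return the ids of schedules whose 'enabled' flag is missing or false."""
    return [s.get("id") for s in schedules if not s.get("enabled", False)]


def call(url, token=None, method="GET"):
    """Send a request and decode the JSON response body.

    The 'token=<value>' Authorization format is an assumption based on
    DC/OS authentication; omit the token on clusters without auth.
    """
    req = urllib.request.Request(url, method=method)
    if token:
        req.add_header("Authorization", f"token={token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Possible usage: `disabled_schedules(call(schedules_url("https://my-cluster.example", "cassandra-backup"), token))` would list any schedule that is not enabled, and `call(runs_url(...), token, method="POST")` would correspond to the manual one-off run described above. This only checks what Metronome reports about the schedule; it does not explain why an enabled schedule fails to fire.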
For now, we are working around this by running the jobs manually, but as you can understand, we would like to find a proper fix quickly.
Enclosed are an example job definition and the relevant logs mentioned above.