Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: DC/OS 1.9.6
    • Fix Version/s: None
    • Component/s: metronome

      Description

      Versions:
      DC/OS - 1.9.6
      Metronome - 0.2.4 

      Mission-critical jobs, such as backup and repair for Cassandra framework nodes, are scheduled via Metronome in our production environment.

      We recently learned that these jobs have not been launching for some time.

      The logs (Metronome leader, Mesos master leader, and the Mesos slave on the relevant private agent) show nothing at the time the job is expected to launch according to its schedule.

      Other calls to Metronome complete OK, such as removing and recreating a scheduled job and running it as a one-off. 
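
For reference, the calls mentioned above (listing or recreating a schedule, and triggering a one-off run) can be sketched against Metronome's v1 REST API. This is a minimal sketch, not the reporter's actual tooling: the base path assumes Metronome is reached through the DC/OS admin router, and the cluster URL, job id, and schedule values are hypothetical placeholders. The helpers only build requests and send nothing.

```python
import json
from urllib.parse import quote

# Assumption: Metronome is exposed through the DC/OS admin router at this path.
BASE_PATH = "/service/metronome/v1/jobs"


def build_request(cluster_url, method, job_id, suffix="", body=None):
    """Return (method, url, encoded_body) for a Metronome v1 call; nothing is sent."""
    url = cluster_url.rstrip("/") + BASE_PATH + "/" + quote(job_id, safe="") + suffix
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return method, url, data


def list_schedules(cluster_url, job_id):
    # GET /v1/jobs/{jobId}/schedules
    return build_request(cluster_url, "GET", job_id, "/schedules")


def create_schedule(cluster_url, job_id, schedule):
    # POST /v1/jobs/{jobId}/schedules
    return build_request(cluster_url, "POST", job_id, "/schedules", schedule)


def trigger_run(cluster_url, job_id):
    # POST /v1/jobs/{jobId}/runs starts a one-off run
    return build_request(cluster_url, "POST", job_id, "/runs")


def list_runs(cluster_url, job_id):
    # GET /v1/jobs/{jobId}/runs
    return build_request(cluster_url, "GET", job_id, "/runs")


# Hypothetical schedule body; every value here is illustrative only.
EXAMPLE_SCHEDULE = {
    "id": "nightly",
    "cron": "0 2 * * *",
    "timezone": "UTC",
    "startingDeadlineSeconds": 900,
    "concurrencyPolicy": "ALLOW",
    "enabled": True,
}
```

Each returned tuple can be handed to an HTTP client of choice, with whatever authentication the cluster requires. Comparing the output of the schedule listing with the run listing around the expected launch time is one way to confirm whether a scheduled run was ever created.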

      For now, we are working around this by running the jobs manually, but as you can understand, we would like this resolved quickly.

      Attached are an example job definition and the relevant logs mentioned above.

        Attachments

          Activity

            People

            • Assignee:
              marco.monaco Marco Monaco
              Reporter:
              avikalvo Avi Kalvo
              Team:
              Orchestration Team
              Watchers:
              Alena Varkockova, Avi Kalvo, Ken Sipe, Matthias Eichstedt

              Dates

              • Created:
                Updated:
                Resolved: