  1. Marathon
  2. MARATHON-8420

Be extremely conservative when creating a new framework ID

    Details

      Description

      Background / Overview

      One issue we've encountered frequently with Marathon is the framework ID suddenly changing. This is very destructive: Marathon relaunches all of its tasks, because reconciliation with Mesos reports them as not running, since they were launched under a different framework ID.

      Historically, the main cause of this behavior has been ZooKeeper state corruption (failed migration, accidentally removed record, etc.).

      In the future, if Marathon ever encounters a scenario where it is unable to read its framework ID, it should fail hard and crash rather than automatically creating a new framework ID and then launching all of the tasks. (See the "fail loud and proud" Marathon cultural value.)

      Possible Implementation

      One possible solution would be to create a specific framework ID pseudo-record when Marathon first launches and the state is completely empty (no ZooKeeper nodes have been created yet). This pseudo-record could be used to give Marathon permission to create a new framework ID. Alternatively, we could simply assign a random UUID during the first Marathon initialization, if Mesos is okay with framework-generated framework IDs.
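
The pseudo-record idea could be sketched roughly as follows. This is only an illustration in Python (Marathon itself is written in Scala), with a plain dict standing in for the ZooKeeper state. The key names `framework-id` and `framework-id-permit`, the exception type, and both function names are hypothetical and do not reflect Marathon's actual storage layout or API. The random UUID is a stand-in for whatever ID Marathon would actually end up with after registering.

```python
import uuid

# Hypothetical key names; Marathon's real ZooKeeper layout is not described here.
FRAMEWORK_ID_KEY = "framework-id"
PERMIT_KEY = "framework-id-permit"


class MissingFrameworkIdError(RuntimeError):
    """Raised when state exists but the framework ID record does not."""


def initialize_if_empty(store):
    """On first launch (completely empty state), write the pseudo-record."""
    if not store:
        store[PERMIT_KEY] = "allowed"


def resolve_framework_id(store):
    """Return the stored framework ID, creating one only when permitted."""
    if FRAMEWORK_ID_KEY in store:
        return store[FRAMEWORK_ID_KEY]
    if PERMIT_KEY in store:
        # Consume the permit so that a later loss of the ID record cannot
        # silently lead to a second, different framework ID.
        del store[PERMIT_KEY]
        # Stand-in for the ID that would be obtained during registration.
        store[FRAMEWORK_ID_KEY] = str(uuid.uuid4())
        return store[FRAMEWORK_ID_KEY]
    # State exists, but there is neither an ID nor permission to create one.
    raise MissingFrameworkIdError(
        "framework ID record is missing and no creation permit is present"
    )
```

Under this sketch, a store that has been wiped completely is indistinguishable from a first launch, so the guard only protects against partial loss or corruption of the state.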

      Acceptance Criteria

      Given a Marathon instance with several tasks running
      When I manually delete the ZooKeeper record describing the framework ID
      And I restart Marathon
      Then Marathon should crash with a message explaining that the framework ID record is missing, with a link to documentation for more information
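
The "Then" step might look something like the following at startup. This is a minimal sketch in Python rather than Marathon's Scala, using a dict in place of the ZooKeeper state. The function name `check_framework_id`, the key `framework-id`, and the documentation URL are all placeholders, since the issue does not specify them.

```python
import sys

# Placeholder URL; the issue does not say where the documentation lives.
DOCS_URL = "https://example.invalid/marathon/framework-id-missing"


def check_framework_id(store, key="framework-id"):
    """Return the framework ID, or exit non-zero with an explanatory message."""
    framework_id = store.get(key)
    if framework_id is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"Framework ID record '{key}' is missing from the state store; "
            f"refusing to register a new framework. See {DOCS_URL}"
        )
    return framework_id
```

Exiting here, before any registration with Mesos is attempted, is what keeps the running tasks from being relaunched under a new framework ID.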

              People

              • Assignee:
                tharper Tim Harper
                Reporter:
                tharper Tim Harper
                Team:
                Orchestration Team
                Watchers:
                daltonmatos, Matthias Eichstedt, Mergebot, Tim Harper
