One issue we've encountered frequently with Marathon is the framework ID suddenly changing. This is very destructive: Marathon relaunches all of its tasks, because reconciliation with Mesos reports that they are not running, since the existing tasks were launched under a different framework ID.
Historically, the main cause we have seen for this behavior is ZooKeeper state corruption (failed migration, accidentally removed record, etc.).
In the future, if Marathon ever encounters a scenario where it is unable to read its framework ID, it should fail hard and crash rather than automatically creating a new framework ID and then launching all of the tasks. (See the "fail loud and proud" Marathon cultural value.)
One possible solution would be to create a specific framework ID pseudo-record when Marathon first launches and the state is completely empty (no nodes have been created yet). This pseudo-record could be used to give Marathon permission to create a new framework ID. Alternatively, we could simply assign a random UUID during the first Marathon initialization, if Mesos is okay with framework-generated framework IDs.
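To make the pseudo-record idea concrete, here is a minimal sketch of the startup decision, not Marathon's actual implementation. A plain dict stands in for the ZooKeeper state, and the key names, the `resolve_framework_id` function, and the `MissingFrameworkIdError` exception are all hypothetical. Generating the ID locally as a UUID corresponds to the alternative above and assumes Mesos would accept it.

```python
import uuid

# Hypothetical key names; Marathon's real ZooKeeper layout may differ.
FRAMEWORK_ID_KEY = "framework-id"
INIT_MARKER_KEY = "framework-id-creation-allowed"


class MissingFrameworkIdError(RuntimeError):
    """Raised when existing state contains no framework ID record."""


def resolve_framework_id(store, generate_id=lambda: str(uuid.uuid4())):
    """Return the framework ID to use, or raise if it has gone missing.

    `store` is a dict standing in for the persistent state store.
    """
    if FRAMEWORK_ID_KEY in store:
        return store[FRAMEWORK_ID_KEY]

    if not store:
        # Completely empty state means a first launch: write the
        # pseudo-record that grants permission to create a new ID.
        store[INIT_MARKER_KEY] = "true"

    if store.get(INIT_MARKER_KEY) == "true":
        # Permission is present (possibly left over from a first launch
        # that crashed before an ID was stored), so create the ID once.
        framework_id = generate_id()
        store[FRAMEWORK_ID_KEY] = framework_id
        del store[INIT_MARKER_KEY]
        return framework_id

    # State exists but the ID record is gone: fail loudly instead of
    # silently registering as a new framework and relaunching everything.
    raise MissingFrameworkIdError(
        "Framework ID record is missing from existing state; "
        "refusing to create a new framework ID."
    )
```

In this sketch the permission is consumed as soon as an ID is stored, so a later deletion of the ID record leads to a crash rather than a fresh framework ID.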
Given a Marathon instance with several tasks running
When I manually delete the ZooKeeper record describing the framework ID
And I restart Marathon
Then Marathon should crash with a message explaining that the framework ID record is missing, with a link to documentation for more information