Currently DCOS has a "global" instance of Marathon (the Services tab) and of Chronos (Metronome). These work well as a place for cluster-wide, shared setup.
From a multi-tenant standpoint, the ability to install additional instances of Marathon and Chronos/Metronome (I mention both because Chronos has a UI right now and Metronome does not) is a great way to split tenants into roles and to distribute the administrative burden. With some basic authentication, each instance can be administered by that tenant's administrator. As a super admin, I can run instances of Marathon or Chronos in my Marathon (Services) with tenant-specific constraints (roles, etc.), and then give the tenant administrator HTTP credentials for their tenant's Marathon and Chronos instances. They can set up more users and so on; however, the instance itself is run, started, and configured by me.
This means that if I say an instance only works with the dev role, the tenant can't run tasks on that Marathon using the prod role. Yet they can still easily run tasks, add other users who can submit jobs, and so on.
Based on the story above, what I would like to see is the ability to start Marathon/Chronos (Metronome, once it gets a UI) with task-level limits and settings. When I start an instance of Marathon, I could supply a JSON file of hard-coded app settings (for some settings) and maximums/minimums (for others).
A simple example: I want to provide a dev instance where people are free to run however many dev things they want, but I set a CPU maximum of 3 CPU shares. Even if a user can submit their own Marathon jobs, if they set their CPU to 4, it will be pulled back to 3 because the Marathon instance they submitted to has a CPU maximum.
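To make the "pull it back" behavior concrete, here is a minimal sketch of what the instance could do to an incoming app definition before accepting it. It assumes the Marathon app field `cpus`; the `clamp_max` helper and the `DEV_CPU_MAX` value are hypothetical, since this is the feature being requested and not something Marathon does today.

```python
import copy


def clamp_max(app: dict, field: str, maximum: float) -> dict:
    """Return a copy of a Marathon-style app definition with `field` capped at `maximum`.

    The submitted definition is left untouched; only the copy is adjusted.
    """
    result = copy.deepcopy(app)
    if field in result and result[field] > maximum:
        result[field] = maximum
    return result


# Hypothetical limit for the dev instance: at most 3 CPU shares per app.
DEV_CPU_MAX = 3

submitted = {"id": "/dev/my-app", "cmd": "sleep 1000", "cpus": 4, "mem": 128}
accepted = clamp_max(submitted, "cpus", DEV_CPU_MAX)
print(accepted["cpus"])  # 3
```

Clamping silently rewrites the request; an alternative design would reject it outright, which is closer to how the run-as-user rule below is meant to behave.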
A more complex example is restricting which users tasks can run as: I supply a list of allowed users, and if the run-as user specified in the task is not in that list, the submission fails. This allows security to be controlled to some degree in multi-tenant environments while still allowing for self-service administration. That is, I can let users submit jobs but set limits on what they can and can't do, giving them the freedom to develop and try things without adding burden on the "super admins" and without damaging the rest of the cluster.
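A rough sketch of the run-as rule, assuming the Marathon app field `user` holds the run-as user. The `check_run_as_user` function, the `RuleViolation` exception, and the `DEV_USERS` allow-list are hypothetical names for illustration. Unlike the CPU example, this rule fails the submission instead of adjusting it.

```python
class RuleViolation(Exception):
    """Raised when a submitted app breaks one of this instance's rules."""


def check_run_as_user(app: dict, allowed_users: set) -> dict:
    """Reject the app unless its run-as user is on the instance's allow-list.

    Assumption: an app that omits "user" is rejected too, so a tenant cannot
    fall back to whatever default user the framework would otherwise pick.
    """
    user = app.get("user")
    if user not in allowed_users:
        raise RuleViolation(
            f"run-as user {user!r} is not permitted; allowed: {sorted(allowed_users)}"
        )
    return app


# Hypothetical allow-list for a dev tenant.
DEV_USERS = {"dev", "ci"}

check_run_as_user({"id": "/dev/ok", "user": "dev"}, DEV_USERS)  # accepted
try:
    check_run_as_user({"id": "/dev/bad", "user": "root"}, DEV_USERS)
except RuleViolation as err:
    print(err)
```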
Another example: I may be using CNI and have overlay networks with rules set up. I may have rules saying that, in Dev, Dev items can only talk to other Dev items, and thus want all tasks submitted to Marathon/Chronos to be forced onto the dev overlay network. This could be something I enforce at the Marathon level.
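Forcing the network could be as simple as overwriting whatever the tenant asked for. This sketch assumes the `networks` app field with `{"mode": "container", "name": ...}` entries used by newer Marathon app definitions; older definitions express this differently, so treat the field layout as an assumption. The `force_overlay_network` helper and the `"dev"` network name are hypothetical.

```python
import copy

# Hypothetical name of the tenant's CNI overlay network.
DEV_NETWORK = "dev"


def force_overlay_network(app: dict, network_name: str) -> dict:
    """Return a copy of the app attached only to the given overlay network.

    Any networks the tenant requested are discarded, so a dev app cannot
    opt into another tenant's network.
    """
    result = copy.deepcopy(app)
    result["networks"] = [{"mode": "container", "name": network_name}]
    return result


submitted = {
    "id": "/dev/api",
    "cmd": "./serve",
    "networks": [{"mode": "container", "name": "prod"}],
}
forced = force_overlay_network(submitted, DEV_NETWORK)
print(forced["networks"])
```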
The same goes for constraints: I may want all dev jobs to always carry a set of constraints that pins them to certain dev nodes.
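Here is a minimal sketch of forcing constraints, assuming Marathon's list-of-lists constraint format (for example `["hostname", "UNIQUE"]`). The `force_constraints` helper and the `env` node attribute in `REQUIRED_DEV_CONSTRAINTS` are hypothetical. Required constraints are appended rather than replacing the tenant's own, so tenants can still narrow placement further.

```python
import copy

# Hypothetical constraint pinning dev apps to agents tagged with attribute env=dev.
REQUIRED_DEV_CONSTRAINTS = [["env", "CLUSTER", "dev"]]


def force_constraints(app: dict, required: list) -> dict:
    """Return a copy of the app with every required constraint present.

    The tenant's own constraints are kept; required ones are appended only
    if they are not already there, so the operation is idempotent.
    """
    result = copy.deepcopy(app)
    constraints = result.get("constraints", [])
    for constraint in required:
        if constraint not in constraints:
            constraints.append(list(constraint))
    result["constraints"] = constraints
    return result


submitted = {"id": "/dev/worker", "constraints": [["hostname", "UNIQUE"]]}
pinned = force_constraints(submitted, REQUIRED_DEV_CONSTRAINTS)
print(pinned["constraints"])
```

Appending works as a floor, assuming all of an app's constraints must hold for it to be placed: a tenant who adds a conflicting constraint only makes their own app unschedulable, rather than escaping the dev nodes.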
Basically, I am looking for the ability to run Marathon with a JSON list of rules covering (probably all) settings of a Marathon/Chronos task. The rule types should include matching (the value must match an entry in a list) and contains (the value must be found among the items of a list), and, for settings specified as numbers, GT and LT (plus GTorE and LTorE, i.e. greater/less than or equal). These rules obviously don't apply to the "master" Marathon and Metronome instances in DCOS; however, any downstream instance could be limited this way.
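To sketch what such a rules file and its evaluation might look like: the rule shape `{"field", "op", "value"}`, the operator names, and the `check_rule`/`violations` functions below are all hypothetical, not an existing Marathon format. "contains" is read here as "the list-valued setting must include every listed item", which is one interpretation of the wording above. Field names (`cpus`, `user`, `constraints`) follow Marathon app definitions.

```python
import json
import operator

# Numeric operators named after the GT / LT / GTorE / LTorE terms above.
NUMERIC_OPS = {
    "GT": operator.gt,
    "LT": operator.lt,
    "GTorE": operator.ge,
    "LTorE": operator.le,
}


def check_rule(app: dict, rule: dict) -> bool:
    """Return True if the app satisfies a single rule."""
    field, op, expected = rule["field"], rule["op"], rule["value"]
    if field not in app:
        return False  # assumption: a rule on an absent field fails (strict mode)
    actual = app[field]
    if op == "match":
        # The setting must equal one of the listed values.
        return actual in expected
    if op == "contains":
        # The list-valued setting must include every listed item.
        return all(item in actual for item in expected)
    if op in NUMERIC_OPS:
        return NUMERIC_OPS[op](actual, expected)
    raise ValueError(f"unknown rule operator: {op!r}")


def violations(app: dict, rules: list) -> list:
    """Return the rules the app breaks (an empty list means it is accepted)."""
    return [rule for rule in rules if not check_rule(app, rule)]


# Hypothetical rules file for a dev instance.
RULES_JSON = """
[
  {"field": "cpus", "op": "LTorE", "value": 3},
  {"field": "user", "op": "match", "value": ["dev", "ci"]},
  {"field": "constraints", "op": "contains", "value": [["env", "CLUSTER", "dev"]]}
]
"""
rules = json.loads(RULES_JSON)

app = {
    "id": "/dev/web",
    "cpus": 4,
    "user": "dev",
    "constraints": [["env", "CLUSTER", "dev"]],
}
print([rule["field"] for rule in violations(app, rules)])  # ['cpus']
```

Keeping the rules as data rather than code means the one-time setup per tenant is just writing this file.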
The advantage of this setup is a one-time administration cost to determine the rules the tenant must follow, and that one-time cost would be repaid many times over by letting tenants be self-sufficient instead of relying on either constant interaction with the super admin or trust that they won't "break things".
I would love discussion on this. I think it adds a great level of isolation and security while still aiming for self-sufficient multi-tenancy.