The setup is three servers, each running the zookeeper, mesos-master and marathon services (plus consul and dnsmasq for internal hostname resolution). Additionally, there are 6 slaves running the mesos-agent service (as well as consul and dnsmasq).
Command line flags for mesos-master
Command line flags for marathon
All hostnames are resolvable.
I'm starting with mesos-master and marathon both leading on server 01.mesos-master.service.internal, with no tasks running.
I start a simple task:
I can see that marathon is running it fine.
I force a leader election. Marathon is now leading at 03.mesos-master.service.internal/03.marathon.service.internal.
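For anyone trying to reproduce this, here is a minimal sketch of how the leader could be confirmed from a script instead of the UI. It assumes each marathon instance answers `GET /v2/leader` on the default port 8080. The host list and the function names are hypothetical (only the 03 hostname appears above), so adjust them to your cluster.

```python
import json
from urllib.request import urlopen

# Hypothetical host list: only the 03 name is taken from the post above,
# the others are assumed to follow the same naming pattern.
MARATHON_HOSTS = [
    "01.marathon.service.internal:8080",
    "02.marathon.service.internal:8080",
    "03.marathon.service.internal:8080",
]


def fetch_leader(host, timeout=5):
    """Ask one marathon instance who it thinks the leader is.

    Returns the "host:port" string from /v2/leader, or None if the
    instance is unreachable or reports no leader.
    """
    try:
        with urlopen(f"http://{host}/v2/leader", timeout=timeout) as resp:
            return json.load(resp).get("leader")
    except (OSError, ValueError):
        return None


def leaders_agree(reported):
    """True if every instance that answered names the same single leader.

    `reported` maps each queried host to the leader it returned (or None).
    """
    distinct = {leader for leader in reported.values() if leader is not None}
    return len(distinct) == 1


def report_leaders(hosts=MARATHON_HOSTS):
    """Query every host and return (per-host answers, agreement flag)."""
    reported = {host: fetch_leader(host) for host in hosts}
    return reported, leaders_agree(reported)
```

Calling `report_leaders()` right after the forced election should show all three instances naming the 03 server. If they disagree, the problem is likely in leader election itself rather than in task handling.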
I then try to suspend the task in the marathon ui. I use the ui for the leading marathon service.
The task is still running.
Eventually marathon marks the task as suspended, but I still see that the task is running in the mesos ui.
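The mismatch between the two UIs can also be checked from their JSON APIs. The sketch below is only an illustration under some assumptions: the mesos master serves its state at `/master/state` (default port 5050), marathon lists an app's tasks at `/v2/apps/<appId>/tasks`, the framework is registered under the default name `marathon`, and marathon's task IDs match the mesos task IDs. The function names are my own.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def running_task_ids(mesos_state, framework_name="marathon"):
    """IDs of tasks mesos reports as TASK_RUNNING for the named framework.

    `mesos_state` is the decoded JSON from the master's state endpoint.
    """
    ids = []
    for framework in mesos_state.get("frameworks", []):
        if framework.get("name") != framework_name:
            continue
        for task in framework.get("tasks", []):
            if task.get("state") == "TASK_RUNNING":
                ids.append(task["id"])
    return ids


def orphaned_task_ids(mesos_state, marathon_tasks, framework_name="marathon"):
    """Tasks still running in mesos that marathon no longer knows about.

    `marathon_tasks` is the decoded JSON from marathon's tasks endpoint,
    assumed to look like {"tasks": [{"id": ...}, ...]}.
    """
    known = {task["id"] for task in marathon_tasks.get("tasks", [])}
    return [tid for tid in running_task_ids(mesos_state, framework_name)
            if tid not in known]
```

For example, `orphaned_task_ids(fetch_json("http://03.mesos-master.service.internal:5050/master/state"), fetch_json("http://03.marathon.service.internal:8080/v2/apps/<appId>/tasks"))` should return an empty list on a healthy cluster. In the situation described above it would be expected to list the task that marathon shows as suspended while mesos still shows it running.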
I've been experimenting with all sorts of different configurations. I've tried upgrading marathon and mesos. I've tried IPs instead of hostnames. I can't seem to resolve this issue. We've actually been running mesos and marathon for a year and a half, and it seems that this issue exists in all of our clusters. If this is a configuration issue, please help me figure out what I'm doing wrong. As it is, any errant marathon leader election keeps us from deploying/controlling our applications.