There have been a number of issues here, and some work has already been done, but there may still be an edge case where a failed ZK DNS resolution leads to a zombie Marathon: the process doesn't die, but it is no longer leading.
We should confirm that Curator does not properly apply the retry policy when DNS resolution is failing. This is a problem because we depend on our retry policy to tell Marathon to kill itself. We believe leader abdication is now working because the leader crashes, but instances in standby Curator elections could still become zombies, and if the leader restarts, it could become a new standby zombie just as well.
Once that behavior is confirmed, let's work with the Curator maintainers to get a fix in. If it's what I think it is, the fix should be simple: I suspect the DNS resolution portion of the connection routine is just missing proper exception handling.
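
To make the suspected gap concrete, here is a minimal sketch of the kind of handling I have in mind. It is not Curator code: `DnsRetrySketch`, `Resolver`, `connectWithRetries`, and the `onGiveUp` callback are all hypothetical names made up for illustration, and the real connection routine may be structured quite differently. The idea is only that a failed lookup should count against the retry budget, so that the give-up path (the one we rely on to make Marathon exit) actually runs.

```java
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicBoolean;

public class DnsRetrySketch {
    /** Hypothetical stand-in for a DNS lookup; not a Curator or ZooKeeper API. */
    interface Resolver {
        String resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries to resolve the host up to maxRetries + 1 times. A failed lookup is
     * counted against the retry budget; once the budget is spent, onGiveUp runs.
     * Returns the resolved address, or null if the budget was exhausted.
     */
    static String connectWithRetries(String host, Resolver resolver,
                                     int maxRetries, Runnable onGiveUp) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                // Assumed gap: if this exception is not caught here, the failure
                // never reaches the retry accounting and nothing gives up.
            }
        }
        onGiveUp.run();
        return null;
    }

    public static void main(String[] args) {
        // Simulate a ZK hostname that never resolves.
        Resolver alwaysFails = host -> { throw new UnknownHostException(host); };
        AtomicBoolean gaveUp = new AtomicBoolean(false);
        String address = connectWithRetries(
                "zk.example.invalid", alwaysFails, 3, () -> gaveUp.set(true));
        System.out.println("address=" + address + " gaveUp=" + gaveUp.get());
    }
}
```

In Marathon the `onGiveUp` callback would be whatever terminates the process, so a standby that cannot resolve ZK would crash and be restarted rather than linger as a zombie.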