We noticed today that the "latest-soak-cluster: root-marathon appears to be flapping" and "latest-soak-cluster: root-marathon leader appears to be flapping" monitors were muted and had stopped receiving data. According to the monitor histories, the last data point arrived on June 15.
The logs of the soak-monitors systemd unit show the following:
In the soak-cluster-monitor code, these lines are the likely source of the MarathonMonitor error: https://github.com/mesosphere/soak-cluster-monitor/blob/9182d570aa73b8d28d577b770712881899844567/monitors/marathon.py#L32-L33
Indeed, comparing the output of /marathon/metrics from a couple of different clusters, we see that the 1.9 soak cluster provides a "gauges" field with entries like this:
while the latest soak cluster returns a "gauges" field with entries like this:
It seems that Marathon recently switched to a new metrics library, which likely caused this change in output; see this Slack discussion.
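One possible mitigation, sketched here under assumptions rather than taken from the actual monitor code: look up the gauge under several candidate names and return None instead of raising when none is present, so that a change in the /marathon/metrics output degrades to "no data" with a clear log line rather than an error. The function name `read_gauge`, the placeholder metric names, and the assumed entry shapes (either a bare number or an object with a "value" key) are all hypothetical; the real names must be taken from the actual output of each Marathon version.

```python
from typing import Any, Dict, Optional, Sequence


def read_gauge(metrics: Dict[str, Any], candidate_names: Sequence[str]) -> Optional[float]:
    """Return the first gauge found under any candidate name, or None.

    Hypothetical helper: the candidate names and entry shapes are assumptions,
    not the confirmed Marathon metrics format.
    """
    gauges = metrics.get("gauges")
    if not isinstance(gauges, dict):
        # The "gauges" field is missing or has an unexpected type.
        return None
    for name in candidate_names:
        entry = gauges.get(name)
        if entry is None:
            continue
        # Assumption: an entry is either {"value": <number>} or a bare number.
        value = entry.get("value") if isinstance(entry, dict) else entry
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None


# Placeholder names only; substitute the real gauge names for each version.
CANDIDATES = ["hypothetical.old.gauge.name", "hypothetical.new.gauge.name"]
```

A caller would treat a None result as "metric unavailable" and log which names were tried, which should make a future format change easier to spot than a silently muted monitor.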