During scale testing, we found messages like the following in the logs for dcos-telegraf.service on a master:
E! Error: statsd message queue full. We have dropped 70000 messages so far. You may want to increase allowed_pending_messages in the config
This was a surprise: nothing in Telegraf's internal metrics indicated that statsd messages were being dropped.
Telegraf's inputs.statsd plugin reports internal metrics for TCP, but none for UDP. It should also report the number of UDP messages received and dropped, along with any others that may be useful, such as the size of the pending messages queue.
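
To make the request concrete, here is a minimal sketch of the bookkeeping being asked for. It is not Telegraf's code (the plugin is written in Go), and the class name, method names, and metric names below are all hypothetical. It only assumes what the log message implies: incoming messages go into a queue bounded by allowed_pending_messages, and a message that arrives while the queue is full is dropped.

```python
import queue


class BoundedMessageQueue:
    """Illustrative stand-in for a statsd UDP listener's pending queue.

    All names here are hypothetical; they do not come from Telegraf.
    """

    def __init__(self, allowed_pending_messages: int) -> None:
        # Note: queue.Queue treats maxsize <= 0 as unbounded, so pass a
        # positive limit to get the drop behavior shown here.
        self._pending = queue.Queue(maxsize=allowed_pending_messages)
        self.received = 0
        self.dropped = 0

    def offer(self, message: bytes) -> bool:
        """Count the message, then enqueue it without blocking.

        Returns False (and counts a drop) if the queue is full.
        """
        self.received += 1
        try:
            self._pending.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def poll(self):
        """Return the next pending message, or None if the queue is empty."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def snapshot(self) -> dict:
        """Return the three values this ticket asks to have reported."""
        return {
            "udp_messages_received": self.received,
            "udp_messages_dropped": self.dropped,
            "pending_messages": self._pending.qsize(),
        }


if __name__ == "__main__":
    q = BoundedMessageQueue(allowed_pending_messages=2)
    for payload in (b"a:1|c", b"b:1|c", b"c:1|c"):
        q.offer(payload)
    # The third message does not fit, so it is counted as dropped.
    print(q.snapshot())
```

With counters like these exposed as internal metrics, drops would show up on a dashboard or alert instead of only in the service log.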
We should also make sure the dcos_statsd input plugin reports similar metrics.