DC/OS / DCOS_OSS-4044

Telegraf dcos_statsd input may cause Mesos to fail to launch containers

    Details

      Description

      A few integration tests on https://github.com/dcos/dcos/pull/3366 and its EE bump occasionally failed with an odd error:

      Application deployment failed, reason: Failed to launch container: discarded; Abnormal executor termination: unknown container
      

      Kevin Klues investigated, and it looks like Telegraf occasionally responds with a 404 when Mesos makes an HTTP request to Telegraf with the statsd host/port for the container:

      2018-09-05 01:26:44: E0905 01:26:44.550675  2425 isolator.cpp:213] Received unexpected response code '404' when posting 'ContainerStartRequest' for container 'debug-582f73e5-3c76-4ab9-b25b-4cd54cd32ae1'
      2018-09-05 01:26:44: I0905 01:26:44.574218  2430 slave.cpp:3633] Asked to kill task integration-test-sleep-app-mesos-authz-346d204b-a764-4d7b-96ac-2e4edbde8db7.be8e5dea-b0aa-11e8-abee-70b3d5800001 of framework 119c6ef9-527f-44a2-a4ad-790d2cb2f96c-0000
      

      This prevents Mesos from running the container.
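
The failure mode can be reproduced in miniature. The sketch below is only an illustration and is not the Mesos isolator or the Telegraf dcos_statsd code: the `/container` path, the JSON payload, the `FakeTelegraf` handler, and its `serve_404` switch are all hypothetical. It assumes only what the log above shows, namely that the caller treats any unexpected response code to its container start request as a failure.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeTelegraf(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the HTTP API that Mesos posts to."""

    serve_404 = False  # hypothetical switch used to simulate the intermittent 404

    def do_POST(self):
        # Drain the request body so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        status = 404 if type(self).serve_404 else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_container_start(url, container_id):
    """Post a container start request; return (ok, status_code)."""
    data = json.dumps({"container_id": container_id}).encode()
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy settings from the environment for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return True, response.status
    except urllib.error.HTTPError as err:
        # Any non-2xx response is treated as a failed container start.
        return False, err.code


def run_demo():
    """Start the fake server, post once with 404s enabled and once without."""
    server = HTTPServer(("127.0.0.1", 0), FakeTelegraf)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{server.server_port}/container"
    try:
        FakeTelegraf.serve_404 = True
        failed = post_container_start(url, "debug-example")
        FakeTelegraf.serve_404 = False
        succeeded = post_container_start(url, "debug-example")
    finally:
        server.shutdown()
        server.server_close()
    return failed, succeeded


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the caller has no retry, so a single 404 is enough to fail the launch. Whether the real isolator retries is not stated in this issue.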

            People

            • Assignee: Philip Norman
            • Reporter: Branden Rolston
            • Team: Cluster Ops Team
            • Watchers: Branden Rolston, Daniel Baker, Lee Hambley (Inactive), Philip Norman