Uploaded image for project: 'Marathon'
  1. Marathon
  2. MARATHON-8348

Marathon SI test `test_external_volume` is flaky

    Details

    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: DC/OS 1.12
    • Fix Version/s: DC/OS 1.12.0
    • Component/s: Storage Volumes
    • Labels:
      None
    • Build artifact:
      Marathon-v1.7.174

      Description

      Marathon SI test for an external volume here fails ~1/10 times.

      Marathon will get a TASK_FAILED status update with the reason:

      Received status update for task external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36: TASK_FAILED (Failed to launch container: Unexpected termination of the subprocess: time="2018-08-05T22:11:29Z" level=error msg="VolumeDriver.Mount: docker-legacy: Mount: marathon-si-test-vol-d44ef6f275034d28a142b534a668437b: failed: "
      2018-08-05 22:12:20: ) (mesosphere.marathon.MarathonScheduler:Thread-45)

      Slave doesn't give more details away:

      12626:2018-08-05 22:10:24: I0805 22:10:24.474071 27309 paths.cpp:745] Creating sandbox '/var/lib/mesos/slave/slaves/51523a73-d40b-4d53-b38f-5a19874faac6-S2/frameworks/51523a73-d40b-4d53-b38f-5a19874faac6-0001/executors/external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36/runs/ed7353bb-58f2-40c2-9cff-509971bf7ae3' for user 'root'
      12627:2018-08-05 22:10:24: I0805 22:10:24.474871 27309 paths.cpp:748] Creating sandbox '/var/lib/mesos/slave/meta/slaves/51523a73-d40b-4d53-b38f-5a19874faac6-S2/frameworks/51523a73-d40b-4d53-b38f-5a19874faac6-0001/executors/external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36/runs/ed7353bb-58f2-40c2-9cff-509971bf7ae3'
      12628:2018-08-05 22:10:24: I0805 22:10:24.475085 27309 slave.cpp:8997] Launching executor 'external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36' of framework 51523a73-d40b-4d53-b38f-5a19874faac6-0001 with resources [{"allocation_info":{"role":"slave_public"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"slave_public"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/var/lib/mesos/slave/slaves/51523a73-d40b-4d53-b38f-5a19874faac6-S2/frameworks/51523a73-d40b-4d53-b38f-5a19874faac6-0001/executors/external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36/runs/ed7353bb-58f2-40c2-9cff-509971bf7ae3'
      12630:2018-08-05 22:10:24: I0805 22:10:24.476027 27309 slave.cpp:3527] Launching container ed7353bb-58f2-40c2-9cff-509971bf7ae3 for executor 'external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36' of framework 51523a73-d40b-4d53-b38f-5a19874faac6-0001
      12632:2018-08-05 22:10:24: I0805 22:10:24.476752 27309 containerizer.cpp:1256] Starting container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12633:2018-08-05 22:10:24: I0805 22:10:24.477326 27309 containerizer.cpp:3025] Transitioning the state of container ed7353bb-58f2-40c2-9cff-509971bf7ae3 from PROVISIONING to PREPARING
      12634:2018-08-05 22:10:24: I0805 22:10:24.494971 27305 memory.cpp:478] Started listening for OOM events for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12635:2018-08-05 22:10:24: I0805 22:10:24.495479 27305 memory.cpp:590] Started listening on 'low' memory pressure events for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12636:2018-08-05 22:10:24: I0805 22:10:24.496009 27305 memory.cpp:590] Started listening on 'medium' memory pressure events for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12637:2018-08-05 22:10:24: I0805 22:10:24.496539 27305 memory.cpp:590] Started listening on 'critical' memory pressure events for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12638:2018-08-05 22:10:24: I0805 22:10:24.501562 27310 cpu.cpp:101] Updated 'cpu.shares' to 204 (cpus 0.2) for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12639:2018-08-05 22:10:24: I0805 22:10:24.501662 27304 memory.cpp:198] Updated 'memory.soft_limit_in_bytes' to 64MB for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12640:2018-08-05 22:10:24: I0805 22:10:24.502427 27304 memory.cpp:227] Updated 'memory.limit_in_bytes' to 64MB for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12641:2018-08-05 22:10:24: I0805 22:10:24.503028 27310 cpu.cpp:121] Updated 'cpu.cfs_period_us' to 100ms and 'cpu.cfs_quota_us' to 20ms (cpus 0.2) for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12650:2018-08-05 22:10:24: W0805 22:10:24.833055 27305 containerizer.cpp:2355] Skipping status for container ed7353bb-58f2-40c2-9cff-509971bf7ae3 because: Container does not exist
      12651:2018-08-05 22:10:24: W0805 22:10:24.833856 27306 containerizer.cpp:2262] Skipping resource statistic for container ed7353bb-58f2-40c2-9cff-509971bf7ae3 because: Unknown container
      12850:2018-08-05 22:11:24: W0805 22:11:24.855736 27310 containerizer.cpp:2355] Skipping status for container ed7353bb-58f2-40c2-9cff-509971bf7ae3 because: Container does not exist
      12851:2018-08-05 22:11:24: W0805 22:11:24.856288 27308 containerizer.cpp:2262] Skipping resource statistic for container ed7353bb-58f2-40c2-9cff-509971bf7ae3 because: Unknown container
      12865:2018-08-05 22:11:29: E0805 22:11:29.797806 27309 slave.cpp:6180] Container 'ed7353bb-58f2-40c2-9cff-509971bf7ae3' for executor 'external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36' of framework 51523a73-d40b-4d53-b38f-5a19874faac6-0001 failed to start: Unexpected termination of the subprocess: time="2018-08-05T22:11:29Z" level=error msg="VolumeDriver.Mount: docker-legacy: Mount: marathon-si-test-vol-d44ef6f275034d28a142b534a668437b: failed: "
      12866:2018-08-05 22:11:29: I0805 22:11:29.797945 27309 containerizer.cpp:2411] Destroying container ed7353bb-58f2-40c2-9cff-509971bf7ae3 in PREPARING state
      12867:2018-08-05 22:11:29: I0805 22:11:29.797971 27309 containerizer.cpp:3025] Transitioning the state of container ed7353bb-58f2-40c2-9cff-509971bf7ae3 from PREPARING to DESTROYING
      12868:2018-08-05 22:11:29: I0805 22:11:29.798318 27311 container_assigner.cpp:101] Unregistering container_id[value: "ed7353bb-58f2-40c2-9cff-509971bf7ae3"].
      12869:2018-08-05 22:11:29: W0805 22:11:29.798382 27433 container_assigner_strategy.cpp:192] No ephemeral-port reader had been assigned to container[value: "ed7353bb-58f2-40c2-9cff-509971bf7ae3"], cannot unregister
      12870:2018-08-05 22:11:29: I0805 22:11:29.798415 27433 container_state_cache_impl.cpp:145] Removing container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/ed7353bb-58f2-40c2-9cff-509971bf7ae3]
      12871:2018-08-05 22:11:29: E0805 22:11:29.798449 27433 container_state_cache_impl.cpp:148] Failed to remove container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/ed7353bb-58f2-40c2-9cff-509971bf7ae3]: No such file or directory
      12872:2018-08-05 22:11:29: W0805 22:11:29.798476 27306 disk.cpp:453] Ignoring cleanup for unknown container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      12873:2018-08-05 22:11:29: I0805 22:11:29.798702 27306 isolator.cpp:600] Unmounting the volume with driver 'rexray' and name 'marathon-si-test-vol-d44ef6f275034d28a142b534a668437b' for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      13039:2018-08-05 22:12:20: I0805 22:12:20.763944 27306 isolator.cpp:644] Removed the checkpoint directory at '/var/lib/mesos/isolators/docker/volume/ed7353bb-58f2-40c2-9cff-509971bf7ae3' for container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      13042:2018-08-05 22:12:20: E0805 22:12:20.787039 27309 slave.cpp:5618] Failed to update resources for container ed7353bb-58f2-40c2-9cff-509971bf7ae3 of executor 'external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36' running task external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36 on status update for terminal task, destroying container: Container not found
      13043:2018-08-05 22:12:20: W0805 22:12:20.787101 27309 composing.cpp:604] Attempted to destroy unknown container ed7353bb-58f2-40c2-9cff-509971bf7ae3
      13051:2018-08-05 22:12:20: I0805 22:12:20.819828 27307 gc.cpp:95] Scheduling '/var/lib/mesos/slave/slaves/51523a73-d40b-4d53-b38f-5a19874faac6-S2/frameworks/51523a73-d40b-4d53-b38f-5a19874faac6-0001/executors/external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36/runs/ed7353bb-58f2-40c2-9cff-509971bf7ae3' for gc 1.99999051426963days in the future
      13053:2018-08-05 22:12:20: I0805 22:12:20.819912 27307 gc.cpp:95] Scheduling '/var/lib/mesos/slave/meta/slaves/51523a73-d40b-4d53-b38f-5a19874faac6-S2/frameworks/51523a73-d40b-4d53-b38f-5a19874faac6-0001/executors/external-volume-mesos-app-b74e2c2a784c4b4cbbde2ba4b1820c6a.59fe52c9-98fc-11e8-a6c9-4ee9a2204d36/runs/ed7353bb-58f2-40c2-9cff-509971bf7ae3' for gc 1.99999051292741days in the future

        Attachments

          Activity

            People

            • Assignee:
              sethmason Seth Mason
              Reporter:
              adukhovniy Aleksey Dukhovniy
              Team:
              Storage Team
              Watchers:
              Aleksey Dukhovniy
            • Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: