DC/OS / DCOS_OSS-4575

Add timeout while trying to recover overlay

    Details

    • Type: Bug
    • Status: Integrating
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: DC/OS 1.10.9, DC/OS 1.11.8, DC/OS 1.12.0
    • Fix Version/s: None
    • Component/s: networking
    • Labels:
    • Sprint: Networking: RI-10 Sprint 38, Networking: RI-10 Sprint 39, Networking: RI-11 Sprint 40, Networking: RI-11 Sprint 41, Networking: RI-12 Sprint 42
    • Story Points: 13

      Description

      While debugging COPS-4167, we discovered that the Mesos overlay master has no timeout [1] when it tries to recover the overlay. As a result, the overlay master sometimes hangs at the recovery stage, and manual intervention is required to bring it out of this state. A similar implementation in Mesos has a timeout [2].

      [1] https://github.com/dcos/dcos-mesos-modules/blob/master/overlay/master.cpp#L1521
      [2] https://github.com/apache/mesos/blob/master/src/master/registrar.cpp#L342
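
      To illustrate the requested behavior, here is a minimal sketch of bounding a recovery wait with a timeout. It uses only the C++ standard library rather than libprocess, so it is not the overlay module's actual code. `OverlayState` and `recoverWithTimeout` are hypothetical names introduced for this example, and the timeout value is left to the caller.

      ```cpp
      #include <chrono>
      #include <future>
      #include <string>

      // Hypothetical stand-in for the state the overlay master recovers.
      struct OverlayState {
        std::string network;
      };

      // Waits for `recovery` to complete, but no longer than `timeout`.
      // Returns true and fills `out` if recovery finished in time.
      // Returns false on timeout so the caller can fail or retry
      // instead of hanging indefinitely.
      bool recoverWithTimeout(
          std::future<OverlayState>& recovery,
          std::chrono::milliseconds timeout,
          OverlayState* out)
      {
        if (recovery.wait_for(timeout) != std::future_status::ready) {
          return false;  // Recovery did not complete within the timeout.
        }

        *out = recovery.get();
        return true;
      }
      ```

      With this shape, a caller that gets `false` back can log the failure and abort or retry recovery, which is the step that currently needs manual intervention.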

            People

            • Assignee: Sergey Urbanovich (sergeyurbanovich)
            • Reporter: Deepak Goel (dgoel)
            • Team: Networking Team
            • Watchers: Deepak Goel, Sergey Urbanovich
