  1. DC/OS
  2. DCOS_OSS-4575

Add timeout while trying to recover overlay

    Details

    • Sprint:
      Networking: RI-10 Sprint 38, Networking: RI-10 Sprint 39, Networking: RI-11 Sprint 40, Networking: RI-11 Sprint 41, Networking: RI-12 Sprint 42
    • Story Points:
      13

      Description

      While debugging COPS-4167, it was discovered that the Mesos overlay master has no timeout [1] when it tries to recover the overlay. As a result, the overlay master sometimes hangs at the recovery stage, and manual intervention is required to bring it out of this state. A similar implementation in Mesos, the registrar, does have a timeout [2].

      [1] https://github.com/dcos/dcos-mesos-modules/blob/master/overlay/master.cpp#L1521
      [2] https://github.com/apache/mesos/blob/master/src/master/registrar.cpp#L342
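
      A minimal sketch of the idea, not the actual patch: bound a recovery step with a timeout so the caller can fail fast or retry instead of hanging. It uses only the C++17 standard library rather than the libprocess primitives the real modules use; the function name `recoverWithTimeout` and its parameters are hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <thread>

// Hypothetical helper (not part of dcos-mesos-modules or Mesos).
// Runs `recover` on a background thread and waits at most `timeout`.
// Returns the recovered state, or std::nullopt if the timeout elapsed.
std::optional<std::string> recoverWithTimeout(
    std::function<std::string()> recover,
    std::chrono::milliseconds timeout)
{
  // packaged_task gives a future whose destructor does not block,
  // unlike a future returned by std::async.
  std::packaged_task<std::string()> task(std::move(recover));
  std::future<std::string> result = task.get_future();

  // Detach so that a hung recovery cannot block the caller.
  std::thread(std::move(task)).detach();

  if (result.wait_for(timeout) == std::future_status::ready) {
    return result.get();
  }

  // Timed out: the caller decides whether to retry or abort.
  return std::nullopt;
}
```

      A caller would treat an empty result as a failed recovery and either retry or abort, rather than waiting forever. Note that the detached thread keeps running after a timeout; a production fix would also need to cancel or discard the pending operation.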

            People

            • Assignee:
              sergeyurbanovich Sergey Urbanovich
              Reporter:
              dgoel Deepak Goel
              Team:
              Networking Team
              Watchers:
              Deepak Goel, Sergey Urbanovich

              Dates

              • Created:
                Updated:
                Resolved: