  1. DC/OS
  2. DCOS_OSS-3584

Slave node unable to rejoin the cluster after restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Done
    • Affects Version/s: DC/OS 1.11.0
    • Fix Version/s: None
    • Component/s: dcos-net
    • Labels:
    • Component Version:
    • Sprint:
      Networking Team 1.12 Sprint 7
    • Story Points:
      3

      Description

      Hello,

      I have a production environment built on AWS using the latest DC/OS 1.11 template. After I accidentally restarted a node, the machine was unable to rejoin the cluster. I checked the services on that node and found that some DC/OS services had failed, such as dcos-net.service, and that systemd-timesyncd had not started and was repeatedly logging the following message:
      `systemd-timesyncd.service: Job systemd-timesyncd.service/start failed with result 'dependency'` 
      How can I fix this and prevent it from happening again?

      Regards  

            People

            • Assignee:
              dgoel Deepak Goel
            • Reporter:
              ndases ndases
            • Team:
              Networking Team
            • Watchers:
              Deepak Goel, ndases

              Dates

              • Created:
                Updated:
                Resolved: