  1. DC/OS
  2. DCOS_OSS-2362

Lashup: Gradually remove unrelated nodes.




      Lashup uses a gossip protocol to keep track of nodes that it shares information with.

      It does not currently remove nodes that are no longer present.

      This may cause issues in the following situation:

      Cluster X: Node A, B, C

      Cluster Y: Node D, E, F

      Node C is removed from Cluster X. Several days later, a new node is created in Cluster Y with the same IP address that Node C had.

      Nodes A and B still have Node C's IP address in their gossip state, so they will keep trying to talk to whichever node holds that IP, and the two clusters will get bridged.
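To make the failure mode concrete, here is a minimal sketch of the scenario. It is a toy model, not Lashup code: the IP addresses, the `reachable` helper, and the idea of a per-node peer list keyed by IP are all assumptions made for illustration.

```python
from collections import deque

def reachable(start_ip, peers_by_ip):
    """Return the IPs reachable from start_ip by following gossip peer lists."""
    seen = {start_ip}
    queue = deque([start_ip])
    while queue:
        ip = queue.popleft()
        for peer in peers_by_ip.get(ip, ()):
            # A peer can only be contacted if a live node currently holds that IP.
            if peer in peers_by_ip and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Hypothetical addresses for the nodes in the example.
A, B, C = "10.0.0.1", "10.0.0.2", "10.0.0.3"
D, E, F = "10.0.1.1", "10.0.1.2", "10.0.1.3"

# Node C has been removed, but A and B still list its IP as a peer.
after_removal = {
    A: {B, C}, B: {A, C},
    D: {E, F}, E: {D, F}, F: {D, E},
}

# Days later, a new node in Cluster Y comes up with C's old IP.
after_reuse = {**after_removal, C: {D, E, F}}

print(sorted(reachable(A, after_removal)))  # only A and B
print(sorted(reachable(A, after_reuse)))    # all six IPs: the clusters are bridged
```

The stale entry is harmless while nothing answers on C's IP. The clusters only get bridged once a live node in Cluster Y takes that address.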


      Proposed solution:

      • On every node, at some (random?) interval T, look at Mesos state.
      • For each node IP that is in the local lashup gossip state but not in Mesos state, perform the following:
        • Mark it as 'absent' (or similar)
        • After some number X of intervals T, perform the following:
          • Remove it from the local gossip list
          • Stop talking with it: add it to a blacklist and block inbound connections from it
          • Stop advertising it to other gossip neighbors
          • Do not propagate the removal (to prevent inadvertent removal from the other cluster)
        • After a further period of A * X * T, remove it from the blacklist (to support situations where the node might be added back to the same cluster).
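The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Lashup does or would implement it: the class `GossipPruner`, its method names, and the `mesos_ips` argument (the set of node IPs read from Mesos state by the caller) are all hypothetical. It also assumes the X intervals must be consecutive, and it measures the blacklist period in intervals (A * X ticks). Random jitter on T is not modelled.

```python
class GossipPruner:
    """Local-only bookkeeping for pruning nodes that Mesos no longer knows."""

    def __init__(self, absent_ticks_before_removal, blacklist_ticks):
        self.absent_ticks_before_removal = absent_ticks_before_removal  # X
        self.blacklist_ticks = blacklist_ticks                          # A * X
        self.members = set()     # IPs in the local gossip list
        self.absent_ticks = {}   # ip -> consecutive intervals marked 'absent'
        self.blacklist = {}      # ip -> intervals left on the blacklist

    def tick(self, mesos_ips):
        """Run once per interval T with the IPs currently in Mesos state."""
        # Age out blacklist entries so a node can rejoin the same cluster later.
        for ip in list(self.blacklist):
            self.blacklist[ip] -= 1
            if self.blacklist[ip] <= 0:
                del self.blacklist[ip]

        for ip in list(self.members):
            if ip in mesos_ips:
                # Seen again in Mesos state: clear the 'absent' mark.
                self.absent_ticks.pop(ip, None)
                continue
            self.absent_ticks[ip] = self.absent_ticks.get(ip, 0) + 1
            if self.absent_ticks[ip] >= self.absent_ticks_before_removal:
                # Remove locally and blacklist. Nothing is sent to neighbours,
                # so the removal itself is never propagated.
                self.members.discard(ip)
                del self.absent_ticks[ip]
                self.blacklist[ip] = self.blacklist_ticks

    def advertised(self):
        """IPs this node would still advertise to its gossip neighbours."""
        return set(self.members)

    def accepts(self, ip):
        """Whether an inbound connection from ip would be allowed."""
        return ip not in self.blacklist


pruner = GossipPruner(absent_ticks_before_removal=2, blacklist_ticks=3)
pruner.members = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
for _ in range(2):
    pruner.tick(mesos_ips={"10.0.0.1", "10.0.0.2"})
print(sorted(pruner.advertised()))  # 10.0.0.3 is no longer advertised
print(pruner.accepts("10.0.0.3"))   # False while it is blacklisted
```

Because all state is local to one node, each node in the cluster would reach the same decision independently from its own view of Mesos state, rather than being told about the removal by a peer.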

      Or something like this.






              • Assignee:
                dgoel Deepak Goel
                justinlee Justin Lee (Inactive)
                Networking Team
                cbuben, Deepak Goel, Jan Repnak, Justin Lee (Inactive), mimmus, Sergey Urbanovich
              • Watchers:
                6


                • Created: