Lashup uses a gossip protocol to keep track of nodes that it shares information with.
It does not currently remove nodes that are no longer present.
This may cause issues in the following situation:
Cluster X: Node A, B, C
Cluster Y: Node D, E, F
Remove Node C from Cluster X. Several days later, create a new node in Cluster Y that has the same IP address as Node C.
Nodes A and B still have Node C's IP in their gossip state, so they will keep trying to talk to it. That IP now belongs to a node in Cluster Y, and the two clusters will get bridged.
Proposed fix:
- On every node, at some interval T (possibly randomized), look at Mesos state.
- If there are node IPs that are in the local Lashup gossip state but not in Mesos state, do the following for each one:
  - Mark it as 'absent' (or similar).
  - After some X number of intervals T:
    - Remove it from the local gossip list.
    - Stop talking to it and block inbound connections from it (blacklist it).
    - Stop advertising it to other gossip neighbors.
    - Do not propagate the removal (to prevent inadvertent removal from the other cluster).
  - After some A * X * T period, remove it from the blacklist (to support situations where the node might be added back to the same cluster).
Or something like this.
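
A rough sketch of the bookkeeping this would need on each node, in Python only for illustration (Lashup itself is Erlang). Everything here is hypothetical: the class name `GossipPruner`, its fields, and the `check` / `accepts` / `advertised` methods are made up for this note, and fetching the IP set from Mesos state is not shown. It also assumes "X intervals" means X consecutive checks in which the IP is missing, which the proposal does not actually pin down.

```python
from dataclasses import dataclass, field


@dataclass
class GossipPruner:
    """Hypothetical per-node state for pruning stale gossip members."""
    absent_limit: int        # X: absent checks tolerated before removal
    blacklist_factor: int    # A: blacklist lasts A * X checks
    members: set = field(default_factory=set)      # local gossip list (IPs)
    absent: dict = field(default_factory=dict)     # ip -> consecutive absent checks
    blacklist: dict = field(default_factory=dict)  # ip -> checks left on blacklist

    def check(self, mesos_ips):
        """Run once per interval T with the IPs currently in Mesos state."""
        # Age the blacklist and release entries whose A * X checks are up.
        for ip in list(self.blacklist):
            self.blacklist[ip] -= 1
            if self.blacklist[ip] <= 0:
                del self.blacklist[ip]

        for ip in list(self.members):
            if ip in mesos_ips:
                # Assumption: seeing the IP again clears the 'absent' mark.
                self.absent.pop(ip, None)
                continue
            self.absent[ip] = self.absent.get(ip, 0) + 1
            if self.absent[ip] >= self.absent_limit:
                # Local-only removal: nothing is sent to gossip neighbors.
                self.members.discard(ip)
                del self.absent[ip]
                self.blacklist[ip] = self.absent_limit * self.blacklist_factor

    def accepts(self, ip):
        """Whether an inbound connection from this IP should be allowed."""
        return ip not in self.blacklist

    def advertised(self):
        """IPs this node would still advertise to its neighbors."""
        return set(self.members)


# Cluster X from the scenario above, after Node C has been removed from Mesos.
pruner = GossipPruner(absent_limit=2, blacklist_factor=2, members={"A", "B", "C"})
pruner.check({"A", "B"})   # C is marked absent
pruner.check({"A", "B"})   # C is removed locally and blacklisted
```

Because the removal is never gossiped, each node in Cluster X has to reach the same conclusion from its own view of Mesos state, and nodes in Cluster Y are not affected by it.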