Affects Version/s: DC/OS 1.11.1
Sprint: Networking Team 1.12 Sprint 3, Networking Team 1.12 Sprint 4
I run DC/OS on top of CoreOS (stable 1688.5.3).
The last upgrade from 1.11.0 to 1.11.1 failed.
The dcos-mesos-slave service fails to start.
It cannot find a suitable spartan interface:
I0427 15:18:10.000000 6213 io_runner_impl.cpp:42] - found 'spartan', but family is '10' (!= '2')
I0427 15:18:10.000000 6213 io_runner_impl.cpp:34] - found 'vtep1024' (!= 'spartan')
F0427 15:18:10.000000 6213 io_runner_impl.cpp:59] Interface named 'spartan' was not found, see list above. Check configuration of 'listen_interface'.
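For reference, the family numbers in the log above look like Linux socket address families, where 2 is AF_INET (IPv4) and 10 is AF_INET6 (IPv6). Below is a minimal sketch that decodes such a line. The regular expression, the `explain` helper and the `LINUX_FAMILIES` table are my own illustrative names, not part of DC/OS or Mesos, and the mapping assumes a Linux host.

```python
import re

# Linux address-family numbers (assumption: Linux host; other OSes use other values).
LINUX_FAMILIES = {2: "AF_INET (IPv4)", 10: "AF_INET6 (IPv6)"}

# Matches lines of the form: found '<iface>', but family is '<n>' (!= '<m>')
FAMILY_LINE = re.compile(
    r"found '(?P<iface>[^']+)', but family is '(?P<got>\d+)' \(!= '(?P<want>\d+)'\)"
)


def explain(line):
    """Return a readable summary of a family-mismatch log line, or None."""
    match = FAMILY_LINE.search(line)
    if match is None:
        return None
    got = LINUX_FAMILIES.get(int(match.group("got")), match.group("got"))
    want = LINUX_FAMILIES.get(int(match.group("want")), match.group("want"))
    return f"{match.group('iface')}: has {got}, agent wants {want}"


log_line = (
    "I0427 15:18:10.000000 6213 io_runner_impl.cpp:42] - "
    "found 'spartan', but family is '10' (!= '2')"
)
print(explain(log_line))
```

Read this way, the agent did see the spartan interface, but only with an IPv6 address, while it was looking for an IPv4 one.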
Indeed, the spartan interface has no IPv4 address when I run ip addr.
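The same check can be scripted against the one-line output of `ip -o addr show`, where the second field is the interface name and the third is the family (`inet` or `inet6`). This is only a sketch: the helper names are mine, and `SAMPLE` is a made-up illustration of the output shape, not the real output from my node.

```python
def families_by_interface(ip_output):
    """Map each interface name to the set of address families it carries."""
    result = {}
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        iface, family = fields[1], fields[2]
        result.setdefault(iface, set()).add(family)
    return result


def has_ipv4(ip_output, iface):
    """True if the interface has at least one 'inet' (IPv4) address."""
    return "inet" in families_by_interface(ip_output).get(iface, set())


# Hypothetical sample in the shape of `ip -o addr show` output.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
1: lo    inet6 ::1/128 scope host
5: spartan    inet6 fe80::1/64 scope link
"""

print(has_ipv4(SAMPLE, "lo"))
print(has_ipv4(SAMPLE, "spartan"))
```

On a real node the input could be captured with `subprocess.run(["ip", "-o", "addr", "show"], capture_output=True, text=True).stdout`.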
dcos-net started but then received a kill signal:
systemctl status dcos-net
● dcos-net.service - DC/OS Net: A distributed systems & network overlay orchestration engine
Loaded: loaded (/etc/systemd/system/dcos-net.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: signal) since Fri 2018-04-27 15:22:59 UTC; 2s ago
Process: 6621 ExecStart=/opt/mesosphere/bin/dcos-net-env foreground (code=killed, signal=KILL)
Process: 6615 ExecStartPre=/opt/mesosphere/bin/bootstrap dcos-net (code=exited, status=0/SUCCESS)
I don't know where to find more detailed logs to troubleshoot this failure.
I tested the following scenario.
After losing 2 nodes, I stopped the migration and tried the following:
1) I installed a new 1.11.0 node with my configuration.
=> It joined the DC/OS cluster without any problem.
2) I upgraded to 1.11.1 (my 3 master nodes are already on 1.11.1) with the same configuration.
=> The Mesos slave failed to start.
Any help would be appreciated.