In my setup, a set of Java services connects to pgpool through a VIP.
VIPs time out idle connections by default, and this caused the services to fail.
Anyone running microservices on DC/OS with database connections that go through VIPs will most likely hit the same failure.
The FAQ at the bottom of https://dcos.io/docs/1.9/networking/load-balancing-vips/virtual-ip-addresses/ documents this behavior, but it does not discuss in detail the configuration needed to control or mitigate it.
I found the following adjustments to be necessary to avoid the problem:
a. Enable the JDBC driver's TCP keepalive setting.
b. Lower the VM default `net.ipv4.tcp_keepalive_time` from `7200` seconds to `3600` seconds.
c. Lower the HikariCP maximum connection lifetime (`maxLifetime`) from its default of 30 minutes to 4 minutes.
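
For anyone who wants to see what (a) and (c) might look like in code, here is a minimal sketch. It assumes the PostgreSQL JDBC driver (since the backend is pgpool), whose `tcpKeepAlive` connection parameter can be set in the URL, and it uses HikariCP's `jdbcUrl` and `maxLifetime` property names. The class name, method names, host, port, and database name are all hypothetical placeholders, not taken from my actual services. Note that `tcpKeepAlive` only turns keepalive on for the socket; the interval before the first probe still comes from the kernel's `net.ipv4.tcp_keepalive_time`, which is why (b) is a separate step on the VM.

```java
import java.util.Properties;

// Hypothetical helper: the class and method names are illustrative only.
final class PoolSettings {
    // 4 minutes expressed in milliseconds, the unit HikariCP uses for maxLifetime.
    static final long MAX_LIFETIME_MS = 4 * 60 * 1000L;

    // Builds a PostgreSQL JDBC URL with the driver's tcpKeepAlive parameter enabled (item a).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database + "?tcpKeepAlive=true";
    }

    // Collects the pool settings under HikariCP's property names (item c).
    static Properties hikariProperties(String host, int port, String database) {
        Properties props = new Properties();
        props.setProperty("jdbcUrl", jdbcUrl(host, port, database));
        props.setProperty("maxLifetime", Long.toString(MAX_LIFETIME_MS));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder VIP host, port, and database name (assumptions for illustration).
        Properties props = hikariProperties("pgpool-vip.example", 5432, "appdb");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With HikariCP on the classpath, these properties could then be handed to `new HikariConfig(props)` and the result to `new HikariDataSource(config)`. The sketch stops short of that so it runs with the JDK alone.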
A FAQ documentation update is probably sufficient, but I'm not familiar enough with the details of minuteman to do this completely.
This issue may also affect the DC/OS installer, which may need to set appropriate TCP keepalive settings.