  DC/OS / DCOS_OSS-1591

Long-running connections to VIPs time out and cause service failures.


    • Type: Task
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: DC/OS 1.10.2
    • Fix Version/s: None
    • Component/s: networking


      In my setup, a set of Java services connects to pgpool through a VIP.
      By default, the VIPs time out idle connections, and this caused the services to fail.

      Anyone running microservices on DC/OS that reach their databases through VIPs will most likely hit this failure.

      This behavior is mentioned in the FAQ at the bottom of https://dcos.io/docs/1.9/networking/load-balancing-vips/virtual-ip-addresses/, but the FAQ does not discuss in detail how to configure the system to control or mitigate it.

      I found the following adjustments to be necessary to avoid the problem:
      a. Enable the JDBC driver's TCP keepalive setting.
      b. Lower the VM default `net.ipv4.tcp_keepalive_time` from `7200` seconds to `3600` seconds.
      c. Lower the HikariCP maximum connection lifetime from 30 minutes to 4 minutes.
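      A minimal sketch of adjustments (a) and (c), assuming the services use the PostgreSQL JDBC driver (pgpool speaks the PostgreSQL protocol) and build their HikariCP pool from a `java.util.Properties` object. `tcpKeepAlive` is a PgJDBC connection parameter, and HikariCP's `maxLifetime` is expressed in milliseconds. The VIP host name, port, database name `appdb`, and class name are hypothetical placeholders, not values from this setup.

```java
import java.time.Duration;
import java.util.Properties;

public class PoolSettings {
    // Hypothetical named-VIP address for pgpool; substitute the real VIP.
    static final String VIP_HOST = "pgpool.marathon.l4lb.thisdcos.directory";
    static final int VIP_PORT = 5432;

    // (c) Retire pooled connections after 4 minutes instead of HikariCP's 30-minute default.
    static final Duration MAX_LIFETIME = Duration.ofMinutes(4);

    static Properties hikariProperties() {
        Properties props = new Properties();
        // (a) tcpKeepAlive=true makes the PostgreSQL JDBC driver enable SO_KEEPALIVE
        // on its socket, so the kernel sends keepalive probes on idle connections.
        props.setProperty("jdbcUrl",
            "jdbc:postgresql://" + VIP_HOST + ":" + VIP_PORT + "/appdb?tcpKeepAlive=true");
        // HikariCP reads maxLifetime in milliseconds.
        props.setProperty("maxLifetime", Long.toString(MAX_LIFETIME.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        Properties props = hikariProperties();
        System.out.println(props.getProperty("jdbcUrl"));
        System.out.println(props.getProperty("maxLifetime")); // prints 240000
    }
}
```

      In the service itself, these properties would be handed to HikariCP, for example `new HikariDataSource(new HikariConfig(hikariProperties()))`. Note that `tcpKeepAlive` only switches keepalive on; how soon the first probe is sent is still governed by the kernel's `net.ipv4.tcp_keepalive_time`, which is why adjustment (b) is needed alongside it.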

      A FAQ documentation update is probably sufficient, but I'm not familiar enough with the details of Minuteman to write it completely.

      This issue may also affect the DC/OS installer, which may need to set appropriate TCP keepalive values on the nodes.
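      Whether the installer or an operator applies it, adjustment (b) has to hold on every node that runs these services. A hedged sketch of a per-node check, assuming Linux (which exposes the setting at `/proc/sys/net/ipv4/tcp_keepalive_time`) and Java 11 or later; the class name is hypothetical. The value itself would normally be set with `sysctl -w net.ipv4.tcp_keepalive_time=3600` and persisted in a file under `/etc/sysctl.d/`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

public class KeepaliveCheck {
    // Linux exposes the idle time (seconds) before the first keepalive probe here.
    static final Path KEEPALIVE_TIME = Path.of("/proc/sys/net/ipv4/tcp_keepalive_time");
    // Value used in this issue; the kernel default is 7200 seconds.
    static final long TARGET_SECONDS = 3600;

    // Parses the single integer the kernel writes to the file.
    static long parseSeconds(String raw) {
        return Long.parseLong(raw.trim());
    }

    static boolean meetsTarget(long seconds) {
        return seconds <= TARGET_SECONDS;
    }

    // Returns empty when the file cannot be read or parsed (e.g. not running on Linux).
    static OptionalLong readKeepaliveTime() {
        try {
            return OptionalLong.of(parseSeconds(Files.readString(KEEPALIVE_TIME)));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong current = readKeepaliveTime();
        if (current.isEmpty()) {
            System.out.println("tcp_keepalive_time is not readable on this host");
        } else if (meetsTarget(current.getAsLong())) {
            System.out.println("ok: tcp_keepalive_time = " + current.getAsLong() + " s");
        } else {
            System.out.println("too high: tcp_keepalive_time = " + current.getAsLong()
                + " s; lower net.ipv4.tcp_keepalive_time to " + TARGET_SECONDS + " s");
        }
    }
}
```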




            • Assignee:
              dgoel Deepak Goel
              jzampieron@zproject.net Jeffrey Zampieron
              Networking Team
              Deepak Goel, deric, Jeffrey Zampieron, Shafique Hassan
            • Watchers:
              4


              • Created: