DC/OS / DCOS_OSS-3602

L4LB unstable when something is deployed in the cluster

    Details

      Description

      We currently run a cluster of 41 machines plus 3 masters hosted at Hetzner (Germany).

      Most of our web applications talk to each other using VIPs and L4LB addresses. Whenever we deploy something in the cluster, every application that uses L4LB addresses starts to time out.

      They eventually recover, but it can take a minute or two.
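
To put numbers on the timeouts described above, a small probe can open TCP connections to a VIP in a loop while a deployment runs. This is only a minimal sketch: the function names are made up for illustration, and the VIP hostname and port in the usage note are placeholders, not addresses from our cluster.

```python
import socket
import time


def probe_once(host, port, timeout=1.0):
    """Try one TCP connect; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return False, time.monotonic() - start


def probe_loop(host, port, count=10, interval=0.5, timeout=1.0):
    """Probe repeatedly and return the list of (ok, elapsed) results."""
    results = []
    for _ in range(count):
        results.append(probe_once(host, port, timeout))
        time.sleep(interval)
    return results


def summarize(results):
    """Return (successes, failures, worst_elapsed) for a result list."""
    ok = sum(1 for success, _ in results if success)
    worst = max((elapsed for _, elapsed in results), default=0.0)
    return ok, len(results) - ok, worst
```

For example, `summarize(probe_loop("myapp.marathon.l4lb.thisdcos.directory", 80, count=120))` (hypothetical service name) started just before a deployment would cover roughly the one to two minutes during which we see failures.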

       

      net.ipv4.tcp_abort_on_overflow = 0
      net.ipv4.tcp_adv_win_scale = 1
      net.ipv4.tcp_allowed_congestion_control = cubic reno
      net.ipv4.tcp_app_win = 31
      net.ipv4.tcp_autocorking = 1
      net.ipv4.tcp_available_congestion_control = cubic reno
      net.ipv4.tcp_base_mss = 512
      net.ipv4.tcp_challenge_ack_limit = 1000
      net.ipv4.tcp_congestion_control = cubic
      net.ipv4.tcp_dsack = 1
      net.ipv4.tcp_early_retrans = 3
      net.ipv4.tcp_ecn = 2
      net.ipv4.tcp_fack = 1
      net.ipv4.tcp_fastopen = 0
      net.ipv4.tcp_fastopen_key = 00000000-00000000-00000000-00000000
      net.ipv4.tcp_fin_timeout = 60
      net.ipv4.tcp_frto = 2
      net.ipv4.tcp_invalid_ratelimit = 500
      net.ipv4.tcp_keepalive_intvl = 75
      net.ipv4.tcp_keepalive_probes = 9
      net.ipv4.tcp_keepalive_time = 7200
      net.ipv4.tcp_limit_output_bytes = 262144
      net.ipv4.tcp_low_latency = 0
      net.ipv4.tcp_max_orphans = 131072
      net.ipv4.tcp_max_ssthresh = 0
      net.ipv4.tcp_max_syn_backlog = 1024
      net.ipv4.tcp_max_tw_buckets = 131072
      net.ipv4.tcp_mem = 756210    1008283    1512420
      net.ipv4.tcp_min_tso_segs = 2
      net.ipv4.tcp_moderate_rcvbuf = 1
      net.ipv4.tcp_mtu_probing = 0
      net.ipv4.tcp_no_metrics_save = 0
      net.ipv4.tcp_notsent_lowat = -1
      net.ipv4.tcp_orphan_retries = 0
      net.ipv4.tcp_reordering = 3
      net.ipv4.tcp_retrans_collapse = 1
      net.ipv4.tcp_retries1 = 3
      net.ipv4.tcp_retries2 = 15
      net.ipv4.tcp_rfc1337 = 0
      net.ipv4.tcp_rmem = 4096    87380    6291456
      net.ipv4.tcp_sack = 1
      net.ipv4.tcp_slow_start_after_idle = 1
      net.ipv4.tcp_stdurg = 0
      net.ipv4.tcp_syn_retries = 6
      net.ipv4.tcp_synack_retries = 5
      net.ipv4.tcp_syncookies = 1
      net.ipv4.tcp_thin_dupack = 0
      net.ipv4.tcp_thin_linear_timeouts = 0
      net.ipv4.tcp_timestamps = 1
      net.ipv4.tcp_tso_win_divisor = 3
      net.ipv4.tcp_tw_recycle = 0
      net.ipv4.tcp_tw_reuse = 0
      net.ipv4.tcp_window_scaling = 1
      net.ipv4.tcp_wmem = 4096    16384    4194304
      net.ipv4.tcp_workaround_signed_windows = 0
      net.ipv4.vs.secure_tcp = 0
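
In case it helps to compare these settings between nodes, the following sketch parses `sysctl`-style `key = value` output into a dictionary and reports the keys that differ. The helper names are hypothetical, and it assumes one setting per line in the format shown above.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, collapsing runs of whitespace."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Skip blank lines and anything without '='.
        settings[key.strip()] = " ".join(value.split())
    return settings


def diff_sysctl(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Feeding it the saved output of `sysctl -a` from two nodes would show whether any of the TCP or IPVS settings differ between them.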
      

       

       

      During the deploy, we also see the following in the logs:

       

      Jun  5 15:14:03 n08-sx61-r01-ht kernel: IPVS: __ip_vs_del_service: enter
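
To correlate these kernel messages with deployments, the log can be scanned for IPVS lines and counted per minute. The sketch below assumes the classic syslog layout of the line quoted above; the function name is hypothetical and the regular expression has only been checked against that one line.

```python
import re
from collections import Counter

# Matches syslog lines of the form:
#   "<Mon> <day> <HH:MM:SS> <host> kernel: IPVS: <event>: ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+IPVS:\s+(?P<event>\S+?):"
)


def ipvs_events_per_minute(lines):
    """Count IPVS kernel messages keyed by (minute, host, event name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # Not an IPVS kernel message.
        minute = "%s %s %s" % (
            match.group("month"),
            match.group("day"),
            match.group("time")[:5],  # Keep HH:MM, drop the seconds.
        )
        counts[(minute, match.group("host"), match.group("event"))] += 1
    return counts
```

A burst of `__ip_vs_del_service` entries in the same minute as a deployment would support the idea that the services are being removed and re-added while the applications time out.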

       

      Docker version 17.09.0-ce, build afdb6d4

      Kernel 3.10.0-514.10.2.el7.x86_64

       

       


              People

              • Assignee:
                sergeyurbanovich Sergey Urbanovich
                Reporter:
                grillorafael grillorafael
                Team:
                Networking Team
                Watchers:
                Arthur Johnson (Inactive), Deepak Goel, grillorafael, halldorh, mainred, Marco Monaco, Rafael Abreu, Sergey Urbanovich, Shafique Hassan, skoo87, templed