DC/OS / DCOS_OSS-3538

dcos-net: dcos_dns_handler crashes if upstreams are not available

    Details

    • Story Points:
      3

      Description

      I saw the following crash on my DC/OS cluster running DC/OS master:

      2018-05-22 03:04:30 =CRASH REPORT====
        crasher:
          initial call: dcos_dns_handler:init/4
          pid: <0.2428.0>
          registered_name: []
          exception error: {function_clause,[{lists,map,[#Fun<dcos_dns_handler.3.14594562>,{{10,0,2,150},61053}],[{file,"lists.erl"},{line,1238}]},{lists,map,2,[{file,"lists.erl"},{line,1239}]},{dcos_dns_handler,resolve,4,[{file,"/pkg/src/dcos-net/_build/prod/lib/dcos_dns/src/dcos_dns_handler.erl"},{line,117}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
          ancestors: [dcos_dns_handler_sj_4,dcos_dns_handler_sj_worker_sup,dcos_dns_handler_sj,sidejob_sup,<0.1202.0>]
          message_queue_len: 0
          messages: []
          links: [<0.1258.0>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 610
          stack_size: 27
          reductions: 464
        neighbours:
        crasher:
          initial call: dcos_dns_handler:-start_worker/3-fun-0-/0
          pid: <0.2429.0>
          registered_name: []
          exception exit: {noproc,[{dcos_dns_handler,udp_worker,3,[{file,"/pkg/src/dcos-net/_build/prod/lib/dcos_dns/src/dcos_dns_handler.erl"},{line,178}]},{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,232}]}]}
          ancestors: [<0.2428.0>,dcos_dns_handler_sj_4,dcos_dns_handler_sj_worker_sup,dcos_dns_handler_sj,sidejob_sup,<0.1202.0>]
          message_queue_len: 0
          messages: []
          links: []
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 610
          stack_size: 27
          reductions: 864
        neighbours:
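
      One possible reading of the first report: the trace shows `lists:map/2` recursing into a call whose second argument is the tuple `{{10,0,2,150},61053}`, which is what happens when the upstream list is an improper list with a tuple as its tail. The sketch below only reproduces that failure mode in isolation. The module name, function name, and the first address are hypothetical, and the exact error reason may differ between OTP releases.

```erlang
-module(map_clause_sketch).
-export([demo/0]).

%% Maps over a proper list of upstreams, then over an improper one.
demo() ->
    Fun = fun({IP, Port}) -> {IP, Port} end,
    A = {{10,0,2,149}, 53},      % hypothetical upstream, not from the report
    B = {{10,0,2,150}, 61053},   % the tuple seen in the crash report
    %% A proper list of {IP, Port} tuples maps without error.
    [A, B] = lists:map(Fun, [A, B]),
    %% An improper list: the tail is a tuple instead of a list.
    Improper = [A | B],
    try lists:map(Fun, Improper) of
        _ -> no_crash
    catch
        %% lists:map/2 has no clause for a tuple tail, so it raises an error.
        error:_ -> crashed
    end.
```

      If this reading is right, `map_clause_sketch:demo()` returning `crashed` mirrors what `dcos_dns_handler:resolve/4` hit at line 117.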
      

      This happens when there is a query for the `.mesos` domain and Mesos-DNS is not available. Looking at the code, the handler appears to exit explicitly in this situation. For UDP:
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L178
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L183
      Similarly, for TCP:
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L204
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L209
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L211
      https://github.com/dcos/dcos-net/blob/master/apps/dcos_dns/src/dcos_dns_handler.erl#L213

      Maybe we could handle this more gracefully: logging a warning instead of calling `exit` would be sufficient.
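
      A minimal sketch of what "warn instead of exit" could look like for a single UDP upstream query. This is not the `dcos_dns_handler` code: the module name, the function `query_upstream/3`, and the helper `warn/2` are hypothetical, and `error_logger` stands in for whatever logger dcos-net actually uses.

```erlang
-module(graceful_worker_sketch).
-export([query_upstream/3]).

%% Sends Packet to one upstream over UDP and waits up to Timeout ms.
%% Returns {ok, Reply} or {error, Reason}; it never exits on failure.
query_upstream(Upstream = {IP, Port}, Packet, Timeout) ->
    case gen_udp:open(0, [binary, {active, false}]) of
        {ok, Socket} ->
            try
                case gen_udp:send(Socket, IP, Port, Packet) of
                    ok ->
                        case gen_udp:recv(Socket, 0, Timeout) of
                            {ok, {_FromIP, _FromPort, Reply}} ->
                                {ok, Reply};
                            {error, Reason} ->
                                warn(Upstream, Reason)
                        end;
                    {error, Reason} ->
                        warn(Upstream, Reason)
                end
            after
                %% Always release the socket, whether or not the query worked.
                gen_udp:close(Socket)
            end;
        {error, Reason} ->
            warn(Upstream, Reason)
    end.

%% Logs a warning and hands the error back to the caller.
warn(Upstream, Reason) ->
    error_logger:warning_msg("DNS upstream ~p failed: ~p~n", [Upstream, Reason]),
    {error, Reason}.
```

      With this shape the caller decides what to do with `{error, Reason}`, for example trying the next upstream or returning SERVFAIL, and an unavailable Mesos-DNS produces a log line rather than a crash report.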

              People

              • Assignee:
                sergeyurbanovich Sergey Urbanovich
                Reporter:
                dgoel Deepak Goel
                Team:
                Networking Team
                Watchers:
                Bily Luo, Deepak Goel, Sergey Urbanovich, skoo87
                Reviewers:
                Deepak Goel
