  1. DC/OS
  2. DCOS_OSS-4606

Tailor Nginx VTS metrics to yield fine-grained HTTP time-series

    Details

      Description

      Without Nginx Plus, we currently report metrics through the Nginx VTS module. With this setup we encountered the following problems:

      1. Dynamically resolved upstreams (Mesos, Marathon) appear under the ::nogroup upstream.
      2. Only the status classes 2xx, 4xx and 5xx are reported instead of exact status codes.
      3. The requesting client and the requested URI are not reported.

      All three can be worked around by using the VTS module's filter_by_set_key functionality in a way it was not intended for.

      First, the Admin Router config needs to be annotated to report custom metrics taken from Nginx variables. An annotation looks like this:

      location /internal/acs/api/v1/ {
          vhost_traffic_status_filter_by_set_key client=$http_user_agent ,upstream=Bouncer,backend=${upstream_addr},status=${status},;
      }
      

      The annotation can be read as follows: when a request is made for the URI /internal/acs/api/v1/, record the value client=$http_user_agent under the key ,upstream=Bouncer,backend=${upstream_addr},status=${status}, where the variables are filled in while the response is crafted, before the key/value set is made available as a Prometheus time series.
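      The variable substitution described above can be imitated in a few lines of Python. This is only a sketch: expand() and filter_labels() are hypothetical helpers, not part of Nginx or the VTS module, and the sketch assumes that an unset variable expands to an empty string and that the trailing comma of the key is kept exactly as written in the annotation.

```python
import re

# Hypothetical stand-in for Nginx variable expansion (not an Nginx or VTS API).
# Matches both the ${name} and the $name spelling used in the annotation.
_VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(template, variables):
    """Replace $name / ${name} with values; unset variables become ''."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, "")
    return _VARIABLE.sub(substitute, template)


# The two arguments of the annotation, copied from the location block.
VALUE_TEMPLATE = "client=$http_user_agent"
KEY_TEMPLATE = ",upstream=Bouncer,backend=${upstream_addr},status=${status},"


def filter_labels(variables):
    """Return the (key, value) pair recorded for one request."""
    return expand(KEY_TEMPLATE, variables), expand(VALUE_TEMPLATE, variables)


if __name__ == "__main__":
    request = {
        "http_user_agent": "Mesos/1.8.0 authorizer (master)",
        "upstream_addr": "",  # assumed: no backend was contacted
        "status": "401",
    }
    print(filter_labels(request))
```

      With an empty upstream_addr the key contains backend= with no value, which is the shape of the key that later shows up in the scraped series.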

      Because we use the VTS module differently from how it is intended to be used, reporting metrics through filter annotations as described above produces badly formatted metrics.
      We'd like to tailor the metrics reported by the VTS module using the regex parsing functionality of the Prometheus config, for example like so:

      Scraped time series:

      nginx_vts_filter_requests_total{
          filter = ",upstream=Bouncer,backend=,status=401"
          filter_name = "client=Mesos/1.8.0 authorizer (master)"
      } 4
      

      Parsed time series:

      nginx_upstream_client_requests_total{
          upstream = "Bouncer"
          backend = ""
          status = "401"
          client = "Mesos/1.8.0 authorizer (master)"
      } 4
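      The mapping from the scraped labels to the parsed labels can be sketched in Python to make the intended regexes concrete. This is an illustration, not the Prometheus configuration itself: parse_labels() is a hypothetical helper, and the label layout is assumed to be exactly the one shown in the sample above, with an optional trailing comma.

```python
import re

# Assumed layout of the "filter" label, taken from the sample in this ticket.
# backend is matched non-greedily because $upstream_addr may itself contain
# commas when several backends were contacted for one request.
FILTER_RE = re.compile(
    r",upstream=(?P<upstream>[^,]*)"
    r",backend=(?P<backend>.*?)"
    r",status=(?P<status>[^,]*),?"
)
FILTER_NAME_RE = re.compile(r"client=(?P<client>.*)")


def parse_labels(filter_label, filter_name_label):
    """Split the two VTS filter labels into separate labels.

    Returns a dict with upstream, backend, status and client, or None
    when either label does not follow the assumed layout.
    """
    key = FILTER_RE.fullmatch(filter_label)
    value = FILTER_NAME_RE.fullmatch(filter_name_label)
    if key is None or value is None:
        return None
    labels = key.groupdict()
    labels["client"] = value.group("client")
    return labels


if __name__ == "__main__":
    print(parse_labels(
        ",upstream=Bouncer,backend=,status=401",
        "client=Mesos/1.8.0 authorizer (master)",
    ))
```

      Series whose labels do not match the layout yield None here, so they can be left untouched rather than being turned into partially filled labels.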
      

      This makes appropriate labels available in Grafana, which simplifies dashboard creation. The outcome of this story is that the upstream, backend, status code and client are parsed for each request sent to Nginx and reported through the VTS module.

      The aforementioned annotation will potentially lead to multiple useless time series being reported, for example when the value is a particular status code. We must take care of that in a follow-up story.

              People

              • Assignee:
                timweidner Tim Weidner
                Reporter:
                timweidner Tim Weidner
                Team:
                Security Team
                Watchers:
                Mergebot, Tim Weidner