Details

      Description

      community details pasted here for convenience: https://github.com/mesosphere/marathon/pull/5487

       

      I noticed that every once in a while I get responses from Marathon that have a status code of 200 but contain no response body, like this:

       

      http PUT http://marathon-host.local:8080/v2/apps/dummy?force=true @app.json HTTP/1.1 200 OK Content-Length: 0 Date: Wed, 23 Aug 2017 16:53:17 GMT Server: Jetty(9.3.6.v20151106)

      This seemed to happen only when request were happening with a high concurrency and I later noticed that it only happened when the client was not directly contacting the leader of the master quorum but one of the non-leading masters. While looking at the logs I noticed that in the log output of the master that forwarded the request there was an exception:

      [2017-08-22 14:54:31,953] WARN //marathon-host.local:8080/v2/apps/dummy (org.eclipse.jetty.server.HttpChannel:qtp848644304-275712) java.lang.RuntimeException: while proxying at mesosphere.marathon.api.LeaderProxyFilter.doFilter(LeaderProxyFilter.scala:117) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119) at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:240) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:375) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119) at org.eclipse.jetty.server.Server.handle(Server.java:517) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at mesosphere.marathon.api.JavaUrlConnectionRequestForwarder.copyConnectionResponse$1(LeaderProxyFilter.scala:245) at mesosphere.marathon.api.JavaUrlConnectionRequestForwarder.forward(LeaderProxyFilter.scala:282) at mesosphere.marathon.DebugModule$MetricsBehavior$$anonfun$invoke$1.apply(DebugConf.scala:87) at mesosphere.marathon.metrics.Metrics.timed(Metrics.scala:28) at mesosphere.marathon.DebugModule$MetricsBehavior.invoke(DebugConf.scala:86) at mesosphere.marathon.api.LeaderProxyFilter.doFilter(LeaderProxyFilter.scala:114) ... 26 common frames omitted

      So there was a timeout during the proxy operation but the result still showed a response with an "Okay" status code.

      Debugging into Marathon I noticed that only {{ConnectionException}}s are caught in the proxy code where I would suppose every exception should result in a failed response (with different status codes denoting the type of failure).

      This PR is against the releases-1.3 branch, because we still run that minor version and I hope to get a patch release for that minor version.

      The issue is also present in 1.4.x and the current master, so I have also prepared branches for 1.4and master. If needed I can also create pull-requests for those branches.

        Attachments

          Activity

            People

            • Assignee:
              ken Ken Sipe
              Reporter:
              ken Ken Sipe
              Team:
              Orchestration Team
              Watchers:
              Ken Sipe, Pranay Kanwar, Timo Reimann
            • Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: