community details pasted here for convenience:


      I noticed that every once in a while I get responses from Marathon that have a status code of 200 but contain no response body, like this:


      http PUT http://marathon-host.local:8080/v2/apps/dummy?force=true @app.json
      HTTP/1.1 200 OK
      Content-Length: 0
      Date: Wed, 23 Aug 2017 16:53:17 GMT
      Server: Jetty(9.3.6.v20151106)

      This seemed to happen only when requests were made with high concurrency. I later noticed that it only happened when the client was not contacting the leader of the master quorum directly but one of the non-leading masters. In the log output of the master that forwarded the request, I found an exception:

      [2017-08-22 14:54:31,953] WARN //marathon-host.local:8080/v2/apps/dummy (org.eclipse.jetty.server.HttpChannel:qtp848644304-275712)
      java.lang.RuntimeException: while proxying
          at mesosphere.marathon.api.LeaderProxyFilter.doFilter(LeaderProxyFilter.scala:117)
          at at at at
          at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
          at org.eclipse.jetty.servlet.ServletHandler.doHandle(
          at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
          at org.eclipse.jetty.servlet.ServletHandler.doScope(
          at org.eclipse.jetty.server.handler.ContextHandler.doScope(
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
          at com.codahale.metrics.jetty9.InstrumentedHandler.handle(
          at org.eclipse.jetty.server.handler.HandlerCollection.handle(
          at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
          at org.eclipse.jetty.server.Server.handle(
          at org.eclipse.jetty.server.HttpChannel.handle(
          at org.eclipse.jetty.server.HttpConnection.onFillable(
          at$ReadCallback.succeeded(
          at
          at$
          at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(
          at
          at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
          at org.eclipse.jetty.util.thread.QueuedThreadPool$
          at
      Caused by: Read timed out
          at Method)
          at at at at at at at at at at at
          at mesosphere.marathon.api.JavaUrlConnectionRequestForwarder.copyConnectionResponse$1(LeaderProxyFilter.scala:245)
          at mesosphere.marathon.api.JavaUrlConnectionRequestForwarder.forward(LeaderProxyFilter.scala:282)
          at mesosphere.marathon.DebugModule$MetricsBehavior$$anonfun$invoke$1.apply(DebugConf.scala:87)
          at mesosphere.marathon.metrics.Metrics.timed(Metrics.scala:28)
          at mesosphere.marathon.DebugModule$MetricsBehavior.invoke(DebugConf.scala:86)
          at mesosphere.marathon.api.LeaderProxyFilter.doFilter(LeaderProxyFilter.scala:114)
          ... 26 common frames omitted

      So there was a timeout during the proxy operation but the result still showed a response with an "Okay" status code.

      Debugging into Marathon, I noticed that the proxy code catches only {{ConnectionException}}s, whereas I would expect every exception to result in a failed response (with different status codes denoting the type of failure).
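
To illustrate the idea of "different status codes denoting the type of failure", here is a minimal Java sketch of one possible mapping. It is not Marathon's code: the class name `ProxyFailureMapper`, its methods, and the choice of 504/502/500 are assumptions made for illustration only. It unwraps the cause chain first, because in the log above the read timeout is wrapped in a `RuntimeException: while proxying`.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical helper, not part of Marathon: maps proxy failures to HTTP status codes. */
final class ProxyFailureMapper {
    private ProxyFailureMapper() {}

    /** Walks the cause chain to the innermost exception (guards against self-referencing causes). */
    static Throwable rootCause(Throwable failure) {
        Throwable current = failure;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Returns the status code a forwarding master could send when proxying to the leader fails. */
    static int statusFor(Throwable failure) {
        Throwable root = rootCause(failure);
        if (root instanceof SocketTimeoutException) {
            return 504; // Gateway Timeout: the leader did not answer in time (assumed mapping)
        }
        if (root instanceof IOException) {
            return 502; // Bad Gateway: connecting to or reading from the leader failed (assumed mapping)
        }
        return 500; // Internal Server Error: anything else that went wrong while proxying
    }
}
```

With such a mapping, the wrapped read timeout from the log would surface to the client as a 504 rather than as an empty 200. Whether these particular codes are the right ones for Marathon is a design decision for the maintainers.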

      This PR is against the releases-1.3 branch, because we still run that minor version and I hope to get a patch release for that minor version.

      The issue is also present in 1.4.x and the current master, so I have also prepared branches for 1.4 and master. If needed, I can also create pull requests for those branches.




            • Assignee:
              ken Ken Sipe
              Orchestration Team
              Ken Sipe, Pranay Kanwar, Timo Reimann
            • Watchers:
              3


              • Created: