Marathon / MARATHON-3837

Marathon still sending event callbacks to deleted subscribers

    Details

    • Type: Task
    • Status: Resolved
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None

      Description

      This morning I noticed some stale hosts registered in /v2/eventSubscriptions, so I removed them with DELETE requests:

       sh
      
      # Stale callback URLs to unsubscribe ('http' below is the HTTPie client)
      stale_urls=(
        http://ec2-54-165-125-93.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ec2-54-165-159-191.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ec2-54-165-161-54.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ec2-54-172-158-186.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ec2-54-172-44-67.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ec2-54-86-189-7.compute-1.amazonaws.com:8000/api/marathon/event_callback
        http://ip-172-21-46-66.ec2.internal:8000/api/marathon/event_callback
      )
      
      for URL in "${stale_urls[@]}"; do
        http DELETE "http://my_marathon_frontend/v2/eventSubscriptions?callbackUrl=${URL}"
      done
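      The loop above splices each callback URL into the query string without percent-encoding it. That works for these URLs because ':' and '/' are legal in a query string, but a callback URL containing '&' or '#' would be split or truncated. The same loop could be sketched with curl, whose -G and --data-urlencode options encode the value and append it to the query string. MARATHON and DRY_RUN are illustrative variables for this sketch, not Marathon settings, and the script only prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Sketch: unsubscribe each callback URL passed as an argument.
# MARATHON and DRY_RUN are illustrative variables, not Marathon settings.
MARATHON="${MARATHON:-http://my_marathon_frontend}"

unsubscribe() {
  # -G moves the --data-urlencode payload into the query string,
  # percent-encoding the part after "callbackUrl="; -X DELETE then
  # overrides the GET method that -G would otherwise use.
  set -- curl -sS -X DELETE -G --data-urlencode "callbackUrl=$1" \
    "$MARATHON/v2/eventSubscriptions"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # dry run (the default): show the command only
  else
    "$@"
  fi
}

for url in "$@"; do
  unsubscribe "$url"
done
```

      For example, DRY_RUN=0 sh unsubscribe.sh http://ip-172-21-46-66.ec2.internal:8000/api/marathon/event_callback (the file name unsubscribe.sh is arbitrary).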
      

      Output snippet:

      HTTP/1.1 200 OK
      Connection: keep-alive
      Content-Length: 197
      Content-Type: application/json
      Server: Jetty(8.y.z-SNAPSHOT)
      
      {
          "callbackUrl": "http://ec2-54-165-161-54.compute-1.amazonaws.com:8000/api/marathon/event_callback",
          "clientIp": "172.21.71.160",
          "eventType": "unsubscribe_event",
          "timestamp": "2014-11-10T23:06:00.707Z"
      }
      # etc
      

      After I unsubscribed the stale URLs, they no longer appeared in the list:

       sh
      $ curl http://my_marathon_frontend/v2/eventSubscriptions
      {
        "callbackUrls": [
          "http://ec2-54-173-43-83.compute-1.amazonaws.com:8000/api/marathon/event_callback",
          "http://ec2-54-173-62-16.compute-1.amazonaws.com:8000/api/marathon/event_callback",
          "http://ec2-54-88-19-137.compute-1.amazonaws.com:8000/api/marathon/event_callback"
        ]
      }
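      Rather than reading the listing by eye, each stale URL can be checked against it. A minimal sketch that works on the raw JSON with grep -F, so it needs no JSON parser; still_subscribed is a hypothetical helper name. It prints every URL argument that still appears in a /v2/eventSubscriptions response read from standard input.

```sh
#!/bin/sh
# still_subscribed: hypothetical helper, not part of Marathon.
# Reads a /v2/eventSubscriptions JSON body on stdin and prints each URL
# argument that still appears in it as a quoted JSON string.
still_subscribed() {
  body="$(cat)"
  for url in "$@"; do
    # Match the URL with its surrounding quotes so that a URL which is only
    # a prefix of a longer subscribed URL is not reported by mistake.
    if printf '%s\n' "$body" | grep -qF "\"$url\""; then
      printf '%s\n' "$url"
    fi
  done
}
```

      For example, curl -s http://my_marathon_frontend/v2/eventSubscriptions | still_subscribed http://ip-172-21-46-66.ec2.internal:8000/api/marathon/event_callback prints nothing once that URL is gone. Matching on the quoted string assumes Marathon does not escape the slashes in its JSON output, as in the listing above.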
      

      But in syslog 15 and 30 minutes later, Marathon was still logging failed
      attempts to POST events to the dead hosts:

      Nov 10 23:11:55 ec2-54-173-62-16 marathon[24838]: [WARN] [11/10/2014 23:11:55.334] [marathon-akka.actor.default-dispatcher-10] [akka://marathon/user/IO-HTTP/host-connector-11/1] Connection attempt to ip-172-21-46-66.ec2.internal:8000 failed in response to POST request to /api/marathon/event_callback with no retries left, dispatching error...
      Nov 10 23:11:55 ec2-54-173-62-16 marathon[24838]: [WARN] [11/10/2014 23:11:55.334] [marathon-akka.actor.default-dispatcher-12] [akka://marathon/user/IO-HTTP/host-connector-11/0] Connection attempt to ip-172-21-46-66.ec2.internal:8000 failed in response to POST request to /api/marathon/event_callback with no retries left, dispatching error...
      Nov 10 23:11:55 ec2-54-173-62-16 marathon[24838]: [WARN] [11/10/2014 23:11:55.334] [marathon-akka.actor.default-dispatcher-8] [akka://marathon/user/IO-HTTP/host-connector-11/2] Connection attempt to ip-172-21-46-66.ec2.internal:8000 failed in response to POST request to /api/marathon/event_callback with no retries left, dispatching error...
      Nov 10 23:11:55 ec2-54-173-62-16 marathon[24838]: [WARN] [11/10/2014 23:11:55.335] [marathon-akka.actor.default-dispatcher-8] [akka://marathon/user/IO-HTTP/host-connector-11/3] Connection attempt to ip-172-21-46-66.ec2.internal:8000 failed in response to POST request to /api/marathon/event_callback with no retries left, dispatching error...
      
      # <snip>
      
      Nov 10 23:28:11 ec2-54-173-62-16 marathon[24838]: [WARN] [11/10/2014 23:28:11.455] [marathon-akka.actor.default-dispatcher-10] [akka://marathon/user/IO-HTTP/host-connector-12/3] Connection attempt to ec2-54-86-189-7.compute-1.amazonaws.com:8000 failed in response to POST request to /api/marathon/event_callback with 5 retries left, retrying...
      
      # etc.
      

      I'm running Marathon 0.7.3 in HA mode behind an HAProxy instance, so
      consecutive HTTP requests may reach different Marathon hosts. The non-master
      instances do appear to proxy those requests to the master successfully.

      Marathon commandline:

       sh
      java -Xmx512m \
        -Djava.library.path=/usr/local/lib \
        '-Djava.util.logging.SimpleFormatter.format=%2$s%5$s%6$s%n' \
        -cp /usr/local/sbin/marathon mesosphere.marathon.Main \
        --event_subscriber http_callback \
        --http_port 8080 \
        --checkpoint \
        --task_launch_timeout 180000 \
        --zk zk://<zookeeper_ips:2181>/marathon \
        --master zk://<zookeeper_ips:2181>/mesos
      

      After I restarted the master Marathon instance, one of the standby instances
      became master, and the stale callbacks appear to have stopped.
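      Before restarting, it helps to confirm which instance is currently master, since requests through HAProxy can land on any instance. Later Marathon releases expose GET /v2/leader, which returns JSON such as {"leader": "host:port"}; whether 0.7.3 already has this endpoint is an assumption to check against the REST API documentation for that version. A minimal sketch of extracting the value with sed; leader_of is a hypothetical helper name.

```sh
#!/bin/sh
# leader_of: hypothetical helper, not part of Marathon.
# Reads a JSON body such as {"leader": "host:8080"} on stdin and prints the
# HOST:PORT value; prints nothing if no "leader" field is present.
leader_of() {
  sed -n 's/.*"leader"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

      For example: curl -s http://my_marathon_frontend/v2/leader | leader_of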

            People

            • Assignee: Unassigned
            • Reporter: Benjamin Staffin (Inactive)
            • Team: Orchestration Team