Marathon / MARATHON-3445

Cluster of 3, API responses are always empty on 2 of them

    Details

    • Type: Task
    • Status: Resolved
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Hi,

      I have a Mesos + Marathon cluster of 3 nodes, and every /v2/* endpoint returns HTTP 200 with an empty body on 2 of the 3.

      If I bounce the service on the working node, one of the previously blank nodes starts returning data, and the bounced node returns blank API responses when it comes back online.

      I am running v0.8.2. I also tested with the latest master (0.9.x) and [surprisingly] got the same result.

      I just rebuilt the entire cluster from scratch and the rebuild went smoothly, yet I am still hitting this issue.

      Do you have any suggestions or recommendations re: how to triage this?

      curl -v http://mesos-primary1a:8080/v2/info
      * Adding handle: conn: 0x7fd120804000
      * Adding handle: send: 0
      * Adding handle: recv: 0
      * Curl_addHandleToPipeline: length: 1
      * - Conn 0 (0x7fd120804000) send_pipe: 1, recv_pipe: 0
      * About to connect() to mesos-primary1a port 8080 (#0)
      *   Trying 192.168.225.83...
      * Connected to mesos-primary1a (192.168.225.83) port 8080 (#0)
      > GET /v2/info HTTP/1.1
      > User-Agent: curl/7.30.0
      > Host: mesos-primary1a:8080
      > Accept: */*
      >
      < HTTP/1.1 200 OK
      < Content-Length: 0
      * Server Jetty(8.y.z-SNAPSHOT) is not blacklisted
      < Server: Jetty(8.y.z-SNAPSHOT)
      <
      * Connection #0 to host mesos-primary1a left intact
      
      curl -v http://mesos-primary2a:8080/v2/info
      * Adding handle: conn: 0x7f9cb0804000
      * Adding handle: send: 0
      * Adding handle: recv: 0
      * Curl_addHandleToPipeline: length: 1
      * - Conn 0 (0x7f9cb0804000) send_pipe: 1, recv_pipe: 0
      * About to connect() to mesos-primary2a port 8080 (#0)
      *   Trying 192.168.225.95...
      * Connected to mesos-primary2a (192.168.225.95) port 8080 (#0)
      > GET /v2/info HTTP/1.1
      > User-Agent: curl/7.30.0
      > Host: mesos-primary2a:8080
      > Accept: */*
      >
      < HTTP/1.1 200 OK
      < Content-Length: 0
      * Server Jetty(8.y.z-SNAPSHOT) is not blacklisted
      < Server: Jetty(8.y.z-SNAPSHOT)
      <
      * Connection #0 to host mesos-primary2a left intact
      
      curl -v http://mesos-primary3a:8080/v2/info
      * Adding handle: conn: 0x7fc643004000
      * Adding handle: send: 0
      * Adding handle: recv: 0
      * Curl_addHandleToPipeline: length: 1
      * - Conn 0 (0x7fc643004000) send_pipe: 1, recv_pipe: 0
      * About to connect() to mesos-primary3a port 8080 (#0)
      *   Trying 192.168.225.96...
      * Connected to mesos-primary3a (192.168.225.96) port 8080 (#0)
      > GET /v2/info HTTP/1.1
      > User-Agent: curl/7.30.0
      > Host: mesos-primary3a:8080
      > Accept: */*
      >
      < HTTP/1.1 200 OK
      < Cache-Control: no-cache, no-store, must-revalidate
      < Pragma: no-cache
      < Expires: 0
      < Content-Type: application/json
      < Transfer-Encoding: chunked
      * Server Jetty(8.y.z-SNAPSHOT) is not blacklisted
      < Server: Jetty(8.y.z-SNAPSHOT)
      <
      * Connection #0 to host mesos-primary3a left intact
      {  
         "name": "marathon",
         "http_config": {  
            "assets_path": null,
            "http_port": 8080,
            "https_port": 8443
         },
         "frameworkId": "20150617-224933-1625401536-5050-19960-0000",
         "leader": "mesos-primary3a:8080",
         "event_subscriber": {  
            "type": "http_callback",
            "http_endpoints": [  
               "http://mesos-primary1a:8000/v1/eventBus,http://mesos-primary2a:8000/v1/eventBus,http://mesos-primary3a:8000/v1/eventBus"
            ]
         },
         "marathon_config": {  
            "local_port_max": 20000,
            "local_port_min": 10000,
            "hostname": "mesos-primary3a",
            "master": "zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos",
            "reconciliation_interval": 300000,
            "mesos_role": null,
            "task_launch_timeout": 300000,
            "reconciliation_initial_delay": 15000,
            "ha": true,
            "failover_timeout": 604800,
            "checkpoint": true,
            "webui_url": null,
            "executor": "//cmd",
            "marathon_store_timeout": 2000,
            "mesos_user": "root"
         },
         "version": "0.8.2",
         "zookeeper_config": {  
            "zk_path": "/marathon",
            "zk": "zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/marathon",
            "zk_timeout": 10,
            "zk_hosts": "mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181",
            "zk_future_timeout": {  
               "duration": 10
            }
         },
         "elected": true
      }
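
      To compare the three nodes side by side instead of reading each verbose trace, something like the following could be used. It is only a rough sketch: it assumes every node listens on port 8080 as in the output above, and the function names (classify_body, leader_of, check_hosts) are made up for illustration. It reports whether /v2/info came back empty and, when it did not, which host the node names as leader.

```shell
#!/usr/bin/env bash

# Classify a response body: the affected nodes return 0 bytes,
# the working node returns a JSON document.
classify_body() {
  if [ -z "$1" ]; then echo "empty"; else echo "ok"; fi
}

# Pull the "leader" value out of a /v2/info JSON body with sed
# (a crude text match, not a real JSON parser). Prints nothing if absent.
leader_of() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"leader"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Query /v2/info on every host given as an argument and print one line each.
check_hosts() {
  local host body
  for host in "$@"; do
    if ! body="$(curl -s --max-time 5 "http://${host}:8080/v2/info")"; then
      echo "${host}: unreachable"
      continue
    fi
    echo "${host}: $(classify_body "$body") leader=$(leader_of "$body")"
  done
}

# Only touch the network when hosts are supplied on the command line.
if [ "$#" -gt 0 ]; then
  check_hosts "$@"
fi
```

      Saved under a hypothetical name such as check-marathon.sh, it would be run as: ./check-marathon.sh mesos-primary1a mesos-primary2a mesos-primary3a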
      

      Additional note:

      I sure wish there were a basic marathon.log file in /var/log on each of the servers, like Mesos has!

            People

            • Assignee: Unassigned
            • Reporter: GitHub_jaytaylor Jay Taylor (Inactive)
            • Team: Orchestration Team