Marathon / MARATHON-2725

Possible race condition on simultaneous application launch

    Details

    • Type: Task
    • Status: Resolved
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:

      Description

      Hi Folks!

      I'm running into a problem that appears at first glance to be a race condition on application launch.

      If I launch multiple (in my case, 4) apps at the same time by POSTing to /v2/apps, I notice two things:
      1. All four are assigned the same (formerly available) port (e.g., port 10000).
      2. I can only delete one of them via the UI or the REST API; the others have to be scaled down to zero before I can successfully delete them.
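
      A rough sketch of how the simultaneous launch could be reproduced and the first symptom checked. This is not the launcher actually used here: MARATHON_URL, post_app, launch_simultaneously, and shared_ports are hypothetical names, and the code assumes each app definition carries an "id" and a "ports" list.

```python
import json
import threading
import urllib.request
from collections import defaultdict

MARATHON_URL = "http://localhost:8080"  # assumption: adjust to your Marathon host


def post_app(app, base_url=MARATHON_URL):
    """POST one app definition to /v2/apps and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def launch_simultaneously(apps, post=post_app):
    """Fire one POST per app from separate threads, released at the same moment."""
    if not apps:
        return []
    barrier = threading.Barrier(len(apps))
    results = [None] * len(apps)

    def worker(i, app):
        barrier.wait()  # line all threads up so the requests overlap
        results[i] = post(app)

    threads = [
        threading.Thread(target=worker, args=(i, app))
        for i, app in enumerate(apps)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def shared_ports(apps):
    """Map each port claimed by more than one app to the ids of those apps."""
    by_port = defaultdict(list)
    for app in apps:
        # Assumption: each app definition exposes its ports under "ports".
        for port in app.get("ports", []):
            by_port[port].append(app["id"])
    return {port: ids for port, ids in by_port.items() if len(ids) > 1}
```

      Passing the app definitions that Marathon reports after the launch to shared_ports should return an empty dict on a healthy setup; a non-empty result such as a single port mapped to all four app ids would match symptom 1.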

      The outcome of the first symptom is sort of humorous, actually: the haproxy bridge configures port 10000 (in this example) against all four worker tasks, so as I refresh, the proxy switches between the different apps. For now, I'm calling this RAaaS (Random Application as a Service).

      I modified my launcher to serialize app creation, with a 500 ms delay between requests. With that change, the symptoms go away.
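
      The workaround amounts to something like the sketch below. It is illustrative only: launch_serially is a hypothetical name, and post stands for whatever function sends a single app definition to /v2/apps.

```python
import time


def launch_serially(apps, post, delay_s=0.5, sleep=time.sleep):
    """POST apps one at a time, pausing delay_s seconds between requests."""
    results = []
    for i, app in enumerate(apps):
        if i > 0:
            sleep(delay_s)  # pause only between requests, not before the first
        results.append(post(app))
    return results
```

      The delay only makes overlapping requests unlikely; it masks the symptom rather than fixing whatever causes the port collision.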

      I know I haven't given any actionable details (logs, payloads, etc.), but I wanted to throw this out and see what folks think. If it seems like it may be an issue in Marathon and/or Mesos, let me know and I'll try to come up with a bash script or something to reproduce it.

      I'm running Marathon 0.7.6 against Mesos 0.21.1, both as built by Mesosphere.

      Many thanks,
      --cb

            People

            • Assignee: GitHub_kolloch Peter Kolloch (Inactive)
            • Reporter: cgbaker Chris Baker
            • Team: Orchestration Team
