Details

    • Sprint:
      Marathon Sprint 1.10-5, Marathon Sprint 1.10-7, Marathon Sprint 1.10-8, Marathon Sprint 1.10-9, Marathon Sprint 1.10-10
    • Customer Issue Status:
      The fix has landed on both the master and releases/1.4 branches. Please re-open if the issue comes back.

      Description

      User ndigati from #shared-marathon:

      Has anyone seen an issue in Marathon v1.4.2 where after a new leader takes over some apps are removed from the Marathon UI and Mesos? I saw this older issue https://jira.mesosphere.com/browse/MARATHON-1773, which is very similar to what we are seeing but it’s not during a migration.
      
      We upgraded from 1.3.10 -> 1.4.1
      
      Ya I can get you logs from when the new leader took over and started removing apps.
      
      Also it’s weird because after we relaunched the removed apps they still have their configuration history (the older versions on the bottom of the configuration tab in the Marathon UI)
      
      The history for the relaunched apps seems complete
      
      Also it seems like this happened twice over the weekend (once on friday and once on saturday) both times with a leader change
      
      Some apps got removed both times, and some were only removed once
      

      Some logs provided:

      INFO  Leader elected (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-4-thread-1)
      INFO  As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$133bd225:pool-4-thread-1)
      INFO  Initiating client connection, connectString=<marathon_1>:2181,<marathon_2>:2181,<marathon_3>:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@608c97f2 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-29)
      INFO  Opening socket connection to server <marathon_3>/xx.xx.xxx.xx:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-29-SendThread(ops120:2181))
      INFO  Socket connection established to <marathon_3>/xx.xx.xxx.xx:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-29-SendThread(ops120:2181))
      INFO  Session establishment complete on server <marathon_3>/xx.xx.xxx.xx:2181, sessionid = 0x35bc6bf3d85008d, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-29-SendThread(ops120:2181))
      INFO  No migration necessary, already at the current version (mesosphere.marathon.storage.migration.Migration:ForkJoinPool-2-worker-71)
      INFO  Session: 0x35bc6bf3d85008d closed (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-71)
      INFO  EventThread shut down (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-29-EventThread)
      INFO  Migration successfully applied for version Version(1, 4, 3, LEGACY) (mesosphere.marathon.storage.migration.Migration:pool-4-thread-1)
      
      ...
      
      ERROR Failed to load /production/app_name:2017-05-18T20:09:28.444Z for group /production (2017-05-18T21:25:45.996Z) (mesosphere.marathon.storage.repository.StoredGroup:ForkJoinPool-2-worker-11)
      

              People

              • Assignee:
                adukhovniy Aleksey Dukhovniy
              • Reporter:
                ivanchernetsky Ivan Chernetsky
              • Team:
                Orchestration Team
              • Watchers:
                Aleksey Dukhovniy, daltonmatos, fengyehong, Ivan Chernetsky, Johannes Unterstein, Kyle Anderson, Marco Monaco, Mateusz Moneta, Matthias Eichstedt, Michał Łowicki, ndigati, Richard Boyer, Rob Johnson, samart, Tim Harper
