MARATHON-8064

Marathon 1.4.8 -> 1.5.6 fails with disable_store_cache

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Done
    • Affects Version/s: Marathon 1.4.8, Marathon 1.5.6
    • Fix Version/s: DC/OS 1.11.3, DC/OS 1.10.8
    • Component/s: Persistence
    • Labels:
      None

      Description

      Edit: launching Marathon with --disable_store_cache leads to the error described below.

      Original Text

      I tried to upgrade Marathon from 1.4.8 to the new 1.5.6 release and got the following error:

        [2018-02-05 15:48:27,783] INFO  Started ServerConnector@7c22e142{HTTP/1.1,[http/1.1]}{0.0.0.0:8080} (org.eclipse.jetty.server.ServerConnector:$anon$1 STARTING)
        [2018-02-05 15:48:27,838] INFO  x509=X509@4dcdb371(1,h=[*, marathon],w=[*, *.*, *.*.*]) for SslContextFactory@4570edc5([file:///etc/marathon/conf/marathon.jks,null]) (org.eclipse.jetty.util.ssl.SslContextFactory:$anon$1 STARTING)
        [2018-02-05 15:48:27,886] INFO  Started ServerConnector@86ee37b{SSL,[ssl, http/1.1]}{0.0.0.0:8443} (org.eclipse.jetty.server.ServerConnector:$anon$1 STARTING)
        [2018-02-05 15:48:27,887] INFO  Started @4956ms (org.eclipse.jetty.server.Server:$anon$1 STARTING)
        [2018-02-05 15:48:27,890] INFO  All services up and running. (mesosphere.marathon.MarathonApp:JMX exporting thread)
        [2018-02-05 15:48:28,703] ERROR Fatal error while starting leadership of Some(MarathonSchedulerService [RUNNING]). Exiting now (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-3-thread-1)
        mesosphere.marathon.MigrationFailedException: while migrating storage to major: 1
        minor: 5
        patch: 0
        format: PERSISTENCE_STORE
        at mesosphere.marathon.storage.migration.Migration$$anonfun$applyMigrationSteps$4$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(Migration.scala:97)
        at mesosphere.marathon.storage.migration.Migration$$anonfun$applyMigrationSteps$4$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(Migration.scala:94)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
        at scala.util.Try$.apply(Try.scala:192)
        at scala.util.Failure.recover(Try.scala:216)
        at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
        at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply$mcV$sp(ExecutionContexts.scala:20)
        at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply(ExecutionContexts.scala:20)
        at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply(ExecutionContexts.scala:20)
        at mesosphere.marathon.core.async.package$.propagateContext(package.scala:15)
        at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2.run(ExecutionContexts.scala:20)
        at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
        Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at mesosphere.marathon.core.storage.store.impl.zk.ZkFuture.processResult(ZkFuture.scala:30)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
        at org.apache.curator.framework.imps.DeleteBuilderImpl$2.processResult(DeleteBuilderImpl.java:166)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:634)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
        [2018-02-05 15:48:28,703] INFO  Stopping the election service (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-3-thread-1)
        [2018-02-05 15:48:28,705] INFO  backgroundOperationsLoop exiting (org.apache.curator.framework.imps.CuratorFrameworkImpl:Curator-Framework-0)
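
      The trace above is long, and the useful part is the innermost "Caused by:" entry together with the frames under it. The following is a minimal sketch of how that entry could be pulled out of a saved log. The function name root_cause and the shortened sample text are illustrative assumptions and are not part of Marathon.

```python
import re

# Matches lines such as "Caused by: some.Exception: message".
CAUSED_BY = re.compile(r"Caused by: ([\w.$]+)(?::\s*(.*))?")


def root_cause(log_text):
    """Return (exception, message, frames) for the last 'Caused by:' entry.

    Returns None when the text contains no 'Caused by:' line.
    """
    cause = None
    frames = []
    collecting = False
    for line in log_text.splitlines():
        stripped = line.strip()
        match = CAUSED_BY.match(stripped)
        if match:
            # A deeper cause replaces any cause seen earlier.
            cause = (match.group(1), match.group(2) or "")
            frames = []
            collecting = True
        elif collecting and stripped.startswith("at "):
            frames.append(stripped[3:])
        else:
            # Any other line ends the current block of frames.
            collecting = False
    if cause is None:
        return None
    return cause[0], cause[1], frames


# Shortened excerpt of the trace in this report.
sample = """\
mesosphere.marathon.MigrationFailedException: while migrating storage to major: 1
        at scala.util.Try$.apply(Try.scala:192)
        Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.curator.framework.imps.DeleteBuilderImpl$2.processResult(DeleteBuilderImpl.java:166)
        [2018-02-05 15:48:28,703] INFO  Stopping the election service
"""

exception, message, frames = root_cause(sample)
print(exception, "|", message)
for frame in frames:
    print("  at", frame)
```

      Applied to the excerpt, this reports the NoNodeException and the DeleteBuilderImpl frame beneath it, which suggests the migration failed on a ZooKeeper delete of a node that did not exist.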
      

      /etc/marathon/conf/application.ini content:

      --master=zk://10.57.20.49:2181,10.57.20.51:2181,10.57.20.54:2181/mesos
      --zk=zk://10.57.20.49:2181,10.57.20.51:2181,10.57.20.54:2181/marathon
      
      --local_port_max=65535
      --local_port_min=10000
      
      --failover_timeout=604800
      
      --offer_matching_timeout=5000
      --decline_offer_duration=5000
      
      --max_instances_per_offer=10
      
      --launch_tokens=1000
      --launch_token_refresh_interval=10000
      
      --reconciliation_initial_delay=10000
      --reconciliation_interval=300000
      
      --task_tracker_request_timeout=1800000
      --task_update_request_timeout=600000
      
      --launch_queue_request_timeout=600000
      --task_operation_notification_timeout=600000
      
      --zk_session_timeout=60000
      --zk_timeout=600000
      
      --on_elected_prepare_timeout=300000
      --max_actor_startup_time=120000
      
      
      --max_parallel_status_updates=1000
      --max_queued_status_updates=100000
      
      --max_queued_root_group_updates=1000
      --group_manager_request_timeout=600000
      
      --kill_chunk_size=1000
      --kill_retry_timeout=1000
      
      --leader_proxy_connection_timeout=180000
      --leader_proxy_read_timeout=180000
      
      --scale_apps_initial_delay=100
      --scale_apps_interval=1000
      
      --ssl_keystore_password=marathon
      --ssl_keystore_path=/etc/marathon/conf/marathon.jks
      
      --task_launch_confirm_timeout=60000
      --task_launch_timeout=600000
      
      --default_network_name=bridge
      
      --logging_level=info
      
      --enable_features=vips,task_killing,external_volumes,secrets
      
      --reporter_datadog=udp://127.0.0.1:8125?prefix=marathon&interval=10
      
      --plugin_conf=/etc/marathon/conf/plugin-conf.json
      --plugin_dir=/etc/marathon/plugins
      
      --ha
      --leader_proxy_ssl_ignore_hostname
      --disable_store_cache
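
      Because the failure is tied to --disable_store_cache, it can help to check which flags a given application.ini actually sets before restarting Marathon. The following is a minimal sketch, assuming the file holds one --name or --name=value flag per line as shown above. The function name parse_flags and the shortened sample are illustrative assumptions.

```python
def parse_flags(text):
    """Parse '--name=value' and bare '--name' lines into a dict.

    Bare flags map to True. Blank lines and non-flag lines are skipped.
    """
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = line[2:].partition("=")
        flags[name] = value if sep else True
    return flags


# Shortened excerpt of the application.ini in this report.
sample_ini = """\
--zk=zk://10.57.20.49:2181,10.57.20.51:2181,10.57.20.54:2181/marathon

--zk_session_timeout=60000
--zk_timeout=600000

--reporter_datadog=udp://127.0.0.1:8125?prefix=marathon&interval=10

--ha
--leader_proxy_ssl_ignore_hostname
--disable_store_cache
"""

flags = parse_flags(sample_ini)
if flags.get("disable_store_cache"):
    print("disable_store_cache is set; this report links it to the migration failure")
else:
    print("disable_store_cache is not set")
```

      In practice the text would be read from /etc/marathon/conf/application.ini rather than from an inline string.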
      

      I haven't found a solution yet.

              People

              • Assignee: Ivan Chernetsky (ivanchernetsky)
              • Reporter: Pavel Timofeev (paveltimofeev)
              • Team: Orchestration Team
              • Watchers: Karsten Jeschkies, Matthias Eichstedt, Pavel Timofeev