Marathon / MARATHON-8015

Validation error during the v1.5.0 persistence store migration for applications configured with args

    Details

      Description

      We recently tried to upgrade our Marathon (v1.4.3 plus two custom patches) to a vanilla v1.5.1 and ran into a persistence_store migration error (in the v1.5.0 migration step). While narrowing it down, testing many vanilla versions between v1.4.3 and v1.5.5, I found the following:

      • the migration fails on app definitions that have args but neither cmd nor container
      • according to the reported error (AppDefinition must either contain one of 'cmd' or 'args', and/or a 'container') and the dump of the app object, the first element of args somehow ends up in cmd, which triggers the validation error
      • I was able to reproduce this with a simple app definition containing only args, migrating from any v1.4.x to any v1.5.x
      • the v1.4.6 persistence_store migration always succeeds
      • no version up to and including v1.5.5 fixes this issue
      • the same app definition is accepted without error when submitted directly to a v1.5.x Marathon, so only the persistence_store migration step seems affected
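      The pattern in the first bullet (args set, but neither cmd nor container) can be checked against existing app definitions before upgrading. A minimal sketch, assuming a hypothetical StoredApp model filled in from the v1.4.x deployment; it does not call any Marathon API and StoredApp is not a Marathon type.

```scala
// Hypothetical, simplified model of a stored app definition (NOT a Marathon type).
final case class StoredApp(id: String, cmd: Option[String], args: Seq[String], hasContainer: Boolean)

// Returns the ids of apps matching the pattern reported to fail the
// v1.5.0 persistence store migration: args present, but no cmd and no container.
def atRiskForMigration(apps: Seq[StoredApp]): Seq[String] =
  apps.collect {
    case app if app.args.nonEmpty && app.cmd.isEmpty && !app.hasContainer => app.id
  }

val apps = Seq(
  StoredApp("/test-bug", None, Vector("sleep", "3600"), hasContainer = false),
  StoredApp("/with-cmd", Some("sleep 3600"), Vector.empty, hasContainer = false),
  StoredApp("/docker-args", None, Vector("--verbose"), hasContainer = true)
)

println(atRiskForMigration(apps)) // List(/test-bug)
```

      Only /test-bug matches; the app with cmd and the app with a container fall outside the reported pattern.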

      Here is an example app that triggers the validation error:

      {
        "id": "/test-bug",
        "args": ["sleep", "3600"],
        "cpus": 1,
        "disk": 0,
        "mem": 128,
        "instances": 0
      }

      Because the error occurs during migration, the app only needs to exist in the stored configuration, hence "instances": 0.

      Here is the log of the migration error (the validation error and the dump of the app state are at the end):

      [2018-01-12 15:08:48,414] ERROR Fatal error while starting leadership of Some(MarathonSchedulerService [RUNNING]). Exiting now (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-3-thread-1)
      mesosphere.marathon.MigrationFailedException: while migrating storage to major: 1
      minor: 5
      patch: 0
      format: PERSISTENCE_STORE
      
      at mesosphere.marathon.storage.migration.Migration$$anonfun$applyMigrationSteps$4$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(Migration.scala:90)
      at mesosphere.marathon.storage.migration.Migration$$anonfun$applyMigrationSteps$4$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(Migration.scala:88)
      at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
      at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
      at scala.util.Try$.apply(Try.scala:192)
      at scala.util.Failure.recover(Try.scala:216)
      at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
      at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
      at scala.concurrent.impl.CallbackRunnable.run_aroundBody0(Promise.scala:36)
      at scala.concurrent.impl.CallbackRunnable$AjcClosure1.run(Promise.scala:1)
      at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
      at kamon.scala.instrumentation.FutureInstrumentation$$anonfun$aroundExecution$1.apply(FutureInstrumentation.scala:45)
      at kamon.trace.Tracer$.withContext(TracerModule.scala:58)
      at kamon.scala.instrumentation.FutureInstrumentation.aroundExecution(FutureInstrumentation.scala:44)
      at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:35)
      at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply$mcV$sp(ExecutionContexts.scala:20)
      at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply(ExecutionContexts.scala:20)
      at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2$$anonfun$run$1.apply(ExecutionContexts.scala:20)
      at mesosphere.marathon.core.async.package$.propagateContext(package.scala:15)
      at mesosphere.marathon.core.async.ContextPropagatingExecutionContext$$anon$1$$anon$2.run(ExecutionContexts.scala:20)
      at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: mesosphere.marathon.ValidationFailedException: Validation failed: Failure(Set(RuleViolation(App(/test-bug,None,Vector(sleep, 3600),1.15,1,Some(sleep),Set(),None,1.0,Set(),0.0,Map(),,List(),Set(),0,Map(),3600,128.0,0,None,List(),None,Some(Vector(PortDefinition(10000,Map(),Some(default),tcp))),List(),None,false,Map(),None,Some(UpgradeStrategy(1.0,1.0)),None,None,Some(2018-01-12T15:04:25.697Z),Some(VersionInfo(2018-01-12T15:04:25.697Z,2018-01-12T15:04:25.697Z)),YOUNGEST_FIRST,Some(UnreachableEnabled(300,600)),None),AppDefinition must either contain one of 'cmd' or 'args', and/or a 'container'.,Some(value))))
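      The message at the end of the log names the rule that fails. The sketch below assumes the rule means "exactly one of cmd or args, or a container, and never both cmd and args"; AppSpec and CmdArgsRule are hypothetical names, not Marathon's validator. Under that assumption, the example app as submitted passes, while the state the reporter reads from the dump (the first element of args also present in cmd) fails.

```scala
// Hypothetical, simplified model of the fields the validation message mentions.
// This is NOT Marathon's AppDefinition; container is just a stand-in value.
final case class AppSpec(id: String, cmd: Option[String], args: Seq[String], container: Option[String])

object CmdArgsRule {
  // Assumed reading of: "must either contain one of 'cmd' or 'args', and/or a 'container'".
  def satisfied(app: AppSpec): Boolean = {
    val hasCmd = app.cmd.exists(_.nonEmpty)
    val hasArgs = app.args.nonEmpty
    val hasContainer = app.container.isDefined
    (hasCmd ^ hasArgs) || (!(hasCmd && hasArgs) && hasContainer)
  }
}

// The example app as submitted: args only, no cmd, no container.
val original = AppSpec("/test-bug", cmd = None, args = Vector("sleep", "3600"), container = None)

// The migrated state as read from the dump: args(0) also appears in cmd.
val migrated = original.copy(cmd = original.args.headOption)

println(CmdArgsRule.satisfied(original)) // true
println(CmdArgsRule.satisfied(migrated)) // false
```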

              People

              • Assignee:
                kjeschkies Karsten Jeschkies
                Reporter:
                komuta Julien Pepy
                Team:
                Orchestration Team