Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Done
    • Affects Version/s: Marathon 1.4.7, Marathon 1.5.5
    • Fix Version/s: None
    • Component/s: Storage Volumes
    • Labels:

      Description

      The issue surfaced when running the following test against Marathon on Marathon (MoM) 1.4.7:

      def test_restart_container_with_persistent_volume():
          """A task with a persistent volume, which writes to a file in the persistent volume, is launched.
             The app is killed and restarted and we can still read from the persistent volume what was written to it.
          """
      
          app_def = apps.persistent_volume_app()
          app_id = app_def['id']
      
          client = marathon.create_client()
          client.add_app(app_def)
      
          shakedown.deployment_wait()
      
          tasks = client.get_tasks(app_id)
          assert len(tasks) == 1, "The number of tasks is {} after deployment, but 1 was expected".format(len(tasks))
      
          port = tasks[0]['ports'][0]
          host = tasks[0]['host']
          cmd = "curl {}:{}/data/foo".format(host, port)
          run, data = shakedown.run_command_on_master(cmd)
      
          assert run, "{} did not succeed".format(cmd)
          assert data == 'hello\n', "'{}' was not equal to hello\\n".format(data)
      
          client.restart_app(app_id)
          shakedown.deployment_wait()
      
          @retrying.retry(wait_fixed=1000, stop_max_attempt_number=30, retry_on_exception=common.ignore_exception)
          def check_task_recovery():
              tasks = client.get_tasks(app_id)
              assert len(tasks) == 1, "The number of tasks is {} after recovery, but 1 was expected".format(len(tasks))
      
          check_task_recovery()
      
          port = tasks[0]['ports'][0]
          host = tasks[0]['host']
          cmd = "curl {}:{}/data/foo".format(host, port)
          run, data = shakedown.run_command_on_master(cmd)
      
          assert run, "{} did not succeed".format(cmd)
          assert data == 'hello\nhello\n', "'{}' was not equal to hello\\nhello\\n".format(data)
      

      It failed with the following message:

          - test_marathon_on_marathon.py 
            - test_restart_container_with_persistent_volume ✕ 
      
      >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      E   AssertionError: '' was not equal to hello\nhello\n
      

      Here is what happened:

      1. The app with a persistent volume started and wrote hello\n to the file foo.
      2. The content of the file was read back using curl.
      3. The app was restarted and appended another hello\n to the same file.
      4. This time, reading the file content back using curl returned an empty string.
      5. After a few seconds, I was able to read both hello\n lines back manually, meaning that both appends had succeeded.
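
      Steps 4 and 5 suggest that the read may simply happen too early after the restart. A minimal sketch of a polling read is shown below, assuming the delay is the only cause of the failure. The helper name wait_for_content and its parameters are hypothetical and are not part of shakedown or the Marathon test suite; the curl call is replaced by a stand-in so that the example runs on its own.

```python
import time


def wait_for_content(read_content, expected, timeout_s=30.0, interval_s=1.0):
    """Poll read_content() until it returns `expected` or `timeout_s` elapses.

    Hypothetical helper: it returns the last value read, so the caller can
    still assert on it and get a meaningful failure message.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = read_content()
        if data == expected or time.monotonic() >= deadline:
            return data
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the curl call: two empty reads, then the full content,
    # mimicking a file that becomes readable shortly after the restart.
    responses = iter(['', '', 'hello\nhello\n'])
    data = wait_for_content(lambda: next(responses), 'hello\nhello\n',
                            timeout_s=5.0, interval_s=0.01)
    assert data == 'hello\nhello\n', "'{}' was not equal to hello\\nhello\\n".format(data)
```

      In the test above, read_content could wrap the curl call made through shakedown.run_command_on_master and return data. It may also be worth re-fetching the task list inside it, since the tasks variable assigned inside check_task_recovery is local to that function, so the port and host used for the second curl come from the list fetched before the restart.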

      The app ID in the attached logs is persistent-volume-app-9c5695ddb1b245f1a069b14c9c60e0fb.

      Please note that this issue is probably not fixed on master either. It is flaky: it fails roughly once in a couple of hundred runs.

              People

              • Assignee: Ivan Chernetsky (ivanchernetsky)
              • Reporter: Ivan Chernetsky (ivanchernetsky)
              • Team: Orchestration Team
              • Watchers: Ken Sipe