DC/OS / DCOS_OSS-1524

`dcos-diagnostics --diag` returns false positives during DC/OS install.



      While debugging https://jira.mesosphere.com/browse/DCOS_OSS-1467, I noticed that `dcos-diagnostics --diag` can return a zero exit status even though pkgpanda has not finished installing all the units. My guess is that the order in which the packages are decompressed/installed creates a small window in which some of the units are already green, so `dcos-diagnostics` returns `0` even though not all units have finished installing.

      I think the root cause is the way `dcos-diagnostics` determines the list of units running on DC/OS:


      This function just fetches all the units from disk instead of checking them against a hardcoded/well-known list of units. Pkgpanda quite possibly adds units to this folder incrementally. Depending on the order in which units are added to the folder and on when `dcos-diagnostics` is run, it is therefore possible to get a list of units that have all finished starting (are green) even though pkgpanda has not finished installing all of the units. One solution could be to list all required units in a config file and require all of them to go green before returning a `0` exit status.
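      The difference between the two checks can be sketched in Python. This is a minimal illustration, not the actual `dcos-diagnostics` code. The unit directory layout, the `dcos-*.service` glob pattern, the `required-units.json` file and its `{"required_units": [...]}` format, and the helper names are all hypothetical. The unit names are examples only. The sketch simulates a mid-install snapshot in which only two of three required units exist on disk, both of them green.

```python
import json
import tempfile
from pathlib import Path


def discovered_units(unit_dir: Path) -> set[str]:
    # Behaviour described in the issue: check whatever unit files happen to
    # exist on disk when the check runs. The glob pattern is hypothetical.
    return {p.name for p in unit_dir.glob("dcos-*.service")}


def required_units(config_path: Path) -> set[str]:
    # Proposed behaviour: read a fixed list of required units from a config
    # file. The JSON layout {"required_units": [...]} is hypothetical.
    return set(json.loads(config_path.read_text())["required_units"])


def exit_status(units: set[str], active: set[str]) -> int:
    # 0 only if there is at least one unit and every unit is active (green).
    # A required unit that is not installed yet is not in `active`, so it
    # keeps the status at 1.
    return 0 if units and units <= active else 1


with tempfile.TemporaryDirectory() as tmp:
    unit_dir = Path(tmp) / "units"
    unit_dir.mkdir()
    config = Path(tmp) / "required-units.json"
    config.write_text(json.dumps({"required_units": [
        "dcos-exhibitor.service",
        "dcos-mesos-master.service",
        "dcos-marathon.service",
    ]}))

    # Mid-install snapshot: only two units have been written and started so far.
    for name in ("dcos-exhibitor.service", "dcos-mesos-master.service"):
        (unit_dir / name).touch()
    active = {"dcos-exhibitor.service", "dcos-mesos-master.service"}

    # Directory-based check: 0, i.e. the false positive from this issue.
    print("directory-based check:", exit_status(discovered_units(unit_dir), active))
    # Required-list check: 1, because dcos-marathon.service is still missing.
    print("required-list check:", exit_status(required_units(config), active))
```

      With the directory-based check, a unit that has not been written to disk yet is simply invisible, so a partial install can look healthy. With a fixed list, a missing unit counts the same as an unhealthy one.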

      A detailed debug log can be found in DCOS_OSS-1467. Please drop me a line if something is unclear.



              • Assignee:
                Karl Isenberg (Inactive)
              • Reporter:
                Pawel Rozlach
              • Team:
                Cluster Ops Team
              • Participants:
                Artem Harutyunyan, Gustav Paul, Jan-Philip Gehrcke, Karl Isenberg (Inactive), Kevin Klues (Inactive), Maksym Naboka (Inactive), Pawel Rozlach
              • Watchers:
                7


                • Created: