Uploaded image for project: 'DC/OS'
  1. DC/OS
  2. DCOS_OSS-3804

dcos-checks-poststart: run as root

    Details

      Description

      The output of the dcos-checks-poststart.service does not get formatted properly in the journal. It looks like most of the output gets sent to the journal but isn't associated with the systemd unit.

      For example, if I run `systemctl start dcos-checks-poststart.service` we see the output in the main journal but not in the journal for that specific dc/os unit:

      # Nothing in the unit's logs:
      $ journalctl -u dcos-checks-poststart.service
      Jul 13 16:30:04 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 systemd[1]: Started DC/OS Poststart Checks: Run node-poststart checks.
      Jul 13 16:30:04 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 systemd[1]: Starting DC/OS Poststart Checks: Run node-poststart checks...
      Jul 13 16:30:12 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 systemd[1]: Started DC/OS Poststart Checks: Run node-poststart checks.
      Jul 13 16:30:12 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 systemd[1]: Starting DC/OS Poststart Checks: Run node-poststart checks...
      
      # But in the main journal we find
      $ journalctl
      ...
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0,
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "checks": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "components_master": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "curl": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "exhibitor_majority_expected_masters_connected": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "exhibitor_total_expected_masters_connected": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "ifconfig": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "ip_detect_script": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "172.17.0.2 is a valid IPV4 address\n",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "journald_dir_permissions": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "directory /run/log/journal has the group owner `systemd-journal` and group permissons r-x\n",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "mesos_master_replog_synchronized": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "tar": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "unzip": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "xz": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: },
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "zookeeper_serving": {
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "output": "serving\n",
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: "status": 0
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: }
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: }
      Jul 13 16:31:01 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[18389]: }
      ...
      

      This is really bad as it makes it impossible to figure out what broke when.

      On a hunch I tested what happens when we run the unit as the `root` user and it fixed the logs issue! We now see...

      $ journalctl -u dcos-checks-poststart.service
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 systemd[1]: Starting DC/OS Poststart Checks: Run node-poststart checks...
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0,
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "checks": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "components_master": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "curl": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "exhibitor_majority_expected_masters_connected": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "exhibitor_total_expected_masters_connected": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "ifconfig": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "ip_detect_script": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "172.17.0.2 is a valid IPV4 address\n",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "journald_dir_permissions": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "directory /run/log/journal has the group owner `systemd-journal` and group permissons r-x\n",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "mesos_master_replog_synchronized": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "tar": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "unzip": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "xz": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: },
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "zookeeper_serving": {
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "output": "serving\n",
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: "status": 0
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: }
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: }
      Jul 13 16:33:03 dcos-e2e-e2f6c624-32b7-423b-8ee4-7aaa4cee088e-master-0 dcos-check-runner[19322]: }
      

      Not only does this work around what appears to be a weird systemd bug, it also makes way more sense. The dcos-checks simulate the operator, Dan, periodically checking the health of his cluster. That implies the checks should assume Dan's permissions, which implies root access. Also, it doesn't serve requests or perform any user interaction at all - there is no reason not to run the checks as root.

        Attachments

          Activity

            People

            • Assignee:
              gpaul Gustav Paul
              Reporter:
              gpaul Gustav Paul
              Team:
              Security Team
              Watchers:
              Branden Rolston, Gustav Paul, Tim Weidner
            • Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: