[Bug]: macOS — LoggedInUser() silently falls through to root, masking missing-user metadata in telemetry #63

@swarit-stepsecurity

Summary

On macOS, when the agent runs as root (LaunchDaemon context) and there is no real GUI console user, executor.Real.LoggedInUser() silently falls back to user.Current(), which under root returns the root user with err == nil. device.getDeveloperIdentity() then ships user_identity = "root", and telemetry.Payload.NoUserLoggedIn is incorrectly set to false because "root" matches neither "" nor "unknown".

This is reproducible end-to-end with a self-contained Go program that mirrors the production code (matrix below).

Version

stepsecurity-dev-machine-guard v1.11.0 (darwin/arm64). Behavior is unchanged from at least v1.10.x — the LoggedInUser() fallback chain has been the same since it was introduced.

OS

macOS (any version) — observed/reasoned across:

  • Bare-metal Macs that boot to loginwindow with no auto-login
  • Macs in fast-user-switch / FileVault pre-unlock states
  • Headless Mac mini deployments
  • Virtualized macOS guests (Tart, VirtualBuddy, MacStadium Orka v2, Anka, Apple Virtualization.framework guests, GitHub-hosted M1 runners, Cirrus, etc.) — these tend to compound the issue with UUID-format IOPlatformSerialNumber and templated hostnames, producing rows that look like "no metadata at all"

Root cause walkthrough

  1. device.getDeveloperIdentity() in internal/device/device.go:

    • Tries env vars USER_EMAIL / DEVELOPER_EMAIL / STEPSEC_DEVELOPER_EMAIL first.
    • LaunchDaemons do not inherit env vars from any user shell — the plist would need an explicit EnvironmentVariables dict, which internal/launchd/launchd.go does not populate. So this arm is dead code for the daemon path in practice.
    • Falls back to exec.LoggedInUser().
  2. executor.Real.LoggedInUser() in internal/executor/executor.go (the macOS+root branch):

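    stdout, _, _, err := r.Run(ctx, "stat", "-f%Su", "/dev/console")
    if err != nil { return r.CurrentUser() }                                // (a)
    username := strings.TrimSpace(stdout) // not shown in the original excerpt; assumed from the reproducer below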
    if username == "" || username == "root" || username == "_windowserver" {
        return r.CurrentUser()                                              // (b)
    }
    u, err := user.Lookup(username)
    if err != nil { return r.CurrentUser() }                                // (c)
    return u, nil

    All three failure branches (a), (b), (c) collapse onto r.CurrentUser(), which under a LaunchDaemon succeeds and returns the root user with err == nil.

  3. getDeveloperIdentity() sees err == nil and returns u.Username, i.e. "root".

  4. telemetry.Run() in internal/telemetry/telemetry.go:

    NoUserLoggedIn: dev.UserIdentity == "" || dev.UserIdentity == "unknown",

    "root" matches neither, so no_user_logged_in: false. The log.Warn("user identity could not be determined …") line just above also doesn't fire.

Net effect: a real "no user" state is shipped to the backend as "the user is root", and the dashboard has no way to distinguish it from a Mac where someone genuinely ran something as root.

Reproduction

Reproduced on a bare-metal M1 Mac mini running macOS 26.4.1, with a self-contained Go program (~80 lines, no repo dependencies) that mirrors the production logic from executor.go and device.go. The mirrored LoggedInUser() takes one knob: a forced value to substitute for what stat -f%Su /dev/console returns, so every branch can be driven without logging out the GUI session on a shared host. Everything else is real: the program calls Go's user.Current() and user.Lookup() against the live OS, and only the stat source is substituted.

Reproducer (repro.go)

package main

import (
    "context"
    "flag"
    "fmt"
    "os"
    "os/exec"
    "os/user"
    "runtime"
    "strings"
)

// Mirrors internal/executor/executor.go Real.LoggedInUser().
func loggedInUser(forceConsole string) (*user.User, error) {
    isRoot := os.Getuid() == 0
    if runtime.GOOS != "darwin" || !isRoot {
        return user.Current()
    }
    var username string
    if forceConsole == "<<real>>" {
        out, err := exec.CommandContext(context.Background(), "stat", "-f%Su", "/dev/console").Output()
        if err != nil { return user.Current() }
        username = strings.TrimSpace(string(out))
    } else {
        username = forceConsole
    }
    if username == "" || username == "root" || username == "_windowserver" {
        return user.Current()
    }
    u, err := user.Lookup(username)
    if err != nil { return user.Current() }
    return u, nil
}

// Mirrors internal/device/device.go getDeveloperIdentity().
func getDeveloperIdentity(forceConsole string) string {
    for _, key := range []string{"USER_EMAIL", "DEVELOPER_EMAIL", "STEPSEC_DEVELOPER_EMAIL"} {
        if v := os.Getenv(key); v != "" { return v }
    }
    u, err := loggedInUser(forceConsole)
    if err == nil { return u.Username }
    return "unknown"
}

func main() {
    flag.Parse()
    fmt.Printf("euid=%d  $USER=%q  GOOS=%s\n", os.Geteuid(), os.Getenv("USER"), runtime.GOOS)
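    cur, err := user.Current()
    if err != nil {
        // Avoid a nil dereference below if the current user cannot be resolved.
        fmt.Fprintln(os.Stderr, "user.Current():", err)
        os.Exit(1)
    }
    fmt.Printf("user.Current() → %q\n\n", cur.Username)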

    cases := []struct{ label, force string }{
        {"REAL stat output (current host state)", "<<real>>"},
        {"FORCED \"_windowserver\" (loginwindow / no GUI user)", "_windowserver"},
        {"FORCED \"root\"           (console owned by root)", "root"},
        {"FORCED \"\"              (stat command error)", ""},
        {"FORCED \"nosuch_user\"    (user.Lookup failure)", "nosuch_user"},
    }
    for _, c := range cases {
        identity := getDeveloperIdentity(c.force)
        noUser := identity == "" || identity == "unknown"
        fmt.Printf("%s\n  getDeveloperIdentity()    → %q\n  payload.no_user_logged_in → %v\n",
            c.label, identity, noUser)
        if identity == "root" && os.Geteuid() == 0 {
            fmt.Printf("  >>> BUG: ships user_identity=\"root\" with no_user_logged_in=false\n")
        }
        fmt.Println()
    }
}

Build: GOOS=darwin GOARCH=arm64 go build -o repro repro.go
Run: ./repro (as user) and sudo ./repro (as root, mimicking LaunchDaemon).

Reproduction matrix (actual program output)

On a Mac where a real user (devuser) is logged in at the console:

As normal user (non-root) — short-circuits at top of LoggedInUser:

Forced stat /dev/console | getDeveloperIdentity() | no_user_logged_in
------------------------ | ---------------------- | -----------------
real (devuser)           | "devuser"              | false
_windowserver            | "devuser"              | false
root                     | "devuser"              | false
""                       | "devuser"              | false
"nosuch_user"            | "devuser"              | false

As root (sudo) — the LaunchDaemon path:

Forced stat /dev/console    | getDeveloperIdentity() | no_user_logged_in | Status
--------------------------- | ---------------------- | ----------------- | ---------
real (devuser)              | "devuser"              | false             | ✓ correct
_windowserver               | "root"                 | false             | BUG
root                        | "root"                 | false             | BUG
"" (stat error)             | "root"                 | false             | BUG
"nosuch_user" (Lookup fail) | "root"                 | false             | BUG

Four of the five cases under root produce the wrong answer. The "real user logged in" case is the only one that yields a correct payload, which is precisely why this hasn't surfaced as a hard failure on typical interactive desktop Macs but reliably bites headless / loginwindow / virtualized fleets.

Affected scenarios

All macOS-only:

  • Loginwindow / no auto-login — daemon timer fires while the Mac is at the login screen; /dev/console is owned by _windowserver or root. Common on corporate Macs where users log out at end of day and the timer fires overnight.
  • FileVault pre-unlock — daemon fires before any user has unlocked the disk session.
  • Headless Mac mini / SSH-only deployments — no console user ever exists.
  • Virtualized / CI Mac runners (Tart, VirtualBuddy, MacStadium Orka v2, Anka, Cirrus, GitHub-hosted M1 runners, Codemagic). These pile on with UUID-format IOPlatformSerialNumber and templated hostnames, producing telemetry rows where serial_number is a UUID, hostname is fleet-templated, and user_identity is root — three degraded fields at once, which is what gets reported as "no metadata at all" from the field.

Suggested fixes

(1) and (2) are tiny and self-contained, and would on their own fix the most common observed failure mode.

  1. Stop masking "no console user" in LoggedInUser(). When the macOS+root branch can't determine a real console user, return an error rather than falling back to CurrentUser(). getDeveloperIdentity() already handles errors correctly — it falls through to "unknown", which the existing NoUserLoggedIn logic flags correctly.

    // internal/executor/executor.go
    if username == "" || username == "root" || username == "_windowserver" {
        return nil, errors.New("no console user")
    }
    if u, err := user.Lookup(username); err == nil {
        return u, nil
    }
    return nil, errors.New("console user lookup failed")

  2. (Defense-in-depth) Treat a daemon-context "root" as "no user" in the payload.

    // internal/telemetry/telemetry.go
    NoUserLoggedIn: dev.UserIdentity == "" || dev.UserIdentity == "unknown" ||
                    (dev.UserIdentity == "root" && exec.IsRoot()),

    Catches any other code path that ends up with UserIdentity == "root" from a daemon. Useful even after fix (1), in case a future code path regresses.

  3. Populate EnvironmentVariables in the LaunchDaemon plist at install time so the env-var path in getDeveloperIdentity() actually has a chance of working on macOS. Easiest: snapshot the console user's relevant env (launchctl asuser <uid> /usr/bin/env | grep -E '^(USER_EMAIL|DEVELOPER_EMAIL|STEPSEC_DEVELOPER_EMAIL)=') and bake it into the plist's EnvironmentVariables dict.

  4. Don't silently swallow Hostname() errors in device.Gather(). Empty-string hostnames ship through as-is; the agent should at minimum log a warning, and ideally fall back to scutil --get LocalHostName on macOS or /etc/hostname on Linux.

  5. Add a metadata_quality field to the payload — count how many of {hostname, serial_number, os_version, user_identity} are empty/"unknown" so the backend / dashboard can surface fleet-wide degraded rows without per-field special-casing. Currently the per-field log.Warn lines never make it to the backend.
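
A rough sketch of the payload-side pieces of (2) and (5), written as pure functions so they can be unit-tested without a daemon. The metadata struct, its field names, and the helper names are illustrative assumptions, not the agent's real types. The sample values in main are made up.

```go
package main

import "fmt"

// metadata holds the four fields named in fix (5). The struct and its field
// names are illustrative, not the agent's real payload type.
type metadata struct {
	Hostname, SerialNumber, OSVersion, UserIdentity string
}

// degraded reports whether a field carries no usable value.
func degraded(v string) bool { return v == "" || v == "unknown" }

// noUserLoggedIn mirrors the predicate proposed in fix (2): a "root" identity
// reported by a process that is itself running as root is treated as "no user".
func noUserLoggedIn(identity string, runningAsRoot bool) bool {
	return degraded(identity) || (identity == "root" && runningAsRoot)
}

// degradedCount returns how many of the four fields are empty or "unknown".
func degradedCount(m metadata) int {
	n := 0
	for _, v := range []string{m.Hostname, m.SerialNumber, m.OSVersion, m.UserIdentity} {
		if degraded(v) {
			n++
		}
	}
	return n
}

func main() {
	// Made-up sample values for demonstration only.
	m := metadata{Hostname: "", SerialNumber: "unknown", OSVersion: "15.0", UserIdentity: "root"}
	fmt.Println("degraded fields:", degradedCount(m))
	fmt.Println("no user (running as root):", noUserLoggedIn(m.UserIdentity, true))
	fmt.Println("no user (not root):", noUserLoggedIn(m.UserIdentity, false))
}
```

Note that degradedCount as defined in (5) does not count a "root" identity as degraded, so the two checks complement each other rather than overlap.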

Additional context

  • Linux is unaffected: the systemd --user timer runs as the actual installing user, so LoggedInUser() (which short-circuits to CurrentUser() on non-Darwin) returns the right user. Windows uses a different path. Scope of fix is macOS-only.
  • Related but separate issue (worth tracking independently): on virtualized Macs, IOPlatformSerialNumber is a UUID, which internal/device/device.go passes through verbatim as serial_number since it does no shape validation. Compounding with this issue is what produces the "no metadata at all" reports from CI Mac fleets.
  • Related: #62, "[Bug]: Lock contention on Linux install — enable --now timer races with inline post-install telemetry" (Linux install lock race). Different cause, different OS, but both are silent-degradation bugs in install/telemetry that don't surface as hard failures.
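
On the serial-number point, a minimal sketch of the kind of shape check that could flag UUID-format values before they ship as serial_number. It assumes the UUIDs arrive in the canonical hyphenated 8-4-4-4-12 form. looksLikeUUID is a hypothetical helper, and both sample values below are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidShape matches the canonical hyphenated 8-4-4-4-12 hexadecimal layout.
var uuidShape = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// looksLikeUUID reports whether a serial number has UUID shape, which this
// issue associates with virtualized macOS guests. Hypothetical helper.
func looksLikeUUID(serial string) bool {
	return uuidShape.MatchString(strings.TrimSpace(serial))
}

func main() {
	// Both values are invented examples, not real device serials.
	for _, s := range []string{"C02XK0ABJGH5", "123e4567-e89b-12d3-a456-426614174000"} {
		fmt.Printf("%q uuidShaped=%v\n", s, looksLikeUUID(s))
	}
}
```

Whether a UUID-shaped serial should be dropped, flagged, or shipped with a marker is a separate product decision; the check only makes the condition visible.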
