Summary
On macOS, when the agent runs as root (LaunchDaemon context) and there is no real GUI console user, executor.Real.LoggedInUser() silently falls back to user.Current(), which under root returns the root user with err == nil. device.getDeveloperIdentity() then ships user_identity = "root", and telemetry.Payload.NoUserLoggedIn is incorrectly set to false because "root" matches neither "" nor "unknown".
This is reproducible end-to-end with a self-contained Go program that mirrors the production code (matrix below).
Version
stepsecurity-dev-machine-guard v1.11.0 (darwin/arm64). Behavior is unchanged from at least v1.10.x — the LoggedInUser() fallback chain has been the same since it was introduced.
OS
macOS (any version) — observed/reasoned across:
- Bare-metal Macs that boot to loginwindow with no auto-login
- Macs in fast-user-switch / FileVault pre-unlock states
- Headless Mac mini deployments
- Virtualized macOS guests (Tart, VirtualBuddy, MacStadium Orka v2, Anka, Apple Virtualization.framework guests, GitHub-hosted M1 runners, Cirrus, etc.). These tend to compound the issue with UUID-format IOPlatformSerialNumber and templated hostnames, producing rows that look like "no metadata at all".
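Root cause walkthrough
device.getDeveloperIdentity() (internal/device/device.go):
1. Tries env vars USER_EMAIL / DEVELOPER_EMAIL / STEPSEC_DEVELOPER_EMAIL first. LaunchDaemons do not inherit env vars from any user shell; the plist would need an explicit EnvironmentVariables dict, which internal/launchd/launchd.go does not populate. So this arm is dead code for the daemon path in practice.
2. Falls back to exec.LoggedInUser().
executor.Real.LoggedInUser() (internal/executor/executor.go, the macOS+root branch): all three failure branches (a), (b), (c) collapse onto r.CurrentUser(), which under a LaunchDaemon succeeds and returns the root user with err == nil. getDeveloperIdentity() sees err == nil and returns u.Username, i.e. "root".
telemetry.Run() (internal/telemetry/telemetry.go): no_user_logged_in is set only when the identity is "" or "unknown". "root" matches neither, so no_user_logged_in: false. The log.Warn("user identity could not be determined …") line just above also doesn't fire.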
Net effect: a real "no user" state is shipped to the backend as "the user is root", and the dashboard has no way to distinguish it from a Mac where someone genuinely ran something as root.
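The chain described above can be modelled in a few lines. This is a minimal sketch, not the production code: the deps struct, the function names, and the exact branch conditions (stat failure, a root or _windowserver console owner, a failed lookup) are assumptions inferred from the reproduction matrix, with the OS lookups injected so each branch can be driven by hand.

```go
package main

import (
	"errors"
	"fmt"
)

// deps bundles the OS lookups the chain depends on so every branch can be
// exercised without touching the real system. All names are hypothetical.
type deps struct {
	isDarwinRoot bool                         // agent runs as root on macOS (LaunchDaemon)
	consoleOwner func() (string, error)       // stands in for `stat -f%Su /dev/console`
	lookup       func(string) (string, error) // stands in for user.Lookup
	currentUser  func() (string, error)       // stands in for user.Current
}

// loggedInUser models the fallback chain: every failure branch under
// macOS+root ends in currentUser(), which is "root" for a daemon.
func loggedInUser(d deps) (string, error) {
	if !d.isDarwinRoot {
		return d.currentUser() // non-root callers short-circuit here
	}
	owner, err := d.consoleOwner()
	if err != nil || owner == "" {
		return d.currentUser() // stat failed or returned nothing
	}
	if owner == "root" || owner == "_windowserver" {
		return d.currentUser() // no real GUI user owns the console
	}
	name, err := d.lookup(owner)
	if err != nil {
		return d.currentUser() // console owner could not be resolved
	}
	return name, nil
}

// developerIdentity models the daemon path of getDeveloperIdentity(), where
// the env-var arm is empty: an error maps to "unknown", success to the name.
func developerIdentity(d deps) string {
	name, err := loggedInUser(d)
	if err != nil {
		return "unknown"
	}
	return name
}

// noUserLoggedIn models the telemetry check: only "" and "unknown" count.
func noUserLoggedIn(identity string) bool {
	return identity == "" || identity == "unknown"
}

func main() {
	d := deps{
		isDarwinRoot: true,
		consoleOwner: func() (string, error) { return "_windowserver", nil },
		lookup:       func(string) (string, error) { return "", errors.New("unknown user") },
		currentUser:  func() (string, error) { return "root", nil },
	}
	id := developerIdentity(d)
	fmt.Printf("user_identity=%q no_user_logged_in=%v\n", id, noUserLoggedIn(id))
}
```

With the login-screen inputs used in main, the sketch prints user_identity="root" no_user_logged_in=false, which is the mislabelled state this issue is about.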
Reproduction
Reproduced on a bare-metal M1 Mac mini running macOS 26.4.1, with a self-contained Go program (~80 lines, no repo dependencies) that mirrors the production logic from executor.go and device.go. The program takes one knob: a forced value to substitute for what stat -f%Su /dev/console returns, so we can drive every branch without having to log out the GUI session on a shared host. The program calls Go's real user.Current() and user.Lookup() against the real OS — only the stat source is substituted.
Build the reproducer (repro.go): GOOS=darwin GOARCH=arm64 go build -o repro .
Run: ./repro (as a normal user) and sudo ./repro (as root, mimicking the LaunchDaemon).
Reproduction matrix (actual program output)
On a Mac where a real user (devuser) is logged in at the console:
As normal user (non-root) — short-circuits at top of LoggedInUser:

| Forced stat /dev/console | getDeveloperIdentity() | no_user_logged_in |
| --- | --- | --- |
| real (devuser) | "devuser" | false ✓ |
| _windowserver | "devuser" | false ✓ |
| root | "devuser" | false ✓ |
| "" | "devuser" | false ✓ |
| "nosuch_user" | "devuser" | false ✓ |

As root (sudo) — the LaunchDaemon path:

| Forced stat /dev/console | getDeveloperIdentity() | no_user_logged_in | Status |
| --- | --- | --- | --- |
| real (devuser) | "devuser" | false | ✓ correct |
| _windowserver | "root" | false | ✗ BUG |
| root | "root" | false | ✗ BUG |
| "" (stat error) | "root" | false | ✗ BUG |
| "nosuch_user" (Lookup fail) | "root" | false | ✗ BUG |

Four out of five branches under root collapse to the wrong answer. The "real-user logged in" case is the only one that produces a correct payload — which is precisely why this hasn't surfaced as a hard failure for typical interactive desktop Macs, but reliably bites headless / loginwindow / virtualized fleets.
Affected scenarios
All macOS-only:
- Loginwindow / no auto-login: the daemon timer fires while the Mac is at the login screen; /dev/console is owned by _windowserver or root. Common on corporate Macs where users log out at end of day and the timer fires overnight.
- FileVault pre-unlock: the daemon fires before any user has unlocked the disk session.
- Headless Mac mini / SSH-only deployments: no console user ever exists.
- Virtualized / CI Mac runners (Tart, VirtualBuddy, MacStadium Orka v2, Anka, Cirrus, GitHub-hosted M1 runners, Codemagic). These pile on with UUID-format IOPlatformSerialNumber and templated hostnames, producing telemetry rows where serial_number is a UUID, hostname is fleet-templated, and user_identity is root: three degraded fields at once, which is what gets reported as "no metadata at all" from the field.
Suggested fixes
(1) and (2) are tiny and self-contained, and would on their own fix the most common observed failure mode.
1. Stop masking "no console user" in LoggedInUser(). When the macOS+root branch can't determine a real console user, return an error rather than falling back to CurrentUser(). getDeveloperIdentity() already handles errors correctly: it falls through to "unknown", which the existing NoUserLoggedIn logic flags correctly.
2. (Defense-in-depth) Treat a daemon-context "root" as "no user" in the payload. This catches any other code path that ends up with UserIdentity == "root" from a daemon, which is useful even after fix (1) in case a future code path regresses.
3. Populate EnvironmentVariables in the LaunchDaemon plist at install time so the env-var path in getDeveloperIdentity() actually has a chance of working on macOS. Easiest: snapshot the console user's relevant env (launchctl asuser <uid> /usr/bin/env | grep -E '^(USER_EMAIL|DEVELOPER_EMAIL|STEPSEC_DEVELOPER_EMAIL)=') and bake it into the plist's EnvironmentVariables dict.
4. Don't silently swallow Hostname() errors in device.Gather(). Empty-string hostnames currently ship through as-is; the code should at minimum log a warning, and ideally fall back to scutil --get LocalHostName on macOS or /etc/hostname on Linux.
5. Add a metadata_quality field to the payload: count how many of {hostname, serial_number, os_version, user_identity} are empty/"unknown" so the backend / dashboard can surface fleet-wide degraded rows without per-field special-casing. Currently the per-field log.Warn lines never make it to the backend.
Additional context
Linux is unaffected: the systemd --user timer runs as the actual installing user, so LoggedInUser() (which short-circuits to CurrentUser() on non-Darwin) returns the right user. Windows uses a different path. Scope of fix is macOS-only.
Related but separate issue (worth tracking independently): on virtualized Macs, IOPlatformSerialNumber is a UUID, which internal/device/device.go passes through verbatim as serial_number since it does no shape validation. Compounding with this issue is what produces the "no metadata at all" reports from CI Mac fleets.
Related: enable --now timer races with inline post-install telemetry #62 (Linux install lock-race). Different cause, different OS, but both are silent-degradation bugs in install/telemetry that don't surface as hard failures.
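The serial-number shape validation mentioned under the related-but-separate issue could start from something as simple as the check below. This is only a sketch: serialLooksLikeUUID is a hypothetical helper, the two sample values are made up for illustration, and what the agent should do with a UUID-shaped serial (flag it, annotate it, or pass it through) is left open.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidShape matches the canonical 8-4-4-4-12 hexadecimal layout of a UUID.
var uuidShape = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// serialLooksLikeUUID reports whether a reported serial number has UUID
// shape, which the issue associates with virtualized macOS guests.
func serialLooksLikeUUID(serial string) bool {
	return uuidShape.MatchString(serial)
}

func main() {
	// Both values are invented examples, not real device serials.
	samples := []string{
		"C02XK1ABJG5H",
		"3F2504E0-4F89-11D3-9A0C-0305E82C3301",
	}
	for _, s := range samples {
		fmt.Printf("%-38s uuid-shaped=%v\n", s, serialLooksLikeUUID(s))
	}
}
```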