coreutils' comm utility silently corrupts data by performing lossy UTF-8 conversion on all output lines
Low severity
GitHub Reviewed
Published Apr 22, 2026 to the GitHub Advisory Database • Updated Apr 29, 2026
Published by the National Vulnerability Database: Apr 22, 2026
Published to the GitHub Advisory Database: Apr 22, 2026
Last updated: Apr 29, 2026
Reviewed: Apr 29, 2026
Description
The comm utility in uutils coreutils silently corrupts data by performing lossy UTF-8 conversion on all output lines. The implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). This behavior differs from GNU comm, which processes raw bytes and preserves the original input. This results in corrupted output when the utility is used to compare binary files or files using non-UTF-8 legacy encodings.
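The difference between the two behaviors can be sketched in a few lines of Rust. This is an illustration only, not the actual uutils code or its fix: the helper names `lossy_line` and `write_raw_line` are hypothetical, and the sample input assumes a Latin-1 (ISO-8859-1) encoded line. The only call taken from the advisory is `String::from_utf8_lossy()`.

```rust
use std::io::{self, Write};

/// Hypothetical helper mirroring the lossy path described above: the line is
/// decoded as UTF-8 and every invalid sequence becomes U+FFFD, which is
/// encoded as the three bytes EF BF BD. The original bytes cannot be recovered.
fn lossy_line(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

/// Hypothetical byte-preserving alternative: the line is written exactly as it
/// was read, followed by a newline, with no decoding step in between.
fn write_raw_line<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(line)?;
    out.write_all(b"\n")
}
```

For example, the Latin-1 bytes `63 61 66 E9` ("café") pass through `write_raw_line` unchanged, while `lossy_line` returns `63 61 66 EF BF BD`, because a lone `E9` byte is not valid UTF-8. Valid UTF-8 input, including plain ASCII, is unaffected by either path.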
References