Single blanks immediately preceding a tab stop are treated inconsistently: sometimes they are replaced by a tab and sometimes not. The info page is vague enough to allow either interpretation, but the variation seems undesirable. If there is a good reason for the behavior, it should be documented.

The POSIX.1 man page, by contrast, is explicit. It allows only two conversions: an initial sequence of blanks (not at issue here), and, with -a, a sequence of two or more blanks immediately preceding a tab stop (also not at issue). A POSIX-compliant implementation would therefore not convert single blanks at all. This is consistent with the goal of making the file smaller and avoiding changes that do not further that end.

Test case (blanks are shown as periods (.) to avoid email mangling):

    unexpand -t4 -a <<EOF | cat -t
    abc.def..g
    EOF

Output:

    abc.def^I.g

The blank between "c" and "d" is not converted, but the blank after "f" is converted to a tab (^I). It is not at all clear why, since both are single blanks immediately preceding a tab stop. One surmises that the second blank after "f" makes the difference, but it is hard to see a motivation for the distinction. I submit that it would be better not to convert in either case, as that is most consistent with POSIX. In any event, the documentation should state more clearly which cases are handled and how.

Reproducible: Always
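To make the POSIX reading above concrete, here is a minimal Python sketch of the `-a` rule as I interpret it: leading blanks always become tabs where they reach a tab stop, and any other run of blanks is converted only if two or more of its blanks precede a tab stop. The function name `posix_unexpand_all` is hypothetical (it is not part of coreutils), and the sketch assumes uniform tab stops every `width` columns, a single line without its newline, and input whose only blanks are spaces (no tabs, backspaces, or multi-column characters).

```python
def posix_unexpand_all(line: str, width: int = 8) -> str:
    """Hypothetical sketch of POSIX `unexpand -a` on one line.

    Assumes uniform tab stops every `width` columns and that the only
    blanks in `line` are spaces (no tabs, backspaces, or wide characters).
    """
    out = []
    col = 0  # 0-based column of the next character
    i = 0
    while i < len(line):
        if line[i] != " ":
            out.append(line[i])
            col += 1
            i += 1
            continue
        # Find the end of this run of spaces.
        j = i
        while j < len(line) and line[j] == " ":
            j += 1
        start, end = col, col + (j - i)
        leading = i == 0  # run begins the line
        # Last tab stop reached within the run.
        last_stop = (end // width) * width
        # Convert leading runs always; other runs only if 2+ blanks
        # precede the tab stop, per the POSIX wording.
        if last_stop > start and (leading or last_stop - start >= 2):
            out.append("\t" * (last_stop // width - start // width))
            out.append(" " * (end - last_stop))
        else:
            out.append(" " * (end - start))
        col = end
        i = j
    return "".join(out)


if __name__ == "__main__":
    # The report's test case, with periods standing in for blanks.
    for sample in ("abc.def..g", "ab....c"):
        result = posix_unexpand_all(sample.replace(".", " "), 4)
        print(result.replace(" ", ".").replace("\t", "^I"))
```

Under this reading, the report's input `abc.def..g` comes back unchanged (neither single blank is converted), while `ab....c` becomes `ab^I..c`, since two blanks precede the tab stop and the remaining two do not reach the next one.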
I reported this to gnu.org as well. The reply said they don't see the problem in the upstream code, but they'll add tests based on what I found. I downloaded a fresh copy of coreutils 6.12 and can confirm that this bug does not appear in the unmodified 6.12 release. The GNU maintainer thought it might have been introduced when i18n support was added, but setting the locale with LANG=C had no effect for me, so I'm not sure.
This may not be an unexpand(1) bug exactly. It turns out that I was running with LANG and LC_ALL both set to "en_US.utf8". If I set them both to "C", the bugs go away. (LC_ALL takes precedence over LANG, which likely explains why setting LANG=C alone had no effect.) Unfortunately, running in the C locale does not work well for me, and I use other locales as well (though I haven't tested unexpand in them).
Fixed in newer versions, as we've dropped the UTF-8 patchset.