ferret@jupiter ~ $ sort -u -b -k1,1 <<<$'a b c\na b c'
a b c
a b c

I believe here that the blanks between 'a' and 'b' are part of field 2, not field 1, and so should not make the lines different. I believe the correct output should be 'a b c\n'.

Explaining why this is wrong takes some doing. Here is all the relevant text I could find from info sort.

------- START OF TEXT FROM INFO SORT --------

`-b'
`--ignore-leading-blanks'
    Ignore leading blanks when finding sort keys in each line. By default a blank is a space or a tab, but the `LC_CTYPE' locale can change this.

......

`-t SEPARATOR'
`--field-separator=SEPARATOR'
    Use character SEPARATOR as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the `LC_CTYPE' locale can change this.

    That is, given the input line ` foo bar', `sort' breaks it into fields ` foo' and ` bar'. The field separator is not considered to be part of either the field preceding or the field following, so with `sort -t " "' the same input line has three fields: an empty field, `foo', and `bar'. However, fields that extend to the end of the line, as `-k 2', or fields consisting of a range, as `-k 2,3', retain the field separators present between the endpoints of the range.

......

A position in a sort field specified with `-k' may have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the start and end positions of a field specification, and if it is inherited from the global options it will be attached to both. If input lines can contain leading or adjacent blanks and `-t' is not used, then `-k' is typically combined with `-b', `-g', `-M', or `-n'; otherwise the varying numbers of leading blanks in fields can cause confusing results.
If the start position in a sort field specifier falls after the end of the line or after the end field, the field is empty. If the `-b' option was specified, the `.C' part of a field specification is counted from the first nonblank character of the field.

......

[in the examples section]

The inheritance works in this case because `-k 5b,5b' and `-k 5b,5' are equivalent, as the location of a field-end lacking a `.C' character position is not affected by whether initial blanks are skipped.

------- END OF TEXT FROM INFO SORT --------

First we need to establish where exactly sort is choosing to split keys. In order to do this I added a debug printing line to src/sort.c which puts parentheses around where it thinks the first key is (a patch so you can try this yourself is in the first reply). Here's what we get with a few different options:

ferret@jupiter ~/coreutils-7.1~/src $ ./sort -u -k1,1 <<<$' a b c'
debug: ( a) b c
a b c
ferret@jupiter ~/coreutils-7.1~/src $ ./sort -u -k1b,1 <<<$' a b c'
debug: (a) b c
a b c
ferret@jupiter ~/coreutils-7.1~/src $ ./sort -b -u -k1,1 <<<$' a b c'
debug: (a )b c
a b c
ferret@jupiter ~/coreutils-7.1~/src $ ./sort -u -k1b,1b <<<$' a b c'
debug: (a )b c
a b c

The first two seem obviously correct given the documentation. By default the first field includes its leading space. With the b option added to POS1, that leading space has been excluded from the key.

In the third and fourth runs, the TRAILING space has been tacked onto the key. Why? As far as I can tell by looking at the code, it's doing just what it does with POS1: it's taking whatever position it would normally take, and then pushing forward past any blanks. The thing is, this is a totally bizarre thing to do at the end of a key! It's a limit, not a start point, so by pushing forward from there you are ADDING trailing blanks, not removing leading blanks.
While it makes the code seem orthogonal, the behaviour is unhelpful, confusing, and not as documented (see the last paragraph of the INFO section above).

As an aside, and I hesitate to mention this in case it adds to the confusion, sort -b -k1,1.0 works just fine. This is in spite of the documentation stating that a missing '.C' character specifier in POS2 shall act as if '.0' had been specified.
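To make the key boundaries described in this report easier to follow, here is a minimal Python sketch of the behaviour. It is a model built from the description in this comment, not the code in src/sort.c; the names `skip_blanks`, `skip_fields`, `key_text` and the `buggy` switch are hypothetical. The two-space input in the last example is also an assumption, since the exact spacing of the original test lines did not survive in this report.

```python
BLANKS = " \t"  # default blanks, per the quoted info text (locale handling ignored)


def skip_blanks(line, i):
    """Advance i past any run of blanks."""
    while i < len(line) and line[i] in BLANKS:
        i += 1
    return i


def skip_fields(line, count):
    """Index just past the first `count` fields.

    With no -t, a field is its leading blanks plus the nonblanks after them.
    """
    i = 0
    for _ in range(count):
        i = skip_blanks(line, i)
        while i < len(line) and line[i] not in BLANKS:
            i += 1
    return i


def key_text(line, start_field, end_field,
             skip_start=False, skip_end=False, buggy=False):
    """Text of the key for -kSTART,END (hypothetical model).

    skip_start / skip_end stand for 'b' on POS1 / POS2.
    buggy=True models the reported 7.1 behaviour: 'b' on POS2 pushes
    the end of the key forward past blanks.
    """
    begin = skip_fields(line, start_field - 1)
    if skip_start:
        begin = skip_blanks(line, begin)
    end = skip_fields(line, end_field)
    if skip_end and buggy:
        end = skip_blanks(line, end)
    return line[begin:end]


line = " a b c"
print(repr(key_text(line, 1, 1)))                        # ' a'  (-k1,1)
print(repr(key_text(line, 1, 1, skip_start=True)))       # 'a'   (-k1b,1)
print(repr(key_text(line, 1, 1, True, True, buggy=True)))  # 'a '  (-k1b,1b as reported)
print(repr(key_text(line, 1, 1, True, True)))            # 'a'   (-k1b,1b as documented)

# Assumed inputs: two lines differing only in the blanks after 'a'.
print(key_text("a b c", 1, 1, True, True, buggy=True)
      == key_text("a  b c", 1, 1, True, True, buggy=True))  # False
```

In this model the reported behaviour makes the keys of the two lines differ ('a ' against 'a  '), which is why -u keeps both, while the documented behaviour gives 'a' for each.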
Created attachment 186225: patch I used to debug key positioning in coreutils-7.1 sort
I've checked some other sort implementations. Sort on HP-UX 10.20 and 11.11 behaves the same way as this one. Sort on Solaris 9 and Solaris 10 behaves "correctly" (by my definition of correctly, from the start of the bug report). Sort from an old version of coreutils (5.2.1) also works "correctly".
Good analysis of the bug! I think the bug was in fact always present in coreutils, and it was fixed 1 week after coreutils-7.1 was released :( That debug patch is very useful, BTW :) I wonder whether a --key-debug option would be useful for sort, to output something like:

⌈ a⌋ b c
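As a rough illustration of the kind of output suggested here, the following Python sketch wraps a key in the proposed markers. It is only an illustration: `mark_key` is a hypothetical helper, and the key offsets are passed in by hand rather than computed by sort.

```python
def mark_key(line, begin, end, left="⌈", right="⌋"):
    """Return `line` with the key span line[begin:end] wrapped in markers."""
    return line[:begin] + left + line[begin:end] + right + line[end:]


# Key for -k1,1 on ' a b c' covers the leading blank and the 'a'.
print(mark_key(" a b c", 0, 2))  # ⌈ a⌋ b c
```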
Thanks for the comment. I tried the latest coreutils (which was a thoroughly harrowing experience, I can tell you), and this bug is indeed fixed in that version! So now we just have to wait for the next release and I can close this bug.
Note the next release of coreutils (7.2 due very soon now) will depend on a released version of automake (1.10b), which hopefully will ease the dependencies somewhat.
coreutils-7.2 added to the tree