Unix sort with scientific notation and two columns

I'm scratching my head about the results of sorting two columns with unix sort.

Here's some dummy data in a file called test:

A       2e-12
A       3e-14
A       1e-15
A       1.2e-13
B       1e-13
B       1e-14
C       4e-12
C       3e-12

I would like to sort by column 1 first, then column 2, to produce:

A       1e-15
A       3e-14
A       1.2e-13
A       2e-12
B       1e-14
B       1e-13
C       3e-12
C       4e-12

If I give it just the second column to sort on, it will sort the scientific notation correctly:

sort -g -k2 test
A       1e-15
B       1e-14
A       3e-14
B       1e-13
A       1.2e-13
A       2e-12
C       3e-12
C       4e-12

This stack question addresses a similar problem, but it seems that my test only breaks down when I ask for two columns to sort on.

This other example looks really close to what I want, but when I give separate -k it doesn't alter the behavior for my test set.

These trials:

sort -k1,1 -g  test
sort -k1,1 -g -k1,2  test
sort -k1,1 -g -k2,1  test

Produce:

A       1.2e-13
A       1e-15
A       2e-12
A       3e-14
B       1e-13
B       1e-14
C       3e-12
C       4e-12

And these trials:

sort -g -k2 -k1  test
sort -g -k2 -k1,1  test
sort -g -k2,2 -k1,1  test
sort -k1,1 -g -k2,2 test
sort -k1,1 -g -k2,2  test

Produce:

A       1e-15
B       1e-14
A       3e-14
B       1e-13
A       1.2e-13
A       2e-12
C       3e-12
C       4e-12

I have tested with LANG=C and LC_ALL=C without luck. I'm running this on Red Hat and the version is GNU coreutils 8.22.

There is 1 answer

CGanote On

I figured it out while writing the stack question, so I thought I'd just go ahead and post the question with my solution.

I was confused about what -kn,n meant, and running sort with the --debug flag helped me find the answer.
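For anyone who wants to repeat the --debug check, here is a minimal sketch. It assumes GNU sort (the --debug flag is a GNU coreutils option) and writes the sample data to a file named sort_demo.txt, a name chosen only for this illustration. In the debug output, the part of each line that sort actually uses as a key is underlined, which makes it easy to see when a key spans more than the intended field.

```shell
# Recreate the sample data from the question.
# The file name sort_demo.txt is an assumption made for this sketch.
cat > sort_demo.txt <<'EOF'
A       2e-12
A       3e-14
A       1e-15
A       1.2e-13
B       1e-13
B       1e-14
C       4e-12
C       3e-12
EOF

# One of the failing trials from the question, run with --debug.
# GNU sort annotates each line by underlining the text used for each key;
# any warnings go to stderr, which is discarded here.
debug_out=$(LC_ALL=C sort --debug -g -k2 -k1 sort_demo.txt 2>/dev/null)

printf '%s\n' "$debug_out"
```

Comparing the underlined spans for -k2 against those for -k2,2 shows how far each key extends.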

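This question pretty much hits the nail on the head: always use -kX,X so that each key covers exactly one field, and attach g to the numeric key itself. A bare -k2 runs from field 2 to the end of the line, and a global -g applies to every key that has no ordering options of its own, so in my trials -k1,1 was being compared numerically too. The letters in column 1 are not numbers, so they all compared as equal and the sort fell through to column 2.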

sort -k1,1 -k2,2g test
A       1e-15
A       3e-14
A       1.2e-13
A       2e-12
B       1e-14
B       1e-13
C       3e-12
C       4e-12

Yay!
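To make the difference reproducible, here is a small sketch that runs one of the failing trials next to the working command on the same data. It assumes GNU sort and uses a file named sort_demo.txt, which is a name picked for this example only. LC_ALL=C is set so the lexical comparison of column 1 does not depend on the locale.

```shell
# Sample data from the question.
# The file name sort_demo.txt is an assumption made for this sketch.
cat > sort_demo.txt <<'EOF'
A       2e-12
A       3e-14
A       1e-15
A       1.2e-13
B       1e-13
B       1e-14
C       4e-12
C       3e-12
EOF

# Global -g: key 1 has no options of its own, so it inherits -g.
# The letters are not numbers and compare as equal, so the order
# is decided by column 2 alone.
global_g=$(LC_ALL=C sort -k1,1 -g -k2,2 sort_demo.txt)

# Per-key g: column 1 is compared as text, column 2 as general numeric.
per_key_g=$(LC_ALL=C sort -k1,1 -k2,2g sort_demo.txt)

echo "global -g:"
printf '%s\n' "$global_g"
echo "per-key g:"
printf '%s\n' "$per_key_g"
```

The first listing should match the column-2-only ordering from the question, and the second should match the desired output, grouped by letter and then ordered numerically within each group.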