12th September 2013

Dramatically Faster Sorting in Linux Using Nsort

Last year I used nsort from Ordinal Technology, a drop-in replacement for the ordinary Linux sort command. Ordinal's nsort is free but not open source. One thing is clear, however: it is very fast. nsort was written by Chris Nyberg.

My motivation for looking for a faster sort was this: I had to drop all duplicate records from a single Oracle database table with more than 800 million records. Only later, after I already had a solution, did it turn out that just 3% of the initial records would remain, i.e., 97% of the records were duplicates. The solution was basically to extract all data from the table with a small C program, sort the extracted data with sort -u, and load the result back into the database table.
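The deduplication step can be illustrated with a short Python sketch of what sort -u computes. It assumes one record per line and plain code-point comparison (which matches byte order for UTF-8 text, roughly what LC_ALL=C sort does). The function name sort_unique is hypothetical. This is not how sort or nsort work internally: 800 million records would not fit this in-memory approach, and GNU sort, for example, writes sorted runs to temporary files and merges them.

```python
import sys
from typing import Iterable, List


def sort_unique(lines: Iterable[str]) -> List[str]:
    """Sort lines and keep one copy of each, similar to `LC_ALL=C sort -u`.

    Hypothetical helper for illustration. Python compares str by code
    point; locale-aware collation (e.g. en_US.UTF-8) may order lines
    differently.
    """
    result: List[str] = []
    for line in sorted(lines):
        # After sorting, equal lines are adjacent, so comparing with the
        # last kept line is enough to drop every duplicate.
        if not result or line != result[-1]:
            result.append(line)
    return result


def main() -> None:
    # Strip only the trailing newline so other whitespace stays significant.
    lines = (line.rstrip("\n") for line in sys.stdin)
    for line in sort_unique(lines):
        sys.stdout.write(line + "\n")


if __name__ == "__main__":
    main()
```

Run as `python3 dedup.py < extract.txt > unique.txt` (file names hypothetical), this should produce the same output as `LC_ALL=C sort -u extract.txt > unique.txt` for small inputs. The sort-then-skip-adjacent-duplicates design mirrors sort -u; with 97% duplicates, most of the work is the sort itself, which is why a faster sort program pays off directly.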

Using nsort instead of plain sort cut the runtime to one third: in my case, the overall runtime went down from 60 minutes to 20 minutes.

The Nsort User Guide is a very readable guide to nsort.

Benchmarks involving nsort can be found at sortbenchmark.org.

Categories: Linux, programming, performance
Author: Elmar Klausmeier