Abhinav Abhinav - 5 months ago 29
Linux Question

Natural Sort of Directory Filenames in C++

I have a directory listing for which I want to retrieve the filenames and put them in a vector of strings such that they are sorted in a "natural" manner. e.g.

{ "10.txt" "0.txt" "2.txt" "1.m" "Jan12" "July13.txt" "Nov25.txt" "Jane" "John" }
should be
{"0.txt" "1.m" "2.txt" "10.txt" "Jan12" "July13.txt" "Nov25.txt" "Jane" "John" }
. What is the easiest way to do this?

Elaborating on "natural" we assume for a string made up from parts of numbers (N) and text (T) such that
...(N)(T)...
, then for
...(N1)(T1)...
and
...(N2)(T2)...
will be
(N1<N2) (<) (T1<T2)
where
(<)
implies left term precedence over right term. In this case, numbers take precedence over text fields if they are in the same position in the string, i.e.
1.z (<) 1_t.txt
.

Is there already a library function to do that kind of sort for alphanumeric strings, or directory entries?

DESIRED order in which files should come. The file names will be stored in a vector of strings.

Abhinav@Abhinav-PC /cygdrive/c/AbhinavSamples/shell
$ ls -lv
total 8
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 1.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:55 1_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:50 3.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 4.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 10.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 10_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 13.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 20.txt

**Simple Sort**
Abhi@Abhi-PC /cygdrive/c/AbhinavSamples/shell
$ ls -l
total 8
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 1.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 10.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 10_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 13.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:55 1_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 20.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:50 3.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 4.txt

Answer

There is a function that does exactly what you want in glibc. Unfortunately it is C, not C++, so if you can live with that here is the simplest possible solution "out of the box", without reimplementing anything and reinventing the wheel. BTW: this is exactly as ls -lv is implemented. The most important part of it is the versionsort function which does the natural sort for you. It is used here as a comparison functin for scandir. The simple example below prints all files/directories in current directory sorted as you wish.

#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    struct dirent **namelist;
    int n,i;

    n = scandir(".", &namelist, 0, versionsort);
    if (n < 0)
        perror("scandir");
    else
    {
        for(i =0 ; i < n; ++i)
        {
            printf("%s\n", namelist[i]->d_name);
            free(namelist[i]);
        }
        free(namelist);
    }
    return 0;
}