Dan Dan - 1 year ago 39

Linux: Update directory structure for millions of images which are already in prefix-based folders

This is basically a follow-up to "Linux: Move 1 million files into prefix-based created Folders".

The original question:

I want to write a shell command to rename all of those images into the following format:

original: filename.jpg
new: /f/i/l/filename.jpg

Now, I want to take all of those files and add an additional level to the directory structure, e.g.:

original: /f/i/l/filename.jpg
new: /f/i/l/e/filename.jpg

Is this possible to do from the command line or in bash?
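For reference, the mapping between the two layouts can be sketched as below. This is only an illustration, assuming plain ASCII filenames at least four characters long; the helper names old_path and new_path are hypothetical, and the paths are printed relative to the top of the tree rather than with a leading /.

```bash
#!/usr/bin/env bash
# Hypothetical helpers showing the two layouts (ASCII names assumed).

# Current layout: one directory level for each of the first three characters.
old_path() {
  local f=$1
  printf '%s/%s/%s/%s\n' "${f:0:1}" "${f:1:1}" "${f:2:1}" "$f"
}

# Desired layout: one extra level named after the fourth character.
new_path() {
  local f=$1
  printf '%s/%s/%s/%s/%s\n' "${f:0:1}" "${f:1:1}" "${f:2:1}" "${f:3:1}" "$f"
}

old_path filename.jpg   # prints f/i/l/filename.jpg
new_path filename.jpg   # prints f/i/l/e/filename.jpg
```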

Answer

One way to do it is to simply loop over all the directories you already have, and in each bottom-level subdirectory create the new subdirectory and move the files:

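# run from the top of the existing three-level tree
for d in ?/?/?/; do (
  cd "$d" &&
  printf '%.4s\0' * | uniq -z |
  xargs -0 bash -c 'for prefix do
                      s=${prefix:3:1}   # fourth character names the new subdirectory
                      mkdir -p "$s" && mv "$prefix"* "$s"
                    done' _
) done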
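To try the loop without touching real data, here is a small self-contained run against a throwaway tree created with mktemp -d. It is a sketch that assumes bash plus GNU coreutils and findutils (for uniq -z and xargs -0, which the loop already relies on); the sample filenames are made up. It leaves the temporary directory in place so you can inspect it.

```bash
#!/usr/bin/env bash
# Demo on a throwaway tree; the filenames below are invented for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Build the existing three-level layout: first three characters -> directories.
for f in filename.jpg fileabc.jpg filmstrip.jpg photo1.jpg; do
  mkdir -p "${f:0:1}/${f:1:1}/${f:2:1}"
  touch "${f:0:1}/${f:1:1}/${f:2:1}/$f"
done

# The loop from the answer: add a fourth level named after the fourth character.
for d in ?/?/?/; do (
  cd "$d" &&
  printf '%.4s\0' * | uniq -z |
  xargs -0 bash -c 'for prefix do
                      s=${prefix:3:1}
                      mkdir -p "$s" && mv "$prefix"* "$s"
                    done' _
) done

# Expected files afterwards:
#   ./f/i/l/e/fileabc.jpg  ./f/i/l/e/filename.jpg
#   ./f/i/l/m/filmstrip.jpg  ./p/h/o/t/photo1.jpg
find . -type f | sort
```

Note that the new single-character directories cannot be caught by "$prefix"*, since every prefix is four characters long.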

That probably needs a bit of explanation.

The glob ?/?/?/ matches every directory path made up of three single-character components, i.e. every existing bottom-level directory. Because the pattern ends with a /, everything it matches is a directory, so there is no need to test for that.
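A quick way to see the effect of the trailing slash, sketched in a scratch directory with a made-up directory a/b/c and a made-up regular file a/b/d:

```bash
#!/usr/bin/env bash
# A pattern ending in / can only match directories.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p a/b/c        # a directory three levels down
touch a/b/d           # a regular file at the same depth
matches=( ?/?/?/ )    # a/b/d cannot match because it is not a directory
printf '%s\n' "${matches[@]}"   # prints a/b/c/
```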

( cd "$d" && ...; )

executes ... after cd'ing into the appropriate subdirectory. Putting that block inside ( ) causes it to be executed in a subshell, so the effect of the cd is confined to the parenthesized block. That's easier and safer than trying to undo the cd at the end (which would need cd ../../.., since each $d is three levels deep).
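The scoping of cd inside ( ) can be checked directly; this minimal sketch uses / as an arbitrary target directory:

```bash
#!/usr/bin/env bash
# A cd inside ( ... ) changes only the subshell's working directory.
start=$PWD
( cd / && echo "inside the subshell: $PWD" )
echo "back in the parent: $PWD"
after=$PWD
```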

We then collect the names of the new subdirectories first, by finding the unique four-character prefixes of the filenames:

printf '%.4s\0' * | uniq -z | xargs -0 ...

That prints the first four characters of each filename, each terminated by a NUL byte, and pipes the list to uniq to eliminate duplicates; the -z option tells uniq that its input is NUL-terminated. Because uniq only collapses adjacent duplicates, this relies on * expanding in sorted order, which normally groups names with the same prefix together (if a duplicate does slip through, the second mv just complains that nothing matches). The list of unique prefixes is then passed to xargs, again using -0 to indicate that the list is NUL-terminated. xargs executes a command with a list of arguments, running the command more than once only if necessary to avoid exceeding the command-line length limit. (We could probably have avoided xargs, but it costs little and is a lot safer.)
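Here is the prefix-extraction stage on its own, sketched with a few made-up filenames in place of the * glob. The final tr only converts the NUL separators to newlines for display.

```bash
#!/usr/bin/env bash
# Unique four-character prefixes; the names are listed in sorted order,
# as * would expand them, so equal prefixes are adjacent for uniq -z.
set -- fileabc.jpg filename.jpg filmstrip.jpg
prefixes=$(printf '%.4s\0' "$@" | uniq -z | tr '\0' '\n')
printf '%s\n' "$prefixes"   # prints file and film on separate lines
```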

The command run by xargs is bash itself; we use the -c option to pass it a script to execute. That script iterates over its arguments with the for prefix do form, a for loop with no in list, which loops over the positional parameters. Each argument is a unique prefix; the script extracts the fourth character of the prefix (in bash, ${prefix:3:1}) to name the new subdirectory, creates it, and then mvs all files whose names start with the prefix into it.
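Both pieces of syntax can be tried in isolation. In this sketch the prefixes are set with set -- purely for illustration; in the real command xargs supplies them.

```bash
#!/usr/bin/env bash
# "for prefix do" (no "in" list) loops over the positional parameters;
# ${prefix:3:1} takes one character at offset 3, i.e. the fourth character.
set -- file film phot
subdirs=()
for prefix do
  subdirs+=( "${prefix:3:1}" )
done
echo "${subdirs[@]}"   # prints e m t
```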

The _ at the end of the xargs invocation is passed to bash along with the prefixes that xargs appends after it. bash -c uses the first argument following the script as $0, which is not one of the positional parameters that for prefix do iterates over. So the _ placeholder ensures that the prefixes supplied by xargs become exactly $1, $2, ... inside the script.
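The difference the placeholder makes can be seen with two tiny bash -c calls, sketched here with the made-up arguments file and film:

```bash
#!/usr/bin/env bash
# With the _ placeholder, every real argument reaches the loop.
out=$(bash -c 'echo "0=$0"; for a do echo "arg=$a"; done' _ file film)
printf '%s\n' "$out"   # prints 0=_, arg=file, arg=film on separate lines

# Without it, the first real argument is consumed as $0 and skipped.
without=$(bash -c 'for a do echo "$a"; done' file film)
printf '%s\n' "$without"   # prints only film
```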