Camilo Martin - 4 months ago

Disk usage of files whose names match a regex, in Linux?

Disclaimer: I'm asking (and answering) this question myself, based on this:


it is not merely OK to ask and answer your own question, it is explicitly encouraged.


Source (emphasis not mine).




So, in many situations I've wanted to know what is using my disk space, so I can decide what to get rid of, convert to another format, store elsewhere (such as on data DVDs), move to another partition, etc. In this case I'm examining a Windows partition from bootable SliTaz Linux media.

In most cases, what I want is the size of files and folders, and for that I use the ncurses-based ncdu:

                ncdu

But in this case, I want the total size of all files whose names match a regex. For example, for .bak files:

.*\.bak$


How do I get that information on a standard Linux system with the GNU core utilities or BusyBox?

Edit: The output is intended to be parseable by a script.

Answer

I suggest something like:

                find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
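Since the output is meant to be parsed by a script, here is a minimal sketch of the same GNU pipeline wrapped in a shell function that prints only the total, as a bare number of KiB. The function name regex_usage_kib is hypothetical (my own), and I've added -type f so that a directory whose name happens to match is not counted with all of its contents; drop it if you want that behavior.

```shell
#!/bin/sh
# Hypothetical helper (not a standard command): print the total disk usage,
# in KiB, of regular files under $1 whose whole path matches the regex $2.
# Assumes GNU find and GNU du (--files0-from is a GNU extension).
regex_usage_kib() {
    dir=$1
    regex=$2
    # -type f: skip matching directories, which du would measure recursively.
    # -print0 | --files0-from=-: NUL-separated names, safe for any file name.
    # -c appends a grand-total line; -k reports sizes in 1024-byte units.
    # tail/cut keep only the number from the final "N<TAB>total" line.
    find "$dir" -type f -regex "$regex" -print0 \
        | du --files0-from=- -ck \
        | tail -n 1 \
        | cut -f 1
}
```

For example, regex_usage_kib . '.*\.bak' prints a single integer, which is easy to use in a script.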

Some notes:

  • The -print0 option for find and --files0-from for du pass file names NUL-separated, which avoids problems with whitespace (including newlines) in file names
  • The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, and it must match all of it (so no ^ or $ anchors are needed); if you modify it, take that into account
  • I used the -h flag for du to get "human-readable" sizes, but if you want to parse the output, you may be better off with -k (sizes always in kilobytes, i.e. 1024-byte units)
  • If you remove the tail command, you will also see the size of each individual matched file and directory
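Note that --files0-from is a GNU du extension; as far as I know, BusyBox du does not support it. Below is a minimal sketch of an alternative that avoids it, assuming your find supports -regex and -exec ... {} + (GNU find does; in BusyBox both are build-time options, so check your build). The function name regex_usage_kib_portable is hypothetical. Caveats: a file name containing a newline splits du's output line and can throw off the awk sum, and hard-linked files may be counted more than once if they land in different batches.

```shell
#!/bin/sh
# Hypothetical helper: same idea as above, but without du --files0-from.
# Assumes find supports -regex and "-exec ... {} +" (optional BusyBox features).
regex_usage_kib_portable() {
    dir=$1
    regex=$2
    # find hands matched names directly to du as arguments (no shell word
    # splitting), possibly in several batches, i.e. several du runs.
    # du -c would then print one "total" line per batch, so instead sum the
    # per-file sizes (first field) with awk. "+ 0" prints 0 when nothing matched.
    find "$dir" -type f -regex "$regex" -exec du -k {} + \
        | awk '{ total += $1 } END { print total + 0 }'
}
```

Summing in awk rather than using du -c keeps the result correct even when find splits a long list of names across several du invocations.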

Side note: a nice GUI tool for finding out what ate your disk space is Filelight. It doesn't support regexes, but it is very handy for finding big directories or files clogging your disk.