Camilo Martin - 9 months ago 43
Linux Question

# Disk usage of files whose names match a regex, in Linux?

Disclaimer: I'm asking this question (and answering it) based on this:

Source (emphasis not mine).

In many situations I have wanted to know what is using my disk space, so that I know what to get rid of, convert to another format, store elsewhere (such as on data DVDs), move to another partition, etc. In this case I'm looking at a Windows partition from SliTaz Linux bootable media.

In most cases, what I want is the size of files and folders, and for that I use the ncurses-based ncdu.

But in this case, I want a way to get the size of all files matching a regex. An example regex for .bak files:

.*\.bak$


How do I get that information, considering a standard Linux with core GNU utilities or BusyBox?

Edit: The output is intended to be parseable by a script.

I suggest something like:

find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1

• The -print0 option for find and --files0-from for du are there to avoid issues with whitespace in file names
• The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, so if you modify it, take that into account
• I used the -h flag for du to produce a "human-readable" format, but if you want to parse the output, you may be better off with -k (always report sizes in kilobytes)
• If you remove the tail command, you will additionally see the sizes of the individual files and directories that matched
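
Since the question asks for output that a script can parse, here is a minimal sketch of the same pipeline wrapped in a function that prints only the total, in kilobytes. It assumes GNU find and GNU du (the --files0-from option is a GNU extension and may not be available in BusyBox du). The function name regex_du_kb is made up for illustration, and -type f is an addition to the command above that restricts the match to regular files.

```shell
#!/bin/sh
# Print the total disk usage, in 1 KiB blocks, of regular files under a
# directory whose full path matches a regex.
# Assumptions: GNU find and GNU du are installed; regex_du_kb is a
# hypothetical helper name, not part of any standard tool.
regex_du_kb() {
    dir=$1      # directory to search, e.g. .
    regex=$2    # matched against the whole path, e.g. ./dir1/file.bak

    # -print0 / --files0-from=- pass NUL-separated names, so whitespace
    # in file names is safe. -c appends a grand total line, -k forces
    # kilobyte units. The last line is "<size><TAB>total"; keep field 1.
    find "$dir" -type f -regex "$regex" -print0 |
        du --files0-from=- -ck |
        tail -n 1 |
        cut -f1
}
```

Calling regex_du_kb . '.*\.bak' prints a single integer, which a script can capture with command substitution, for example size=$(regex_du_kb . '.*\.bak').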