ADPK - 8 days ago
Bash Question

Filter folders whose name is a timestamp

I am writing a generic shell script which filters out files based on given regex.

My shell script:

files=$(find $path -name $regex)


In one of the cases, I want to filter the folders inside a directory; the folder names have the following format:

20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS


I am unable to arrive at the correct regex.

I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.

But it gives me the full path of the file, something like

/path/20161128-20:34:33:432813246/data.txt


What I want is simply:

/path/20161128-20:34:33:432813246


Please help me identify the correct regex for my requirement.

NOTE:

I know how to process the data after

files=$(find $path -name $regex)


But since the script needs to be generic enough to cover many use cases, I only need the correct regex to pass to it.

Answer
  • Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions or globs) to match filenames and pathnames. Patterns and regular expressions are distantly related, but their syntax and capabilities differ significantly; in short, patterns are syntactically simpler but far less powerful.

    • -name matches the pattern against the basename (mere filename) part of an input path only
    • -path matches the pattern against the whole pathname (the full path)
  • Both GNU and BSD/macOS find implement nonstandard extensions:

    • -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
    • -regex and -iregex, tests that match the whole pathname against a regex (regular expression); -iregex does so case-insensitively.
      • Caveat: Both implementations offer at least two regex dialects to choose from (-E activates extended regular expressions in BSD find, and GNU find lets you select from several dialects with -regextype), but no two dialects are exactly the same across the two implementations. Sticking with POSIX BREs (basic regular expressions) is safe, but their syntax is awkward and their capabilities are limited.
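A quick way to see the difference between -name and -path is to run both against a throwaway tree that mirrors the question's layout. This is a sketch assuming a POSIX sh and a mktemp that supports -d (as on GNU/Linux and macOS); the temporary directory merely stands in for the real $path. Note that with -path, '*' may also match '/', because the pattern is applied to the whole pathname.

```shell
#!/bin/sh
# Throwaway tree for illustration; the layout mirrors the question.
root=$(mktemp -d)
mkdir "$root/20161128-20:34:33:432813246"
touch "$root/20161128-20:34:33:432813246/data.txt"

# -name tests only the basename, so only the timestamp directory matches.
name_hits=$(find "$root" -name '2016*')

# -path tests the whole pathname; here '*' can also match '/',
# so the file inside the timestamp directory matches as well.
path_hits=$(find "$root" -path '*/2016*')

printf 'name: %s\n' "$name_hits"
printf 'path: %s\n' "$path_hits"

rm -rf "$root"
```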

With your folder names following a fixed-width naming scheme, a pattern would work:

pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
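Because POSIX defines -name in terms of the shell's own pattern-matching notation, a case statement can check a pattern against sample names without creating any directories. The sketch below assumes a POSIX sh; matches_timestamp is a hypothetical helper name, and the pattern is assembled from d='[0-9]' only to make the digit counts (8, 2, 2, 2, 9) easy to audit.

```shell
#!/bin/sh
# Fixed-width pattern: 8 date digits, '-', then HH:MM:SS and 9 digits.
d='[0-9]'
pattern="$d$d$d$d$d$d$d$d-$d$d:$d$d:$d$d:$d$d$d$d$d$d$d$d$d"
echo "pattern: $pattern"

# matches_timestamp NAME: succeeds if NAME matches $pattern.
# $pattern is deliberately unquoted in the case label so that its
# bracket expressions act as wildcards rather than literal text.
matches_timestamp() {
  case $1 in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

for name in 20161128-20:34:33:432813246 \
            2016112-20:34:33:432813246 \
            20161128-20:34:33:43281324 \
            data.txt; do
  if matches_timestamp "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```

Only the first name matches; the second has a 7-digit date and the third an 8-digit fraction.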

Of course, you can take a shortcut if you don't expect false positives:

pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

Note that * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression; by themselves they represent any sequence of characters (*) or any single character (?).
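To make the difference concrete: in a regex, [0-9]* means "zero or more digits", whereas as a pattern it means "one digit followed by anything". The sketch below, assuming a POSIX sh, compares the shortcut and the strict pattern using a case statement (the same notation -name uses); check is a hypothetical helper and the odd name is a made-up false positive.

```shell
#!/bin/sh
# The loose shortcut pattern and the strict fixed-width one.
loose='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
d='[0-9]'
strict="$d$d$d$d$d$d$d$d-$d$d:$d$d:$d$d:$d$d$d$d$d$d$d$d$d"

# check NAME PATTERN: prints "yes" if NAME matches PATTERN, otherwise "no".
check() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

real='20161128-20:34:33:432813246'
# Made-up false positive: each '*' and '?' swallows a non-digit character.
odd='1x-2y:3z:4w:5'

echo "real, loose:  $(check "$real" "$loose")"    # yes
echo "odd,  loose:  $(check "$odd" "$loose")"     # yes (false positive)
echo "real, strict: $(check "$real" "$strict")"   # yes
echo "odd,  strict: $(check "$odd" "$strict")"    # no
```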

If we put it all together:

files=$(find "$path" -type d -name "$pattern")
  • It's important to double-quote the variable references to protect their values from unwanted shell expansions: this preserves any whitespace in $path and prevents the shell from globbing the value of $pattern before find ever sees it.

  • Note that I've added -type d to limit matching to directories (folders), so a regular file that happens to have a timestamp-like name is not reported.
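Putting it together on a sample tree: this is a sketch assuming a POSIX sh and mktemp -d; the directory names other than the question's example, and the temporary directory standing in for $path, are hypothetical. It includes a regular file with a timestamp-like name to show that -type d skips it.

```shell
#!/bin/sh
# Hypothetical sample tree standing in for the real $path.
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246" \
      "$path/20161129-08:00:01:000000001" \
      "$path/logs"
touch "$path/20161128-20:34:33:432813246/data.txt"
# A regular file with a timestamp-like name; -type d excludes it.
touch "$path/20161130-00:00:00:000000000"

d='[0-9]'
pattern="$d$d$d$d$d$d$d$d-$d$d:$d$d:$d$d:$d$d$d$d$d$d$d$d$d"

# Both expansions are double-quoted: "$path" keeps any whitespace intact,
# and "$pattern" reaches find unexpanded, so find (not the shell) matches it.
files=$(find "$path" -type d -name "$pattern")
printf '%s\n' "$files"

rm -rf "$path"
```

find reports matches in directory-read order, so sort the result if the order matters.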