jlp jlp - 7 months ago 15
Bash Question

How to grep a portion of file name using grep or sed based on a pattern in shell script

I need to extract a portion of a file name based on a pattern. The pattern is not meant for checking whether the file name matches it exactly. The "?"s stand for the digits of a date, which can be in the format YYYYMMDD or YYYY-MM-DD, and I don't want the date itself. For now, I just want the letter portion before or after the date, based on the pattern.

For example, if the file name pattern and the actual file name are:

*_???????? and file name: ab_cd_20160505_efg.txt


I want to grep the string ab_cd. The efg part is skipped because it's not part of the pattern.

If the file pattern and the actual file name are:

????-??-??_* and file name: 2016-05-05_abc_def-ghi.csv


(the name contains both dashes and underscores), I want to grep the string abc_def-ghi. The .csv is skipped because we don't care about the file extension, which is why .csv is not given in the pattern.

So, can someone let me know how to accomplish this using grep, sed, or another command in a shell script?

Answer

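A two-step approach: first translate the file pattern into a regular expression, then use that expression to extract the captured part.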

$ pattern=$(sed 's/*/([^0-9.]+)/;s/?/[0-9]/g' <<< '*_????????');
$ sed -r "s/$pattern.*/\1/" <<< 'ab_cd_12345678_efg.txt'
ab_cd

$ pattern=$(sed 's/*/([^0-9.]+)/;s/?/[0-9]/g' <<< '????-??-??_*');
$ sed -r "s/$pattern.*/\1/" <<< '1234-56-78_abc_def-ghi.csv'
abc_def-ghi

Note the double quotes in the second sed command; they let bash expand $pattern before sed sees it.
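
If this is needed in more than one place, the two sed steps above could be wrapped in a small function. The following is only a sketch: the name extract_part is hypothetical, and it assumes that the pattern contains a single "*", that neither argument contains a "/", and that a sed supporting -E is available. It uses printf with a pipe instead of a here-string so that it does not depend on bash.

```shell
# extract_part PATTERN FILENAME   (hypothetical helper name)
# The function body is a subshell, so the variable "regex" does not leak out.
extract_part() (
    # Step 1: translate the file pattern into an extended regex.
    # The first '*' becomes a capture group of non-digit, non-dot characters;
    # every '?' becomes a single digit.
    regex=$(printf '%s\n' "$1" | sed 's/\*/([^0-9.]+)/; s/?/[0-9]/g')

    # Step 2: keep only the captured group.
    # -n together with the p flag prints nothing when the name does not match.
    printf '%s\n' "$2" | sed -nE "s/$regex.*/\1/p"
)

extract_part '*_????????' 'ab_cd_20160505_efg.txt'
extract_part '????-??-??_*' '2016-05-05_abc_def-ghi.csv'
```

As in the original commands, the capture group [^0-9.]+ assumes that the part being extracted contains no digits and no dots. A name that does not match the pattern produces empty output rather than being echoed back unchanged.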