Mahdi Jadaliha - 1 year ago
R Question

Matching file names in R with several possible prefixes and suffixes

I want to find all files whose names start with one of several possible prefixes and end with one of several possible suffixes.

Here is the example:


  1. the file name starts with "API", "DB", or "S3", followed by a single "_"

  2. then comes the ID of the query, followed by a "." character

  3. and the file name ends with one of "JSON", "SQL", or "TXT"



I used the following code:

filesPattern = "[DB|API|S3]_.*.[JSON|SQL|TXT]$"
LIST_OF_FILES = toupper(list.files(dirProcess,
pattern = filesPattern,
ignore.case = T))


This works to some extent, but not accurately. First, I don't know how to force the name to start with one of these prefixes. Second, the "." character before the suffix is not checked. There may be other problems as well, since I'm not sure I defined the possible prefixes and suffixes correctly.

Finally, how can I get the one file with a specific ID? For example:

These are my file names:

[1] "API_GPT.TXT" "API_GPTR.R" "DB_COUNTRY.SQL"
[4] "DB_DECISIONS.SQL" "S3_BUCKET_LIST.R"


and I'm looking to get a file with ID = "DECISIONS".

Answer

The [DB|API|S3] is a bracket expression that matches a single char: D, B, |, A, P, I, S, or 3.
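To see why the original pattern is too permissive, here is a small sketch that runs it against two made-up names (XB_FOOT and ZZI_NOTESX are hypothetical, not from the question). Neither has a valid prefix, a "." or a valid suffix, yet both should be accepted, because each bracket expression only needs one character from its set.

```r
# Original (flawed) pattern: each [...] matches exactly one character.
flawed <- "[DB|API|S3]_.*.[JSON|SQL|TXT]$"

# Hypothetical names that should NOT be accepted.
bad_names <- c("XB_FOOT", "ZZI_NOTESX")

# "B_" / "I_" satisfy the first bracket plus underscore; the unescaped "."
# matches any character; the final "T" / "X" satisfy the second bracket.
flawed_hits <- grepl(flawed, bad_names, ignore.case = TRUE)
print(flawed_hits)
```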

You may use

filesPattern = "^(DB|API|S3)_.*\\.(JSON|SQL|TXT)$"
LIST_OF_FILES  = list.files(dirProcess, pattern = filesPattern, ignore.case = TRUE)

Details:

  • ^ - start of string
  • (DB|API|S3) - either of the 3 alternatives (DB, API, S3 substrings)
  • _ - an underscore
  • .* - any 0+ chars, as many as possible
  • \\. - a literal . symbol
  • (JSON|SQL|TXT) - either of the 3 alternatives (JSON, SQL, TXT substrings)
  • $ - end of string.
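As a quick check, the pattern can be tried on the file names from the question without touching the file system. This is only a sketch: the vector files stands in for the directory listing, and matched is a name chosen for illustration.

```r
# File names copied from the question, used in place of list.files().
files <- c("API_GPT.TXT", "API_GPTR.R", "DB_COUNTRY.SQL",
           "DB_DECISIONS.SQL", "S3_BUCKET_LIST.R")

filesPattern <- "^(DB|API|S3)_.*\\.(JSON|SQL|TXT)$"

# Keep only the names with a valid prefix, a literal dot, and a valid suffix.
matched <- files[grepl(filesPattern, files, ignore.case = TRUE)]
print(matched)
# [1] "API_GPT.TXT"      "DB_COUNTRY.SQL"   "DB_DECISIONS.SQL"
```

The two .R files are dropped because R is not among the allowed suffixes.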

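You do not need toupper() for the matching itself, because the ignore.case = TRUE argument already makes the pattern matching case-insensitive. Keep it only if you want the returned file names converted to upper case.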
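The question also asks how to get the file for one specific ID. One possible approach, sketched here under the assumption that the ID contains no regex metacharacters, is to put the ID in place of .* in the pattern. The names id, idPattern and hit are illustrative, and files again stands in for the directory listing.

```r
# File names copied from the question, used in place of list.files().
files <- c("API_GPT.TXT", "API_GPTR.R", "DB_COUNTRY.SQL",
           "DB_DECISIONS.SQL", "S3_BUCKET_LIST.R")

id <- "DECISIONS"  # assumed to contain no special regex characters

# Same prefix and suffix alternatives, but the middle part must equal the ID.
idPattern <- paste0("^(DB|API|S3)_", id, "\\.(JSON|SQL|TXT)$")

hit <- grep(idPattern, files, ignore.case = TRUE, value = TRUE)
print(hit)
# [1] "DB_DECISIONS.SQL"
```

Against a real directory, the same idPattern could be passed as the pattern argument of list.files(dirProcess, pattern = idPattern, ignore.case = TRUE).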
