KdgDev - 26 days ago
Java Question

Regex to replace characters that Windows doesn't accept in a filename

I'm trying to build a regular expression that will detect any character that Windows does not accept as part of a file name (are these the same for other OS? I don't know, to be honest).

These symbols are:

 \ / : * ? " < > |


Anyway, this is what I have:
[\\/:*?\"<>|]


The tester over at http://gskinner.com/RegExr/ shows this to be working. For the string "Allo*ha", the "*" symbol lights up, signalling it's been found. Should I enter "Allo**ha", however, only the first "*" will light up. So I think I need to modify this regex to find all appearances of the mentioned characters, but I'm not sure.

You see, in Java, I'm lucky enough to have the function String.replaceAll(String regex, String replacement).
The description says:


Replaces each substring of this string that matches the given regular expression with the given replacement.


So in other words, even if the regex only finds the first and then stops searching, this function will still find them all.

For instance:
myString.replaceAll("[\\\\/:*?\"<>|]", "")


However, I don't feel like I can take that risk. So does anybody know how I can extend this?
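
For reference, the behaviour quoted from the documentation can be checked with a small program. This is only a sketch: the class name ReplaceAllDemo, the helper strip and the sample strings are made up for illustration. It shows that replaceAll substitutes every match, not just the first one. It also shows a detail that is easy to miss: inside a Java string literal, a literal backslash in the regex has to be written as four backslashes. With only two, the pattern contains \/ which just matches a slash, and backslashes are left in place.

```java
class ReplaceAllDemo {
    // In a Java string literal, "\\\\" becomes the regex \\ which matches one backslash.
    // The full character class matches any of: \ / : * ? " < > |
    static final String ILLEGAL = "[\\\\/:*?\"<>|]";

    // Removes every occurrence of the listed characters (hypothetical helper).
    static String strip(String name) {
        return name.replaceAll(ILLEGAL, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Allo**ha"));   // prints "Alloha"
        System.out.println(strip("a\\b/c:d"));   // input is a\b/c:d, prints "abcd"
    }
}
```

Both asterisks are removed in the first call, so no "global" modifier is needed on the Java side.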

Answer

Windows filename rules are tricky. You're only scratching the surface.

For example, here are some things that are not valid filenames, in addition to the characters you listed:

                                    (yes, that's an empty string)
.
.a
a.
 a                                  (that's a leading space)
a                                   (or a trailing space)
com1
prn.txt
[anything over 240 characters]
[any control characters]
[any non-ASCII characters that don't fit in the system codepage,
 if the filesystem is FAT32]

Removing special characters with a single regex substitution like String.replaceAll() isn't enough; you can easily end up with something invalid, like an empty string or a trailing ‘.’ or ‘ ’. Replacing “[^A-Za-z0-9_.]+” with ‘_’ would be a better first step (note the “+”: with “*” the pattern also matches the empty string, so an underscore would be inserted around every character). But you will still need higher-level processing on whatever platform you're using.
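
To make the "higher-level processing" a bit more concrete, here is a rough sketch of what it might look like in Java. Everything in it is an assumption for illustration rather than a complete or authoritative rule set: the class name FileNameSanitizer, the method sanitize, and the MAX_LENGTH value of 200 are made up, and the reserved-name list covers only the classic device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9). It does not try to handle the codepage issue on FAT32.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FileNameSanitizer {
    // Classic reserved device names; compared against the part before the first dot.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    // Illustrative cap chosen for this sketch, not an official Windows limit.
    private static final int MAX_LENGTH = 200;

    static String sanitize(String input) {
        // Whitelist step: each run of other characters (including spaces and
        // control characters) collapses into a single underscore.
        String name = input.replaceAll("[^A-Za-z0-9_.]+", "_");

        // Keep the name comfortably short.
        if (name.length() > MAX_LENGTH) {
            name = name.substring(0, MAX_LENGTH);
        }

        // Drop leading and trailing dots, so ".", ".a" and "a." become safe.
        name = name.replaceAll("^\\.+|\\.+$", "");

        // Prefix reserved device names such as "prn.txt" with an underscore.
        int dot = name.indexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        if (RESERVED.contains(base.toUpperCase(Locale.ROOT))) {
            name = "_" + name;
        }

        // Never return an empty name.
        return name.isEmpty() ? "_" : name;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Allo**ha"));  // prints "Allo_ha"
        System.out.println(sanitize("prn.txt"));   // prints "_prn.txt"
        System.out.println(sanitize("."));         // prints "_"
    }
}
```

A whitelist like this is deliberately lossy: distinct inputs can map to the same output, so a caller that needs unique names would still have to check for collisions.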
