Avión Avión - 28 days ago 12
Java Question

Sanitizing strings with filenames and extension in Java

I have these four types of file names:


  1. Filename with double extension

  2. Filename with no extension

  3. Filename with dot at the end, and no extension

  4. Filename with a proper name.



Like this:

String doubleexsension = "doubleexsension.pdf.pdf";
String noextension = "noextension";
String nameWithDot = "nameWithDot.";
String properName = "properName.pdf";

String extension = "pdf";


My aim is to sanitize all of these types and output only a proper filename.filetype. I wrote a small throwaway script to illustrate the problem:

ArrayList<String> app = new ArrayList<String>();
app.add(doubleexsension);
app.add(properName);
app.add(noextension);
app.add(nameWithDot);

System.out.println("------------");

for (String i : app) {

    // Ends with "." : append the extension
    if (i.endsWith(".")) {
        String m = i + extension;
        System.out.println(m);
        continue; // move on to the next name; "break" would stop the whole loop
    }

    // Double extension: collapse a repeated ".ext" at the end into one
    String p = i.replaceAll("(\\.\\w+)\\1+$", "$1");
    System.out.println(p);
}


This outputs:

------------
doubleexsension.pdf
properName.pdf
noextension
nameWithDot.pdf


I don't know how to handle the noextension case. When there is no extension, the value of extension should be appended to the end of the string.

My desired output would be:

------------
doubleexsension.pdf
properName.pdf
noextension.pdf
nameWithDot.pdf


Thanks in advance.

Answer

You may add alternatives to the regex to match all kinds of scenarios:

(?:(\.\w+)\1*|\.|([^.]))$

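And replace with $2.pdf. Group 2 is empty unless the last alternative matched, so the final character is put back only in the no-extension case. See the regex demo.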

Details:

  • (?: - start of a non-capturing group; the $ end-of-string anchor after it applies to all the alternatives inside (each must match at the end of the string)
    • (\.\w+)\1* - an extension, duplicated or not (a dot and 1+ word chars captured into Group 1, followed by zero or more repetitions of that same text)
    • | - or
    • \. - a trailing dot
    • | - or
    • ([^.]) - any char that is not a dot, captured into Group 2
  • ) - end of the outer grouping
  • $ - end of string.
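
To see which alternative fires for each kind of name, here is a minimal sketch that inspects the capture groups after a match. The class name AlternativeDemo and the helper describe are made up for illustration; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternativeDemo {
    // Same pattern as in the answer: group 1 is the extension,
    // group 2 is the last character of a name that has no dot at the end.
    static final Pattern TAIL = Pattern.compile("(?:(\\.\\w+)\\1*|\\.|([^.]))$");

    // Hypothetical helper: reports which alternative matched at the end of the name.
    static String describe(String name) {
        Matcher m = TAIL.matcher(name);
        if (!m.find()) {
            return "no match";                       // e.g. the empty string
        }
        if (m.group(1) != null) {
            return "extension " + m.group(1);        // first alternative
        }
        if (m.group(2) != null) {
            return "no extension, last char " + m.group(2); // third alternative
        }
        return "trailing dot";                       // second alternative
    }

    public static void main(String[] args) {
        String[] names = {"doubleexsension.pdf.pdf", "noextension", "nameWithDot.", "properName.pdf"};
        for (String name : names) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

When the first or second alternative matches, Group 2 did not participate, and Java substitutes an empty string for $2 in the replacement. That is why the same $2.pdf replacement works for all four cases.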

See Java demo:

import java.util.Arrays;
import java.util.List;

List<String> strs = Arrays.asList("doubleexsension.pdf.pdf","noextension","nameWithDot.","properName.pdf");
for (String str : strs)
    System.out.println(str.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", "$2.pdf"));
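
If the extension should come from a variable, as in the question, rather than being hard-coded into the replacement, one possible sketch is below. The class FileNameSanitizer and the method sanitize are hypothetical names, not part of the original answer. Matcher.quoteReplacement keeps any $ or \ in the extension from being read as a group reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;

public class FileNameSanitizer {
    // Hypothetical helper: makes the name end in exactly one "." + extension.
    static String sanitize(String name, String extension) {
        // "$2" restores the last character when the name had no extension;
        // the quoted part is inserted literally.
        String replacement = "$2" + Matcher.quoteReplacement("." + extension);
        return name.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", replacement);
    }

    public static void main(String[] args) {
        String extension = "pdf";
        List<String> names = Arrays.asList(
                "doubleexsension.pdf.pdf", "noextension", "nameWithDot.", "properName.pdf");
        for (String name : names) {
            System.out.println(sanitize(name, extension));
        }
    }
}
```

Two things to keep in mind with this approach: an existing extension is replaced, not preserved (so notes.txt becomes notes.pdf), and an empty string is returned unchanged because the pattern needs at least one character to match.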