Vova Vova - 4 months ago 37
Java Question

Hibernate @Pattern regexp validator for ASCII symbols

I need to validate that the user types only English text,
so it may contain Latin letters and some punctuation symbols.
For now I have written the following regex:

@NotEmpty
@Pattern(regexp = "^[ \\w \\d \\s \\. \\& \\+ \\- \\, \\! \\@ \\# \\$ \\% \\^ \\* \\( \\) \\; \\\\ \\/ \\| \\< \\> \\\" \\' \\? \\= \\: \\[ \\] ]*$")
private String str;


And it works fine.

But I am looking for a more elegant way: I want to validate that my string contains only ASCII characters. Can I do this with some special annotation or parameter, or do I need to write a custom validator? (If so, could you help me with an example?)

I want something like:

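// Charset.forName() returns a Charset; newEncoder() is needed to get a CharsetEncoder
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder(); // or "ISO-8859-1" for ISO Latin 1

boolean isValid(String input) {
    return asciiEncoder.canEncode(input);
}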

Answer

Option 1:

Java Strings are sequences of UTF-16 code units (char values), and the ASCII characters occupy the range 0 to 127. Any char with a value of 128 or above is therefore non-ASCII, so it is enough to check that every char is below 128:

str.chars().allMatch(c -> c < 128);
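A minimal runnable sketch of this check (the class name AsciiCharsCheck and the sample strings are hypothetical, chosen only for illustration). It also shows that characters outside the Basic Multilingual Plane, such as emoji, are rejected, because they are stored as a surrogate pair whose code units lie in the range 0xD800 to 0xDFFF:

```java
public class AsciiCharsCheck {

    // Returns true when every UTF-16 code unit in the string is below 128,
    // i.e. the string consists only of ASCII characters.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello, world!"));       // true
        System.out.println(isAscii("R\u00e9al"));           // false: e-acute is U+00E9
        // An emoji is stored as two surrogate code units, both >= 0xD800.
        System.out.println(isAscii("smile \uD83D\uDE00"));  // false
        System.out.println(isAscii(""));                    // true: allMatch on an empty stream
    }
}
```

Note that allMatch returns true for an empty string; in the question that case is already covered by @NotEmpty.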

Option 2: Regex

public class Main {
    public static void main(String[] args) {
        char nonAscii = 0x00FF;
        String asciiText = "Day";
        String nonAsciiText = "Night " + nonAscii;
        System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
        System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
    }
}
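The same \p{ASCII} class also works with Matcher.matches(), which must consume the entire input, so the \A and \z anchors become unnecessary. As far as I know, the Bean Validation @Pattern constraint likewise requires the whole value to match, so something like @Pattern(regexp = "\\p{ASCII}*") should be able to replace the long whitelist in the question; please verify this against the Hibernate Validator version you use. The sketch below (class and method names are hypothetical) contrasts \p{ASCII}, which accepts all 128 ASCII characters including control characters such as TAB, with \p{Print}, which accepts only printable ASCII (0x20 to 0x7E):

```java
import java.util.regex.Pattern;

public class AsciiRegexCheck {

    // \p{ASCII} is the POSIX class [\x00-\x7F]: all ASCII characters, control characters included.
    static final Pattern ASCII = Pattern.compile("\\p{ASCII}*");

    // \p{Print} is [\p{Graph}\x20]: printable ASCII only (0x20..0x7E),
    // so tabs, newlines and other control characters are rejected.
    static final Pattern PRINTABLE_ASCII = Pattern.compile("\\p{Print}*");

    // matches() must consume the whole input, so no ^...$ or \A...\z anchors are needed.
    static boolean isAscii(String s) {
        return ASCII.matcher(s).matches();
    }

    static boolean isPrintableAscii(String s) {
        return PRINTABLE_ASCII.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Day"));                  // true
        System.out.println(isAscii("Night \u00FF"));         // false
        System.out.println(isAscii("tab\there"));            // true: TAB (0x09) is ASCII
        System.out.println(isPrintableAscii("tab\there"));   // false: TAB is a control character
        System.out.println(isPrintableAscii("a{b}~c"));      // true
    }
}
```

Note that both classes also accept symbols such as ~, { and }, which the hand-written whitelist in the question does not.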

Option 3: with java.nio.charset.Charset

import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class StringUtils {

  // CharsetEncoder is stateful and not thread-safe (see its Javadoc),
  // so this shared instance must not be used from several threads at once.
  static CharsetEncoder asciiEncoder =
      StandardCharsets.US_ASCII.newEncoder();

  public static boolean isPureAscii(String v) {
    // canEncode() returns false if any character has no US-ASCII representation
    return asciiEncoder.canEncode(v);
  }

  public static void main(String[] args) {
     String test = "Réal";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     test = "Real";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
  }
}
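The CharsetEncoder Javadoc states that instances are not safe for use by multiple concurrent threads, while a validator is typically shared across request threads. A sketch of two thread-safe variants, assuming Java 8 or later (class and method names are hypothetical): one keeps a separate encoder per thread in a ThreadLocal, the other creates a fresh encoder on every call.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SafeAsciiCheck {

    // CharsetEncoder is stateful and not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<CharsetEncoder> ASCII_ENCODER =
            ThreadLocal.withInitial(StandardCharsets.US_ASCII::newEncoder);

    static boolean isPureAscii(String v) {
        return ASCII_ENCODER.get().canEncode(v);
    }

    // Simpler alternative: a fresh encoder per call (allocates on every check).
    static boolean isPureAsciiFreshEncoder(String v) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(v);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Real"));                  // true
        System.out.println(isPureAscii("R\u00e9al"));             // false
        System.out.println(isPureAsciiFreshEncoder("Real"));      // true
    }
}
```

The per-call version is the simplest; the ThreadLocal version avoids the allocation at the cost of keeping one encoder alive per thread.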

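Option 4: Using Guava (third-party library)

import com.google.common.base.CharMatcher;

boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);

(CharMatcher.ascii() requires Guava 19.0 or later; older versions expose the CharMatcher.ASCII constant instead.)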

References:

Option 1 quotes JeremyP & Julian Lettner from http://stackoverflow.com/a/3585791/1245478

Option 2 quotes Arne from http://stackoverflow.com/a/3585284/1245478

Option 3 quotes RealHowTo from http://stackoverflow.com/a/3585247/1245478

Option 4 quotes Colin D from http://stackoverflow.com/a/3585089/1245478