var accentedCharacters = "àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ";
// Build the full regex
var regex = "^[a-zA-Z" + accentedCharacters + "]+,\\s[a-zA-Z" + accentedCharacters + "]+$";
// Create a RegExp from the string version
var regexCompiled = new RegExp(regex);
// regexCompiled = /^[a-zA-ZàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]+,\s[a-zA-ZàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]+$/
var regex = /^.+,\s.+$/;
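A quick way to compare the two patterns above is to run them against a few sample names, for example in Node.js or a browser console. This is only a sketch. The sample names are made up for illustration and do not come from the question.

```javascript
// Character list copied from the question
const accentedCharacters = "àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ";

// Explicit list of accented letters on both sides of ", "
const accentedRegex = new RegExp(
  "^[a-zA-Z" + accentedCharacters + "]+,\\s[a-zA-Z" + accentedCharacters + "]+$"
);

// Any characters on both sides of ", "
const dotRegex = /^.+,\s.+$/;

// Hypothetical inputs, chosen to show where the two patterns disagree
const samples = ["Müller, Jürgen", "van der Berg, Anna", "Smith,John", "Łukasiewicz, Jan"];

for (const s of samples) {
  console.log(JSON.stringify(s), "list:", accentedRegex.test(s), "dot:", dotRegex.test(s));
}
```

With these inputs, only "Müller, Jürgen" passes the list-based pattern (the space in "van der Berg" and the Ł in "Łukasiewicz" are not in the list). The dot pattern accepts everything except "Smith,John", which has no whitespace after the comma.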
Which of these three approaches is most suited for the task?
Depends on the task :-) To match exactly all Latin characters and their accented versions, the Unicode ranges probably provide the best solution. They could be extended to all non-whitespace characters, which can be done with the \S character class.
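The question's Unicode-range version is not reproduced in this copy, so the ranges below are an assumption: U+00C0 to U+024F (Latin-1 Supplement letters plus Latin Extended-A and Extended-B), skipping × (U+00D7) and ÷ (U+00F7), which are not letters. The sketch puts that range-based pattern next to the broader \S variant.

```javascript
// Assumed ranges: ASCII letters, Latin-1 Supplement letters, Latin Extended-A/B.
// × (U+00D7) and ÷ (U+00F7) are deliberately left out.
const latinLetters = "a-zA-Z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u024F";
const rangeRegex = new RegExp("^[" + latinLetters + "]+,\\s[" + latinLetters + "]+$");

// Broader variant: any run of non-whitespace characters on either side of ", "
const nonSpaceRegex = /^\S+,\s\S+$/;

console.log(rangeRegex.test("Łukasiewicz, Jan"));      // true: Ł is U+0141
console.log(rangeRegex.test("O'Brien, Seán"));         // false: the apostrophe is not in the ranges
console.log(nonSpaceRegex.test("O'Brien, Seán"));      // true
console.log(nonSpaceRegex.test("van der Berg, Anna")); // false: space inside the last name
```

Note that \S also matches commas, so the second pattern accepts an input such as "a,b, c".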
I'm forcing a field in a UI to match the format:
last_name, first_name (last [comma space] first)
The most basic problem I see here is not diacritics but whitespace. Some names consist of multiple words, e.g. because of titles. So you should go with the most generic option, which allows everything except the comma that separates the last name from the first name.
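The pattern itself is not shown in this copy of the answer. A regex that follows its description (anything except a comma on each side of ", ") might look like the sketch below; treat it as a reconstruction, not necessarily the exact pattern that was posted.

```javascript
// Each name part: one or more characters that are not commas.
// Spaces, apostrophes, hyphens and accented letters are all allowed.
const nameRegex = /^[^,]+,\s[^,]+$/;

console.log(nameRegex.test("van der Berg, Anna Maria"));     // true
console.log(nameRegex.test("O'Brien-Ó Súilleabháin, Seán")); // true
console.log(nameRegex.test("Smith, John, Jr."));             // false: second comma
console.log(nameRegex.test("Smith,John"));                   // false: no whitespace after the comma
```

Because [^,] matches any non-comma character, it also accepts extra spaces, as in "Smith,  John"; trimming or collapsing whitespace before validating may be worth considering.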
But your solution with the . character class is just as fine; you may only need to take care of multiple commas then.
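The multiple-comma caveat comes from .+ being greedy and matching commas itself. A short sketch with a made-up input shows that such a value is accepted and that, if the two parts are captured, the split lands on the last ", ".

```javascript
const dotRegex = /^.+,\s.+$/;
// Same pattern with capture groups for the last and first name
const dotWithGroups = /^(.+),\s(.+)$/;

console.log(dotRegex.test("Smith, John, Jr.")); // true: the extra comma is accepted

// The greedy first group backtracks only as far as the last ", ",
// so it swallows the earlier comma.
const m = "Smith, John, Jr.".match(dotWithGroups);
console.log(m[1] + " | " + m[2]); // Smith, John | Jr.
```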