Sklivvz - 2 months ago
C# Question

Regular expression for validating names and surnames?

Although this seems like a trivial question, I am quite sure it is not :)

I need to validate names and surnames of people from all over the world. How can I do that with a regular expression? If it were only English ones I think that this would cut it:

^[a-z '-]+$

However, I also need to support these cases:

  • other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)

  • different Unicode letter sets (accented letters, Greek, Japanese, Chinese, and so on)

  • no numbers or symbols or unnecessary punctuation or runes, etc.

Is there a standard way of validating these fields I can implement to make sure that our website visitors have a great experience and can actually use their name when registering?

I would be looking for something similar to the many "email address" regexes that you can find on Google.

For the sake of clarity, I don't need one single regex for the "whole" name. I would expect users to be able to split their name into the two main constituents according to their customs, and not to use suffixes and titles -- which could be contained in other fields if need be.

The main purpose of the question is to validate against XSS and SQL-injection (yes, I already use stored procedures, but I need to future- and idiot-proof the data).

The way any XSS filter should work is by allowing only what is strictly necessary -- not by disallowing known XSS vectors (e.g. disallowing "script", "<", etc.). To get an idea of the incredible variety of attacks that can be used, take a look here:

Sorry for not mentioning this before, and thus making the question a bit more mysterious, but I didn't want to read 30 answers that boil down to "disallow the < or > and you are safe!".

See here for a good starting point on Unicode character classes in C# Regexes -- which of these are strictly necessary for writing a name? I honestly have no idea which, but possibly the collective mind of Stack Overflow can help?
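As a rough way to explore that question, the snippet below reports the Unicode general category of each character in a string, so candidate names can be inspected before deciding which classes to allow. The class name `CategoryProbe` and its method are made up for illustration; `CharUnicodeInfo.GetUnicodeCategory` is the framework call doing the work. It looks at one UTF-16 code unit at a time, so treat it as a sketch for characters in the Basic Multilingual Plane only.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of any library.
public static class CategoryProbe
{
    // Returns one Unicode general category per UTF-16 code unit of the input.
    public static UnicodeCategory[] Categories(string text)
    {
        var result = new UnicodeCategory[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            result[i] = CharUnicodeInfo.GetUnicodeCategory(text[i]);
        }
        return result;
    }
}
```

For example, `CategoryProbe.Categories("O'Brien")[1]` is `OtherPunctuation` for the apostrophe, a hyphen comes back as `DashPunctuation`, a space as `SpaceSeparator`, and a digit such as 8 as `DecimalDigitNumber`.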

(I am prepared to force people like Jennifer 8 Lee to write their name in letters ;-)

So, I did "bother" to do it myself, because I think nobody else even tried. Guess what? Apparently I did find a proper answer, posted below! It wasn't that hard.

Can you help me find a valid, existing name or an XSS vector that can break that validation?


I'll try to give a proper answer myself:

The only punctuation marks that should be allowed in a name are full stop, apostrophe and hyphen. I haven't seen any other case in the list of corner cases.

Regarding numbers, there's only one case with an 8. I think I can safely disallow that.

Regarding letters, any letter is valid.

I also want to include space.

This would sum up to this regex:

^[\p{L} \.'\-]+$

This presents one problem, i.e. the apostrophe can be used as an attack vector. It should be encoded.

So the validation code should be something like this (untested):

var name = nameParam.Trim();
if (!Regex.IsMatch(name, @"^[\p{L} \.'\-]+$"))
    throw new ArgumentException("nameParam");
name = name.Replace("'", "&#39;");  //&apos; does not work in IE
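
As an alternative to the hand-written Replace, here is a minimal sketch that delegates the encoding to `System.Net.WebUtility.HtmlEncode`, which also covers `<`, `>`, `&` and `"`. `NameEncoder` and `EncodeForHtml` are hypothetical names, and the sketch assumes the name has already passed the regex check. Be aware that this call may also turn some non-ASCII characters into numeric entities, which may or may not be what you want for stored data.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class NameEncoder
{
    // HTML-encodes an already validated name; the apostrophe becomes &#39;.
    public static string EncodeForHtml(string validatedName)
    {
        if (validatedName == null)
        {
            throw new ArgumentNullException(nameof(validatedName));
        }
        return WebUtility.HtmlEncode(validatedName);
    }
}
```

With this, `NameEncoder.EncodeForHtml("O'Brien")` gives `O&#39;Brien`, the same result as the manual Replace for that input.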

Can anyone think of a reason why a name should not pass this test, or of an XSS or SQL injection attack that could pass?

Complete tested solution:

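using System;
using System.Text.RegularExpressions;

namespace test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var names = new string[]{"Hello World",
                "علاء الدين",
                "' --"};
            foreach (var nameParam in names)
            {
                Console.Write(nameParam + " ");
                var name = nameParam.Trim();
                // Reject anything other than letters, combining marks, apostrophe, space, full stop, hyphen.
                if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' \.\-]+$"))
                {
                    Console.WriteLine("fail");  // assumed handling; the original branch body is missing
                    continue;
                }
                name = name.Replace("'", "&#39;");  // encode the apostrophe
                Console.WriteLine(name);
            }
        }
    }
}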
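
The `\p{M}` class in the tested pattern matters for names typed in decomposed form, where an accent is a separate combining character rather than part of the letter. Here is a small sketch comparing the two patterns from this answer; the class and field names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder for the two patterns discussed in this answer.
public static class NameCheck
{
    // First attempt: letters, space, full stop, apostrophe, hyphen.
    public static readonly Regex LettersOnly =
        new Regex(@"^[\p{L} \.'\-]+$");

    // Tested solution: the same set plus combining marks.
    public static readonly Regex LettersAndMarks =
        new Regex(@"^[\p{L}\p{M}' \.\-]+$");
}
```

`NameCheck.LettersOnly.IsMatch("Jose\u0301")` is false, because U+0301 (combining acute accent) is a mark rather than a letter, while `NameCheck.LettersAndMarks.IsMatch("Jose\u0301")` is true. The precomposed form `"Jos\u00E9"` passes both.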