Martin Sommervold Martin Sommervold - 5 months ago 19
SQL Question

SQL Server find uncommon names in user registry

In a database containing several hundreds of thousands of names (separated on firstnames and lastnames), is there a way to perform analysis to identify uncommon names? The purposes is to produce test data that represents a the full width of ways names can be written.

I guess there's many ways to define uncommon, so what I'm looking for here are both ideas for what to look for, and how to find them. For instance, have people included their title in their name? Non-ascii characters? Leading whitespace? Do people have several first names? Last names? etc.

Answer

You can use the below (informal) query and keep extend/Customize it based your need,

SELECT  *
FROM    ( SELECT    'LeadingWhiteSpaces' AS Flaw ,
                    *
          FROM      Emp
          WHERE     FirstName <> LTRIM(RTRIM(FirstName))
          UNION
          SELECT    'Non-ASCII' AS Flaw ,
                    *
          FROM      Emp
          WHERE     FirstName LIKE '%[^a-z ,-.^0-9]%'
          UNION
          SELECT    'TitlesInName' AS Flaw ,
                    *
          FROM      Emp
          WHERE     LTRIM(RTRIM(FirstName)) LIKE 'MR.%'
                    OR LTRIM(RTRIM(FirstName)) LIKE 'MRs.%'
                    OR LTRIM(RTRIM(FirstName)) LIKE 'Jr.%'
          UNION
          SELECT    'SeveralFirstNames' AS Flaw ,
                    *
          FROM      Emp
          WHERE     LTRIM(RTRIM(FirstName)) LIKE '% %'
        ) A
ORDER BY Flaw