Imran Ali Khan Imran Ali Khan - 17 days ago 8
C# Question

String Functions in SQL instead of C#

I have 2000000 sql records having names of customers. i want to changes as follows.

Sample Data :

Name
अमरीनaa मोयोद्दिनa शेखa
रऊफa वाहेदaa शेखa
शहेदाबेगमaa रऊफaa शेखa
इम्रानa रउफ़aa शेखa
दत्तुaa
कैलास धुमाळ
विलास दत्तु धुमाळ
बिस्मिल्ला बी अ.हमीद खॉन
इस्माईल खॉन अ.हमीद खॉन
नसरीन बेगम इस्माईल खान
अ.हमिद खॉन इमाम खॉन
अजमेर खॉ इमाम खॉ
सुग्रा बी अजमेर खॉ
हनीफाबी अजमेर
गौस अजमेर खान
यासीन अजमेर खान
राबिया बी मुश्ताक अली शेख
मो.सिंकंदर अली अन्वरअली शेख
गफार मोयीन शेख
नंदाबाई अशोक
सचिन आशोक दिवेकर
सोनाजी नामदेव बोराडे
व्दारका राजू गायकवाड
लिलाबाई सोनाजी बोराडे
शारदाबाई राजू जगदाळे
अनिता अर्जुन जगदाळे
मंदा सुनील वाढेकर
विठ्ठल दगडू
सुनिल विश्वनाथ वाढेकर
शिवाजी विश्वनाथ
गयाबाई शिवाजी
बाळू विश्वनाथ
वैशाली बाळू वाढेकर
पांडुरंग नामदेब वाघ नामदेव
हिराबाई पांडुरंग बाघ पांडुरंग
सवीता संतोष किर्तीकर
चंद्रकला प्रल्हाद
संतोष प्रल्हाद
अनिल प्रल्हाद
विजय प्रल्हाद किर्तीकर
राजेंद्र काशिनाथ
हिराबाई राजेंद्र
सुरेश पैठणे
नुतन सुरेश
गौतम पैठणे
शारदा गौतम पैठणे गौतम
राजू अंबादास
शोभाबाई राजू
सुनिता गोटीराम गायकवाड
बाळकृष्ण भानुदास दुलग


with above data having names with First, Middle and Last Names, I wants :


  1. In every words of names aa should be replace with single a which is any where in name ( अमरीनaa will be अमरीन and रऊफaa will be रऊफ)

  2. In every words of names a should be removed which is in in last in word ( शेखa will be शेख and मोयोद्दिनa will be मोयोद्दिन)

  3. If names have more then 2 words then last word will be in fisrt (विलास दत्तु धुमाळ will be धुमाळ विलास दत्तु)

  4. If names have less then 3 words then it will be same as it is कैलास धुमाळ will कैलास धुमाळ and दत्तुaa will be दत्तुaa)

  5. All Words staring and ending blank spaces should be removed.



for Above requirement i am using C# hard code to full fill but its takes 5 to 8 hours to complete, i want this should be done in sql side.

here my C# code :

int _pcount = 0;
string _qr = "";
string _Name = "";
string _FinalName = "";
string _FNAME = "";
string _LastName = "";
string _MiddleName = "";
string _ID = "";
string[] _Split;
List<string> _a = new List<string>();
DataRowCollection _dr = _CDatabase._MGetDataRows("SELECT _ID, _FULLNAME FROM MYTABLE ORDER BY _FULLNAME"); // This is a function will execute a sql and return DataRowCollection
progressBar1.Maximum = _dr.Count + 1;
progressBar1.Value = 0;
using (SqlConnection con = new SqlConnection("MyConStr"))
{
con.Open();
using (SqlTransaction trans = con.BeginTransaction(IsolationLevel.ReadCommitted))
{
try
{
foreach (DataRow _row in _dr)
{
progressBar1.Value++;
_pcount++;
if (_pcount >= 100)
{
_pcount = 0;
Application.DoEvents();
}
_ID = _CConvert._MConvertToString(_row[0]);
_Name = _CConvert._MConvertToString(_row[1]);

_Split = _Name.Split(' ');
_a = _Split.ToList();
_a.Remove(" ");
_a.Remove(" ");
_a.Remove(" ");
_a.Remove(" ");
_a.Remove(" ");
_a.Remove(" ");
switch (_a.Count)
{
case 0:
{
_FNAME = _FinalName = _Name;
_FNAME = _Name;
_LastName = "";
_MiddleName = "";
break;
}
case 1:
{
_FNAME = _FinalName = _Name;
_FNAME = "";
_LastName = _Name;
_MiddleName = "";
break;
}
case 2:
{
_FNAME = _FinalName = _Name;
_FNAME = _a[0];
_LastName = _FNAME;
_MiddleName = _a[1];
break;
}

case 3:
{
_FinalName = _a[2] + " " + _a[0] + " " + _a[1];
_FNAME = _a[0];
_MiddleName = _a[1];
_LastName = _a[2];
break;
}
case 4:
{ // nasreen begum ismail khan
_FinalName = _a[3] + " " + _a[0] + " " + _a[1] + " " + _a[2];
_FNAME = _a[0] + " " + _a[1]; // nasreen begum
_MiddleName = _a[2];// ismail
_LastName = _a[3];//khan
break;
}

case 5:
{ // jaibunnisa begum gulam dastagir sahab syed
_FinalName = _a[4] + " " + _a[0] + " " + _a[1] + " " + _a[2] + " " + _a[3];
_FNAME = _a[0] + " " + _a[1]; // jaibunnisa begum
_MiddleName = _a[2] + " " + _a[3];// gulam gastagir
_LastName = _a[4];//syed
break;
}

case 6:
{ // jaibunnisa begum gulam dastagir syed
_FinalName = _a[5] + " " + _a[0] + " " + _a[1] + " " + _a[2] + " " + _a[3] + " " + _a[4];
_FNAME = _a[0] + " " + _a[1]; // jaibunnisa begum
_MiddleName = _a[2] + " " + _a[3] + " " + _a[4];// gulam gastagir
_LastName = _a[5];//syed
break;
}
case 7:
{ // jaibunnisa begum gulam dastagir syed
_FinalName = _a[6] + " " + _a[0] + " " + _a[1] + " " + _a[2] + " " + _a[3] + " " + _a[4] + " " + _a[5];
_FNAME = _a[0] + " " + _a[1]; // jaibunnisa begum
_MiddleName = _a[2] + " " + _a[3] + " " + _a[4] + " " + _a[5];// gulam gastagir
_LastName = _a[6];//syed
break;
}
default:
{
_FinalName = _Name;
_FNAME = "";
_LastName = "";
_MiddleName = "";
break;
}
}

_qr = "UPDATE MYTABLE SET _FULLNAME = N'" + _FinalName + "' WHERE _ID = '" + _ID + "'";
_mExcute(_qr, con, trans);
}
trans.Commit();
con.Close();
trans.Dispose();
MessageBox.Show("DONE");
}
catch (Exception ex)
{
trans.Rollback();
con.Close();
_CShowMessageBox._MShowErrorMessageBox(ex.Message);
}
}
}

Answer

It is not clear for me if your strings processing can easily be implemented in SQL, but in order to be faster it must allow set based approach. Something like:

UPDATE MYTABLE SET _FULLNAME = dbo.getFullName(paramters)

The main cause for your code being slow is the huge number of UPDATE statements. If you use SQL Profiler, you will see how much activity you generate.

Before trying to convert to SQL, I would try the following:

1) Precompute your full name into a list of strings 2) Create a buffer table with a structure like the following (Buffer):

ID INT NOT NULL,
FullName NVARCHAR(255) NOT NULL

3) Use Bulk Insert to persist to Buffer. Bulk insert is much faster (dozens, if not hundreads of times faster) as it mimimizes round trips between your application and SQL Server.

4) Use a statement to update from your buffer to your final table

UPDATE Dest
SET T.FullName = B.FullName
FROM MYTABLE T
JOIN Buffer B ON B.ID = T.ID
Comments