Petey Petey - 3 months ago 10
C# Question

Validate 3 chunks of strings

I'm really struggling to get this to work.

I have a string mess which I need to split up into three chunks. I have done this properly, but now I'm stuck on how to validate the first two chunks (i_clob1 and i_clob2) for special characters at the end ("<", ">", "</"), because when I cut these three chunks, my stored procedure appends the three strings and it can look bad when the cut falls on those characters.

If the first (i_clob1) or second (i_clob2) chunk contains those characters at the end, I would like to somehow expand the string so that it doesn't end with them, or maybe trim it, without losing the characters. The characters would then go into the next string. The last chunk (i_clob3) doesn't need to be validated because it will always have the last bit of text.

I really hope someone has an answer to my riddle :-)

My happy scenario is when it cuts the string in some random text instead of at those characters.

My current code:

public void Enqueue(string queueName, string mess)
{
    if (mess.Length >= 3 && mess.Length <= 32000 * 3)
    {
        int lastStart = 2 * mess.Length / 3;
        int lastLength = mess.Length - lastStart;

        string i_clob1 = mess.Substring(0, (mess.Length / 3));
        string i_clob2 = mess.Substring(mess.Length / 3, mess.Length / 3);
        string i_clob3 = mess.Substring(lastStart, lastLength);

        OracleCommand cmd = null;
        try
        {
            cmd = new OracleCommand("", m_Connection)
            {
                CommandText = m_InSpName,
                CommandType = CommandType.StoredProcedure
            };

            //add Aq queue name
            OracleParameter qName = new OracleParameter("qname", OracleType.VarChar)
            {
                Direction = ParameterDirection.Input,
                Value = queueName
            };

            //add message to enqueue
            OracleParameter message1 = new OracleParameter("i_clob1", OracleType.Clob)
            {
                Direction = ParameterDirection.Input
            };
            OracleParameter message2 = new OracleParameter("i_clob2", OracleType.Clob)
            {
                Direction = ParameterDirection.Input
            };
            OracleParameter message3 = new OracleParameter("i_clob3", OracleType.Clob)
            {
                Direction = ParameterDirection.Input
            };

            i_clob1 = i_clob1.Replace("<?xml version=\"1.0\" encoding=\"utf-16\"?>", "");
            message1.Value = i_clob1;
            message2.Value = i_clob2;
            message3.Value = i_clob3;

            cmd.Parameters.Add(qName);
            cmd.Parameters.Add(message1);
            cmd.Parameters.Add(message2);
            cmd.Parameters.Add(message3);

            cmd.ExecuteNonQuery();
        }
        catch (Exception ex)
        {
            //rethrow exception and make sure we clean up i.e. execute finally below
            throw new Exception("An error has occurred trying to deliver to the queue", ex);
        }
        finally
        {
            if (cmd != null)
            {
                cmd.Dispose();
            }
        }
    }
}


This is an example of my input. Normally it will be about 30,000 characters in total.

<item>
<item_number>1231</item_number>
<item_title>Lorem ipsum dolor sit</item_title>
<item_pbl_code>Lorem ipsum dolor sit</item_pbl_code>
<item_dep_code>Lorem ipsum dolor sit</item_dep_code>
<item_off_code>Lorem ipsum dolor sit</item_off_code>
<item_digitized_timestamp>2013-11-04 09:07:56</item_digitized_timestamp>
<item_source_url>Loremadsa/adad1231/12312</item_source_url>
<item_cat_code>Lorem ipsum dolor sit</item_cat_code>
<item_ars_code>Lorem ipsum dolor sit</item_ars_code>
<item_ric_code>Lorem ipsum dolor sit</item_ric_code>
<item_rle_code>Lorem ipsum dolor sit</item_rle_code>
<item_code>Lorem ipsum dolor sit</item_code>
<subjects>
<sub_keyword />
</subjects>
<item_description1>A lot of text goes here</item_description1>
<item_description2>A lot of text goes here</item_description2>
<item_description3>A lot of text goes here</item_description3>
</item>


The description fields would normally be about 10,000 to 20,000 characters.

With the new solution, my output would be:

first chunk

<item>
<item_number>1231</item_number>
<item_title>Lorem ipsum dolor sit</item_title>
<item_pbl_code>Lorem ipsum dolor sit</item_pbl_code>
<item_dep_code>Lorem ipsum dolor sit</item_dep_code>
<item_off_code>Lorem ipsum dolor sit</item_off_code>
<item_digitized_timestamp>2013-11-04 09:07:56</item_digitized_timestamp>
<item_source_url>Loremadsa/adad


second chunk

1231/12312</item_source_url>
<item_cat_code>Lorem ipsum dolor sit</item_cat_code>
<item_ars_code>Lorem ipsum dolor sit</item_ars_code>
<item_ric_code>Lorem ipsum dolor sit</item_ric_code>
<item_rle_code>Lorem ipsum dolor sit</item_rle_code>
<item_code>Lorem ipsum dolor


third chunk

sit</item_code>
<subjects>
<sub_keyword />
</subjects>
<item_description1>A lot of text goes here</item_description1>
<item_description2>A lot of text goes here</item_description2>
<item_description3>A lot of text goes here</item_description3>
</item>

Answer

From your description (and your previous question), I understand that you want to split the message by taking both chunk size and chunk contents into account. Both restrictions cannot always be satisfied simultaneously, so you have to set a preference (e.g., if a chunk ends with a character that has to be "moved" to the next chunk, but that next chunk already has the maximum size, what should be done?). I understand that this preference is defined as follows:

  • You divide the chunks by size (30000) while always leaving a big enough "redundant bit" (the maximum size allowed is 32000).
  • The corrections by content (shown in the code below) are performed after the aforementioned division by size; thus it can safely be assumed that there will be no size limitation (corrections can be applied blindly because, even in the worst scenario, there will always be 2000 characters of spare length).
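As a rough illustration of the size-first step, here is a minimal sketch that splits the message into thirds and refuses input whose largest third would exceed the nominal 30000 characters, so that the later content corrections keep their spare room. The names ChunkSplitter, SplitInThirds, NominalChunkSize and MaxChunkSize are hypothetical and not part of your code. The second chunk's length is derived from the start offsets; the way it is computed in the question (mess.Length / 3 for both the start and the length) appears to skip one character whenever the length leaves a remainder of 2 when divided by 3.

```csharp
using System;

static class ChunkSplitter
{
    // Hypothetical constants taken from the two bullet points above.
    public const int NominalChunkSize = 30000; // size used for the plain division
    public const int MaxChunkSize = 32000;     // hard limit, leaves 2000 spare characters

    // Splits mess into three chunks by size only (no content checks).
    public static string[] SplitInThirds(string mess)
    {
        if (mess == null) throw new ArgumentNullException(nameof(mess));

        int start2 = mess.Length / 3;
        int start3 = 2 * mess.Length / 3;

        // The third chunk is the largest one (the length divided by 3, rounded up).
        if (mess.Length - start3 > NominalChunkSize)
            throw new ArgumentException("Message is too long for three chunks.", nameof(mess));

        return new[]
        {
            mess.Substring(0, start2),
            mess.Substring(start2, start3 - start2), // length from offsets, so nothing is skipped
            mess.Substring(start3)
        };
    }
}
```

Concatenating the three returned strings should give back the original message, which is an easy sanity check to run before calling the stored procedure.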

Code performing the corrections as described above:

string i_clob1 = "anything1 </";
string i_clob2 = "anything2 </";
string i_clob3 = "anything3 </";

string allTogether = i_clob1 + i_clob2 + i_clob3;
int start2 = i_clob1.Length;
int length2 = i_clob2.Length;
int start3 = start2 + length2;

string[] bitsToAvoid = new string[] { "</", "<", ">"};
string i_clob1_out = i_clob1;
foreach (string bit in bitsToAvoid)
{
    if (i_clob1_out.EndsWith(bit, StringComparison.Ordinal)) //EndsWith does not throw if the chunk is shorter than bit
    {
        start2 = start2 - bit.Length;
        length2 = length2 + bit.Length;
        i_clob1_out = allTogether.Substring(0, start2);
        break; //Just one wrong bit is assumed to be present
    }
}
string i_clob2_out = allTogether.Substring(start2, length2);
foreach (string bit in bitsToAvoid)
{
    if (i_clob2_out.EndsWith(bit, StringComparison.Ordinal)) //EndsWith does not throw if the chunk is shorter than bit
    {
        start3 = start3 - bit.Length;
        i_clob2_out = allTogether.Substring(start2, start3 - start2);
        break; //Just one wrong bit is assumed to be present
    }
}
string i_clob3_out = allTogether.Substring(start3);


i_clob1 = i_clob1_out; //"anything1 "
i_clob2 = i_clob2_out; //"</anything2 "
i_clob3 = i_clob3_out; //"</anything3 </"

NOTE: This answer is relatively adaptable (bitsToAvoid can be extended with as many elements as required), although within certain limits: it was written after reading the original description of the problem, which mentioned only a short list of characters. If you intend to define the "bits to avoid" in a more complex way (e.g., by checking whether a given closing tag is present), this algorithm should be taken as a mere starting point and would have to be improved appreciably (most likely by adding some XML analysis).
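If the "just one wrong bit" assumption turns out to be too optimistic (e.g., a chunk ending in "><"), one possible variation is to keep moving the cut position to the left until the chunk no longer ends with any of the markers. The following is only a sketch under the same assumptions as the answer (three chunks, enough spare length in the next chunk); ChunkBoundaryFixer, MoveCutLeft and Fix are hypothetical names, and it still does no real XML analysis.

```csharp
using System;

static class ChunkBoundaryFixer
{
    // Same markers as bitsToAvoid above, longest first.
    static readonly string[] BitsToAvoid = { "</", "<", ">" };

    // Moves the cut position left until text.Substring(0, cut) no longer ends
    // with one of the markers; the cut never goes below minCut.
    static int MoveCutLeft(string text, int cut, int minCut)
    {
        bool moved = true;
        while (moved && cut > minCut)
        {
            moved = false;
            foreach (string bit in BitsToAvoid)
            {
                int bitStart = cut - bit.Length;
                if (bitStart >= minCut &&
                    string.CompareOrdinal(text, bitStart, bit, 0, bit.Length) == 0)
                {
                    cut = bitStart; // the marker now belongs to the next chunk
                    moved = true;
                    break;
                }
            }
        }
        return cut;
    }

    // Re-cuts the three chunks so that the first two do not end with a marker.
    // Characters removed from one chunk are moved to the start of the next one,
    // so the next chunk grows (this relies on the spare length mentioned above).
    public static string[] Fix(string clob1, string clob2, string clob3)
    {
        string all = clob1 + clob2 + clob3;
        int cut1 = MoveCutLeft(all, clob1.Length, 0);
        int cut2 = MoveCutLeft(all, clob1.Length + clob2.Length, cut1);
        return new[]
        {
            all.Substring(0, cut1),
            all.Substring(cut1, cut2 - cut1),
            all.Substring(cut2) // the last chunk is not validated
        };
    }
}
```

With the sample values from the answer, Fix("anything1 </", "anything2 </", "anything3 </") gives the same three strings as the code above; the difference only shows up when several markers sit next to each other at a cut.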