MolC - 3 months ago
C# Question

Strip punctuation marks and other symbols from textbox input, join split words and remove extra white-space

I'm new to C# and need to figure out how to keep punctuation marks and other symbols entered in a textbox from being written to a text document. Only letters, digits and single white-spaces should be allowed. If a punctuation mark sits between two characters with no white-space around it, I want to remove it and join the two parts. If one or more marks sit between two characters together with one or more white-spaces, or if there are just several white-spaces, I want to remove the mark(s) and keep a single white-space between the words. Everything should be written in lower case.

As I understand it, I need the following, but I have no idea how to do it:


  1. Remove everything except letters and digits.

  2. Keep only one white-space between words and remove all extra white-spaces.

  3. Join the parts of a word if a mark sat between them without white-space.

  4. Write everything in lowercase.

  5. Take into account that I already use String.TrimEnd() to remove white-space at the end of each new line before writing it to the file.



For example, if the textbox content is:

Hello, off world 1!...


Or with several white-spaces:

hello off world? 1 --...


Or with several white-spaces that have marks between them, or with marks attached to the beginning or end of a word:

*_Hello , OFF,%)- >world? 1


Or with marks between the characters of a word:

H,e.*l+l-o .of!f wo/rld? 1


As a result, I want to get only this in the text document:

hello off world 1

Answer

I assume you are checking text in a multiline TextBox control on-the-fly, i.e. as the user types or pastes text into it.

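To remove unacceptable characters from the text box, you can call Regex.Replace() from a handler for the TextBox.TextChanged event. The pattern keeps each \r\n line break (group 1) and, from every run of unwanted characters, at most one white-space character (group 2); everything else in the run is dropped: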

textBox1.Text = Regex.Replace(
    textBox1.Text,
    @"(\r\n)|[^a-z0-9\s]*([^\S\r\n])?[^a-z0-9\r\n]*",
    "$1$2",
    RegexOptions.IgnoreCase
).ToLower();
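
The pattern can also be checked outside the form against the sample inputs from the question. The following is only a minimal console sketch: the class name InputCleaner and the method name Clean are hypothetical, TrimEnd() stands in for the trimming the question says happens before a line is written to the file, and ToLowerInvariant() is used instead of ToLower() so that the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
static class InputCleaner
{
    // Same pattern as in the answer: group 1 keeps "\r\n", group 2 keeps
    // at most one white-space character from each run of unwanted characters.
    const string Pattern = @"(\r\n)|[^a-z0-9\s]*([^\S\r\n])?[^a-z0-9\r\n]*";

    public static string Clean(string input)
    {
        return Regex.Replace(input, Pattern, "$1$2", RegexOptions.IgnoreCase)
                    .ToLowerInvariant() // assumption: culture-independent lower-casing
                    .TrimEnd();         // mirrors the TrimEnd() mentioned in the question
    }

    static void Main()
    {
        string[] samples =
        {
            "Hello, off world 1!...",
            "hello    off  world?   1 --...",
            "*_Hello , OFF,%)- >world? 1",
            "H,e.*l+l-o .of!f wo/rld? 1"
        };

        // Brackets make any leftover leading or trailing white-space visible.
        foreach (string s in samples)
            Console.WriteLine("[" + Clean(s) + "]");
    }
}
```

Each of the four samples should print [hello off world 1].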

However, assigning to Text resets the caret position, which would confuse the user quite a lot.

To calculate the new caret position, we can use the Regex.Replace() overload that takes a MatchEvaluator parameter. The function passed in this parameter can inspect each match's position and length and adjust the caret position accordingly:

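// Requires: using System.Text.RegularExpressions;
bool updating = false; // guards against re-entry while Text is reassigned
private void textBox1_TextChanged(object sender, EventArgs e)
{
    if (updating)
        return;
    updating = true;

    int caretPos = textBox1.SelectionStart;
    int caretPosShift = 0; // number of characters removed before the caret
    textBox1.Text = Regex.Replace(
        textBox1.Text,
        @"(\r\n)|[^a-z0-9\s]*([^\S\r\n])?[^a-z0-9\r\n]*",
        (m) => {
            string replacement = m.Groups[1].Value + m.Groups[2].Value;
            if (caretPos > m.Index + m.Value.Length)
                // The match lies entirely before the caret.
                caretPosShift += m.Value.Length - replacement.Length;
            else if (caretPos >= m.Index)
                // The caret is inside the match: place it no later than the end
                // of the replacement. Math.Max keeps the shift from going negative,
                // which would push the caret past a kept space or line break.
                caretPosShift += Math.Max(0, caretPos - m.Index - replacement.Length);
            return replacement;
        },
        RegexOptions.IgnoreCase
    ).ToLower();
    textBox1.SelectionStart = caretPos - caretPosShift;

    updating = false;
}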
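
The caret arithmetic is easier to try out when it is separated from the control. The sketch below assumes a hypothetical static class CaretAwareCleaner with a method Clean that takes the text and the caret index and returns the cleaned text together with the adjusted index. It follows the same idea as the handler, with the offset inside a match clamped at zero so that the caret never moves forward, and it uses ToLowerInvariant(), which is an assumption of this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it has no WinForms dependency.
static class CaretAwareCleaner
{
    const string Pattern = @"(\r\n)|[^a-z0-9\s]*([^\S\r\n])?[^a-z0-9\r\n]*";

    // Returns the cleaned text; newCaret receives the adjusted caret index.
    public static string Clean(string text, int caretPos, out int newCaret)
    {
        int shift = 0; // characters removed before the caret
        string cleaned = Regex.Replace(
            text,
            Pattern,
            m =>
            {
                string replacement = m.Groups[1].Value + m.Groups[2].Value;
                if (caretPos > m.Index + m.Length)
                    // Match ends before the caret: count everything it dropped.
                    shift += m.Length - replacement.Length;
                else if (caretPos >= m.Index)
                    // Caret inside the match: never shift by a negative amount.
                    shift += Math.Max(0, caretPos - m.Index - replacement.Length);
                return replacement;
            },
            RegexOptions.IgnoreCase).ToLowerInvariant();

        newCaret = caretPos - shift;
        return cleaned;
    }

    static void Main()
    {
        int caret;
        // The user has just typed "!" after "hel", so the caret is at index 4.
        string cleaned = Clean("hel!lo", 4, out caret);
        Console.WriteLine(cleaned + " " + caret); // prints "hello 3"
    }
}
```

In a handler, the returned text would be assigned to textBox1.Text and the adjusted index to textBox1.SelectionStart.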