captainduh captainduh - 1 month ago 8
HTML Question

JavaScript Regular Expression for removing text

I want to replace a string of characters in an HTML tag using JavaScript. In this example I want to remove everything between <table and <tbody>. I'm using the replace function and a regular expression, but the regular expression must be constructed incorrectly somewhere. Here is what I currently have:

str = str.replace(/([<table]\w*\W*[<tbody>])/, "");


The regular expression logic as I see it is like this (correct me where I'm wrong):


  1. I'm looking for a match of the string <table, so I put that string in the brackets because I want it to match exactly as written.

  2. Then I place \w*\W* because I expect 1 or more of both alphanumeric and non-alphanumeric characters to follow.

  3. Finally I place <tbody> in the brackets because I expect that format exactly.



The results are not what I expected. There is no other <tbody> or <table in my string, so I don't know what I'm doing wrong.

This is what the string looks like before I replace the characters with nothing.

"\n\t\t\t\t\t\t\n <div>\n\t\t\t\t\t\t\t
<table id=\"gvStation_ctl19_gvExtRows\" style=\"border-collapse: collapse;\" border=\"1\" rules=\"all\" cellspacing=\"0\">
\n\t\t\t\t\t\t\t\t<tbody>

Answer
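  1. Square brackets define a character class, which matches any single one of the characters listed inside it, in any order, so you don't need them in this case: [<table] matches one character out of <, t, a, b, l, e, not the string <table. See http://www.w3schools.com/jsref/jsref_obj_regexp.asp.
  2. \w*\W* matches only one run of word characters followed by one run of non-word characters. \W does match whitespace, but the text between <table and <tbody> switches between word and non-word characters many times, so the pattern cannot span it.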
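
To make the character-class behaviour concrete, here is a small sketch that runs the pattern from the question against a shortened sample. The sample string and the variable names are assumptions made for illustration, not the asker's real data.

```javascript
// The pattern from the question, unchanged.
var original = /([<table]\w*\W*[<tbody>])/;

// [<table] is a character class: it matches exactly ONE character from the set.
var classMatchesSingleChar = /^[<table]$/.test("t");      // true
var classMatchesWholeWord = /^[<table]$/.test("<table");  // false

// Hypothetical sample, shortened from the string in the question.
var sample = '<div>\n<table id="x" border="1">\n<tbody>';

// The match starts at the "<" of <div> and ends at the "t" of <table.
var firstMatch = sample.match(original)[0];
console.log(JSON.stringify(firstMatch)); // "<div>\n<t"
```

So on this sample, replace() removes "<div>\n<t" and leaves the rest of the table tag in place.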

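Here is the solution: /<\s*table(?:.|\s)*<\s*tbody\s*>/i

The group (?:.|\s)* matches any run of characters, including line breaks, and the i flag makes the match case-insensitive. Note that the match includes the <table and <tbody> text itself, so both are removed as well.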

var str = '"\n\t\t\t\t\t\t\n <div>\n\t\t\t\t\t\t\t <table id=\"gvStation_ctl19_gvExtRows\" style=\"border-collapse: collapse;\" border=\"1\" rules=\"all\" cellspacing=\"0\"> \n\t\t\t\t\t\t\t\t<tbody>';

str = str.replace(/<\s*table(?:.|\s)*<\s*tbody\s*>/i, "");

alert(str);
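
A possible variation, offered as a sketch rather than as part of the answer above: [\s\S] is a common way to write "any character, including line breaks", and the lazy quantifier *? stops at the first <tbody> instead of the last one, which matters if the string ever contains more than one. The function name removeTableToTbody and the sample input are assumptions made for illustration.

```javascript
// Hypothetical helper; the name is an assumption for illustration.
function removeTableToTbody(html) {
  // [\s\S] matches any character, including line breaks.
  // *? is lazy, so the match ends at the first <tbody> after <table.
  return html.replace(/<\s*table[\s\S]*?<\s*tbody\s*>/i, "");
}

// Sample input modelled on the string in the question, with a row added.
var input = '<div>\n\t<table id="gvStation_ctl19_gvExtRows" border="1">\n\t\t<tbody><tr></tr></tbody>';
var output = removeTableToTbody(input);
console.log(JSON.stringify(output)); // "<div>\n\t<tr></tr></tbody>"
```

For anything beyond a one-off string cleanup, parsing the markup with the DOM is usually more robust than a regular expression.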
