rashadb - 2 years ago
HTML Question

With JavaScript, how do you use a regular expression to remove one tag from an HTML string that has multiple tags?

I have a string of HTML that I want to deploy without the <img /> tag.

What I currently have is:

// Template literal (backticks) so the double quotes inside the HTML and the
// line breaks don't end the string early.
var myHTML = `<p><img class="alignnone size-full wp-image-2857"
src="https://files.wordpress.com/2016/05/laptop.jpg?w=750&#038;h=545"
alt="https://pixabay.com/en/laptop-printer-office-folder-graph-1016257/"
width="750" height="545" /></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span
style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>
DONE</p> `;


What I think it should look like:

var myHTML2 = `<p></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span
style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>
DONE</p> `;


What I tried:

myHTML.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi,'')


But this strips every tag except <p> and </p> from the string (the <strong>, <span>, and <em> tags go too), and I only want to remove the <img /> tag.
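To see why the attempt removes too much: its negative lookahead (?!\s*\/?\s*p\b) only rejects opening and closing p tags, so every other tag, including strong, span, and em, matches [^>]*> and is replaced. A minimal sketch, using a shortened, hypothetical sample string rather than the full HTML above:

```javascript
// Shortened, hypothetical sample modeled on the question's HTML.
const sample = '<p><img src="laptop.jpg" width="750" /></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p>';

// The attempt from the question: remove every tag that is not <p> or </p>.
const attempt = sample.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi, '');

// The <img> is gone, but so are <em> and </em>.
console.log(attempt); // "<p></p> <p>OTHER STUFF: DEMO STUFF</p>"
```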

Answer

Foreword

It's not advisable to use a regex to parse HTML because of all the obscure edge cases that can crop up, but it seems that you have some control over the HTML, so you should be able to avoid many of the edge cases the regex police cry about.

Description

<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>


Replace with: nothing
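In JavaScript, replacing with nothing means passing an empty string to String.prototype.replace. Because replace returns a new string instead of modifying the original, the result has to be assigned. The g flag makes it remove every img tag rather than only the first, and the i flag is an assumption (the pattern above is shown without flags) so that <IMG ...> is caught too. A minimal sketch with a hypothetical stripImgTags helper and a shortened sample:

```javascript
// Hypothetical helper: removes every <img ...> tag using the pattern above.
// g removes all matches; i (an assumption) also matches <IMG ...>.
function stripImgTags(html) {
  return html.replace(/<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>/gi, '');
}

// Shortened, hypothetical sample; note the "=" characters inside the src value.
const shortHTML = '<p><img class="alignnone" src="laptop.jpg?w=750&#038;h=545" width="750" height="545" /></p> <p>STUFF</p>';
const cleaned = stripImgTags(shortHTML);

console.log(cleaned); // "<p></p> <p>STUFF</p>"
```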

This regex will do the following:

  • match the entire img tag, including all of its attributes
  • avoid some of the edge cases that make dealing with HTML difficult, such as a > inside a quoted attribute value
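To illustrate the second point, here is a sketch comparing the pattern with a naive /<img[^>]*>/g on a hypothetical img tag whose alt text contains a > character. The naive version stops at the first >, leaving the tail of the tag behind; the quoted-value alternatives let the full pattern consume the whole attribute value.

```javascript
// Hypothetical input: the alt attribute contains a literal ">".
const tricky = '<p><img alt="a > b" src="x.jpg" /></p>';

const naive = /<img[^>]*>/g;
const robust = /<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>/g;

// The naive pattern ends its match at the ">" inside the alt value.
console.log(tricky.replace(naive, ''));  // '<p> b" src="x.jpg" /></p>'
// The full pattern treats ="a > b" as one unit and removes the whole tag.
console.log(tricky.replace(robust, '')); // '<p></p>'
```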

Examples

Live demo https://regex101.com/r/pG1oI7/1

Sample String

<p><img class="alignnone size-full wp-image-2857" 
src="https://files.wordpress.com/2016/05/laptop.jpg?w=750&#038;h=545" 
alt="https://pixabay.com/en/laptop-printer-office-folder-graph-1016257/" 
width="750" height="545" /></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE 
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER 
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span
style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha 
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p> 
DONE</p> 

After Replacement

<p></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE 
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER 
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span
style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha 
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p> 
DONE</p> 

Explained

NODE                     EXPLANATION
----------------------------------------------------------------------
  <img                     '<img'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  >                        '>'
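The breakdown above maps directly onto the pieces of the pattern. As a sketch (the variable names are illustrative, not part of the answer), the same regex can be assembled from one string per alternative with new RegExp, which can make each branch easier to read and adjust:

```javascript
// Each piece corresponds to one branch of the (?: ... )* group in the table.
const notGtOrEquals = "[^>=]";           // any character except '>' or '='
const singleQuotedValue = "='[^']*'";    // ='...'
const doubleQuotedValue = '="[^"]*"';    // ="..."
const unquotedValue = "=[^'\"][^\\s>]*"; // =value, ends at whitespace or '>'

const imgTagPattern = new RegExp(
  "<img\\s(?:" +
    [notGtOrEquals, singleQuotedValue, doubleQuotedValue, unquotedValue].join("|") +
    ")*>",
  "gi"
);

console.log(imgTagPattern.source);
// <img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
```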