iamsaksham - 5 months ago

ReactJS: Remove all HTML tags except some specific tags from a contenteditable div

I have a contenteditable div, and when someone pastes content into it, I want the HTML formatting and tags to be stripped off so that the pasted content becomes plain text.

But while doing so, I don't want some specific img tags to be removed; I have a list of the tags I want to keep.
I came up with the code below, but it removes those img tags as well.

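// Read the current markup of the contenteditable div
var html = ReactDOM.findDOMNode(this).innerHTML;

// Leading text followed by an empty line (<div><br></div> or <p><br></p>) and the next opening tag
var initialBreaks = /^([^<]+)(?:<div[^>]*><br[^>]*><\/div><div[^>]*>|<p[^>]*><br[^>]*><\/p><p[^>]*>)/;
// Leading text followed by the first opening <div> or <p>
var initialBreak = /^([^<]+)(?:<div[^>]*>|<p[^>]*>)/;
// Empty lines: <p><br></p> or <div><br></div>
var wrappedBreaks = /<p[^>]*><br[^>]*><\/p>|<div[^>]*><br[^>]*><\/div>/g;
// Remaining opening <p> and <div> tags
var openBreaks = /<(?:p|div)[^>]*>/g;
// <br> tags and closing </p> or </div> tags
var breaks = /<br[^>]*><\/(?:p|div)>|<br[^>]*>|<\/(?:p|div)>/g;
// Any tag at all, which is why the <img> tags disappear too
var allTags = /<\/?[^>]+>/g;
// Any line-ending style
var newlines = /\r\n|\n|\r/g;

// Turn block-level breaks into newlines, strip the tags, then turn newlines back into <br>
html = html.replace(initialBreaks, '$1\n\n')
  .replace(initialBreak, '$1\n')
  .replace(wrappedBreaks, '\n')
  .replace(openBreaks, '')
  .replace(breaks, '\n')
  .replace(allTags, '')
  .replace(newlines, '<br>');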


The .replace(allTags, '') call removes every tag, including the specific img tags I need to keep.
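The problem can be reproduced outside React with just the allTags step. This is a minimal sketch; the pasted string (including smile.png) is a made-up example, not the asker's data.

```javascript
// The tag-stripping step from the question, in isolation.
const allTags = /<\/?[^>]+>/g;

// Hypothetical pasted markup containing an <img> we would like to keep.
const pasted = '<b>Hello</b> <img src="smile.png"> world';
const stripped = pasted.replace(allTags, '');

console.log(stripped); // "Hello  world": the <img> tag is removed as well
```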

Answer

Description

<\/?(?!img)[a-z]+(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>

Replace with: an empty string (in JavaScript, use the g flag so that every match is replaced, not just the first)
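A minimal JavaScript sketch of applying the pattern, assuming the HTML is already available as a string (for example, the innerHTML read in the question). stripTagsExceptImg is a hypothetical helper name, and the input string is made up for illustration.

```javascript
// Pattern from the answer; the g flag makes String.prototype.replace
// remove every match instead of only the first one.
const tagsExceptImg = /<\/?(?!img)[a-z]+(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>/g;

// Hypothetical helper: removes all opening/closing tags except <img>.
function stripTagsExceptImg(html) {
  return html.replace(tagsExceptImg, '');
}

const result = stripTagsExceptImg('<p>Hi <b>there</b> <img src="smile.png"></p>');
console.log(result); // Hi there <img src="smile.png">
```

Note that the pattern is case-sensitive ([a-z]+); that is usually fine for innerHTML, which serializes HTML tag names in lowercase.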

Regular expression visualization

This regular expression will do the following:

  • find all opening and closing HTML tags
  • skip img tags
  • avoid the edge cases that make matching HTML with regular expressions difficult, such as a > inside a quoted attribute value
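To illustrate the last point, here is a sketch comparing the answer's pattern with a naive one in the style of the question's allTags. The naive variant (with an added (?!img) guard) and the input string are hypothetical, built only for this comparison.

```javascript
// Naive pattern: like the question's allTags, plus a guard that spares <img>.
const naive = /<\/?(?!img)[^>]+>/g;
// Quote-aware pattern from the answer.
const quoteAware = /<\/?(?!img)[a-z]+(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>/g;

// A '>' inside a quoted attribute value is legal HTML.
const input = '<a title="3 > 2">text</a>';

const naiveOut = input.replace(naive, '');
const quoteAwareOut = input.replace(quoteAware, '');

console.log(naiveOut);      //  2">text  (the naive pattern stops at the first '>')
console.log(quoteAwareOut); // text
```

The quote-aware version consumes a whole quoted value after each =, so a > inside the quotes cannot end the match early.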

Example

Live Demo

https://regex101.com/r/sI2nO0/3

Sample text

Note the difficult edge case nested inside the anchor tag: its onmouseover attribute contains both a > and an img tag inside a quoted value.

<span><a onmouseover=' if ( 3 > a ) { var 
string=" <img src=NotTheDroidYouAreLookingFor.jpg>; "; } '
 href="link.html">This is a droid I'm looking 
for: <img src=DesiredDroids.png></a>
</span>

After Replacement

This is a droid I'm looking 
for: <img src=DesiredDroids.png>
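The same replacement can be reproduced in JavaScript. This sketch builds the sample text with explicit \n line breaks so that the trailing spaces after "var" and "looking" survive, then applies the answer's pattern with the g flag.

```javascript
const tagsExceptImgPattern = /<\/?(?!img)[a-z]+(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>/g;

// Sample text from the answer, line by line.
const sample =
  "<span><a onmouseover=' if ( 3 > a ) { var \n" +
  "string=\" <img src=NotTheDroidYouAreLookingFor.jpg>; \"; } '\n" +
  " href=\"link.html\">This is a droid I'm looking \n" +
  "for: <img src=DesiredDroids.png></a>\n" +
  "</span>";

const cleaned = sample.replace(tagsExceptImgPattern, '');

// The whole <a ...> opening tag (including the quoted <img> inside onmouseover)
// is removed; only the real <img> survives, followed by the newline before </span>.
console.log(JSON.stringify(cleaned));
```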

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    img                      'img'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  [a-z]+                   any character of: 'a' to 'z' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [\s>]                    any character of: whitespace (\n, \r,
                             \t, \f, and " "), '>'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      '                        '\''
----------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      '                        '\''
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      "                        '"'
----------------------------------------------------------------------
      [^"]*                    any character except: '"' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      "                        '"'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [^'"\s]*                 any character except: ''', '"',
                               whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
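A sketch of dropping the pattern into the question's chain in place of allTags. The other regexes are copied unchanged from the question; toPlainTextKeepingImg is a hypothetical name, and in the component it would be called on the innerHTML read via ReactDOM.findDOMNode(this), as in the question. Like the answer's pattern, it keeps every img tag rather than a specific subset.

```javascript
// Regexes from the question, unchanged.
const initialBreaks = /^([^<]+)(?:<div[^>]*><br[^>]*><\/div><div[^>]*>|<p[^>]*><br[^>]*><\/p><p[^>]*>)/;
const initialBreak = /^([^<]+)(?:<div[^>]*>|<p[^>]*>)/;
const wrappedBreaks = /<p[^>]*><br[^>]*><\/p>|<div[^>]*><br[^>]*><\/div>/g;
const openBreaks = /<(?:p|div)[^>]*>/g;
const breaks = /<br[^>]*><\/(?:p|div)>|<br[^>]*>|<\/(?:p|div)>/g;
const newlines = /\r\n|\n|\r/g;

// Replaces the question's allTags: removes every tag except <img>.
const allTagsExceptImg = /<\/?(?!img)[a-z]+(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>/g;

// Hypothetical helper running the question's pipeline with the new pattern.
function toPlainTextKeepingImg(html) {
  return html
    .replace(initialBreaks, '$1\n\n')
    .replace(initialBreak, '$1\n')
    .replace(wrappedBreaks, '\n')
    .replace(openBreaks, '')
    .replace(breaks, '\n')
    .replace(allTagsExceptImg, '')
    .replace(newlines, '<br>');
}

const converted = toPlainTextKeepingImg('Hello<div><b>big</b> <img src="smile.png"></div>');
console.log(converted); // Hello<br>big <img src="smile.png"><br>
```

The order matters: the block-level <div>, <p> and <br> tags are turned into newlines first, so by the time allTagsExceptImg runs only inline formatting tags and the img tags remain.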
Comments