ll55 ll55 - 2 months ago 12
HTML Question

notepad++ - delete attributes in HTML start tags with regex

Solution:

Find:

<([a-z]+) .?=".?( */?>)


Replace with:
<\1$2





I usually copy tables from forum sites to blog sites.

I want no attribute in all start tags.

The tables are like this:

1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
Attributes like "cellspacing", "class", "style", "href" and "target".


I found two answers but they do not seem to be helpful.

[A1]: It uses a fixed condition to find and replace specific terms. But in my situation, start tags are everywhere and vary with the article.

[A2]: I tried this answer but it is not working as follows.

I find
<([a-z]+) .*=".*">
and replace with
<\1>
.

Line 1 and 2 works but line 3 and 4 messed up.

How should I use regex?

EDIT:

<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>

Answer

Your .* is greedy so it matches everything until the last "> on your line. Here's what your first regex does:

https://regex101.com/r/qK5uY3/1

Try:

<([a-z]+) .*?=".*? *\/?>

I'd recommend looking at plugins for notepad++. There can be many issues using a regex to parse HTML.

https://regex101.com/r/qK5uY3/2

The *\/? before the closing > is matching optional whitespace and a self closing element. The \h I prefer to use but I don't know if Notepad++ supports that (I'm mac'er).

Update:

To capture the closing bit of the self closing element group the full closing part.

<([a-z]+) .*?=".*?( *\/?>)

then replace with the 2nd captured group.

<\1$2

Demo: https://regex101.com/r/qK5uY3/3

Comments