Joao Joao - 11 days ago 6
HTML Question

bash: get content between a pair of HTML tags

I need to extract the HTML content between a given pair of tags using a bash script.
For example, given the HTML below:

<html>
<head>
</head>
<body>
text
<div>
text2
<div>
text3
</div>
</div>
</body>
</html>


Given the body tag, the command or script should output:

text
<div>
text2
<div>
text3
</div>
</div>


Thanks in advance.

Answer

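Plain-text processing is a poor fit for parsing HTML/XML. An XPath-aware tool such as xmllint is more reliable. This should give you the idea (note that //body prints the enclosing body tags as well):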

kent$  xmllint --xpath "//body" f.html 
<body>
 text
  <div>
  text2
    <div>
        text3
    </div>
  </div>
</body>
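
The question asks for only the content between the tags, while the command above also prints the enclosing tags. A minimal sketch of one way to drop them, assuming xmllint is installed and built with XPath support: select the child nodes with node() instead of the element itself. The function name inner_html is hypothetical, and the --html flag is added here on the assumption that the input is real-world HTML rather than well-formed XML.

```shell
# Hypothetical helper, not part of the original answer.
# Usage: inner_html TAG FILE
inner_html() {
    # --html  : parse with libxml2's forgiving HTML parser
    # node()  : select the children (text and elements) of every <TAG>,
    #           so the enclosing tag itself is not printed
    # stderr is discarded, which also hides the "XPath set is empty" message
    xmllint --html --xpath "//$1/node()" "$2" 2>/dev/null
}
```

Calling inner_html body f.html should then print the text and the nested div elements without the surrounding body tags. The exact whitespace between nodes may differ between libxml2 versions.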