How to preserve the mutual arrangement of html <b> and <center> tags when lxml.html.fromstring is used


The lxml.html.fromstring function parses the combination of HTML <b> and <center> tags in a strange way:


Please notice that
was moved out of

The question is: how can I preserve the layout and nesting of the <b> and <center> tags exactly as in the original text?

FYI, if you swap the order in which the tags are nested


you'll get the correct result:

I use Python 2.7.9 and lxml 3.4.2.


This happens because your original markup is not actually valid HTML.

<center> is a block-level element, and <b> is an inline element. Inline elements cannot contain block elements. lxml is doing its best to interpret the code as valid HTML.
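To make that rule concrete, here is a minimal sketch (Python 3, standard library only) that flags markup where a block element opens inside an inline one, which is the kind of input lxml has to restructure. It does not use or reproduce lxml's parser. The INLINE and BLOCK sets and the block_inside_inline helper are hypothetical and deliberately incomplete; they do not cover the full HTML content model.

```python
from html.parser import HTMLParser

# Hypothetical, partial element lists for illustration only.
INLINE = {"a", "b", "em", "font", "i", "span", "strong", "u"}
BLOCK = {"blockquote", "center", "div", "h1", "h2", "h3", "ol", "p", "table", "ul"}


class NestingChecker(HTMLParser):
    """Records each place where a block element opens inside an inline one."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.problems = []  # (inline_ancestor, block_child) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so set lookups are case-insensitive.
        if tag in BLOCK:
            for ancestor in self.stack:
                if ancestor in INLINE:
                    self.problems.append((ancestor, tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def block_inside_inline(markup):
    """Return the (inline, block) nesting violations found in markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(block_inside_inline("<b><center>text</center></b>"))  # [('b', 'center')]
print(block_inside_inline("<center><b>text</b></center>"))  # []
```

The second call corresponds to the swapped order from the question: with <b> inside <center>, nothing is flagged, because an inline element inside a block element is allowed.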

Note also that <center> has been deprecated since HTML 4 anyway; you really shouldn't be using it.

