Josh Habdas - 3 years ago

Normalize whitespace with Python

I'm building a data extract using Scrapy and want to normalize a raw string pulled out of an HTML document. Here's an example string:

"  Sapphire RX460 OC  2/4GB"


Notice the two groups of two spaces: one preceding the text and one between OC and 2.

Python provides strip() for trimming, as described in "How do I trim whitespace with Python?", but that won't handle the two spaces between OC and 2, which I need collapsed into a single space.
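A quick check of that limitation, using the example string with its two double spaces written out explicitly: str.strip() only touches the ends of the string.

```python
raw = "  Sapphire RX460 OC  2/4GB"

# strip() removes whitespace from the two ends of the string only...
trimmed = raw.strip()

# ...so the double space between "OC" and "2" is still there.
print(repr(trimmed))  # 'Sapphire RX460 OC  2/4GB'
```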

I've tried using normalize-space() from XPath while extracting data with my Scrapy Selector, and that works, but the assignment is verbose, with strong rightward drift:

product_title = product.css('h3').xpath('normalize-space((text()))').extract_first()


Is there an elegant way to normalize whitespace using Python? If not a one-liner, is there a way I can break the above line into something easier to read without throwing an indentation error, e.g.

product_title = product.css('h3')
.xpath('normalize-space((text()))')
.extract_first()
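Python continues a statement implicitly while a parenthesis is still open, so wrapping the whole right-hand side in ( ... ) lets a method chain be split one call per line without backslashes or indentation errors. Because the Scrapy chain needs a real Selector, the runnable part of this sketch applies the same layout to a plain string, with re.sub() as an illustrative stand-in normalization step (not the asker's code); the Scrapy form is shown only in comments.

```python
import re

raw = "  Sapphire RX460 OC  2/4GB"

# The open parenthesis lets the chain span several lines; Python treats it as
# one expression, so the lines starting with a dot raise no IndentationError.
product_title = (
    re.sub(r"\s+", " ", raw)  # collapse each whitespace run to one space
    .strip()                  # then trim the single spaces left at the ends
)

print(repr(product_title))  # 'Sapphire RX460 OC 2/4GB'

# The Scrapy chain from the question takes the same shape (not executed here,
# since it needs a Selector bound to `product`):
#
# product_title = (
#     product.css('h3')
#     .xpath('normalize-space((text()))')
#     .extract_first()
# )
```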

Answer

You can use:

" ".join(s.split())

where s is your string.
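A minimal sketch of that one-liner wrapped in a small helper; the name normalize_whitespace is hypothetical, chosen only for illustration. Calling str.split() with no argument splits on runs of any whitespace (spaces, tabs, newlines) and drops the empty pieces, so joining the pieces with a single space both collapses interior runs and trims the ends.

```python
def normalize_whitespace(s):
    """Collapse every run of whitespace in s to one space and trim the ends."""
    # split() with no separator treats any run of whitespace as one
    # delimiter and ignores leading/trailing whitespace entirely.
    return " ".join(s.split())


raw = "  Sapphire RX460 OC  2/4GB"
print(repr(normalize_whitespace(raw)))  # 'Sapphire RX460 OC 2/4GB'

# Hypothetical Scrapy usage (not run here); default='' avoids passing None
# to the helper when no <h3> text is found:
# product_title = normalize_whitespace(
#     product.css('h3::text').extract_first(default='')
# )
```

One difference from XPath's normalize-space(), which only treats space, tab, carriage return and line feed as whitespace: str.split() also splits on other Unicode whitespace, such as the non-breaking space that &nbsp; produces in scraped HTML.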
