
BeautifulSoup: get css classes from html

Is there a way to get the CSS classes from an HTML file using BeautifulSoup? Example snippet:

<style type="text/css">
p.c3 {text-align: justify}
p.c2 {text-align: left}
p.c1 {text-align: center}
</style>

Perfect output would be:

cssdict = {
    'p.c3': {'text-align': 'justify'},
    'p.c2': {'text-align': 'left'},
    'p.c1': {'text-align': 'center'},
}

although something like this would do:

L = [
    ('p.c3', {'text-align': 'justify'}),
    ('p.c2', {'text-align': 'left'}),
    ('p.c1', {'text-align': 'center'}),
]


BeautifulSoup itself doesn't parse CSS style declarations at all, but you can use it to extract the <style> sections and then parse them with a dedicated CSS parser.

Depending on your needs, there are several CSS parsers available for Python. I'd pick cssutils (requires Python 2.5 or later, including Python 3): it is the most complete in its support, and it handles inline styles too.

Other options are css-py and tinycss.

To grab and parse all such style sections (example with cssutils):

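import cssutils
# `tree` is assumed to be the BeautifulSoup object for the parsed HTML
sheets = []
for styletag in tree.findAll('style', type='text/css'):
    if not styletag.string:  # probably an external sheet
        continue
    sheets.append(cssutils.parseString(styletag.string))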

With cssutils you can then combine these sheets, resolve imports, and even have it fetch external stylesheets.