Mornor - 2 months ago
CSS Question

BeautifulSoup: using a unique CSS selector

From this page, I need to get the status of "Anbindung an das Telefonnetz".

I identified 2 ways to determine it:


  1. Check whether the status contains the sentence "Das System arbeitet einwandfrei";

  2. Check whether the background color is green.



I have chosen to go with the first option.

I use Python/BeautifulSoup to scrape the page. The problem is that the element has no unique id or class to select it by.

I then decided to use the CSS selector of this particular element, which is the following:

div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)


And use it like this:

print(page.select("div.system-item:nth-child(2) > div:nth-child(1) > p:nth-child(3)"))


However, the only thing I get is an empty list ([]).

What else could I try to get this particular element?

EDIT

As some of you recommended, here is the (incomplete) HTML source of the page. To be practical, though, I recommend you just take a look at the page yourself.

<!doctype html>
<head>
<meta charset="utf-8">

<title>Aktueller Status | Placetel</title>

<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="msvalidate.01" content="756F6E40DD887A659CE83E5A92FFBB62">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<meta name="generator" content="Kirby 2.3.2">

<meta name="description" content="Placetel Systemstatus: Erfahren Sie mehr &uuml;ber den aktuellen Status der Placetel Telefonanlage.">
<meta name="keywords" content="">

<meta name="robots" content="index,follow,noodp,noydir">

<link rel="canonical" href="https://www.placetel.de/status">
<link rel="publisher" href="https://plus.google.com/b/111027512373770716962/111027512373770716962/posts">

<link rel="shortcut icon" href="/favicon.ico">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<meta name="msapplication-TileColor" content="#0e70b9">
<meta name="msapplication-TileImage" content="/ms-tile-icon.png">
<meta name="theme-color" content="#0e70b9">

<script src="//use.typekit.net/rnw8lad.js"></script>
<script>try { Typekit.load({ async: true }); } catch (e) {}</script>

<link rel="stylesheet" href="https://www.placetel.de/assets/dist/css/main.css"> <script src="https://www.placetel.de/assets/dist/js/modernizr.js"></script>
<link rel="dns-prefetch" href="//app.marketizator.com"/>
<script>
var _mktz = _mktz || [];
_mktz.cc_domain = 'placetel.de';
</script>
<script type="text/javascript" src="//d2tgfbvjf3q6hn.cloudfront.net/js/o17fe41.js"></script>
</head>
<body id="

vrs
Answer

As far as I know, the nth-child pseudo-class is still not implemented in BeautifulSoup4. Also, if you look into the website's CSS (namely the _system.scss file), you'll find that there are 3 statuses:

  1. system-item-green
  2. system-item-yellow
  3. system-item-red

So you may want to slightly change your code to be like this:

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.placetel.de/status'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'
}
source = requests.get(url, headers=headers)
soup = BS(source.text, 'html.parser')

# Second div.system-item on the page (index 1); its class list encodes the status
status = soup.select("div.system-item")[1].attrs['class']

if 'system-item-green' in status:
    print("It works!")
elif 'system-item-yellow' in status:
    print("Something's slightly wrong")
elif 'system-item-red' in status:
    print("Does not work")
else:
    print("Has someone changed page's markup?")
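
If you would rather stick with the first option from the question (checking for the sentence instead of the class), something along these lines might work. This is only a sketch: the helper name status_is_ok and the SAMPLE markup are made up for illustration, and I am assuming that the heading text and the status sentence both sit inside the same div.system-item. The real page may be structured differently.

```python
from bs4 import BeautifulSoup

OK_SENTENCE = "Das System arbeitet einwandfrei"


def status_is_ok(html, heading="Anbindung an das Telefonnetz"):
    """Return True/False for the block that mentions `heading`, or None if no block does."""
    soup = BeautifulSoup(html, "html.parser")
    # A plain class selector avoids relying on :nth-child support
    for item in soup.select("div.system-item"):
        text = item.get_text(" ", strip=True)
        if heading in text:
            return OK_SENTENCE in text
    return None


# Hypothetical markup, for illustration only; not copied from the real page.
SAMPLE = """
<div class="system-item system-item-green">
  <div><h3>Telefonanlage</h3><p>Das System arbeitet einwandfrei.</p></div>
</div>
<div class="system-item system-item-red">
  <div><h3>Anbindung an das Telefonnetz</h3><p>Es liegt eine Störung vor.</p></div>
</div>
"""

print(status_is_ok(SAMPLE))  # prints False for this sample
```

Looking the block up by its heading text rather than by position means the check should keep working if the order of the blocks changes, at the cost of breaking if the wording on the page changes.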