Deep Value - 7 months ago
Python Question

Using Regular Expressions With Python to Get Value Buried in HTML5

I'm trying to use BeautifulSoup and re to get a specific value from Yahoo Finance, but I can't figure out exactly how to get it. I'll paste the code I have along with the HTML and the unique selector I got.

I just want the number "7.58" from the HTML below, but the problem is that the class of this cell is the same as that of many other cells in the same element.

<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>


Here is the selector Google gave me...

yfncsumtab > tbody > tr:nth-child(2) > td.yfnc_modtitlew1 > table:nth-child(10) > tbody > tr > td > table > tbody > tr:nth-child(8) > td.yfnc_tabledata1

Here is some template code I'm using to test different things, but I'm very new to regular expressions and can't find a way to extract the number that follows "Diluted EPS (ttm):".

from bs4 import BeautifulSoup
import requests
import re


sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

body = soup.find_all('td')

print(body)


Thanks!

ccf
Answer

If using regex, please try:

>>> import re
>>> text = '<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>'
>>> re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', text)
['7.58']
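
To make the pattern easier to follow, here is a minimal sketch that spells out the same regex with `re.VERBOSE` so each piece can carry a comment. The names `EPS_PATTERN` and `extract_diluted_eps` are made up for this illustration, and the sample string is the row from the question.

```python
import re

# Same pattern as in the session above, split across lines with re.VERBOSE.
EPS_PATTERN = re.compile(r"""
    Diluted\s+EPS\s+\(ttm\)   # the label text; parentheses escaped to match literally
    .*?                       # lazily skip the markup between the label and the value
    >([\d.]+)<                # capture digits and dots sitting between a tag close and a tag open
""", re.VERBOSE)


def extract_diluted_eps(html):
    """Return the Diluted EPS value as a string, or None if the pattern does not match."""
    match = EPS_PATTERN.search(html)
    return match.group(1) if match else None


sample = ('<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
          '<td class="yfnc_tabledata1">7.58</td>')
print(extract_diluted_eps(sample))
```

Using `search` with a `None` check instead of `findall` avoids indexing into an empty list when the label is missing from the page.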

UPDATE: Here is sample code using requests and re:

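import requests
import re

# Fetch the Key Statistics page for MMM
sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
# A raw string keeps the regex backslashes intact; print() works on Python 2 and 3
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text))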

Output:

[u'7.58']
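
Since the question mentions BeautifulSoup, the value can probably also be reached by finding the label cell and stepping to the cell next to it, which sidesteps the shared `yfnc_tabledata1` class. This is only a sketch, assuming the label and value cells are siblings in the same row as in the snippet from the question. The `html` string is a stand-in for `res.text`, with a wrapping `<table>` and closing `</tr>` added for illustration.

```python
import re
from bs4 import BeautifulSoup

# Stand-in for res.text: the row from the question, wrapped so it is well-formed.
html = ('<table><tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
        '<td class="yfnc_tabledata1">7.58</td></tr></table>')

soup = BeautifulSoup(html, 'html.parser')

# Find the <td> whose text matches the label, then take the next <td> in the same row.
label = soup.find('td', string=re.compile(r'Diluted\s+EPS'))
value = label.find_next_sibling('td').get_text(strip=True) if label else None
print(value)
```

Here `value` is a string, so wrap it in `float()` if a number is needed.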