Pavel - 5 months ago
Python Question

Find and extract curr_id number from Investing

I need to submit a curr_id to investing.com using Python in order to extract historical data for a number of currencies and commodities, so I first need to find the curr_id number for each one, as in the example below. I'm able to extract all the script elements, but I cannot figure out how to find the script that contains curr_id and extract its digits. For this page the value is 2103, and that is what I need the code to find.

import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'http://www.investing.com/currencies/usd-brl-historical-data'
# Fetch the page
r = requests.get(url)
# Parse the HTML
soup = BeautifulSoup(r.content, 'html.parser')

# Collect every <script type="text/javascript"> element in soup
curr_data = soup.find_all('script', {'type': 'text/javascript'})


UPDATE
I did it like this:
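# Turn the list of script elements into one string
g_data_string = str(curr_data)

if 'curr_id' in g_data_string:
    print('success')

# Skip past "curr_id: " (9 characters) and take the next 4 characters
start = g_data_string.find('curr_id') + 9
end = g_data_string.find('curr_id') + 13

print(g_data_string[start:end])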


This hard-codes the offsets and a four-digit value, so I'm sure there is a better way to do it.

Answer

You can pass a compiled regular expression as the text argument of find() to locate the specific script element. Then search inside that script's text with the same pattern to pull out the number:

import re

import requests
from bs4 import BeautifulSoup

url = 'http://www.investing.com/currencies/usd-brl-historical-data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

pattern = re.compile(r"curr_id: (\d+)")
script = soup.find('script', text=pattern)

match = pattern.search(script.text)
if match:
    print(match.group(1))

Prints 2103.

Here (\d+) is a capturing group that would match one or more digits.
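
To see the capturing group in isolation, here is a minimal sketch that applies the same pattern to a plain string, with no network request and no BeautifulSoup. The helper name extract_curr_id and the sample script text are made up for illustration; the real page's script may be laid out differently, so treat this as a sketch under that assumption.

```python
import re

# Same pattern as above: the parentheses capture the digits after "curr_id: ".
CURR_ID_PATTERN = re.compile(r"curr_id: (\d+)")


def extract_curr_id(script_text):
    """Return the captured digits as a string, or None if the pattern is absent."""
    match = CURR_ID_PATTERN.search(script_text)
    return match.group(1) if match else None


# Made-up script contents for illustration only (an assumption, not the real page).
sample_script = "var params = {curr_id: 2103, interval: 'Daily'};"

print(extract_curr_id(sample_script))       # 2103
print(extract_curr_id("var params = {};"))  # None
```

Because the group matches one or more digits, this works for an id of any length, unlike slicing a fixed number of characters. group(1) returns a string, so convert it with int() if you need a number.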