I would like to retrieve data from an SDMX file (like https://www.bundesbank.de/cae/servlet/StatisticDownload?tsId=BBK01.ST0304&its_fileFormat=sdmx&mode=its). I tried to use BeautifulSoup, but it does not seem to see the tags. The code is as follows:
import urllib2
from bs4 import BeautifulSoup

url = "https://www.bundesbank.de/cae/servlet/StatisticDownload?tsId=BBK01.ST0304&its_fileFormat=sdmx"
html_source = urllib2.urlopen(url).read()  # download the raw SDMX document
soup = BeautifulSoup(html_source, 'lxml')
ts_series = soup.findAll("bbk:Series")  # finds nothing
soup.findAll("bbk:series") would return the result.
In fact, in this case, even if you use lxml as the parser, BeautifulSoup still parses the document as HTML. Since HTML tags are case-insensitive, BeautifulSoup lowercases all tag names, and thus soup.findAll("bbk:series") works. See "Other parser problems" in the official documentation.
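A minimal sketch of the lowercasing behaviour, assuming bs4 and lxml are installed. The sample string is a hypothetical stand-in for the downloaded file (the namespace URI and the FREQ attribute are made up for illustration), so no network access is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded SDMX document.
sample = (
    '<root xmlns:bbk="http://example.invalid/bbk">'
    '<bbk:Series FREQ="M"></bbk:Series>'
    '</root>'
)

# 'lxml' selects lxml's HTML parser, so tag names are lowercased.
soup = BeautifulSoup(sample, "lxml")

mixed_case = soup.find_all("bbk:Series")  # name as written in the file
lower_case = soup.find_all("bbk:series")  # name as stored by the HTML parser

print(len(mixed_case), len(lower_case))
print([tag.name for tag in lower_case])
```

The mixed-case lookup should come back empty, while the lowercase one finds the tag.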
If you want to parse it as XML, use soup = BeautifulSoup(html_source, 'xml') instead. This also uses lxml, since lxml is the only XML parser BeautifulSoup supports. Now you can use ts_series = soup.findAll("Series") to get the result, as BeautifulSoup strips the namespace prefix from the tag name.
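As a rough illustration of the XML route, here is a small sketch, again assuming bs4 and lxml are installed. The sample document, its namespace URI, and the Obs element with TIME_PERIOD and OBS_VALUE attributes are assumptions made for the example and may not match the real Bundesbank file:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded SDMX document.
sample = (
    '<root xmlns:bbk="http://example.invalid/bbk">'
    '<bbk:Series FREQ="M">'
    '<bbk:Obs TIME_PERIOD="2020-01" OBS_VALUE="1.5"/>'
    '<bbk:Obs TIME_PERIOD="2020-02" OBS_VALUE="1.7"/>'
    '</bbk:Series>'
    '</root>'
)

# 'xml' selects lxml's XML parser, which keeps the original case.
soup = BeautifulSoup(sample, "xml")

# The namespace prefix is kept separately in tag.prefix, not in tag.name.
ts_series = soup.find_all("Series")

# Collect (period, value) pairs from every observation in every series.
observations = [
    (obs["TIME_PERIOD"], obs["OBS_VALUE"])
    for series in ts_series
    for obs in series.find_all("Obs")
]

print(ts_series[0].prefix, ts_series[0].name)
print(observations)
```

Attribute names keep their case here as well, so they can be read exactly as they appear in the file.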