Aaron Caudwell - 2 years ago
Python Question

BeautifulSoup returning empty list

I’m trying to write a script that parses the source of https://www.youtube.com/feed/subscriptions and retrieves the URLs of the videos in my subscription feed, so that I can pass them to an MP4 downloader and save the files to my FTP server.

However, I have been stuck on this problem for a couple of hours.

import bs4
import requests
source = requests.get('https://www.youtube.com/feed/subscriptions')
sourceSoup = bs4.BeautifulSoup(source.text,'html.parser')
sourceSoup.select('#grid-319397 > li:nth-child(1) > div > div.yt-lockup-dismissable > div.yt-lockup-content > h3')

I get the selector by right-clicking the element in the browser, choosing ‘Inspect element’, then ‘Copy selector’, and pasting the result into the select method.
The call keeps returning an empty list.

I have tried many variations of this, but none of them picks up anything. I have the same problem when I do the same thing on the homepage, so I doubt the cause is that the page is behind a login (although I am logged in on the PC on which the script is running).
Can someone please point me in the right direction?

Answer

You are facing two different (but somewhat related) issues:

  1. The page that the server returns for the GET request sent by your code may differ from the page you receive when you open it in your browser, because the server does not recognize the default user-agent that requests sends.

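  2. The item you're looking for is only visible after you log in. Being logged in in the browser on the same PC does not help, because the script does not share the browser's session cookies.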
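
One way to check whether either issue applies is to test if the element targeted by the selector exists at all in the HTML that the script received. The following is a minimal sketch using only the standard library; the helper name `html_has_id` and the two sample HTML strings are made up for illustration, and in the real script you would pass `source.text` instead.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects every id attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == 'id' and value:
                self.ids.add(value)


def html_has_id(html_text, element_id):
    """Return True if some tag in html_text carries id=element_id."""
    collector = _IdCollector()
    collector.feed(html_text)
    collector.close()
    return element_id in collector.ids


# Hypothetical stand-ins for the page as seen by the browser and by the script.
browser_html = '<ul id="grid-319397"><li><h3>Some video</h3></li></ul>'
script_html = '<div id="signin-prompt">Please sign in</div>'

print(html_has_id(browser_html, 'grid-319397'))
print(html_has_id(script_html, 'grid-319397'))
```

If the check fails for `source.text`, the selector cannot match no matter how the rest of it is written, because its root element is missing from the page the script was served.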

Rather than handling both of these issues manually, you should consider using the YouTube API.
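
The answer does not say which API is meant. As a rough sketch, assuming the YouTube Data API v3 and its `subscriptions` endpoint, a request for the logged-in user's subscriptions could be built as below. The function name `build_subscriptions_request` and the token value are placeholders; obtaining a real OAuth 2.0 access token is not covered here, and the request is only built, not sent. Note that this endpoint lists the channels you are subscribed to, so getting the actual video URLs would take further API calls.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint of the YouTube Data API v3 "subscriptions" resource (assumed API).
API_URL = 'https://www.googleapis.com/youtube/v3/subscriptions'


def build_subscriptions_request(access_token, max_results=25):
    """Build (but do not send) a request for the authorized user's subscriptions."""
    query = urlencode({
        'part': 'snippet',          # include title and channel details
        'mine': 'true',             # subscriptions of the authorized user
        'maxResults': max_results,  # page size
    })
    return Request(
        API_URL + '?' + query,
        headers={'Authorization': 'Bearer ' + access_token},
    )


# 'YOUR_ACCESS_TOKEN' is a placeholder, not a working credential.
request = build_subscriptions_request('YOUR_ACCESS_TOKEN')
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` with a valid token would return a JSON document, which avoids both the user-agent and the login problem because the API does not depend on the rendered HTML page.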

Demo code showing that the server returns a different page for different user-agents:

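import requests

# The user-agent is split across two literals so that it contains no line break.
CHROME_USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                     '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')

python_user_agent_request = requests.get('http://www.youtube.com')
chrome_user_agent_request = requests.get('http://www.youtube.com',
                                         headers={'user-agent': CHROME_USER_AGENT})

# The user-agent header that was actually sent with each request
print(python_user_agent_request.request.headers['user-agent'])
>> python-requests/2.7.0 CPython/3.4.2 Windows/7

print(chrome_user_agent_request.request.headers['user-agent'])
>> Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36

# .text holds the HTML page source
print(python_user_agent_request.text == chrome_user_agent_request.text)
>> False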