smitty smitty - 1 month ago 15
Python Question

How to get a JavaScript variable out of a BeautifulSoup soup

How can I get this variable out as JSON, or in any other form,
so that I can pull out a single field such as challenge?

<html><body><p>var rechallengeState = {
challenge : '03AHJ_Vuv8ZHJfsjnR3ueIvm89Jfa6oUJ3-kuzA-VcQIaR30A9CZva7lMaBrYlvcGG4cOPCeKXfERQe_u-cMw_8ZVi6CipeJVAYAsrOeBHryWRCMIaMt4V-TQlTgyUA4ndejEgBGUCUw7rwM-ltDr-do8ry-MRv26qQTpS-iCtvONYc6xZBURPEaTo2Nkfq8HJeFA_g6mMLUBG',
timeout : 1800,
lang : 'en',
site : '6LceKe0SAAAAACFIVHolzCVkrGgSfzFASoOmELIc',
error_message : '',
programming_error : '',
is_incorrect : false,
rtl : false,
t1 : 'Ly93d3cuZ29vZ2xlLmNvbS9qcy90aC83ZFZqVVRQN3pmYXh1UnMtNFV3Q19KT0dvbHU4LWdXcXhHTlhNMjRoV1VZLmpz',
t2 : '',
t3 : 'NjliZ0lya2VaS3JHpaaU9sbFJuSREF6Sk9KMNZbW9tcjlvcVlEnV5Y3NlWHN2YWt6ZEdleDZQcVkyaForVXJZclVsZGN0ek5rSGZlU3BuNlkrNkE9PQ\x3d\x3d'
};

</p></body></html>


This is my soup from beautifulsoup4.
Thank you very much for the help!

Answer

The explanation is in the code comments.

from bs4 import BeautifulSoup

html = '''<html><body><p>var rechallengeState = {
    challenge : '03AHJ_Vuv8ZHJfsjnR3ueIvm89Jfa6oUJ3-kuzA-VcQIaR30A9CZva7lMaBrYlvcGG4cOPCeKXfERQe_u-cMw_8ZVi6CipeJVAYAsrOeBHryWRCMIaMt4V-TQlTgyUA4ndejEgBGUCUw7rwM-ltDr-do8ry-MRv26qQTpS-iCtvONYc6xZBURPEaTo2Nkfq8HJeFA_g6mMLUBG',
    timeout : 1800,
    lang : 'en',
    site : '6LceKe0SAAAAACFIVHolzCVkrGgSfzFASoOmELIc',
    error_message : '',
    programming_error : '',
    is_incorrect : false,
    rtl : false,
    t1 : 'Ly93d3cuZ29vZ2xlLmNvbS9qcy90aC83ZFZqVVRQN3pmYXh1UnMtNFV3Q19KT0dvbHU4LWdXcXhHTlhNMjRoV1VZLmpz',
    t2 : '',
    t3 : 'NjliZ0lya2VaS3JHpaaU9sbFJuSREF6Sk9KMNZbW9tcjlvcVlEnV5Y3NlWHN2YWt6ZEdleDZQcVkyaForVXJZclVsZGN0ek5rSGZlU3BuNlkrNkE9PQ\x3d\x3d'
};

</p></body></html>'''

# name the parser explicitly; omitting it makes bs4 guess one and emit a warning
soup = BeautifulSoup(html, 'html.parser')

# get only the object body: [29:] drops `var rechallengeState = {` plus the
# newline and 4-space indent, [:-3] drops the trailing newline and `};`
text = soup.find('p').text.strip()[29:-3]

# split into rows, strip the trailing comma, and split each row into key and value
# (maxsplit=1 keeps a value intact even if it contains ':')
rows = [x.strip(',').split(':', 1) for x in text.split('\n')]

# clean and convert to dictionary
data = {a.strip(): b.strip(" '") for a, b in rows}

print(data['challenge'])

Result:

03AHJ_Vuv8ZHJfsjnR3ueIvm89Jfa6oUJ3-kuzA-VcQIaR30A9CZva7lMaBrYlvcGG4cOPCeKXfERQe_u-cMw_8ZVi6CipeJVAYAsrOeBHryWRCMIaMt4V-TQlTgyUA4ndejEgBGUCUw7rwM-ltDr-do8ry-MRv26qQTpS-iCtvONYc6xZBURPEaTo2Nkfq8HJeFA_g6mMLUBG
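
Since the question also asks for JSON, here is a rough alternative sketch that uses a regular expression instead of fixed slice offsets, so it does not depend on the exact indentation. It is not part of the answer above. The helper name parse_js_object is made up for illustration, the sample text is a shortened stand-in for what soup.find('p').get_text() would return (placeholder values, not the real tokens), and it assumes the object is flat: no nested braces, single-quoted strings, and bare values that are integers or true/false.

```python
import json
import re

# Shortened stand-in for soup.find('p').get_text(); placeholder values only.
# The doubled backslashes put a literal \x3d in the text, as in a fetched page.
text = """var rechallengeState = {
    challenge : '03AHJ_example-token',
    timeout : 1800,
    lang : 'en',
    error_message : '',
    is_incorrect : false,
    t3 : 'abc\\x3d\\x3d'
};"""


def parse_js_object(source, var_name):
    """Parse a flat `var NAME = {...};` literal into a dict (hypothetical helper)."""
    # Capture everything between the braces of the named variable.
    pattern = r'var\s+' + re.escape(var_name) + r'\s*=\s*\{(.*?)\}\s*;'
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        raise ValueError('variable %r not found' % var_name)

    # key : 'quoted string'   or   key : bare_literal
    pair = re.compile(r"(\w+)\s*:\s*(?:'((?:[^'\\]|\\.)*)'|([^,\s]+))")
    data = {}
    for m in pair.finditer(match.group(1)):
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        if quoted is not None:
            # Decode JavaScript \xNN escapes, e.g. \x3d becomes '='.
            value = re.sub(r'\\x([0-9a-fA-F]{2})',
                           lambda h: chr(int(h.group(1), 16)), quoted)
        elif bare in ('true', 'false'):
            value = (bare == 'true')
        else:
            try:
                value = int(bare)
            except ValueError:
                value = bare  # leave anything unrecognised as a string
        data[key] = value
    return data


data = parse_js_object(text, 'rechallengeState')
print(data['challenge'])
print(json.dumps(data, indent=2))
```

With the real soup, passing soup.find('p').get_text() in place of text should work the same way as long as the assumptions above hold. json.dumps then gives the JSON form the question asks about.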