Jeril - 1 year ago

Python BeautifulSoup: text from the HTML page not returned by soup.find_all(...)

I was studying data scraping and trying to fetch all the questions along with their options and answers. I managed to get the questions and the options, but I was unable to fetch the answer. The answer markup looks like this:

<div class="div-spacer">
<p><span class="ib-green"><b>Answer:</b></span> Option <b class="jq-hdnakqb">A</b></p>
<p><span class="ib-green"><b>Explanation:</b></span></p>
<p> No answer description available for this question. <b><a href="discussion-553">Let us discuss</a></b>. </p>

In this line of the markup:

<b class="jq-hdnakqb">A</b>

the text 'A' is not returned by the parser.

The page I am scraping is an IndiaBix question page.

In the browser's Inspect Element view the text 'A' is visible, but BeautifulSoup does not return it.

Any help would be appreciated; I am new to Python.

Answer

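The problem is that the correct answers are loaded dynamically with JavaScript. Inspect Element shows the DOM after the page's scripts have run, while BeautifulSoup only parses the HTML the server originally sent.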
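To see the difference in isolation, here is a small sketch with hypothetical markup (not taken from the live page): the answer paragraph as the server might send it, with the jq-hdnakqb tag still empty, and as it looks in Inspect Element after a script has filled it in. The answer_text helper is hypothetical. BeautifulSoup handles both correctly; it simply returns whatever text the markup it was given contains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): the served HTML has an empty answer tag,
# and JavaScript later writes the letter into it in the browser.
served_html = '<p><span class="ib-green"><b>Answer:</b></span> Option <b class="jq-hdnakqb"></b></p>'
rendered_html = '<p><span class="ib-green"><b>Answer:</b></span> Option <b class="jq-hdnakqb">A</b></p>'


def answer_text(html):
    """Return the text of the first <b class="jq-hdnakqb"> tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", class_="jq-hdnakqb")
    return tag.get_text(strip=True) if tag else None


print(repr(answer_text(served_html)))    # ''
print(repr(answer_text(rendered_html)))  # 'A'
```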

One way to approach the problem is to use the selenium browser automation package with the headless PhantomJS browser, which executes the page's JavaScript before you read the elements:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# webdriver.PhantomJS exists only in Selenium 3.x; with Selenium 4,
# substitute a headless Chrome or Firefox driver
driver = webdriver.PhantomJS()

wait = WebDriverWait(driver, 10)

url = ''  # URL of the IndiaBix question page
driver.get(url)

# wait for the page to load
wait.until(EC.visibility_of_element_located((By.ID, "ib-main-bar")))

# iterate over questions
for question_block in driver.find_elements(By.CSS_SELECTOR, ".bix-div-container"):
    question = question_block.find_element(By.CSS_SELECTOR, ".bix-td-qtxt").text
    print(question)

    # iterate over options: each row holds the option letter and the option text
    for answer_block in question_block.find_elements(By.CSS_SELECTOR, ".bix-tbl-options tr"):
        number, answer = answer_block.find_elements(By.CSS_SELECTOR, ".bix-td-option")

        print(number.text, answer.text)

    # get answer: the correct option is stored in the value attribute of .jq-hdnakq
    answer = question_block.find_element(By.CSS_SELECTOR, ".jq-hdnakq").get_attribute("value")
    print("Correct Answer: " + answer)

driver.quit()

Prints:

The part of machine level instruction, which tells the central processor what has to be done, is
A. Operation code
B. Address
C. Locator
D. Flip-Flop
E. None of the above
Correct Answer: A
Which of the following refers to the associative memory?
A. the address of the data is generated by the CPU
B. the address of the data is supplied by the users
C. there is no need for an address i.e. the data is used as an address
D. the data are accessed sequentially
E. None of the above
Correct Answer: C
Process is
A. program in High level language kept on disk
B. contents of main memory
C. a program in execution
D. a job in secondary memory
E. None of the above
Correct Answer: C
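The Selenium code reads the correct option from the value attribute of an element with class jq-hdnakq (without the trailing b). If that element, presumably a hidden input, is already present in the HTML the server sends, BeautifulSoup alone may be enough. This is an assumption not verified against the live page, so the markup below and the extract_answers helper are hypothetical; only the class names come from the Selenium code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): each question block carries a hidden
# <input class="jq-hdnakq"> whose value attribute holds the correct option.
html = """
<div class="bix-div-container">
  <table><tr><td class="bix-td-qtxt">The part of machine level instruction, which tells the central processor what has to be done, is</td></tr></table>
  <input type="hidden" class="jq-hdnakq" value="A">
</div>
<div class="bix-div-container">
  <table><tr><td class="bix-td-qtxt">Process is</td></tr></table>
  <input type="hidden" class="jq-hdnakq" value="C">
</div>
"""


def extract_answers(page_html):
    """Return a list of (question text, answer letter) pairs."""
    soup = BeautifulSoup(page_html, "html.parser")
    results = []
    for block in soup.select(".bix-div-container"):
        question = block.select_one(".bix-td-qtxt").get_text(strip=True)
        hidden = block.select_one(".jq-hdnakq")
        # .get() returns None when the attribute is missing
        answer = hidden.get("value") if hidden else None
        results.append((question, answer))
    return results


for question, answer in extract_answers(html):
    print(question, "->", answer)
```

If that element turns out to be created by JavaScript as well, the same function can be applied to driver.page_source once Selenium has loaded the page, since page_source generally reflects the DOM after the scripts have run.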