Python Question

Lists of different lengths obtained from web scraping

I scraped some data and stored each field in its own list, but the lists end up with different lengths. Why does this happen, how can I fix it, and how can I combine the lists into a DataFrame?

import requests
from bs4 import BeautifulSoup

# One list per field, extended page by page
usertitle, datepost, comment = [], [], []

page = 20
while page <= 1000:
    r = requests.get('website' + str(page))
    soup = BeautifulSoup(r.text, "html.parser")
    usertitle.extend([x.get_text().strip() for x in soup.find_all("span", attrs={"class": "cmp-reviewer"})])
    datepost.extend([x.get_text() for x in soup.find_all("span", attrs={"class": "cmp-review-date-created"})])
    comment.extend([x.get_text().strip() for x in soup.find_all("span", attrs={"class": "cmp-review-text"})])
    page += 20

Answer

You have not provided enough information for anyone to answer your question fully. The biggest issue is that no one can see your output. Are you ultimately getting three lists with a different number of elements in each, and do you want to pair them up correctly in a pandas DataFrame?
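One way to check whether missing fields explain the mismatch is to run the question's page-wide find_all pattern on a small hand-written snippet. The markup below is an assumption about what the real page looks like; it only reuses the class names from the question, and the first review deliberately has no date span.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two reviews, the first one has no date span.
HTML = """
<div class="review">
  <span class="cmp-reviewer">Alice</span>
  <span class="cmp-review-text">Great place to work.</span>
</div>
<div class="review">
  <span class="cmp-reviewer">Bob</span>
  <span class="cmp-review-date-created">March 3, 2017</span>
  <span class="cmp-review-text">Long hours.</span>
</div>
"""

def scrape_by_class(html):
    """Collect each field with its own page-wide find_all, as in the question."""
    soup = BeautifulSoup(html, "html.parser")
    usertitle = [x.get_text().strip() for x in soup.find_all("span", attrs={"class": "cmp-reviewer"})]
    datepost = [x.get_text().strip() for x in soup.find_all("span", attrs={"class": "cmp-review-date-created"})]
    comment = [x.get_text().strip() for x in soup.find_all("span", attrs={"class": "cmp-review-text"})]
    return usertitle, datepost, comment

usertitle, datepost, comment = scrape_by_class(HTML)
print(len(usertitle), len(datepost), len(comment))  # 2 1 2

# Pairing by position attaches Bob's date to Alice, because nothing
# ties a date back to the review it came from.
print(list(zip(usertitle, datepost)))  # [('Alice', 'March 3, 2017')]
```

Passing these lists straight to pd.DataFrame as columns raises a ValueError because the columns have different lengths, and padding the short list afterwards cannot recover which review each date belonged to.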

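If that is the case, the lists probably differ because some reviews are missing one of the fields: each find_all call searches the whole page independently, so nothing keeps a date attached to its reviewer. You will likely need to change your approach. Depending on how each page is formatted, you might be able to use findNext to step through the items in your soup and add each one to a DataFrame as you go.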
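A minimal sketch of a per-review variant of that idea: rather than collecting each field across the whole page, find the element that wraps one review, look up each field inside it, and record None when a field is missing. This swaps findNext for a container-scoped find. The container tag and class (a div with class "review") are hypothetical placeholders; inspect the real page to find the element that actually wraps a single review.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: two reviews, the first one has no date span.
HTML = """
<div class="review">
  <span class="cmp-reviewer">Alice</span>
  <span class="cmp-review-text">Great place to work.</span>
</div>
<div class="review">
  <span class="cmp-reviewer">Bob</span>
  <span class="cmp-review-date-created">March 3, 2017</span>
  <span class="cmp-review-text">Long hours.</span>
</div>
"""

def text_or_none(parent, css_class):
    """Return the stripped text of the first matching span inside parent, or None."""
    tag = parent.find("span", attrs={"class": css_class})
    return tag.get_text().strip() if tag is not None else None

def parse_reviews(html, container_class="review"):
    """Build one record per review; container_class is a hypothetical placeholder."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.find_all("div", attrs={"class": container_class}):
        # Every field is looked up inside the same review, so they stay paired.
        rows.append({
            "usertitle": text_or_none(review, "cmp-reviewer"),
            "datepost": text_or_none(review, "cmp-review-date-created"),
            "comment": text_or_none(review, "cmp-review-text"),
        })
    return rows

rows = parse_reviews(HTML)
df = pd.DataFrame(rows)
print(df)
```

In the paging loop you would call parse_reviews(r.text) for each page, extend a single rows list, and build the DataFrame once at the end. A missing date then shows up as an empty value in that one row instead of shifting every later date up by one.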

