TheMetaHorde - 2 months ago
Python Question

How to use Python to interpret a URL

I'm writing code that tries to extract text from the Library of Babel.

They basically use a system of Hexes, Walls, Shelves, Volumes and Pages to split up their library of randomly generated text. Here is an example: https://libraryofbabel.info/book.cgi?2-w1-s2-v22:1
Here we have Hex: 2, Wall: 1, Shelf: 2, Volume: 22, Page: 1.
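This location scheme maps directly onto the URL's query string, so it can be interpreted with the standard library. Below is a minimal sketch. It assumes every book.cgi query has the form <hex>-w<wall>-s<shelf>-v<volume>:<page>, as in the example above. parse_location and build_book_url are hypothetical helper names, not part of any library:

```python
import re
from urllib.parse import urlsplit

# Assumed query format: <hex>-w<wall>-s<shelf>-v<volume>:<page>, e.g. "2-w1-s2-v22:1"
LOCATION_RE = re.compile(
    r"^(?P<hex_name>[^-]+)-w(?P<wall>\d+)-s(?P<shelf>\d+)-v(?P<volume>\d+):(?P<page>\d+)$"
)


def parse_location(url):
    """Split a book.cgi URL into hex, wall, shelf, volume and page (hypothetical helper)."""
    query = urlsplit(url).query  # the part after "?", e.g. "2-w1-s2-v22:1"
    match = LOCATION_RE.match(query)
    if match is None:
        raise ValueError("unrecognised location: " + repr(query))
    parts = match.groupdict()
    # Keep the hex as a string; convert the remaining parts to integers
    return {key: (value if key == "hex_name" else int(value)) for key, value in parts.items()}


def build_book_url(hex_name, wall, shelf, volume, page):
    """Inverse of parse_location: assemble the book.cgi URL for one page (hypothetical helper)."""
    return "https://libraryofbabel.info/book.cgi?{}-w{}-s{}-v{}:{}".format(
        hex_name, wall, shelf, volume, page
    )


loc = parse_location("https://libraryofbabel.info/book.cgi?2-w1-s2-v22:1")
print(build_book_url(**loc))  # → https://libraryofbabel.info/book.cgi?2-w1-s2-v22:1
```

The hex part is kept as a string so the parser does not assume it is purely numeric. A URL without a recognisable location, such as the bare https://libraryofbabel.info/book.cgi, raises ValueError.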

Ideally, I would like to pick a random page across all of these variables and extract its text. However, I am not getting the output I expect.

Here is my code:

```python
import random

import requests
from bs4 import BeautifulSoup

# Pick a random location in the library
hex_name = str(random.randint(0, 6))
wall = str(random.randint(1, 4))
shelf = str(random.randint(1, 5))
vol = str(random.randint(1, 32))
page = str(random.randint(1, 410))

print("Fetching: Hex: " + hex_name + ", Wall: " + wall + ", Shelf: " + shelf + ", Vol: " + vol + ", Page: " + page)

# Build the URL for that location and download it
babel_url = "https://libraryofbabel.info/browse.cgi?" + hex_name + "-w" + wall + "-s" + shelf + "-v" + vol + ":" + page
r = requests.get(babel_url)

# Parse the HTML and print its visible text
soup = BeautifulSoup(r.text, "html.parser")
print(soup.get_text())
```


My output is identical to what I get if I change the URL to https://libraryofbabel.info/browse.cgi. Printing babel_url shows that the URL is built the way I intended, but something is not interpreting it the way I want.

I've found that pasting https://libraryofbabel.info/book.cgi?2-w1-s2-v22:1 directly into Chrome drops me at https://libraryofbabel.info/book.cgi. But once I navigate to https://libraryofbabel.info/book.cgi?2-w1-s2-v22:1 (or any other page) through the site itself, I can move between pages at will.

The only thing I get in the output worth mentioning is:


It appears your browser has javascript disabled. Follow this link to browse without javascript.

Answer

Put on your glasses:
You are requesting browse.cgi instead of book.cgi:

https://libraryofbabel.info/browse.cgi?2-w2-s1-v10:72
instead of
https://libraryofbabel.info/book.cgi?2-w2-s1-v10:72
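Applied to the question's script, the fix is a one-word change in the URL. Below is a minimal sketch. It assumes, as the answer suggests, that a plain GET request to book.cgi returns the page. The ranges are copied from the question's code, and random_book_url and fetch_page_text are hypothetical names. Because the question reports being bounced back to the bare book.cgi in the browser, the sketch also reports any redirect via r.history so that case is visible:

```python
import random

BASE_URL = "https://libraryofbabel.info/book.cgi"  # book.cgi, not browse.cgi


def random_book_url(rng=random):
    """Build a book.cgi URL for a random location (ranges copied from the question's code)."""
    hex_name = rng.randint(0, 6)
    wall = rng.randint(1, 4)
    shelf = rng.randint(1, 5)
    volume = rng.randint(1, 32)
    page = rng.randint(1, 410)
    return "{}?{}-w{}-s{}-v{}:{}".format(BASE_URL, hex_name, wall, shelf, volume, page)


def fetch_page_text(url, timeout=30):
    """Download one page and return its visible text."""
    # Third-party packages: pip install requests beautifulsoup4
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # r.history is non-empty if the server redirected (e.g. back to the bare book.cgi)
    if r.history:
        print("Redirected to: " + r.url)
    soup = BeautifulSoup(r.text, "html.parser")
    return soup.get_text()
```

Usage: print(fetch_page_text(random_book_url())). requests and BeautifulSoup are imported inside fetch_page_text only so that the URL helper can be used without them installed.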
