fuhrerguxez - 8 days ago
Python Question

Python xpath - get information in the right order

First off, I'm sorry if the title is not very clear; I wasn't sure how to describe what I want to do.

I'm getting some information from a website. I already have the information that I want, but when I run the script the output looks like this:

Ivern Jungle
Starting Items
Hunter's Talisman
Refillable Potion
Warding Totem
First Goal


Stalker's Blade
Tracker's Knife
Boots of Speed
Hunter's Potion
Vision Ward
Sweeping Lens
Second Goal


But I want it to look like this:

Ivern Jungle

Starting Items
Hunter's Talisman
Refillable Potion
Warding Totem


First Goal
Stalker's Blade
Tracker's Knife
Boots of Speed
Hunter's Potion
Vision Ward
Sweeping Lens
Second Goal


I've tried a few things with the code, and this is the only way I could get it working the way I want. "Ivern Jungle" is a title, "Starting Items" is another title, and "First Goal" is another one. Before, I was getting all the titles first and then the other information (the items). This is the code that I have right now:

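# guide_page is assumed to be the already-parsed tree of the guide page
for build_names in guide_page.xpath(".//div[@class='build-container box-shadow-lb']"
                                    "/div[1]/div[1]/div[1]/div[1]/div[1]"):
    # the union (|) returns titles and item names together, in document order
    for title in build_names.xpath("div[1]/h2/text() | div[3]/div[1]/div/h2/text() | "
                                   "div[3]/div[1]/div/div/div/a/div[2]/span/text()"):
        print(title)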


I'm getting most of the information from the "title" for loop because that's how I managed to get it right. If there is a more efficient way to do it, please let me know.

By the way, that information is from one specific website, but websites differ. From another website I get information like this:

Kled The Talker # Title
Kled Tank/Ad Top # Title
Mercury's Treads
The Black Cleaver
Titanic Hydra
Frozen Mallet
Dead Man's Plate
Guardian Angel
Kled Ad/LifeSteal # Title
Mercury's Treads
The Black Cleaver
Ravenous Hydra
Death's Dance
Maw of Malmortius
Guardian Angel


As you can see, I don't get any blank lines in between. On the first website, the items section has notes on the right side of each title, and I think those are what put the blank lines in the output, because the second website has no notes. So my main issue is: how can I format the output? If I didn't explain myself clearly, please let me know and I'll update the question. Thanks! :)

Answer

You could navigate the tree much more easily by relying on the class attributes more often. That way, you could rewrite your script like this:

for div in page.xpath('//div[contains(@class, "item-wrap")]'):
    print("\n{bar}\n{title}\n{bar}".format(
        bar="#"*20, 
        title=div.xpath('.//h2/text()')[0].strip()))
    print('\n'.join(x.strip() for x in div.xpath(
        './/div[contains(@class, "main-items")]//span/text()')))

Output excerpt:

####################
Starting Items
####################
Hunter's Talisman
Refillable Potion
Warding Totem

####################
First Goal
####################
Stalker's Blade
Tracker's Knife
Boots of Speed
Hunter's Potion
Vision Ward
Sweeping Lens

####################
Second Goal
####################
Rod of Ages
Boots of Mobility
Ionian Boots of Lucidity
Boots of Swiftness
Sorcerer's Shoes
Oracle Alteration
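
If blank lines still show up, one possible cause is whitespace-only text nodes being matched by a text() query; stripping alone turns them into empty strings, which then print as empty lines. A minimal sketch of filtering them out, where clean_lines and the raw list are hypothetical stand-ins for whatever the xpath call returns:

```python
def clean_lines(texts):
    """Strip each text node and drop the ones that are empty afterwards."""
    stripped = (t.strip() for t in texts)
    return [t for t in stripped if t]


# Hypothetical stand-in for the list a text() xpath query might return.
raw = ["Starting Items", "\n    ", "Hunter's Talisman", "Refillable Potion ", "", "Warding Totem"]

print("\n".join(clean_lines(raw)))
```

With this, any blank lines in the output are the ones you print deliberately, for example before each title.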

Those xpaths work equally well on the second page you linked to.
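
To try the class-based approach without hitting either site, here is a self-contained sketch. It assumes lxml (which the .xpath() calls in the question suggest) and uses an invented HTML snippet: only the class names "item-wrap" and "main-items" come from the xpaths above, while the rest of the markup, the "notes" div, and the extract_builds helper are hypothetical.

```python
from lxml import html

# Hypothetical markup: the class names follow the xpaths above,
# the surrounding structure is invented for illustration.
SAMPLE = """
<div class="build-container">
  <div class="item-wrap float-left">
    <h2> Starting Items </h2>
    <div class="main-items">
      <a><span>Hunter's Talisman</span></a>
      <a><span>Refillable Potion</span></a>
    </div>
    <div class="notes"><span>   </span></div>
  </div>
  <div class="item-wrap float-left">
    <h2>First Goal</h2>
    <div class="main-items">
      <a><span>Stalker's Blade</span></a>
      <a><span>Boots of Speed</span></a>
    </div>
  </div>
</div>
"""


def extract_builds(tree):
    """Return a list of (title, items) pairs, one per item-wrap block."""
    builds = []
    for wrap in tree.xpath('//div[contains(@class, "item-wrap")]'):
        title = wrap.xpath('.//h2/text()')[0].strip()
        # Restricting the query to main-items skips the spans in the notes div.
        items = [s.strip() for s in wrap.xpath(
            './/div[contains(@class, "main-items")]//span/text()')]
        builds.append((title, items))
    return builds


builds = extract_builds(html.fromstring(SAMPLE))
for title, items in builds:
    print("\n" + title)
    print("\n".join(items))
```

Keeping each title together with its own items makes the grouping explicit, so the blank line between sections is printed on purpose. Note that contains(@class, "item-wrap") is a substring test, so it would also match a class such as "item-wrapper"; on a page where that matters, a stricter test is needed.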
