user1972601 - 1 month ago
Python Question

How to loop through a list and then swap digits for other instances of the loop to see?

I have an XML Document with a structure like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://www.website.com/</loc>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://www.website.com/location/</loc>
<lastmod>2016-10-13T06:03:41Z</lastmod>
<changefreq>daily</changefreq>
<image:image>
<image:loc>https://website.com/image/</image:loc>
<image:title>Title of Item</image:title>
</image:image>
</url>
<url>
<loc>https://www.website.com/location/</loc>
<lastmod>2016-09-15T07:11:22Z</lastmod>
<changefreq>daily</changefreq>
<image:image>
<image:loc>https://website.com/image/</image:loc>
<image:title>Title of Item</image:title>
</image:image>
</url>
</urlset>


I want to find out which <url> entry is the newest, using its <lastmod> tag. I split each date into its parts to check whether one year is newer than the next, and so on. But it doesn't work: every time I iterate to a different node, the for loop "forgets" which date is the newest, so it returns the date from the last node iterated rather than the newest date.

I have tried everything based on variables, even thinking that getter and setter methods would work, but the values aren't updated.

tree = get_xml_data(line)
to_log(tree)
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("lastmod"):
                xml_date = c.text
                year = ""
                month = ""
                day = ""
                hour = ""
                minute = ""
                second = ""
                for i in range(4):
                    year += str(xml_date[i])
                for i in range(5, 7):
                    month += str(xml_date[i])
                for i in range(8, 10):
                    day += str(xml_date[i])
                for i in range(11, 13):
                    hour += str(xml_date[i])
                for i in range(14, 16):
                    minute += str(xml_date[i])
                for i in range(17, 19):
                    second += str(xml_date[i])
                if year > nt.get_year():
                    nt.set_year(int(year))
                if month > nt.get_month():
                    nt.set_month(int(month))
                if day > nt.get_day():
                    nt.set_day(int(day))
                if hour > nt.get_hour():
                    nt.set_hour(int(hour))
                if minute > nt.get_minute():
                    nt.set_minute(int(minute))
                if second > nt.get_second():
                    nt.set_second(int(second))

                to_log("Addition:", year, month, day, hour, minute, second)
to_log("Newest addition:", nt.get_year(), nt.get_month(), nt.get_day())
to_log("Newest addition (cont.):", nt.get_hour(), nt.get_minute(), nt.get_second())


Outputs (for example, the first addition should be the newest date):

2016-10-18 19:25:20.332031 Addition: 2016 10 05 06 21 05
2016-10-18 19:25:20.332083 Addition: 2016 07 30 01 27 21
2016-10-18 19:25:20.332134 Addition: 2016 09 19 17 48 45
2016-10-18 19:25:20.332186 Addition: 2016 09 19 17 48 52
2016-10-18 19:25:20.332235 Newest addition: 2016 9 19
2016-10-18 19:25:20.332268 Newest addition (cont.): 17 48 52

Answer

This version remembers the newest addition date (and time):

import jdcal

def julian(y, m, d, h, mi, s):
    # the fields arrive as strings sliced from the XML text, so convert them first
    y, m, d, h, mi, s = [int(v) for v in (y, m, d, h, mi, s)]
    # gcal2jd returns two floats whose sum is the Julian date at midnight (0h);
    # the time of day is then added as a fraction of a day
    return sum(jdcal.gcal2jd(y, m, d)) + h/24.0 + mi/1440.0 + s/86400.0


tree = get_xml_data(line)
to_log(tree)
julNewest = 0.0    # establish a comparison value for the newest addition
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("lastmod"):
                xml_date = c.text
                # slice the fields out of "YYYY-MM-DDTHH:MM:SSZ"
                year = xml_date[0:4]
                month = xml_date[5:7]
                day = xml_date[8:10]
                hour = xml_date[11:13]
                minute = xml_date[14:16]
                second = xml_date[17:19]
                julDay = julian(year, month, day, hour, minute, second)  # Julian day number of this addition
                if julDay > julNewest:
                    # newer than anything seen so far: store all six fields together
                    nt.set_year(int(year))
                    nt.set_month(int(month))
                    nt.set_day(int(day))
                    nt.set_hour(int(hour))
                    nt.set_minute(int(minute))
                    nt.set_second(int(second))
                    julNewest = julDay

                to_log("Addition:", year, month, day, hour, minute, second)
# log the result once, after every <url> has been checked
to_log("Newest addition:", nt.get_year(), nt.get_month(), nt.get_day())
to_log("Newest addition (cont.):", nt.get_hour(), nt.get_minute(), nt.get_second())

You first have to import the module jdcal (if not installed, install it with "pip install jdcal"). The function that is defined then allows you to represent any date as a unique (float) number. It is much easier to compare these single numbers to other date-converted numbers to see which one is higher (more recent, newer).
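
If installing jdcal is not an option, the same idea (reduce each timestamp to one comparable value) can probably be done with the standard library's datetime module. This is only a sketch under assumptions: the sample XML and the function name newest_lastmod are made up for illustration, and it assumes every lastmod value looks like YYYY-MM-DDTHH:MM:SSZ, as in your document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data, shaped like the sitemap in the question.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.website.com/</loc></url>
  <url><loc>https://www.website.com/a/</loc><lastmod>2016-10-13T06:03:41Z</lastmod></url>
  <url><loc>https://www.website.com/b/</loc><lastmod>2016-09-15T07:11:22Z</lastmod></url>
</urlset>"""


def newest_lastmod(xml_text):
    """Return the most recent <lastmod> as a datetime, or None if there is none."""
    root = ET.fromstring(xml_text)
    newest = None
    for elem in root.iter():
        # Match on the tag suffix, as the question does, to ignore the namespace.
        if elem.tag.endswith("lastmod") and elem.text:
            stamp = datetime.strptime(elem.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            # datetime objects compare as a whole, so one comparison is enough.
            if newest is None or stamp > newest:
                newest = stamp
    return newest


print(newest_lastmod(SAMPLE))  # 2016-10-13 06:03:41
```

The returned datetime already carries year, month, day, hour, minute and second as attributes, so the six setters could be fed from it in one place.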

Note that I also simplified and sped up your code that constructs year, month, day information.
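
As for why the original loop did not keep the newest date: each field was compared and stored on its own, so the stored year, month, day and so on need not come from the same timestamp. (Your version also compares a str with an int; under Python 2 a str always compares greater than an int, so every field would be overwritten on every pass, which matches your output.) Here is a small sketch of the difference. The helper names split_stamp, newest_fieldwise and newest_whole are hypothetical, and the timestamps are rebuilt from the "Addition:" lines in your log.

```python
def split_stamp(stamp):
    """Slice 'YYYY-MM-DDTHH:MM:SSZ' into a tuple of six ints."""
    return (int(stamp[0:4]), int(stamp[5:7]), int(stamp[8:10]),
            int(stamp[11:13]), int(stamp[14:16]), int(stamp[17:19]))


def newest_fieldwise(stamps):
    """Update each field independently, in the spirit of the question's loop
    (but with ints on both sides of every comparison)."""
    best = [0, 0, 0, 0, 0, 0]
    for stamp in stamps:
        for i, value in enumerate(split_stamp(stamp)):
            if value > best[i]:
                best[i] = value
    return tuple(best)


def newest_whole(stamps):
    """Compare all six fields together; tuples compare left to right."""
    return max(split_stamp(stamp) for stamp in stamps)


stamps = ["2016-10-05T06:21:05Z", "2016-07-30T01:27:21Z",
          "2016-09-19T17:48:45Z", "2016-09-19T17:48:52Z"]
print(newest_fieldwise(stamps))  # (2016, 10, 30, 17, 48, 52), not one of the inputs
print(newest_whole(stamps))      # (2016, 10, 5, 6, 21, 5)
```

Converting to a Julian day number, as above, is one way of comparing the whole timestamp at once; comparing tuples is another.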

Hope this helps.

Regards,
