John Swayne - 2 months ago
Python Question

Changing My Output of Code - Python

I want to change the output of my code. My code looks like this:

from collections import defaultdict

third = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

count = 0

fh = open("C:/Users/mycomp/desktop/data.txt", "r").readlines()

for line in fh:
    line_split = line.split()

    date = line_split[0]
    time = line_split[1]
    ip = line_split[2]

    third[date][time][ip] += 1

for date, d in third.iteritems():
    for time, count in d.iteritems():
        print "%s %s %s %s" % (date, time, count, ip)

The log file looks like this:

2016-11-04 00:00:12
2016-11-05 00:00:15
2016-11-06 00:00:19

My output is like below.

2016-10-04 07:46 defaultdict(<type 'int'>, {'': 574})
2016-10-04 15:58 defaultdict(<type 'int'>, {'': 364})
2016-10-04 15:59 defaultdict(<type 'int'>, {'': 359})
2016-10-04 07:42 defaultdict(<type 'int'>, {'': 287})
2016-10-04 07:43 defaultdict(<type 'int'>, {'': 337})

But I want output like this:

2016-10-04 07:46 {'': 574}
2016-10-04 15:58 {'': 364}
2016-10-04 15:59 {'': 359}
2016-10-04 07:42 {'': 287}
2016-10-04 07:43 {'': 337}


Your dictionary has three levels, so you need three keys (date, time and IP) to reach each count. Your output code loops over only the first two levels. Since there's no loop over the IPs, the value you print is the innermost dictionary rather than a count.
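To make the nesting concrete, here is a minimal sketch. The records are made-up sample data (not taken from your log), and the names two_keys and three_keys are just for illustration. It shows that indexing with two keys still gives you a defaultdict, while indexing with three keys gives you the integer count:

```python
from collections import defaultdict

third = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

# Hypothetical (date, time, ip) records, for illustration only.
records = [
    ("2016-10-04", "07:46", "10.0.0.1"),
    ("2016-10-04", "07:46", "10.0.0.1"),
    ("2016-10-04", "07:46", "10.0.0.2"),
]
for date, time, ip in records:
    third[date][time][ip] += 1

# Two keys: still a dictionary mapping IP -> count.
two_keys = third["2016-10-04"]["07:46"]
# Three keys: the actual integer count.
three_keys = third["2016-10-04"]["07:46"]["10.0.0.1"]

print(type(two_keys).__name__)
print(three_keys)
```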

I suspect you want something like this, with three loops:

for date, x in third.iteritems():
    for time, y in x.iteritems():
        for ip, count in y.iteritems():
            print "%s %s %s %s" % (date, time, count, ip)

If you really do want all the data for a single date and time printed on one line (even when multiple IPs are involved), you could just change your print statement so the output looks nicer. The count value in your current code is actually one of the innermost defaultdicts, which maps IP addresses to counts. You can convert it to a regular dict and include that in your print call:

for date, d in third.iteritems():
    for time, ip_count in d.iteritems():
        print "%s %s %s" % (date, time, dict(ip_count))

Note that only three things are being formatted (the IPs and counts are part of the same object). The ip variable in your print call didn't work properly, since it was never set inside your two levels of loops. You were in fact printing the last IP address assigned while filling the dictionary (the one on the last line of your input file). Unlike in your example output, I suspect it would not match the contents of the inner dictionary you printed.
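The stale-variable effect is easy to reproduce on its own. This is a small sketch with made-up log lines (the lines list is hypothetical). After the loop finishes, ip simply keeps whatever was assigned on the final iteration:

```python
# Hypothetical log lines, for illustration only.
lines = [
    "2016-10-04 07:46 10.0.0.1",
    "2016-10-04 07:46 10.0.0.2",
    "2016-10-04 15:58 10.0.0.3",
]

for line in lines:
    date, time, ip = line.split()

# The loop is over, but ip still holds the value from the last line,
# so any later print that uses it shows the same IP every time.
print(ip)
```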

Note that both versions of the code above will print your data in a mostly arbitrary order. All lines for the same day will print together (and all lines for the same time within a day), but outside of those groupings, the values will be in arbitrary order. You may want to use sorted to put your data in a useful order:

import operator
keyfunc = operator.itemgetter(0)

for date, x in sorted(third.iteritems(), key=keyfunc):
    for time, y in sorted(x.iteritems(), key=keyfunc):
        for ip, count in sorted(y.iteritems(), key=keyfunc):
            print "%s %s %s %s" % (date, time, count, ip)

I'd also consider using a less nested data structure, such as a single dictionary keyed by (date, time, ip) tuples.
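A rough sketch of that flatter layout, assuming each log line has exactly three whitespace-separated fields. The sample lines and the name counts are made up for illustration. I've used items() and a single-argument print(...) here so the snippet runs on both Python 2 and Python 3:

```python
from collections import defaultdict

# Hypothetical log lines, for illustration only.
lines = [
    "2016-10-04 15:58 10.0.0.3",
    "2016-10-04 07:46 10.0.0.1",
    "2016-10-04 07:46 10.0.0.1",
    "2016-10-04 07:46 10.0.0.2",
]

# One flat dictionary: (date, time, ip) -> count.
counts = defaultdict(int)
for line in lines:
    date, time, ip = line.split()
    counts[(date, time, ip)] += 1

# The keys are unique tuples, so sorting the items orders the output
# by date, then time, then ip, using a single loop.
for (date, time, ip), count in sorted(counts.items()):
    print("%s %s %s %s" % (date, time, count, ip))
```

With a flat dictionary there is only one level to loop over, so the question of which level you're printing goes away. The trade-off is that grouping by date or time is no longer built into the structure.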