Bridget Kane Bridget Kane - 1 month ago 12
Python Question

Splitting one column into many, counting frequency: 'int' object is not iterable

This is my first question on stack overflow and it may be a bit clunky as I learn the ropes - tips or pointers in question formatting welcome!

I'm very new to python and have an issue nearly identical to the one below:

how to split one column into many columns and count the frequency

For my data, I have two columns, "logger" and "page", where logger is a column of IP addresses in non-null object string format, and page is a randomized 1-10 non-null int number that represents a webpage the logger visited. An example of this is below:

logger page
0 10.1.60.203 3
1 3.75.190.181 5
2 10.1.60.203 4
3 10.1.60.203 6
4 10.1.60.253 1


What I'd like to do is to is to have one row for each unique IP in the logger column, and to have a series of columns from 1-10 representing the total count of page views for each page for each IP address, which are then counted for each column, like below:

logger page1 page2 page3 page4 page5 ...
0 10.1.60.203 5 7 14 7 2
1 3.75.190.181 10 3 20 8 6
2 10.1.60.253 22 9 2 12 18


I've tried a lot of different options to work through this - pivot tables, groupby, but I can't seem to wrap my head around how to get the counts into their respective unique columns per IP address. When I came upon the other forum I felt that that answer should work quite well, but unfortunately I'm coming across the error that 'int' object is not iterable. Here is the code from that user that I'm currently working with:

df2 = pd.DataFrame([x for x in df['page'].apply(
... lambda item: dict(map(
... lambda x: (x,1),
... item))
... ).values]).fillna(0)
>>> df2.join(df)


I can somewhat grasp what the aforementioned error means but am not confident in ability to work out the answer from there. Any help with this error or particular, or with a more broad solution to my problem, would be greatly, greatly appreciated.

Thank you!

Answer

Is that what you want?

In [8]: df
Out[8]:
         logger  page
0   10.1.60.203     3
1  3.75.190.181     5
2   10.1.60.203     4
3   10.1.60.203     6
4   10.1.60.253     1

In [9]: df.pivot_table(index='logger', columns='page', aggfunc='size', fill_value=0)
Out[9]:
page          1  3  4  5  6
logger
10.1.60.203   0  1  1  0  1
10.1.60.253   1  0  0  0  0
3.75.190.181  0  0  0  1  0