WKK WKK - 4 years ago 88
Python Question

Count how many times each address appears in a data file using Python

I'd like to count how many times each address appears in a data file using Python.
The address range is not fixed: it differs from one data file to another.
Some addresses between the min and max do not appear at all.
(The 2nd column is the address.)

How can I approach this efficiently?
I don't know which data structure would be suitable or which function would help.

I tried using a large array whose index is the address: read the data file and add 1 to array[address]. This is poor code.

Added:
I tried pieces_write[1].value_counts(); the result is

print(pieces_write[1].value_counts())
AttributeError: 'list' object has no attribute 'value_counts'


Example of DATA FILE (2nd column is the address)

0 303567 3584 Write 0.000000
1 55590 3072 Write 0.000000
0 303574 3584 Write 0.026214
1 240840 3072 Write 0.026214
1 55596 3072 Read 0.078643
0 303581 3584 Write 0.117964
1 55596 3072 Write 0.117964
0 303588 3584 Write 0.530841
1 55596 3072 Write 0.530841
0 303595 3584 Write 0.550502
1 240840 3072 Write 0.550502
1 55602 3072 Read 0.602931
0 303602 3584 Write 0.648806
1 55602 3072 Write 0.648806
0 303609 3584 Write 0.910950
1 55602 3072 Write 0.910950
0 303616 3584 Write 0.930611
1 240840 3072 Write 0.930611
1 55608 3072 Read 0.983040
0 303623 3584 Write 1.028915
1 55608 3072 Write 1.028915
0 303630 3584 Write 1.330380
1 55608 3072 Write 1.330380


CODE for Data file read

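# Containers for the parsed lines and their line indices
pieces_read, pieces_write, pieces_tot = [], [], []
x_read, x_write, x_tot = [], [], []
count = 0

for line in open(datafile):
    line_data = line.split()
    # Only keep lines whose address (2nd column) is below the cutoff
    if int(line_data[1]) < 6000000:
        if line_data[3] == 'Read':
            pieces_read.append(line_data)
            x_read.append(count)
        else:
            pieces_write.append(line_data)
            x_write.append(count)
        x_tot.append(count)
        pieces_tot.append(line_data)
    count += 1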

Answer Source

You could use collections.Counter:

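from collections import Counter

words = []

# Collect the address (2nd column) from every line
with open('data.txt') as f:
    for line in f:
        # Your logic here
        words.append(line.split()[1])

# Counter maps each distinct address to its number of occurrences
words_dict = Counter(words)

for key, value in words_dict.items():
    print(key, value)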

Output:

303574 1
55596 3
303630 1
303567 1
303595 1
303616 1
240840 3
303588 1
55590 1
303623 1
303602 1
303581 1
55608 3
303609 1
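
If you also want the Read/Write split and the address cutoff from the question, the same idea can be applied while streaming the file, without building the intermediate list. The following is only a minimal sketch: the function name count_addresses, the max_address parameter, and the inline sample lines are assumptions made for illustration, and it assumes, as the question's code does, that any operation other than 'Read' counts as a write.

```python
from collections import Counter
import io

def count_addresses(lines, max_address=6000000):
    """Count address occurrences separately for 'Read' and 'Write' lines.

    `lines` is any iterable of text lines (an open file works too).
    The name and the max_address default are illustrative assumptions.
    """
    counts = {"Read": Counter(), "Write": Counter()}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        address = int(fields[1])  # 2nd column is the address
        if address >= max_address:
            continue  # same cutoff as in the question
        # As in the question's code, anything that is not 'Read' is a write
        op = "Read" if fields[3] == "Read" else "Write"
        counts[op][address] += 1
    return counts

# A few lines taken from the example data, used here instead of a real file
sample = io.StringIO(
    "0 303567 3584 Write 0.000000\n"
    "1 55590 3072 Write 0.000000\n"
    "1 55596 3072 Read 0.078643\n"
    "1 55596 3072 Write 0.117964\n"
    "1 55596 3072 Write 0.530841\n"
    "1 240840 3072 Write 0.550502\n"
)

counts = count_addresses(sample)
total = counts["Read"] + counts["Write"]  # Counters can be added together

print(total.most_common(1))  # [(55596, 3)]
```

With a real file you would pass the open file object instead of the sample, for example inside a with open(datafile) as f: block. Because a Counter only stores addresses that actually occur, gaps between the min and max address cost nothing, unlike the large array indexed by address.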