BitFlow - 7 months ago
Python Question

regex to find match in element of list

I'm new to Python and have compiled a list of items from a file. Each item holds an element that appeared in the file and its frequency in the file, like this:

('95.108.240.252', 9)


It's mostly IP addresses I'm gathering. I'd like to output the address and frequency like this instead:

IP Frequency
95.108.240.252 9


I'm trying to do this by running a regex over the list item and printing the match, but it raises the following error when I try:
TypeError: expected string or bytes-like object


This is the code I'm using to do all of this now:

ips = []  # IP address list
for line in f:
    match = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line)  # Get all IPs line by line
    if match:
        ips.append(match.group())  # if found add to list

from collections import defaultdict
freq = defaultdict(int)
for i in ips:
    freq[i] += 1  # get frequency of IPs

print("IP\t\t Frequency")  # Print header

freqsort = sorted(freq.items(), reverse=True, key=lambda item: item[1])  # sort in descending frequency
for c in range(0, 4):  # print the 4 most frequent IPs
    # print(freqsort[c])  # This line prints the item like ('95.108.240.252', 9)
    m1 = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", freqsort[c])  # This is the line returning errors - trying to parse IP on its own from the list
    print(m1.group())  # Then print it


I'm not even trying to parse the frequency yet; I just wanted the IPs as a starting point.

Answer

The second argument to re.search() must be a string (or bytes-like object), but you are passing a tuple: freqsort[c] is an (ip, count) pair such as ('95.108.240.252', 9). That is why you get TypeError: expected string or bytes-like object.
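As a minimal sketch of the failure, using the sample item from the question (the names `pattern`, `item` and `raised` are only illustrative), passing the whole tuple raises the error, while indexing out the first element gives a plain string:

```python
import re

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
item = ('95.108.240.252', 9)  # one element of freqsort: an (ip, count) tuple

raised = False
try:
    re.search(pattern, item)  # the whole tuple is passed, not a string
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

ip = item[0]                    # the IP is already a plain string
match = re.search(pattern, ip)  # works, although no regex is needed at this point
print(match.group())
```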

NOTE: The loop assumes freqsort has at least 4 entries. If the file contains fewer than 4 distinct IP addresses, freqsort[c] raises IndexError: list index out of range.

Delete the last two lines of your loop and use this instead:

print(freqsort[c][0])

If you want to keep the regex you can use the following, but it is redundant because freqsort[c][0] is already the IP string:

m1 = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", freqsort[c][0]) # freqsort[c][0] is the IP string, so re.search() now receives a str
print(m1.group())
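
Putting it together, here is one possible sketch that prints both columns in the format you asked for. It swaps your defaultdict and sorted() for collections.Counter, which is a substitution on my part rather than something your code requires. The sample `ips` list and the helper name `format_top` are made up for illustration; in your script `ips` would come from the file loop.

```python
from collections import Counter

# Sample data standing in for the IPs collected from the file (hypothetical values).
ips = ["95.108.240.252"] * 9 + ["10.0.0.1"] * 3 + ["192.168.1.7"]

def format_top(ips, n=4):
    """Return a header line plus one 'ip count' line for the n most frequent IPs."""
    lines = ["IP Frequency"]
    # most_common(n) returns at most n (ip, count) pairs, sorted by descending
    # count, so a list with fewer than n distinct IPs does not raise IndexError.
    for ip, count in Counter(ips).most_common(n):
        lines.append(f"{ip} {count}")
    return lines

print("\n".join(format_top(ips)))
```

Unpacking each pair as `ip, count` in the loop header avoids indexing with `[0]` and `[1]`, and iterating over the result directly avoids the fixed range(0, 4).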