Ahmed Ayman Ibrahim - 1 month ago
Python Question

Filling Lists/ Tables by reading input files

I am fairly new to data structures and I am having some trouble extracting specific information from multiple .txt files. I want to group specific fields from otherwise junk input files.

The files are formatted as follows:

---------------------------------------------------
Block 1
---------------------------------------------------
Block 2
---------------------------------------------------
Block 3
---------------------------------------------------
.
.
.


A random .txt file that serves as the input (parsed.txt) looks something like this:

---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 37989
Local AS Number: 12654
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 146.228.1.3
Local IP Address: 193.0.4.28
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 2a01:2a8::3
Local IP Address: 2001:67c:2e8:2:ffff:0:4:28


Required:

The main field in each block is the "Local AS Number". I would like to read each block, check the "Local AS Number", and update some sort of data structure such that:


  • If it is new, create a structure (table, list, ...) named after it,
    containing the other fields as columns (each column named after its
    field), and fill the columns with the respective values.

  • If the "Local AS Number" has been read in previous blocks, then just
    add the other fields as a new row of the existing table.



The result should look something like this:

AS 12654
Timestamp Peer AS Number Peer IP Address Local IP Address
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 146.228.1.3 193.0.4.28
1453939200 1836 2a01:2a8::3 2001:67c:2e8:2:ffff:0:4:28


I tried some string manipulation, but it turned out to be a complete mess, so I figured there should be a more suitable data structure.
Please note that the tables must remain available for updates until the last .txt file has been parsed. I have no idea where to even begin solving this problem.

Answer

As @AxxE suggested, a dictionary of lists of tuples does what you want. Each list contains all the blocks for a given local AS number stored in tuples.

I use the re module to extract the value from each line, collecting each block's data into a tuple. The tuple is appended to a list that is keyed by the local AS number in the dictionary. Error checks might be added, of course.

import re

records = {}
with open('parsed.txt', 'r') as infile:
    # Each block is one separator line followed by five field lines.
    in_line = infile.readline()
    while in_line.startswith('-'):
        time_stamp = re.search(r': (\d+)\(', infile.readline()).group(1)
        peer_AS = re.search(r': (\d+)', infile.readline()).group(1)
        local_AS = re.search(r': (\d+)', infile.readline()).group(1)
        peer_IP = re.search(r': (.+)$', infile.readline()).group(1)
        local_IP = re.search(r': (.+)$', infile.readline()).group(1)
        # Group the block under its local AS number.
        if local_AS in records:
            records[local_AS].append((time_stamp, peer_AS, peer_IP, local_IP))
        else:
            records[local_AS] = [(time_stamp, peer_AS, peer_IP, local_IP)]
        in_line = infile.readline()
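
Since the question asks for the tables to stay open across several .txt files, here is a minimal sketch of one way to do that. It is a variation, not the code above: it looks fields up by name instead of by line position, and it uses collections.defaultdict so a new local AS number gets its list automatically. The helper name add_blocks and the two in-memory sample "files" are assumptions made for illustration; with real files you would pass each open file object to add_blocks instead of text.splitlines(). A block that lacks one of the five fields would raise a KeyError here.

```python
import re
from collections import defaultdict

# Matches "Field Name: value"; separator and blank lines do not match.
FIELD_RE = re.compile(r'^([^:]+):\s*(.+?)\s*$')


def add_blocks(lines, records):
    """Parse blocks from an iterable of lines and append them to records."""
    block = {}

    def flush():
        # Store the block collected so far (if any) under its local AS number.
        if block:
            row = (block['Timestamp'].split('(')[0],  # keep the epoch part only
                   block['Peer AS Number'],
                   block['Peer IP Address'],
                   block['Local IP Address'])
            records[block['Local AS Number']].append(row)
            block.clear()

    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            block[match.group(1)] = match.group(2)
        else:
            flush()  # a separator or blank line ends the current block
    flush()  # the last block has no trailing separator


# Two stand-ins for separate input files (sample values from the question).
first_file = """\
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 37989
Local AS Number: 12654
Peer IP Address: 203.123.48.6
Local IP Address: 193.0.4.28
"""
second_file = """\
---------------------------------------------------
Timestamp: 1453939200(2016-01-28 01:00:00)
Peer AS Number: 1836
Local AS Number: 12654
Peer IP Address: 2a01:2a8::3
Local IP Address: 2001:67c:2e8:2:ffff:0:4:28
"""

# One dictionary shared by every file, so tables keep growing until the end.
records = defaultdict(list)
for text in (first_file, second_file):
    add_blocks(text.splitlines(), records)
```

Because the same records dictionary is passed to every call, rows from later files land in the lists created by earlier ones.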

The records may now be printed out as you indicated.

for local_AS, entry in records.items():
    print('\t\t\tLocal AS Number: {}'.format(local_AS))
    print('Timestamp\tPeer AS Number\tPeer IP Address\t\tLocal IP Address')
    for time_stamp, peer_AS, peer_IP, local_IP in entry:
        print('{}\t{}\t\t{}\t\t{}'.format(time_stamp, peer_AS, peer_IP, local_IP))

This yields the output below. I extended the example file, changing the local AS number to a different one, just to show the idea.

                  Local AS Number 12654
Timestamp   Peer AS Number  Peer IP Address     Local IP Address
1453939200  37989       203.123.48.6        193.0.4.28
1453939200  1836        146.228.1.3         193.0.4.28
1453939200  1836        146.228.1.3         193.0.4.28
1453939200  1836        2a01:2a8::3         2001:67c:2e8:2:ffff:0:4:28 
                  Local AS Number 12655
Timestamp   Peer AS Number  Peer IP Address     Local IP Address
1453939200  37989       203.123.48.6        193.0.4.28
1453939200  1836        146.228.1.3         193.0.4.28
1453939200  1836        146.228.1.3         193.0.4.28
1453939200  1836        2a01:2a8::3         2001:67c:2e8:2:ffff:0:4:28 
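
Tab-separated columns can drift out of line when values differ a lot in length, as IPv4 and IPv6 addresses do. As an optional variation, the sketch below pads each column to the width of its longest entry. The function name format_table and the small records sample are assumptions for illustration; the row layout is the same (timestamp, peer AS, peer IP, local IP) tuple used above.

```python
HEADERS = ('Timestamp', 'Peer AS Number', 'Peer IP Address', 'Local IP Address')


def format_table(local_as, rows):
    """Return the table for one local AS number as a list of text lines."""
    # Width of each column = longest cell in that column, header included.
    widths = [max(len(str(cell)) for cell in column)
              for column in zip(HEADERS, *rows)]

    def fmt(row):
        cells = (str(cell).ljust(width) for cell, width in zip(row, widths))
        return '  '.join(cells).rstrip()

    return (['Local AS Number: {}'.format(local_as), fmt(HEADERS)]
            + [fmt(row) for row in rows])


# Sample data in the same shape as the dictionary built by the parser.
records = {
    '12654': [
        ('1453939200', '37989', '203.123.48.6', '193.0.4.28'),
        ('1453939200', '1836', '2a01:2a8::3', '2001:67c:2e8:2:ffff:0:4:28'),
    ],
}

for local_as, rows in records.items():
    print('\n'.join(format_table(local_as, rows)))
```

Returning the lines instead of printing them directly also makes it easy to write each table to its own file later.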