MBasith - 4 months ago
Python Question

Python listing last 10 modified files and reading each line of all 10 files

I need some help listing files in a directory and reading through each file using Python. I know how to do this using shell commands but is there a Pythonic way to do it?

I would like to:

1.) List all files in a directory.

2.) Grab the last 10 modified/latest files (preferably using a wildcard)

3.) Read through each line of all 10 files

Using shell commands I can:

Linux_System# ls -ltr | tail -n 10
-rw-rw-rw- 1 root root 999934 Jul 26 01:06 data_log.569
-rw-rw-rw- 1 root root 999960 Jul 26 02:05 data_log.570
-rw-rw-rw- 1 root root 999968 Jul 26 03:13 data_log.571
-rw-rw-rw- 1 root root 999741 Jul 26 04:20 data_log.572
-rw-rw-rw- 1 root root 999928 Jul 26 05:31 data_log.573
-rw-rw-rw- 1 root root 999942 Jul 26 06:45 data_log.574
-rw-rw-rw- 1 root root 999916 Jul 26 07:46 data_log.575
-rw-rw-rw- 1 root root 999862 Jul 26 08:59 data_log.576
-rw-rw-rw- 1 root root 999685 Jul 26 10:15 data_log.577
-rw-rw-rw- 1 root root 999633 Jul 26 11:26 data_log.578

Linux_System# cat data_log.{569..578}


Using glob I am able to list the files and open a specific file, but I am not sure how to list only the last 10 modified files and feed that wildcard file list to the open function.

import os, fnmatch, glob

files = glob.glob("data_event_log.*")
files.sort(key=os.path.getmtime)
print("\n".join(files))

data_event_log.569
data_event_log.570
data_event_log.571
data_event_log.572
data_event_log.573
data_event_log.574
data_event_log.575
data_event_log.576
data_event_log.577
data_event_log.578

import re

with open('data_event_log.560', 'r') as f:
    output_list = []
    for line in f.readlines():
        if line.startswith('Time'):
            lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
            a = (lineRegex.findall(line))

Answer

It looks like you have almost done everything already:

import os.path, glob, re

files = glob.glob("data_event_log.*")
files.sort(key=os.path.getmtime)  # oldest first, like ls -ltr
latest = files[-10:]  # the 10 most recently modified files
print("\n".join(latest))
lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
for fn in latest:
    with open(fn) as f:
        for line in f:  # iterate lazily instead of calling readlines()
            if line.startswith('Time'):
                a = lineRegex.findall(line)
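
Note that `a` is overwritten on every matching line, so only the last match survives the loop. Here is a minimal sketch, assuming you want to keep every date that is found. The helper name `collect_dates` and its parameters are my own invention for illustration, not something from the question:

```python
import glob
import os.path
import re

DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')

def collect_dates(pattern, count=10, prefix='Time'):
    """Return all dates found on lines starting with `prefix` in the
    `count` most recently modified files matching `pattern`.

    Hypothetical helper, for illustration only.
    """
    files = sorted(glob.glob(pattern), key=os.path.getmtime)
    dates = []
    for fn in files[-count:]:  # the last `count` files, oldest to newest
        with open(fn) as f:
            for line in f:
                if line.startswith(prefix):
                    # findall returns a list, so extend rather than append
                    dates.extend(DATE_RE.findall(line))
    return dates
```

You would call it as `collect_dates("data_event_log.*")` and get back one flat list of date strings.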

Edit:

Especially if you have many files, a better solution is heapq.nlargest with glob.iglob, which finds the 10 newest files without building and sorting the full list:

import os.path, glob, heapq, re

files = glob.iglob("data_event_log.*")  # iterator, no full list in memory
latest = heapq.nlargest(10, files, key=os.path.getmtime)  # 10 newest files, newest first
print("\n".join(latest))
lineRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
for fn in latest:
    with open(fn) as f:
        for line in f:
            if line.startswith('Time'):
                a = lineRegex.findall(line)
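
One difference from the sorted version: `heapq.nlargest` returns the files newest first, whereas `files[-10:]` runs oldest to newest like `ls -ltr`. If the order matters to you, a minimal sketch is to reverse the result. The function names `latest_files` and `read_lines` are hypothetical. The standard `fileinput` module then reads the files as one stream, much like your `cat` command:

```python
import fileinput
import glob
import heapq
import os.path

def latest_files(pattern, count=10):
    """Return the `count` most recently modified files, oldest first."""
    newest_first = heapq.nlargest(count, glob.iglob(pattern),
                                  key=os.path.getmtime)
    return newest_first[::-1]  # reverse to match `ls -ltr` ordering

def read_lines(files):
    """Yield every line of every file in order, similar to `cat`."""
    if not files:  # fileinput would read stdin if given no files
        return
    with fileinput.input(files=files) as stream:
        for line in stream:
            yield line
```

With these, `for line in read_lines(latest_files("data_event_log.*")):` walks through all ten files in modification order.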