ragingasiancoder - 4 months ago
Python Question

How does readline() work behind the scenes when reading a text file?

I would like to understand how readline() takes in a single line from a text file. The specific details I would like to know about, with respect to how the compiler interprets the Python language and how this is handled by the CPU, are:


  1. How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?

  2. Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?



I am a "beginner" (I have about 4 years of "simpler" programming experience), so I wouldn't be able to understand technical details, but feel free to expand if it could help others understand!

Answer

Example using the file file.txt:

fake file
with some text
in a few lines

Question 1: How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?

When you open a file in Python, it creates a file object. A file object wraps a file descriptor and keeps track of a current position in the file. When you first open the file for reading, that position is at the beginning of the file. When you call readline(), it reads up to and including the next newline and moves the position forward to the character just after it.

Calling the tell() method of a file object returns the current position in the file.

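with open('file.txt', 'r') as fd:
    print(fd.tell())   # position before reading anything
    fd.readline()      # consume the first line, including its newline
    print(fd.tell())   # position is now just past the first line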

# output:
0
10
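
To make the idea of a moving position concrete, here is a minimal sketch of a toy file object. The ToyFile class is hypothetical and written only for illustration; it is not how CPython implements file objects (the real ones add buffering and decoding), but the bookkeeping is the same in spirit: remember a position, scan forward to the next newline, then update the position.

```python
class ToyFile:
    """Hypothetical, simplified stand-in for a file object (illustration only)."""

    def __init__(self, data):
        self.data = data   # whole "file" held in memory to keep the sketch short
        self.pos = 0       # current position, the value tell() reports

    def tell(self):
        return self.pos

    def readline(self):
        # Look for the next newline at or after the current position.
        end = self.data.find(b"\n", self.pos)
        # With no newline left, the line runs to the end of the data.
        end = len(self.data) if end == -1 else end + 1
        line = self.data[self.pos:end]
        self.pos = end     # the next call starts where this one stopped
        return line


toy = ToyFile(b"fake file\nwith some text\nin a few lines\n")
first = toy.readline()
after_first = toy.tell()
second = toy.readline()
print(first, after_first, second)
```

Each call picks up where the previous one left off, which is why successive readline() calls walk through the file line by line without being told which line comes next.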


Question 2: Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?

First off, reading a file is not something your program asks the CPU to do directly. It is handled by the operating system and the file system, which together determine how files can be read and written. Barebones explanation of files
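
As a rough illustration of the operating system's role, the sketch below skips Python's file object and talks to the OS through the os module. It assumes the three-line example file from above and recreates it in a temporary directory so it can run on its own; the variable names are made up for this example. The point to notice is that the OS itself remembers the offset for an open file descriptor between reads.

```python
import os
import tempfile

# Recreate the sample file in a temporary directory (an assumption made
# so that this sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write(b"fake file\nwith some text\nin a few lines\n")

fd = os.open(path, os.O_RDONLY)              # ask the OS for a file descriptor
try:
    first_chunk = os.read(fd, 10)            # read 10 bytes; the OS advances the offset
    offset = os.lseek(fd, 0, os.SEEK_CUR)    # ask for the current offset without moving it
    next_chunk = os.read(fd, 4)              # continues from where the last read stopped
finally:
    os.close(fd)

print(first_chunk, offset, next_chunk)
```

Python's file objects are built on top of calls like these, adding buffering and line splitting so that readline() does not need one system call per character.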

For random access in files, you can use the mmap module of python. The Python Module of the Week site has a great tutorial.

Example, jumping to the 2nd line in the example file and reading until the end:

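import mmap

# Open in binary mode; mmap exposes the file's raw bytes.
with open('file.txt', 'rb') as fd:
    with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Slice from byte 10 (start of the 2nd line) to the end of the file.
        print(mm[10:].decode())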

# output:
with some text
in a few lines
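
For simple cases, a file object's own seek() method may be all you need. The sketch below is one possible way to start reading from the middle of a line; it recreates the example file in a temporary directory so it can run on its own, and the offset 15 is chosen only for illustration (it lands on the "s" of "some" in the second line).

```python
import os
import tempfile

# Recreate the sample file (an assumption made so the sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write(b"fake file\nwith some text\nin a few lines\n")

with open(path, "rb") as fd:
    fd.seek(15)                    # jump into the middle of the second line
    rest_of_line = fd.readline()   # reads from byte 15 through the next newline
    position = fd.tell()           # now just past the end of the second line

print(rest_of_line, position)
```

Binary mode is used here because byte offsets are unambiguous; in text mode, seek() is only reliable with offsets previously returned by tell().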