SRI VVISHNU - 9 days ago
Python Question

What is the use of buffering in Python's built-in open() function?

Python Documentation : https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])


The above documentation says "The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used."

When I use

filedata = open("file.txt","r",0)


or

filedata = open("file.txt","r",1)


or

filedata = open("file.txt","r",2)


or

filedata = open("file.txt","r",-1)


or

filedata = open("file.txt","r")


The output does not change. Each of the calls shown above prints at the same speed.

output:


Mr. Bean is a British television programme series of fifteen 25-minute
episodes written by Robin Driscoll and starring Rowan Atkinson as the
title character. Different episodes were also written by Robin Driscoll
and Richard Curtis, and one by Ben Elton. Thirteen of the episodes were
broadcast on ITV, from the pilot on 1 January 1990, until "Goodnight
Mr. Bean" on 31 October 1995. A clip show, "The Best Bits of Mr. Bean",
was broadcast on 15 December 1995, and one episode, "Hair by Mr. Bean of
London", was not broadcast until 2006 on Nickelodeon.


So how is the buffering parameter of the open() function useful? Which value of the buffering parameter is best to use?

Answer

Enabling buffering means that you are not interfacing directly with the OS's representation of the file or its file system API. Instead, a chunk of data is read from the raw OS file stream into an in-memory buffer and your reads are served from that buffer; once it has been consumed, more data is fetched into it. In terms of the objects you get, in Python 3 (or with io.open() in Python 2) you'll get a BufferedIOBase object wrapping an underlying RawIOBase, which represents the raw file stream. The Python 2 built-in open() from the linked documentation returns a plain file object instead, but the idea is the same.
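To make the layering concrete, here is a minimal sketch. It assumes Python 3 (where the built-in open() is io.open(), and buffering=0 is only allowed in binary mode), and the temporary file is a hypothetical stand-in for file.txt.

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("Mr. Bean\n")

# buffering=0 (binary mode only in Python 3): the raw, unbuffered stream.
with open(path, "rb", buffering=0) as raw:
    raw_type = type(raw)

# Default buffering in binary mode: a buffered reader wrapping a raw stream.
with open(path, "rb") as buffered:
    buffered_type = type(buffered)
    inner_raw_type = type(buffered.raw)

# Text mode adds a decoding layer on top of the buffered reader.
with open(path, "r") as text:
    text_type = type(text)
    text_buffer_type = type(text.buffer)

os.remove(path)

print(raw_type.__name__, buffered_type.__name__, text_type.__name__)
```

The .raw and .buffer attributes expose the wrapped layers, so you can check for yourself which object actually talks to the OS.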

What is the benefit of this? Well, interfacing with the raw stream can have high latency, because the operating system has to deal with physical devices like the hard disk, so reading from it directly is a poor fit for many access patterns. Let's say you want to read three letters from a file every 5 ms and your file is on a crusty old hard disk, or even a network file system. Instead of reading from the raw file stream every 5 ms, it is better to load a large chunk of the file into a buffer in memory, then consume it at will.
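The saving can be made visible by counting how often the raw stream is actually hit. The following is only a sketch, assuming Python 3: CountingRaw is a hypothetical in-memory stand-in for a slow raw file, not something from the standard library, and the 1024-byte buffer size is an arbitrary choice.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts how often it is read."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here stands in for one trip to the OS / disk.
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_in_threes(stream):
    """Consume the stream three bytes at a time until it is exhausted."""
    while stream.read(3):
        pass


data = b"abc" * 1000  # 3000 bytes of sample data

# Unbuffered: every 3-byte read goes straight to the raw stream.
unbuffered = CountingRaw(data)
read_in_threes(unbuffered)

# Buffered: the raw stream is only hit when the buffer runs dry.
raw = CountingRaw(data)
read_in_threes(io.BufferedReader(raw, buffer_size=1024))

print(unbuffered.calls, raw.calls)
```

The unbuffered run touches the raw stream about once per 3-byte read, while the buffered run touches it only a handful of times for the same data.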

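What size of buffer you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be all right, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal. (Note that passing buffering=1 to open() selects line buffering, not a 1-byte buffer.) For a small file that is read once from start to end, as in the question, the setting is unlikely to make a visible difference, which is why the output looked the same each time.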
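As a rough check of why the question's five variants looked identical, the sketch below reads the same file with several buffering values and compares the results. It assumes Python 3, so binary mode is used throughout (Python 3 rejects buffering=0 for text files); the scratch file and the particular sizes are hypothetical choices for illustration.

```python
import io
import os
import tempfile

# Hypothetical scratch file standing in for the asker's file.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Mr. Bean is a British television programme series.\n" * 200)

results = {}
# 0 = unbuffered, 3 = tiny buffer, 4096 = larger buffer, -1 = system default.
for size in (0, 3, 4096, -1):
    with open(path, "rb", buffering=size) as f:
        results[size] = f.read()

os.remove(path)

# The buffer size changes how the bytes are fetched, not which bytes arrive.
same_content = len(set(results.values())) == 1
print(same_content, io.DEFAULT_BUFFER_SIZE)
```

If no access pattern suggests otherwise, leaving the argument out (or passing -1) and letting Python pick the default is usually a reasonable starting point.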
