Harpal - 1 month ago
Python Question

Upper memory limit?

Is there a memory limit for Python? I've been using a Python script to calculate average values from a file that is at least 150 MB in size.

Depending on the size of the file, I sometimes encounter a MemoryError.

Can more memory be assigned to Python so I don't encounter the error?

EDIT: Code is now below.

NOTE: The file sizes can vary greatly (up to 20 GB); the minimum size of a file is 150 MB.

file_A1_B1 = open("A1_B1_100000.txt", "r")
file_A2_B2 = open("A2_B2_100000.txt", "r")
file_A1_B2 = open("A1_B2_100000.txt", "r")
file_A2_B1 = open("A2_B1_100000.txt", "r")
file_write = open("average_generations.txt", "w")
mutation_average = open("mutation_average", "w")

files = [file_A1_B1, file_A2_B2, file_A1_B2, file_A2_B1]

for u in files:
    line = u.readlines()
    list_of_lines = []
    for i in line:
        values = i.split('\t')
        list_of_lines.append(values)

    count = 0
    for j in list_of_lines:
        count += 1

    for k in range(0, count):
        list_of_lines[k].remove('\n')

    length = len(list_of_lines[0])
    print_counter = 4

    for o in range(0, length):
        total = 0
        for p in range(0, count):
            number = float(list_of_lines[p][o])
            total = total + number
        average = total / count
        print average
        if print_counter == 4:
            print_counter = 0
        print_counter += 1


(This is my third answer because I misunderstood what your code was doing in my original, and then made a small but crucial mistake in my second -- so hopefully three's a charm.)

As others have pointed out, your MemoryError problem is most likely because you're attempting to read the entire contents of huge files into memory and then, on top of that, effectively doubling the amount of memory needed by creating a list of lists of the string values from each line.
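
For a sense of scale, readlines() builds a string object for every line of the file at once, and splitting each of those lines into a list of field strings roughly doubles that footprint again; iterating over the file object instead keeps only the current line alive. A minimal sketch of the difference, reusing one of the input file names from the question:

# Reads the whole file into memory at once -- with inputs up to 20 GB this
# is what triggers the MemoryError.
with open("A1_B1_100000.txt") as whole_file:
    all_lines = whole_file.readlines()                 # one string per line
    table = [line.split('\t') for line in all_lines]   # roughly doubles the footprint

# Streams the file instead: only the current line and its fields are in memory.
with open("A1_B1_100000.txt") as streamed_file:
    for line in streamed_file:
        fields = line.split('\t')
        # ... update running totals here ...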

Python's memory limits are determined by how much physical RAM and virtual-memory (swap) space your computer and operating system have available. Even if you don't use it all up and your program "works", using that much memory can become impractical because it simply takes too long.
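
If you want to see or change the ceiling the operating system imposes, the standard resource module exposes it on Unix-like systems (it is not available on Windows); Python itself adds no limit of its own. A minimal sketch, assuming Linux, where RLIMIT_AS is supported:

import resource

# Current address-space limit for this process, as a (soft, hard) pair.
# resource.RLIM_INFINITY means no explicit cap beyond available RAM and swap.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print "soft limit:", soft
print "hard limit:", hard

# The soft limit can be raised as far as the hard limit, but no further;
# there is no way to assign the process more memory than the OS will give it.
resource.setrlimit(resource.RLIMIT_AS, (hard, hard))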

Anyway, the most obvious way to avoid that is to process each file a single line at a time, which means you have to do the processing incrementally.

To accomplish this, a list of running totals, one per field, is kept as each file is read. When a file is finished, the average value of each field can be calculated by dividing the corresponding running total by the count of lines read. Once that is done, the averages can be printed out, and some of them written to one of the output files. I've also made a conscious effort to use very descriptive variable names to try to make the code understandable.

GROUP_SIZE = 4  # grouping of the averages, matching the print_counter == 4 in your original code

input_file_names = ["A1_B1_100000.txt", "A2_B2_100000.txt",
                    "A1_B2_100000.txt", "A2_B1_100000.txt"]

file_write = open("average_generations.txt", 'w')
mutation_average = open("mutation_average", 'w')

for file_name in input_file_names:
    with open(file_name, 'r') as input_file:
        print "processing file", file_name
        count = 0
        totals = None
        for line in input_file:
            fields = line.split('\t')
            try:
                fields.remove('\n')  # remove empty field at end of line (why is it there?)
            except ValueError:
                pass
            if not totals:  # first line?
                totals = map(float, fields)
            else:
                for i in xrange(len(fields)):
                    totals[i] += float(fields[i])
            count += 1

        averages = [total/count for total in totals]

        print_counter = 0
        for average in averages:
            print average
            if print_counter % GROUP_SIZE == 0:
                file_write.write(str(average) + '\n')  # write every GROUP_SIZE-th average
            print_counter += 1

file_write.close()
mutation_average.close()  # ???? -- opened but never written to