Z. Winters - 1 month ago
Python Question

Round Robin Scheduling for a pandas dataframe

I have been working on some code that reads a tab-delimited CSV file, which lists a series of processes with their start times and durations, and creates a DataFrame from it using pandas. I then need to apply a simplified form of round-robin scheduling, with the time slice taken from user input, to find the turnaround time of each process.

So far, I can read the CSV file, label its columns and sort it properly. However, I get stuck when trying to construct the loop that iterates over the rows to find each process's completion time.

The code so far looks like:

    import re
    import sys

    import numpy as np
    import pandas as pd

    # round robin
    def rr():
        docname = sys.argv[1]
        method = sys.argv[2]
        # creates a variable from the user input to define timeslice
        timeslice = int(re.search(r'\d+', method).group())
        # use pandas to create a 2-d data frame from tab delimited file, set column 0 (process names) to string, set column
        # 1 & 2 (start time and duration, respectively) to integers
        d = pd.read_csv(docname, delimiter="\t", header=None, dtype={'0': str, '1': np.int32, '2': np.int32})
        # sort d into d1 by values of start times[1], ascending
        d1 = d.sort_values(by=1)
        # Create a 4th column, set to 0, for the Completion time
        d1[3] = 0
        # change column names
        d1.columns = ['Process', 'Start', 'Duration', 'Completion']
        # initialize counter
        counter = 0
        # if any values in column 'Duration' are above 0, continue the loop
        while (d1['Duration']).any() > 0:
            for index, row in d1.iterrows():
                # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
                # subtract it from the current value in column 'Duration'
                if row.Duration > timeslice:
                    counter += timeslice
                    row.Duration -= timeslice
                    print(index, row.Duration)
                # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
                # subtract the Duration from itself, to make it 0
                # set row:Completion to the current counter, which is the completion time for the process
                elif row.Duration <= timeslice and row.Duration != 0:
                    counter += row.Duration
                    row.Duration -= row.Duration
                    row.Completion = counter
                    print(index, row.Duration)
                # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
                else:
                    print(index, "Done")


Given the sample CSV file, d1 looks like:

       Process  Start  Duration  Completion
    3       p4      0       280           0
    0       p1      5       140           0
    1       p2     14        75           0
    2       p3     36       320           0
    5       p6     40         0           0
    4       p5     67       125           0
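For anyone who wants to reproduce d1 without the original file, here is a minimal sketch of the reading, labeling and sorting steps. It assumes the file has no header row and lists p1 to p6 in that order (the index labels in the table above suggest so); the SAMPLE string and the load_processes helper are hypothetical. Passing names= to read_csv labels the columns while reading. Note that with header=None pandas labels the columns with the integers 0, 1 and 2, so the string keys '0', '1', '2' in the dtype mapping above probably do not match any column; that is harmless here because pandas infers integer columns on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated input file: name, start time, duration
SAMPLE = "p1\t5\t140\np2\t14\t75\np3\t36\t320\np4\t0\t280\np5\t67\t125\np6\t40\t0\n"


def load_processes(source):
    """Read a headerless, tab-separated process list and sort it by start time."""
    df = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["Process", "Start", "Duration"],  # label the columns while reading
        dtype={"Process": str, "Start": "int64", "Duration": "int64"},
    )
    df = df.sort_values(by="Start")  # earliest start first; original row labels are kept
    df["Completion"] = 0             # placeholder, to be filled in by the scheduler
    return df


d1 = load_processes(io.StringIO(SAMPLE))
print(d1)
```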


And when I run my code with timeslice = 70, I get an infinite loop of:

    3 210
    0 70
    1 5
    2 250
    5 Done
    4 55
    3 210
    0 70
    1 5
    2 250
    5 Done
    4 55


It seems to go through the loop correctly once and then repeat it forever. Moreover, print(d1['Completion']) shows all 0s, so the counter value isn't being written to d1['Completion'] either.

Ideally, with timeslice=70, the Completion column would be filled in with each process's completion time, like:

       Process  Start  Duration  Completion
    3       p4      0       280         830
    0       p1      5       140         490
    1       p2     14        75         495
    2       p3     36       320         940
    5       p6     40         0         280
    4       p5     67       125         620


I could then use these values to find the average turnaround time. For some reason, however, my loop seems to iterate once and then repeat itself endlessly. When I tried swapping the order of the while and for statements, it processed each row repeatedly until its Duration reached 0, which also gave incorrect completion times.
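For reference, the usual definition of turnaround time, writing C_i for the Completion value and S_i for the Start value of process i, with n processes in total:

$$T_i = C_i - S_i, \qquad \bar{T} = \frac{1}{n} \sum_{i=1}^{n} T_i$$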

Thanks in advance.

Answer

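I revised your code a little and it works. Depending on the column dtypes, iterrows() hands you a copy of each row, so assigning to row.Duration or row.Completion never changes d1 itself. The Duration column therefore never changes and the while loop never ends; you have to write the new values back into d1.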
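A small, self-contained illustration of that point, using a made-up two-row frame: with mixed column dtypes (one string column, one integer column), iterrows() builds each row as a new Series, so writing to it leaves the DataFrame untouched, whereas writing through df.loc[index, column] changes the frame itself. The pandas documentation for iterrows() likewise warns that you should never modify something you are iterating over.

```python
import pandas as pd

# Made-up frame with one string column and one integer column (mixed dtypes)
df = pd.DataFrame({"Process": ["p1", "p2"], "Duration": [140, 75]})

# Writing to the row Series yielded by iterrows() only changes that copy
for index, row in df.iterrows():
    row["Duration"] -= 70
after_row_writes = df["Duration"].tolist()
print(after_row_writes)  # [140, 75]

# Writing through .loc updates the DataFrame itself
for index, row in df.iterrows():
    df.loc[index, "Duration"] -= 70
after_loc_writes = df["Duration"].tolist()
print(after_loc_writes)  # [70, 5]
```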

    while (d1['Duration'] > 0).any():
        for index, row in d1.iterrows():
            # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
            # subtract it from the current value in column 'Duration'
            if row.Duration > timeslice:
                counter += timeslice
                #row.Duration -= timeslice
                # !!!LOOK HERE!!! write through d1.loc so d1 itself changes
                # (chained d1['Duration'][index] -= ... may only modify a temporary copy)
                d1.loc[index, 'Duration'] -= timeslice
                print(index, d1.loc[index, 'Duration'])
            # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter,
            # set Duration to 0 and set Completion to the current counter, which is the completion time for the process
            elif row.Duration <= timeslice and row.Duration != 0:
                counter += row.Duration
                #row.Duration -= row.Duration
                #row.Completion = counter
                # !!!LOOK HERE!!!
                d1.loc[index, 'Duration'] = 0
                d1.loc[index, 'Completion'] = counter
                print(index, d1.loc[index, 'Duration'])
            # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
            else:
                print(index, "Done")
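For comparison, here is a minimal sketch of the same simplified round robin that avoids re-scanning the frame with iterrows(). It keeps a collections.deque of row labels as the ready queue, in d1's sorted order, and otherwise ignores Start, as the code above does. The function name round_robin_completion is hypothetical. A process that still needs time after its slice goes to the back of the queue, and a zero-length process finishes as soon as it reaches the front.

```python
from collections import deque

import pandas as pd


def round_robin_completion(d1, timeslice):
    """Simplified round robin: all processes are queued up front in d1's row order
    (already sorted by Start); arrival times are otherwise ignored."""
    remaining = d1["Duration"].to_dict()  # row label -> time still needed
    queue = deque(d1.index)               # ready queue of row labels
    completion = {}
    counter = 0
    while queue:
        index = queue.popleft()
        run = min(timeslice, remaining[index])  # a full slice, or whatever is left
        counter += run
        remaining[index] -= run
        if remaining[index] == 0:
            completion[index] = counter  # finished during this turn
        else:
            queue.append(index)          # back of the queue for another turn
    result = d1.copy()
    result["Completion"] = pd.Series(completion)  # aligned on the row labels
    return result


# The sorted sample frame from the question
d1 = pd.DataFrame(
    {
        "Process": ["p4", "p1", "p2", "p3", "p6", "p5"],
        "Start": [0, 5, 14, 36, 40, 67],
        "Duration": [280, 140, 75, 320, 0, 125],
        "Completion": 0,
    },
    index=[3, 0, 1, 2, 5, 4],
)
result = round_robin_completion(d1, timeslice=70)
print(result)
```

On the sample data with timeslice=70 this fills Completion with 830, 490, 495, 940, 280 and 620 for p4, p1, p2, p3, p6 and p5, the same values as the expected table in the question. The input frame is copied, so d1 itself is left unchanged.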

By the way, I guess you want to simulate a process-scheduling algorithm. In that case, you also have to take 'Start' into account, because not every process arrives at the same time.
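One common way to handle arrivals is sketched below; the convention is an assumption, not the only one in use. A process joins the ready queue once the clock reaches its Start time, processes that arrive during a slice are queued ahead of the process that was just preempted, and the CPU sits idle when nothing has arrived yet. Turnaround is then completion time minus Start. The function name round_robin_with_arrivals and the (name, start, duration) tuple input are hypothetical.

```python
from collections import deque


def round_robin_with_arrivals(processes, timeslice):
    """Round robin with arrival times.

    processes: iterable of (name, start, duration) tuples.
    Returns {name: (completion, turnaround)}.
    """
    pending = deque(sorted(processes, key=lambda p: p[1]))  # not yet arrived, by start time
    start = {name: s for name, s, _ in pending}
    remaining = {name: d for name, _, d in pending}
    ready = deque()
    result = {}
    clock = 0

    def admit_arrivals():
        # move every process whose start time has been reached into the ready queue
        while pending and pending[0][1] <= clock:
            ready.append(pending.popleft()[0])

    while pending or ready:
        admit_arrivals()
        if not ready:
            clock = pending[0][1]  # CPU idle: jump to the next arrival
            continue
        name = ready.popleft()
        run = min(timeslice, remaining[name])
        clock += run
        remaining[name] -= run
        admit_arrivals()  # newcomers are queued ahead of the preempted process
        if remaining[name] == 0:
            result[name] = (clock, clock - start[name])
        else:
            ready.append(name)
    return result


sample = [("p1", 5, 140), ("p2", 14, 75), ("p3", 36, 320),
          ("p4", 0, 280), ("p5", 67, 125), ("p6", 40, 0)]
times = round_robin_with_arrivals(sample, timeslice=70)
average_turnaround = sum(t for _, t in times.values()) / len(times)
print(times)
print(average_turnaround)
```

On the sample data every process has arrived by the end of p4's first slice (t = 70), so the queue order ends up the same as when Start is ignored; the difference shows up in the turnaround times, which are measured from each process's Start.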

(Your expected table also seems to be somewhat wrong.)
