Ann Ann - 8 months ago 42
Python Question

Comparing two database records at a time in python

I'm trying to compare pairs of students in a database table.
My database table is as follows:

id | edu
1 | 1
2 | 1
3 | 2
4 | 2

I compare a pair of students at a time and if they have similar edu code I input them to a list with their id's as 1 if not 0.
My code is as follows:

#getting edu info
data=curr.execute('select id,edu from student_details')
result = curr.fetchall()

data2=curr.execute('select id,edu from student_details ')
result2 = curr.fetchall()

looping=curr.execute('select count(id) from student_details where id <= 4')
loop_times = curr.fetchall()
count = int(loop_times[0][0])

count = count + 1
for i in range(1,count):
row = []
for x in range(0,i):
for j in range(i+1, count):
if mydic1[i]==mydic2[j]:
print edu

This prints the output according to the way I wanted as follows:

[[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]

But I don't know whether this is the most efficient way of comparing database records in python. Any suggestions for improvements are welcome.


One way would be to use defaultdict and to create a dictionary where it has a list of student numbers for each “edu” number. So that the result is similar to this:

edu_map == {1: [1, 2], 2: [3, 4]}

The basic idea is that you iterate over each entry in the table and add the student's number to the “edu” number in the resulting dict:

edu_map = defaultdict(list)
for stud, edu in mydic1.items():
    edu_map[edu] += [stud]

The reason for defaultdict is that it automatically creates a new dictionary entry (in this case of an empty list) when it is requested but doesn't exist. So you don't need to check if edu not in edu_map and then add an empty list, because defaultdict does that for you.

To convert this result in a list like yours, you can go through each student again and get their “edu” number and then use the numbers in the edu_map to get a list of that:

result = []
for stud, edu in mydic1.items():
    stud_in_edu = edu_map[edu]
    stud_in_edu = [1 if other_stud != stud and other_stud in stud_in_edu else 0
                   for other_stud in range(len(mydic1))]

This assumes that mydic1 only contains valid entries. There is no need to additionally count them in a separate SQL query if you already have all entries.