Ann Ann - 9 months ago 49
Python Question

Comparing two database records at a time in python

I'm trying to compare pairs of students in a database table.
My database table is as follows:

id | edu
1 | 1
2 | 1
3 | 2
4 | 2

I compare a pair of students at a time and if they have similar edu code I input them to a list with their id's as 1 if not 0.
My code is as follows:

#getting edu info
data=curr.execute('select id,edu from student_details')
result = curr.fetchall()

data2=curr.execute('select id,edu from student_details ')
result2 = curr.fetchall()

looping=curr.execute('select count(id) from student_details where id <= 4')
loop_times = curr.fetchall()
count = int(loop_times[0][0])

count = count + 1
for i in range(1,count):
row = []
for x in range(0,i):
for j in range(i+1, count):
if mydic1[i]==mydic2[j]:
print edu

This prints the output according to the way I wanted as follows:

[[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]

But I don't know whether this is the most efficient way of comparing database records in python. Any suggestions for improvements are welcome.

Answer Source

One way would be to use defaultdict and to create a dictionary where it has a list of student numbers for each “edu” number. So that the result is similar to this:

edu_map == {1: [1, 2], 2: [3, 4]}

The basic idea is that you iterate over each entry in the table and add the student's number to the “edu” number in the resulting dict:

edu_map = defaultdict(list)
for stud, edu in mydic1.items():
    edu_map[edu] += [stud]

The reason for defaultdict is that it automatically creates a new dictionary entry (in this case of an empty list) when it is requested but doesn't exist. So you don't need to check if edu not in edu_map and then add an empty list, because defaultdict does that for you.

To convert this result in a list like yours, you can go through each student again and get their “edu” number and then use the numbers in the edu_map to get a list of that:

result = []
for stud, edu in mydic1.items():
    stud_in_edu = edu_map[edu]
    stud_in_edu = [1 if other_stud != stud and other_stud in stud_in_edu else 0
                   for other_stud in range(len(mydic1))]

This assumes that mydic1 only contains valid entries. There is no need to additionally count them in a separate SQL query if you already have all entries.