Gina Sun - 2 months ago
Python Question

How to make a Python loop faster when running pairwise association tests

I have a list of patient IDs with drug names and a list of patient IDs with disease names. I want to find the most indicative drug for each disease.

To do this, I want to run Fisher's exact test to get a p-value for each disease/drug pair. But the loop runs very slowly: it takes more than 10 hours. Is there a way to make the loop more efficient, or a better way to solve this association problem?

My loop:

import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

most_indicative_medication = {}
rx_list = list(meps_meds.rxName.unique())
disease_list = list(meps_base_data.columns.values)[8:]

for i in disease_list:
    print(i)
    rx_dict = {}
    for j in rx_list:
        subset = base[['id', i, 'rxName']].drop_duplicates()
        subset[j] = subset['rxName'] == j
        subset = subset.loc[subset[i].isin(['Yes', 'No'])]
        subset = subset[[i, j]]
        tab = pd.crosstab(subset[i], subset[j])
        if len(tab.columns) == 2:
            rx_dict[j] = fisher_exact(tab)[1]
        else:
            rx_dict[j] = np.nan
    most_indicative_medication[i] = min(rx_dict, key=rx_dict.get)

Answer

You can speed this up with multiprocessing/multithreading by running the loop for each disease in parallel. I have added the code:

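import numpy as np
import pandas as pd
from scipy.stats import fisher_exact
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool with the multiprocessing API

# meps_meds, meps_base_data and base are the DataFrames from the question.
most_indicative_medication = {}
rx_list = list(meps_meds.rxName.unique())
disease_list = list(meps_base_data.columns.values)[8:]

def run_pairwise(i):
    """Test disease i against every drug and record the drug with the smallest p-value."""
    print(i)
    rx_dict = {}
    for j in rx_list:
        subset = base[['id', i, 'rxName']].drop_duplicates()
        subset[j] = subset['rxName'] == j
        subset = subset.loc[subset[i].isin(['Yes', 'No'])]
        subset = subset[[i, j]]
        tab = pd.crosstab(subset[i], subset[j])
        # Only a table with both a True and a False column can be tested.
        if len(tab.columns) == 2:
            rx_dict[j] = fisher_exact(tab)[1]
        else:
            rx_dict[j] = np.nan
    # Caution: min() does not skip NaN p-values.
    most_indicative_medication[i] = min(rx_dict, key=rx_dict.get)

pool = ThreadPool(3)  # 3 worker threads
# run_pairwise returns None; the results are collected in most_indicative_medication.
pairwise_test_results = pool.map(run_pairwise, disease_list)
pool.close()
pool.join()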
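
A possible variation, as a minimal sketch rather than tested code for your data: have each worker return its result so that `pool.map` hands back something useful, and skip NaN p-values when taking the minimum (comparisons with NaN are always False, so a plain `min` can end up picking a drug whose p-value is NaN). The names `FAKE_P_VALUES`, `best_key` and `run_one` are made up for illustration, and the fake p-values stand in for the real Fisher loop.

```python
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical stand-in for the per-disease p-values the Fisher loop would produce.
FAKE_P_VALUES = {
    "asthma": {"drugA": float("nan"), "drugB": 0.03, "drugC": 0.2},
    "diabetes": {"drugA": 0.001, "drugB": float("nan")},
}

def best_key(scores):
    """Return the key with the smallest score, ignoring NaN entries."""
    valid = {k: v for k, v in scores.items() if v == v}  # NaN is never equal to itself
    return min(valid, key=valid.get) if valid else None

def run_one(disease):
    """Return (disease, best drug) instead of writing to a shared dict."""
    return disease, best_key(FAKE_P_VALUES[disease])

pool = ThreadPool(3)
try:
    # pool.map preserves input order; dict() turns the pairs into a lookup table.
    results = dict(pool.map(run_one, sorted(FAKE_P_VALUES)))
finally:
    pool.close()
    pool.join()

print(results)
```

Keep in mind that `multiprocessing.dummy` uses threads, which share the GIL, so CPU-bound work may see only a limited speedup. `multiprocessing.Pool` offers the same `map` interface with separate processes, and returning results from the worker (as above) is what makes that swap possible, since processes do not share a module-level dict.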

Notes: http://chriskiehl.com/article/parallelism-in-one-line/
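
Separately from parallelism, the inner loop repeats the same `drop_duplicates()` and Yes/No filter for every drug, although neither depends on `j`. Here is a rough sketch of doing that work once per disease and deriving every 2x2 table from two sets of counts. It is written in plain Python on toy tuples so it runs on its own; `build_tables` and the sample rows are hypothetical, and I am assuming the goal is the same row-level table that `pd.crosstab(subset[i], subset[j])` would give.

```python
from collections import Counter

def build_tables(rows):
    """Build one 2x2 table per drug for a single disease.

    rows: iterable of (patient_id, status, rx_name) tuples.
    Returns {rx_name: [[no_without, no_with], [yes_without, yes_with]]}.
    """
    # Deduplicate and keep only definite answers, once, outside any per-drug loop.
    unique_rows = {r for r in rows if r[1] in ("Yes", "No")}

    # One pass over the data gives every count the tables need.
    pair_counts = Counter((status, rx) for _, status, rx in unique_rows)
    status_totals = Counter(status for _, status, _ in unique_rows)

    tables = {}
    for rx in {rx for _, _, rx in unique_rows}:
        no_with = pair_counts[("No", rx)]
        yes_with = pair_counts[("Yes", rx)]
        tables[rx] = [
            [status_totals["No"] - no_with, no_with],
            [status_totals["Yes"] - yes_with, yes_with],
        ]
    return tables

# Hypothetical toy data for one disease.
rows = [
    (1, "Yes", "A"), (1, "Yes", "B"),
    (2, "Yes", "A"),
    (3, "No", "B"), (3, "No", "B"),  # exact duplicate, counted once
    (4, "No", "C"),
    (5, "Unknown", "A"),             # neither Yes nor No, dropped
]
tables = build_tables(rows)
print(tables["A"])
```

Each table could then be passed to `fisher_exact`, for example `fisher_exact(tables[rx])[1]`, so the expensive pandas work happens once per disease rather than once per disease/drug pair. Note that only drugs that actually occur in the rows get a table here, whereas the original loop stores NaN for the others.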