ShinjiOno ShinjiOno - 7 months ago 34
Python Question

Smallest Difference Between 2 object in Dataframe

df = pd.DataFrame( {'Stock' : ['Apple','Broadcomm','Citi','D&G'],
'PE' : pd.Series([1.5,3.9,5.6,6.8]),
})


I'm looking for an algorithm to rank stock pair from a pool based on the difference of PE i.e PE stock 1 - PE stock 2

i.e pool of 40 stocks, rank based on unique stock pair based on smallest PE difference. Total will have 20 unique pairs
eg. MSFT appears in pair 1, with smallest PE associated with MSFT pair, MSFT should not reappear again in the subsequent pair

What's the correct algorithm for doing this?

So far I have tried to find the PE difference of each and every pair and rank ascending. What should I do next?

Answer

This is an approach that uses itertools.combinations(), isin(), and drop():

import pandas as pd
import itertools as it

df = pd.DataFrame({'Stock' : ['Apple', 'Broadcomm', 'Citi', 'D&G', 'Elixir', 'Foxtrot'],
                   'PE'    : [3.8, 3.9, 5.6, 6.8, 0.5, 3.9]})
print(df)

assert len(df) % 2 == 0
m = df.set_index('Stock')
ranking = pd.DataFrame(columns=['StockA', 'StockB', 'minPE', 'deltaPE'],
                       data=[(a, b, min(m.PE[a], m.PE[b]), abs(m.PE[a] - m.PE[b]))
                             for a, b in it.combinations(m.index, 2)])
print(ranking)
ranking.sort_values(['deltaPE', 'minPE'], inplace=True)
print(ranking)

# ranking is sorted from best to worst.
# Start with first line, eliminate other lines that belong to either one of
# this line's stocks (but not both), then proceed to next line and repeat.
for i in range(len(df) // 2):
    a = ranking.iloc[i].StockA
    b = ranking.iloc[i].StockB
    contenders = ranking[ranking.StockA.isin([a, b]) ^ ranking.StockB.isin([a, b])]
    ranking.drop(contenders.index, inplace=True)

print(ranking)

Output:

    PE      Stock
0  3.8      Apple
1  3.9  Broadcomm
2  5.6       Citi
3  6.8        D&G
4  0.5     Elixir
5  3.9    Foxtrot

# ---- Ranking before sorting:
       StockA     StockB  minPE  deltaPE
0       Apple  Broadcomm    3.8      0.1
1       Apple       Citi    3.8      1.8
2       Apple        D&G    3.8      3.0
3       Apple     Elixir    0.5      3.3
4       Apple    Foxtrot    3.8      0.1
5   Broadcomm       Citi    3.9      1.7
6   Broadcomm        D&G    3.9      2.9
7   Broadcomm     Elixir    0.5      3.4
8   Broadcomm    Foxtrot    3.9      0.0
9        Citi        D&G    5.6      1.2
10       Citi     Elixir    0.5      5.1
11       Citi    Foxtrot    3.9      1.7
12        D&G     Elixir    0.5      6.3
13        D&G    Foxtrot    3.9      2.9
14     Elixir    Foxtrot    0.5      3.4

# ---- Ranking after sorting:
       StockA     StockB  minPE  deltaPE
8   Broadcomm    Foxtrot    3.9      0.0
0       Apple  Broadcomm    3.8      0.1
4       Apple    Foxtrot    3.8      0.1
9        Citi        D&G    5.6      1.2
5   Broadcomm       Citi    3.9      1.7
11       Citi    Foxtrot    3.9      1.7
1       Apple       Citi    3.8      1.8
6   Broadcomm        D&G    3.9      2.9
13        D&G    Foxtrot    3.9      2.9
2       Apple        D&G    3.8      3.0
3       Apple     Elixir    0.5      3.3
7   Broadcomm     Elixir    0.5      3.4
14     Elixir    Foxtrot    0.5      3.4
10       Citi     Elixir    0.5      5.1
12        D&G     Elixir    0.5      6.3

# ---- Ranking after dropping rows:
      StockA   StockB  minPE  deltaPE
8  Broadcomm  Foxtrot    3.9      0.0
9       Citi      D&G    5.6      1.2
3      Apple   Elixir    0.5      3.3
Comments