sebap123 - 5 months ago
Python Question

Counting total number of element occurrences in different length vectors

I have 3 very long (100K+ elements) vectors of product names. Each vector has a different length. I want to count how many vectors each product appears in. So something like this:

v1 = ['product1','product2','product3']
v2 = ['product3','product1','product5','product7','product10']
v3 = ['product1','product10']

'product1' 3
'product2' 1
'product3' 2
'product5' 1
'product7' 1
'product10' 2


Products might be in any order within a vector, and each product appears only once within a vector.

I wanted to use a pandas DataFrame here, but all columns must be the same length. Also, simple row-wise summing will not work, because the same product might be in a different row in each column.

Does anyone have an idea what the best way to do this would be? I know that I can write a simple brute-force loop, but I would rather not if I can use something from numpy or pandas.

Answer

You can use Counter and chain to do this in a few lines:

from collections import Counter
from itertools import chain

v1 = ['product1','product2','product3']
v2 = ['product3','product1','product5','product7','product10']
v3 = ['product1','product10']

c = Counter(chain(v1, v2, v3))
# more space-efficient than Counter(v1 + v2 + v3)
# Counter({'product1': 3, 'product10': 2, 'product3': 2, 'product7': 1, 'product5': 1, 'product2': 1})

c['product10']
# 2
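
Note that the counts above equal the number of vectors containing each product only because, as the question states, each product appears at most once per vector. If that guarantee did not hold, one possible sketch is to deduplicate each vector with set() before counting. The duplicated 'product1' in v1 and the name `vectors` are assumptions added for illustration, not part of the original data:

```python
from collections import Counter
from itertools import chain

# Hypothetical input: 'product1' is duplicated in v1 to illustrate the issue.
v1 = ['product1', 'product2', 'product3', 'product1']
v2 = ['product3', 'product1', 'product5', 'product7', 'product10']
v3 = ['product1', 'product10']
vectors = [v1, v2, v3]  # illustrative name, not from the original answer

# set(v) collapses duplicates, so each vector contributes at most 1 per product.
c = Counter(chain.from_iterable(set(v) for v in vectors))

print(c['product1'])   # 3
print(c['product10'])  # 2
```

Building a set per vector costs extra memory, so when the no-duplicates guarantee holds, the plain Counter(chain(v1, v2, v3)) version is likely the lighter choice.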