Reman Reman - 19 days ago 6
Python Question

How to add the positions of a string in a list to a new list of duplicates?

Example:

r is a text file loaded into a list:

r = ['John is american', 'Bea is french', 'John is american', 'Ray is german', 'John is american', 'Bea is french', 'Bea is french', '', 'Lisa is dutch']


What I want to do is count the number of occurrences of each string and add its positions in r:

finallist = ['string', frequency, [positions in r]]

finallist = [['John is american', 3, [0,2,4]], ['Bea is french', 3, [1,5,6]], ['Ray is german', 1, [3]], ['Lisa is dutch', 1, [8]]]


I know how to count the strings in r:

[[x,r.count(x)] for x in set(r)]


(or using the Counter class from the collections module)
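
For reference, the Counter variant might look like the minimal sketch below. It only counts occurrences, and the names counts and pairs are just illustrative:

```python
from collections import Counter

r = ['John is american', 'Bea is french', 'John is american', 'Ray is german',
     'John is american', 'Bea is french', 'Bea is french', '', 'Lisa is dutch']

# Counter maps each distinct string to its number of occurrences.
counts = Counter(r)

# Same [string, frequency] shape as the set()-based list comprehension.
pairs = [[sentence, n] for sentence, n in counts.items()]
```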

but how can I add the position of the strings in r to finallist?

Answer

Use a dictionary to track the positions of each sentence (building one list per sentence); the final length of each list is also that sentence's frequency count:

from collections import defaultdict

pos = defaultdict(list)
for i, sentence in enumerate(r):
    pos[sentence].append(i)
finallist = [[sentence, len(positions), positions] for sentence, positions in pos.items()]

Demo:

>>> from collections import defaultdict
>>> r = ['John is american', 'Bea is french', 'John is american', 'Ray is german', 'John is american', 'Bea is french', 'Bea is french', '', 'Lisa is dutch']
>>> pos = defaultdict(list)
>>> for i, sentence in enumerate(r):
...     pos[sentence].append(i)
...
>>> [[sentence, len(positions), positions] for sentence, positions in pos.items()]
[['John is american', 3, [0, 2, 4]], ['Bea is french', 3, [1, 5, 6]], ['Ray is german', 1, [3]], ['', 1, [7]], ['Lisa is dutch', 1, [8]]]
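
Note that the question's expected output leaves out the empty string. If blank lines should be skipped while the indices still refer to the original list, one possible sketch (assuming "blank" means empty or whitespace-only) is:

```python
from collections import defaultdict

r = ['John is american', 'Bea is french', 'John is american', 'Ray is german',
     'John is american', 'Bea is french', 'Bea is french', '', 'Lisa is dutch']

pos = defaultdict(list)
for i, sentence in enumerate(r):
    # Skip blank entries, but keep i counting against the original list,
    # so later positions are not shifted.
    if sentence.strip():
        pos[sentence].append(i)

finallist = [[sentence, len(positions), positions] for sentence, positions in pos.items()]
```

Filtering inside the loop, rather than removing blanks from r beforehand, is what keeps 'Lisa is dutch' at position 8.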

If output order matters and you don't yet have access to Python 3.6 (in beta at the time of this answer; its dict implementation preserves insertion order, which became a language guarantee in Python 3.7), you could use an OrderedDict instance, with dict.setdefault() to materialise the initial empty list for each key:

from collections import OrderedDict

pos = OrderedDict()
for i, sentence in enumerate(r):
    pos.setdefault(sentence, []).append(i)
finallist = [[sentence, len(positions), positions] for sentence, positions in pos.items()]
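
On Python 3.7 and later, where plain dicts are guaranteed to keep insertion order, the same approach should work without OrderedDict. A minimal sketch, wrapped in a hypothetical helper named index_positions (not part of the original answer):

```python
def index_positions(lines):
    """Return [string, frequency, positions] entries in first-seen order."""
    pos = {}  # plain dict; relies on insertion order (Python 3.7+)
    for i, sentence in enumerate(lines):
        # setdefault() creates the empty list the first time a key is seen.
        pos.setdefault(sentence, []).append(i)
    return [[sentence, len(positions), positions] for sentence, positions in pos.items()]


r = ['John is american', 'Bea is french', 'John is american', 'Ray is german',
     'John is american', 'Bea is french', 'Bea is french', '', 'Lisa is dutch']

finallist = index_positions(r)
```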