Erin.Signe - 1 year ago

Python - Looking for a solution to prevent duplicates in lists generated via text file

As the title says, I'm looking for a way for my code not only to detect duplicates in a list but also to separate them, moving the copies into different lists.

I am relatively new to programming, having just finished an introductory Python course at my local JC. During the semester, I had an idea for a program that would read the lines of a .txt file into a list, shuffle the list, and split it into two new lists before printing them. This could be used, for example, to quickly distribute uniquely named objects to two players in a game. Here is what the code looks like so far:

Given that list.txt contains apple, apple, banana, pear, orange, kiwi (one item per line):

```python
import random

# Create a list from the text file (one item per line), skipping blank lines
with open("list.txt", "r") as f:
    items = [line.strip() for line in f if line.strip()]

# Copy the list to preserve the original
list2 = items[:]
random.shuffle(list2)  # shuffle the order in place

# If the number of items is odd, remove one
if len(list2) % 2 == 1:
    del list2[-1]

# Divide the items into two lists
A = list2[:len(list2) // 2]
B = list2[len(list2) // 2:]

print("The items in group A are: ", A)
print("The items in group B are: ", B)
```

I've gotten everything to work so far, but I'd also like the program to detect whether, say, both "apple"s ended up in list A and, if they did, move one of them to list B. I've come up with three solutions, but none of them works. Solution 1 would be to keep the two "apple"s as they are and detect/sort from there:

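```python
def moveDuplicates(in_list):
    # Despite the name, this only detects duplicates: it returns True
    # as soon as any item appears more than once in in_list
    unique = set(in_list)
    for each in unique:
        count = in_list.count(each)
        if count > 1:
            return True
    return False
```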

The problem is that, while I can detect duplicates just fine like this, I don't know how to move one of them, because I won't know the position of either copy in the new lists; the list is shuffled every time.
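Not knowing the position doesn't have to be a blocker: `list.index()` returns the position of the first occurrence of a value, so you can look the duplicate up after the shuffle. Below is a minimal sketch of Solution 1, assuming you want the two groups to stay the same size, so each surplus copy is swapped with an item from the other group rather than simply moved. `swap_out_duplicates` is a hypothetical helper name.

```python
from collections import Counter

def swap_out_duplicates(a, b):
    """Trade surplus copies in `a` for items from `b` that `a` lacks.

    Hypothetical helper: modifies both lists in place and keeps their
    lengths unchanged. If no suitable item exists in `b`, the copy stays.
    """
    for item, count in Counter(a).items():
        for _ in range(count - 1):
            i = a.index(item)  # position of the first remaining copy
            for j, candidate in enumerate(b):
                if candidate not in a:
                    a[i], b[j] = b[j], a[i]  # swap the copy for the candidate
                    break

A = ["apple", "apple", "kiwi"]
B = ["banana", "pear", "orange"]
swap_out_duplicates(A, B)
print(A)  # ['banana', 'apple', 'kiwi']
print(B)  # ['apple', 'pear', 'orange']
```

Swapping instead of appending keeps the "equal halves" property of the original split intact.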

Solution 2 is to rename the "apple"s to apple1 and apple2, or something similar. With this solution, I can choose to delete and append either item easily, but I don't know how to get the program to detect that two items with sequential numbers are in the same list.
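One way to sketch Solution 2 is to number only the repeated items and then compare base names with the trailing digits stripped off. The helper names below are hypothetical, and the approach assumes the original item names don't themselves end in digits.

```python
import re
from collections import Counter

def number_duplicates(items):
    """Hypothetical helper: rename repeated items to apple1, apple2, ...

    Items that appear only once keep their original name.
    """
    totals = Counter(items)
    seen = Counter()
    renamed = []
    for item in items:
        if totals[item] > 1:
            seen[item] += 1
            renamed.append(f"{item}{seen[item]}")
        else:
            renamed.append(item)
    return renamed

def base_name(name):
    """Strip trailing digits, e.g. 'apple2' -> 'apple'."""
    return re.sub(r"\d+$", "", name)

def has_same_base(group):
    """True if two entries in group share a base name, e.g. apple1 and apple2."""
    bases = [base_name(name) for name in group]
    return len(bases) != len(set(bases))

items = number_duplicates(["apple", "apple", "banana", "pear", "orange", "kiwi"])
print(items)  # ['apple1', 'apple2', 'banana', 'pear', 'orange', 'kiwi']
print(has_same_base(["apple1", "apple2", "kiwi"]))  # True
print(has_same_base(["apple1", "pear", "kiwi"]))    # False
```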

Solution 3 is the worst one IMO: it involves writing conditions for every potential duplicate, before the shuffle, to make sure one copy of each ends up in each list, which violates the DRY (Don't Repeat Yourself) principle.

I feel the ideal solution would be to use Solution 2 or 3 together with something like the way Google uses an asterisk in searches to mean that anything can appear in its place.
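Python's standard-library `fnmatch` module supports shell-style wildcards, where `*` matches any run of characters, which is close to the search-asterisk idea. A minimal sketch, assuming the duplicates were renamed with numeric suffixes as in Solution 2 and that you know which base names to check; `wildcard_clashes` is a hypothetical helper name. Note that a pattern like `apple*` would also match an unrelated name such as `applesauce`.

```python
import fnmatch

def wildcard_clashes(group, base_names):
    """Hypothetical helper: return the base names whose wildcard pattern
    (e.g. 'apple*') matches more than one entry in group."""
    clashes = []
    for base in base_names:
        # fnmatchcase is case-sensitive on every platform
        matches = [name for name in group if fnmatch.fnmatchcase(name, base + "*")]
        if len(matches) > 1:
            clashes.append(base)
    return clashes

group_a = ["apple1", "apple2", "kiwi"]
print(wildcard_clashes(group_a, ["apple", "kiwi"]))  # ['apple']
```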

Answer

I'm assuming that if you have more than two copies of any word, you can ignore every copy after the first two. My suggested solution splits the list into repeated values, which are shared by both groups, and single values, which are shuffled and split between the two groups.

```python
import collections
import random

words = ['a', 'a', 'a', 'b', 'b', 'c', 'd']

# Split the list into words that occur once and ones that are repeats
single = []
repeats = []
for word, count in collections.Counter(words).items():
    if count > 1:
        repeats.append(word)
    else:
        single.append(word)

# Each group gets a copy of the repeats
group_a = repeats[:]
group_b = repeats[:]

# Now shuffle/divide the single values
random.shuffle(single)
half = len(single) // 2
group_a += single[:half]
group_b += single[half:]

print(group_a)
print(group_b)
```

You could add additional shuffles for the individual groups, if you like.
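Putting the pieces together, here is a minimal sketch that applies the answer's idea to the question's example list and also does the optional per-group shuffle. The function name `split_into_groups` and its `seed` parameter are hypothetical. Dropping one leftover single item when their count is odd is an assumption carried over from the question's own "remove one if odd" step, so both groups end up the same size.

```python
import random
from collections import Counter

def split_into_groups(items, seed=None):
    """Hypothetical helper: split items into two groups with no duplicates
    inside either group.

    Repeated items go into both groups once (copies beyond two are ignored);
    single items are shuffled and split, dropping one if their count is odd.
    """
    rng = random.Random(seed)
    counts = Counter(items)
    repeats = [word for word, count in counts.items() if count > 1]
    single = [word for word, count in counts.items() if count == 1]

    rng.shuffle(single)
    if len(single) % 2 == 1:
        single.pop()  # assumption: drop one so both groups are the same size
    half = len(single) // 2

    group_a = repeats + single[:half]
    group_b = repeats + single[half:]
    # Optional extra shuffles so the repeats aren't always listed first
    rng.shuffle(group_a)
    rng.shuffle(group_b)
    return group_a, group_b

items = ["apple", "apple", "banana", "pear", "orange", "kiwi"]
A, B = split_into_groups(items)
print("The items in group A are: ", A)
print("The items in group B are: ", B)
```

To read from the file instead, pass in the `items` list built from list.txt in the question's code.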
