fubo - 1 year ago 39
C# Question

# Most efficient way to remove duplicates from a List

Let's say I have a List with duplicate values and I want to remove the duplicates.

``````
List<int> myList = new List<int>(Enumerable.Range(0, 10000));

// adding a few duplicates here
``````

I have found the following approaches to solve this:

``````
List<int> result1 = new HashSet<int>(myList).ToList(); //3700 ticks
List<int> result2 = myList.Distinct().ToList(); //4700 ticks
List<int> result3 = myList.GroupBy(x => x).Select(grp => grp.First()).ToList(); //18800 ticks
//referring to pinturic's comment:
List<int> result4 = new SortedSet<int>(myList).ToList(); //18000 ticks
``````

In most answers here on SO, the Distinct approach is shown as the "correct one", yet the HashSet is always faster!
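
Tick counts like the ones above depend heavily on JIT warm-up and on the machine, so it helps to measure every approach the same way. Here is a rough sketch of such a harness, assuming a `Stopwatch`-based loop. The helper name `BestTicks`, the run count and the sample duplicates are made up for illustration and are not necessarily how the numbers in the question were produced:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class DedupTiming
{
    // Hypothetical helper: runs the delegate once untimed (JIT warm-up),
    // then returns the lowest elapsed tick count over `runs` timed runs.
    public static long BestTicks(Func<List<int>> dedup, int runs = 20)
    {
        dedup(); // warm-up, not timed
        long best = long.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            dedup();
            sw.Stop();
            best = Math.Min(best, sw.ElapsedTicks);
        }
        return best;
    }

    public static void Main()
    {
        // Assumed test data: 0..9999 plus a handful of repeated values.
        var myList = new List<int>(Enumerable.Range(0, 10000));
        myList.AddRange(new[] { 1, 2, 3, 42, 9999 });

        Console.WriteLine("HashSet:  " + BestTicks(() => new HashSet<int>(myList).ToList()));
        Console.WriteLine("Distinct: " + BestTicks(() => myList.Distinct().ToList()));
    }
}
```

Taking the best of several runs is only one possible choice; a median or a dedicated benchmarking library would be just as reasonable.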

My question: is there anything I have to be aware of when I use the HashSet approach and is there another more efficient way?

There is a big difference between these two approaches:

``````
List<int> Result1 = new HashSet<int>(myList).ToList(); //3700 ticks
List<int> Result2 = myList.Distinct().ToList(); //4700 ticks
``````

The first one does not guarantee the order of the elements in the returned `List<>`: `HashSet<>` documents no enumeration order, so `Result1` may not be in the same order as `myList`. The second one maintains the original ordering.
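
To make the ordering point concrete, here is a small sketch with made-up input. It assumes `Distinct()` yields each value at its first occurrence, which is what current implementations do, and it compares the `HashSet<>` result as a set instead of relying on positions. The names `OrderDemo` and `SameElements` are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderDemo
{
    // Hypothetical helper: true when both sequences contain the same
    // distinct values, regardless of order.
    public static bool SameElements(IEnumerable<int> a, IEnumerable<int> b)
    {
        return new HashSet<int>(a).SetEquals(b);
    }

    public static void Main()
    {
        var input = new List<int> { 5, 3, 5, 1, 3 };

        // Each value is yielded where it first appears in the input.
        List<int> viaDistinct = input.Distinct().ToList();

        // Same values, but no enumeration order is documented for HashSet<T>.
        List<int> viaHashSet = new HashSet<int>(input).ToList();

        Console.WriteLine(string.Join(", ", viaDistinct)); // 5, 3, 1
        Console.WriteLine(SameElements(viaHashSet, viaDistinct)); // True
    }
}
```

If the calling code only ever treats the result as a set, the difference does not matter; if it indexes into the list or displays it, it does.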

There is probably no faster way than the first one.

There is probably no "more correct" way (for a definition of "correct" based on ordering) than the second one.

(The third one is similar to the second one, only slower.)

Just out of curiosity, `Distinct()` is implemented as:

``````
// Reference source http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,712
public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return DistinctIterator<TSource>(source, null);
}

// Reference source http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,722
static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer) {
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource element in source)
        if (set.Add(element)) yield return element;
}
``````

So in the end `Distinct()` simply uses an internal implementation of a `HashSet<>` (called `Set<>`) to check for the uniqueness of items.
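
The same "seen set" idea can be written with the public `HashSet<>`, which gives the ordering of `Distinct()` while keeping the set explicit. This is a minimal sketch, not the framework's code; `SeenSetDemo` and `DistinctInOrder` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeenSetDemo
{
    // Validates eagerly, then defers the actual work to a lazy iterator.
    public static IEnumerable<T> DistinctInOrder<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, comparer);
    }

    private static IEnumerable<T> Iterator<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        // A null comparer makes HashSet<T> fall back to the default comparer.
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            // Add returns false when the item is already in the set.
            if (seen.Add(item)) yield return item;
        }
    }

    public static void Main()
    {
        var input = new List<int> { 3, 1, 3, 2, 1 };
        Console.WriteLine(string.Join(", ", DistinctInOrder(input))); // 3, 1, 2
    }
}
```

Because the output follows the input sequence and the set is used only for membership checks, the result keeps first-occurrence order no matter how the set stores its elements.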