Sunny Sunny - 1 year ago 67
Java Question

Set vs List when need both unique elements and access by index

I need to keep a unique list of elements seen and I also need to pick random one from them from time to time. There are two simple ways for me to do this.

  1. Keep elements seen in a Set - that gives me uniqueness of elements. When there is a need to pick random one, do the following:


  2. Keep elements seen in a List - this way no need to convert to array as there is the get() function for when I need to ask for a random one. But here I would need to do this when adding.

    if (elementsSeen.indexOf(element)==-1) {elementsSeen.add(element);}

So my question is which way would be more efficient? Is converting to array more consuming or is indexOf worse? What if attempting to add an element is done 10 or 100 or 1000 times more often?

Answer Source

HashSet and TreeSet both extend AbstractCollection, which includes the toArray() implementation as shown below:

public Object[] toArray() {
    // Estimate size of array; be prepared to see more or fewer elements
    Object[] r = new Object[size()];
    Iterator<E> it = iterator();
    for (int i = 0; i < r.length; i++) {
        if (! it.hasNext()) // fewer elements than expected
            return Arrays.copyOf(r, i);
        r[i] =;
    return it.hasNext() ? finishToArray(r, it) : r;

As you can see, its responsible for allocating the space for an array, as well as creating an Iterator object for copying. So, for a Set, adding is O(1), but retrieving a random element will be O(N) because of the element copy operation.

A List, on the other hand, allows you quick access to a specific index in the backing array, but doesn't guarantee uniqueness. You would have to re-implement the add, remove and associated methods to guarantee uniqueness on insert. Adding a unique element will be O(N), but retrieval will be O(1).

So, it really depends on which area is your potential high usage point. Are the add/remove methods going to be heavily used, with random access used sparingly? Or is this going to be a container for which retrieval is most important, since few elements will be added or removed over the lifetime of the program?

If the former, I'd suggest using the Set with toArray(). If the latter, it may be beneficial for you to implement a unique List to take advantage to the fast retrieval. The significant downside is add contains many edge cases for which the standard Java library takes great care to work with in an efficient manner. Will your implementation be up to the same standards?