Spectre Spectre - 18 days ago 6
Java Question

HashSet<User> modification, while user have only final value in 'equals/hashcode'

I read some docs about

HashSets
performance and I still don't get one thing.

I have a mutable
User
class, where there's one unique, t-safe final field:

public User {

// magical thread-safe, immutable int
private final int userID;
// some mutable stuff

public User(int userID){
this.userID = userID;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
User user = (User) o;

return Objects.equals(userID, user.userID);
}

@Override
public int hashCode() {
return Objects.hash(userID);
}
}


I have a thread-safe immutable field - userID.

Now, I create a
HashSet<User> users
container and here starts my question.

I often iterate over this collection, to find a
User
by his nickname, ID or some other variables, and sometimes I change their values, (mutable values, not strings), but the userID still remains the same, always.

Does iterating and modifying mutable objects in this case affect the
HashSet
performance? If I have
hashcode()
which includes only one, immutable value - it should be ok, right?

Thanks very much for the help!

Edit



Changed AtomicInteger to int - no need for atomicity, it's already t-safe

Answer

Does iterating and modifying mutable objects in this case affect the HashSetperformance? If I have hashcode() which includes only one, immutable value - it should be ok, right?

Right. Since HashSet is backed by HashMap:

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

(https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html)

Conclusion: Performance in a HashSet is mostly affected by:

  • The ratio between the number of contained items and the collection's actual capacity.
  • The distribution of values returned by hashCode.

(Changing the value returned by hashCode - OK, I notice this is not your case, would harm severely the behaviour of HashMap, because if an object was initially indexed by number N, changing it -externally- later to N+1 without the HashMap noticing it, would make the HashMap not finding the object in its "expected" place.)

All this said, there's something smelly in your question: You say that you "often iterate over that HashSet". But HashSet is not to be iterated on: It is to be indexed: You should reach an object directly by calling get or contains. Iterating is a poor using of the indexing nature of HashSet.

Finding by multiple criteria

If you need to find a User object by different criteria, you should add this paradigm for each filtering value:

public class UserContainer 
{
    private final Map<K, User> usersByKey1=new HashMap<K1, User>(1.7*finalSize);

    public void addUser(User user)
    {
        synchronized(this)
        {
            this.usersByKey1.put(key1, user);
            ...
        }
    }

    public void User getUserByKey1(Key1 key1)
    {
        return this.usersByKey1(key1);
    }

    public void removeUser(User user)
    {
        synchronized(this)
        {
            this.usersByKey1.remove(key1);
            ...
        }
    }
}

See? UserContainer is an abstraction which encapsulates all user management issues: indexing, adding, removing, etc. You can add a new indexing map for each required searching value: userName, email, etc - as long as each one of them is actually a candidate primary key for User.