How HashSet works in Java? How HashSet eliminates duplicates in Java? HashSet Tutorial.

How HashSet works in Java?
How HashSet eliminates duplicates in Java?


This is very popular interview question for both beginners and experienced. HashSet internally uses HashMap for storing the data and retrieving it back. 
So if you know how HashMap works then it is very easy to understand how HashSet works.



Please go through below articles on HashMap for better understanding,
How HashMap works?
How hashcode and equals method works in HashMap?

How HashSet works in Java


HashSet internally uses HashMap for storing data, what it means is,
When HashSet, add("data") method is called, add method internally calls HashMap put(key, value) method.
    private transient HashMap<E,Object> map;
  
    //Dummy Value to store against key
    private static final Object PRESENT = new Object();

    public HashSet() {
        map = new HashMap<>();
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

Once HashMap put method is invoked, it stores the data(key-value pair) in bucket, by first evaluating hashcode of key, then identifies exact bucket(array index) and then do equals comparison of the key to store and the key of already stored key-value pairs in linked list present in that bucket.


So what HashMap does when put method is called,
  1. When "Key" is already present, then it replaces the old value, with new value and returns old value.
  2. When "Key" is not present, then it adds the key-value pair and returns null.

So when data is added in HashSet, say for example, 
  1. hashset.add("jayesh").

  2. add method will put it in map as map.put("jayesh", DUMMY_VALUE);

  3. If key "jayesh" is not present in map, then hashmap will add jayesh as key and value as DUMMY_VALUE and will return null, which is indication for hashset add method that key "jayesh" is not present in map previously.

    Hashset add method checks that if HashMap put method return null it means supplied data is entered for first time and add method returns true to caller, which means data successfully added in HashSet.

  4. If key "jayesh" is already present in map, then hashmap will return old value stored against key "jayesh" which will always be DUMMY_VALUE because value is always fixed in case of hashset. So when hashmap returns other than null, then this is indication for add method of hashset that there is already key "jayesh" present and need to return false.

    Hashset add method checks that if HashMap put method return not null it means supplied data is already entered previously and add method returns false to caller, which means data already present with same key and no new entry is created for supplied data
    (This is how HashSet eliminates duplicates as well).

HashSet Example


import java.util.HashSet;
import java.util.Set;

class Student{
 int rollNo;

 public Student(int rollNo) {
  this.rollNo = rollNo;
 }
 
 @Override
 public int hashCode() {
  return rollNo;
 }
 
 @Override
 public boolean equals(Object obj) {
  return this.rollNo == ((Student)obj).rollNo;
 }
}

public class HashMapWorks {
 
 public static void main(String[] args) {
  Set<Student> hashSet = new HashSet<Student>();
  System.out.println(hashSet.add(new Student(10))); //add returns True
  System.out.println(hashSet.add(new Student(20))); //add returns True
  System.out.println(hashSet.add(new Student(30))); //add returns True
  System.out.println(hashSet.add(new Student(20))); //add returns False
  System.out.println(hashSet.add(new Student(10))); //add returns False
  
  System.out.println(hashSet.size());
 }
}


Output:
true
true
true
false
false
3

 

True is indication that hashset has encountered supplied data for the first time and that is why it has stored supplied data successfully.
False is indication that hashset has encountered supplied data not for first time and that is why it is ignoring supplied data to store and failed.
HashSet Size is 3 because only data 10, 20 and 30 is saved successfully.

Is HashSet synchronized?


No. HashSet is not synchronized because it internally uses HashMap which is not synchronized.

Synchronized Set can be achieve by calling,
Set<String> synchronizedSet = Collections.synchronizedSet(new HashSet<String>());
OR
Set<String> synchronizedSet = Collections.newSetFromMap(new Hashtable<String, Boolean>());

You may also like to see


How ConcurrentHashMap works and ConcurrentHashMap interview questions.

How HashMap works

How hashcode and equals method work in hashmap

What is Load factor and Rehashing in Hashmap?

Time complexity of HashMap get and put operation.

When HashMap goes in infinite loop



Enjoy !!!! 

If you find any issue in post or face any error while implementing, Please comment.

Post a Comment