Aniruddha Sinha - 2 months ago
Java Question

RawComparator vs WritableComparable



compare() and compareTo() work synonymously when it comes to sorting keys. In an era of powerful, highly configured machines, is there still any need to think about when to use compare() and when to use compareTo()?

If there is a scenario where compare(byte b1[], int s1, int l1, byte b2[], int s2, int l2) has an edge over compareTo(Object key1, Object key2), please suggest the fields, use cases, or types of problems where we really need to decide which one to use.

Thank you!

Answer

Use of RawComparator:

If you still want to reduce the time taken by a MapReduce job, use a RawComparator.

Intermediate key-value pairs are passed from the Mapper to the Reducer. Before they reach the Reducer, the shuffle and sort steps are performed on them.

Sorting is faster because a RawComparator compares the keys directly on their serialized bytes. Without a raw compare() of your own, the intermediate keys have to be fully deserialized into objects before every comparison.
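To make the difference concrete, here is a minimal sketch that uses only the JDK, not Hadoop. The class name RawCompareSketch, its helper methods, and the two-int key layout are assumptions made for illustration; they are not part of any Hadoop API. Each key is stored as two 4-byte big-endian ints, which is the same layout DataOutput.writeInt() produces. One method rebuilds the fields before comparing, the other reads the ints in place.

```java
import java.nio.ByteBuffer;

class RawCompareSketch {

    // Serialize a hypothetical (i, j) key as two 4-byte big-endian ints.
    static byte[] serialize(int i, int j) {
        return ByteBuffer.allocate(8).putInt(i).putInt(j).array();
    }

    // Slow path: rebuild both keys from their bytes, then compare the fields.
    static int compareDeserialized(byte[] b1, byte[] b2) {
        int[] k1 = deserialize(b1);
        int[] k2 = deserialize(b2);
        int cmp = Integer.compare(k1[0], k2[0]);
        return cmp != 0 ? cmp : Integer.compare(k1[1], k2[1]);
    }

    // Allocates a buffer view and an array for every key it reads.
    static int[] deserialize(byte[] b) {
        ByteBuffer in = ByteBuffer.wrap(b);
        return new int[] { in.getInt(), in.getInt() };
    }

    // Fast path: read the ints at known offsets; nothing is allocated.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int cmp = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        if (cmp != 0) {
            return cmp;
        }
        return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }

    // Decode a big-endian int starting at offset off.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             |  (b[off + 3] & 0xff);
    }
}
```

Both methods order the keys the same way. The raw version simply skips the object creation, and in a sort that performs millions of comparisons that saving tends to add up. It pays off mainly when the key has a fixed, simple binary layout; for keys with variable-length or nested fields, the offsets are harder to compute and the deserializing path may be the safer choice.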

Example:

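import org.apache.hadoop.io.WritableComparator;

// Compares serialized IndexPair keys directly on their bytes.
// Assumes IndexPair writes its two int fields (i, then j) with writeInt(), 4 bytes each.
public class IndexPairComparator extends WritableComparator {

    protected IndexPairComparator() {
        super(IndexPair.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // First field: the int at the start of each key.
        int i1 = readInt(b1, s1);
        int i2 = readInt(b2, s2);
        int comp = Integer.compare(i1, i2);
        if (comp != 0) {
            return comp;
        }

        // Tie on the first field: compare the second int, 4 bytes further in.
        int j1 = readInt(b1, s1 + 4);
        int j2 = readInt(b2, s2 + 4);
        return Integer.compare(j1, j2);
    }
}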

In the above example, we did not implement RawComparator directly. Instead, we extended WritableComparator, which itself implements RawComparator.

Have a look at this article by Jee Vang

Default implementation of the raw compare() in WritableComparator: it deserializes both keys and then compares the resulting objects.

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
        buffer.reset(b1, s1, l1);                   // parse key1
        key1.readFields(buffer);

        buffer.reset(b2, s2, l2);                   // parse key2
        key2.readFields(buffer);

    } catch (IOException e) {
        throw new RuntimeException(e);
    }

    return compare(key1, key2);                     // compare them
}

Have a look at source
