Rahul Koshaley - 4 months ago
Java Question

How to combine 3 pair RDDs

I have a somewhat complex requirement: I need to combine three pair RDDs that are all keyed by Twitter handle.

1) for Pinterest

twitter handle , pinterest_post , pinterest_likes

handle , "what" , 7



JavaPairRDD<String ,Pinterest> PintRDD


2) for Instagram

twitter handle , instagram_post , instagram_likes

handle , "hello" , 10
handle2 , "hi" , 20


JavaPairRDD<String ,Instagram> instRDD


3) for ontologies

twitter handle , categories , sub_categories

handle , Products , MakeUp
handle , Products, MakeUp
handle2 , Services , Face

JavaPairRDD<String ,ontologies1> ontologiesPair


The final output should be:

handle , "what" , 7 , "hello" , 10 , products , makeup
handle , "what" , 7 , "hello", 10 , products , makeup
handle2, , , "hi" ,20 , Services , Face


How do I achieve this using Spark?

Answer

Suppose we have the following classes:

public class Pinterest implements Serializable {

    private static final long serialVersionUID = 1226764093455880169L;
    public String twitterHandle;
    public String pinterest_post;
    public int pinterest_likes;

    Pinterest(String pinterest_post, int pinterest_likes) {

        this.pinterest_post = pinterest_post;
        this.pinterest_likes = pinterest_likes;
    }
}

public class Instagram implements Serializable {

    private static final long serialVersionUID = 7351892713578143761L;
    public String twitterHandle;
    public String instagram_post ;
    public int instagram_likes;

    Instagram(String instagram_post,int instagram_likes){

        this.instagram_post=instagram_post;
        this.instagram_likes=instagram_likes;
    }

}

public class Ontologies implements Serializable{

    private static final long serialVersionUID = 1996294848173720136L;
    public String twitterHandle;
    public String categories  ;
    public String sub_categories ;

    Ontologies(String categories,String sub_categories){

        this.categories=categories;
        this.sub_categories=sub_categories;
    }

}

If we treat -1 as the placeholder for a missing like count and an empty string as the placeholder for a missing post, this code solves your problem:

import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;
import scala.Tuple3;

// The class name and the local Spark context are assumptions so the snippet runs standalone.
public class CombineRdds {

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("CombineRdds").setMaster("local[*]"));

        JavaPairRDD<String, Pinterest> pintRDD = sc
                .parallelizePairs(Arrays.asList(new Tuple2<String, Pinterest>("handle", new Pinterest("what", 7))));
        JavaPairRDD<String, Instagram> instRDD = sc
                .parallelizePairs(Arrays.asList(new Tuple2<String, Instagram>("handle", new Instagram("hello", 10)),
                        new Tuple2<String, Instagram>("handle2", new Instagram("Hi", 20))));
        JavaPairRDD<String, Ontologies> ontologiesPair = sc.parallelizePairs(
                Arrays.asList(new Tuple2<String, Ontologies>("handle", new Ontologies("Products", "MakeUp")),
                        new Tuple2<String, Ontologies>("handle2", new Ontologies("Service", "Face"))));

        // cogroup gathers the values of all three RDDs per key;
        // a key that is missing from one RDD gets an empty Iterable for it.
        JavaPairRDD<String, Tuple3<Iterable<Ontologies>, Iterable<Instagram>, Iterable<Pinterest>>> grouped = ontologiesPair
                .cogroup(instRDD, pintRDD);

        Map<String, Tuple3<Iterable<Ontologies>, Iterable<Instagram>, Iterable<Pinterest>>> mapResult = grouped
                .collectAsMap();

        for (Map.Entry<String, Tuple3<Iterable<Ontologies>, Iterable<Instagram>, Iterable<Pinterest>>> entry : mapResult
                .entrySet()) {

            // Placeholders used when a handle has no entry in one of the RDDs.
            Ontologies ontologies = new Ontologies("", "");
            Pinterest pinterest = new Pinterest("", -1);
            Instagram instagram = new Instagram("", -1);

            // Only the first element of each group is used.
            if (entry.getValue()._1().iterator().hasNext()) {
                ontologies = entry.getValue()._1().iterator().next();
            }

            if (entry.getValue()._2().iterator().hasNext()) {
                instagram = entry.getValue()._2().iterator().next();
            }

            if (entry.getValue()._3().iterator().hasNext()) {
                pinterest = entry.getValue()._3().iterator().next();
            }

            System.out.println(entry.getKey() + " " + pinterest.pinterest_post + " " + pinterest.pinterest_likes
                    + " " + instagram.instagram_post + " " + instagram.instagram_likes + " " + ontologies.categories
                    + " " + ontologies.sub_categories);
        }

        sc.close();
    }
}

The result is:

handle what 7 hello 10 Products MakeUp

handle2 -1 Hi 20 Service Face
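
Note that the loop in the answer reads only the first element of each cogrouped Iterable, so a handle with two ontology rows is still printed once. If you want one row per combination, as in the question's expected output where handle appears twice, the groups can be expanded as a cross product with a blank placeholder for any empty group. Below is a minimal plain-Java sketch of that expansion, with no Spark dependency. The names CogroupRows, Fields, orDefault and expand are hypothetical, and the lists stand in for the three Iterables that cogroup returns per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Spark or of the original answer.
class CogroupRows {

    /** Two string fields, e.g. (post, likes) or (category, sub_category). */
    static final class Fields {
        final String first;
        final String second;

        Fields(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Returns the group itself, or a single placeholder when the group is empty. */
    static <T> List<T> orDefault(List<T> group, T placeholder) {
        return group.isEmpty() ? Collections.singletonList(placeholder) : group;
    }

    /** Builds one comma-separated row per (pinterest, instagram, ontologies) combination. */
    static List<String> expand(String handle, List<Fields> pins, List<Fields> insts, List<Fields> onts) {
        Fields blank = new Fields("", "");
        List<String> rows = new ArrayList<>();
        for (Fields p : orDefault(pins, blank)) {
            for (Fields i : orDefault(insts, blank)) {
                for (Fields o : orDefault(onts, blank)) {
                    rows.add(String.join(",", handle, p.first, p.second,
                            i.first, i.second, o.first, o.second));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data taken from the question; "handle" has two identical ontology rows.
        List<String> rows = new ArrayList<>();
        rows.addAll(expand("handle",
                Arrays.asList(new Fields("what", "7")),
                Arrays.asList(new Fields("hello", "10")),
                Arrays.asList(new Fields("Products", "MakeUp"), new Fields("Products", "MakeUp"))));
        rows.addAll(expand("handle2",
                Collections.<Fields>emptyList(),
                Arrays.asList(new Fields("hi", "20")),
                Arrays.asList(new Fields("Services", "Face"))));
        for (String row : rows) {
            System.out.println(row);
        }
        // Prints:
        // handle,what,7,hello,10,Products,MakeUp
        // handle,what,7,hello,10,Products,MakeUp
        // handle2,,,hi,20,Services,Face
    }
}
```

In a Spark job the same nested loop could presumably run inside a flatMap over the cogrouped RDD, so the rows are built on the executors instead of after collectAsMap().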
