Difference between revisions of "Hadoop: Giraph - Langkah Pertama"

From OnnoWiki
Jump to navigation Jump to search
(New page: Sumber: http://peter.grman.at/first-steps-with-apache-giraph/ First steps with Apache-Giraph Jan 27, 2014 • Peter I used Apache Giraph during a project for my studies. There is a gr...)
 
 
(One intermediate revision by the same user not shown)
Line 11: Line 11:
 
A friend of mine pointed me to her blog posts about Giraph:
 
A friend of mine pointed me to her blog posts about Giraph:
  
    Run Example in Giraph: Shortest Paths
+
* Run Example in Giraph: Shortest Paths http://marsty5.com/2013/04/29/run-example-in-giraph-shortest-paths/
    Run example in Giraph: PageRank
+
* Run example in Giraph: PageRank http://marsty5.com/2013/05/29/run-example-in-giraph-pagerank/
  
This gave me a clearer picture. I understood better, what is happening and how to make new things. (Think as a vertex, you are the vertex.) But there was still a clear explanation of the code missing. Ok, Apache Giraph is an open source project, and there is a giraph-examples folder in the repository, so let’s just have a look at it.
+
This gave me a clearer picture. I understood better, what is happening and how to make new things. (Think as a vertex, you are the vertex.) But there was still a clear explanation of the code missing. Ok, Apache Giraph is an open source project, and there is a giraph-examples folder in the repository
 +
 
 +
* https://github.com/apache/giraph/tree/release-1.0/giraph-examples/src/main/java/org/apache/giraph/examples
 +
 
 +
so let’s just have a look at it.
  
 
Making sense of the code for SimpleShortestPathVertex wasn’t too difficult. It is only few lines, which are relevant and if we remove the logging and add some comments it will be actually easy to grasp it:
 
Making sense of the code for SimpleShortestPathVertex wasn’t too difficult. It is only few lines, which are relevant and if we remove the logging and add some comments it will be actually easy to grasp it:
Line 56: Line 60:
 
In the upcoming days I’ll publish more blog posts about:
 
In the upcoming days I’ll publish more blog posts about:
  
    Implementing a simple hop-count (or modified shortest-path) algorithm in Giraph (source code)
+
* Implementing a simple hop-count (or modified shortest-path) algorithm in Giraph (source code)
    Implementing Ja-be-Ja algorithm in Giraph (source code)
+
* Implementing Ja-be-Ja algorithm in Giraph (source code)
    And why not to use Giraph for edge-centric algorithms
+
* And why not to use Giraph for edge-centric algorithms
  
  

Latest revision as of 05:45, 11 November 2015

Sumber: http://peter.grman.at/first-steps-with-apache-giraph/


First steps with Apache-Giraph

Jan 27, 2014 • Peter

I used Apache Giraph during a project for my studies. There is a great official Quick Start-Tutorial on how to set up Giraph on top of Apache Hadoop 0.20.203.0-RC1 and run your first example application. When you follow it, you will have a running environment where you can test your code. Still, you won’t have any idea, how to write it (at least I didn’t).

A friend of mine pointed me to her blog posts about Giraph:

This gave me a clearer picture. I understood better, what is happening and how to make new things. (Think as a vertex, you are the vertex.) But there was still a clear explanation of the code missing. Ok, Apache Giraph is an open source project, and there is a giraph-examples folder in the repository

so let’s just have a look at it.

Making sense of the code for SimpleShortestPathVertex wasn’t too difficult. It is only few lines, which are relevant and if we remove the logging and add some comments it will be actually easy to grasp it:

public void compute(Iterable<DoubleWritable> messages) {
  if (getSuperstep() == 0) {
    // initially, no node is reachable
    setValue(new DoubleWritable(Double.MAX_VALUE));
  }
  
  double minDist = isSource() ? 0d : Double.MAX_VALUE;
  // check the incoming messages 
  // and collect the minimal distance
  // (for the source this will be always 0)
  for (DoubleWritable message : messages) {
    minDist = Math.min(minDist, message.get());
  }
  
  // check if the new minimal distance is smaller 
  // than the original one (Double.MAX_VALUE initially)
  // this will be in the beginning only true for 
  // the source vector since its distance is always 0
  if (minDist < getValue().get()) {
    setValue(new DoubleWritable(minDist));
    for (Edge<LongWritable, FloatWritable> edge : getEdges()) {
      // send a message to all neighbours with 
      // the current distance from the source vector
      // + the distance to the neighbour
      double distance = minDist + edge.getValue().get();
      sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
    }
  }
  
  // task is finished, until new messages will arrive
  // in this case the vertex will be woken up again
  voteToHalt();
}

Now the problem with this example is, that it’s too easy. All messages and the internal state consist of only one Double-Value. In reality, however, this won’t be enough, and I just could not find anything that would describe the next steps. So I had to experiment and eventually wrote my own simple algorithm to learn how to write code for Giraph.

In the upcoming days I’ll publish more blog posts about:

  • Implementing a simple hop-count (or modified shortest-path) algorithm in Giraph (source code)
  • Implementing Ja-be-Ja algorithm in Giraph (source code)
  • And why not to use Giraph for edge-centric algorithms




Referensi