Hadoop: Sample Datasets for Testing Hadoop

From OnnoWiki

Revision as of 18:06, 9 November 2015

Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html


To practise Hadoop, you can use the sources below to obtain or generate large (GB-scale) data sets, so that you get a real feel for the power of Hadoop.


1. clearbits.net

From clearbits.net you can get the quarterly full data dump of Stack Exchange to use while practising Hadoop. It contains around 10 GB of data.

2. grouplens.org

grouplens.org has collected several rating data sets that you can use for practising Hadoop.

If you have Hadoop installed on your machine, you can also generate data in the following two ways.

3. hadoop-examples.jar randomwriter /random-data

cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data

This writes 10 GB of data per node into the /random-data folder in HDFS. It took less than 1 minute on a WD Red 6TB hard disk.
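For a quick trial on a small machine, 10 GB per node may be more than you want. The sketch below is a hypothetical helper, `randomwriter_cmd`, that only prints a randomwriter command with a reduced total size; it does not run anything. It assumes that your Hadoop version honours the `mapreduce.randomwriter.totalbytes` property (the name used in the Hadoop 2.x example source; check it against your release) and that the jar path matches the one used above.

```shell
# Hypothetical helper (not part of Hadoop): print a randomwriter command
# that asks for a smaller total amount of data. Nothing is executed here.
# Assumption: mapreduce.randomwriter.totalbytes is honoured by your Hadoop version.
EXAMPLES_JAR=./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar

randomwriter_cmd() {
  total_mb="$1"   # total size to generate, in MB
  out_dir="$2"    # HDFS output directory (must not exist yet)
  total_bytes=$((total_mb * 1024 * 1024))
  echo "hadoop jar ${EXAMPLES_JAR} randomwriter -D mapreduce.randomwriter.totalbytes=${total_bytes} ${out_dir}"
}
```

For example, `randomwriter_cmd 512 /random-data-small` prints a command with `totalbytes=536870912`; copy it into the shell from /usr/local/hadoop once you are happy with it.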

4. hadoop-examples.jar randomtextwriter /random-text-data

cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data

This writes 10 GB of text data per node under the /random-text-data folder in HDFS. It took less than 1 minute on a WD Red 6TB hard disk.
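After either generator finishes, it is worth confirming how much data actually landed in HDFS. The sketch below is a hypothetical helper, `hdfs_check_cmds`, that prints the standard `hdfs dfs -ls` and `hdfs dfs -du -s -h` commands for each directory you pass in; it only prints them, so it can be tried on a machine without Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print the HDFS commands that
# list a directory and report its total size. Nothing is executed here.
hdfs_check_cmds() {
  for dir in "$@"; do
    echo "hdfs dfs -ls ${dir}"        # list the part files that were written
    echo "hdfs dfs -du -s -h ${dir}"  # summarised, human-readable total size
  done
}
```

Calling `hdfs_check_cmds /random-data /random-text-data` prints four commands, two per directory, which you can then run on the cluster.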

5. Amazon

Amazon provides many data sets that you can use.

6. Stack Overflow

Check the answers to the same question on Stack Overflow.


7. University of Waikato

Many data sets are available for practising machine learning.

8. Quora

See the answers to a similar question on Quora.

If you know of any other free data sets, please share them in the comments.




