Hadoop: Sample Datasets for Testing Hadoop

Revision as of 15:42, 9 November 2015

Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html


To practice Hadoop, you can use the sources and commands below to obtain or generate large data sets (in the gigabyte range), so that you get a real feel for what Hadoop can do.


1. clearbits.net

From clearbits.net you can get the quarterly full data set of Stack Exchange to use while practicing Hadoop. It contains around 10 GB of data.

2. grouplens.org

grouplens.org has collected several rating data sets that you can use for practicing Hadoop.

If you have Hadoop installed on your machine, you can use the following two commands to generate data.

3. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomwriter /random-data

Generates 10 GB of data per node under the folder /random-data in HDFS.

4. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomtextwriter /random-text-data

Generates 10 GB of textual data per node under the folder /random-text-data in HDFS.

The path of hadoop-examples.jar may differ depending on your Hadoop installation.
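Because the jar location varies, a small helper can search for it before running either command. The following is only a sketch: the function name find_examples_jar and the default directories are assumptions for illustration, not part of the original article, and should be adjusted to your installation. The file name pattern also matches the hadoop-mapreduce-examples-<version>.jar naming used by later Hadoop releases.

```shell
#!/bin/sh
# Print the first Hadoop examples jar found under the given directories.
# With no arguments, a few assumed default locations are searched instead
# (these defaults are guesses, not taken from the original article).
find_examples_jar() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib/hadoop-0.20-mapreduce /usr/lib/hadoop-mapreduce /opt/hadoop
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name 'hadoop*examples*.jar' 2>/dev/null
    done | head -n 1
}
```

If the helper prints a path, that path can be used in place of /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar in commands 3 and 4, for example: hadoop jar "$(find_examples_jar)" randomwriter /random-data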

5. Amazon

Amazon provides many data sets that you can use.

6. Stack Overflow

Check the answers to the same question on Stack Overflow.


7. University of Waikato

The University of Waikato makes many data sets available for practicing machine learning.

8. Quora

See the answers to a similar question on Quora.

If you know of any free data sets, please share them in the comments.




