Hadoop: Sample Dataset for Testing Hadoop

Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html

To practice Hadoop we need a large dataset (Big Data), so we can get a real feel for Hadoop's power. We can generate one ourselves with hadoop-examples, or download an existing dataset from the sources below.

==hadoop-examples.jar randomwriter /random-data==
  
cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data

This creates 10 GB of data per node in the /random-data folder in HDFS. It took less than 1 minute on a WD Red 6TB hard disk.
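As a scaled-down illustration of what randomwriter does, the sketch below writes a file of random bytes locally with Python. The file name and the 1 MB size are arbitrary choices for this example; the real Hadoop job writes 10 GB of random binary key/value records per node into HDFS.

```python
import os

# Local stand-in for Hadoop's randomwriter: write a file of random
# binary bytes. The 1 MB size is scaled down for the example; the
# real job writes 10 GB of random key/value records per node.
SIZE_BYTES = 1 * 1024 * 1024

with open("random-data.bin", "wb") as f:
    f.write(os.urandom(SIZE_BYTES))

print(os.path.getsize("random-data.bin"))  # 1048576
```

Scaling SIZE_BYTES up reproduces the "big data" effect locally, at the cost of disk space and time.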
==hadoop-examples.jar randomtextwriter /random-text-data==
  
cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data

This creates 10 GB of text data per node under the /random-text-data folder in HDFS. It took less than 1 minute on a WD Red 6TB hard disk.

Alternatively, download an existing dataset from one of the following sources:

==grouplens.org==
  
The MovieLens rating datasets can be downloaded from:

http://grouplens.org/datasets/movielens/
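Once downloaded, MovieLens ratings are easy to feed into a Hadoop job. Assuming the tab-separated u.data layout of the ml-100k release (user id, item id, rating 1-5, timestamp), the sketch below parses a few illustrative records; the sample lines are just examples, not the real file.

```python
# Parse MovieLens-style rating records. In the ml-100k release, u.data
# stores one rating per line, tab-separated: user id, item id, rating
# (1-5), timestamp. The three sample lines below are illustrative.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116"

ratings = []
for line in sample.splitlines():
    user_id, item_id, rating, timestamp = line.split("\t")
    ratings.append((int(user_id), int(item_id), int(rating), int(timestamp)))

avg_rating = sum(r[2] for r in ratings) / len(ratings)
print(len(ratings), round(avg_rating, 2))  # 3 2.33
```

The same split-and-convert logic would go in a mapper when computing, say, average rating per movie.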
  
==Amazon==

AWS hosts public datasets from many fields, such as:

* Astronomy
* Biology
* Chemistry
* Weather
* Economics
* Geography
* Mathematics
* etc.
  

See:

https://aws.amazon.com/datasets/
https://aws.amazon.com/1000genomes/
https://aws.amazon.com/datasets/common-crawl-corpus/

==Stackoverflow==
  

Check the answers at:

http://stackoverflow.com/search?q=hadoop+dataset

==University of Waikato, New Zealand==

See the Weka machine-learning dataset collection:

http://www.cs.waikato.ac.nz/ml/weka/datasets.html
  

==Quora==

Check the answers to the question at:

https://www.quora.com/search?q=hadoop+dataset
  

==DataScienceCentral==

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
  
 
==References==

* http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html

Latest revision as of 20:01, 9 November 2015
