Hadoop: Sample Datasets for Testing Hadoop
Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html
To practise Hadoop, you can use the sources below to obtain or generate large datasets (gigabytes in size), so that you get a real feel for the power of Hadoop.
clearbits.net
From clearbits.net you can get the full quarterly data dump of Stack Exchange to use while practising Hadoop. It contains around 10 GB of data.
grouplens.org
The dataset can be downloaded from
http://grouplens.org/datasets/movielens/
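As a rough sketch of how a MovieLens archive could be fetched and copied into HDFS: the archive name ml-latest-small.zip, its download URL and the target directory /movielens are assumptions made for illustration and are not taken from the page above, so check the MovieLens page for the file you actually want. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a machine with curl, unzip and a running HDFS.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed archive name and URL; pick the real file from the MovieLens page.
ARCHIVE="ml-latest-small.zip"
URL="https://files.grouplens.org/datasets/movielens/$ARCHIVE"
# Hypothetical target directory in HDFS.
HDFS_DIR="/movielens"

run curl -fLO "$URL"                      # download into the current directory
run unzip -o "$ARCHIVE"                   # unpack next to the archive
run hdfs dfs -mkdir -p "$HDFS_DIR"        # create the HDFS directory if missing
run hdfs dfs -put -f "${ARCHIVE%.zip}" "$HDFS_DIR/"   # upload the unpacked folder
```

Run it once as is to review the commands, then again with DRY_RUN=0 to execute them.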
hadoop-examples.jar randomwriter /random-data
cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data
Creates 10 GB of data per node in the /random-data folder on HDFS. This takes less than 1 minute on a WD Red 6TB hard disk.
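The 10 GB per node figure comes from the randomwriter defaults of 10 map tasks per host, each writing 1 GB. The sketch below shows how a smaller run could be sized. The property names mapreduce.randomwriter.mapsperhost and mapreduce.randomwriter.bytespermap are given as recalled from the Hadoop 2.x example source, so treat them as assumptions and verify them against your version. The output folder /random-data-small is a hypothetical name. The script prints the command instead of running it.

```shell
#!/bin/sh
# Assumed Hadoop 2.x property names; verify against RandomWriter in your release.
MAPS_PER_HOST=2        # default is 10 map tasks per host
MB_PER_MAP=256         # default is 1024 MB (1 GB) per map task
BYTES_PER_MAP=$((MB_PER_MAP * 1024 * 1024))

# Data written per node = maps per host x data per map.
DEFAULT_PER_NODE_MB=$((10 * 1024))
PER_NODE_MB=$((MAPS_PER_HOST * MB_PER_MAP))
echo "default: ${DEFAULT_PER_NODE_MB} MB per node, this run: ${PER_NODE_MB} MB per node"

JAR=./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
# Hypothetical output folder; the job fails if the folder already exists.
OUT=/random-data-small

# Printed rather than executed, so the script is safe without a cluster.
CMD="hadoop jar $JAR randomwriter -D mapreduce.randomwriter.mapsperhost=$MAPS_PER_HOST -D mapreduce.randomwriter.bytespermap=$BYTES_PER_MAP $OUT"
echo "$CMD"
```

With these values each node would write about 512 MB instead of 10 GB, which is more practical on a small test machine.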
hadoop-examples.jar randomtextwriter /random-text-data
cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data
Creates 10 GB of text data per node under the /random-text-data folder on HDFS. This takes less than 1 minute on a WD Red 6TB hard disk.
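To confirm that the generated text data is really there, the folder can be inspected with the standard HDFS shell. This is a minimal sketch: the part file name part-m-00000 is an assumption based on the usual naming of map-only job output, and hdfs dfs -text is used instead of -cat because the generator is expected to write SequenceFiles by default. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a machine with a running HDFS.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

OUT=/random-text-data
# Assumed name of the first output file of the map-only job.
PART="$OUT/part-m-00000"

run hdfs dfs -du -s -h "$OUT"   # total size of the generated data
run hdfs dfs -ls "$OUT"         # one part file per map task

# Decode the first few records; -text understands SequenceFiles.
if [ "$DRY_RUN" = "1" ]; then
    echo "+ hdfs dfs -text $PART | head -n 5"
else
    hdfs dfs -text "$PART" | head -n 5
fi
```

The size reported by -du should be close to 10 GB multiplied by the number of nodes that ran the job.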
Amazon
Offers datasets from a variety of fields, such as:
- Astronomy
- Biology
- Chemistry
- Weather
- Economics
- Geography
- Mathematics
- etc.
See
https://aws.amazon.com/datasets/
https://aws.amazon.com/1000genomes/
https://aws.amazon.com/datasets/common-crawl-corpus/
Stack Overflow
See the answers at
http://stackoverflow.com/search?q=hadoop+dataset
University of Waikato
Many datasets are available for practising machine learning.
Quora
See the answers to the questions at
https://www.quora.com/search?q=hadoop+dataset