Hadoop: Sample Datasets for Testing Hadoop

From OnnoWiki
 

Latest revision as of 20:01, 9 November 2015

Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html

To practice Hadoop we need a large dataset (Big Data). We can generate one ourselves with hadoop-examples:


==hadoop-examples.jar randomwriter /random-data==

cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data

This generates 10 GB of data per node in the /random-data folder in HDFS. It took less than a minute on a WD Red 6TB hard disk.
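The 10 GB default can be tuned down for a quick smoke test. The property names below (`mapreduce.randomwriter.*`) are an assumption based on the Hadoop 2.x example sources; older releases used `test.randomwriter.*`, so check your version. The hadoop invocation itself is commented out because it needs a running cluster:

```shell
# Sketch: shrink the randomwriter output for a quick smoke test.
# Property names (mapreduce.randomwriter.*) are assumed from Hadoop 2.x;
# verify them against your installed version.
BYTES_PER_MAP=$((100 * 1024 * 1024))   # 100 MB written by each map task
MAPS_PER_HOST=2                        # map tasks per node
echo "expected bytes per node: $((BYTES_PER_MAP * MAPS_PER_HOST))"

# cd /usr/local/hadoop
# hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
#   randomwriter \
#   -D mapreduce.randomwriter.bytespermap=$BYTES_PER_MAP \
#   -D mapreduce.randomwriter.mapsperhost=$MAPS_PER_HOST \
#   /random-data-small
```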

==hadoop-examples.jar randomtextwriter /random-text-data==

cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data

This generates 10 GB of text data per node in the /random-text-data folder in HDFS. It took less than a minute on a WD Red 6TB hard disk.
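If no cluster is at hand yet, a rough local stand-in is to generate random text with standard Unix tools and load it into HDFS later; this is only an analogy, since the example jar writes its own key/value output format:

```shell
# Generate ~1 MB of random printable text locally (an analogy only;
# randomtextwriter writes its own key/value output format).
head -c $((1024 * 1024)) /dev/urandom | base64 > /tmp/random-text-sample.txt
wc -c < /tmp/random-text-sample.txt

# On a cluster, the file could then be loaded with:
# hdfs dfs -put /tmp/random-text-sample.txt /random-text-data/
```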


Alternatively, download a ready-made dataset from one of the following sources:


==grouplens.org==

Datasets can be downloaded from

http://grouplens.org/datasets/movielens/
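The MovieLens archives make a good first MapReduce exercise. In the ml-latest downloads, ratings.csv lines look like `userId,movieId,rating,timestamp` (the older 100k set is tab-separated); a reduce-style aggregation can be prototyped locally with awk on a tiny inline sample before porting it to Hadoop:

```shell
# Sketch: average rating over a tiny inline sample in the ml-latest
# CSV layout (userId,movieId,rating,timestamp).
cat > /tmp/ratings-sample.csv <<'EOF'
userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
2,10,4.0,835355493
EOF
# Skip the header, sum column 3, print the mean.
awk -F, 'NR > 1 { sum += $3; n++ } END { printf "%.2f\n", sum / n }' \
    /tmp/ratings-sample.csv
# prints 3.17
```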


==Amazon==

Amazon hosts datasets from many fields, such as:

* Astronomy
* Biology
* Chemistry
* Weather
* Economics
* Geography
* Mathematics
* etc.

See:

https://aws.amazon.com/datasets/
https://aws.amazon.com/1000genomes/
https://aws.amazon.com/datasets/common-crawl-corpus/

==Stack Overflow==

See the answers at

http://stackoverflow.com/search?q=hadoop+dataset

==University of Waikato, New Zealand==

See:

http://www.cs.waikato.ac.nz/ml/weka/datasets.html
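The Waikato (Weka) collections are distributed as ARFF files: a header of `@attribute` declarations followed by `@data` rows. A quick way to inspect a file's schema, shown here on a tiny inline sample:

```shell
# Sketch: an ARFF file is @attribute declarations followed by @data rows.
cat > /tmp/sample.arff <<'EOF'
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
EOF
# Print just the attribute (column) names from the header.
awk '/^@attribute/ { print $2 }' /tmp/sample.arff
# prints: outlook temperature play (one per line)
```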

==Quora==

See the answers to the question

https://www.quora.com/search?q=hadoop+dataset

==DataScienceCentral==

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

==References==

* http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html