Hadoop: Sample Datasets for Testing Hadoop
Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html
To practice with Hadoop we need large datasets (Big Data). We can generate them ourselves with the hadoop-examples jar:
==hadoop-examples.jar randomwriter /random-data==
 cd /usr/local/hadoop
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data
This writes 10 GB of random binary data per node into the /random-data folder in HDFS. It takes less than a minute on a WD Red 6TB hard disk.
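If 10 GB per node is more than needed, the volume can be capped through the job configuration. A minimal sketch, assuming the Hadoop 2.x property name mapreduce.randomwriter.totalbytes (older releases used test.randomwrite.total_bytes):

 # Cap the whole job at 1 GB instead of 10 GB per node
 # (property name assumed from the Hadoop 2.x example source).
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
     randomwriter -D mapreduce.randomwriter.totalbytes=1073741824 /random-data
 
 # Check how much data actually landed in HDFS.
 hdfs dfs -du -s -h /random-data
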
==hadoop-examples.jar randomtextwriter /random-text-data==
 cd /usr/local/hadoop
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data
This writes 10 GB of random text data per node under the /random-text-data folder in HDFS. It takes less than a minute on a WD Red 6TB hard disk.
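To inspect the result: the example writes SequenceFiles by default (as far as the 2.7 example source goes), so hdfs dfs -cat would show binary record headers, while hdfs dfs -text decodes them. The part-m-00000 file name is assumed from the usual naming of a map-only job:

 # List the files the job produced.
 hdfs dfs -ls /random-text-data
 
 # Peek at the random words; -text decodes the SequenceFile,
 # plain -cat would show binary record headers.
 hdfs dfs -text /random-text-data/part-m-00000 | head
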
Alternatively, you can download ready-made datasets from the following sources:
==grouplens.org==
The MovieLens datasets can be downloaded from
 http://grouplens.org/datasets/movielens/
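As a sketch of loading one of these into HDFS, the commands below fetch the small ml-100k archive (file URL assumed from the grouplens.org download page) and put its ratings file, u.data, into HDFS:

 # Download and unpack the 100k MovieLens dataset.
 wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
 unzip ml-100k.zip
 
 # Copy the ratings file into HDFS so MapReduce jobs can read it.
 hdfs dfs -mkdir -p /movielens
 hdfs dfs -put ml-100k/u.data /movielens/
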
==Amazon==
Amazon hosts public datasets from many fields, such as:
* Astronomy
* Biology
* Chemistry
* Weather
* Economics
* Geography
* Mathematics
* etc.
See
 https://aws.amazon.com/datasets/
 https://aws.amazon.com/1000genomes/
 https://aws.amazon.com/datasets/common-crawl-corpus/
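The Common Crawl corpus, for example, is also exposed as a public S3 bucket that can be browsed anonymously. A minimal sketch using the AWS CLI (the commoncrawl bucket name is taken from the Common Crawl documentation; the copy path is only a placeholder):

 # Browse the public Common Crawl bucket without AWS credentials.
 aws s3 ls s3://commoncrawl/crawl-data/ --no-sign-request
 
 # Copy a chosen archive locally, then load it into HDFS:
 #   aws s3 cp s3://commoncrawl/<path-to-file> . --no-sign-request
 #   hdfs dfs -put <file> /commoncrawl/
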
==Stackoverflow==
See the answers at
 http://stackoverflow.com/search?q=hadoop+dataset
==University of Waikato, New Zealand==
See
 http://www.cs.waikato.ac.nz/ml/weka/datasets.html
==Quora==
See the answers to questions at
 https://www.quora.com/search?q=hadoop+dataset
==DataScienceCentral==
 http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

==References==
* http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html