Source: http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html

To practice with Hadoop we need a large dataset (Big Data). We can generate one ourselves with the hadoop-examples jar, as shown in the next two sections,
==hadoop-examples.jar randomwriter /random-data==
 cd /usr/local/hadoop
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter /random-data
This creates 10 GB of random data per node in the /random-data folder in HDFS. It takes less than a minute on a WD Red 6TB hard disk.
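The amount of data written can be tuned through job properties. A minimal sketch, assuming the mapreduce.randomwriter.totalbytes property name used by the Hadoop 2.x examples jar (check the example's documentation if it differs); hdfs dfs -du then reports how much data actually landed:

 # Write roughly 1 GB in total instead of the 10 GB per-node default
 # (property name assumed from the Hadoop 2.x randomwriter example)
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter \
     -D mapreduce.randomwriter.totalbytes=1073741824 /random-data-small
 
 # Verify the size of the generated data in HDFS
 hdfs dfs -du -s -h /random-data-small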
==hadoop-examples.jar randomtextwriter /random-text-data==
 cd /usr/local/hadoop
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomtextwriter /random-text-data
This creates 10 GB of random text data per node under the /random-text-data folder in HDFS. It takes less than a minute on a WD Red 6TB hard disk.
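The generated text is immediately usable as input for other practice jobs. For example, a minimal sketch running the wordcount example from the same jar over the generated data; the /wordcount-out output path is arbitrary and must not already exist:

 cd /usr/local/hadoop
 # Count word frequencies in the generated text data
 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /random-text-data /wordcount-out
 
 # Inspect the first few results (part-r-00000 is the default single-reducer output file)
 hdfs dfs -cat /wordcount-out/part-r-00000 | head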
Alternatively, a ready-made dataset can be downloaded from one of the sources below:
==grouplens.org==

Datasets can be downloaded from
http://grouplens.org/datasets/movielens/
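A minimal sketch for loading one of the MovieLens datasets into HDFS; the ml-100k.zip archive name is an assumption, so check the page above for the exact file you want:

 # Download and unpack the 100k MovieLens dataset (file name assumed; see the page above)
 wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
 unzip ml-100k.zip
 
 # Copy the ratings file into HDFS for use as MapReduce input
 hdfs dfs -mkdir -p /movielens
 hdfs dfs -put ml-100k/u.data /movielens/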
==Amazon==

Amazon hosts datasets from many fields, such as:

* Astronomy
* Biology
* Chemistry
* Weather
* Economics
* Geography
* Mathematics
* etc.

See:

https://aws.amazon.com/datasets/
https://aws.amazon.com/1000genomes/
https://aws.amazon.com/datasets/common-crawl-corpus/
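Many of these AWS datasets live in public S3 buckets. A minimal sketch, assuming the AWS CLI is installed and that the Common Crawl corpus allows anonymous access from the public s3://commoncrawl bucket; the object path shown is illustrative, so pick a real one from the listing:

 # List the top level of the public Common Crawl bucket (no AWS account needed)
 aws s3 ls s3://commoncrawl/ --no-sign-request
 
 # Copy one archive locally, then push it into HDFS
 # (the object path is illustrative; choose one from the listing above)
 aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2015-40/wet.paths.gz . --no-sign-request
 hdfs dfs -mkdir -p /commoncrawl
 hdfs dfs -put wet.paths.gz /commoncrawl/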
==Stackoverflow==

Check the answers at
http://stackoverflow.com/search?q=hadoop+dataset
==University of Waikato, New Zealand==

See
http://www.cs.waikato.ac.nz/ml/weka/datasets.html
==Quora==

Check the answers to related questions at
https://www.quora.com/search?q=hadoop+dataset
==DataScienceCentral==

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

==References==

* http://www.hadooplessons.info/2013/06/data-sets-for-practicing-hadoop.html