Big Data: 8 Trend

Revision as of 06:10, 9 November 2015

Source: http://www.cio.com/article/2838172/data-analytics/8-big-trends-in-big-data-analytics.html


==Big data analytics in the cloud==

Hadoop, a framework and set of tools for processing very large data sets, was originally designed to run on clusters of physical machines. Now a growing number of technologies are available for processing data in the cloud. It is cheaper to scale out on virtual machines than to buy physical machines and manage them yourself.


==Hadoop: The new enterprise data operating system==

Distributed analytic frameworks, such as MapReduce, are evolving into distributed resource managers that are gradually turning Hadoop into a general-purpose data operating system. By loading data into Hadoop as a distributed file storage system, we can run all kinds of data manipulation and analytic operations on it.

What does this mean for an enterprise? Because SQL, MapReduce, in-memory, stream processing, graph analytics and other workloads can all run on Hadoop with adequate performance, more and more businesses will use Hadoop as an enterprise data hub.
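The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process toy (the documents and words are invented for illustration), not a distributed Hadoop job, but the map, shuffle and reduce phases are exactly the ones the framework spreads across a cluster:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data on hadoop", "hadoop runs mapreduce", "big clusters"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# In a real job, each phase could run on a different node of the cluster.
```

In a real cluster, the shuffle step is what moves data over the network; the map and reduce functions are the only parts the analyst writes.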

==Big data lakes==

Traditional database theory dictates that you design the data set before entering any data. A data lake turns this approach on its head. That is, we take all the available data sources and dump everything into one big Hadoop repository, without designing a data model beforehand.

We then give people tools to analyze the data, along with a high-level definition of what data exists in the lake. People build up their view of the data as they go, so it is a gradual, organic model for building a large-scale database. The downside is that the people doing this must be highly skilled.
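The schema-on-read idea behind a data lake can be illustrated with a short Python sketch. The "lake" here is just an in-memory list and all record fields are hypothetical, but the key point survives: ingestion imposes no model, and structure is applied only when someone reads:

```python
import json

# Schema-on-write (traditional): reject records that don't match a fixed design.
# Schema-on-read (data lake): store everything raw, interpret at query time.
raw_lake = []  # stand-in for a big Hadoop repository

def ingest(record):
    # Ingest: no model is imposed; any JSON-serializable record is accepted.
    raw_lake.append(json.dumps(record))

def read_as_sales(raw):
    # A view defined at read time: pull out only the fields this analysis needs.
    record = json.loads(raw)
    return {"product": record.get("product"), "amount": record.get("amount", 0)}

ingest({"product": "sensor", "amount": 3, "shelf": "A1"})
ingest({"clickstream": ["home", "cart"], "user": 42})  # different shape, still accepted

sales_view = [read_as_sales(r) for r in raw_lake]
total = sum(row["amount"] for row in sales_view)
```

Defining `read_as_sales` after the fact, rather than a table schema up front, is exactly the inversion the paragraph above describes; it also shows why skilled people are needed, since every reader must know how to interpret the raw records.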


==More predictive analytics==

With big data, analysts not only have more data to work with, but also the processing power to handle large numbers of records with many attributes each. Traditional machine learning used statistical analysis based on a sample of the total data set. We now have the ability to analyze very large numbers of records, with many attributes per record.
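The contrast between sampling and full-data analysis can be made concrete with synthetic numbers; the distribution and sizes below are arbitrary choices for illustration:

```python
import random

random.seed(0)
# A synthetic "full data set": one numeric attribute per record.
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]

# Traditional approach: estimate a statistic from a small sample.
sample = random.sample(population, 100)
sample_mean = sum(sample) / len(sample)

# Big-data approach: compute the statistic over every record.
full_mean = sum(population) / len(population)

# The sample estimate carries sampling error; the full computation does not.
sampling_error = abs(sample_mean - full_mean)
```

With enough processing power, the full computation replaces the estimate, which is the shift the paragraph above describes.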


==SQL on Hadoop: Faster, better==

If you are a skilled coder and mathematician, you can drop all your data into Hadoop and run any analysis on it. To make this easier for everyone else, SQL for Hadoop was developed. Apache Hive makes it possible to run SQL-like queries against data in Hadoop.
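Hive itself requires a Hadoop cluster, but the core convenience it offers, expressing an analysis as a SQL-like query instead of hand-written code, can be demonstrated with Python's built-in sqlite3 module; the GROUP BY query below would look essentially the same in HiveQL (the table and column names are invented):

```python
import sqlite3

# An in-memory SQL engine standing in for Hive over a Hadoop-backed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 120), ("cart", 45), ("home", 80), ("checkout", 12)],
)

# The same aggregation would be valid HiveQL; Hive compiles it into jobs
# that run across the cluster instead of hand-written MapReduce code.
rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY 2 DESC"
).fetchall()
top_page, top_views = rows[0]
```

The point is not the engine but the interface: an analyst writes one declarative query instead of a custom program.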

==More, better NoSQL==

Alternatives to traditional SQL-based relational databases, called NoSQL (short for “Not Only SQL”) databases, are rapidly gaining popularity as tools for use in specific kinds of analytic applications, and that momentum will continue to grow, says Curran. He estimates that there are 15 to 20 open-source NoSQL databases out there, each with its own specialization. For example, a NoSQL product with graph database capability, such as ArangoDB, offers a faster, more direct way to analyze the network of relationships between customers or salespeople than a relational database does. “These databases have been around for a while, but they’re picking up steam because of the kinds of analyses people need,” he says. One PwC client in an emerging market has placed sensors on store shelving to monitor what products are there, how long customers handle them and how long shoppers stand in front of particular shelves. “These sensors are spewing off streams of data that will grow exponentially,” Curran says. “A NoSQL key-value pair database such as Redis is the place to go for this because it’s special-purpose, high-performance and lightweight.”
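Redis requires a running server, so as a minimal sketch of the key-value model it implements, here is a toy in-process store in Python; the method names deliberately mirror Redis's SET/GET/RPUSH commands, and the shelf-sensor keys are hypothetical:

```python
class KeyValueStore:
    """A toy in-process key-value store mimicking the Redis data model."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # O(1) write by key: suited to high-rate sensor streams.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def rpush(self, key, value):
        # Append to a list value, as a shelf sensor would append readings.
        self._data.setdefault(key, []).append(value)

store = KeyValueStore()
store.rpush("shelf:A1:dwell_seconds", 12.5)
store.rpush("shelf:A1:dwell_seconds", 7.0)
store.set("shelf:A1:product", "cereal")

readings = store.get("shelf:A1:dwell_seconds")
```

The lack of a schema, joins, or a query planner is the point: each key maps directly to a value, which keeps writes cheap enough for exponentially growing streams.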

==Deep learning==

Deep learning, a set of machine-learning techniques based on neural networking, is still evolving but shows great potential for solving business problems, says Hopkins. “Deep learning . . . enables computers to recognize items of interest in large quantities of unstructured and binary data, and to deduce relationships without needing specific models or programming instructions,” he says.

In one example, a deep learning algorithm that examined data from Wikipedia learned on its own that California and Texas are both states in the U.S. “It doesn’t have to be modeled to understand the concept of a state and country, and that’s a big difference between older machine learning and emerging deep learning methods,” Hopkins says.

“Big data will do things with lots of diverse and unstructured text using advanced analytic techniques like deep learning to help in ways that we only now are beginning to understand,” Hopkins says. For example, it could be used to recognize many different kinds of data, such as the shapes, colors and objects in a video — or even the presence of a cat within images, as a neural network built by Google famously did in 2012. “This notion of cognitive engagement, advanced analytics and the things it implies . . . are an important future trend,” Hopkins says.
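A genuinely deep network is beyond a short example, but the core idea Hopkins describes, a model inducing a rule from examples rather than being explicitly programmed, can be shown with a single artificial neuron (a perceptron) learning the OR function; this is vastly simpler than deep learning and stands in only for the learning-from-data principle:

```python
# Training examples: inputs and the target output of logical OR.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = [0.0, 0.0]
bias = 0.0

def predict(x):
    # A single neuron: weighted sum followed by a threshold.
    activation = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if activation > 0 else 0

# Perceptron learning rule; OR is linearly separable, so this converges.
for _ in range(20):
    for x, target in examples:
        error = target - predict(x)
        weights[0] += error * x[0]
        weights[1] += error * x[1]
        bias += error

# Nowhere above is the OR rule written down -- it is induced from the data.
```

Deep networks stack many such units in layers, which is what lets them pick out states, shapes, or cats without an explicit model of any of them.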

==In-memory analytics==

The use of in-memory databases to speed up analytic processing is increasingly popular and highly beneficial in the right setting, says Beyer. In fact, many businesses are already leveraging hybrid transaction/analytical processing (HTAP) — allowing transactions and analytic processing to reside in the same in-memory database.
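As a minimal stand-in for the HTAP idea, assuming nothing beyond Python's standard library: SQLite opened with ":memory:" keeps all data in RAM, and the same store serves both the transactional writes and the analytic aggregation (a real HTAP product adds the concurrency, scale and durability this sketch ignores):

```python
import sqlite3

# SQLite with ":memory:" holds all data in RAM -- a toy model of an
# in-memory database serving both workload types at once.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Transactional side: individual writes as orders arrive.
with db:
    db.execute("INSERT INTO orders (amount) VALUES (?)", (19.99,))
    db.execute("INSERT INTO orders (amount) VALUES (?)", (5.00,))

# Analytical side: aggregation over the very same in-memory data,
# with no export step to a separate warehouse.
(total,) = db.execute("SELECT SUM(amount) FROM orders").fetchone()
```

The appeal, and the hype, come from skipping the copy into a separate analytic store; the caveats Beyer raises below are about when that copy was never the real bottleneck.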

But there’s a lot of hype around HTAP, and businesses have been overusing it, Beyer says. For systems where the user needs to see the same data in the same way many times during the day — and there’s no significant change in the data — in-memory is a waste of money.

And while you can perform analytics faster with HTAP, all of the transactions must reside within the same database. The problem, says Beyer, is that most analytics efforts today are about putting transactions from many different systems together. “Just putting it all on one database goes back to this disproven belief that if you want to use HTAP for all of your analytics, it requires all of your transactions to be in one place,” he says. “You still have to integrate diverse data.”

Moreover, bringing in an in-memory database means there’s another product to manage, secure, and figure out how to integrate and scale.

For Intuit, the use of Spark has taken away some of the urge to embrace in-memory databases. “If we can solve 70% of our use cases with Spark infrastructure and an in-memory system could solve 100%, we’ll go with the 70% in our analytic cloud,” Loconzolo says. “So we will prototype, see if it’s ready and pause on in-memory systems internally right now.”

==Staying one step ahead==

With so many emerging trends around big data and analytics, IT organizations need to create conditions that will allow analysts and data scientists to experiment. “You need a way to evaluate, prototype and eventually integrate some of these technologies into the business,” says Curran.

“IT managers and implementers cannot use lack of maturity as an excuse to halt experimentation,” says Beyer. Initially, only a few people — the most skilled analysts and data scientists — need to experiment. Then those advanced users and IT should jointly determine when to deliver new resources to the rest of the organization. And IT shouldn’t necessarily rein in analysts who want to move ahead full-throttle. Rather, Beyer says, IT needs to work with analysts to “put a variable-speed throttle on these new high-powered tools.”



==References==