Data Science Strategy: Sorting Out the Concept of Machine Learning
PART 1 Optimizing Your Data Science Investment
Sorting Out the Concept of Machine Learning
People often ask me to explain the difference between advanced analytics and machine learning and to say when it is advisable to go for one approach or the other. I always start out by defining machine learning. Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to perform the task.
So, here’s how advanced analytics and ML have some characteristics in common:
» Both advanced analytics and machine learning techniques are used for building and executing advanced mathematical and statistical models as well as building optimized models that can be used to predict events before they happen.
» Both methods use data to develop the models, and both require defined model policies.
» Automation can be used to run both analytics models and machine learning models after they’re put into production.
What about the differences between advanced analytics and machine learning?
» There is a difference in who the actor is when creating your model. In an advanced analytics model, the actor is human; in a machine learning model, the actor is (obviously) a machine.
» There is also a difference in the model format. Analytics models are developed and deployed with the human-defined design, whereas ML models are dynamic and change design and approach as they’re being trained by the data, optimizing the design along the way. Machine learning models can also be deployed as dynamic, which means that they continue to train, learn and optimize the design when exposed to real-life data and its live context.
» Another difference between analytical models and machine learning models lies in how data is used: to test the model (for analytics) or to train it (for machine learning). In analytics, data is used to test that the defined outcome is achieved as expected, while in machine learning, the data is used to train the model to optimize its design depending on the nature of the data.
» Finally, the techniques and tools used to develop advanced analytics models and ML models differ. Machine learning modeling techniques are much more advanced and are built on other principles related to how the machine will learn to optimize the model performance.
CHAPTER 1 Framing Data Science Strategy
Figure 1-6 shows how the different models can be developed, tested, or trained and then deployed. As you can see, analytics models are always developed and tested in a static manner, where the human actor decides which statistical methods to use and how to test the model using the defined sample data set in order to reach the optimal model performance. And, regardless of how much data (or which data) you push through an analytical model, it stays the same until the human actor decides to correct or evolve the model.
FIGURE 1-6: The difference in how development, training, and deployment are done for an analytical model versus a machine learning model.
In ML development, a human actor also decides which technique or method to use. Training methods in ML differ depending on which technique is used: you can use supervised learning, for example, or unsupervised learning, semi-supervised learning, reinforcement learning, or even deep learning, which is a more complex method. It’s even possible to combine two methods, like combining reinforcement learning with deep learning to form what is referred to as deep reinforcement learning.
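The contrast between a human-defined analytics model and a trained ML model can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a method taken from the text: the customer data, the hand-picked cutoff of 100, and the names make_threshold_model, accuracy, and train_threshold are all hypothetical. For brevity, the same small data set is used both to train and to score, whereas a real project would hold out separate test data.

```python
# Toy labeled data: (monthly_spend, is_high_value_customer). Values are invented.
DATA = [(20, 0), (35, 0), (50, 0), (65, 1), (80, 1), (120, 1)]


def make_threshold_model(cutoff):
    """Return a model that labels a customer 1 when spend >= cutoff."""
    def predict(spend):
        return 1 if spend >= cutoff else 0
    return predict


def accuracy(predict, data):
    """Share of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def train_threshold(data):
    """Machine actor: try every observed spend as a cutoff, keep the most accurate."""
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(make_threshold_model(t), data))


# Analytics approach: a human fixes the design (cutoff = 100, an assumption);
# the data is only used to TEST whether that design performs as expected.
analytics_model = make_threshold_model(100)
analytics_acc = accuracy(analytics_model, DATA)

# ML approach: the data is used to TRAIN the design (here, the cutoff itself).
learned_cutoff = train_threshold(DATA)
ml_model = make_threshold_model(learned_cutoff)
ml_acc = accuracy(ml_model, DATA)

print(f"analytics accuracy: {analytics_acc:.2f}")
print(f"learned cutoff: {learned_cutoff}, ML accuracy: {ml_acc:.2f}")
```

The analytics model keeps its cutoff no matter what data passes through it, while the trained model would pick a different cutoff if the training data changed.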
Instead of the static approach used in traditional model testing, with ML models you first train a model using a selected training data set that should represent the target environment where you intend to deploy the ML model. During the training, the model performance is tested to monitor the learning progress as well as measure the model accuracy. Within the scope of the chosen ML method, you then let the algorithm (machine actor) train itself on the training data set to reach the target that has been set. The machine then continues to train the ML model to evolve and find the best model performance for as long as you let it. The time will come when the model accuracy cannot be improved on using the training set. At that stage, you have to evaluate whether the model accuracy is good enough for deployment.
If you decide that a sufficient level of training has been reached by the machine actor, you need to decide how to deploy the model in the target environment (deploy to production, in other words). You have two options at this point. You can decide that the model is sufficiently trained to achieve its purpose and that you can deploy it as a static model, meaning that it will no longer learn and optimize performance based on data, regardless of what changes occur in the target environment. Or, you can decide to deploy the ML model into production as a dynamic model, meaning that it will continue to evolve and optimize its performance driven by the data and behaviors that populate the model in the production environment. This is sometimes also referred to as online training.
So, when should you go for which type of model and deployment approach? Well, it depends on many factors. As a guiding rule, you should never use ML if you can get the job done using an analytics approach. Why? For the same reason you don’t use a sledgehammer to drive a nail.
You might succeed, but you can just as easily destroy the nail and hurt yourself, losing time and money in the process.
Whether to choose a static or dynamic deployment depends on the business model and on whether the target environment is static (changes happen seldom and are usually minor) or dynamic (changes occur often and on a large scale). If you’re developing an algorithm to make online recommendations based on previous user behavior, for example, it’s necessary to deploy a dynamic ML model; otherwise, you cannot fulfill your objective. If, on the other hand, the purpose of the ML model is to let the machine find the optimal way to automate a set of complex tasks that you expect to stay the same over time, it is advisable to deploy the ML model as a static model in its target environment.
Be aware that implementing ML models in live environments requires more resources from you. Machine learning training is complex and requires a lot of processing capacity as well as more monitoring of the ML model. You need to make sure that the ML model continues to perform as expected and doesn’t degrade or deviate from its objective as part of its live training. Another aspect to consider is the need to ensure that the model can interact with other dynamic ML models in the target environment without the models disturbing each other’s purpose or acting in a way that leads to them canceling each other out. (What you’re doing here is often referred to as ensuring model interoperability.)
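The choice between static and dynamic deployment, and the monitoring that goes with it, can be illustrated with a toy online learner. This is a sketch under assumptions rather than a recommended implementation: the class OnlineMeanModel, its learning_rate and frozen parameters, the degraded check, and the invented shift in the target value from 10 to 20 are all hypothetical. The model does nothing more than keep a running estimate of a single number.

```python
class OnlineMeanModel:
    """Predicts one number: an exponentially weighted estimate of the target."""

    def __init__(self, learning_rate=0.5, frozen=False):
        self.estimate = 0.0
        self.learning_rate = learning_rate
        self.frozen = frozen  # True = static deployment, False = dynamic

    def predict(self):
        return self.estimate

    def observe(self, actual):
        """Return the prediction error; update the estimate unless frozen."""
        error = abs(actual - self.estimate)
        if not self.frozen:
            self.estimate += self.learning_rate * (actual - self.estimate)
        return error


def train(model, training_data):
    """Let the model learn from every training example."""
    for value in training_data:
        model.observe(value)
    return model


def degraded(errors, window=5, limit=1.0):
    """Monitoring check: is the mean of the most recent errors above the limit?"""
    recent = errors[-window:]
    return sum(recent) / len(recent) > limit


TRAINING = [10.0] * 20    # training environment: target stays at 10
PRODUCTION = [20.0] * 20  # production environment has shifted to 20

# Static deployment: train, then freeze before going to production.
static_model = train(OnlineMeanModel(), TRAINING)
static_model.frozen = True

# Dynamic deployment: train, then keep learning in production.
dynamic_model = train(OnlineMeanModel(), TRAINING)

static_errors = [static_model.observe(v) for v in PRODUCTION]
dynamic_errors = [dynamic_model.observe(v) for v in PRODUCTION]

print("static model degraded: ", degraded(static_errors))
print("dynamic model degraded:", degraded(dynamic_errors))
```

In this sketch the frozen model keeps predicting roughly 10 after the environment changes, while the dynamic model adapts. The same update mechanism would also let a dynamic model drift away from its objective if production data were misleading, which is why the monitoring check applies to both.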