Data Science Strategy: Between Machine Learning and Traditional Programming


Dealing with the Difference between Machine Learning and Traditional Software Programming

The software industry broadly agrees on what the actual difference is between traditional programming and machine learning. However, when it comes to how this difference should be handled, there's little agreement. Given this division, I want to take the time to explain what to consider when it comes to your implementation approaches, as well as how to deal with these differing viewpoints in both development and the production environment. But first, let me start you off by looking at what the argument's all about.

The traditional programming approach, shown in Figure 3-1, has you decide beforehand how the program being developed will solve a certain problem. The main target for the software developer is to build the requested functionality.

FIGURE 3-1: The traditional programming approach.

Based on the data and the program, the machine performs the analysis exactly the way you want, regardless of whether it's the best way to solve the problem. The assumption is that you-the-programmer (rather than the machine) know best how to solve the problem.

On the other hand, when it comes to machine learning development, the starting point is to empower the machine to find the best solution: you set the boundaries of which data to use and which outcome to achieve, and nothing more. (See Figure 3-2.) The assumption is that, given these conditions, the machine will find the best program to solve the problem.

FIGURE 3-2: A machine learning approach.

CHAPTER 3 Dealing with Difficult Challenges

So, what do these distinct approaches mean in terms of your development and production environments? One main aspect to consider is that traditional programming embraces a much stricter process. It's rule-based and follows predefined design principles.
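The contrast between the two approaches can be shown in miniature. The sketch below is only an illustration, not something taken from the figures: the overheating scenario, the function names, the 80.0 threshold, and the sample readings are all hypothetical. In the first half the programmer fixes the rule; in the second, the programmer supplies only labeled data and the desired outcome (fewest mistakes), and the machine searches for the rule.

```python
from typing import List, Tuple


# Traditional programming: the programmer decides the rule up front.
def is_overheating_rule(temperature_c: float) -> bool:
    """Hand-written rule; the 80.0 threshold is a hypothetical choice."""
    return temperature_c > 80.0


# Machine learning in miniature: the programmer supplies the data and the
# outcome to achieve, and the machine searches for the rule.
def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Return the cut-off that misclassifies the fewest labeled examples.

    Assumes at least two examples so that a candidate cut-off exists.
    """
    values = sorted(t for t, _ in examples)
    # Candidate cut-offs: midpoints between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold: float) -> int:
        # Count examples where "above the threshold" disagrees with the label.
        return sum((t > threshold) != label for t, label in examples)

    return min(candidates, key=errors)


# Hypothetical labeled readings: (temperature in Celsius, overheated?)
readings = [(60.0, False), (65.0, False), (70.0, False),
            (72.0, True), (78.0, True), (90.0, True)]

learned_threshold = learn_threshold(readings)


def is_overheating_learned(temperature_c: float) -> bool:
    """Rule found by the search above rather than written by hand."""
    return temperature_c > learned_threshold
```

With these made-up readings, the hand-written rule treats 75.0 as normal, while the learned rule flags it, because the data places the boundary lower than the programmer guessed. Real machine learning uses far richer models, but the division of labor is the same.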
The starting point for machine learning development, on the other hand, is much more explorative and open-ended. As you might have guessed, this has a significant impact on how the development environment needs to be set up.

Some companies tend to downplay the development environment setup and the impact it has on data science productivity. If you start from this vantage point, you may well conclude that you can use the same (or similar) infrastructure setup for both your traditional software development environment and your data science environment. Nothing could be further from the truth: taking such an approach means that you're setting up major barriers toward achieving your goal of useful artificial intelligence/machine learning output.

Traditional programming is much more restrictive when it comes to which programming languages to use and which principles to apply for what task. This, of course, affects how both the development and production environments need to be set up. Figure 3-3 gives you a graphical representation of how traditional programming happens.

FIGURE 3-3: The traditional programming flow.

As you can see on the left side of Figure 3-3, traditional software programming can happen separately from both the data and the development and test environment. It doesn't have to happen separately, but the fact is that it can be done in isolation (even on a laptop in a coffee shop) and then integrated with other code in the development and test environment. At this point, data can be added to the program in order to achieve the desired output.

PART 1 Optimizing Your Data Science Investment

Figure 3-3 also shows that the software program is deployed in a separate environment (into a software/hardware product or similar production environment, for example) outside the development and test environment.
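The "in isolation" point about traditional programming can be made concrete with a small sketch. Everything in it is hypothetical (the shipping-cost function, its rates, and the sample orders); it only illustrates that rule-based logic can be written and checked with hand-picked inputs before any real data is involved, and that the data is fed through the finished program later.

```python
def shipping_cost(weight_kg: float, express: bool = False) -> float:
    """Rule-based logic fixed by the programmer (the rates are hypothetical)."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    base = 5.0 + 1.5 * weight_kg
    return base * 2 if express else base


# Step 1: developed and checked in isolation with hand-picked inputs.
# No dataset is needed to know what the correct answers are.
assert shipping_cost(2.0) == 8.0
assert shipping_cost(2.0, express=True) == 16.0

# Step 2: later, in the development and test environment, data is
# passed through the finished program to produce the desired output.
orders = [
    {"weight_kg": 1.0, "express": False},
    {"weight_kg": 4.0, "express": True},
]
costs = [shipping_cost(o["weight_kg"], o["express"]) for o in orders]
```

The expected results in step 1 come from the specification, not from data, which is why this kind of work can happen on a laptop with no connection to a data pipeline.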
Turning once again to a machine learning approach, you need to recognize that the explorative and learning nature of machine learning development requires the setup that's available to your data scientists to be extremely flexible. Efficient data management, easy data access, and a variety of specialized machine learning tools must be readily available. Nobody walks into the process with predefined notions of exactly which machine learning technique to use, because all of that needs to be explored and because the best solution may become clear only after the process has started.

As Figure 3-4 shows, machine learning development cannot happen in isolation and without the data. It all starts and ends with the data in a machine learning development flow because the data itself is what trains the model for an optimized design. For this to work, you need a constant data flow, which means that you need a stable data pipeline, preferably a virtualized one that offers more infrastructure flexibility over time.

FIGURE 3-4: A machine learning flow.

For virtualized machine learning production environments that are not implemented on the edge (that is, not inside IoT devices such as a mobile phone, a car, a watch, a fridge, or other connected devices where an ML algorithm can run), try to keep your development and production environments close together or as part of the same infrastructure setup. This makes it easier to move between development and production, and the benefits include faster and more efficient feedback loops. You also gain a cost efficiency benefit when you don't need to duplicate the infrastructure, because both are built on the same data pipeline.
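The idea of building development and production on the same data pipeline can be sketched roughly as follows. This is a toy illustration under stated assumptions: the function names, the "drop missing readings" preparation rule, the sample numbers, and the use of a simple average as a stand-in for a trained model are all hypothetical. What it shows is one preparation step shared by training and production scoring, with production data flowing back for retraining.

```python
from statistics import mean
from typing import List, Optional, Sequence


def prepare(raw: Sequence[Optional[float]]) -> List[float]:
    """Shared pipeline step: drop missing readings (hypothetical rule)."""
    return [x for x in raw if x is not None]


def train(raw_history: Sequence[Optional[float]]) -> float:
    """Development: learn a baseline (the mean) from prepared data."""
    return mean(prepare(raw_history))


def score(model: float, raw_batch: Sequence[Optional[float]]) -> List[float]:
    """Production: deviation of each new reading from the learned baseline."""
    return [x - model for x in prepare(raw_batch)]


# Development: train on historical data that came through the pipeline.
history: List[Optional[float]] = [10.0, None, 12.0, 14.0]
initial_model = train(history)

# Production: new data passes through the very same preparation step.
new_batch: List[Optional[float]] = [20.0, None, 16.0]
deviations = score(initial_model, new_batch)

# Feedback loop: production data flows back into the shared store, and
# retraining reuses the same pipeline without duplicated infrastructure.
history.extend(new_batch)
retrained_model = train(history)
```

Because `prepare` is the only place where raw data is cleaned, a change to the preparation logic reaches both training and scoring at once, which is the kind of short feedback loop the text describes.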