Data Science Strategy: Definition & Scope
Defining and Scoping a Data Science Strategy
To understand the constituent parts of a data science strategy, as well as the strategy’s current and future significance, it’s worthwhile to look at some of the major components at a high level. I then address each of these parts in detail throughout this book.
Before that, however, I need to clarify the difference between a data science strategy and a data strategy. At a high level, a data science strategy refers to the strategy you define for the entire data science investment in your company. It includes areas such as overall data science objectives and strategic choices, regulatory strategies, data needs, competences and skill sets, data architecture, and how to measure the outcome. The data strategy, on the other hand, is a subset of the data science strategy and focuses on outlining the strategic direction directly related to the data. This includes areas such as data scope; data consent; legal, regulatory, and ethical considerations; data collection frequency; data storage retention periods; data management processes and principles; and, last but not least, data governance.
Both strategies are needed in order to succeed with your data science investment, and they should complement each other. The details of how to define the data strategy are covered in Part 2 of this book.
Objectives
If I ask about the objectives of a data science strategy, I’m asking whether clear company objectives have been set and agreed on for the investments made in data science. Are the objectives formulated in a way that makes it possible to execute on them and measure success against them? If not, the objectives need to be reformulated; this is a critically important starting point that must be completed properly in order to succeed down the line.
Data science is a new field that holds amazing opportunities for companies to drive a fundamental transformation, but it is complex and often not fully understood by top management. You should consider whether the executive team’s understanding of data science is sufficient to set the right targets or whether they need to be educated and then guided in setting their targets.
Whether you’re a manager or an employee in a small or large company, if you want your company to succeed with its data science investment, don’t sit and hope that the leadership of your company will understand what needs to be done. If you’re knowledgeable in the area, make your voice heard; if you aren’t, don’t hesitate to accept help from those who have experience in the field.
26 PART 1 Optimizing Your Data Science Investment
If you decide to bring in external experts to assist you in your data science strategizing, be sure to read up on the area yourself first so that you can judge the relevance of their recommendations for your business, the place where you are the expert.
Approach
Taking the right initial approach is a fundamental part of your data science strategy: it determines whether your company takes the appropriate implementation and transformation approach for the data science investment. For example, is the approach ambitious enough, or is it too ambitious considering the time estimates and the available competence? Is there a clear business strategy and expected value that the data science strategy can relate to? Taking the time to think through the approach is sure to pay off because, if you don’t know where you’re going, you’re unlikely to end up there.
Choices
The term choices here refers to the strategic choices necessary to drive the data science transformation forward. The strategy you create cannot be about doing everything. Deciding what not to do is just as important as deciding what to do.
Decisions can also be distributed over time: you can start with a particular business area or set of customers, learn from that experience, and then continue to include other areas or customers. The same reasoning applies to choosing which data categories or types to focus on early and which to address later, as the company matures and its capabilities expand.
Data
Defining a data strategy is a cornerstone of the data science strategy. It covers all aspects related to the data, starting with whether you understand the various types of data you need to access in order to achieve your business objectives. Is the data available? How will you approach data management and data storage? Have you set priorities for the data? Have you identified and set data quality targets?
Another important aspect of data relates to data governance and security. Data will be one of your most valuable assets going forward; how you treat it is fundamental to your company’s success.
CHAPTER 1 Framing Data Science Strategy 27
Legal
Understanding the legal implications of the data you need, in terms of access rights, ownership, and usage models, is vital. If you aren’t on top of this aspect early on, you might find that you cannot get hold of the data you need for your business without breaking the law or that, even if you can get hold of the data, you cannot use it in the way you need in order to fulfill your business objectives.
Laws and regulations related to data privacy stretch further than many people think, and they keep changing in order to protect people’s data integrity. This is good from a privacy perspective, but it doesn’t always work well with data innovation. Therefore, you should always stay informed about the laws and regulations related to the data needed for your business; doing so is a good investment.
Ethics
Ethics, an area of growing importance, refers to the creation of clear ethical guidelines for how data science is approached in the company. Internally, this means securing a responsible approach to data usage and management when it comes to preserving the data privacy of your customers or other stakeholders. One way of protecting privacy is by anonymizing personal information in the data sets.
Externally, insisting on the ethics of data science is vital when it comes to gaining your customers’ trust in how you handle data. When machine learning or artificial intelligence is introduced, especially when automation of decisions and preventive actions is involved, it touches on another ethical perspective: the “explainability” of algorithms. Explainability refers to the idea that it must be possible to explain a decision or action taken by a machine. Machine learning or artificial intelligence cannot become an automated, black-box execution by a machine. Humans must stay in control to secure the transparency of AI algorithms and ensure that ethical boundaries are kept.
Competence
Based on the objectives that are set, the choices that are made, and the approach that is chosen, you must ensure that you put the right competence in place to execute on your targets. Putting together an experienced and competent data science team is easier said than done. Why is that? Well, you really need three main categories of competencies, and the availability of experienced data scientists in the market is now very low, simply because few data scientists have sufficient experience and because demand for these types of competencies is very high.
You can’t get by with hiring only data scientists. Data engineers with a genuine understanding of the data in focus are fundamental. Without good data management, data scientists cannot perform their algorithmic magic. It’s as simple as that.
Finally, you need to secure domain expertise for the targeted area, whether it’s a broad business understanding or an exceptional operational understanding. It’s absolutely crucial to have the domain experts working closely with the data engineers and the data scientists to achieve productive data science teams in your organization.
Infrastructure
Infrastructure is all about understanding what is needed in terms of data architecture and applications in order to enable a productive and innovative environment for your data science teams. It includes considering both a development environment (a workspace where you innovate, develop, train, and test new capabilities) and a production environment (a runtime environment where you deploy and run your solutions).
Infrastructure covers everything from how you’ll set up your data collection and ingestion, anonymization, data storage, and data management to the application layer, with tools for analytics and for the ML/AI development and production environments.
It is impossible to identify and set up the perfect environment, especially because the technology in this area is evolving very fast. However, a vital part of the infrastructure setup is to avoid getting locked into a situation where you become entirely dependent on a certain infrastructure vendor (hardware, software, or cloud, for example). I don’t mean that you should go only for open source products, but I do mean that you have to think carefully about which building blocks you’re using and then make sure that they’re exchangeable in the long run, if needed.
Governance and security
Working actively with data governance and security makes sure you stay in control of data usage at all times. It is important not only for gaining your customers’ trust; in many cases, it is also a necessity for following the law. Keeping track of which data is collected, stored, and used for which use cases is a minimum requirement for most types of data.
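As an illustration of the tracking requirement described above, the following is a minimal sketch of a register that records which data is collected, where it is stored, and which use cases it is approved for. The class names, field names, and example values are all hypothetical and chosen only for illustration; a real setup would more likely rely on a data catalog or governance tool than on an in-memory structure. The sketch assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class DataUsageRecord:
    """One tracked data set (all field names are illustrative assumptions)."""
    dataset: str
    source: str    # where the data is collected from
    storage: str   # where the data is stored
    use_cases: set[str] = field(default_factory=set)  # approved use cases


class DataUsageRegistry:
    """In-memory register of which data is collected, stored, and used for what."""

    def __init__(self) -> None:
        self._records: dict[str, DataUsageRecord] = {}

    def register(self, record: DataUsageRecord) -> None:
        # Add or overwrite the record for this data set.
        self._records[record.dataset] = record

    def approve_use(self, dataset: str, use_case: str) -> None:
        # Raises KeyError if the data set was never registered.
        self._records[dataset].use_cases.add(use_case)

    def is_approved(self, dataset: str, use_case: str) -> bool:
        # Unknown data sets are treated as not approved.
        record = self._records.get(dataset)
        return record is not None and use_case in record.use_cases

    def datasets_for(self, use_case: str) -> list[str]:
        # All data sets approved for a given use case, in alphabetical order.
        return sorted(
            name for name, rec in self._records.items() if use_case in rec.use_cases
        )


# Hypothetical example data.
registry = DataUsageRegistry()
registry.register(
    DataUsageRecord("customer_orders", source="web shop", storage="data warehouse")
)
registry.approve_use("customer_orders", "demand_forecasting")

print(registry.is_approved("customer_orders", "demand_forecasting"))  # True
print(registry.is_approved("customer_orders", "ad_targeting"))        # False
```

Treating anything that is not explicitly registered as not approved keeps the register conservative toward unknown data, while approved data remains easy to look up for the teams that need it.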
Overworking the area of governance and security will have a negative impact on your data science productivity and innovation. A common mistake is to be overprotective with regard to data usage, keeping all data locked in to such a degree that nobody can access what they need in order to do their job. Therefore, you should approach the setup of data governance and security with a mindset of openness when it comes to sharing data among employees within the organization. Lock the gates to outsiders, but strive for an open-data approach internally, boosting collaboration, reuse, and innovation.
Commercial/business models
As part of your company’s data science strategy, you need to consider whether you want to focus your efforts only internally, as a means of improving operational efficiency, or whether you have ambitions to use data science to improve your commercial business models. Improving your business using data science will expand your possibilities, both by improving your current business and by helping you find new opportunities.
Tread carefully when commercializing data. If you haven’t transformed internally first by implementing data-driven operations, you’ll likely be unable to fully leverage a data science approach externally, from a business perspective. That doesn’t mean you need to implement and run data-driven operations throughout the company, but such operations will be needed for the areas connected to the new data-science-based business models and commercial offerings you’re aiming to realize.
Measurements
Without measuring your success, how will you ever know whether you have actually achieved your objectives, or be able to prove it? Still, many companies fail to think of measurements early on. Measurements are needed not only from an internal operational efficiency perspective but also to determine whether you have managed to deliver on the promises made to customers.
This is important regardless of whether the agreed-on customer targets have been contracted. It should always be a priority for you to know how your business is performing against your objectives. The feedback will help you determine where the business stands, what needs to improve, and what has perhaps already been achieved.
Establishing measurements early on is fundamental when it comes to securing continuous learning in your company, and it also shows customers that you care about reaching your targets. However, don’t forget to think through the metrics structure you plan to use. It isn’t an easy task to identify and define the correct set of metrics from the start. The set also needs to be reevaluated over time, based on which measurements actually give you the insights and feedback you need on what is going well and what isn’t.
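To make the idea of measuring performance against objectives more concrete, here is a minimal sketch of a metrics structure that compares actual values with targets. The metric names, target values, and the `higher_is_better` flag are hypothetical assumptions made for illustration; they are not taken from this book, and the right set of metrics depends on your own objectives.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """A single measurement with its target (names and values are illustrative)."""
    name: str
    target: float
    actual: float
    higher_is_better: bool = True  # set to False for metrics such as cost or lead time

    def achieved(self) -> bool:
        # A metric is achieved when the actual value meets or beats the target.
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


def summarize(metrics):
    """Split metric names into those on target and those needing improvement."""
    return {
        "on_target": [m.name for m in metrics if m.achieved()],
        "needs_improvement": [m.name for m in metrics if not m.achieved()],
    }


# Hypothetical example values.
metrics = [
    Metric("forecast_accuracy", target=0.90, actual=0.93),
    Metric("report_turnaround_days", target=2.0, actual=3.5, higher_is_better=False),
]

print(summarize(metrics))
# {'on_target': ['forecast_accuracy'], 'needs_improvement': ['report_turnaround_days']}
```

Because each metric carries its own target and direction, the set can be revised over time by adding, removing, or retargeting entries without changing the summary logic.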