Data Science Strategy: Acquiring Data
Getting Data from There to Here
When a company decides to embark on a journey to become data driven, the focus is naturally on the data itself, which inevitably leads to a greater awareness of the actual variety of data needed to gain full proactive and data-driven control of their current business. On top of that, companies soon realize that in order to expand beyond what is possible today, the data sets need to become even more varied. At this point, many companies start to realize that the data which is fundamental to becoming truly data driven might actually belong to someone else or is located in another country, with other data regulations. This section explains how to strategically approach such practical challenges as part of your data acquisition.
Handling dependencies on data owned by others
Dealing with proprietary data is an unavoidable yet manageable challenge faced by any company striving toward becoming fully data-driven. Typically, what happens is that you have identified and carefully specified all the data you need in your data strategy and when you then start looking into how to strategically approach capturing the data, you realize that you have a data ownership problem. If you use only data generated from your internal IT environment, you have, of course, less of a problem. If that’s the case, however, then your company probably isn’t truly data-driven in the proper sense. A data-driven business accounts for how its products and/or services are used and how it performs in real-life settings, not merely in the lab environment. And anytime you start using data generated by life in the real world, you run into the data ownership problem.
What kind of data am I talking about? First and foremost, this involves data owned by your customers, but it can also include data owned by your customers’ customers, depending on which business you’re in.
You have to take the time to truly understand the detailed context of the data you need. It can relate to issues of data privacy, but it doesn’t have to. It can simply be the case that the data you need in order to better understand your business performance or potential belongs to someone else.
Don’t get discouraged when it comes to ownership issues. Most situations can be solved from a legal perspective if you’re willing to address them openly with the data owners, explaining why you need the data and how you will treat the data after it’s in your possession. It’s all about gaining trust with regard to how, and for what purpose, the data will be used. (It wouldn’t hurt to also spell out how your work may, if possible, contribute back to the owners of the data.)
At the end of the day, you need to be absolutely certain that you understand (and are complying with) the legal constraints that apply to each type of data you intend to use. Your use of the data must also be regulated by way of a contractual setup with the party owning the data, including what rights your company has related to data access, storage, and usage over time.
Laws and regulations have a habit of changing over time. Lately, the trend is to increase restrictions even further in order to protect an individual’s right to their own data. One recent example is the quite restrictive General Data Protection Regulation (GDPR) enacted by the European Union (EU), which went into effect in May 2018. Given recent news of the misuse of data by entities such as Cambridge Analytica and Facebook, the U.S. and Canada are definitely looking into legislation similar to the EU’s GDPR. Anything that helps to protect an individual’s right to privacy is all for the best, but just remember that the way you deal with privacy legislation today will most probably be quite different in the near future.
Therefore, you should strategically and proactively think through your infrastructure setup and your data needs to ensure that you account for these types of constraints in your current and evolving data science environment.