Difference between revisions of "Data science"

From OnnoWiki
Line 13: Line 13:
  
  
Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.<ref name="Hayashi" /> It employs techniques and theories drawn from many fields within the context of [[mathematics]], [[statistics]], [[information science]], and [[computer science]].
+
Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of [[matematika|mathematics]], [[statistik|statistics]], [[information science]], and [[ilmu komputer|computer science]].
  
[[Turing award]] winner [[Jim Gray (computer scientist)|Jim Gray]] imagined data science as a "fourth paradigm" of science ([[Empirical research|empirical]], [[Basic research|theoretical]], [[computational science|computational]] and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the [[information explosion|data deluge]].<ref name="TansleyTolle2009">{{cite book|author1=Stewart Tansley|author2=Kristin Michele Tolle|title=The Fourth Paradigm: Data-intensive Scientific Discovery|url=https://books.google.com/books?id=oGs_AQAAIAAJ|year=2009|publisher=Microsoft Research|isbn=978-0-9825442-0-4}}</ref><ref name="BellHey2009">{{cite journal|last1=Bell|first1=G.|last2=Hey|first2=T.|last3=Szalay|first3=A.|title=COMPUTER SCIENCE: Beyond the Data Deluge|journal=Science|volume=323|issue=5919|year=2009|pages=1297–1298|issn=0036-8075|doi=10.1126/science.1170411}}</ref>
+
[[Turing award]] winner [[Jim Gray (computer scientist)|Jim Gray]] imagined data science as a "fourth paradigm" of science ([[Empirical research|empirical]], [[Basic research|theoretical]], [[computational science|computational]] and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the [[information explosion|data deluge]].
  
In 2012, when [[Harvard Business Review]] called it "The Sexiest Job of the 21st Century",<ref name="Harvard" /> the term "data science" became a [[buzzword]].  It is now often used interchangeably with earlier concepts like [[business analytics]],<ref name="GilPress" /> [[business intelligence]], [[Predictive modelling|predictive modeling]], and [[statistics]].  Even the suggestion that data science is sexy was paraphrasing [[Hans Rosling]], featured in a [https://www.bbc.co.uk/programmes/b00wgq0l 2011 BBC documentary] with the quote, "Statistics is now the sexiest subject around."<ref>{{cite news|url=https://www.nytimes.com/2011/04/03/business/03stream.html|title=When the Data Struts Its Stuff|first=Natasha|last=Singer|date=2011-04-02|access-date=2018-09-01|language=en-US}}</ref> [[Nate Silver]] referred to data science as a sexed up term for statistics.<ref name="NateSilver" /><nowiki>   In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute[d] beyond usefulness."</nowiki><ref>{{Cite news|url=http://radar.oreilly.com/2011/05/data-science-terminology.html|title=Why the term "data science" is flawed but useful|last=Warden|first=Pete|date=2011-05-09|work=O'Reilly Radar|access-date=2018-05-20|language=en-US}}</ref> While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents.<ref name="GilPress" /> To its discredit, however, many data-science and [[big data|big-data]] projects fail to deliver useful results, often as a result of poor management and utilization of resources.<ref>{{Cite news|url=https://hbr.org/2018/01/are-you-setting-your-data-scientists-up-to-fail|title=Are You Setting Your Data Scientists Up to Fail?|date=2018-01-25|work=Harvard Business Review|access-date=2018-05-26}}</ref><ref>{{Cite 
web|url=https://www.consultancy.uk/news/16839/70-of-big-data-projects-in-uk-fail-to-realise-full-potential|title=70% of Big Data projects in UK fail to realise full potential|website=www.consultancy.uk|language=en|access-date=2018-05-26}}</ref><ref>{{Cite news|url=http://analytics-magazine.org/the-data-economy-why-do-so-many-analytics-projects-fail/|title=The Data Economy: Why do so many analytics projects fail? – Analytics Magazine|date=2014-07-07|work=Analytics Magazine|access-date=2018-05-26|language=en-US}}</ref><ref>{{Cite web|url=https://www.kdnuggets.com/2018/05/data-science-4-reasons-failing-deliver.html|title=Data Science: 4 Reasons Why Most Are Failing to Deliver|website=www.kdnuggets.com|language=en-US|access-date=2018-05-26}}</ref>
+
In 2012, when [[Harvard Business Review]] called it "The Sexiest Job of the 21st Century", the term "data science" became a [[buzzword]]. It is now often used interchangeably with earlier concepts like [[business analytics]], [[business intelligence]], [[Predictive modelling|predictive modeling]], and [[statistics]]. Even the suggestion that data science is sexy was paraphrasing [[Hans Rosling]], featured in a [https://www.bbc.co.uk/programmes/b00wgq0l 2011 BBC documentary] with the quote, "Statistics is now the sexiest subject around." [[Nate Silver]] referred to data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute[d] beyond usefulness."
 +
 
 +
While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents. To its discredit, however, many data-science and [[big data|big-data]] projects fail to deliver useful results, often as a result of poor management and utilization of resources.
  
 
==History==
 
The term "data science" has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, it was used as a substitute for [[computer science]] by [[Peter Naur]] in 1960. Naur later introduced the term "[[datalogy]]".<ref>{{cite journal|last1=Naur|first1=Peter|title=The science of datalogy|journal=Communications of the ACM|date=1 July 1966|volume=9|issue=7|pages=485|doi=10.1145/365719.366510}}</ref> In 1974, Naur published ''Concise Survey of Computer Methods'', which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.
 
  
In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. Here, for the first time, the term data science is included in the title of the conference ("Data Science, classification, and related methods"),<ref>{{cite web|url=https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/|title=A Very Short History Of Data Science|first=Gil|last=Press}}</ref> after the term was introduced in a roundtable discussion by Chikio Hayashi.<ref name="Hayashi">{{Cite book|chapter-url=https://link.springer.com/chapter/10.1007/978-4-431-65950-1_3|url=https://www.springer.com/book/9784431702085|title=Data Science, Classification, and Related Methods|last=Hayashi|first=Chikio|date=1998-01-01|publisher=Springer Japan|isbn=9784431702085|editor-last=Hayashi|editor-first=Chikio|series=Studies in Classification, Data Analysis, and Knowledge Organization|location=|pages=40–51|language=en|chapter=What is Data Science? Fundamental Concepts and a Heuristic Example|doi=10.1007/978-4-431-65950-1_3|editor-last2=Yajima|editor-first2=Keiji|editor-last3=Bock|editor-first3=Hans-Hermann|editor-last4=Ohsumi|editor-first4=Noboru|editor-last5=Tanaka|editor-first5=Yutaka|editor-last6=Baba|editor-first6=Yasumasa}}</ref>
+
The term "data science" has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, it was used as a substitute for [[computer science]] by [[Peter Naur]] in 1960. Naur later introduced the term "[[datalogy]]". In 1974, Naur published ''Concise Survey of Computer Methods'', which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.
 +
 
 +
In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. Here, for the first time, the term "data science" was included in the title of the conference ("Data Science, classification, and related methods"), after the term was introduced in a roundtable discussion by Chikio Hayashi.
 +
 
 +
In November 1997, [[C.F. Jeff Wu]] gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the [[University of Michigan]].
  
In November 1997, [[C.F. Jeff Wu]] gave the inaugural lecture entitled "Statistics = Data Science?"<ref name="cfjwutk">{{cite web|last=Wu|first=C. F. J. (1997)|title=Statistics = Data Science?|url=http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf|accessdate=9 October 2014}}</ref> for his appointment to the H. C. Carver Professorship at the [[University of Michigan]].<ref name="cfjwu01">{{cite web|title=Identity of statistics in science examined|publisher=The University Records, 9 November 1997, The University of Michigan|url=http://ur.umich.edu/9899/Nov09_98/4.htm|accessdate=12 August 2013}}</ref>
 
 
In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In his conclusion,
 
he initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians data scientists.<ref name="cfjwutk"/>
+
he initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians data scientists.
Later, he presented his lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial Lectures.<ref name="cfjwu02">{{cite web|url=http://www.isical.ac.in/~statmath/html/pcm/pcm_recent.html|title=P.C. Mahalanobis Memorial Lectures, 7th series|last=|first=|date=|website=|publisher=P.C. Mahalanobis Memorial Lectures, Indian Statistical Institute|archive-url=https://web.archive.org/web/20131029191813/http://www.isical.ac.in/~statmath/html/pcm/pcm_recent.html|archive-date=29 October 2013|dead-url=|accessdate=18 Jul 2017}}</ref> These lectures honor [[Prasanta Chandra Mahalanobis]], an Indian scientist and statistician and founder of the [[Indian Statistical Institute]].
+
Later, he presented his lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor [[Prasanta Chandra Mahalanobis]], an Indian scientist and statistician and founder of the [[Indian Statistical Institute]].
  
In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Volume 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique.<ref name="cleveland01">Cleveland, W. S. (2001). [https://pdfs.semanticscholar.org/915c/d8e2b39eb02723553913d592b2237d4d9960.pdf Data science: an action plan for expanding the technical areas of the field of statistics]. International Statistical Review / Revue Internationale de Statistique, 21–26</ref> In his report, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
+
In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Volume 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
  
 
In April 2002, the International Council for Science (ICSU): Committee on Data for Science and Technology (CODATA)<ref name="ics12">International Council for Science: Committee on Data for Science and Technology. (2012, April). CODATA, The Committee on Data for Science and Technology. Retrieved from International Council for Science : Committee on Data for Science and Technology: http://www.codata.org/</ref> started the ''Data Science Journal'',<ref name="dsj12">Data Science Journal. (2012, April). Available Volumes. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/_vols {{Webarchive|url=https://web.archive.org/web/20120403153707/http://www.jstage.jst.go.jp/browse/dsj/_vols |date=3 April 2012 }}</ref> a publication focused on issues such as the description of data systems, their publication on the internet, applications and legal issues.<ref name="dsj02">Data Science Journal. (2002, April). Contents of Volume 1, Issue 1, April 2002. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/1/0/_contents</ref> Shortly thereafter, in January 2003, Columbia University began publishing ''The Journal of Data Science'',<ref name="jds03">The Journal of Data Science. (2003, January). Contents of Volume 1, Issue 1, January 2003. Retrieved from http://www.jds-online.com/v1-1</ref> which provided a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. 
In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" whose primary activity is to "conduct creative inquiry and analysis."<ref>{{cite web|last=National Science Board|title=Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century|url=http://www.nsf.gov/pubs/2005/nsb0540/|publisher=National Science Foundation|accessdate=30 June 2013}}</ref>
 
Line 47: Line 50:
 
In an effort similar to Dhar's, Stanford professor [[David Donoho]], in September 2015, takes the proposition further by rejecting three simplistic and misleading definitions of data science in light of criticisms.<ref name=":2">{{Cite journal|last=Donoho|first=David|date=September 2015|title=50 Years of Data Science|url=http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf|journal=Based on a talk at Tukey Centennial workshop, Princeton NJ Sept 18 2015|volume=|pages=|via=}}</ref> First, for Donoho, data science does not equate to [[big data]], in that the size of the data set is not a criterion for distinguishing data science from statistics.<ref name=":2" /> Second, data science is not defined by the [[computing]] skills of sorting big data sets, in that these skills are already generally used for analyses across all disciplines.<ref name=":2" /> Third, data science is a heavily applied field where [[Graduate school|academic programs]] right now do not sufficiently prepare data scientists for the jobs, in that many [[Graduate school|graduate programs]] misleadingly advertise their analytics and statistics training as the essence of a data science program.<ref name=":2" /><ref>{{Cite book|title=The Culture of Big Data|last=Barlow|first=Mike|publisher=O'Reilly Media, Inc.|year=2013|isbn=|location=|pages=}}</ref> As a [[statistician]], [[David Donoho|Donoho]], following many in his field, champions the broadening of learning scope in the form of data science,<ref name=":2" /> like John Chambers, who urges statisticians to adopt an inclusive concept of learning from data,<ref>{{Cite journal|last=Chambers|first=John M.|date=1993-12-01|title=Greater or lesser statistics: a choice for future research|url=https://link.springer.com/article/10.1007/BF00141776|journal=Statistics and Computing|language=en|volume=3|issue=4|pages=182–184|doi=10.1007/BF00141776|issn=0960-3174}}</ref> or like William Cleveland, who urges prioritizing the extraction of applicable [[Predictive modelling|predictive tools]] from data over [[Explanatory model|explanatory theories]].<ref name="cleveland01" /> Together, these [[statistician]]s envision an increasingly inclusive applied field that grows out of traditional [[statistics]] and beyond.
 
  
For the future of data science, Donoho projects an ever-growing environment for [[open science]] where data sets used for [[Academic publishing|academic publications]] are accessible to all researchers.<ref name=":2" /> [[National Institutes of Health|US National Institute of Health]] has already announced plans to enhance reproducibility and transparency of research data.<ref>{{Cite journal|last=Collins|first=Francis S.|last2=Tabak|first2=Lawrence A.|date=2014-01-30|title=NIH plans to enhance reproducibility|journal=Nature|volume=505|issue=7485|pages=612–613|issn=0028-0836|pmc=4058759|pmid=24482835|doi=10.1038/505612a}}</ref> Other big [[Academic journal|journals]] are likewise following suit.<ref>{{Cite journal|last=McNutt|first=Marcia|date=2014-01-17|title=Reproducibility|url=http://science.sciencemag.org/content/343/6168/229|journal=Science|language=en|volume=343|issue=6168|pages=229–229|doi=10.1126/science.1250475|issn=0036-8075|pmid=24436391}}</ref><ref>{{Cite journal|last=Peng|first=Roger D.|date=2009-07-01|title=Reproducible research and Biostatistics|url=https://academic.oup.com/biostatistics/article/10/3/405/293660|journal=Biostatistics|language=en|volume=10|issue=3|pages=405–408|doi=10.1093/biostatistics/kxp014|issn=1465-4644}}</ref> This way, the future of data science not only exceeds the boundary of [[Statistical theory|statistical theories]] in scale and methodology, but data science will revolutionize current academia and [[Paradigm|research paradigms]].<ref name=":2" /> As Donoho concludes, "the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available."<ref name=":2" />
+
For the future of data science, Donoho projects an ever-growing environment for [[open science]] where data sets used for [[Academic publishing|academic publications]] are accessible to all researchers. The [[National Institutes of Health|US National Institutes of Health]] has already announced plans to enhance reproducibility and transparency of research data.
 +
 
 +
Other big [[Academic journal|journals]] are likewise following suit. This way, the future of data science will not only exceed the boundary of [[Statistical theory|statistical theories]] in scale and methodology but also revolutionize current academia and [[Paradigm|research paradigms]]. As Donoho concludes, "the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available."
  
 
==See also==
 

Revision as of 12:40, 3 December 2019


Data-science-from-zero.jpg





Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.


Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.

Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

In 2012, when Harvard Business Review called it "The Sexiest Job of the 21st Century", the term "data science" became a buzzword. It is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics. Even the suggestion that data science is sexy was paraphrasing Hans Rosling, featured in a 2011 BBC documentary with the quote, "Statistics is now the sexiest subject around." Nate Silver referred to data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute[d] beyond usefulness."

While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents. To its discredit, however, many data-science and big-data projects fail to deliver useful results, often as a result of poor management and utilization of resources.

History

The term "data science" has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, it was used as a substitute for computer science by Peter Naur in 1960. Naur later introduced the term "datalogy". In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.

In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. Here, for the first time, the term "data science" was included in the title of the conference ("Data Science, classification, and related methods"), after the term was introduced in a roundtable discussion by Chikio Hayashi.

In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan.

In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In his conclusion, he initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians data scientists. Later, he presented his lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.

In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Volume 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.

In April 2002, the International Council for Science (ICSU): Committee on Data for Science and Technology (CODATA)<ref name="ics12">International Council for Science: Committee on Data for Science and Technology. (2012, April). CODATA, The Committee on Data for Science and Technology. Retrieved from International Council for Science : Committee on Data for Science and Technology: http://www.codata.org/</ref> started the Data Science Journal,<ref name="dsj12">Data Science Journal. (2012, April). Available Volumes. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/_vols Template:Webarchive</ref> a publication focused on issues such as the description of data systems, their publication on the internet, applications and legal issues.<ref name="dsj02">Data Science Journal. (2002, April). Contents of Volume 1, Issue 1, April 2002. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/1/0/_contents</ref> Shortly thereafter, in January 2003, Columbia University began publishing The Journal of Data Science,<ref name="jds03">The Journal of Data Science. (2003, January). Contents of Volume 1, Issue 1, January 2003. Retrieved from http://www.jds-online.com/v1-1</ref> which provided a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. 
In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" whose primary activity is to "conduct creative inquiry and analysis."<ref>Template:Cite web</ref>

Around 2007, Turing award winner Jim Gray envisioned "data-driven science" as a "fourth paradigm" of science that uses the computational analysis of large data as the primary scientific method<ref name="TansleyTolle2009" /><ref name="BellHey2009" /> and sought "to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other."<ref>Template:Cite news</ref>

In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century",<ref name="Harvard">Template:Citation</ref> DJ Patil claims to have coined this term in 2008 with Jeff Hammerbacher to define their jobs at LinkedIn and Facebook, respectively. He asserts that a data scientist is "a new breed", and that a "shortage of data scientists is becoming a serious constraint in some sectors", but describes a much more business-oriented role.

In 2013, the IEEE Task Force on Data Science and Advanced Analytics<ref>Template:Cite web</ref> was launched. In 2013, the first "European Conference on Data Analysis (ECDA)" was organised in Luxembourg, establishing the European Association for Data Science (EuADS). The first international conference, the IEEE International Conference on Data Science and Advanced Analytics, was launched in 2014.<ref>Template:Cite web</ref> In 2014, General Assembly launched a student-paid bootcamp and The Data Incubator launched a competitive free data science fellowship.<ref>Template:Cite news</ref> In 2014, the American Statistical Association section on Statistical Learning and Data Mining renamed its journal to "Statistical Analysis and Data Mining: The ASA Data Science Journal" and in 2016 changed its section name to "Statistical Learning and Data Science".<ref name="ASA">Template:Cite web</ref> In 2015, the International Journal on Data Science and Analytics<ref>Template:Cite web</ref> was launched by Springer to publish original work on data science and big data analytics. In September 2015, the Gesellschaft für Klassifikation (GfKl) added "Data Science Society" to the name of the Society at the third ECDA conference at the University of Essex, Colchester, UK.

Relationship to statistics

The popularity of the term "data science" has exploded in business environments and academia, as indicated by a jump in job openings.<ref>Template:Cite news</ref> However, many critical academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs.<ref name="GilPress">Template:Cite web</ref> In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”<ref name="NateSilver">Template:Cite web</ref> Similarly, in the business sector, multiple researchers and analysts state that data scientists alone are far from sufficient to give companies a real competitive advantage<ref>Template:Cite journal</ref> and regard data scientists as only one of the four broad job families that companies need to leverage big data effectively: data analysts, data scientists, big data developers and big data engineers.<ref>Template:Cite journal</ref>

On the other hand, responses to the criticism are just as numerous. In a 2014 Wall Street Journal article, Irving Wladawsky-Berger compares the enthusiasm for data science with the dawn of computer science. He argues that data science, like any other interdisciplinary field, draws on methodologies and practices from across academia and industry, but will then morph them into a new discipline. He calls attention to the sharp criticism that computer science, now a well-respected academic discipline, once had to face.<ref name=":1">Template:Cite news</ref> Likewise, NYU Stern's Vasant Dhar, like many other academic proponents of data science,<ref name=":1" /> argued more specifically in December 2013 that data science differs from the existing practice of data analysis across all disciplines, which focuses only on explaining data sets. Data science seeks actionable and consistent patterns for predictive uses.<ref name=":0" /> This practical engineering goal takes data science beyond traditional analytics. Data in disciplines and applied fields that lack solid theories, such as health science and social science, can now be collected and used to build powerful predictive models.<ref name=":0" />

In an effort similar to Dhar's, Stanford professor David Donoho, in September 2015, took the proposition further by rejecting three simplistic and misleading definitions of data science in light of the criticisms.<ref name=":2">Template:Cite journal</ref> First, for Donoho, data science does not equate to big data, in that the size of the data set is not a criterion for distinguishing data science from statistics.<ref name=":2" /> Second, data science is not defined by the computing skills needed to sort big data sets, in that these skills are already widely used for analyses across all disciplines.<ref name=":2" /> Third, data science is a heavily applied field where academic programs do not yet sufficiently prepare data scientists for their jobs, in that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program.<ref name=":2" /><ref>Template:Cite book</ref> As a statistician, Donoho, following many in his field, champions broadening the scope of learning in the form of data science,<ref name=":2" /> like John Chambers, who urges statisticians to adopt an inclusive concept of learning from data,<ref>Template:Cite journal</ref> or William Cleveland, who urges prioritizing the extraction of applicable predictive tools from data over explanatory theories.<ref name="cleveland01" /> Together, these statisticians envision an increasingly inclusive applied field that grows out of traditional statistics and beyond.

For the future of data science, Donoho projects an ever-growing environment for open science in which the data sets used for academic publications are accessible to all researchers. The US National Institutes of Health has already announced plans to enhance the reproducibility and transparency of research data.

Major journals are likewise following suit. In this way, data science will not only exceed the boundaries of statistical theory in scale and methodology, but will also revolutionize current academic and research paradigms. As Donoho concludes, "the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available."
