Data science

From OnnoWiki


Data Science WITHOUT PROGRAMMING
Statistician vs Data Scientist

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. In this respect it is similar to data mining.


Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.

Turing Award winner Jim Gray envisioned data science as the "fourth paradigm" of science, following the first three:

  1. empirical
  2. theoretical
  3. computational
  4. data-driven

He added that "everything about science is changing because of the impact of information technology" and the data deluge.

In 2012, when Harvard Business Review called it "The Sexiest Job of the 21st Century", the term "data science" became a buzzword. Data science is often used interchangeably with earlier concepts such as business analytics, business intelligence, predictive modeling, and statistics. Some have even called the field sexy: Hans Rosling, featured in a 2011 BBC documentary, said, "Statistics is now the sexiest subject around." Nate Silver called data science a sexed-up term for statistics. In many cases, earlier approaches were simply rebranded as "data science" to make them more attractive, which eventually caused the term to become "dilute[d] beyond usefulness."

Today many university programs offer degrees in data science, even though there is still no consensus on its definition or on an appropriate curriculum. On the critical side, many data-science and big-data projects unfortunately fail to deliver useful results, often because of poor management and poor use of resources.


History

History of Data Science

The term "data science" has appeared in various contexts over the past thirty years, but it did not become an established term until recently. In its earliest usage, Peter Naur used it in 1960 as a substitute for "computer science". Naur later proposed the term "datalogy". In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods used in a wide range of applications.

In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. There, for the first time, the term data science was included in the conference title ("Data Science, classification, and related methods"), after Chikio Hayashi had introduced the term in a roundtable discussion.

In November 1997, C.F. Jeff Wu gave an inaugural lecture titled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan.

In this lecture, he characterized statistical work as a trilogy of:

  • data collection
  • data modeling & analysis
  • decision making

In his conclusion, he proposed using the term "data science" for a modern, non-computer-science purpose, and advocated renaming statistics as data science and statisticians as data scientists.

He subsequently presented the lecture "Statistics = Data Science?" as the first of the 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and the founder of the Indian Statistical Institute.

In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data", in his paper "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," published in Volume 69, No. 1 (April 2001) of the International Statistical Review / Revue Internationale de Statistique. In the paper, Cleveland identified six areas of technical expertise that data science needs to cover:

  • multidisciplinary investigation
  • models and methods for data
  • computing with data
  • pedagogy
  • tool evaluation
  • theory

In April 2002, the International Council for Science (ICSU): Committee on Data for Science and Technology (CODATA) started the Data Science Journal, a publication focused on issues such as the description of data systems, their publication on the Internet, their applications, and legal issues. Shortly afterward, in January 2003, Columbia University began publishing The Journal of Data Science, which became a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection", whose primary activity is to "conduct creative inquiry and analysis."

Around 2007, Turing Award winner Jim Gray envisioned "data-driven science" as a "fourth paradigm" of science that uses the computational analysis of large data as its primary scientific method, and he hoped "to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other."

In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century", DJ Patil claims to have coined this term in 2008 with Jeff Hammerbacher to define their jobs at LinkedIn and Facebook, respectively. He asserts that a data scientist is "a new breed", and that a "shortage of data scientists is becoming a serious constraint in some sectors", but describes a much more business-oriented role.

In 2013, the IEEE Task Force on Data Science and Advanced Analytics was launched. Also in 2013, the first "European Conference on Data Analysis (ECDA)" was organised in Luxembourg, establishing the European Association for Data Science (EuADS). The first international conference, the IEEE International Conference on Data Science and Advanced Analytics, was launched in 2014. In 2014, General Assembly launched a student-paid bootcamp, and The Data Incubator launched a competitive free data science fellowship. In 2014, the American Statistical Association section on Statistical Learning and Data Mining renamed its journal "Statistical Analysis and Data Mining: The ASA Data Science Journal", and in 2016 it changed its section name to "Statistical Learning and Data Science". In 2015, Springer launched the International Journal on Data Science and Analytics to publish original work on data science and big data analytics. In September 2015, the Gesellschaft für Klassifikation (GfKl) added "Data Science Society" to its name at the third ECDA conference at the University of Essex, Colchester, UK.

Relationship to statistics

Building a Machine Learning Model


The popularity of the term "data science" has exploded in business environments and academia, as indicated by a jump in job openings. However, many critical academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced "business analytics" in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, "I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn't berate the term statistician." Similarly, in the business sector, multiple researchers and analysts state that data scientists alone are far from sufficient to give companies a real competitive advantage, and they consider data scientists only one of the four broader job families that companies need to leverage big data effectively: data analysts, data scientists, big data developers, and big data engineers.

On the other hand, responses to the criticism are just as numerous. In a 2014 Wall Street Journal article, Irving Wladawsky-Berger compares the enthusiasm for data science with the dawn of computer science. He argues that data science, like any other interdisciplinary field, employs methodologies and practices from across academia and industry, but then morphs them into a new discipline. He draws attention to the sharp criticisms that computer science, now a well-respected academic discipline, once had to face. Likewise, NYU Stern's Vasant Dhar, like many other academic proponents of data science, argued more specifically in December 2013 that data science differs from the existing practice of data analysis across all disciplines, which focuses only on explaining data sets. Data science seeks actionable and consistent patterns for predictive uses. This practical engineering goal takes data science beyond traditional analytics. Data from disciplines and applied fields that lack solid theories, such as health science and social science, can now be sought out and used to generate powerful predictive models.

In an effort similar to Dhar's, Stanford professor David Donoho, in September 2015, takes the proposition further by rejecting three simplistic and misleading definitions of data science in response to its critics. First, for Donoho, data science does not equate to big data, in that the size of the data set is not a criterion that distinguishes data science from statistics. Second, data science is not defined by the computing skills of sorting big data sets, in that these skills are already widely used for analyses across all disciplines. Third, data science is a heavily applied field where academic programs right now do not sufficiently prepare data scientists for the jobs, in that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program. As a statistician, Donoho, following many in his field, champions broadening the scope of learning in the form of data science, as John Chambers does by urging statisticians to adopt an inclusive concept of learning from data, and as William Cleveland does by urging them to prioritize extracting applicable predictive tools from data over explanatory theories. Together, these statisticians envision an increasingly inclusive applied field that grows out of traditional statistics and beyond.

For the future of data science, Donoho projects an ever-growing environment for open science, where the data sets used for academic publications are accessible to all researchers. The US National Institutes of Health has already announced plans to enhance the reproducibility and transparency of research data.

Other major journals are following suit. In this way, data science will not only exceed the boundaries of statistical theory in scale and methodology, but will also revolutionize current academic and research paradigms. As Donoho concludes, "the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available."



