Data Science is not about making complicated models. It is not about making awesome visualizations. It is not about writing code. Data Science is about using data to create as much impact as possible for your company. That impact can take multiple forms: insights, data products, or product recommendations for your company. To deliver those things, you need tools like complicated models, data visualizations, or code. But essentially, as a data scientist, your job is to solve the real problems your company is facing. And which tools you use? No one cares.
There are a lot of misconceptions about Data Science, especially if you go to YouTube. The reason is a huge misalignment between what is popular to talk about and what is needed in the industry. From the perspective of a Data Scientist actually working for a huge company, those companies really emphasize using data to improve their products.
History of Data Science
Before Data Science, the term Data Mining was popularized by an article published in 1996. This article referred to the overall process of discovering useful information from data. In 2001, William S. Cleveland wanted to take data mining to another level. He did that by combining Computer Science with Data Mining. Basically, he made statistics a lot more technical, which he believed would expand the possibilities of data mining and create a powerful force for innovation. Now you could take advantage of computing power for statistics. And he called this combo Data Science.
Around this time, Web 2.0 also emerged, where websites were no longer just digital pamphlets but a medium for a shared experience among millions and millions of users. These are websites like Myspace in 2003, Facebook in 2004, and YouTube in 2005. We could now interact with these websites, meaning we could contribute, post comments, like, upload, and share, leaving our footprint in the digital landscape we call the internet and helping create and shape the ecosystem we know and love today.
The Advent of Big Data
And guess what? All of that interaction produces a huge amount of data, so much that it became very hard to handle with traditional technologies. So we called it Big Data. This opened up a lot of possibilities for finding more insights using data. But it also meant that even the simplest questions required sophisticated data infrastructure just to handle the data. We needed parallel computing technology like MapReduce, Hadoop, and Spark. So the rise of big data around 2010 drove the growth of Data Science technologies to support business needs, which centered on getting insights from large sets of unstructured data. Data Science was thus described as almost anything that has to do with data.
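To make the idea behind MapReduce a bit more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is only an illustration, not how Hadoop or Spark are actually used: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample text are made up for this example, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample data, split into two chunks.
    chunks = ["big data is big", "data science uses data"]
    print(word_count(chunks))
```

The point of splitting the work this way is that no single step needs to see all of the data at once, which is what lets the same pattern scale out across many machines.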