Let’s unpack a few of the most important technical skills Data Scientists routinely use. To think about how these skills are used in the pursuit of data science, it’s helpful to organize them into three major categories:
Collecting and storing data
All that data has to come from somewhere. It also has to be consistent and organized to produce reliable insights. This isn’t as straightforward as simply casting a net: the Data Scientist should know how the data will be used, how to manipulate it into a usable form (data cleaning and wrangling), and how to store it in a well-organized database (database management). You might also hear these steps referred to as data extraction, data transformation, and data loading, or ETL for short. Whatever you call it, familiarity with Excel and query languages like SQL is indispensable.
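To make the extract, transform, and load steps concrete, here is a minimal sketch using only Python's standard library. The sample CSV, the `sales` table, and the function names are all made up for illustration; a real pipeline would read from actual files or systems and would likely need more careful validation.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, a missing value.
RAW_CSV = """customer,region,amount
 Alice ,north,120.50
BOB,North,80
carol,SOUTH,
dave,south,42.25
"""

def extract(text):
    """Extraction: read raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: normalize text fields and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records rather than guess a value
        cleaned.append((
            row["customer"].strip().title(),
            row["region"].strip().lower(),
            float(amount),
        ))
    return cleaned

def load(rows):
    """Loading: store the cleaned rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load(transform(extract(RAW_CSV)))

# Once the data is clean and loaded, a SQL query can summarize it.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Dropping the incomplete row is only one possible cleaning choice; depending on how the data will be used, filling in a default or flagging the record for review may be more appropriate.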
Databases are, by their very nature, nice and tidy. But not all data is so cooperative. Data Scientists frequently work with unstructured data: information that doesn’t fit neatly into tables, such as audio and video, customer feedback replies, or social media posts. Because this kind of data isn’t numerical or uniformly formatted, finding ways to make it usable can be a challenge, and one that falls squarely on the Data Scientist’s shoulders.
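As one small illustration of making unstructured data usable, the sketch below turns free-text customer feedback into word counts that could then be stored in a table or charted. The feedback strings and the stopword list are invented for this example, and real text processing usually involves far more than counting words.

```python
import re
from collections import Counter

# Hypothetical free-text feedback, standing in for real customer replies.
FEEDBACK = [
    "Shipping was slow, but the product is great!",
    "Great product. Slow shipping though...",
    "Love it! Great value.",
]

# A tiny, made-up stopword list; real projects use much larger ones.
STOPWORDS = {"was", "but", "the", "is", "it", "though"}

def tokenize(text):
    """Lowercase the text, pull out word tokens, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def term_counts(documents):
    """Count how often each remaining word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

counts = term_counts(FEEDBACK)
print(counts.most_common(3))
```

The result is structured (a word paired with a number), which is the kind of shape a database or a statistical method can work with.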
Analyzing and modeling data
Languages like Python and R, along with distributed processing frameworks like Hadoop and Spark, help Data Scientists quantify and analyze data sets using statistical methods, run tests, and create models that can be used across a wide range of applications, from finance to e-commerce to natural resources. Ultimately, the goal is to generate models that can derive new insights from data and predict unknowns.
The skills Data Scientists need to accomplish these tasks are as varied as the tasks themselves, but as a general rule, data wrangling, data exploration, analysis, and modeling lean heavily on a foundation of math and programming. This is also where data science–specific skills like machine learning and deep learning come into play.
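To show what "creating a model to predict unknowns" can look like at its simplest, here is a sketch of an ordinary least-squares line fit written with only the standard library. The ad-spend and sales figures are invented, and `fit_line` and `predict` are illustrative names; in practice a Data Scientist would more likely reach for a statistics or machine learning library.

```python
from statistics import mean

# Hypothetical data: ad spend (in thousands) and units sold in each period.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 19.0, 31.0, 42.0, 48.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (the shared 1/n factor cancels).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted line to estimate y for a new x."""
    return slope * x + intercept

slope, intercept = fit_line(spend, units)

# Estimate an unknown: units sold at a spend level not in the data.
forecast = predict(6.0, slope, intercept)
print(slope, intercept, forecast)
```

Even this small example draws on both foundations mentioned above: the math behind least squares and the programming needed to apply it.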
Visualizing and presenting data
Converting data from tables into charts and graphs (or even dashboards, which allow non-analysts to retrieve information in a more intuitive way) is an art unto itself. There is a range of tools Data Scientists use to accomplish this, including Tableau, Power BI, Plotly, Bokeh, and Matplotlib, among others, each with its own strengths. It’s worth noting that software can’t tell you what type of visualization is most appropriate to highlight your findings, so a good understanding of the ways in which data can be visualized is a necessary first step.
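As a dependency-free stand-in for the tools named above, the sketch below converts a small table into a text bar chart. The `text_bar_chart` function and the sales figures are hypothetical, and the code assumes all values are positive. The choice of a bar chart is a human decision here: it suits a comparison across a handful of categories.

```python
def text_bar_chart(table, width=20):
    """Render (label, value) rows as horizontal bars scaled to the largest value."""
    largest = max(value for _, value in table)
    label_width = max(len(label) for label, _ in table)
    lines = []
    for label, value in table:
        # Scale each bar so the largest value fills the full width.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical table of sales by region.
sales_by_region = [("North", 200), ("South", 50), ("West", 100)]

chart = text_bar_chart(sales_by_region)
print(chart)
```

A charting library would produce something far more polished, but the underlying step is the same: mapping each value in the table to a visual property such as bar length.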