Why should you learn Bash?

What is Bash?

In short, Bash is a Unix shell: a command-line interface (CLI) to your computer. You’ll also hear it loosely called the terminal, the command line, or the shell. It’s a command language that lets us work with files on our computers in a way that is often more efficient and powerful than using a GUI (graphical user interface).

Making the switch from graphical user interfaces (GUIs) to a command-line interface can feel overwhelming. Here are a few reasons to learn Bash and start using the command line:

Command Line Skills Help With Building Repeatable Data Processes

Part of a data scientist’s role is to make sure certain information is available regularly, often daily. Most of the time this data is acquired, processed and displayed in the same way.

The command line is well suited for this purpose because commands are easily automated and replicated.
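As a sketch of what "easily automated" can look like, the hypothetical script below turns a CSV file into a dated summary. The file names (`sales.csv`, `reports/`) and the `region,amount` column layout are assumptions made for illustration, and the sample data is generated inline so the script runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical daily job: summarize a CSV file into a dated report.
# File names and the region,amount layout are illustrative assumptions.
set -euo pipefail

input="sales.csv"
outdir="reports"
today=$(date +%Y-%m-%d)

# Create made-up sample input so the sketch is self-contained.
printf 'region,amount\nnorth,10\nsouth,5\nnorth,7\n' > "$input"

mkdir -p "$outdir"

# Skip the header line, sum the amount column per region, sort by region.
tail -n +2 "$input" \
  | awk -F',' '{ total[$1] += $2 } END { for (r in total) print r "," total[r] }' \
  | sort > "$outdir/summary-$today.csv"

echo "Wrote $outdir/summary-$today.csv"
```

Because the steps are identical every time, a script like this can be scheduled, for example with a cron entry such as `0 6 * * * /path/to/daily_report.sh` to run it at 06:00 each day (the path is a placeholder).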

Consider the following situation:

Your employer decides to invest in data analytics. Several data professionals will be joining the team. You are tasked with making sure that their machines have everything they need to get started.

If you can work with a command-line interface, you can write a few scripts that install, configure and test everything automatically.

If you can’t, you’ll have to resort to a GUI and repeat the same sequence of clicks on every machine.
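A minimal sketch of the "configure and test" part of such an onboarding script is shown below. The list of required tools and the `./team-setup/config` location are hypothetical; a real script would also install any missing packages using the machine’s package manager, which varies by operating system and is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical onboarding check: verify required tools, write a starter config.
# The tool list and config path are illustrative assumptions.
set -e

required_tools=(awk sed grep sort)
missing=""

# Test step: report which tools are available on this machine.
for tool in "${required_tools[@]}"; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "ok:      $tool"
  else
    echo "missing: $tool"
    missing="$missing $tool"
  fi
done

# Configure step: write a starter config file (hypothetical setting).
config_dir="./team-setup"
mkdir -p "$config_dir"
printf 'data_dir=%s\n' "$PWD/data" > "$config_dir/config"

if [ -n "$missing" ]; then
  echo "Install these before continuing:$missing"
  exit 1
fi
echo "All checks passed."
```

Running the same script on each new machine gives every team member an identical, verified starting point.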

That’s just one example of how terminal skills can help make data science processes more scalable and repeatable.

Learning Bash Makes You More Flexible

In a data science role, you’ll often find you have more flexibility if you can use the terminal rather than having to rely on clicking through GUIs.

Since the shell is a program that runs other programs, it is usually easy to change how those programs interact, for example by connecting one program’s output to another program’s input with a pipe (|).
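To illustrate how the shell connects programs, the sketch below wraps a pipeline of four standard utilities in a small function. The function name `top_words` and the sample words are made up for this example.

```shell
#!/usr/bin/env bash
# top_words N: read words (one per line) from stdin and print the N most
# frequent ones with their counts. The function name is hypothetical.
top_words() {
  sort | uniq -c | sort -rn | head -n "$1"
}

# Made-up sample input; in practice this could be: top_words 10 < words.txt
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' | top_words 2
```

This prints a count of 3 for apple and 2 for banana. Adjusting the interaction means editing a single stage: inserting `tr '[:upper:]' '[:lower:]'` before the first `sort`, for instance, would make the count case-insensitive.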

Once you’ve mastered Bash commands, it’s relatively easy to write scripts, and shell scripts make building all sorts of data pipelines and workflows much simpler.
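As a small example of a scripted pipeline step, the hypothetical script below counts the data rows in every CSV file in a directory and writes the results to a new CSV file. The `data/` directory, the file names, and their contents are assumptions created inline for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pipeline step: count data rows in each CSV file in a directory.
# Directory, file names, and contents are illustrative assumptions.
set -euo pipefail

data_dir="data"
mkdir -p "$data_dir"

# Create two small sample files so the sketch runs on its own.
printf 'id,value\n1,a\n2,b\n' > "$data_dir/first.csv"
printf 'id,value\n1,c\n'      > "$data_dir/second.csv"

for file in "$data_dir"/*.csv; do
  # Subtract 1 so the header line is not counted as data.
  rows=$(( $(wc -l < "$file") - 1 ))
  echo "$(basename "$file"),$rows"
done > row_counts.csv

cat row_counts.csv
```

The same loop works unchanged whether the directory holds two files or two thousand.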

More broadly, knowing how to use the shell gives you a second option for interacting with your computer.

You can always use the GUI when you prefer, but the command line can provide you with more direct power and control for those times when you need it.

Working With Text Files is Easier

Text files are among the most common ways to store and handle data. Almost any data science project is going to involve some work with text files. Being able to handle text files quickly and efficiently is thus a very useful skill for a data scientist.

The shell has powerful text-processing tools such as AWK and sed, which help you get acquainted with a file’s contents and make data cleaning easier.
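For instance, a quick look at a file followed by a sed cleanup might look like the sketch below. The file `raw.csv` and its contents are made-up sample data with stray spaces around the commas.

```shell
#!/usr/bin/env bash
# Sketch: get acquainted with a file, then clean it with sed.
# raw.csv and its contents are made-up sample data.
set -euo pipefail

printf 'name , score\nAda , 90\nLinus ,  85\n' > raw.csv

head -n 2 raw.csv     # peek at the first two lines
wc -l < raw.csv       # count the lines in the file

# Remove any whitespace around each comma and write a cleaned copy.
sed -E 's/[[:space:]]*,[[:space:]]*/,/g' raw.csv > clean.csv
cat clean.csv
```

Writing the result to `clean.csv` instead of editing in place keeps the raw data intact, so the cleaning step can be rerun or adjusted later.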

For example, the command below uses AWK to print the first and third columns of a file named a_csv_file, for rows where the second field’s value is Dataquest, using a comma as the field separator.

awk 'BEGIN {FS=","} {if ($2 == "Dataquest") {print $1, $3}}' a_csv_file
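To try the command, the sketch below creates a small a_csv_file with made-up rows and runs it. Two details matter here: the file name must sit outside the single-quoted AWK program, and `print $1, $3` (with a comma) separates the two fields with a space, whereas `print $1 $3` would run them together.

```shell
#!/usr/bin/env bash
# Create a small sample file (made-up data) and run the AWK command on it.
printf 'alice,Dataquest,42\nbob,OtherSite,17\ncarol,Dataquest,99\n' > a_csv_file

awk 'BEGIN {FS=","} {if ($2 == "Dataquest") {print $1, $3}}' a_csv_file
# Output:
# alice 42
# carol 99
```

An equivalent, more compact form uses the `-F` option and a pattern instead of an `if`: `awk -F',' '$2 == "Dataquest" {print $1, $3}' a_csv_file`.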

All it takes is one line of code!