How to Create a Word Cloud in Python?

A Word Cloud in Python can be created in the following steps:

1. Import Necessary Libraries

Import the following libraries, which are required to create a Word Cloud:

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

2. Selecting the Dataset

For this example, we are using the popular Top Games on Google Play Store dataset from Kaggle.

Download the dataset and save it in your current working directory so that the code can load it by file name alone.

Import the dataset into a variable of your choice. Here, our data is imported into the variable df.

The text for a Word Cloud does not need to come from a dataset. We use one in this example simply to get meaningful text with less effort.

df = pd.read_csv("android-games.csv")

3. Selecting the Text and Amount of Text for Word Cloud

Selecting the text for a Word Cloud is an important task. Check the following factors when selecting the text:

  • Do we have a problem statement?
  • Does the selected text carry meaning?
  • Can we draw a conclusion from the resulting Word Cloud?
  • Do we have an adequate amount of text?

A Word Cloud requires an adequate amount of text. Too many words clutter the visual appearance of the Word Cloud, while too few make it meaningless.

We can use the .head() method of the DataFrame to check the columns and the type of data present in them. In our example, we take the category column as the text.

Since every value in the category column carries the prefix GAME before the category name, the Word Cloud would show GAME as the most frequent word and would have no meaning in it. Thus, we filter out the prefix while adding the category column to the text.
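To see why the GAME prefix would dominate, we can count the word frequencies directly. The sketch below uses a small made-up list of category values (an assumption for illustration, not the actual Kaggle data) and the standard library's collections.Counter.

```python
from collections import Counter

# Hypothetical sample in the "GAME <CATEGORY>" shape described in this step
categories = ["GAME ACTION", "GAME PUZZLE", "GAME ACTION", "GAME ARCADE"]

# Count every word, prefix included
raw_counts = Counter(word for cat in categories for word in cat.split())

# Count only the word that follows the prefix
filtered_counts = Counter(cat.split()[1] for cat in categories)

print(raw_counts.most_common(2))       # [('GAME', 4), ('ACTION', 2)]
print(filtered_counts.most_common(2))  # [('ACTION', 2), ('PUZZLE', 1)]
```

With the prefix included, GAME appears once per row and outweighs every real category, which is exactly what the Word Cloud would display.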

4. Check for NULL values

We need to check for null values in our dataset, because text containing NaN values cannot be used to create the Word Cloud.

df.isna().sum()

If our dataset had any NaN values, we would need to treat the missing values accordingly. Fortunately, this dataset has no NaN values, so we can move to the next step.

If there are only a few NaN values, it is advisable to remove those rows, as doing so would not affect the Word Cloud to any large extent.
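As a minimal sketch of removing such rows, assume a tiny made-up DataFrame (the titles and categories below are hypothetical, not taken from the dataset) in which one category is missing. Passing subset to dropna() restricts the check to the column we actually use for the text.

```python
import pandas as pd

# Hypothetical miniature dataset with one missing category
sample = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C"],
    "category": ["GAME ACTION", None, "GAME PUZZLE"],
})

# Number of missing values per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Drop only the rows where the column used for the text is missing
cleaned = sample.dropna(subset=["category"]).reset_index(drop=True)
print(len(sample), "->", len(cleaned))  # 3 -> 2
```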

4. Adding Text to a Variable

Based on the criteria from Step 3, add the text data to a variable of your choice. Here, we add the data to the variable text.

text = " ".join(cat.split()[1] for cat in df.category)

Since we need to filter GAME out of the category, we split each row value and take the second item, i.e. the category name from the category column.
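Note that cat.split()[1] assumes every value has at least two words. It raises an IndexError on a one-word value and keeps only the first word of a longer category name. A slightly more defensive sketch follows; the helper name build_text and the sample values are hypothetical, and it assumes missing values have already been removed.

```python
def build_text(categories, prefix="GAME"):
    """Join category names into one string, dropping a leading prefix word."""
    words = []
    for cat in categories:
        parts = cat.split()
        # Drop the prefix only when it is actually present
        if parts and parts[0] == prefix:
            parts = parts[1:]
        words.extend(parts)
    return " ".join(words)


# Hypothetical values, including edge cases the one-liner would not handle
sample = ["GAME ACTION", "GAME ROLE PLAYING", "PUZZLE", "GAME"]
text = build_text(sample)
print(text)  # ACTION ROLE PLAYING PUZZLE
```

With the DataFrame from this article, the call would be build_text(df.category).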

5. Creating the Word Cloud

Create an object of class WordCloud with a name of your choice and call its generate() method. Here, we have created the object with the name word_cloud.

WordCloud() takes several arguments as needed. Here, we pass two:

  1. collocations=False, which stops the Word Cloud from counting collocations (two-word phrases, i.e. bigrams) as entries of their own

  2. background_color="white", which makes the words look clearer

The .generate() method takes the text as its argument. In our case, we pass the text variable to .generate().

word_cloud = WordCloud(collocations=False, background_color="white").generate(text)

6. Plotting the Word Cloud

Use the .imshow() method of matplotlib.pyplot to display the Word Cloud as an image.

.imshow() takes several arguments, but in our example, we pass two:

  1. word_cloud created in Step 5

  2. interpolation="bilinear"

Since we are creating an image with .imshow(), the image is resampled because its pixel size and the screen resolution do not match. This resampling is controlled with the interpolation argument to produce softer or crisper images as needed. Several types of interpolation are available, such as gaussian, quadric and bicubic. Here, we are using bilinear interpolation.

We plot the image with the axes turned off, as we don't want axis ticks in our image.

plt.imshow(word_cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
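To get a feel for what the interpolation argument changes, the following sketch draws the same low-resolution image twice, once with "nearest" and once with "bilinear". The random 8x8 image is an assumption for illustration only and stands in for the Word Cloud. The Agg backend and the in-memory buffer are used only so that the sketch runs without a display; in a notebook, plt.show() would do.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical low-resolution image: an 8x8 grid of random grey levels
rng = np.random.default_rng(0)
image = rng.random((8, 8))

fig, axes = plt.subplots(1, 2, figsize=(6, 3))
for ax, method in zip(axes, ["nearest", "bilinear"]):
    ax.imshow(image, cmap="gray", interpolation=method)
    ax.set_title(method)
    ax.axis("off")  # hide axis ticks, as in the Word Cloud plot

# Render the comparison to an in-memory PNG
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

The "nearest" panel shows hard-edged blocks, while the "bilinear" panel blends neighbouring pixels into a softer image.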

7. The Complete Code

# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter notebooks only
%matplotlib inline
from wordcloud import WordCloud

# Importing the dataset
df = pd.read_csv("android-games.csv")

# Checking the data
df.head()

# Checking for NaN values
df.isna().sum()

# Removing NaN values
# df.dropna(inplace=True)

# Creating the text variable
text = " ".join(cat.split()[1] for cat in df.category)

# Creating word_cloud with text as argument in .generate() method
word_cloud = WordCloud(collocations=False, background_color="white").generate(text)

# Display the generated Word Cloud
plt.imshow(word_cloud, interpolation="bilinear")
plt.axis("off")
plt.show()


Word Cloud of category column (Image Source – Personal Computer) *The attached image size is irrespective of output image size

Similarly, let’s create a Word Cloud for the title column of the imported dataset.

# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter notebooks only
%matplotlib inline
from wordcloud import WordCloud

# Importing the dataset
df = pd.read_csv("android-games.csv")

# Checking the data
df.head()

# Creating the text variable
text2 = " ".join(title for title in df.title)

# Creating word_cloud2 with text2 as argument in .generate() method
word_cloud2 = WordCloud(collocations=False, background_color="white").generate(text2)

# Display the generated Word Cloud
plt.imshow(word_cloud2, interpolation="bilinear")
plt.axis("off")
plt.show()

Output

Word Cloud of title column (Image Source – Personal Computer) *The attached image size is irrespective of the output image size