Introduction to Data Visualization With Matplotlib in Python

Preface

This Jupyter notebook is a summary of my learnings. I have learned matplotlib from various sources. In this notebook, I have tried to explain how things work in matplotlib using my own examples. I hope you will like it :) For those who already know matplotlib, you can use this as a reference notebook!

0. Import the required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

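Protip 1: Import Seaborn also, even if you are not plotting with it. Since Seaborn 0.8 the import alone no longer restyles matplotlib plots, so call sns.set_theme() after importing if you want Seaborn's default look.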

import seaborn as sns

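To view the plots inside the notebook

The magic command below shows interactive plots in the Jupyter Notebook itself. If you prefer static images right below each cell, use %matplotlib inline instead.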

%matplotlib notebook

Chapter 1: Figures and Subplots

To draw plots in matplotlib, we need something on which we can plot. Just as we draw on a sheet of paper with a pen, we draw on a matplotlib figure object with Python.

So, how can we create a figure object!?
It’s just 1 line. A good practice is to name the figure object “fig”.

fig=plt.figure()

We see that an interactive figure object gets created. But it is empty! Let’s fill it.

type(fig)

plt.figure() returns a “matplotlib.figure.Figure”

Now that we know where to plot, let’s actually plot!

fig.plot([1,2])

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)

in
----> 1 fig.plot([1,2])

AttributeError: 'Figure' object has no attribute 'plot'

Well, this gives an error because we can’t directly plot on the figure object! We should create subplots in a figure object and then plot.

To create subplots, use

fig.add_subplot(2,2,1)

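This means that we divide the figure into a 2*2 grid ( first two parameters ) and create one subplot in it.
The third parameter specifies which position of the grid the subplot takes. Positions are counted from 1, left to right and then top to bottom.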

But the problem with the above code is we did not store that in a variable. So let’s quickly do it!

sub_plot_1 = fig.add_subplot(2,2,1)

Well, there is a catch here. The grid has 4 positions (2*2), but we have created and stored only one subplot. What about the remaining ones? We can repeat the same step for each position and store all the subplots.

sub_plot_2 = fig.add_subplot(2,2,2)
sub_plot_3 = fig.add_subplot(2,2,3)
sub_plot_4 = fig.add_subplot(2,2,4)
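A shorter way to get the same 2*2 layout is plt.subplots, which creates the figure and all of its subplots in one call. This is only a minimal sketch; the names fig_grid and axes_grid are my own choices for illustration.

```python
import matplotlib.pyplot as plt

# Create a figure and a 2*2 grid of subplots in one call.
# fig_grid and axes_grid are illustrative names.
fig_grid, axes_grid = plt.subplots(2, 2)

# axes_grid is a 2-D NumPy array of subplots, indexed [row, column] from 0.
axes_grid[0, 0].plot([1, 7, 2, 9])  # top-left, same position as add_subplot(2,2,1)
axes_grid[1, 1].plot([9, 2, 7, 1])  # bottom-right, same position as add_subplot(2,2,4)

# Every position of the grid already holds a subplot.
print(len(fig_grid.axes))
```

Use add_subplot when you want to add subplots one at a time, and plt.subplots when you know the whole grid up front.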

Now it’s time to draw our first plot.

sub_plot_1.plot([1,7,2,9])

Output:

[]

Where is the output!? Scroll up and see it. You might expect the plot right after this cell, but with %matplotlib notebook the drawing goes to the figure the subplot belongs to, and that figure is displayed below the cell where we created it. We did that in one of the previous cells, i.e. cell 3, so the output is located there.

So now, let’s see how to get the output right after the cell: create the figure, add the subplot, and plot in the same cell.

fig2 = plt.figure()

fig2_sub_plot_1 = fig2.add_subplot(1,1,1)

fig2_sub_plot_1.plot([1,7,2,9])

I think you are happy now :)

I have a question. A 2*2 grid has 3 other positions, right? How do we plot on those 3? Well, just change the third parameter. It is as simple as that!

fig3 = plt.figure()

fig3_sub_plot_1 = fig3.add_subplot(2,2,1)
fig3_sub_plot_2 = fig3.add_subplot(2,2,2)
fig3_sub_plot_3 = fig3.add_subplot(2,2,3)
fig3_sub_plot_4 = fig3.add_subplot(2,2,4)

fig3_sub_plot_1.plot([1,7,2,9])
fig3_sub_plot_2.plot(np.random.rand(10)) # np.random.rand(10) generates 10 random numbers between 0 and 1
# More about it in my NumPy notebook. I will post that notebook soon :)
fig3_sub_plot_3.plot(np.random.randn(10))
fig3_sub_plot_4.plot(np.random.rand(10).cumsum())
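If the three NumPy calls above look alike, here is a small sketch of what each one returns. The variable names and the fixed seed are my own additions, used only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption for this sketch) for repeatable numbers

uniform = np.random.rand(10)      # 10 samples from the uniform distribution on [0, 1)
normal = np.random.randn(10)      # 10 samples from the standard normal distribution
running_total = uniform.cumsum()  # cumulative sum: element i is uniform[0] + ... + uniform[i]

print(uniform.min() >= 0, uniform.max() < 1)
print(running_total[-1], uniform.sum())
```

That is why the fourth subplot only goes up: every new value adds a non-negative number to the running total.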

Woah! That’s amazing! But, I have a problem. I don't like blue. Can I change it? Also, I would like to change the marker. Is it possible? Yes, matplotlib lets us change all of these. I will show how in the next chapter!

Chapter 2: Colors, markers, and line styles

Changing colors, markers, and line styles isn’t tough! It’s very easy. All you need to do is add a format string as a parameter to the plot function.

Let me create another 4 subplots and then illustrate various colors, markers, and line-styles.

fig4 = plt.figure()

fig4_sub_plot_1 = fig4.add_subplot(2,2,1)
fig4_sub_plot_2 = fig4.add_subplot(2,2,2)
fig4_sub_plot_3 = fig4.add_subplot(2,2,3)
fig4_sub_plot_4 = fig4.add_subplot(2,2,4)

fig4_sub_plot_1.plot([1,7,2,9],'ko--')
fig4_sub_plot_2.plot(np.random.rand(10),'cv:')
fig4_sub_plot_3.plot(np.random.randn(10),'rD-.')
fig4_sub_plot_4.plot(np.random.rand(10).cumsum(),'g,-.')
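A format string like 'ko--' is a shorthand. The same styling can be written with the color, marker, and linestyle keyword arguments of plot, which is easier to read. A minimal sketch follows; fig_fmt, ax_fmt, line_short and line_long are names I made up for it.

```python
import matplotlib.pyplot as plt

fig_fmt = plt.figure()
ax_fmt = fig_fmt.add_subplot(1, 1, 1)

# Format string: color 'k' (black), marker 'o' (circle), line style '--' (dashed).
(line_short,) = ax_fmt.plot([1, 7, 2, 9], 'ko--')

# The same styling spelled out with keyword arguments.
(line_long,) = ax_fmt.plot([9, 2, 7, 1], color='k', marker='o', linestyle='--')

# Both lines report the same style settings.
print(line_short.get_color(), line_short.get_marker(), line_short.get_linestyle())
print(line_long.get_color(), line_long.get_marker(), line_long.get_linestyle())
```

The keyword form also accepts color names and hex strings, for example color='#1f77b4', which the single letter codes cannot express.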

Wow! They are amazing. And there are still many more ways in which you can customize your plots. Matplotlib is very flexible!!

The cell below lists all the markers, line styles, and color codes that can be used in a matplotlib format string.

**Markers**

============= ===============================
character     description
============= ===============================
``'.'``       point marker
``','``       pixel marker
``'o'``       circle marker
``'v'``       triangle_down marker
``'^'``       triangle_up marker
``'<'``       triangle_left marker
``'>'``       triangle_right marker
``'1'``       tri_down marker
``'2'``       tri_up marker
``'3'``       tri_left marker
``'4'``       tri_right marker
``'s'``       square marker
``'p'``       pentagon marker
``'*'``       star marker
``'h'``       hexagon1 marker
``'H'``       hexagon2 marker
``'+'``       plus marker
``'x'``       x marker
``'D'``       diamond marker
``'d'``       thin_diamond marker
``'|'``       vline marker
``'_'``       hline marker
============= ===============================

**Line Styles**

============= ===============================
character     description
============= ===============================
``'-'``       solid line style
``'--'``      dashed line style
``'-.'``      dash-dot line style
``':'``       dotted line style
============= ===============================

Example format strings::

'b' # blue markers with default shape
'or' # red circles
'-g' # green solid line
'--' # dashed line with default color
'^k:' # black triangle_up markers connected by a dotted line

**Colors**

The supported color abbreviations are the single letter codes

============= ===============================
character     color
============= ===============================
``'b'``       blue
``'g'``       green
``'r'``       red
``'c'``       cyan
``'m'``       magenta
``'y'``       yellow
``'k'``       black
``'w'``       white
============= ===============================

You can go ahead and try out various colors, markers, and line-styles!

I have a question: can I draw shapes?? I am more of a Mathematics guy and would love to plot rectangles, squares, and polygons. Well, scroll down to the 3rd chapter :D

Chapter 3: Shapes

As usual, let’s create another 2 new subplots

fig5 = plt.figure()

fig5_sub_plot_1 = fig5.add_subplot(1,2,1)
fig5_sub_plot_2 = fig5.add_subplot(1,2,2)

To plot a rectangle, use the below syntax!

rect = plt.Rectangle((0.17,0.29),0.32,0.2)

# (0.17,0.29) is bottom left point
# 0.32 is the width (length along the X-axis)
# 0.2 is the height (length along the Y-axis)

fig5_sub_plot_1.add_patch(rect)

Notice that we have used the “add_patch” function rather than the “plot” function, because shapes are called “Patches” in matplotlib. Basic shapes like rectangle and circle are available directly from matplotlib.pyplot. If you want all the shapes, they are in matplotlib.patches.

circle = plt.Circle((0.17,0.29),0.12)

# (0.17,0.29) is centre of circle
# 0.12 is radius of circle

fig5_sub_plot_2.add_patch(circle)
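For shapes beyond rectangles and circles, such as the polygons mentioned earlier, matplotlib.patches has a Polygon class. Here is a minimal sketch with a triangle; the corner points, colors, and the names fig_shapes, ax_shapes and triangle are my own illustrative choices.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

fig_shapes = plt.figure()
ax_shapes = fig_shapes.add_subplot(1, 1, 1)

# A triangle given by its three corner points (x, y).
triangle = Polygon([(0.2, 0.2), (0.8, 0.2), (0.5, 0.7)], closed=True,
                   facecolor='c', edgecolor='k')
ax_shapes.add_patch(triangle)

# Adding a patch does not rescale the view, so set the limits explicitly.
ax_shapes.set_xlim(0, 1)
ax_shapes.set_ylim(0, 1)
ax_shapes.set_aspect('equal')  # one unit on X is drawn as long as one unit on Y
```

The set_aspect('equal') call matters for circles too: without it, a circle can look like an ellipse when the subplot is not square.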

Here is the good news! Till now, we have been choosing our own styles to customize our plots. But now it gets done for us. How!? pandas Series and Data Frames are nicely integrated with matplotlib. If we call the plot function on a Series or Data Frame object, pandas draws it with matplotlib and picks the styling for us!

Chapter 4: Plotting with Series and Data Frames

Let's create a Series object first

series = pd.Series( [1,7,2,9], index = ['a','b','c','d'] )
series

Output:

a 1
b 7
c 2
d 9
dtype: int64

Plot the series object

fig6 = plt.figure()
plt.plot(series)

Now for Data Frames!

df=pd.DataFrame(np.random.rand(5,4),
columns=['col_1','col_2','col_3','col_4'],
index=np.array(['row_1','row_2','row_3','row_4','row_5']))
df

Our data frame

df.plot()

As I have said, matplotlib automates the styling. This is the plot! Automatically, the colors, location of legend, x_ticks, y_ticks are chosen by matplotlib. Isn’t it cool!?
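The automatic styling does not take control away from us. The plot function of a Series or Data Frame accepts an ax parameter to choose the subplot and returns that subplot, so everything from the earlier chapters still applies. A small sketch, assuming a made-up Data Frame df_demo and my own variable names:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# df_demo is a hypothetical two-column Data Frame for this sketch.
df_demo = pd.DataFrame(np.random.rand(5, 2), columns=['col_1', 'col_2'])

fig_pd = plt.figure()
left = fig_pd.add_subplot(1, 2, 1)
right = fig_pd.add_subplot(1, 2, 2)

# ax= tells pandas which subplot to draw on; the call returns that same subplot.
returned = df_demo.plot(ax=left)
df_demo['col_1'].plot(ax=right, style='ko--')  # style takes a matplotlib format string

# The returned object is an ordinary subplot, so it can be customized further.
returned.set_title('Both columns')
right.set_title('col_1 only')
```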

Till now, we have seen only line plots. Well, there exist many other types of plots too, like bar charts, stacked bar charts, histograms, box plots, violin plots, density plots, etc. Don’t worry if they are too much for you now! We choose the plot depending on the data. For example, we use bar charts to compare categories and histograms to see how numeric values are distributed. And many more … That is a different topic in itself. We shall talk about “which plot to use when” later.

Bar charts and Histograms for Data Frames

df.plot.bar()

df.plot.hist()

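Box plots are used to see how the data is spread. They show the Five Number Summary ( min, Q1, Q2(median), Q3, max). By default, the whiskers reach at most 1.5 times the interquartile range beyond the box, and points outside them are drawn individually as outliers.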

df.plot.box()
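To see the numbers behind a box plot, the Five Number Summary can be computed with the quantile function of a Series. A minimal sketch, reusing the values 1, 7, 2, 9 from the Series example above; the names values and five_numbers are mine.

```python
import pandas as pd

values = pd.Series([1, 7, 2, 9])

# Quantiles at 0, 0.25, 0.5, 0.75 and 1 are min, Q1, median, Q3 and max.
five_numbers = values.quantile([0, 0.25, 0.5, 0.75, 1])
print(five_numbers)
```

For a Data Frame, df.describe() reports the same five numbers per column, along with the count, mean, and standard deviation.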

Here are more interesting plots! These are done with Seaborn, which I will continue in the next article.

fig=plt.figure()
sns.violinplot(data=df) # pass the Data Frame as data= so that each column gets its own violin

fig=plt.figure()
sns.heatmap(df)

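fig=plt.figure()
sns.histplot(data=df, kde=True) # sns.distplot is deprecated in recent Seaborn versions; histplot is its replacement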

There are many more in the Seaborn article, till then bye :) In basic terms, Seaborn is matplotlib++: it is built on top of matplotlib and is more fun to use.

Connect with me -

LinkedIn : https://linkedin.com/in/bomma-pranay
GitHub : https://github.com/Bomma-Pranay

--- By Bomma Pranay A Data Science Enthusiast