🐼📊 Pandas inbuilt .plot() methods

👨‍🏫 Vikesh K
📓 Lab 04

💡 “It doesn’t get easier, you get better” 💡

📝Lab Agenda#

Pandas plotting
Plotly versions

Note

In this notebook, we will focus on the pandas .plot() method. It's a powerful and quick way to visualise data using only pandas. For some of the charts, we will also use the plotly backend to render an interactive version.

Please read more about them in the original documentation.

Importing modules#

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Data Creation#

# build a small sales DataFrame
df_sales = pd.DataFrame({'Year': [2010, 2011, 2012, 2013, 2014],
                         'Sales': [100, 120, 90, 150, 200]})

# display the data
df_sales

	Year	Sales
0	2010	100
1	2011	120
2	2012	90
3	2013	150
4	2014	200

Line Plot#

Line Plot: Used to visualize the trend in data over time. This plot is useful when you want to analyze how a variable has changed from one period to the next.

df_sales.plot(x ='Year', y ='Sales', kind='line');

Interactive Plotly Version

df_sales.plot(x ='Year', y ='Sales', kind ='line', backend = 'plotly')
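
With the default matplotlib backend, .plot() returns a matplotlib Axes object, so the chart can be refined after it is drawn. A minimal sketch, assuming the same sales data as above; the title, axis label, and marker style are illustrative choices, not part of the original lab:

```python
import pandas as pd

df_sales = pd.DataFrame({'Year': [2010, 2011, 2012, 2013, 2014],
                         'Sales': [100, 120, 90, 150, 200]})

# with the matplotlib backend, .plot() returns a matplotlib Axes
ax = df_sales.plot(x='Year', y='Sales', kind='line', marker='o', legend=False)

# illustrative labels (assumed, not from the lab)
ax.set_title('Sales by year')
ax.set_ylabel('Sales')

# place one tick on each year so no fractional years appear on the axis
ax.set_xticks(df_sales['Year'].tolist())
```

In a notebook the figure renders inline; in a plain script you would typically call plt.show() from matplotlib.pyplot afterwards.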

Scatter Plot#

Scatter Plot: Used to visualize the relationship between two continuous variables. This plot is useful when you want to analyze the correlation between two variables.

df_age = pd.DataFrame({'Age': [20, 25, 30, 35, 40],
                   'Income': [50000, 60000, 70000, 80000, 90000]})

df_age.plot(x='Age', y='Income', kind='scatter');

Plotly version

df_age.plot(x='Age', y='Income', kind='scatter', backend = 'plotly')
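
A scatter plot suggests a correlation visually; the strength of that relationship can also be quantified. The sketch below, assuming the same df_age data, computes the Pearson correlation coefficient with Series.corr() and puts it in the chart title. The title text is an illustrative choice:

```python
import pandas as pd

df_age = pd.DataFrame({'Age': [20, 25, 30, 35, 40],
                       'Income': [50000, 60000, 70000, 80000, 90000]})

# Pearson correlation coefficient between the two columns
r = df_age['Age'].corr(df_age['Income'])
print(f"Pearson r = {r:.2f}")  # the toy data is perfectly linear, so r is 1.00

# show the coefficient in the chart title
ax = df_age.plot(x='Age', y='Income', kind='scatter',
                 title=f'Age vs Income (r = {r:.2f})')
```

Real data will rarely be this clean; a value of r close to 0 indicates little linear relationship.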

Bar Plot#

Bar Plot: Used to compare a numeric value across the categories of a categorical variable. In the example below, each age is treated as a separate category.

df_age.plot(x='Age', y='Income', kind='bar');

Plotly version

df_age.plot(x='Age', y='Income', kind='bar', backend = 'plotly')

Horizontal Bar Plot#

Horizontal Bar Plot: Similar to a bar plot, but with horizontal bars. This plot is useful when you want to compare the values of a variable across different categories and the category labels are long.

df_age.plot(x='Age', y='Income', kind='barh');

Plotly version

df_age.plot(x ='Age', y='Income', kind='barh', backend = 'plotly')
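
The benefit of horizontal bars is easier to see with long category labels. A minimal sketch, assuming a hypothetical df_dept DataFrame whose department names and headcounts are made up purely for illustration:

```python
import pandas as pd

# hypothetical categories with long names, for illustration only
df_dept = pd.DataFrame({
    'Department': ['Research and Development', 'Customer Support Services',
                   'Sales and Marketing', 'Human Resources'],
    'Headcount': [45, 30, 60, 12],
})

# barh draws the first row at the bottom, so an ascending sort
# puts the largest bar at the top of the chart
df_sorted = df_dept.sort_values('Headcount')

ax = df_sorted.plot(x='Department', y='Headcount', kind='barh', legend=False)
ax.set_xlabel('Headcount')
```

The labels sit on the y-axis and read left to right, so they do not need to be rotated as they would under vertical bars.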

Stacked Bar Plot#

Stacked Bar Plot: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset, and also want to see how the relative contributions have changed over time or across different categories.

data = {
    'apples': [20, 10, 30, 25],
    'oranges': [10, 20, 10, 15],
    'pears': [15, 15, 10, 20],
}

df_fruits = pd.DataFrame(data, index=['Q1', 'Q2', 'Q3', 'Q4'])

# Create a stacked bar plot
df_fruits.plot(kind='bar', stacked=True);

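Note: The plotly backend doesn’t accept the stacked argument. Plotly bar charts stack multiple columns by default, so df_fruits.plot(kind='bar', backend='plotly') should already produce a stacked chart.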
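
To compare relative contributions rather than absolute values, each row can be normalised to 100% before plotting. The sketch below assumes the same fruit data; df_share and the axis label are names introduced here for illustration:

```python
import pandas as pd

df_fruits = pd.DataFrame({
    'apples': [20, 10, 30, 25],
    'oranges': [10, 20, 10, 15],
    'pears': [15, 15, 10, 20],
}, index=['Q1', 'Q2', 'Q3', 'Q4'])

# divide each row by its total so every quarter sums to 100%
df_share = df_fruits.div(df_fruits.sum(axis=1), axis=0) * 100

# every bar now has the same height; only the composition differs
ax = df_share.plot(kind='bar', stacked=True)
ax.set_ylabel('Share of quarterly total (%)')
```

This trades away the information about total volume per quarter, so it works best alongside the absolute version rather than as a replacement.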

Pie Chart#

Pie Chart: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset, but is generally not recommended due to issues with perception and comparison.

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Sales': [200, 350, 400, 150]})
df.plot(y='Sales', kind='pie', labels=df['Category'], autopct='%1.1f%%')
plt.show()

The plotly backend doesn’t support pie charts
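
Since pie charts make it hard to compare slices, one common alternative is a sorted bar chart of the same shares. A minimal sketch, assuming the same category data; the share variable and axis label are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Sales': [200, 350, 400, 150]})

# convert sales to a percentage share of the total
share = df.set_index('Category')['Sales'] / df['Sales'].sum() * 100

# sorted horizontal bars make the ranking easy to read
ax = share.sort_values().plot(kind='barh')
ax.set_xlabel('Share of sales (%)')
```

Bar lengths share a common baseline, which makes small differences between categories easier to judge than differences in angle.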

Histogram#

Histogram Plot: Used to visualize the distribution of a dataset. This plot is useful when you want to see the frequency distribution of a continuous variable.

df = pd.Series(np.random.randn(1000))

# create a histogram of the dataset
df.plot(kind='hist');

Plotly version

df.plot(kind='hist', backend = 'plotly')
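
The shape of a histogram depends on the number of bins, which pandas sets to 10 by default. The sketch below passes bins explicitly and uses a seeded generator so the figure is reproducible; the seed value and the variable names rng and s are arbitrary choices made for this example:

```python
import numpy as np
import pandas as pd

# seeded generator so the same figure is produced on every run
rng = np.random.default_rng(42)
s = pd.Series(rng.standard_normal(1000))

# bins controls how many intervals the data range is split into
ax = s.plot(kind='hist', bins=30, edgecolor='white')
ax.set_xlabel('Value')
```

Too few bins hide the shape of the distribution and too many make it noisy, so it is worth trying a few values.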

Area Plot#

Area Plot: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset.

df_sales.plot(x ='Year', y='Sales', kind='area');

Plotly version

df_sales.plot(x='Year', y='Sales', kind='area', backend = 'plotly')
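
An area plot shows composition only when several columns are plotted together. A minimal sketch, assuming the fruit data from the stacked bar section; the axis label is an illustrative choice:

```python
import pandas as pd

df_fruits = pd.DataFrame({
    'apples': [20, 10, 30, 25],
    'oranges': [10, 20, 10, 15],
    'pears': [15, 15, 10, 20],
}, index=['Q1', 'Q2', 'Q3', 'Q4'])

# with several columns, kind='area' stacks them by default (stacked=True)
ax = df_fruits.plot(kind='area')
ax.set_ylabel('Units')

# the top edge of the stack equals the row total for each quarter
totals = df_fruits.sum(axis=1)
print(totals.tolist())
```

Passing stacked=False instead overlays the areas with transparency, which suits comparing the series rather than their sum.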

Box Plot#

Box Plot: Used to visualize the distribution of a dataset through its quartiles. This plot is useful when you want to see the distribution of a variable and identify any outliers.

data = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])

# create a box plot of the dataset
data.plot(kind='box');

Plotly version

data.plot(kind='box', backend = 'plotly')
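
The points drawn as outliers in a box plot follow from the quartiles. The sketch below reproduces the usual 1.5 × IQR rule (matplotlib's default whisker setting) by hand on a small illustrative sample; the sample values and variable names are assumptions made for this example:

```python
import pandas as pd

# small illustrative sample with one obvious extreme value
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name='value')

# first and third quartiles and the interquartile range (IQR)
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# points beyond 1.5 * IQR from the box are drawn as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(outliers.tolist())  # [100]

ax = s.plot(kind='box')
```

Here the quartiles are 3.25 and 7.75, so the upper fence is 14.5 and only the value 100 falls outside it.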

KDE Plot#

Kernel Density Estimation Plot: Used to estimate the probability density function of a continuous variable. This plot is useful when you want to see the probability distribution of a variable.

df = pd.Series(np.random.randn(1000))

# create a density plot of the dataset
df.plot(kind='kde');

Plotly doesn’t support kde charts

Density Plot#

Density Plot: Used to visualize the density of a continuous variable. This plot is useful when you want to see the probability distribution of a variable.

Note: The ‘density’ plot is the same as the ‘kde’ plot; in pandas, kind='density' is an alias for kind='kde'.

df = pd.Series(np.random.randn(1000))

# create a density plot of the dataset
df.plot(kind='density');

Plotly doesn’t support density

Hexagonal Bin Plot#

Hexagonal Bin Plot: Used to visualize the distribution of a dataset through hexagonal bins. This plot is useful when you want to analyze the distribution of a dataset with a large number of points.

df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B'])

# create a hexagonal bin plot of the dataset
df.plot(kind='hexbin', x='A', y='B', gridsize=20);

Plotly doesn’t support Hexbins

📚 Reference material#

If you wish to know more about Data Visualisation and Storytelling, the resources below should be helpful.
