🐼📊 Pandas inbuilt .plot() methods#

👨‍🏫 Vikesh K
📓 Lab 04

💡 "It doesn't get easier, you get better" 💡

Lab Agenda#

  • Pandas plotting

  • Plotly versions

Note

In this notebook, we will focus on the pandas .plot() method. It is a powerful and quick way to visualise data using only pandas. For some of the charts, we will also use the Plotly backend to render an interactive version.

Please read more about them in the original documentation.
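
The Plotly versions in this lab pass backend = 'plotly' on every call. As a minimal sketch, the backend can also be switched through the pandas option plotting.backend. The frame df_demo and the fallback branch are assumptions made for illustration, and the Plotly branch only runs if the plotly package is installed.

```python
import pandas as pd

# small illustrative frame (hypothetical, not part of the lab data)
df_demo = pd.DataFrame({"Year": [2010, 2011, 2012], "Sales": [100, 120, 90]})

# pandas draws with matplotlib unless told otherwise
default_backend = pd.get_option("plotting.backend")

try:
    # temporarily route .plot() calls through Plotly
    with pd.option_context("plotting.backend", "plotly"):
        fig = df_demo.plot(x="Year", y="Sales", kind="line")
    backend_used = "plotly"
except (ImportError, ValueError):
    # pandas could not load Plotly, so fall back to the default backend
    fig = df_demo.plot(x="Year", y="Sales", kind="line")
    backend_used = default_backend

print(default_backend, backend_used)
```

Using option_context keeps the change local, so later cells still use the default backend.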

Importing modules#

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Data Creation#

# build a DataFrame
df_sales = pd.DataFrame({'Year': [2010, 2011, 2012, 2013, 2014],
                         'Sales': [100, 120, 90, 150, 200]})

# display the DataFrame
df_sales
   Year  Sales
0  2010    100
1  2011    120
2  2012     90
3  2013    150
4  2014    200

Line Plot#

Line Plot: Used to visualize the trend in data over time. This plot is useful when you want to analyze how a variable has changed from one period to the next.

df_sales.plot(x ='Year', y ='Sales', kind='line');

Interactive Plotly Version

df_sales.plot(x ='Year', y ='Sales', kind ='line', backend = 'plotly')
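
With the default matplotlib backend, .plot() returns an Axes object, so the chart can be refined after it is drawn. The sketch below is one possible way to add a title, markers and an axis label. The title text and the styling choices are illustrative assumptions.

```python
import pandas as pd

df_sales = pd.DataFrame({"Year": [2010, 2011, 2012, 2013, 2014],
                         "Sales": [100, 120, 90, 150, 200]})

# keep the returned matplotlib Axes so the chart can be customised
ax = df_sales.plot(x="Year", y="Sales", kind="line",
                   marker="o", title="Sales by year", legend=False)

ax.set_ylabel("Sales")
# one tick per year instead of fractional tick positions
ax.set_xticks(df_sales["Year"].tolist())
```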

Scatter Plot#

Scatter Plot: Used to visualize the relationship between two continuous variables. This plot is useful when you want to analyze the correlation between two variables.

df_age = pd.DataFrame({'Age': [20, 25, 30, 35, 40],
                   'Income': [50000, 60000, 70000, 80000, 90000]})

df_age.plot(x='Age', y='Income', kind='scatter');

Plotly version

df_age.plot(x='Age', y='Income', kind='scatter', backend = 'plotly')
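
A scatter plot shows the relationship visually. To put a number on it, a short sketch using the pandas Series.corr method (Pearson correlation by default) could look like this. In this toy data Income rises by exactly 10,000 for every 5 years of Age, so the coefficient should come out at 1 up to floating point error.

```python
import pandas as pd

df_age = pd.DataFrame({"Age": [20, 25, 30, 35, 40],
                       "Income": [50000, 60000, 70000, 80000, 90000]})

# Pearson correlation between the two columns shown in the scatter plot
r = df_age["Age"].corr(df_age["Income"])
print(round(r, 3))
```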

Bar Plot#

Bar Plot: Used to compare the values of a categorical variable. This plot is useful when you want to compare the values of a variable across different categories.

df_age.plot(x='Age', y='Income', kind='bar');

Plotly version

df_age.plot(x='Age', y='Income', kind='bar', backend = 'plotly')

Horizontal Bar Plot#

Horizontal Bar Plot: Similar to a bar plot, but with horizontal bars. This plot is useful when you want to compare the values of a variable across different categories and the category labels are long.

df_age.plot(x='Age', y='Income', kind='barh');

Plotly version

df_age.plot(x ='Age', y='Income', kind='barh', backend = 'plotly')

Stacked Bar Plot#

Stacked Bar Plot: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset, and also want to see how the relative contributions have changed over time or across different categories.

data = {
    'apples': [20, 10, 30, 25],
    'oranges': [10, 20, 10, 15],
    'pears': [15, 15, 10, 20],
}

df_fruits = pd.DataFrame(data, index=['Q1', 'Q2', 'Q3', 'Q4'])

# Create a stacked bar plot
df_fruits.plot(kind='bar', stacked=True);

Note: The Plotly backend doesn't support the stacked=True option
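
To compare relative contributions rather than absolute amounts, each row can be rescaled to percentages before plotting. This is a sketch of one way to do that. The variable names row_totals and shares are introduced here for illustration.

```python
import pandas as pd

df_fruits = pd.DataFrame({"apples": [20, 10, 30, 25],
                          "oranges": [10, 20, 10, 15],
                          "pears": [15, 15, 10, 20]},
                         index=["Q1", "Q2", "Q3", "Q4"])

# total fruit per quarter
row_totals = df_fruits.sum(axis=1)

# divide every row by its own total so each quarter adds up to 100
shares = df_fruits.div(row_totals, axis=0) * 100

ax = shares.plot(kind="bar", stacked=True)
ax.set_ylabel("Share of quarterly total (%)")
```

Every bar now has the same height, which makes shifts in the mix between quarters easier to see.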

Pie Chart#

Pie Chart: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset, but is generally not recommended due to issues with perception and comparison.

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Sales': [200, 350, 400, 150]})
df.plot(y='Sales', kind='pie', labels=df['Category'], autopct='%1.1f%%')
plt.show()

The Plotly backend doesn't support kind='pie'

Histogram#

Histogram Plot: Used to visualize the distribution of a dataset. This plot is useful when you want to see the frequency distribution of a continuous variable.

df = pd.Series(np.random.randn(1000))

# create a histogram of the dataset
df.plot(kind='hist');

Plotly version

df.plot(kind='hist', backend = 'plotly')
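
The shape of a histogram depends on how many bins are used. The sketch below passes the bins argument and uses np.histogram to recover the counts behind the bars. The seed and the choice of 30 bins are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# seeded generator so the example is reproducible
rng = np.random.default_rng(0)
values = pd.Series(rng.standard_normal(1000))

# draw the histogram with 30 bins instead of the default 10
ax = values.plot(kind="hist", bins=30)

# the same binning computed numerically: counts per bin and bin edges
counts, edges = np.histogram(values, bins=30)
print(counts.sum(), len(edges))
```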

Area Plot#

Area Plot: Used to visualize the composition of a dataset. This plot is useful when you want to see the relative contribution of different parts of a dataset to the overall dataset.

df_sales.plot(x ='Year', y='Sales', kind='area');

Plotly version

df_sales.plot(x='Year', y='Sales', kind='area', backend = 'plotly')
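
The sales example has a single series, so it cannot show composition. As a sketch, an area plot with several columns (here the fruit data from the stacked bar section) stacks them by default with the matplotlib backend.

```python
import pandas as pd

df_fruits = pd.DataFrame({"apples": [20, 10, 30, 25],
                          "oranges": [10, 20, 10, 15],
                          "pears": [15, 15, 10, 20]},
                         index=["Q1", "Q2", "Q3", "Q4"])

# area plots stack the columns by default, so the top edge is the total
ax = df_fruits.plot(kind="area")

# the per-quarter totals that the top edge of the chart follows
totals = df_fruits.sum(axis=1)
print(totals)
```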

Box Plot#

Box Plot: Used to visualize the distribution of a dataset through its quartiles. This plot is useful when you want to see the distribution of a variable and identify any outliers.

data = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])

# create a box plot of the dataset
data.plot(kind='box');

Plotly version

data.plot(kind='box', backend = 'plotly')
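
The quartiles and outliers that a box plot displays can also be computed directly. This sketch assumes the common 1.5 x IQR rule for flagging outliers, which matches the default whisker setting in matplotlib. The seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((100, 4)), columns=["A", "B", "C", "D"])

# first and third quartile of every column
q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# points beyond 1.5 * IQR from the box are treated as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outlier_mask = data.lt(lower, axis=1) | data.gt(upper, axis=1)

# number of flagged points per column
print(outlier_mask.sum())
```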

KDE Plot#

Kernel Density Estimation Plot: Used to estimate the probability density function of a continuous variable. This plot is useful when you want to see the probability distribution of a variable.

df = pd.Series(np.random.randn(1000))

# create a density plot of the dataset
df.plot(kind='kde');

The Plotly backend doesn't support kind='kde'
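
To see what a kernel density estimate does, here is a hand-rolled sketch with NumPy: one Gaussian bump is placed on every observation and the bumps are averaged. The bandwidth uses Scott's rule of thumb, which is an assumption made for this example. This is not the code pandas runs internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# Scott's rule of thumb for the bandwidth (assumed here)
h = sample.std(ddof=1) * len(sample) ** (-1 / 5)

# evaluation grid that extends a little beyond the data
grid = np.linspace(sample.min() - 4 * h, sample.max() + 4 * h, 400)

# one Gaussian bump per observation, averaged over all observations
z = (grid[:, None] - sample[None, :]) / h
density = np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

# a probability density should integrate to roughly 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 2))

# plot the estimated curve with the pandas line plot
ax = pd.Series(density, index=grid).plot(kind="line")
```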

Density Plot#

Density Plot: Used to visualize the density of a continuous variable. This plot is useful when you want to see the probability distribution of a variable.

Note: The 'density' plot is the same as the 'kde' plot

df = pd.Series(np.random.randn(1000))

# create a density plot of the dataset
df.plot(kind='density');

The Plotly backend doesn't support kind='density'

Hexagonal Bin Plot#

Hexagonal Bin Plot: Used to visualize the distribution of a dataset through hexagonal bins. This plot is useful when you want to analyze the distribution of a dataset with a large number of points.

df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B'])

# create a hexagonal bin plot of the dataset
df.plot(kind='hexbin', x='A', y='B', gridsize=20);

The Plotly backend doesn't support kind='hexbin'

📚 Reference material#

If you wish to know more about data visualisation and storytelling, the resources below should be helpful.