🤖 Support Vector Machines#
👨🏫 Vikesh K
📓 Lab-09
💡 “Work hard in silence. Let success make the noise” 💡
📝Lab Agenda#
We will cover:
Fitting an SVM model to numerical data, along with its kernel variants
Visualizing the decision boundaries of SVM
Comparing the SVM results against other Classifiers
Support Vector Machines Overview#
Technique : Classifies by finding a separator
Algo Steps#
Mapping data to a feature space
For example, data with two feature columns gets mapped to a 2-D plane, with each row becoming a point
One challenge here is that the data must be mapped in a way that highlights the differences between the classes
This process is called kernelling, and the mathematical function used for the mapping is called the kernel
Find a separator between the points that maximises the margin (see the formulation sketched after this list)
It will be a straight line in two-dimensional space and a hyperplane in higher dimensions
The separator's placement is determined by the points closest to the margin, which is why these points are called support vectors
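For reference, in the standard textbook formulation with two linearly separable classes, the separator is the hyperplane $w^\top x + b = 0$, the margin has width $2/\lVert w \rVert$, and the SVM picks $w$ and $b$ by solving:

$$
\min_{w,\, b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1 \quad \text{for all } i
$$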
Advantages#
Accurate in high-dimensional spaces
Memory efficient, as the fitted model relies only on the support vectors rather than the complete data
Disadvantages#
Prone to over-fitting if the number of features is much greater than the number of samples
No direct probability estimates out of the box (see the sketch after this list)
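That said, scikit-learn's SVC can still return probability estimates if you request them when the model is constructed; below is a minimal, illustrative sketch on a small synthetic dataset (not the wine data used later in this lab):

```python
# Illustrative sketch: probability estimates from SVC require probability=True,
# which adds an internal cross-validated calibration step and slows fitting
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=50, n_features=2,
                                     n_informative=2, n_redundant=0,
                                     random_state=0)
clf = SVC(kernel="linear", probability=True)
clf.fit(X_demo, y_demo)
print(clf.predict_proba(X_demo[:3]))  # one column of probabilities per class
```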
Usages#
Good for Image recognition
Effective in text mining, especially because of its efficiency with high-dimensional data
Good for Gene Classification
Other classification tasks
📓 Case Study : Wine Data#
Note
In Python, whenever you want to use a package, you need to import it.
In the smartphone analogy, this is like tapping on an app to start it. You have pre-installed apps and apps that you need to install yourself.
Remember, just as a phone has pre-installed apps, os is one of Python's pre-installed packages.
You won't get any message if a package is imported successfully; an error message is shown only if the import fails.
We are working with the wine dataset, which contains different categories of wine. https://archive.ics.uci.edu/ml/datasets/wine
Data Set Information:
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
The attributes are :
Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline
Data load and inspect#
## libraries
import pandas as pd
import numpy as np
# data viz
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
# plt.rcdefaults() # to get the default plot
# ml models
from sklearn.svm import SVC #Support vector classifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# for ignoring warnings
import warnings
warnings.simplefilter("ignore")
print("All modules loaded properly")
All modules loaded properly
# loading the datasets
df = pd.read_csv("https://raw.githubusercontent.com/vkoul/data/main/misc/wine.csv")
df.shape
(178, 14)
Inspect the data
# how to get the shape of the data
print(df.shape)
# How to get the column data types
print(df.info())
# how to check for few initial rows
display(df.head(5))
(178, 14)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Wine 178 non-null int64
1 Alcohol 178 non-null float64
2 Malic.acid 178 non-null float64
3 Ash 178 non-null float64
4 Acl 178 non-null float64
5 Mg 178 non-null int64
6 Phenols 178 non-null float64
7 Flavanoids 178 non-null float64
8 Nonflavanoid.phenols 178 non-null float64
9 Proanth 178 non-null float64
10 Color.int 178 non-null float64
11 Hue 178 non-null float64
12 OD 178 non-null float64
13 Proline 178 non-null int64
dtypes: float64(11), int64(3)
memory usage: 19.6 KB
None
|   | Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|------|---------|------------|-----|-----|----|---------|------------|----------------------|---------|-----------|-----|----|---------|
| 0 | 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 0 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 2 | 0 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 0 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 4 | 0 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
Understanding the Target Variable
We are trying to predict the type of wine based on chemical properties
df['Wine'].value_counts()
Wine
1 71
0 59
2 48
Name: count, dtype: int64
df['Wine'].value_counts().plot(kind = 'bar', backend = 'plotly')
Note
We will focus on a subset of columns for prediction, to make it easier to visualize the results later
Select only three columns: "Wine", "Alcohol" and "Malic.acid"
# Select only the three columns required for our analysis
# Wine, Alcohol, Malic.acid
selection = ["Wine", "Alcohol", "Malic.acid"]
df = df[selection]
df.head()
|   | Wine | Alcohol | Malic.acid |
|---|------|---------|------------|
| 0 | 0 | 14.23 | 1.71 |
| 1 | 0 | 13.20 | 1.78 |
| 2 | 0 | 13.16 | 2.36 |
| 3 | 0 | 14.37 | 1.95 |
| 4 | 0 | 13.24 | 2.59 |
Data Viz#
Plot the appropriate chart
# name the axis, titles
plt.xlabel("Alcohol")
plt.ylabel("Acid")
plt.title("Alcohol vs Acid content of different wines")
# plot the chart between Alcohol and Malic acid, with color given by Wine
sns.scatterplot(x="Alcohol", y="Malic.acid", data=df, s=20, hue="Wine", palette="husl", size='Wine');
Data Modeling#
Note
Function to generate classification plots
❗❗ You don’t need to understand it at this stage. This is used to generate plots at the end.❗❗
import numpy as np
import matplotlib.pyplot as plt
# Source: Mike Gelbart
def make_meshgrid(x, y, h=.02, lims=None):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    if lims is None:
        x_min, x_max = x.min() - 1, x.max() + 1
        y_min, y_max = y.min() - 1, y.max() + 1
    else:
        x_min, x_max, y_min, y_max = lims
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy
def plot_contours(ax, clf, xx, yy, proba=False, transformation=None, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    X = np.c_[xx.ravel(), yy.ravel()]
    if transformation is not None:
        X = transformation(X)
    # xx = np.reshape(X[:,0], xx.shape)
    # yy = np.reshape(X[:,1], yy.shape)
    if proba:
        if hasattr(clf, 'predict_proba'):
            Z = clf.predict_proba(X)[:,-1]
            Z = Z.reshape(xx.shape)
            out = ax.imshow(Z, extent=(np.min(xx), np.max(xx), np.min(yy), np.max(yy)), origin='lower', vmin=0, vmax=1, **params)
            ax.contour(xx, yy, Z, levels=[0.5])
        else:
            Z = clf.decision_function(X)
            Z = Z.reshape(xx.shape)
            out = ax.contourf(xx, yy, Z, **params)
    else:
        Z = clf.predict(X)
        Z = Z.reshape(xx.shape)
        out = ax.contourf(xx, yy, Z, **params)
    return out
def plot_classifier(X, y, clf, ax=None, ticks=False, proba=False, lims=None): # assumes classifier "clf" is already fit
    X0, X1 = X.iloc[:,0], X.iloc[:,1]
    xx, yy = make_meshgrid(X0, X1, lims=lims)

    if ax is None:
        plt.figure()
        ax = plt.gca()
        show = True
    else:
        show = False

    # can abstract some of this into a higher-level function for learners to call
    cs = plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8, proba=proba)
    if proba:
        cbar = plt.colorbar(cs)
        cbar.ax.set_ylabel('probability of red $\Delta$ class', fontsize=20, rotation=270, labelpad=30)
        cbar.ax.tick_params(labelsize=14)
    #ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=30, edgecolors='k', linewidth=1)
    labels = np.unique(y)
    if len(labels) == 2:
        ax.scatter(X0[y==labels[0]], X1[y==labels[0]], cmap=plt.cm.coolwarm, s=60, c='b', marker='o', edgecolors='k')
        ax.scatter(X0[y==labels[1]], X1[y==labels[1]], cmap=plt.cm.coolwarm, s=60, c='r', marker='^', edgecolors='k')
    else:
        ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=50, edgecolors='k', linewidth=1)
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    # ax.set_xlabel(data.feature_names[0])
    # ax.set_ylabel(data.feature_names[1])
    if ticks:
        ax.set_xticks(())
        ax.set_yticks(())
    # ax.set_title(title)
    if show:
        plt.show()
    else:
        return ax
print("Code successfully ran")
Code successfully ran
Create the Feature and Target Variable#
# Create two data-frames.
# X with only the features
# y with only the target values
# Drop wine column for X and retain only Wine for y
x = df.drop(columns = "Wine")
y = df["Wine"]
# check the shape of data
print(x.shape)
print(y.shape)
# check the type of data
print(type(x))
print(type(y))
(178, 2)
(178,)
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
Standard Scaling#
Since the SVM classifier relies on distance metrics, we need to standardize the data before using it
Read: Why Feature Scaling in SVM?
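Concretely, standard scaling replaces every value of a feature by its z-score,

$$
z = \frac{x - \mu}{\sigma}
$$

where $\mu$ and $\sigma$ are the mean and standard deviation of that feature column; this is what StandardScaler computes below.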
from sklearn.preprocessing import StandardScaler
# initiate
scaler = StandardScaler()
# fit and transform
x_scaled = scaler.fit_transform(x)
Converting the NumPy array into a pandas DataFrame for ease of use
x_scaled = pd.DataFrame(x_scaled, columns = x.columns)
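As a quick, illustrative sanity check, each scaled column should now have a mean close to 0 and a standard deviation close to 1:

```python
# Illustrative sanity check on the scaled features
print(x_scaled.mean().round(2))  # expected: approximately 0 for every column
print(x_scaled.std().round(2))   # expected: approximately 1 for every column
```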
Visually inspect the impact of Scaling
Density plot of UNscaled data
x.plot(kind = 'kde', subplots = True);
Density plot of Scaled data
x_scaled.plot(kind = 'kde', subplots = True);
Building a SVM Classifier#
# call the model
svm = SVC(kernel = "linear")
# fit the model
svm.fit(x_scaled,y)
# plot the classification done by model
plot_classifier(x_scaled, y, svm) # the optional lims argument would set the axis limits
Highlighting the use of support vectors#
# get the dimensions of the data
print("Number of original examples:", len(x))
print("Number of support vectors:", len(svm.support_))
# Extract only the support vectors from data
x_small = x_scaled.iloc[svm.support_]
y_small = y.iloc[svm.support_]
# Train a new SVM using only the support vectors
svm_small = SVC(kernel = "linear")
svm_small.fit(x_small, y_small)
plot_classifier(x_small, y_small, svm_small)
Number of original examples: 178
Number of support vectors: 82
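The fitted SVC also exposes these points directly; a small, illustrative way to inspect them on the linear-kernel model fitted above:

```python
# Illustrative: support-vector details stored on the fitted linear SVC
print(svm.n_support_)            # number of support vectors per class
print(svm.support_vectors_[:3])  # a few support vectors (scaled feature values)
```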
Note
Changing the Kernels
The SVM algorithm offers a choice of kernel functions for performing its processing. Mapping data into a higher-dimensional space is called kernelling, and the mathematical function used for the transformation is known as the kernel function. It can be of different types, such as:
1. Linear
2. Polynomial
3. Radial basis function (RBF)
4. Sigmoid
Each of these functions has its own characteristics, pros and cons, and equation, but since there is no easy way of knowing which function performs best on a given dataset, we usually try different functions in turn and compare the results.
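As a complementary, purely illustrative way to compare the kernels numerically (assuming the x_scaled and y objects created earlier), cross-validated accuracy can be computed for each one:

```python
# Illustrative sketch: compare kernels by 5-fold cross-validated accuracy
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

for k in ["linear", "poly", "rbf", "sigmoid"]:
    scores = cross_val_score(SVC(kernel=k), x_scaled, y, cv=5)
    print(f"{k:>8}: mean CV accuracy = {scores.mean():.3f}")
```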
# making a list of the different kernel functions
kernel_states = ["linear", "poly", "rbf", "sigmoid"]
# applying a loop to apply these kernel states
for k in kernel_states:
    print("The kernel applied is:", k)
    svm = SVC(kernel = k)
    svm.fit(x_scaled,y)
    plot_classifier(x_scaled, y, svm)
The kernel applied is: linear
The kernel applied is: poly
The kernel applied is: rbf
The kernel applied is: sigmoid
Comparison of SVM with other Classification Models#
Decision Tree Classifier#
Read more about Decision Tree Classifier
# call the model
dt = DecisionTreeClassifier()
# fit the model
dt.fit(x_scaled,y)
# plot the classification
plot_classifier(x_scaled, y, dt)
Random Forest Classifier#
Read more about Random Forest here
rf = RandomForestClassifier()
rf.fit(x_scaled,y)
plot_classifier(x_scaled, y, rf)
Logistic Regression Classifier#
Read more about Logistic Classifier here
# calling the object
lg = LogisticRegression()
# fitting the model
lg.fit(x_scaled,y)
# data viz the classification
plot_classifier(x_scaled, y, lg)
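To put rough numbers next to these plots, here is a purely illustrative comparison of training-set accuracy for the models fitted above. Note that svm currently holds the sigmoid-kernel model from the kernel loop, and that training accuracy flatters flexible models such as the tree-based ones, so a train/test split would give a fairer comparison:

```python
# Illustrative only: training-set accuracy of the already-fitted models
models = {"SVM (sigmoid)": svm,          # last model fitted in the kernel loop
          "Decision Tree": dt,
          "Random Forest": rf,
          "Logistic Regression": lg}

for name, model in models.items():
    print(f"{name:<20} training accuracy: {model.score(x_scaled, y):.3f}")
```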
📚 Reference material#
Kernels
Data Viz
Analysis
Python Data Science Handbook - a very useful book if you want to learn how to do data science using Python. Covers data manipulation, data visualisation and machine learning. Freely available