Univariate, Bivariate and Multivariate analysis using Python
These analyses are the fundamental steps of the Exploratory Data Analysis (EDA) we perform in data science. They point us toward the Machine Learning technique we are going to apply later in the process.
In Univariate Analysis, we choose a single feature from the data and try to determine the output (target) value from it alone, i.e., one feature/variable at a time.
Since we take only one feature and classify its values with respect to the output, we plot all the feature values on the X-axis. Nothing is plotted on the Y-axis: every point gets a Y-value of zero, so all the points lie along a single horizontal line.
Sometimes, in a Univariate Analysis, points from different classes overlap, which makes them difficult to classify. Therefore, we go for Bivariate or Multivariate Analysis.
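The overlap seen in a 1D plot can also be checked numerically. Below is a minimal sketch, assuming a DataFrame laid out like Iris.csv with a `Species` column; the helper name `overlapping_species` is our own, not part of any library. It takes each species' [min, max] range of one feature and reports the pairs of species whose ranges intersect. For such a pair, no single threshold on that feature can separate the two species perfectly.

```python
from itertools import combinations

import pandas as pd


def overlapping_species(df, feature, label="Species"):
    """Return the species pairs whose [min, max] ranges of `feature` intersect.

    Hypothetical helper for illustration. If two ranges intersect, no single
    threshold on `feature` separates that pair of species perfectly.
    """
    # One row per species holding the smallest and largest value of the feature
    ranges = df.groupby(label)[feature].agg(["min", "max"])
    pairs = []
    for a, b in combinations(ranges.index, 2):
        lo_a, hi_a = ranges.loc[a, "min"], ranges.loc[a, "max"]
        lo_b, hi_b = ranges.loc[b, "min"], ranges.loc[b, "max"]
        # Two closed intervals intersect when each one starts before the other ends
        if lo_a <= hi_b and lo_b <= hi_a:
            pairs.append((a, b))
    return pairs
```

On the article's `df`, calling `overlapping_species(df, 'PetalLengthCm')` gives a numeric counterpart to the 1D petal-length plot.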
In a Bivariate Analysis, we analyze two features instead of one to determine the classification of the output we are looking for. In many cases, however, data points from different classes still overlap (and outliers can make this worse), causing the same difficulty of classification. This is where the hero enters, i.e., Multivariate Analysis.
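To put a rough number on the overlap in two dimensions, here is a sketch using a hypothetical helper, `points_in_shared_box`, written only for this illustration. It builds each species' bounding box from its [min, max] range on two features and counts the points of both species that fall inside the intersection of the two boxes, i.e. the points lying within both species' ranges on both features. This is a coarse, axis-aligned measure: a count of zero means the rule "inside species a's box, otherwise species b" separates the pair perfectly, while a non-zero count only flags the region where the two species may be confused.

```python
import pandas as pd


def points_in_shared_box(df, x, y, a, b, label="Species"):
    """Count rows of species `a` and `b` lying inside both species' 2D boxes.

    Hypothetical helper for illustration. A species' box spans its observed
    [min, max] range on features `x` and `y`.
    """
    pair = df[df[label].isin([a, b])]
    lows = pair.groupby(label)[[x, y]].min()
    highs = pair.groupby(label)[[x, y]].max()
    # The intersection starts at the larger minimum and ends at the smaller maximum
    lo = lows.max()   # Series indexed by feature name
    hi = highs.min()  # Series indexed by feature name
    # between() is inclusive on both ends; an empty intersection (lo > hi) matches nothing
    inside = pair[x].between(lo[x], hi[x]) & pair[y].between(lo[y], hi[y])
    return int(inside.sum())
```

For example, `points_in_shared_box(df, 'SepalLengthCm', 'SepalWidthCm', 'Iris-versicolor', 'Iris-virginica')` corresponds to the sepal scatter plot in the code below.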
Multivariate Analysis deals with such complex sets of data with more than two features or variables.
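One way to see why adding features can help is to measure class overlap directly in the chosen feature space. The sketch below uses a technique the article itself does not use: the leave-one-out error of a 1-nearest-neighbour rule, i.e. the fraction of rows whose closest other row (Euclidean distance on unscaled features) belongs to a different species. The function name `nn_disagreement` is our own. Comparing one feature, two features and all four gives a rough, data-driven view of how much the classes overlap as we move from univariate to multivariate analysis.

```python
import numpy as np
import pandas as pd


def nn_disagreement(df, features, label="Species"):
    """Fraction of rows whose nearest other row has a different label.

    Hypothetical helper: leave-one-out error of a 1-nearest-neighbour rule
    using only `features` (Euclidean distance, no feature scaling).
    """
    X = df[features].to_numpy(dtype=float)
    y = df[label].to_numpy()
    # Pairwise squared Euclidean distances between all rows
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # A row must not count as its own nearest neighbour
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    return float((y[nearest] != y).mean())
```

For example, compare `nn_disagreement(df, ['PetalLengthCm'])` with `nn_disagreement(df, ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'])`. The full pairwise distance matrix is fine for the 150 Iris rows but grows quadratically with the number of rows.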
Let’s get to know more about Univariate, Bivariate and Multivariate Analysis through the famous Iris dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("/content/Iris.csv")
df.head()
# Univariate Analysis
df_setosa=df.loc[df['Species']=='Iris-setosa']
df_setosa
df_virginica=df.loc[df['Species']=='Iris-virginica']
df_versicolor=df.loc[df['Species']=='Iris-versicolor']
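# Line plot: each Series is drawn against its row index (X-axis), with PetalLengthCm on the Y-axis
plt.plot(df_setosa['PetalLengthCm'],label='Iris-setosa')
plt.plot(df_virginica['PetalLengthCm'],label='Iris-virginica')
plt.plot(df_versicolor['PetalLengthCm'],label='Iris-versicolor')
plt.ylabel('PetalLengthCm')
plt.legend()
plt.show()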
# 1D plot: every PetalLengthCm value on the X-axis, with Y fixed at zero for all points
plt.plot(df_setosa['PetalLengthCm'],np.zeros_like(df_setosa['PetalLengthCm']),'o',label='Iris-setosa')
plt.plot(df_virginica['PetalLengthCm'],np.zeros_like(df_virginica['PetalLengthCm']),'o',label='Iris-virginica')
plt.plot(df_versicolor['PetalLengthCm'],np.zeros_like(df_versicolor['PetalLengthCm']),'o',label='Iris-versicolor')
plt.xlabel('PetalLengthCm')
plt.legend()
plt.show()
# Bivariate Analysis
# seaborn 0.9 renamed FacetGrid's `size` parameter to `height`
sns.FacetGrid(df,hue='Species',height=5).map(plt.scatter,"SepalLengthCm","SepalWidthCm").add_legend()
# This is how the points look when plotted by "SepalLengthCm" and "SepalWidthCm" and colored by species.
sns.FacetGrid(df,hue='Species',height=5).map(plt.scatter,"PetalLengthCm","SepalWidthCm").add_legend()
# This is how the points look when plotted by "PetalLengthCm" and "SepalWidthCm" and colored by species.
# Here, some Iris-versicolor and Iris-virginica points still overlap.
# Multivariate Analysis
# The "Id" column of this CSV is only a row counter, so it is dropped before plotting (errors='ignore' skips it if absent)
sns.pairplot(df.drop(columns='Id',errors='ignore'),hue="Species",height=3)
# This gives a grid of scatter plots for every pair of numeric features (each feature's distribution on the diagonal), colored by output class.
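`pairplot` compares features two at a time: for n numeric columns it draws an n × n grid, in which every unordered pair of distinct features appears twice (once in each triangle, with the axes swapped) and each feature's own distribution sits on the diagonal. A small standard-library sketch, using a hypothetical helper `pairplot_layout`, lists the unique pairs and the panel count for the four Iris measurement columns.

```python
from itertools import combinations


def pairplot_layout(features):
    """Return the unique feature pairs and the panel count of a full pairplot grid."""
    pairs = list(combinations(features, 2))  # each unordered pair exactly once
    panels = len(features) ** 2  # n x n grid, diagonal included
    return pairs, panels


iris_features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
pairs, panels = pairplot_layout(iris_features)
for x, y in pairs:
    print(x, "vs", y)
print(panels, "panels")
```

With four features this gives six unique pairs spread over a 4 × 4 grid of 16 panels.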
This was just an overview of Univariate, Bivariate and Multivariate Analysis, which are among the most fundamental analyses we perform in Exploratory Data Analysis.
Hope you liked reading the article! Thank you for your patience.