Univariate, Bivariate and Multivariate analysis using Python

These analyses are the fundamental steps of Exploratory Data Analysis (EDA) that we perform in the data science world. They point us toward the Machine Learning techniques we are likely to apply later in the process.

In Univariate Analysis, we choose a single feature from the data and examine how its values relate to the output (target) value, i.e., we look at one feature/variable at a time.
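
As a minimal sketch of what looking at one feature at a time means in pandas, the snippet below summarises PetalLengthCm separately for each species. The nine rows are a small hand-typed stand-in with Iris-like values (a hypothetical sample, not the full dataset); the column names follow the Iris.csv used later, but the numbers are only illustrative.

```python
import pandas as pd

# Hypothetical hand-typed sample with Iris-like values (stand-in for the full Iris.csv).
df = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.0, 6.0, 5.1, 5.9],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Univariate analysis: one feature, summarised separately for each output class.
summary = df.groupby("Species")["PetalLengthCm"].agg(["min", "mean", "max"])
print(summary)
```

If the per-class ranges of a single feature do not overlap, that feature alone is enough to tell those classes apart.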

Since we take only one feature and classify its values with respect to the output, we plot all the feature values on the X-axis and leave the Y-axis unused: every point is drawn at Y = 0, so all the points lie along a single horizontal line.
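
The 1-D plot described above boils down to its coordinates: the feature values become x and every y is zero. The sketch below, on a small hypothetical Iris-like sample, builds those coordinates with `np.zeros_like` and checks a single cut point on the x-axis. The 2.5 cm threshold is an assumption picked by eye for this sample, not a rule taken from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical hand-typed sample with Iris-like values (stand-in for the full Iris.csv).
df = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.0, 6.0, 5.1, 5.9],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Coordinates of the 1-D plot: x is the feature value, y is zero for every point.
x = df["PetalLengthCm"].to_numpy()
y = np.zeros_like(x)
print(np.column_stack((x, y)))

# With all points on one line, separating classes means choosing cut points on x.
# Assumed threshold for this sample: petal length below 2.5 cm -> Iris-setosa.
predicted_setosa = x < 2.5
actual_setosa = (df["Species"] == "Iris-setosa").to_numpy()
print("cut at 2.5 cm matches setosa labels:", bool((predicted_setosa == actual_setosa).all()))
```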

Sometimes, in a Univariate Analysis, points from different classes overlap, which makes them difficult to classify. In that case, we move on to Bivariate or Multivariate Analysis.
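
One way to make this overlap concrete is to compare each class's range on a single feature. In the sketch below (hypothetical Iris-like SepalLengthCm values for two species), the two ranges overlap whenever the larger of the two minimums is no greater than the smaller of the two maximums; points that fall inside that interval are the ones this single feature struggles to classify.

```python
import pandas as pd

# Hypothetical hand-typed sample with Iris-like values for two species.
df = pd.DataFrame({
    "SepalLengthCm": [7.0, 6.4, 5.5, 6.3, 5.8, 7.1],
    "Species": ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Range of the feature within each class.
ranges = df.groupby("Species")["SepalLengthCm"].agg(["min", "max"])
lo = ranges["min"].max()  # larger of the two minimums
hi = ranges["max"].min()  # smaller of the two maximums

# The class ranges overlap on this single axis when lo <= hi.
overlaps = lo <= hi
in_overlap = df[df["SepalLengthCm"].between(lo, hi)]  # inclusive on both ends
print(f"overlap interval: [{lo}, {hi}]" if overlaps else "no overlap")
print(in_overlap)
```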

In a Bivariate Analysis, we analyze two features together instead of one and use both to determine the output class. Even so, in many cases we still come across outliers and overlapping data points, which cause the same difficulty in classification. That is where the hero enters, i.e., Multivariate Analysis.
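
To see what using two features together looks like numerically, the sketch below assigns each point to the species whose centroid is nearest in the SepalLengthCm/SepalWidthCm plane. Nearest-centroid classification is an illustrative choice, not a method from the article, and the nine rows are a hypothetical Iris-like sample. On these values all setosa points are assigned correctly, while some versicolor and virginica points end up closer to the other species' centre, which is the kind of bivariate overlap described above.

```python
import numpy as np
import pandas as pd

# Hypothetical hand-typed sample with Iris-like values (stand-in for the full Iris.csv).
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 5.5, 6.3, 5.8, 7.1],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 2.3, 3.3, 2.7, 3.0],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})
features = ["SepalLengthCm", "SepalWidthCm"]  # two features -> bivariate

# Centre of each species' cloud of points in the 2-D scatter plot.
centroids = df.groupby("Species")[features].mean()

# Euclidean distance from every point to every centroid: shape (n_points, n_species).
X = df[features].to_numpy()
C = centroids.to_numpy()
dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
df["Predicted"] = centroids.index[dist.argmin(axis=1)]

# Points whose nearest centroid belongs to another species.
confused = df[df["Predicted"] != df["Species"]]
print(confused[["Species", "Predicted"]])
```

Because the centroids are computed from the same nine points they classify, this only illustrates the idea; it says nothing about accuracy on new data.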

Multivariate Analysis handles these more complex cases by examining more than two features or variables at once.
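
A minimal multivariate sketch: assign each point to its nearest species centroid using all four Iris measurements at once, so the distance is computed in four dimensions. The nearest-centroid rule is an illustrative choice rather than the article's method, and the nine rows are a hypothetical Iris-like sample. On these particular values every point lands nearest its own species' centroid.

```python
import numpy as np
import pandas as pd

# Hypothetical hand-typed sample with Iris-like values (stand-in for the full Iris.csv).
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 7.0, 6.4, 5.5, 6.3, 5.8, 7.1],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.2, 2.3, 3.3, 2.7, 3.0],
    "PetalLengthCm": [1.4, 1.4, 1.3, 4.7, 4.5, 4.0, 6.0, 5.1, 5.9],
    "PetalWidthCm": [0.2, 0.2, 0.2, 1.4, 1.5, 1.3, 2.5, 1.9, 2.1],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})
features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Mean of each species in the 4-D feature space.
centroids = df.groupby("Species")[features].mean()

# Distance from every point to every centroid, then pick the closest species.
X = df[features].to_numpy()
C = centroids.to_numpy()
dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
df["Predicted"] = centroids.index[dist.argmin(axis=1)]

confused = df[df["Predicted"] != df["Species"]]
accuracy = (df["Predicted"] == df["Species"]).mean()
print(f"accuracy on the same 9 points: {accuracy:.2f}")
```

As with any resubstitution check on a handful of points, this shows the mechanics, not how well the approach generalizes.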

Let’s get to know Univariate, Bivariate and Multivariate Analysis better through the famous Iris dataset.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Iris dataset (path from the original Colab notebook)
df = pd.read_csv("/content/Iris.csv")
print(df.head())

# Univariate Analysis
# Split the rows by species
df_setosa = df.loc[df['Species'] == 'Iris-setosa']
df_virginica = df.loc[df['Species'] == 'Iris-virginica']
df_versicolor = df.loc[df['Species'] == 'Iris-versicolor']

# Petal length of each species plotted against the row index
plt.plot(df_setosa['PetalLengthCm'])
plt.plot(df_virginica['PetalLengthCm'])
plt.plot(df_versicolor['PetalLengthCm'])
plt.show()

# Petal length on the x-axis with every point at y = 0 (a 1-D plot)
plt.plot(df_setosa['PetalLengthCm'], np.zeros_like(df_setosa['PetalLengthCm']), 'o', label='Iris-setosa')
plt.plot(df_virginica['PetalLengthCm'], np.zeros_like(df_virginica['PetalLengthCm']), 'o', label='Iris-virginica')
plt.plot(df_versicolor['PetalLengthCm'], np.zeros_like(df_versicolor['PetalLengthCm']), 'o', label='Iris-versicolor')
plt.xlabel('PetalLengthCm')
plt.legend()
plt.show()

# Bivariate Analysis
# Note: recent seaborn versions use `height` instead of the old `size` parameter.
# Points classified by species on the basis of "SepalLengthCm" and "SepalWidthCm"
sns.FacetGrid(df, hue='Species', height=5).map(plt.scatter, "SepalLengthCm", "SepalWidthCm").add_legend()
plt.show()

# Points classified by species on the basis of "PetalLengthCm" and "SepalWidthCm"
# In these plots, some versicolor and virginica points still overlap.
sns.FacetGrid(df, hue='Species', height=5).map(plt.scatter, "PetalLengthCm", "SepalWidthCm").add_legend()
plt.show()

# Multivariate Analysis
# Pairwise plots comparing every pair of the four measurements, coloured by species.
# Listing `vars` keeps non-measurement columns (e.g. an Id column, if present) out of the grid.
sns.pairplot(df, hue="Species", vars=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"], height=3)
plt.show()
```

This was just an overview of Univariate, Bivariate and Multivariate Analysis, which are among the most fundamental analyses we perform in Exploratory Data Analysis.

Hope you liked reading the article! Thank you for your patience.

About the author: Mukut Chakraborty

Data-Science enthusiast, pursuing a Master's in Computer Science (specialisation in Data Analytics) at IIITM-Kerala; sportsperson.
