EDA with Python Flashcards
How do you group a dataframe by a column or combination of columns?
df.groupby('Column'), or df.groupby(['Column1', 'Column2']) for a combination of columns
For the IRIS dataset, how would you group by Species then return the count of each species?
iris_data.groupby('Species').size()
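A minimal sketch of both groupby cards, assuming a tiny made-up stand-in for the iris data (the rows and the LongPetal helper column are illustrative only and do not come from the real dataset):

```python
import pandas as pd

# Made-up stand-in rows, not the real iris measurements.
iris_data = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica", "Iris-virginica"],
    "PetalLengthCm": [1.4, 1.5, 4.5, 5.8, 6.0, 5.1],
})

# Group by one column and count the rows in each group.
species_counts = iris_data.groupby("Species").size()

# Group by a combination of columns by passing a list.
# "LongPetal" is a hypothetical helper column added for this example.
iris_data["LongPetal"] = iris_data["PetalLengthCm"] > 5.0
pair_counts = iris_data.groupby(["Species", "LongPetal"]).size()

print(species_counts)
print(pair_counts)
```

size() counts rows per group, so it also counts rows with missing values; count() would report non-null values per column instead.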
How do you import seaborn?
import seaborn as sns
How would you display a count plot of the Species column?
sns.countplot(x = 'Species', data = iris_data)
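A small runnable sketch of the count plot, assuming a made-up Species column (the row counts are illustrative) and a non-interactive matplotlib backend so it runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data: only the Species column is needed for a count plot.
iris_data = pd.DataFrame({
    "Species": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 4
})

fig, ax = plt.subplots()
sns.countplot(x="Species", data=iris_data, ax=ax)

# Each bar is a matplotlib patch whose height is the row count for that species.
bar_heights = [patch.get_height() for patch in ax.patches]
```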
How do you display a histogram of a dataset?
df.hist()
plt.show()
What does the distplot function show in seaborn and how do you use it?
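It shows a histogram of the selected column with a smoothed density (KDE) curve drawn over it. Useful for visualizing univariate distributions.
sns.distplot(a=data['column'], rug = True)
Other parameters (such as bins, hist, and kde) are optional. Note that distplot is deprecated as of seaborn 0.11 in favor of histplot and displot.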
What does kdeplot stand for?
Kernel density estimate plot
How do you use kdeplot in seaborn?
sns.kdeplot(data = df['Column'])
With optional parameters
What is a FacetGrid?
A seaborn class that lets you draw the same kind of plot on multiple subsets of your dataset, split across a grid of panels or by color (hue).
Type the example shown by Pamela in her notebook of how to use FacetGrid and kdeplot to visualize the kernel density estimate plots of the three Iris species, using the attributes hue = "Species" and size = 6
# seaborn 0.9+ renamed size to height; on older versions pass size = 6 instead
sns.FacetGrid(iris_data, hue = "Species", height = 6) \
.map(sns.kdeplot, "PetalLengthCm") \
.add_legend()
How do you create four subplots of the distplot function visualization?
f, axes = plt.subplots(2,2, figsize = (7,7), sharex = True)
sns.distplot(iris_data['<column>'], color = '<color>', ax = axes[0,0])
Repeat for axes[0,1], axes[1,0], and axes[1,1], each with its own column and color.
How do you create a scatterplot matrix?
from pandas.plotting import scatter_matrix
scatter_matrix(iris_data)
plt.show()
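A runnable sketch of the scatterplot matrix, assuming four numeric columns with made-up values (the column names follow the common iris CSV layout and are an assumption here):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Assumed column names and synthetic values, not the real iris data.
rng = np.random.default_rng(0)
iris_data = pd.DataFrame({
    "SepalLengthCm": rng.normal(5.8, 0.8, 60),
    "SepalWidthCm": rng.normal(3.0, 0.4, 60),
    "PetalLengthCm": rng.normal(3.8, 1.7, 60),
    "PetalWidthCm": rng.normal(1.2, 0.7, 60),
})

# Returns an n x n array of Axes: scatter plots off the diagonal,
# histograms of each column on the diagonal.
axes = scatter_matrix(iris_data, figsize=(7, 7), diagonal="hist")
```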
How do you visualize four different boxplots of the iris_data dataframe with a 2 x 2 layout, without sharing x or y axes between the plots?
iris_data.plot(kind = 'box', subplots = True, layout = (2,2), sharex = False, sharey = False)
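A sketch of the 2 x 2 boxplot layout, again assuming four numeric columns with made-up values (column names are an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed column names and synthetic values, not the real iris data.
rng = np.random.default_rng(0)
iris_data = pd.DataFrame({
    "SepalLengthCm": rng.normal(5.8, 0.8, 60),
    "SepalWidthCm": rng.normal(3.0, 0.4, 60),
    "PetalLengthCm": rng.normal(3.8, 1.7, 60),
    "PetalWidthCm": rng.normal(1.2, 0.7, 60),
})

# One boxplot per column; each panel keeps its own x and y scale.
iris_data.plot(kind="box", subplots=True, layout=(2, 2),
               sharex=False, sharey=False, figsize=(7, 7))
fig = plt.gcf()  # the figure that the plot call just created
```

Turning off sharey matters here because the columns are on different scales; a shared axis would squash the narrower boxes.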
Import stats from scipy, then display a Q-Q plot for Species == Iris-setosa with a title of “Setosa Sepal Width Q-Q Plot”
from scipy import stats
import matplotlib.pyplot as plt
iris_setosa = iris_data.query('Species == "Iris-setosa"')
stats.probplot(iris_setosa['SepalWidthCm'], dist = "norm", plot = plt)
plt.title("Setosa Sepal Width Q-Q Plot")
plt.show()
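A self-contained sketch of the Q-Q plot card, assuming synthetic sepal widths for two species (the values are made up for illustration). probplot also returns the plotted quantiles and a least-squares fit, which the sketch unpacks:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (assumption), not real iris measurements.
rng = np.random.default_rng(0)
iris_data = pd.DataFrame({
    "Species": np.repeat(["Iris-setosa", "Iris-versicolor"], 50),
    "SepalWidthCm": np.concatenate(
        [rng.normal(3.4, 0.4, 50), rng.normal(2.8, 0.3, 50)]
    ),
})

iris_setosa = iris_data.query('Species == "Iris-setosa"')

# osm: theoretical normal quantiles, osr: ordered sample values,
# r: correlation of the fitted line (close to 1 suggests normality).
(osm, osr), (slope, intercept, r) = stats.probplot(
    iris_setosa["SepalWidthCm"], dist="norm", plot=plt
)
plt.title("Setosa Sepal Width Q-Q Plot")
```

Points that hug the fitted line suggest the sample is close to normal; systematic curvature at the ends points to heavy or light tails.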