EDA with Python Flashcards

1
Q

How do you group a dataframe by a column or combination of columns?

A

df.groupby('Column')
For a combination of columns, pass a list: df.groupby(['Column1', 'Column2'])

2
Q

For the IRIS dataset, how would you group by Species then return the count of each species?

A

iris_data.groupby('Species').size()
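The two grouping cards above can be sketched together as follows. This is a minimal example that assumes a tiny hand-made stand-in for the Iris table (the rows and the `IsLong` helper column are made up for illustration, not taken from the real dataset):

```python
import pandas as pd

# Tiny stand-in for the Iris table; values are made up for illustration.
iris_data = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica", "Iris-virginica"],
    "PetalLengthCm": [1.4, 1.5, 4.5, 5.9, 5.1, 5.6],
})

# Number of rows per species (a Series indexed by Species).
species_counts = iris_data.groupby("Species").size()
print(species_counts)

# Grouping by a combination of columns takes a list of column names.
# IsLong is a hypothetical helper column added only for this sketch.
iris_data["IsLong"] = iris_data["PetalLengthCm"] > 5.0
combo_counts = iris_data.groupby(["Species", "IsLong"]).size()
print(combo_counts)
```

Note that `.size()` counts rows per group, including rows with missing values, whereas `.count()` counts non-null values per column.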

3
Q

How do you import seaborn?

A

import seaborn as sns

4
Q

How would you display a count plot of the Species column?

A

sns.countplot(x='Species', data=iris_data)

5
Q

How do you display a histogram of a dataset?

A

df.hist()

plt.show()

6
Q

What does the distplot function show in seaborn and how do you use it?

A

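It shows a histogram of the selected column with a smoothed kernel density estimate curve overlaid on it. Useful for visualizing univariate distributions.

sns.distplot(a=data['column'], rug=True)
With optional parameters such as rug. Note that distplot is deprecated as of seaborn 0.11; displot and histplot are its replacements, and they take data= and x= rather than a=.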

7
Q

What does kdeplot stand for?

A

Kernel density estimate plot

8
Q

How do you use kdeplot in seaborn?

A

sns.kdeplot(data=df['Column'])

With optional parameters

9
Q

What is a FacetGrid?

A

A seaborn class that draws the same kind of plot for multiple subsets of your dataset, with the subsets split across rows, columns, and/or hue.

10
Q

Type the example shown by Pamela in her notebook of how to use FacetGrid and kdeplot to visualize the kernel density estimate plots of the three different Iris species, using the hue = "Species" and size = 6 attributes

A

# In seaborn 0.9 and later, the size parameter is named height.
sns.FacetGrid(iris_data, hue="Species", height=6) \
    .map(sns.kdeplot, "PetalLengthCm") \
    .add_legend()

11
Q

How do you create four subplots of the distplot function visualization?

A

f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)

sns.distplot(iris_data['<column>'], color='<color>', ax=axes[0, 0])
Repeat for axes[0, 1], axes[1, 0], and axes[1, 1], each with its own column and color.

12
Q

How do you create a scatterplot matrix?

A

from pandas.plotting import scatter_matrix

scatter_matrix(iris_data)

plt.show()

13
Q

How do you visualize four different boxplots of the iris_data dataframe with a 2 x 2 layout, without sharing x or y axes between the plots?

A

iris_data.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)

14
Q

Import stats from scipy, then display a Q-Q plot of sepal width for Species == Iris-setosa with a title of "Setosa Sepal Width Q-Q Plot"

A

from scipy import stats
import matplotlib.pyplot as plt

iris_setosa = iris_data.query('Species == "Iris-setosa"')

stats.probplot(iris_setosa['SepalWidthCm'], dist="norm", plot=plt)
plt.title("Setosa Sepal Width Q-Q Plot")
plt.show()
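Besides drawing the plot, `stats.probplot` returns the plotted quantiles and a least-squares fit, which gives a quick numeric check of normality. A minimal sketch, assuming a synthetic sample in place of the setosa sepal widths:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic, normally distributed sample standing in for the real column.
rng = np.random.default_rng(2)
sample = rng.normal(loc=3.4, scale=0.4, size=50)

fig, ax = plt.subplots()
# osm: theoretical quantiles, osr: ordered sample values,
# r: correlation between the points and the fitted line.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm", plot=ax)
ax.set_title("Q-Q plot of a synthetic sample")

print(f"correlation of points with fitted line: r = {r:.3f}")
```

Points that hug the fitted line (r close to 1) suggest the sample is close to normal; systematic curvature at the ends suggests heavy or light tails.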
