Lesson 17 Exploratory analysis Flashcards

1
Q

Import the following CSV file from this location, specifying that the data is separated by the delimiter ;

r'C:\Users\User\Documents\CFG_DATA\data\winequality-red.csv'

A

import pandas as pd
df = pd.read_csv(r'C:\Users\User\Documents\CFG_DATA\data\winequality-red.csv', sep=';')
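To see what `sep=';'` changes without needing the file on disk, here is a minimal sketch that reads a small in-memory, semicolon-separated sample. The sample text and the names `sample`, `df_ok` and `df_bad` are made up for illustration and are not the real wine data.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same semicolon-separated layout
sample = io.StringIO(
    "fixed acidity;alcohol;quality\n"
    "7.4;9.4;5\n"
    "7.8;9.8;5\n"
)

# With the right delimiter, each field becomes its own column
df_ok = pd.read_csv(sample, sep=";")
print(df_ok.shape)  # (2, 3)

# Without sep, pandas assumes commas, so each whole line lands in one column
sample.seek(0)
df_bad = pd.read_csv(sample)
print(df_bad.shape)  # (2, 1)
```

A single-column result like `df_bad` is a quick sign that the delimiter is wrong.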

2
Q

Check the labels (names) of each column

A

df.columns.values
or
df.keys()
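Both answers give the column labels but return different types. A minimal sketch on a small made-up frame (the column values are illustrative only):

```python
import pandas as pd

# Made-up two-column frame for illustration
df = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 6]})

labels_array = df.columns.values  # NumPy array of the labels
labels_index = df.keys()          # pandas Index, equivalent to df.columns

print(labels_array)
print(list(labels_index))  # ['alcohol', 'quality']
```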

3
Q

What is the number of rows?
What is the number of columns?

A

df.shape[0]  # number of rows
df.shape[1]  # number of columns
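`df.shape` is a `(rows, columns)` tuple, so both numbers can be unpacked at once. A small sketch on made-up data:

```python
import pandas as pd

# Made-up three-row frame for illustration
df = pd.DataFrame({"alcohol": [9.4, 9.8, 10.0], "quality": [5, 5, 6]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)  # 3 2
print(len(df))  # 3, same as df.shape[0]
```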

4
Q

Display summary information for each column (data type, non-null count)

A

df.info()
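`df.info()` prints its summary and returns `None`. If you want the summary as text, its `buf` parameter accepts any writable buffer. A minimal sketch on a made-up frame with one missing value:

```python
import io

import pandas as pd

# Made-up frame; alcohol has one missing value
df = pd.DataFrame({"alcohol": [9.4, None, 10.0], "quality": [5, 5, 6]})

buf = io.StringIO()
df.info(buf=buf)  # write the summary to buf instead of printing it
summary = buf.getvalue()
print(summary)  # shows entry count, non-null counts and dtypes per column
```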

5
Q

Return the unique values from a column called quality.

A

df.quality.unique()
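A short sketch on made-up data showing `unique()` next to `nunique()`, and bracket access for column names with spaces (attribute access such as `df.quality` only works for names that are valid Python identifiers):

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({
    "quality": [5, 6, 5, 7, 6],
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
})

print(df.quality.unique())      # distinct values in order of first appearance
print(df["quality"].nunique())  # number of distinct values: 3

# A name with a space needs brackets instead of attribute access
print(df["fixed acidity"].unique())
```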

6
Q

Calculate the frequency of each unique value in the “quality” column of the DataFrame df (returning a Series with the unique values as the index and their respective counts as the values).

A

df.quality.value_counts()
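`value_counts()` sorts by count, most frequent first. A minimal sketch on made-up scores showing two common follow-ups: reordering by the score itself, and `normalize=True` for proportions instead of counts.

```python
import pandas as pd

# Made-up quality scores for illustration
df = pd.DataFrame({"quality": [5, 6, 5, 7, 6, 5]})

counts = df.quality.value_counts()  # most frequent value first
print(counts)

by_score = counts.sort_index()  # reorder by quality score (5, 6, 7)
shares = df.quality.value_counts(normalize=True)  # fractions that sum to 1
print(shares)
```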

7
Q

Check for missing values using a heatmap

A

import seaborn as sns
# cbar=False hides the colorbar; yticklabels=False hides the row labels
sns.heatmap(df.isnull(), cbar=False, yticklabels=False)
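The heatmap draws the boolean mask returned by `df.isnull()`. The same mask can be summed per column to get exact counts, which is a useful numeric check alongside the plot. A sketch on a made-up frame with two gaps:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in alcohol and one in pH
df = pd.DataFrame({
    "alcohol": [9.4, np.nan, 10.0],
    "pH": [3.51, 3.2, np.nan],
    "quality": [5, 5, 6],
})

mask = df.isnull()  # True where a value is missing (what the heatmap shows)
missing_per_column = mask.sum()  # True counts as 1
print(missing_per_column)
print(mask.values.any())  # True if anything is missing anywhere
```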

8
Q

Calculate the correlation between attributes (columns)

A

df.corr()
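`df.corr()` returns a square, symmetric matrix of pairwise Pearson correlations, with 1.0 on the diagonal. A minimal sketch on made-up values (not the real wine data), including the rank-based `method="spearman"` option:

```python
import pandas as pd

# Made-up sample values, not the real wine data
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.0],
    "density": [0.998, 0.997, 0.996, 0.995],
    "quality": [5, 5, 6, 7],
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# Rank-based alternative, less sensitive to outliers
spearman = df.corr(method="spearman")

# If the frame had text columns, pandas 2.0 and later raise an error;
# df.corr(numeric_only=True) restricts the calculation to numeric columns
```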

9
Q

Build a correlation heatmap

A

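import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6, 4))
sns.heatmap(df.corr(), annot=False)  # annot=False: don't write the values in the cells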

10
Q

Increase the size of the heatmap.

A

plt.figure(figsize=(16, 6))  # (width, height) in inches

11
Q

k = 12

A

Specifies the number of variables to include in the heatmap
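`k` is passed to `nlargest`, which keeps the k rows of the correlation matrix with the highest values in the `quality` column. Two consequences are worth knowing: `quality` always ranks first (it correlates 1.0 with itself), and the ranking uses signed values, so strongly negative correlations are dropped rather than kept. A sketch on made-up data with a smaller, hypothetical `k = 3`:

```python
import pandas as pd

# Made-up sample; volatile acidity falls as quality rises (negative correlation)
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.0],
    "sulphates": [0.56, 0.68, 0.65, 0.80],
    "volatile acidity": [0.70, 0.60, 0.40, 0.30],
    "quality": [5, 5, 6, 7],
})

k = 3  # hypothetical smaller k for this four-column example
top = df.corr().nlargest(k, "quality")["quality"]
print(top)  # quality first, then the most positively correlated columns
```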

12
Q

Why is it necessary to create this new heatmap of the correlation matrix? (Still to figure out.)

A

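Quality correlation matrix: it keeps only the k variables most positively correlated with quality, ordered from highest to lowest, so the relationships that matter for quality are easier to read.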

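import matplotlib.pyplot as plt
import seaborn as sns

# Increase the size of the heatmap.
plt.figure(figsize=(16, 6))

k = 12  # number of variables for heatmap
# names of the k columns most correlated with 'quality' (quality itself ranks first)
cols = df.corr().nlargest(k, 'quality')['quality'].index
cm = df[cols].corr()

sns.heatmap(cm, annot=True)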
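The selection steps above can be wrapped in a small function to see what `cm` contains. `top_k_corr` is a hypothetical helper name, and the data is made up. The result is a k × k matrix with the target first. When k equals the number of columns, nothing is dropped and the columns are only reordered by their correlation with the target.

```python
import pandas as pd


def top_k_corr(df: pd.DataFrame, target: str, k: int) -> pd.DataFrame:
    """Hypothetical helper: correlation matrix of the k columns most
    positively correlated with `target` (target itself included)."""
    cols = df.corr().nlargest(k, target)[target].index
    return df[cols].corr()


# Made-up sample values
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.0],
    "sulphates": [0.56, 0.68, 0.65, 0.80],
    "volatile acidity": [0.70, 0.60, 0.40, 0.30],
    "quality": [5, 5, 6, 7],
})

cm = top_k_corr(df, "quality", 3)
print(cm.round(2))  # 3 x 3, quality first

full = top_k_corr(df, "quality", df.shape[1])
print(list(full.columns))  # every column kept, just reordered
```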

13
Q

Create a boxplot

A

import matplotlib.pyplot as plt

plt.boxplot(df_happy_gdp['Happiness_score'])

# Set the title and labels
plt.title('Box Plot of Happiness Score')
plt.xlabel('Happiness Score')
plt.ylabel('Value')
plt.show()
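A box plot summarises quartiles. By matplotlib's default (`whis=1.5`), the whiskers reach the furthest points within 1.5 × IQR of the box, and anything beyond that is drawn as an outlier. The sketch below computes those numbers directly. The scores are hypothetical, not the real `df_happy_gdp` data, and pandas' default linear quantile interpolation is assumed to match the plot.

```python
import pandas as pd

# Hypothetical scores, not the real df_happy_gdp data
scores = pd.Series([4.2, 5.1, 5.5, 6.0, 6.3, 7.1, 7.8, 12.0], name="Happiness_score")

q1, median, q3 = scores.quantile([0.25, 0.5, 0.75])  # box edges and middle line
iqr = q3 - q1  # height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences appear as separate markers on the plot
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(q1, median, q3)
print(list(outliers))
```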
