Exploratory data analysis Flashcards

1
Q

Find the correlation between the following columns

A

df[['bore', 'stroke', 'compression-ratio', 'horsepower']].corr()
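As a minimal sketch of what this call returns: `corr()` computes pairwise Pearson correlations by default and gives back a square, symmetric table. The `toy` frame and its values below are made up for illustration and are not the flashcards' dataset.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "bore": [3.47, 2.68, 3.19, 3.19, 3.62],
    "stroke": [2.68, 3.47, 3.40, 3.40, 3.39],
    "compression-ratio": [9.0, 9.0, 10.0, 8.0, 8.5],
    "horsepower": [111, 154, 102, 115, 110],
})

# Pairwise Pearson correlation (the default method) between the chosen columns.
corr_matrix = toy[["bore", "stroke", "compression-ratio", "horsepower"]].corr()
print(corr_matrix)
```

Each column correlates perfectly with itself, so the diagonal is 1.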

2
Q

Draw a scatterplot of two columns with a fitted regression line

A

sns.regplot(x="engine-size", y="price", data=df)

plt.ylim(0,)

3
Q

Draw a boxplot of two columns

A

sns.boxplot(x="body-style", y="price", data=df)

4
Q

Compute basic statistics for all variables

A

df.describe()
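Note that on a frame with mixed column types, `describe()` summarizes only the numeric columns by default. The sketch below, on a made-up `toy` frame, shows one way to include the other columns as well using the `include` parameter.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "fwd", "4wd"],
    "price": [13495.0, 16500.0, 13950.0, 17450.0],
})

# Default: only numeric columns (count, mean, std, min, quartiles, max).
numeric_summary = toy.describe()

# include="all" also reports count, unique, top and freq for text columns.
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```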

5
Q

value_counts() is a good way to see how many rows fall into each category of a variable.

A
df['drive-wheels'].value_counts()
fwd    118
rwd     75
4wd      8
Name: drive-wheels, dtype: int64
6
Q

Convert the above result to a DataFrame and rename the column

A

drive_wheels_counts = df['drive-wheels'].value_counts().to_frame()

drive_wheels_counts.rename(columns={'drive-wheels': 'value_counts'}, inplace=True)

drive_wheels_counts
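One caveat: since pandas 2.0 the Series returned by `value_counts()` is named `count` rather than after the source column, so a rename keyed on `'drive-wheels'` may silently do nothing. A sketch that should behave the same across versions is to name the column directly in `to_frame`. The `toy` frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "fwd", "rwd"]})

# Name the resulting column explicitly instead of renaming it afterwards.
counts = toy["drive-wheels"].value_counts().to_frame(name="value_counts")

# Label the index so the table reads clearly regardless of pandas version.
counts.index.name = "drive-wheels"
print(counts)
```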

7
Q

Find the distinct groups

A

df['drive-wheels'].unique()
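A small sketch on made-up data: `unique()` returns the distinct values in order of first appearance, and `nunique()` gives just their number.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "rwd"]})

groups = toy["drive-wheels"].unique()     # distinct values, in order of appearance
n_groups = toy["drive-wheels"].nunique()  # number of distinct values

print(groups, n_groups)
```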

8
Q

Select multiple columns and assign them to one variable

This is the first step of grouping.

A

df_group_one = df[['drive-wheels', 'body-style', 'price']]

9
Q

Calculate the average price for each category

A

df_group_one = df_group_one.groupby(['drive-wheels'], as_index=False).mean(numeric_only=True)
df_group_one
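An alternative sketch selects the column to average before calling `mean()`, which avoids trying to average text columns such as `body-style`. The `toy` frame and its prices are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "fwd", "rwd", "4wd"],
    "body-style": ["sedan", "sedan", "hatchback", "convertible", "sedan"],
    "price": [10000.0, 20000.0, 12000.0, 24000.0, 9000.0],
})

# Group by drive-wheels and average only the price column.
avg_price = toy.groupby("drive-wheels", as_index=False)["price"].mean()
print(avg_price)
```

With `as_index=False` the group key stays an ordinary column, and the groups come back sorted by key.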

10
Q

df_gptest = df[['drive-wheels', 'body-style', 'price']]

grouped_test1 = df_gptest.groupby(['drive-wheels', 'body-style'], as_index=False).mean()

grouped_test1

A

This groups the DataFrame by the unique combinations of 'drive-wheels' and 'body-style' and computes the mean price for each combination.

11
Q

Convert it to a pivot table

A

grouped_pivot = grouped_test1.pivot(index='drive-wheels', columns='body-style')
grouped_pivot
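The group-then-pivot step can be sketched end to end on made-up data. Calling `pivot` without `values` keeps `price` as an outer column level; passing `values="price"` here is a choice made for this sketch to get flat column labels. Combinations that never occur come out as NaN, and filling them with 0 is an assumption that may not suit every analysis.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd"],
    "body-style": ["sedan", "sedan", "hatchback", "sedan", "convertible"],
    "price": [10000.0, 12000.0, 8000.0, 20000.0, 30000.0],
})

# Mean price for each (drive-wheels, body-style) combination.
grouped = toy.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()

# One row per drive-wheels value, one column per body-style value.
pivot = grouped.pivot(index="drive-wheels", columns="body-style", values="price")

# Assumption: treat missing combinations as 0 for display purposes.
filled = pivot.fillna(0)
print(filled)
```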

12
Q

Plot a heat map

A

plt.pcolor(grouped_pivot, cmap='RdBu')
plt.colorbar()
plt.show()
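A bare `pcolor` call labels the axes with integer positions only. The sketch below shows one possible way to put the category names on the ticks. The small `pivot` table is made up for illustration, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen here so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivot table; the values are invented for illustration only.
pivot = pd.DataFrame(
    [[0.0, 8000.0, 11000.0], [30000.0, 0.0, 20000.0]],
    index=["fwd", "rwd"],
    columns=["convertible", "hatchback", "sedan"],
)

fig, ax = plt.subplots()
mesh = ax.pcolor(pivot.to_numpy(), cmap="RdBu")

# Centre each tick on its cell and label it with the category name.
ax.set_xticks([i + 0.5 for i in range(pivot.shape[1])])
ax.set_xticklabels(pivot.columns)
ax.set_yticks([i + 0.5 for i in range(pivot.shape[0])])
ax.set_yticklabels(pivot.index)

fig.colorbar(mesh, ax=ax)
```

In an interactive session, `plt.show()` would then display the figure.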

13
Q

Calculate Pearson correlation coefficient and P-value

A

pearson_coef, p_value = stats.pearsonr(df['width'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, " with a P-value of P =", p_value)
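The answer uses `stats` without showing where it comes from; the sketch below assumes it is `scipy.stats`. The `toy` values are made up for illustration, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd
from scipy import stats  # assumption: `stats` in the card refers to scipy.stats

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "width": [64.1, 65.5, 66.2, 66.4, 68.9, 71.4],
    "price": [13495.0, 13950.0, 17450.0, 15250.0, 23875.0, 30760.0],
})

# pearsonr returns the correlation coefficient and the two-sided p-value.
pearson_coef, p_value = stats.pearsonr(toy["width"], toy["price"])
print("Pearson r:", pearson_coef, "p-value:", p_value)
```

A coefficient near +1 or -1 indicates a strong linear relationship, and a small p-value indicates that a correlation this strong would be unlikely if the two variables were uncorrelated.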
