Exploratory data analysis Flashcards

Question 1

Q

Find the correlation between the columns bore, stroke, compression-ratio, and horsepower

Answer

A

df[['bore', 'stroke', 'compression-ratio', 'horsepower']].corr()
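As a rough illustration of what corr() returns, the sketch below runs the same call on a small made-up table (the values are assumptions, not the real automobile data). The result is a symmetric matrix with 1.0 on the diagonal, using the Pearson method by default.

```python
import pandas as pd

# Made-up values for illustration only; not the real automobile data set.
toy = pd.DataFrame({
    "bore": [3.0, 3.2, 3.4, 3.6],
    "stroke": [3.4, 3.1, 3.3, 3.0],
    "horsepower": [100, 110, 120, 130],
})

# Pairwise Pearson correlation between the selected columns.
corr_matrix = toy[["bore", "stroke", "horsepower"]].corr()
print(corr_matrix)
```

In this toy table horsepower rises in lockstep with bore, so that pair has a correlation of 1.0, while stroke is negatively correlated with both.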

Question 2

Q

Draw a scatterplot with a fitted regression line for two columns

Answer

A

sns.regplot(x="engine-size", y="price", data=df)

plt.ylim(0,)

Question 3

Q

Draw a boxplot of a numeric column grouped by a categorical column

Answer

A

sns.boxplot(x="body-style", y="price", data=df)

Question 4

Q

Compute basic statistics for all numeric variables

Answer

A

df.describe()
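By default describe() summarizes only the numeric columns. A minimal sketch on a made-up table (column values are assumptions) showing the default call next to include="all", which also covers non-numeric columns:

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "fwd", "fwd"],
    "price": [10000.0, 20000.0, 12000.0, 14000.0],
})

# Default: count, mean, std, min, quartiles, max for numeric columns only.
numeric_summary = toy.describe()

# include="all" adds unique, top, and freq for the non-numeric columns.
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```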

Question 5

Q

Count how many rows fall into each category of a variable. value_counts() is a good way of understanding how many units of each characteristic/variable we have.

Answer

A

df['drive-wheels'].value_counts()
fwd    118
rwd     75
4wd      8
Name: drive-wheels, dtype: int64

Question 6

Q

Convert the above results to a data frame and rename the columns

Answer

A

drive_wheels_counts = df['drive-wheels'].value_counts().to_frame()

drive_wheels_counts.rename(columns={'drive-wheels': 'value_counts'}, inplace=True)

drive_wheels_counts
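The name of the counts column depends on the pandas version: older releases name it after the source column, while pandas 2.0 and later name it "count", in which case the rename above matches nothing. One version-independent sketch, on made-up data, is to name the Series before converting it:

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({"drive-wheels": ["fwd", "rwd", "fwd", "4wd", "fwd", "rwd"]})

drive_wheels_counts = (
    toy["drive-wheels"]
    .value_counts()
    .rename("value_counts")  # set the column name before building the frame
    .to_frame()
)
# Label the index so the table reads the same on every pandas version.
drive_wheels_counts.index.name = "drive-wheels"

print(drive_wheels_counts)
```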

Question 7

Q

Find the distinct groups (unique values) in a column

Answer

A

df['drive-wheels'].unique()

Question 8

Q

Select multiple columns and assign them to one variable

This is the first step of grouping.

Answer

A

df_group_one = df[['drive-wheels', 'body-style', 'price']]

Question 9

Q

Calculate the average price for each category

Answer

A

df_group_one = df_group_one.groupby(['drive-wheels'], as_index=False).mean(numeric_only=True)
df_group_one
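A small worked example of the same grouping on made-up data (all values are assumptions). Selecting the price column before calling mean() keeps the text column body-style out of the average, which recent pandas versions would otherwise reject.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd"],
    "body-style": ["sedan", "hatchback", "sedan", "convertible", "sedan"],
    "price": [10000.0, 8000.0, 20000.0, 30000.0, 12000.0],
})

# One row per drive-wheels category, holding the mean price of that group.
avg_price = toy.groupby("drive-wheels", as_index=False)["price"].mean()
print(avg_price)
```

Here fwd averages 9000.0 (from 10000 and 8000), rwd averages 25000.0, and 4wd keeps its single value of 12000.0.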

Question 10

Q

df_gptest = df[['drive-wheels', 'body-style', 'price']]

grouped_test1 = df_gptest.groupby(['drive-wheels', 'body-style'], as_index=False).mean()

grouped_test1

Answer

A

This groups the dataframe by the unique combinations of 'drive-wheels' and 'body-style' and computes the mean price for each combination.

Question 11

Q

Reshape the grouped result into a pivot table, with one variable along the rows and the other along the columns

Answer

A

grouped_pivot = grouped_test1.pivot(index='drive-wheels', columns='body-style')
grouped_pivot
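A minimal sketch of the group-then-pivot step on made-up data (all values are assumptions). Unlike the card, it passes values="price" so the result has flat column labels. Combinations that never occur in the data show up as NaN.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd"],
    "body-style": ["sedan", "hatchback", "sedan", "convertible", "sedan"],
    "price": [10000.0, 8000.0, 20000.0, 30000.0, 12000.0],
})

# Mean price for each (drive-wheels, body-style) combination.
grouped = toy.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()

# Rows: drive-wheels; columns: body-style; cells: mean price.
pivot = grouped.pivot(index="drive-wheels", columns="body-style", values="price")
print(pivot)
```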

Question 12

Q

Plot a heat map

Answer

A

plt.pcolor(grouped_pivot, cmap='RdBu')
plt.colorbar()
plt.show()

Question 13

Q

Calculate Pearson correlation coefficient and P-value

Answer

A

from scipy import stats

pearson_coef, p_value = stats.pearsonr(df['width'], df['price'])
print("The Pearson Correlation Coefficient is", pearson_coef, "with a P-value of P =", p_value)
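To make the coefficient less of a black box, here is a sketch that computes Pearson's r by hand: the covariance of the two variables divided by the product of their standard deviations. The helper name pearson_r and the sample values are assumptions for illustration, and the sketch does not compute the P-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only.
width = [64.0, 65.5, 66.0, 68.0, 70.0]
price = [9000.0, 12500.0, 12800.0, 17000.0, 21500.0]

r = pearson_r(width, price)
print("Pearson r:", r)
```

A value close to 1 means a strong positive linear relationship, close to -1 a strong negative one, and close to 0 no linear relationship.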

(13 cards)