EDA Flashcards

1
Q

Which data exploration activities does a variable's distribution help with?

A

-To decide how to fill missing values in a column/variable
-To decide which measure of central tendency (the mean, the median, or the mode) best represents the variable's distribution, and therefore which one to impute missing values with
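A minimal pandas sketch of letting the distribution drive the imputation choice. The data is hypothetical, and the |skew| > 1 cutoff is an assumed rule of thumb rather than a fixed standard.

```python
import pandas as pd

# Hypothetical numeric column with one missing value and a long right tail.
income = pd.Series([32, 35, 38, 40, 41, 45, 250, None], name="income")

# Sample skewness of the non-missing values (pandas skips NaN by default).
skew = income.skew()

# Assumed rule of thumb: treat |skew| > 1 as strongly skewed.
if abs(skew) > 1:
    fill_value = income.median()  # robust to the long tail
else:
    fill_value = income.mean()    # reasonable for roughly symmetric data

income_filled = income.fillna(fill_value)

# For a categorical column, the mode (most frequent value) is the usual choice.
color = pd.Series(["red", "blue", "red", None])
color_filled = color.fillna(color.mode()[0])
```

Here the single large value makes the column strongly right-skewed, so the median is used.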

2
Q

When do we impute by the median?

A

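We impute by the median when the distribution is skewed (or contains outliers), because the median is much less affected by extreme values than the mean is.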
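A small sketch, using a hypothetical skewed price column, of how one outlier pulls the mean away from typical values while the median stays put, and of filling the gap with the median via `fillna`.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column: most prices are small, one is very large.
df = pd.DataFrame({"price": [10.0, 12.0, 11.0, np.nan, 13.0, 500.0]})

mean_price = df["price"].mean()      # pulled far upward by the 500 outlier
median_price = df["price"].median()  # stays near the typical value

# Impute the missing entry with the median.
df["price"] = df["price"].fillna(median_price)
print(mean_price, median_price)
```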

3
Q

What does the iloc indexer accomplish in EDA?

A

It selects rows and/or columns of a DataFrame by integer position. Rows and columns must be given as 0-based integer positions (single integers, lists, or slices), not as labels.
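A short sketch of position-based selection with `iloc` on a hypothetical DataFrame. The row labels are deliberately non-default to show that `iloc` ignores them and counts positions instead.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [28, 35, 41], "city": ["Oslo", "Rome", "Lima"]},
    index=[101, 102, 103],  # labels that iloc does not use
)

first_row = df.iloc[0]             # row at position 0 (label 101), as a Series
first_two = df.iloc[0:2]           # positions 0 and 1; the slice end is excluded
cell = df.iloc[1, 2]               # row position 1, column position 2
subset = df.iloc[[0, 2], [0, 1]]   # rows 0 and 2, columns "name" and "age"
```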

4
Q

What does this iloc example select: df.iloc[:, [1, 2, 5, 6, 10]]

A

It selects all rows (the :) and only the columns at integer positions 1, 2, 5, 6, and 10. Positions are 0-based, so these are the 2nd, 3rd, 6th, 7th, and 11th columns.
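To make the positions concrete, this sketch builds a hypothetical 3-row DataFrame with 12 columns named c0 through c11 and applies the same selection.

```python
import pandas as pd

# Hypothetical DataFrame: 3 rows, 12 columns named c0 ... c11.
df = pd.DataFrame(
    [[r * 100 + c for c in range(12)] for r in range(3)],
    columns=[f"c{c}" for c in range(12)],
)

selected = df.iloc[:, [1, 2, 5, 6, 10]]
print(list(selected.columns))  # ['c1', 'c2', 'c5', 'c6', 'c10']
print(selected.shape)          # (3, 5)
```

Asking for a position that does not exist (for example 12 here) raises an IndexError.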

5
Q

What does the loc indexer accomplish in EDA?

A

It selects rows and/or columns of a DataFrame by label. Columns can be given by their string names, and rows by their index labels, instead of by integer positions.
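A sketch of label-based selection with `loc` on a hypothetical DataFrame with string row labels. Note one difference from `iloc`: label slices include both endpoints.

```python
import pandas as pd

df = pd.DataFrame(
    {"gender": ["F", "M", "F"], "age": [28, 35, 41], "city": ["Oslo", "Rome", "Lima"]},
    index=["a", "b", "c"],  # hypothetical string row labels
)

row_b = df.loc["b"]                           # row with label "b", as a Series
age_of_c = df.loc["c", "age"]                 # single cell by row label and column name
first_two = df.loc["a":"b", "gender":"age"]   # label slices INCLUDE both endpoints
```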

6
Q

What does this loc example select: df.loc[:, ['gender', 'age']]

A

It selects all rows (the :) and only the columns named 'gender' and 'age', in that order.
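A sketch on a hypothetical DataFrame; the `assert` shows that, for columns only, plain bracket selection with a list of names gives the same result.

```python
import pandas as pd

# Hypothetical dataset with more columns than we need.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "age": [28, 35, 41],
    "income": [52000, 61000, 58000],
})

subset = df.loc[:, ["gender", "age"]]  # all rows, only these two columns
print(subset)

# Selecting a list of columns without loc gives the same result.
assert subset.equals(df[["gender", "age"]])
```

In recent pandas versions, listing a name that is not a column raises a KeyError.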

7
Q

What function can be used to create a filter in a dataset?

A
8
Q
A