lesson_4_flashcards

1
Q

What are the three steps in data preparation?

A

Cleaning the data, transforming the data, and preprocessing the data.

2
Q

What are the types of missing data?

A

Missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

3
Q

What is imputation in data cleaning?

A

Filling in missing values with best-guess estimates, e.g., the mean for numerical data, the mode for categorical data, or a k-nearest-neighbors estimate for either.
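
A minimal sketch of mean and mode imputation using only the Python standard library. The function name `impute`, the use of `None` to mark a missing value, and the sample data are assumptions made for illustration; k-nearest-neighbors imputation is not shown.

```python
from statistics import mean, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values."""
    observed = [v for v in values if v is not None]
    # Mean suits numerical data; mode also works for categorical data.
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical example columns with one missing entry each.
ages = impute([20, None, 40], strategy="mean")
colors = impute(["red", None, "red", "blue"], strategy="mode")
```

Here the missing age becomes the mean of the observed ages, and the missing color becomes the most frequent observed color.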

4
Q

What is the goal of data transformation?

A

To convert raw data into a suitable format, such as converting RGB images to grayscale or encoding text as numerical values.
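
Both transformations can be sketched in a few lines of plain Python. The grayscale weights below are the common ITU-R BT.601 luma coefficients, which is one of several possible conventions; the function names and the first-seen integer encoding are assumptions for illustration, not a method named by the deck.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a single luminance value."""
    r, g, b = pixel
    # BT.601 luma weights (an assumed convention; others exist).
    return 0.299 * r + 0.587 * g + 0.114 * b

def encode_tokens(tokens):
    """Map each distinct token to an integer id in order of first appearance."""
    vocab = {}
    ids = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids, vocab

gray = rgb_to_gray((255, 255, 255))
ids, vocab = encode_tokens(["cat", "dog", "cat"])
```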

5
Q

What is the purpose of data preprocessing?

A

To improve model convergence by normalizing or standardizing the data, for example by subtracting the mean or scaling features to similar ranges.
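
A small sketch of two common preprocessing steps for a single feature: standardization (subtract the mean, divide by the standard deviation) and min-max scaling to [0, 1]. The function names are illustrative, and both assume the feature is not constant, since a zero spread would cause a division by zero.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)  # assumed non-zero (feature is not constant)
    return [(x - mu) / sigma for x in xs]

def min_max_scale(xs):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)  # assumed lo != hi
    return [(x - lo) / (hi - lo) for x in xs]

z = standardize([1.0, 2.0, 3.0])
scaled = min_max_scale([10, 20, 30])
```

After standardization the feature has zero mean and unit variance; after min-max scaling it lies in [0, 1].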

6
Q

What is fairness in machine learning?

A

Ensuring equitable outcomes across protected attributes, as formalized by criteria such as anti-classification, classification parity, and calibration.

7
Q

What is anti-classification in fairness?

A

A fairness criterion under which protected attributes, such as race or gender, are not used as model inputs and so cannot directly influence predictions.

8
Q

What is classification parity in fairness?

A

A criterion requiring that predictive performance metrics, such as false positive or false negative rates, be equal across groups defined by protected attributes.
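
One way to check this is to compute an error rate separately for each group and compare. The sketch below does so for the false positive rate; the function names, group labels, and toy labels and predictions are all made up for illustration, and each group is assumed to contain at least one true negative.

```python
def false_positive_rate(y_true, y_pred):
    """Fraction of true negatives (label 0) that were predicted positive (1)."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    negatives = sum(1 for t in y_true if t == 0)  # assumed > 0
    return false_pos / negatives

def fpr_by_group(y_true, y_pred, groups):
    """False positive rate computed separately for each protected group."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = false_positive_rate([y_true[i] for i in idx],
                                       [y_pred[i] for i in idx])
    return rates

# Hypothetical data: two groups of three examples each.
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
rates = fpr_by_group(y_true, y_pred, groups)
```

In this toy data group "a" has a false positive rate of 0.5 and group "b" has 0.0, so classification parity on the false positive rate does not hold.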

9
Q

What is calibration in fairness?

A

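A criterion requiring that a predicted probability mean the same thing in every group: among individuals with the same predicted score, the observed outcome rate is the same regardless of protected attribute.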
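
A rough way to inspect calibration across groups is to bin the predicted scores and compare the observed positive rate per bin and group. The sketch below assumes scores in [0, 1] and equal-width bins; the function name, bin count, and toy data are hypothetical.

```python
from collections import defaultdict

def outcome_rate_by_score_and_group(scores, outcomes, groups, n_bins=2):
    """Observed positive rate for each (score bin, group) pair."""
    cells = defaultdict(list)
    for s, y, g in zip(scores, outcomes, groups):
        # Equal-width bins over [0, 1]; a score of exactly 1.0 goes in the last bin.
        b = min(int(s * n_bins), n_bins - 1)
        cells[(b, g)].append(y)
    return {key: sum(ys) / len(ys) for key, ys in cells.items()}

# Hypothetical scores and outcomes for two groups.
scores = [0.2, 0.3, 0.8, 0.9, 0.2, 0.3, 0.8, 0.9]
outcomes = [0, 1, 1, 1, 0, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
table = outcome_rate_by_score_and_group(scores, outcomes, groups)
```

In this toy data the high-score bin has an observed rate of 1.0 for group "a" but 0.5 for group "b", so the same score does not correspond to the same outcome rate and the scores are not calibrated across groups.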

10
Q

What are common techniques for cleaning missing depth data?

A

Nearest neighbor interpolation, colorization, or inpainting techniques.
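
A brute-force sketch of the nearest neighbor approach on a small depth map stored as a list of rows. It assumes missing depth is encoded as 0 and that at least one valid pixel exists; the function name and sample map are illustrative, and colorization and inpainting are not shown.

```python
def fill_depth_nearest(depth, missing=0):
    """Fill each missing pixel with the value of the closest valid pixel."""
    valid = [(r, c)
             for r, row in enumerate(depth)
             for c, d in enumerate(row) if d != missing]
    filled = [row[:] for row in depth]  # copy so the input is left unchanged
    for r, row in enumerate(depth):
        for c, d in enumerate(row):
            if d == missing:
                # Closest valid pixel by squared Euclidean distance.
                nr, nc = min(valid, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
                filled[r][c] = depth[nr][nc]
    return filled

# Hypothetical 2x4 depth map with a hole in the middle columns.
depth = [[1.0, 0, 0, 4.0],
         [1.0, 0, 0, 4.0]]
filled = fill_depth_nearest(depth)
```

This costs one scan of all valid pixels per hole, which is fine for a tiny example but too slow for full-resolution depth images.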

11
Q

Why is normalization important in data preprocessing?

A

It improves numerical stability and gradient flow, which aids model convergence and helps mitigate issues such as vanishing gradients.

12
Q

What is the relationship between bias and data in machine learning?

A

Bias in data, including underrepresentation of groups or skewed samples, can lead to inequitable and inaccurate model predictions.
