lesson_4_flashcards

Question 1

Q

What are the three steps in data preparation?

Answer

A

Cleaning the data, transforming the data, and preprocessing the data.

Question 2

Q

What are the types of missing data?

Answer

A

Question 3

Q

What is imputation in data cleaning?

Answer

A

Filling in missing values with best-guess estimates, e.g., the mean for numerical data, the mode for categorical data, or values inferred from the k nearest neighbors.
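
As an illustration of mean and mode imputation, here is a minimal sketch using only the Python standard library. The helper name `impute`, the use of `None` to mark a missing value, and the sample columns are assumptions made for this example, not part of the lesson.

```python
from statistics import mean, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values.

    Hypothetical helper: "mean" suits numerical columns, "mode" suits
    categorical ones.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


# Made-up example columns with one missing entry each.
ages = impute([20, None, 40], strategy="mean")
colors = impute(["red", None, "blue", "red"], strategy="mode")
```

A k-nearest-neighbors variant would instead fill each gap from the most similar complete rows; it is left out here to keep the sketch short.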

Question 4

Q

What is the goal of data transformation?

Answer

A

To convert raw data into a suitable format, such as converting RGB images to grayscale or encoding text as numerical values.
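
Both transformations mentioned above can be sketched in a few lines of Python. The luminance weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 choice and are an assumption here, as are the function names and the integer-ID text encoding; the lesson does not specify a particular scheme.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a single luminance value.

    Assumes BT.601 weights; other weightings are also in use.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def encode_tokens(tokens):
    """Map each distinct token to an integer ID in order of first appearance."""
    vocab = {}
    ids = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids, vocab


# A made-up 1x2 RGB image and a made-up token sequence.
image = [[(255, 255, 255), (0, 0, 0)]]
gray_image = [[rgb_to_gray(p) for p in row] for row in image]
ids, vocab = encode_tokens(["cat", "dog", "cat"])
```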

Question 5

Q

What is the purpose of data preprocessing?

Answer

A

To improve model convergence by normalizing and standardizing data, such as mean subtraction or scaling features to similar ranges.
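
The two operations named above, mean subtraction with scaling (standardization) and scaling to a common range, might look like the following sketch. The function names and the choice of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max_scale(column):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


z_scores = standardize([1.0, 2.0, 3.0])
scaled = min_max_scale([10, 20, 30])
```

After standardization the column has zero mean and unit standard deviation, so features measured in very different units end up on comparable scales.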

Question 6

Q

What is fairness in machine learning?

Answer

A

Ensuring equitable outcomes across protected attributes, as formalized by criteria such as anti-classification, classification parity, and calibration.

Question 7

Q

What is anti-classification in fairness?

Answer

A

A fairness criterion under which protected attributes, such as race or gender, are not used as inputs and so cannot directly influence predictions.
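
One simple way to picture anti-classification is to strip protected attributes from each record before it reaches the model. The sketch below assumes records are plain dictionaries; the field names and the `strip_protected` helper are hypothetical.

```python
# Hypothetical set of protected attribute names.
PROTECTED = {"race", "gender"}


def strip_protected(record, protected=PROTECTED):
    """Return a copy of the record without any protected attributes."""
    return {k: v for k, v in record.items() if k not in protected}


# Made-up applicant record.
applicant = {"income": 52000, "gender": "f", "race": "x", "credit_years": 7}
features = strip_protected(applicant)
```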

Question 8

Q

What is classification parity in fairness?

Answer

A

A fairness criterion requiring that predictive performance metrics, such as false positive or false negative rates, are equal across groups defined by protected attributes.
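
To check classification parity on one metric, the false positive rate can be computed separately for each group and compared. This is a minimal sketch with made-up labels; the function names and the binary 0/1 label encoding are assumptions.

```python
def false_positive_rate(y_true, y_pred):
    """Fraction of actual negatives (0) that were predicted positive (1)."""
    negatives = sum(1 for t in y_true if t == 0)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return false_pos / negatives if negatives else float("nan")


def fpr_by_group(y_true, y_pred, groups):
    """Compute the false positive rate within each group."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return rates


# Made-up data: group "a" has one false positive, group "b" has none.
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
rates = fpr_by_group(y_true, y_pred, groups)
```

Here the rates differ between the two groups, so this toy classifier would not satisfy classification parity on false positive rate.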

Question 9

Q

What is calibration in fairness?

Answer

A

A fairness criterion requiring that a given predicted probability corresponds to the same observed outcome rate regardless of protected attribute group.
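
A rough calibration check bins the predicted probabilities and compares the observed positive rate per bin across groups. The sketch below assumes scores in [0, 1], binary outcomes, and equal-width bins; the function name and data are made up for illustration.

```python
from collections import defaultdict


def observed_rate_by_group(scores, outcomes, groups, n_bins=2):
    """Observed positive rate for each (group, score bin) pair."""
    totals = defaultdict(lambda: [0, 0])  # (group, bin) -> [positives, count]
    for s, y, g in zip(scores, outcomes, groups):
        b = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into the top bin
        totals[(g, b)][0] += y
        totals[(g, b)][1] += 1
    return {key: pos / n for key, (pos, n) in totals.items()}


# Made-up data: both groups receive the same scores, but outcomes differ.
scores = [0.8, 0.8, 0.2, 0.8, 0.8, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
calibration = observed_rate_by_group(scores, outcomes, groups)
```

In this toy data a score of 0.8 corresponds to different observed rates in the two groups, which is the kind of gap a calibration check is meant to surface.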

Question 10

Q

What are common techniques for cleaning missing depth data?

Answer

A

Nearest neighbor interpolation, colorization, or inpainting techniques.
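
Nearest neighbor interpolation, the first technique listed, can be sketched as a brute-force fill over a small depth map. Encoding missing depth as 0 and the helper name `fill_depth_nearest` are assumptions for this example; practical pipelines would use a faster search.

```python
def fill_depth_nearest(depth, missing=0):
    """Fill each missing pixel with the value of the closest valid pixel.

    Brute-force sketch: distance is squared Euclidean distance in pixel
    coordinates, and ties go to the first valid pixel in row-major order.
    """
    height, width = len(depth), len(depth[0])
    valid = [
        (r, c)
        for r in range(height)
        for c in range(width)
        if depth[r][c] != missing
    ]
    filled = [row[:] for row in depth]
    if not valid:
        return filled  # nothing to copy from
    for r in range(height):
        for c in range(width):
            if depth[r][c] == missing:
                nr, nc = min(
                    valid, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2
                )
                filled[r][c] = depth[nr][nc]
    return filled


# Made-up 1x4 depth map with two missing readings in the middle.
depth_map = [[2.0, 0, 0, 5.0]]
filled_map = fill_depth_nearest(depth_map)
```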

Question 11

Q

Why is normalization important in data preprocessing?

Answer

A

It improves numerical stability and gradient flow, which aids model convergence and helps mitigate issues like vanishing gradients.

Question 12

Q

What is the relationship between bias and data in machine learning?

Answer

A

Bias in data, including underrepresentation of groups or skewed samples, can lead to inequitable and inaccurate model predictions.