lesson_4_flashcards
What are the three steps in data preparation?
Cleaning the data, transforming the data, and preprocessing the data.
What are the types of missing data?
1. Missing completely at random, 2. Missing at random, 3. Missing not at random.
What is imputation in data cleaning?
Filling in missing values with best-guess estimates, e.g., the mean for numerical data, the mode for categorical data, or values predicted by k-nearest neighbors.
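A minimal sketch of mean and mode imputation, assuming missing entries are marked with None. The helper name impute and the sample columns are hypothetical and not part of the lesson.

```python
from statistics import mean, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # Mean suits numerical columns; mode also works for categorical ones.
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

ages = impute([20, None, 40], strategy="mean")
colors = impute(["red", None, "red", "blue"], strategy="mode")
```

Here the missing age is filled with the average of the observed ages, and the missing color with the most frequent observed color.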
What is the goal of data transformation?
To convert raw data into a suitable format, such as converting RGB images to grayscale or encoding text as numerical values.
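Both transformations mentioned above can be sketched in a few lines. The grayscale weights below are the common ITU-R BT.601 luma coefficients, which is an assumption (the lesson does not specify a formula), and label_encode is a hypothetical helper that maps each distinct string to an integer.

```python
def rgb_to_gray(pixel):
    """Convert one (r, g, b) pixel to a single luminance value."""
    r, g, b = pixel
    # Assumed weights: ITU-R BT.601 luma coefficients.
    return 0.299 * r + 0.587 * g + 0.114 * b

def label_encode(labels):
    """Map each distinct label to an integer, in sorted order."""
    vocab = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [vocab[label] for label in labels], vocab

gray = rgb_to_gray((255, 255, 255))
codes, vocab = label_encode(["cat", "dog", "cat"])
```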
What is the purpose of data preprocessing?
To improve model convergence by normalizing or standardizing the data, e.g., subtracting the mean or scaling features to similar ranges.
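As an illustration, standardization (mean subtraction followed by division by the standard deviation) and min-max scaling can be sketched as follows. The function names are hypothetical, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

z = standardize([1.0, 2.0, 3.0])
scaled = min_max_scale([1.0, 2.0, 3.0])
```

After standardization the feature has zero mean and unit variance; after min-max scaling it lies between 0 and 1.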
What is fairness in machine learning?
Ensuring equitable outcomes across groups defined by protected attributes, formalized by criteria such as anti-classification, classification parity, and calibration.
What is anti-classification in fairness?
A fairness criterion under which protected attributes, such as race or gender, cannot directly influence predictions.
What is classification parity in fairness?
Ensures predictive performance metrics, like false positive or negative rates, are equal across groups defined by protected attributes.
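One way to check classification parity is to compute an error rate separately for each group and compare the results. The sketch below does this for the false positive rate; the function names, labels, and group assignments are made up for illustration.

```python
def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN), computed over the true negatives only."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else float("nan")

def fpr_by_group(y_true, y_pred, groups):
    """Compute the false positive rate separately for each group."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return rates

# Hypothetical labels, predictions, and group membership.
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
rates = fpr_by_group(y_true, y_pred, groups)
```

In this toy data, group a has a false positive rate of 0.5 and group b has 0.0, so classification parity on this metric would not hold.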
What is calibration in fairness?
Ensures that, among individuals given the same predicted probability, the actual outcome rate is the same regardless of protected attribute group.
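A rough way to inspect calibration across groups is to collect individuals who received the same score and compare their observed outcome rates per group. This is a sketch on made-up data; the helper name is hypothetical, and real scores would usually be binned rather than matched exactly.

```python
from collections import defaultdict

def observed_rate_by_group(scores, outcomes, groups):
    """Mean observed outcome for each (group, score) pair."""
    buckets = defaultdict(list)
    for s, y, g in zip(scores, outcomes, groups):
        buckets[(g, s)].append(y)
    return {key: sum(ys) / len(ys) for key, ys in buckets.items()}

# Hypothetical data: everyone receives the same predicted probability.
scores = [0.5, 0.5, 0.5, 0.5]
outcomes = [1, 0, 1, 1]
groups = ["a", "a", "b", "b"]
calib = observed_rate_by_group(scores, outcomes, groups)
```

Here a score of 0.5 corresponds to an observed rate of 0.5 in group a but 1.0 in group b, so the scores would not be calibrated across groups.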
What are common techniques for cleaning missing depth data?
Nearest neighbor interpolation, colorization, or inpainting techniques.
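Nearest neighbor interpolation can be sketched on a single row of a depth map, assuming missing depth readings are encoded as 0 (a common convention, but an assumption here). The function name is hypothetical, and a real implementation would search in two dimensions.

```python
def fill_nearest(row, missing=0):
    """Fill missing entries with the value of the nearest valid entry."""
    valid = [i for i, v in enumerate(row) if v != missing]
    if not valid:
        return list(row)  # nothing to copy from
    return [
        # On a distance tie, min() keeps the lower index (left neighbor).
        row[min(valid, key=lambda j: abs(j - i))] if v == missing else v
        for i, v in enumerate(row)
    ]

filled = fill_nearest([0, 1.5, 0, 0, 3.0])
```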
Why is normalization important in data preprocessing?
It improves numerical stability and gradient flow, which aids model convergence and helps mitigate issues like vanishing gradients.
What is the relationship between bias and data in machine learning?
Bias in data, including underrepresentation of groups or skewed samples, can lead to inequitable and inaccurate model predictions.