Class Four Flashcards
What is Principal Component Analysis (PCA)?
Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform a high-dimensional dataset into a lower-dimensional representation by finding the principal components that capture the most significant variation in the data.
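A minimal sketch of this idea, assuming scikit-learn and NumPy are available (neither is named on the card itself):

```python
# Illustrative PCA sketch: project 5-dimensional data onto its
# 2 strongest principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features
pca = PCA(n_components=2)              # keep the 2 components with most variance
X_reduced = pca.fit_transform(X)       # project onto the principal components

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance each component captures
```

The `explained_variance_ratio_` attribute is a common way to judge how much information the lower-dimensional representation retains.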
What are the advantages of Principal Component Analysis (PCA)?
Advantages of PCA include reducing the dimensionality of the data, removing correlated features, and identifying the most informative features or patterns.
What are the limitations of Principal Component Analysis (PCA)?
Limitations of PCA include difficulty in interpreting the transformed components, sensitivity to outliers, and assumptions of linearity and normality.
What are the different types of PCA?
- Randomized PCA quickly finds an approximation of the first principal components.
- Incremental PCA (IPCA) splits the training set into mini-batches and feeds an IPCA algorithm one mini-batch at a time.
- Kernel PCA helps perform complex nonlinear projections for dimensionality reduction.
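The three variants above can be sketched with scikit-learn (an assumed choice of library; the card does not name one):

```python
# Sketch of the three PCA variants from the card.
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Randomized PCA: a fast approximation of the leading components.
rand_pca = PCA(n_components=3, svd_solver="randomized", random_state=0)
X_rand = rand_pca.fit_transform(X)

# Incremental PCA: fed one mini-batch at a time via partial_fit.
ipca = IncrementalPCA(n_components=3)
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)
X_inc = ipca.transform(X)

# Kernel PCA: nonlinear projection via the kernel trick (RBF kernel here).
kpca = KernelPCA(n_components=3, kernel="rbf")
X_kern = kpca.fit_transform(X)

print(X_rand.shape, X_inc.shape, X_kern.shape)  # each (200, 3)
```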
What is Singular Value Decomposition (SVD)?
Singular Value Decomposition (SVD) is a matrix factorization technique used in linear algebra to decompose a matrix into three separate matrices to extract the underlying structure and reduce dimensionality.
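A small NumPy sketch of the three-matrix factorization A = U Σ Vᵀ described above (the example matrix is made up for illustration):

```python
# SVD sketch: decompose A into U, S (singular values), and Vt,
# then verify the factorization and build a low-rank approximation.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
A_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(A, A_rebuilt))   # True

# Keeping only the largest singular value gives a rank-1 approximation,
# which is how SVD reduces dimensionality.
A_rank1 = S[0] * np.outer(U[:, 0], Vt[0])
print(A_rank1.shape)               # (3, 2)
```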
What are the advantages of Singular Value Decomposition (SVD)?
Advantages of SVD include its ability to handle missing values in data, extract latent features, and provide a low-rank approximation of the original matrix.
What are the limitations of Singular Value Decomposition (SVD)?
Limitations of SVD include its computational complexity for large matrices, difficulty in interpreting the singular values and vectors directly, and sensitivity to noise.
What is dimensionality reduction?
Dimensionality reduction is the process of reducing the number of features or variables in a dataset while preserving important information, aiming to eliminate irrelevant or redundant features.
What are the advantages of dimensionality reduction?
Advantages of dimensionality reduction include simplifying the analysis and visualization of data, reducing computational complexity, and mitigating the curse of dimensionality.
What are the limitations of dimensionality reduction?
Limitations of dimensionality reduction techniques include potential loss of information, difficulty in selecting the appropriate number of dimensions, and potential distortion of the data’s structure.
What are decision trees?
Decision trees are supervised machine learning models that recursively split the data based on features to create a tree-like structure for making decisions or predictions.
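A brief sketch with scikit-learn (dataset and hyperparameters chosen for illustration, not taken from the card):

```python
# Decision tree sketch: fit a depth-limited tree on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth caps the recursive splitting, one common guard against overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.get_depth())     # at most 3
print(tree.score(X, y))     # training accuracy
```

Limiting depth (or pruning) trades a little training accuracy for a tree that generalizes better, which addresses the overfitting limitation noted below.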
What are the advantages of decision trees?
Advantages of decision trees include interpretability, handling both numerical and categorical features, and automatic feature selection.
What are the limitations of decision trees?
Limitations of decision trees include overfitting, sensitivity to small changes in the data, and difficulty in capturing complex relationships or interactions.
When should we use Decision Trees?
We should consider decision trees when:
1. When our data is described using attributes (characteristics) and values, i.e., attribute-value pairs.
2. When we want to predict outcomes that have specific categories or answers (like “yes” or “no”).
3. When we need to express choices that can be made by combining different conditions (like “if this or that”).
4. When our training data might have mistakes or some missing information, and we want a method that can handle those issues.
What is the Gini impurity measure?
The Gini impurity is a measure of node impurity used in decision tree algorithms. It quantifies the probability of misclassifying a randomly chosen element if it were randomly labeled according to the distribution of classes in a given node. A lower Gini impurity indicates a more homogeneous node with a higher degree of purity.
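The definition above corresponds to the formula G = 1 − Σᵢ pᵢ², where pᵢ is the proportion of class i in the node. A hand-rolled sketch:

```python
# Gini impurity from scratch: G = 1 - sum of squared class proportions.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes"] * 10))              # 0.0  -> pure node
print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5  -> maximally impure for 2 classes
```

A decision tree algorithm picks the split that most reduces the (weighted) Gini impurity of the resulting child nodes.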