Introduction Flashcards
Non-Technical Definition: Statistical Learning
A broad set of tools for understanding and extracting information from data.
Non-Technical Definition: Supervised Learning
Methods for predicting an output (response) based on one or more inputs (predictors) when the correct output is known.
Non-Technical Definition: Unsupervised Learning
Methods for finding structure in data with inputs but no known or labeled output.
Definition: Regression Problem
Predicting a continuous (quantitative) response.
Definition: Classification Problem
Predicting a discrete (qualitative) output variable.
What is Dimension Reduction?
Summarizing or transforming high-dimensional data into fewer dimensions while retaining key information.
Difference Between Classification and Regression
Classification predicts a discrete category (e.g. ‘Up’ or ‘Down’), while regression predicts a numeric value.
Key Premise of ISLR #1
Statistical learning methods are broadly useful across many fields, not just statistics.
Key Premise of ISLR #2
Statistical learning should not be seen as a ‘black box’; understanding the assumptions and trade-offs is crucial.
Key Premise of ISLR #3
We need not master the deep mathematical details to effectively use these methods.
Key Premise of ISLR #4
Practical real-world applications are the main focus, with hands-on labs demonstrating methods in R.
Notation: n and p
n is the number of observations in a data set; p is the number of variables (features).
Matrix Representation of Data X
X is an n×p matrix, where each row is an observation and each column is a variable.
Definition: Transpose of a Matrix (X^T)
A matrix whose rows are the columns of the original matrix (and vice versa).
Notation: y_i
The i-th observation of the response variable we wish to predict.
Matrix Multiplication Requirement
You can only multiply A (of size r×d) and B (of size d×s) if the number of columns in A equals the number of rows in B.
Formula for (AB)_{ij}
The (i, j) element of AB is the sum of the products of corresponding elements from row i of A and column j of B.
Distinction Between Bold/Capital vs Lower-Case Font
Bold capitals (e.g., A) are matrices, lower-case bold (e.g., a) are n-length vectors, lower-case normal (e.g., a) are scalars or feature vectors, and capital normal (e.g., A) can denote random variables.
Linear vs. Non-Linear Methods
Linear methods assume a linear relationship between predictors and response; non-linear methods can capture more complex, flexible relationships.
Examples of Non-Linear Approaches
Tree-based methods (bagging, boosting, random forests), support vector machines, and generalized additive models.