L4: Foundations of ML and Linear Regression Flashcards
Which of the following statements hold true for supervised ML? (Select all correct)
A) the algorithm is trained on a labeled dataset, where the input data is associated with corresponding output labels
B) The goal is for the algorithm to learn a mapping or relationship between the input features and the output labels
C) logistic regression is a form of supervised learning
D) decision trees are a form of supervised learning
E) linear regression is a form of supervised learning
In supervised learning, all hold true
A) the algorithm is trained on a labeled dataset, where the input data is associated with corresponding output labels
B) The goal is for the algorithm to learn a mapping or relationship between the input features and the output labels
C) logistic regression is a form of supervised learning
D) decision trees are a form of supervised learning
E) linear regression is a form of supervised learning
In the case of logistic regression, supervised learning is used for binary classification problems, where the output variable is categorical (e.g., 0 or 1). The LR model estimates the probability that a given input belongs to a particular class and makes predictions based on these probabilities.
TRUE/ FALSE
TRUE
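As an illustrative sketch (not part of the original card), logistic regression's probability estimate can be written as a sigmoid applied to a linear score; the weights `w` and `b` here are hypothetical, hand-picked values rather than fitted ones:

```python
import math

def sigmoid(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that input x belongs to class 1."""
    return sigmoid(w * x + b)

def predict(x, w, b, threshold=0.5):
    """Classify as 1 if the estimated probability exceeds the threshold."""
    return 1 if predict_proba(x, w, b) >= threshold else 0
```

The model makes predictions from these probabilities by thresholding, typically at 0.5.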
In the context of decision trees, the algorithm learns (with supervised learning) a set of hierarchical decision rules based on the features of the data to make predictions about the output label.
TRUE/ FALSE
TRUE
Unsupervised learning involves NOT…
Select all wrong statements:
A) training a model on a labelled dataset
B) the algorithm explores the inherent structure or patterns within the data without explicit guidance
C) Common tasks include clustering, dimensionality reduction, and density estimation
D) The goal is to discover relationships or groupings in the data without predefined output labels
Wrong:
A) unsupervised learning involves training a model on a labelled dataset
the model is trained on an unlabelled dataset, in contrast to supervised learning
Which of the following statements are true for reinforcement learning?
A) a learning agent evolves behaviors to solve tasks using evaluative feedback
B) the agent is punished or rewarded as consequence of its actions
C) it requires interactions between agent and environment
D) the agent learns through experience (trial/error)
All statements are true:
A) a learning agent evolves behaviors to solve tasks using evaluative feedback
B) the agent is punished or rewarded as consequence of its actions
C) it requires interactions between agent and environment
D) the agent learns through experience (trial/error)
In linear regression, which measure tells us how good the prediction is?
Mean squared error (MSE) is used as the loss function in linear regression, telling us how good the prediction is
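A minimal sketch of how MSE is computed (average of squared residuals between targets and predictions):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Predictions that each miss their target by 1 give an MSE of 1.
print(mse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # → 1.0
```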
What are the limitations of ML?
Example/ hint: if the temperature sensor breaks down for one day, you must find a way to take this into account
There are many areas where things can go wrong in ML, leading the machine to learn something wrong.
When using ML to distinguish between blueberry muffins and chihuahuas, the “chihuahua” label is statistically associated with big eyes, small bodies, pointy ears, etc.
What are the potential limitations on ML in this context?
The ML algorithm sees the world as a set of pixel values. But muffins and chihuahuas have similar pixel values…
The same issue might also be present when distinguishing between chihuahuas and other dog breeds with similar attributes.
Limitation: it can be wrong
When using ML to distinguish between blueberry muffins and chihuahuas, it poses limitations due to the attributes of muffins and chihuahuas being similar. What is a solution to this?
You want to define features that specifically discriminate, and thus separate, ambiguous objects:
- distance between eyes
- max three dots
- max x kilo
- smell and taste data
Once a rule/ prediction is established, re-testing and re-training is necessary with new data to account for:
A) monotonicity
B) changes in rules
C) veloroticity
B) changes in rules
I.e., when the future becomes different from the past, retraining and retesting are necessary to maintain good predictive performance on future data
What is the cross-validation trade-off?
Cross-validation trade-off: using part of the data to test the rule built from the remainder lets the model be evaluated, but it also means less data is available for building the model
What is cross-validation?
Cross-validation entails splitting your data into one part for training and a held-out set for testing, often repeated across several folds.
What is the goal of cross-validation?
To avoid or mitigate overfitting
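A sketch of k-fold splitting (plain Python, no library assumed) that also shows the trade-off: each point is tested exactly once, but each model is trained on only (k-1)/k of the data:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; each fold serves
    once as the held-out test set, the rest as training data."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits

# With 10 points and 5 folds, every model trains on 8 and tests on 2.
for train, test in kfold_indices(10, 5):
    print(len(train), len(test))
```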
Which of the following statements are true about bias in the bias-variance tradeoff in ML?
A) Bias refers to the error introduced by approximating a real-world problem with a simplified model
B) High bias models may oversimplify the underlying patterns in the data and lead to systematic errors.
C) Low bias is often associated with underfitting
D) Bias is the error introduced by using a complex model that is highly responsive to the training data
A) Bias refers to the error introduced by approximating a real-world problem with a simplified model
B) High bias models may oversimplify the underlying patterns in the data and lead to systematic errors.
WRONG:
C) Low bias is often associated with underfitting –> HIGH bias has this problem
D) Bias is the error introduced by using a complex model that is highly responsive to the training data –> this is the case for high variance
In the bias-variance tradeoff in ML, which of the following statements are FALSE about variance? (Select all wrong answers)
A) variance is the error introduced by using a complex model highly responsive to the training data
B) high variance models may capture noise or random fluctuations in the training data
C) high variance leads to poor generalisation on new, unseen data
D) High variance is often associated with overfitting
All options are true
A) variance is the error introduced by using a complex model highly responsive to the training data
B) high variance models may capture noise or random fluctuations in the training data
C) high variance leads to poor generalisation on new, unseen data
D) High variance is often associated with overfitting
Simpler models tend to have higher variance but lower bias – and vice-versa for complex models. But very complex models can result in overfitting.
TRUE/ FALSE?
FALSE. Actually true statement:
Simpler models tend to have higher BIAS but lower VARIANCE – and vice-versa for complex models. But very complex models can result in overfitting
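An illustrative sketch of the tradeoff using polynomial degree as a proxy for model complexity (the data-generating function and degrees are arbitrary choices for the demo). Because lower-degree polynomials are nested inside higher-degree ones, the complex model always fits the training data at least as well, even when it is just memorising noise:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.size)  # noisy data

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A simple (high-bias) line fits the training data worse than a
# complex (high-variance) degree-15 polynomial -- but the complex
# model may be overfitting, capturing noise rather than signal.
print(train_mse(1), train_mse(15))
```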
Predictive modeling works by leveraging correlations between feature values (input) and output. Ideally, our predictors would each contribute independent sources of information about the outcome, and this is often the case
TRUE/FALSE
FALSE
Instead:
Predictive modeling works by leveraging correlations between feature values (input) and output. Ideally, our predictors would each contribute independent sources of information about the outcome, but this is often not the case
Which of the following statements are NOT true about co-linearity?
A) it occurs when two+ features are highly correlated with one another.
B) it can cause problems if doing statistical inference because if predictors increase or decrease together it can be hard to determine their separate effects on the output
C) Generally the solution is to remove one of the redundant predictors
D) Since we are more interested in prediction and not in explanatory modeling, it is even more important to take into account
WRONG:
D) Since we are more interested in prediction and not in explanatory modeling, it is even more important to take into account
Instead: Since we are more interested in prediction and not in explanatory modeling, we won’t worry too much about the issue of (multi)co-linearity.
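A sketch of how collinearity can be spotted and handled; the toy design matrix below is a made-up example in which one predictor is nearly a copy of another:

```python
import numpy as np

# Toy design matrix: x2 is almost a copy of x1 (collinear); x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between predictors; values near +/-1 flag collinearity.
corr = np.corrcoef(X, rowvar=False)
print(corr.round(2))

# A common fix (mainly for inference): drop one of the redundant predictors.
X_reduced = X[:, [0, 2]]
```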
You can make a correlation matrix to explore correlations between variables that are both numeric and categorical
TRUE/FALSE
FALSE
A correlation matrix, or correlation in general, can only be computed for numeric variables, since this relationship is not defined for variables with a finite set of unordered outcomes (e.g. factors)
In linear regression, the intercept term alone tells you the expected response variable output if all explanatory variables have a value of 0
TRUE/FALSE
TRUE
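A minimal sketch of this, fitting a line to noise-free data generated from y = 3 + 2x (arbitrary values chosen for clarity):

```python
import numpy as np

# Data generated from y = 3 + 2x (no noise, so the fit is exact).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 + 2.0 * x

slope, intercept = np.polyfit(x, y, 1)

# The intercept is the model's prediction when the explanatory
# variable equals 0.
print(intercept)                            # ~3.0
print(np.polyval([slope, intercept], 0.0))  # same value
```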