Quantitative Flashcards

1
Q

Reducing the number of independent variables in a data set typically involves the use of what?

A

Unsupervised learning with untagged data

2
Q

In a support vector machine, what do outliers affect?

A

Outliers do not affect either the support vectors or the discriminant boundary

3
Q

Regarding neural networks, true or false?

Each hidden node has both a summation operator and an activation function

A

True

4
Q

True or false?

Penalized regression attempts to minimize the sum of the squared residuals less a penalty term that increases with the number of features used

A

False

Penalized regression attempts to minimize the sum of the squared residuals plus a penalty term that increases with the number of features used
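
A minimal sketch of that objective in Python, assuming a LASSO-style penalty (lambda times the sum of the absolute coefficient values). The function name `penalized_sse` and the toy numbers are hypothetical and chosen only for illustration:

```python
def penalized_sse(y, X, coefs, intercept, lam):
    """Sum of squared residuals PLUS a LASSO-style penalty lam * sum(|b_k|)."""
    sse = 0.0
    for row, target in zip(X, y):
        fitted = intercept + sum(b * x for b, x in zip(coefs, row))
        sse += (target - fitted) ** 2
    # The penalty grows as more features carry non-zero coefficients.
    penalty = lam * sum(abs(b) for b in coefs)
    return sse + penalty


# Hypothetical toy data: three observations, two features.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [3.0, 2.0, 4.0]

no_penalty = penalized_sse(y, X, coefs=[1.0, 1.0], intercept=0.0, lam=0.0)
with_penalty = penalized_sse(y, X, coefs=[1.0, 1.0], intercept=0.0, lam=0.5)
```

With these made-up coefficients the fit is exact, so the only difference between the two results is the penalty term itself.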

5
Q

In supervised learning techniques, does the test dataset contain feature inputs, target outcomes, or both?

And does the training dataset contain feature inputs, target outcomes, or both?

A

Test: feature inputs only (the model predicts the outcomes, which are then compared with the held-back actual values)

Training: both

6
Q

Which supervised machine learning technique requires the modeler to specify the largest number of hyperparameters: KNN, LASSO, or random forest?

A

Random forest

LASSO and KNN each require one hyperparameter

A random forest requires four: the number of features in each subset (m), the number of trees to use, the minimum size (population) of each node or leaf, and the maximum depth of each tree. All of these can be tuned to improve overall model prediction accuracy

7
Q

An ML algorithm used to minimize the chance of overfitting a model to a data set when many variables could be used to explain or model Y is known as …

A

Penalized regression

8
Q

When a target variable is continuous and the relationship between the target and the features is non-linear, the modeler should use:

a) penalized regression
b) classification and regression trees
c) LASSO

A

B) Classification and regression trees

A) and C) are linear regression techniques

9
Q

Model generalization is maximized when prediction error on test data is minimized

A

A model that generalizes well retains its explanatory power when predicting out of sample. The evaluation of any ML algorithm therefore focuses on its prediction error on new data rather than on its goodness of fit on the data on which the algorithm was fitted

10
Q

CART

A

A classification and regression tree (CART) is a predictive model that explains how an outcome variable’s values can be predicted from other values. A CART output is a decision tree in which each fork is a split in a predictor variable and each end node contains a prediction for the outcome variable

For a continuous target, the prediction is the mean of the values at the terminal node; for a categorical target, it is the majority class at that node

Used when the target variable is either categorical or continuous
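
A small sketch of how a terminal node turns its training values into a prediction, assuming the usual convention of the mean for a continuous target and the majority class for a categorical one. The one-split tree, the threshold, and all names here are hypothetical:

```python
from collections import Counter
from statistics import mean


def leaf_prediction(leaf_values, task):
    """Prediction at a terminal node from the training targets that fell into it."""
    if task == "regression":
        return mean(leaf_values)  # continuous target: average of the leaf
    if task == "classification":
        return Counter(leaf_values).most_common(1)[0][0]  # majority class
    raise ValueError("task must be 'regression' or 'classification'")


def predict(x, threshold, left_leaf, right_leaf, task):
    """A one-fork tree: go left if x <= threshold, otherwise go right."""
    leaf = left_leaf if x <= threshold else right_leaf
    return leaf_prediction(leaf, task)


# Hypothetical leaves produced by a single split at x = 3.0.
low_side = predict(1.5, 3.0, [2.0, 4.0], [10.0, 12.0, 14.0], "regression")
high_side = predict(5.0, 3.0, [2.0, 4.0], [10.0, 12.0, 14.0], "regression")
label = predict(5.0, 3.0, ["down", "down"], ["up", "down", "up"], "classification")
```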

11
Q

Which supervised learning technique does not require the modeler to specify a hyperparameter?

a) KNN
b) SVM
c) LASSO

A

b) SVM

LASSO: the hyperparameter is lambda, the weight multiplied by the sum of the absolute values of the coefficients

KNN: the hyperparameter is k, the number of neighbors
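
To make the role of k concrete, here is a toy one-dimensional nearest-neighbor classifier. It is only a sketch: the data, the labels, and the function name `knn_predict` are hypothetical, and distance is taken as the absolute difference:

```python
from collections import Counter


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations: (feature value, class label).
train = [(1.0, "low"), (1.2, "low"), (1.4, "low"), (2.0, "high"), (3.0, "high")]

# The same query point gets a different label depending on the chosen k.
with_k1 = knn_predict(train, 1.8, k=1)
with_k3 = knn_predict(train, 1.8, k=3)
```

The modeler, not the algorithm, picks k, which is why it counts as a hyperparameter.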

12
Q

Combining the predictions from a collection of learning algorithms is called?

a) deep learning
b) neural networks
c) ensemble learning

A

c) ensemble learning
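
One simple way to combine predictions is a majority vote. The sketch below assumes three hypothetical rule-based classifiers (`rule_a`, `rule_b`, `rule_c`) standing in for separately trained models:

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical classifiers with slightly different decision rules.
def rule_a(x):
    return "buy" if x > 0 else "sell"


def rule_b(x):
    return "buy" if x > 1 else "sell"


def rule_c(x):
    return "buy" if x > -1 else "sell"


def ensemble_predict(x):
    """Combine the individual predictions into one ensemble prediction."""
    return majority_vote([model(x) for model in (rule_a, rule_b, rule_c)])


signal_up = ensemble_predict(0.5)
signal_down = ensemble_predict(-0.5)
```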

13
Q

unsupervised learning is typically not well suited for use in tasks involving:

a) clustering
b) classification
c) dimension reduction

A

b) classification

Two important types of problems that are well suited to unsupervised machine learning are reducing the dimension of data and sorting data into clusters, known as dimension reduction and clustering, respectively

14
Q

Feature selection reduces data set dimensionality by

A

including and excluding features in the data without altering them

15
Q

Data Trimming

A

when extreme values and outliers are simply removed from the database

16
Q

Data Winsorization

A

When extreme values and outliers are replaced with the maximum and minimum values of the data points that are not outliers
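
A short sketch contrasting winsorization with trimming. The cutoffs `low` and `high` that define an outlier are assumptions here; in practice they would come from the modeler's own rule (for example, percentile limits):

```python
def trim(values, low, high):
    """Trimming: drop every observation outside [low, high]."""
    return [v for v in values if low <= v <= high]


def winsorize(values, low, high):
    """Winsorization: replace outliers with the min/max of the non-outliers."""
    kept = trim(values, low, high)
    if not kept:
        raise ValueError("no observations fall inside the cutoffs")
    floor, cap = min(kept), max(kept)
    return [min(max(v, floor), cap) for v in values]


# Hypothetical sample with one extreme value at each end.
data = [-50, 1, 2, 3, 4, 100]

trimmed = trim(data, low=0, high=10)          # outliers removed
winsorized = winsorize(data, low=0, high=10)  # outliers replaced, length kept
```

Trimming shortens the sample, while winsorization keeps every observation and only pulls the extremes in.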

17
Q

Data Filtration

A

A preprocessing activity that removes unneeded observations

18
Q

Tokenization

A

The process of splitting a given text into separate tokens, such as words or characters
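
A minimal illustration of both kinds of split, using a simple regular expression for words. The pattern and function names are assumptions for this sketch; production text pipelines usually rely on a dedicated tokenizer:

```python
import re


def word_tokens(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"[A-Za-z0-9']+", text)


def char_tokens(text):
    """Split text into single-character tokens, ignoring whitespace."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical sentence used only for illustration.
sentence = "The Fed raised rates; markets fell."

words = word_tokens(sentence)
chars = char_tokens("a b")
```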

19
Q

A large data set with a large number of features will most likely contribute to: good model fit, model overfitting, or model underfitting?

A

model overfitting

20
Q

Advantages of simulation modelling

A
  • It yields a distribution for expected value rather than a point estimate
  • It also estimates a standard deviation and a breakdown of the values by percentile
  • The expected values from simulations should be fairly close to the expected value that we would obtain using the conventional risk-adjusted model
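
These points can be sketched with a small Monte Carlo run. Everything numeric below is a hypothetical assumption (a 100 cash flow, a 10% discount rate, growth drawn from a normal distribution), and the valuation formula is deliberately simplistic:

```python
import random
from statistics import mean, quantiles, stdev

BASE_CASH_FLOW = 100.0               # hypothetical input
DISCOUNT_RATE = 0.10                 # hypothetical input
GROWTH_MEAN, GROWTH_SD = 0.05, 0.02  # hypothetical growth distribution


def simulate_values(n_trials, seed=0):
    """Draw a growth rate per trial and value the cash flow under it."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        growth = rng.gauss(GROWTH_MEAN, GROWTH_SD)
        values.append(BASE_CASH_FLOW * (1 + growth) / DISCOUNT_RATE)
    return values


values = simulate_values(10_000)

# Conventional point estimate: plug the expected growth rate in once.
point_estimate = BASE_CASH_FLOW * (1 + GROWTH_MEAN) / DISCOUNT_RATE

cut_points = quantiles(values, n=100)  # 99 percentile cut points
summary = {
    "mean": mean(values),
    "stdev": stdev(values),
    "p5": cut_points[4],
    "p50": cut_points[49],
    "p95": cut_points[94],
}
```

The simulation returns a whole distribution (mean, standard deviation, percentiles), and its mean should land near the single point estimate.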
21
Q

Stemming and lemmatization address the problem of - what?

A

Data sparseness, or low-frequency tokens

Data sparseness refers to words that appear very infrequently, resulting in data consisting of many unique, low-frequency tokens

Stemming and lemmatization can decrease data sparseness by aggregating many sparsely occurring words into relatively less sparse stems or lemmas, thereby aiding in training less complex ML models
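
The aggregation effect can be seen with a deliberately crude suffix-stripping stemmer. This is a toy written for this card, not the Porter stemmer or any library's algorithm, and the suffix list and token sample are hypothetical:

```python
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, intentionally simplistic


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical tokens: six unique words, each appearing once.
tokens = ["invest", "invests", "invested", "investing", "market", "markets"]

raw_counts = Counter(tokens)
stem_counts = Counter(crude_stem(t) for t in tokens)
```

Six tokens that each occur once collapse into two stems with higher counts, which is the reduction in sparseness the card describes.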

22
Q

The stages of data exploration are

A

exploratory data analysis
feature selection (using chi-square tests and mutual information scores)
feature engineering