3: Machine Learning Flashcards

1
Q

Data are shuffled randomly and then divided into k equal subsamples.
One subsample is held out as the validation sample, and the other k-1 subsamples are used as training samples. The process is repeated k times, so that each subsample serves as the validation sample once.

A

K-fold cross validation
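
The shuffle-and-split procedure can be sketched in plain Python. This is only an illustration: the function name `k_fold_splits`, the fixed seed, and the sample size of 10 are assumptions made for the example.

```python
import random

def k_fold_splits(n_obs, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k near-equal subsamples
    for i in range(k):
        valid = folds[i]                    # one subsample held out
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        yield train, valid

# Hypothetical sample of 10 observations split into 5 folds.
splits = list(k_fold_splits(n_obs=10, k=5))
for train, valid in splits:
    print("train:", sorted(train), "validate:", sorted(valid))
```

Every observation lands in the validation sample exactly once and in the training sample k-1 times.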

2
Q

Technique of combining predictions from a number of models, with the objective of canceling out noise

A

Ensemble Learning

Results in: more accurate and more stable predictions than a single model
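
A minimal sketch of one way to combine predictions, majority voting. The three models and their labels are made up for illustration; each is wrong on a different observation, so the errors cancel out in the vote.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class labels from several models by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical labels for four observations.
truth   = ["up", "down", "up", "down"]
model_a = ["down", "down", "up", "down"]   # wrong on observation 0
model_b = ["up", "up", "up", "down"]       # wrong on observation 1
model_c = ["up", "down", "down", "down"]   # wrong on observation 2

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]

def accuracy(pred):
    """Share of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

print(accuracy(model_a), accuracy(model_b), accuracy(model_c), accuracy(ensemble))
```

The gain depends on the models making different mistakes; models that err on the same observations would not benefit from voting.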

3
Q
  • Nodes connected by links
  • Useful in: Supervised Regression & Classification models
  • Works well in presence of: nonlinearities & complex interactions among variables
  • Recognizes patterns, forms clusters, and classifies observations
A

Neural Networks
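
To illustrate "nodes connected by links" and the handling of nonlinearities, here is a hand-built network with one hidden layer. The weights are chosen by hand for the example rather than learned, and the step activation is a simplifying assumption.

```python
def step(z):
    """Threshold activation: the node fires (1) when its input is positive."""
    return 1 if z > 0 else 0

def tiny_network(x1, x2):
    # Hidden layer: two nodes, each a weighted sum of the inputs plus a bias.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # fires if either input is on
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both are on
    # Output node: "either, but not both" (XOR), a pattern that no single
    # linear threshold on x1 and x2 can represent.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

In practice the weights are estimated from training data rather than set by hand.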

4
Q

Neural networks with many hidden layers (often more than 20). Like reinforcement learning algorithms, they learn from their own prediction errors

Used for: complex tasks; image, pattern, & character recognition

A

Deep Learning Networks

5
Q
  • Algorithm learns from success & mistakes
  • Seeking to maximize reward and minimize punishment
  • Operates within defined constraints
A

Reinforcement Learning

6
Q

Inputs & outputs are identified for the computer, and the algorithm uses this labeled training data to model relationships

A

Supervised Learning
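
As a small illustration of modeling a relationship from labeled data, the sketch below fits a straight line by ordinary least squares. The data points are invented for the example, and a linear model is just one possible choice of supervised algorithm.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on labeled (x, y) pairs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical labeled training data: inputs xs with known outputs ys.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f} * x")
```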

7
Q

Computer is provided unlabeled data that the algorithm uses to determine the structure of the data

A

Unsupervised Learning

8
Q

Least Absolute Shrinkage and Selection Operator (LASSO), a penalized regression model, is useful in building:

A

Parsimonious models, through feature reduction
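
A rough sketch of why the LASSO penalty removes features, restricted to the single-feature case where the solution has a closed form. The objective 0.5*sum((y - b*x)^2) + lam*|b|, the function names, and the data are assumptions made for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_one_feature(xs, ys, lam):
    """Minimize 0.5*sum((y - b*x)^2) + lam*|b| for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

# Hypothetical data: one feature closely related to y, one nearly unrelated.
ys = [1.0, 2.0, 3.0]
strong = [1.0, 2.0, 3.0]
weak = [1.0, -1.0, 0.2]

print(lasso_one_feature(strong, ys, lam=2.0))   # shrunk, but kept
print(lasso_one_feature(weak, ys, lam=2.0))     # dropped: coefficient is 0.0
```

With the penalty set to zero the weak feature would keep a small nonzero coefficient; the penalty is what pushes it to exactly zero.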

9
Q

K-Nearest Neighbor (used in classification & regression) investment applications include:

A
  • predicting bankruptcy
  • assigning bonds to rating classes
  • predicting stock prices
  • creating customized indices
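
A minimal sketch of KNN classification applied to the bond-rating use case. The issuers, the two features, and the labels are invented for illustration; real applications would normally scale the features first.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical issuers: (debt/equity, interest coverage) -> rating class.
train = [
    ((0.5, 8.0), "investment grade"),
    ((0.7, 6.5), "investment grade"),
    ((0.6, 7.0), "investment grade"),
    ((2.5, 1.5), "high yield"),
    ((3.0, 1.0), "high yield"),
    ((2.8, 2.0), "high yield"),
]

print(knn_classify(train, (0.8, 6.0)))
print(knn_classify(train, (2.6, 1.8)))
```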
10
Q

Random Forest investment applications include:

A
  • factor-based asset allocation
  • prediction models for IPO success
11
Q

Linear relationships: a penalized regression model tries to use a limited number of the most important features that…

A

explain the variation in the dependent variable

Example: monthly returns on 100 stocks

12
Q

Overfitting occurs when:
Bias error:
Variance error:

A

when the model fits the training data too well, treating noise as if it were signal
Bias error: low
Variance error: high

Overfit models tend to be overly complex and often display nonlinear characteristics
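
The low-bias, high-variance pattern can be seen in a toy comparison. Both "models" and all numbers are made up for the example: the memorizer reproduces the training data exactly, while the line's coefficients come from ordinary least squares on the same training sample.

```python
def mse(model, xs, ys):
    """Mean squared error of model(x) against the observed ys."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training sample (noisy) and out-of-sample observations.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]
test_x, test_y = [1.2, 2.2, 3.2, 4.2], [1.2, 2.2, 3.2, 4.2]

def memorizer(x):
    """Overly complex model: returns the y of the closest training x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def simple_line(x):
    """Least-squares line fitted to the training sample (a = 0.5, b = 0.8)."""
    return 0.5 + 0.8 * x

for name, model in [("memorizer", memorizer), ("line", simple_line)]:
    print(name, "in-sample:", mse(model, train_x, train_y),
          "out-of-sample:", mse(model, test_x, test_y))
```

The memorizer has zero in-sample error but the larger out-of-sample error, which is the signature of overfitting.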

13
Q

Generalization is the degree to which the model retains its explanatory power when:

A

predicting out of sample

14
Q

Bias error is the degree to which:

A

the model fits the training data; high bias error means a poor fit (underfitting)

15
Q

Variance error shows how much the model's results change in response to:

A

new data

16
Q

How to prevent overfitting:

A
  • don’t let the model become too complex
  • proper data sampling using cross validation (k-fold)
17
Q

Complexity Reduction:

A

Dimension reduction
Use: principal components analysis (PCA)
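
A minimal sketch of the idea behind PCA for two correlated features: find the single direction that captures most of the variance. The data are invented, and power iteration is used here only to keep the example dependency-free; libraries typically use other decompositions.

```python
import math

def first_principal_component(points, iters=100):
    """Return (unit direction, share of total variance) for 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vx, vy = 1.0, 0.0
    for _ in range(iters):
        wx, wy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    eigenvalue = vx * (sxx * vx + sxy * vy) + vy * (sxy * vx + syy * vy)
    return (vx, vy), eigenvalue / (sxx + syy)

# Hypothetical pair of highly correlated features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, share = first_principal_component(points)
print(direction, share)
```

Because one component explains nearly all the variance here, the two features could be replaced by a single composite one.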

18
Q

With supervised learning, the training data contains:

A

ground truth

19
Q

Supervised ML algorithm: classification focuses on sorting observations into:

A

distinct categories, e.g. pass or fail

20
Q

Regression-based supervised learning is used to predict:

A

continuous variables

21
Q

Regression:

CART & Random forests are used for:

A

complex & non-linear data
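
A simplified sketch of the building block of a classification tree: a single split on one feature. It scores splits by misclassification count to stay short, whereas CART implementations typically use an impurity measure such as Gini. The scores and labels are hypothetical.

```python
def best_split(xs, labels):
    """Find the threshold on one feature that minimizes misclassifications
    when each side of the split predicts its own majority label."""
    best = None
    ordered = sorted(set(xs))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2           # candidate midway between values
        left = [l for x, l in zip(xs, labels) if x <= threshold]
        right = [l for x, l in zip(xs, labels) if x > threshold]
        errors = sum(len(side) - max(side.count(c) for c in set(side))
                     for side in (left, right))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

# Hypothetical scores with a step-like (non-linear) relation to the outcome.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_split(scores, outcome))
```

A full tree repeats this search within each resulting group, and a random forest averages many such trees grown on resampled data.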

22
Q

Clustering (unsupervised learning):

K-means is used for:

A

complex & linear data
with a known number of clusters, k
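
A minimal one-dimensional sketch of k-means with k fixed in advance. The values and the starting centroids are assumptions made for the example; in practice the starting centroids are usually chosen at random.

```python
def k_means_1d(values, centroids, iters=10):
    """Iteratively assign values to the nearest centroid, then re-center.
    The number of clusters k is len(centroids)."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups, so k = 2.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means_1d(values, centroids=[0.0, 3.0])
print(centroids, clusters)
```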