Exam 2 Flashcards

1
Q

Why do we use exploratory modeling?

A

to find the model that best fits all of the observations

2
Q

Why do we use predictive modeling?

A

to predict outcomes for new data; the observations are split into a training set and a validation set so the model can be tested on data it has not seen

3
Q

What are training sets used for

A

create the model

4
Q

what are validation sets used for

A

evaluate the accuracy of the model on data it was not trained on

5
Q

predictors

A

the input variables (features) the model uses to estimate the target

6
Q

target

A

the outcome variable the model is trying to estimate; comparing the model's estimates with the actual target values tests the model

7
Q

regression

A

determining the relationship between a variable and one or more other variables

8
Q

linear regression

A

given a set of observations, determine the equation of the line that best describes the dataset
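A minimal sketch of fitting a line y = b0 + b1*x, assuming ordinary least squares (the usual way the "best" line is chosen). The function name `fit_line` and the data are hypothetical, for illustration only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical observations that lie exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

With real, noisy data the points will not lie exactly on the line; the fitted line is the one that minimizes the sum of squared errors.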

9
Q

types of error

A

error = actual value - estimated value
mean error = average of the errors
mean square error (MSE) = average of the squared errors
root mean square error (RMSE) = square root of the MSE
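A small worked example of the four error measures, assuming error is defined as actual minus estimated (some texts use the opposite sign, which does not change MSE or RMSE). The values are hypothetical.

```python
import math

actual = [10, 12, 15]     # hypothetical actual target values
predicted = [11, 12, 13]  # hypothetical model estimates

errors = [a - p for a, p in zip(actual, predicted)]  # error = actual - estimated
mean_error = sum(errors) / len(errors)               # positives and negatives can cancel
mse = sum(e ** 2 for e in errors) / len(errors)      # squaring makes every error count
rmse = math.sqrt(mse)                                # back in the target's original units
print(errors, mean_error, mse, rmse)
```

Mean error can look small even when individual errors are large, because positive and negative errors cancel; MSE and RMSE avoid this by squaring.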

10
Q

How does r-squared tell you whether a model is accurate?

A

the closer r-squared is to 1, the better the model fits the data (the more of the variation in the target it explains)
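A minimal sketch of computing r-squared as 1 - SS_res / SS_tot (unexplained variation over total variation), the standard definition; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    # Variation the model fails to explain (squared errors)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the target around its mean
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3, 5, 7, 9]
perfect = r_squared(actual, [3, 5, 7, 9])  # every estimate exactly right
close = r_squared(actual, [4, 5, 6, 9])    # small errors
print(perfect, close)
```

A perfect fit gives 1; the more the estimates miss, the further r-squared falls below 1.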

11
Q

validation set

A

used to test a model

12
Q

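why do we split the data into training and validation sets?

A

to learn the model from the training data and then test it on validation data it has not seen, which shows how well it will work on new data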
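A minimal sketch of a random 80/20 split (the 80% figure comes from the card on finding k below). The helper `train_validation_split` is hypothetical; libraries such as scikit-learn offer `train_test_split` for the same job.

```python
import random

def train_validation_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    shuffled = records[:]                  # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))  # stand-in for 10 observations
train, valid = train_validation_split(data)
print(len(train), len(valid))
```

Shuffling first matters: if the data is sorted (for example by date or class), taking the first 80% without shuffling would give a biased training set.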

13
Q

class

A

a category (label) that a data point belongs to

14
Q

when would we want to use a class?

A

to identify a label for the data points

15
Q

what is the max value of k (in KNN)?

A

the number of records in the training set

16
Q

What is data normalization

A

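rescaling numeric features to a common range (for example 0 to 1) so they are comparable. It is important with KNN because Euclidean distance squares the difference in each feature, so features measured on large scales would otherwise dominate the distance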
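A minimal sketch of min-max normalization, one common way to rescale features to 0 to 1 (z-score standardization is another; the course may use either). The function name and the car-like features are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    # Assumes the column is not constant (hi != lo), otherwise this divides by zero
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features on very different scales
mileage = [20000, 60000, 100000]
doors = [2, 4, 4]
norm_mileage = min_max_normalize(mileage)
norm_doors = min_max_normalize(doors)
print(norm_mileage, norm_doors)
```

Before rescaling, a 40,000 difference in mileage would swamp a 2-door difference in the squared distance; afterwards both features contribute on the same 0 to 1 scale.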

17
Q

regression model vs classification model

A

regression = numeric outcomes
classification = categorical outcomes

18
Q

what type of data can we use with KNN

A

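numeric predictors (the class being predicted can be a category); categorical predictors must be converted to numbers first (like we did with the car example in class), and one-hot encoding is one way to do this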

19
Q

true or false: scatter plots are good for showing change over time

A

false; a line plot is better for showing change over time

20
Q

how much data do you use to find K (max value of k?)

A

use the training set, commonly 80% of the total data; the max value of k is the number of records in that training set

21
Q

Knn and Bayes differences

A

KNN classifies using the Euclidean distance between numeric data points; Bayes classifies using probabilities estimated from categorical data

22
Q

why do we use categorical for bayes

A

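because Bayes is probability based: it estimates probabilities by counting how often each category appears within each class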
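A minimal naive Bayes sketch on categorical features, assuming the naive (features independent within a class) form and plain relative frequencies with no smoothing. The function names and the weather-style data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    # Count each class, and each feature value within each class
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def predict_naive_bayes(row, class_counts, feature_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) as a relative frequency; an unseen value counts as 0
            score *= feature_counts[label][i][value] / count
        scores[label] = score
    # Class with the largest (unnormalized) posterior wins
    return max(scores, key=scores.get)

# Hypothetical categorical data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
class_counts, feature_counts = train_naive_bayes(rows, labels)
print(predict_naive_bayes(("rainy", "mild"), class_counts, feature_counts))
print(predict_naive_bayes(("sunny", "hot"), class_counts, feature_counts))
```

Because a single zero count wipes out a class's whole score, real implementations usually add smoothing (such as Laplace smoothing), which this sketch leaves out.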

23
Q

why do we use numerical for knn

A

because Euclidean distance is computed from the numeric differences between data points
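A minimal KNN classification sketch using Euclidean distance and a majority vote among the k nearest training points. The function names and data are hypothetical; k cannot exceed the number of training records.

```python
import math
from collections import Counter

def euclidean(p, q):
    # Square root of the sum of squared differences, feature by feature
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_X, train_y, point, k):
    if k > len(train_X):
        raise ValueError("k cannot be larger than the training set")
    # Sort training points by distance to the new point and keep the k nearest
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], point))[:k]
    # Majority vote among the labels of the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical numeric (already normalized) features with class labels
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["A", "A", "B", "B"]
prediction = knn_predict(train_X, train_y, (1.2, 1.5), k=3)
print(prediction)
```

The distance only works on numbers, which is why categorical predictors have to be encoded and features normalized before running KNN.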

24
Q

what is the best chart for comparing 2 things

A

bar chart

25
Q

what is the best chart for finding proportions

A

pie chart

26
Q

what chart shows the relationship between 2 variables?

A

scatter plot

27
Q

what chart is used for change over time

A

line plot

28
Q

one hot encoding

A

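converting a categorical variable into binary (0/1) columns, one per category, with a 1 only in the column for that row's category (e.g., with three categories: 100, 010, 001)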
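A minimal one-hot encoding sketch; the function name and colors are hypothetical (pandas offers `get_dummies` for the same idea on DataFrames).

```python
def one_hot(values):
    """Return the category order and one 0/1 row per value."""
    categories = sorted(set(values))  # fixed, repeatable column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "green", "blue", "green"]
cols, encoded = one_hot(colors)
print(cols)
print(encoded)
```

Each row has exactly one 1, so no category is treated as "larger" than another, which matters for distance-based methods like KNN.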

29
Q

euclidean distance equation

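A

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2), where p and q are two data points with n numeric features
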
30
Q

what kind of data do we use for bayes classifiers

A

categorical data; numeric features must first be converted into categories (for example, by binning them into ranges)

31
Q

What affects the accuracy of a Bayes classifier?

A

how well the feature-independence assumption holds, the quality and size of the training data, feature relevance, the distribution of the features, class imbalance, parameter (probability) estimation, data preprocessing, and outliers

32
Q

What is data imbalance, and why does that matter for Bayes classifiers?

A

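data (class) imbalance is when some classes have far more observations than others. For Bayes classifiers it leads to predictions biased toward the majority class and unreliable probability estimates for minority classes, because those probabilities are estimated from only a few examples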