Exam 2 Flashcards

(34 cards)

1
Q

Why do we use exploratory modeling?

A

To obtain the best-fit model from all observations

2
Q

Why do we use predictive modeling?

A

To predict outcomes for new observations. The observations are split into a training set and a validation set.
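As a rough illustration of the split, here is a minimal sketch. The function name `split_observations`, the fixed seed, and the sample data are assumptions made for this example; the 80% training fraction follows the figure used elsewhere in this deck.

```python
import random

def split_observations(observations, train_fraction=0.8, seed=0):
    """Shuffle a copy of the observations, then cut it into (training, validation)."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: ten observations numbered 0 to 9.
training, validation = split_observations(range(10))
print(len(training), len(validation))  # 8 2
```

Shuffling before cutting keeps the validation set from being just the last rows of the file.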

3
Q

What are training sets used for?

A

To create (fit) the model

4
Q

What are validation sets used for?

A

To evaluate the accuracy of the model

5
Q

predictors

A

the input variables used to estimate the target

6
Q

target

A

the variable we are trying to estimate; comparing the estimates with its known values tests the model

7
Q

regression

A

determining the relationship between a variable and one or more other variables

8
Q

linear regression

A

given a set of observations, determine the equation of a line that can be used to describe the dataset
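One common way to find such a line is ordinary least squares. The sketch below assumes that method (the card does not say which one the course used), and the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```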

9
Q

types of error

A

error = actual value minus estimated value
mean error = average of the errors
mean square error (MSE) = average of the squared errors
root mean square error (RMSE) = square root of the MSE
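The four measures can be computed in a few lines. This is a sketch only: the sign convention (actual minus estimated), the function name `error_metrics`, and the numbers are assumptions for the example.

```python
import math

def error_metrics(actual, predicted):
    """Return (errors, mean error, MSE, RMSE) for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed: actual - estimated
    mean_error = sum(errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)      # average of squared errors
    rmse = math.sqrt(mse)                                # back in the target's units
    return errors, mean_error, mse, rmse

# Hypothetical actual and estimated values.
errors, mean_error, mse, rmse = error_metrics([3, 5, 7], [2, 5, 9])
print(errors)  # [1, 0, -2]
```

Note that positive and negative errors cancel in the mean error, which is one reason MSE and RMSE are used alongside it.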

10
Q

How do you know from R-squared whether a model is accurate?

A

The closer R-squared is to 1, the better the model fits the data
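For reference, a minimal sketch of how R-squared is usually computed, as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are assumptions for this example.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual/predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

perfect = r_squared([1, 2, 3], [1, 2, 3])    # predictions match exactly
mean_only = r_squared([1, 2, 3], [2, 2, 2])  # always predicting the mean
print(perfect, mean_only)  # 1.0 0.0
```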

11
Q

validation set

A

used to test a model

12
Q

Why do we split the data into training and validation sets?

A

So the model can learn from one part of the data (training) and be tested on data it has not seen (validation)

13
Q

class

A

category for data

14
Q

When would we want to use a class?

A

To assign a label to the data points

15
Q

What is the max of k?

A

The size of the training dataset (the number of training observations)

16
Q

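What is data normalization?

A

Rescaling features to a common range so that no single feature dominates. It is important with KNN because the distance squares the differences in features, so a feature with large values would otherwise outweigh the others.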
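A sketch of one common rescaling scheme, min-max normalization, which maps each feature to the range 0 to 1. The choice of min-max, the function name `min_max_normalize`, and the sample features are assumptions; the deck does not say which scheme the course used.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features on very different scales.
incomes = min_max_normalize([30000, 60000, 90000])
ages = min_max_normalize([20, 30, 40])
print(incomes, ages)  # [0.0, 0.5, 1.0] [0.0, 0.5, 1.0]
```

Before rescaling, a difference of 30000 in income would swamp a difference of 10 in age once both are squared; afterwards the two features contribute on the same scale.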

17
Q

regression model vs classification model

A

regression = numeric outcomes
classification = categorical outcomes

18
Q

What type of data can we use with KNN?

A

Numeric data. Categorical features must be prepared first by converting them to numbers (like we did with the car example in class).

19
Q

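Scatter plots are good for showing change over time: true or false?

A

False. Line plots show change over time; scatter plots show the relationship between 2 variables.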

20
Q

How much data do you use to find k (the max value of k)?

A

Set aside 80% of the total data for finding k

21
Q

KNN and Bayes differences

A

KNN is based on Euclidean distance between numeric data points; Bayes is based on probabilities from categorical data

22
Q

Why do we use categorical data for Bayes?

A

Because Bayes is probability based: probabilities are estimated from how often each category occurs

23
Q

Why do we use numerical data for KNN?

A

To measure the Euclidean distance between the data points
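To show where the distance is used, here is a minimal KNN classifier sketch. The function name `knn_predict` and the sample points are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Predict the label of query by majority vote among its k nearest training rows.

    training is a list of (features, label) pairs with numeric features.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the training set size")
    # Sort training rows by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two points labeled "A", three labeled "B".
training = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B"), ((6, 6), "B")]
print(knn_predict(training, (2, 2), k=3))  # A
print(knn_predict(training, (2, 2), k=5))  # B
```

With k equal to the training set size, every row votes, so the prediction is simply the most common class regardless of the query.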

24
Q

What is the best chart for comparing 2 things?

25
Q

What is the best chart for finding proportions?

A

Pie chart

26
Q

What chart shows the relationship between 2 variables?

A

Scatter plot

27
Q

What chart is used for change over time?

A

Line plot

28
Q

One-hot encoding

A

Each category is represented by a binary vector with a single 1, e.g. 001, 010, 100

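A small sketch of one-hot encoding. The function name `one_hot`, the alphabetical column order, and the color data are assumptions made for this example.

```python
def one_hot(categories):
    """Return (labels, rows): one binary vector per input value, with a single 1."""
    labels = sorted(set(categories))  # assumed column order: alphabetical
    index = {label: i for i, label in enumerate(labels)}
    rows = [[1 if index[c] == i else 0 for i in range(len(labels))] for c in categories]
    return labels, rows

# Hypothetical categorical feature.
labels, rows = one_hot(["red", "green", "blue", "green"])
print(labels)  # ['blue', 'green', 'red']
print(rows)    # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```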
29
Q

Euclidean distance equation

A

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)

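The Euclidean distance is the square root of the sum of squared differences between matching features. A minimal sketch follows; the function name `euclidean_distance` and the sample points are chosen for illustration only.

```python
import math

def euclidean_distance(p, q):
    """Return the straight-line distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical points forming a 3-4-5 right triangle.
distance = euclidean_distance((0, 0), (3, 4))
print(distance)  # 5.0
```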
30
Q

What kind of data do we use for Bayes classifiers?

A

Categorical data, but the features must be converted to numeric form

31
Q

What affects the accuracy of a Bayes classifier?

A

The independence assumption between features, the quality and size of the training data, feature relevance, the distribution of features, class imbalance, parameter estimation, data preprocessing, and outliers

32
Q

What is data imbalance, and why does that matter for Bayes classifiers?

A

Some classes have far more observations than others. This leads to biased predictions and unreliable probability estimates for the minority classes.

33
Q

What is the relationship between skewness in a histogram and a box plot?

A

Skew left = median is above the mean
Symmetric = mean equals median
Skew right = median is below the mean

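The mean and median comparison can be checked on small samples: a long tail pulls the mean toward it while the median stays near the bulk of the data. The three samples below are invented for illustration.

```python
import statistics

# Hypothetical samples: long tail to the right, long tail to the left, no tail.
right_skewed = [1, 2, 2, 3, 3, 4, 20]
left_skewed = [1, 17, 18, 18, 19, 19, 20]
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, statistics.mean(sample), statistics.median(sample))
```

In the right-skewed sample the mean (5) is above the median (3); in the left-skewed sample the mean (16) is below the median (18); in the symmetric sample both are 3.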
34