Exam 2 Flashcards by Kyra Buck

Why we use exploratory modeling

to obtain the best-fit model using all of the observations

Why we use predictive modeling

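to predict outcomes for new observations; we split the observations into a training set and a validation set so the model can be checked on data it was not built from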

What are training sets used for

create the model

what are validation sets used for

evaluate accuracy of the model
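
A minimal sketch of the training/validation split described in these cards. The function name `split_observations`, the 80% training fraction, and the fixed seed are assumptions made for illustration only.

```python
import random

def split_observations(observations, train_fraction=0.8, seed=0):
    """Shuffle a copy of the observations, then cut it into (training, validation)."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

observations = list(range(10))  # stand-in data: ten made-up observations
training_set, validation_set = split_observations(observations)
print(len(training_set), len(validation_set))  # 8 2
```

The model would be created from `training_set` only, and its accuracy evaluated on `validation_set`.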

predictors

these are our input variables (the features used to estimate the target)

target

the variable we are trying to estimate; the estimates are compared with the known values to test the model

regression

determining the relationship between a variable and one or more other variables

linear regression

given a set of observations, determine the equation of a line that can be used to describe the dataset
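
A minimal sketch of finding the equation of a line from a set of observations. It assumes the ordinary least-squares method (the card does not name a method), and the function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```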

types of error

error = difference between the actual value and the estimated value
mean error = average of the errors
mean square error (MSE) = average of the squared errors
root mean square error (RMSE) = square root of the MSE
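
A short worked example of these four error measures. The numbers are made up, and the sign convention (actual minus estimated) is an assumption; some courses use estimated minus actual.

```python
import math

actual = [3.0, 5.0, 7.0]      # made-up known values
estimated = [2.0, 5.0, 9.0]   # made-up model estimates

errors = [a - e for a, e in zip(actual, estimated)]   # one error per observation
mean_error = sum(errors) / len(errors)                # positives and negatives can cancel
mse = sum(err ** 2 for err in errors) / len(errors)   # squaring removes the sign
rmse = math.sqrt(mse)                                 # back in the units of the target

print(errors)      # [1.0, 0.0, -2.0]
print(mean_error, mse, rmse)
```

Here the mean error is about -0.33 while the MSE is about 1.67, which shows why squared errors are preferred: they do not cancel each other out.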

How do you know if r-square is accurate?

the closer r-square is to 1, the better the model fits the data, so the more accurate it is
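
A minimal sketch of computing r-square, assuming the usual definition (1 minus the ratio of squared error to total squared variation around the mean). The function name `r_squared` and the data are made up for illustration.

```python
def r_squared(actual, estimated):
    """Return 1 - SS_res / SS_tot for paired actual and estimated values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))  # leftover error
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
good_estimates = [3.1, 4.9, 7.2, 8.8]  # close to the actual values
poor_estimates = [6.0, 6.0, 6.0, 6.0]  # always guesses the mean

print(r_squared(actual, good_estimates))  # close to 1
print(r_squared(actual, poor_estimates))  # 0.0
```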

validation set

used to test a model

why do we split validation and training sets

so the model can learn from one part of the data (training) and then be tested on a separate part it has not seen (validation)

class

category for data

when would we want to use a class?

to identify a label for the data points

what is the max of k

the size of the training set (the number of training observations)

What is data normalization

rescaling features to a common range so that no single feature dominates. It is important with knn because the distance squares the difference in features, so a feature with a large range would outweigh the others
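
In the knn setting, normalization usually means putting features on a common scale. A minimal sketch, assuming min-max scaling to the range 0 to 1 (one of several possible methods); the function name `min_max_normalize` and the values are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale values to the range 0..1 (assumes they are not all identical)."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]

incomes = [30000, 60000, 90000]  # large raw differences
ages = [20, 30, 40]              # small raw differences

print(min_max_normalize(incomes))  # [0.0, 0.5, 1.0]
print(min_max_normalize(ages))     # [0.0, 0.5, 1.0]
```

Before scaling, the squared income differences would swamp the age differences in a distance calculation; after scaling, both features contribute on the same footing.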

regression model vs classification model

regression = numeric outcomes
classification = categorical outcomes

what type of data can we use with KNN

numeric data; categorical features must first be prepared by converting them to numbers (like we did with the car example in class)

scatter plots are good to show change over time true or false

false

how much data do you use to find K (max value of k?)

Set aside 80% of the total data as the training set used to find K; K can be at most the size of that training set

Knn and Bayes differences

Knn is based on the euclidean distance between numeric features, and bayes is based on probabilities calculated from categorical data

why do we use categorical for bayes

bayes is probability based: it counts how often each category appears within each class
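
A minimal sketch of a probability-based classifier on categorical data. It assumes the "naive" version of Bayes (features treated as independent given the class) and uses no smoothing; the function names and the weather data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (features_tuple, label). Count classes and feature values per class."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (label, feature position) -> value counts
    for features, label in rows:
        for position, value in enumerate(features):
            feature_counts[(label, position)][value] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    """Return (best label, score per label); score = P(class) * product of P(value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for position, value in enumerate(features):
            score *= feature_counts[(label, position)][value] / count
        scores[label] = score
    return max(scores, key=scores.get), scores

# Made-up data: (weather, windy) -> decision
rows = [
    (("sunny", "no"), "play"), (("sunny", "yes"), "play"), (("rainy", "no"), "play"),
    (("rainy", "yes"), "stay"), (("rainy", "yes"), "stay"), (("sunny", "yes"), "stay"),
]
class_counts, feature_counts = train_naive_bayes(rows)
label, scores = predict(class_counts, feature_counts, ("sunny", "no"))
print(label, scores)
```

Because every step is a count turned into a proportion, the features need to be categories rather than raw measurements.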

why do we use numerical for knn

to measure the euclidean distance between the data
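
A minimal sketch of why knn needs numbers: each prediction comes from measuring the euclidean distance to every training point and letting the k nearest ones vote. The function names and the data points are made up for illustration.

```python
import math
from collections import Counter

def euclidean_distance(p, q):
    """Square root of the sum of squared differences between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(training, query, k):
    """training: list of (features, label). Return the majority label of the k nearest."""
    nearest = sorted(training, key=lambda row: euclidean_distance(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"), ((6.0, 5.5), "B"),
]
print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(knn_classify(training, (1.2, 1.4), k=3))   # A
```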

what is the best chart for comparing 2 things

bar chart

what is the best chart for finding proportions

pie chart

shows the relationship between 2 variables

scatter plot

what chart is used for change over time

line plot

one hot encoding

each category becomes a string of 0s with a single 1 in its own position: 100, 010, 001, etc
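
A minimal sketch of one hot encoding. The function name `one_hot_encode`, the alphabetical ordering of the categories, and the colour values are assumptions made for illustration.

```python
def one_hot_encode(values):
    """Return (sorted categories, one row per value with a single 1 in its category's slot)."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        encoded.append(row)
    return categories, encoded

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```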

euclidean distance equation

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$

what kind of data do we use for bayes classifiers

categorical, but the features must be encoded as numbers before they are used

What affects the accuracy of a Bayes classifier?

the assumption that features are independent, the quality and size of the training data, feature relevance, distribution of features, class imbalance, parameter estimation, data preprocessing, outliers

What is data imbalance, and why does that matter for Bayes classifiers?

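data imbalance is when some classes have far more observations than others. It matters because it causes biased predictions (toward the majority class) and unreliable probability estimates for minority classes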

What is the correlation between skewness in histogram and box plot?

Skew left = mean is below median (longer tail or whisker on the low side)
symmetric = mean = median
skew right = mean is above median (longer tail or whisker on the high side)
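
A quick numeric check of how skew moves the mean relative to the median: the long tail pulls the mean toward it, while the median stays near the bulk of the data. The three datasets are made up for illustration.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail of high values
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]  # long tail of low values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, statistics.mean(data), statistics.median(data))
```

For the right-skewed data the mean (4.75) sits above the median (3.0); for the left-skewed data the mean (16.375) sits below the median (18.5).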

(34 cards)