Exam Flashcards

Q: What is statistical learning (also known as machine learning)?
A: An approach that relies on the idea that algorithms can learn from data.

Q: Supervised learning?
A: Task-driven, and the data is labelled.

Q: Target variable (supervised learning)?
A: A variable that we need to gain more information on, or predict the value of.

Q: The true values of the target variable are called?
A: Labels.

Q: Predictors?
A: Variables used in predictive analytics to make predictions on the target.

Q: Target variables could be:
A: Continuous or discrete.

Q: Discrete target?
A: Can have two levels (binary target) or multiple levels.

Q: Classification?
A: Used to predict the value of a discrete target variable, given predictor variable values.

Q: Continuous target?
A: Can have a large number of possible outcomes.

Q: Regression?
A: Used to predict the value of a continuous target variable.

Q: Cross-validation?
A: A technique that evaluates predictive models by partitioning the original data into a training set and a testing set.

Q: Training set?
A: Used to build (train) the model.

Q: Testing set?
A: Used to evaluate the model.

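The training/testing split behind these two cards can be sketched in plain Python (the function name and the 70/30 split are illustrative, not from the course material):

```python
import random

def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle the data, then hold out a fraction of it for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, testing set)

train, test = train_test_split(list(range(10)), test_fraction=0.3)
print(len(train), len(test))  # 7 points to build the model, 3 held out to evaluate it
```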
Q: Overfitting?
A: When the algorithm fits the training data so well that it does not generalize to new, unseen data.

Q: Classification (type of supervised learning)?
A: When the value to be predicted is a categorical variable, the supervised learning is of type classification.

Q: Regression (type of supervised learning)?
A: When the value to be predicted is a numerical variable, the supervised learning is of type regression.

Q: Unsupervised learning?
A: There is no target variable; the algorithm needs to come up with the assignments based on the data alone.

Q: Clustering?
A: There are no known classes or categories; the algorithm tries to learn similarities and discover groups of similar data points.

Q: Association?
A: Tries to find relationships between different variables.

Q: Parametric methods?
A: Rely on the estimation of parameters of a function, or set of functions, for the purpose of prediction.

Q: Non-parametric methods?
A: Do not rely on parameter estimation in order to predict the outcome.

Q: Hyper-parameters?
A: Settings that a non-parametric model may still need to have determined (for example, k in k-nearest neighbors), even though it estimates no parameters.

Q: Inherent error?
A: Unavoidable error; also called 'noise' or 'irreducible error'.

Q: Bias?
A: Error due to over-simplification of the model.

Q: Variance?
A: Error due to over-complication; an overly complex model will be unable to generalize and correctly predict the target variable on new data.

Q: K-nearest neighbors?
A: An algorithm that assigns each data point to a class based on the classes of its nearest points.

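A minimal KNN sketch in plain Python (the function name and toy data are my own, for illustration; distances are squared Euclidean, which preserves the nearest-neighbor ranking):

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), label)  # squared distance
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # the 3 nearest neighbors are all "a"
```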
Q: Classification report?
A: Provides information on different aspects of the classifier.

Q: Precision?
A: The proportion of correct positive (event) predictions to all positive predictions: TP / (TP + FP).

Q: Recall?
A: Recall for class x indicates the proportion of correct positive predictions to all true positive cases: TP / (TP + FN).

Q: F1-score?
A: The harmonic mean of the precision and recall for each class.

Q: Support?
A: The number of instances of each class in the testing data.

Q: Accuracy?
A: The number of correct predictions over the total number of predictions.

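The four formulas above can be checked with a small helper (a hypothetical function, plain Python, applied to toy confusion-matrix counts):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)   # correct positive predictions / all positive predictions
    recall = tp / (tp + fn)      # correct positive predictions / all true positive cases
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(p, r, f1, acc)  # each metric is approximately 0.8 on this toy matrix
```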
Q: ROC curve (Receiver Operating Characteristic curve)?
A: A plot that depicts how the true positive rate changes with respect to the false positive rate.

Q: In ROC, the false positive rate should be?
A: Close to 0.

Q: In ROC, the true positive rate should be?
A: Close to 1.

Q: Scaling?
A: Helps us bring all features onto the same scale.

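One common choice is min-max scaling, which maps every feature into [0, 1]; a small illustrative sketch:

```python
def min_max_scale(values):
    """Rescale a single feature so its values fall in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 40]))  # smallest value maps to 0.0, largest to 1.0
```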
Q: Logistic regression?
A: Its result is a probability that is often mapped to a binary outcome.

Q: Logistic regression falls into?
A: Supervised learning of the classification type.

Q: Probabilities need to satisfy two conditions:
A: Always be non-negative, and never exceed 1; that is, always be between 0 and 1.

Q: Odds of an event?
A: The probability of that event over the probability of its complement.

Q: While the probability of an event is always between 0 and 1,
A: the odds can be any non-negative value.

Q: b0, b1 (logistic regression)?
A: The estimators of the model; also called the predicted weights, or the coefficients for each of the features.

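The odds and the logistic (sigmoid) mapping can be sketched as follows, assuming the standard definitions above (function names are illustrative):

```python
import math

def odds(p):
    """Odds of an event: its probability over the probability of its complement."""
    return p / (1 - p)

def sigmoid(z):
    """Logistic function: maps b0 + b1*x (any real number) into (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(odds(0.8))   # a probability of 0.8 gives odds of about 4 to 1
print(sigmoid(0))  # 0.5
```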
Q: Data profiling?
A: Understanding what the data entails and identifying anomalies, missing values, inconsistencies, etc.

Q: Data cleansing?
A: Activities include imputing missing values, removing missing values, addressing outliers, and fixing variables that have inconsistent data.

Q: Data structuring?
A: Bringing the data into a structured form suitable for the analysis.

Q: Data transformation?
A: Data may need to be transformed, rescaled, or normalized.

Q: Data collection?
A: If the data is not provided to us, we need to collect it.

Q: Simple random sampling?
A: Each member of the population has the exact same probability of being selected in the sample.

Q: Systematic sampling?
A: Members of the population are selected based on a system (a set of rules).

Q: Stratified sampling?
A: The population is divided into homogeneous slices (strata). Within each slice, simple random sampling is performed and the results are combined (this reduces sampling bias and improves the accuracy of sampling).

Q: Cluster sampling?
A: The population is divided into subgroups (clusters), such that each cluster is a good representative of the population.

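Systematic sampling with a fixed step is easy to sketch (the function name and step size are illustrative):

```python
def systematic_sample(population, step, start=0):
    """Select every `step`-th member of the population, beginning at `start`."""
    return population[start::step]

print(systematic_sample(list(range(1, 21)), step=5))  # [1, 6, 11, 16]
```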
Q: Lower fence?
A: Q1 - 1.5 * IQR

Q: Upper fence?
A: Q3 + 1.5 * IQR

Q: IQR?
A: Q3 - Q1

Q: A data point is an outlier if?
A: It is smaller than the lower fence or larger than the upper fence.

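The fence formulas above, applied to toy data with Python's statistics module (note that quartile values vary slightly with the interpolation method):

```python
import statistics

def outlier_fences(data):
    """Lower fence Q1 - 1.5*IQR and upper fence Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # Q1, median, Q3
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lo, hi = outlier_fences(data)
outliers = [x for x in data if x < lo or x > hi]
print(outliers)  # only 100 falls outside the fences
```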
Q: Dummy variables?
A: Created with one-hot encoding; the variables created using one-hot encoding are used in place of the categorical variable.

Q: Label encoding?
A: Each category of the categorical variable is assigned a number based on some order.

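Both encodings sketched in plain Python (the helper names are illustrative; here the "order" for label encoding is simply alphabetical):

```python
def one_hot(values):
    """One 0/1 dummy column per category, replacing the categorical variable."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values):
    """Assign each category an integer based on sorted (alphabetical) order."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

colors = ["red", "green", "red", "blue"]
print(one_hot(colors))       # columns in order: blue, green, red
print(label_encode(colors))  # blue=0, green=1, red=2
```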
Q: Regression?
A: A mathematical relationship between the features of a problem and the target variable that is to be predicted.

Q: Linear regression?
A: A parametric method; it requires a response variable (target) and one or multiple predictor variables (features).

Q: The least squares method?
A: Produces the line that minimizes the sum of squared errors.

Q: y and y-hat?
A: y is the actual value of the target variable; y-hat is the predicted value of the target variable.

Q: e (the residual)?
A: The difference between y and y-hat.

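The least squares line for a single predictor has a closed form; a sketch with toy data chosen so the fit is exact (function name is illustrative):

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression: intercept b0 and slope b1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]          # lies exactly on y = 2x + 1
b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e = y - y_hat
print(b0, b1, residuals)  # 1.0, 2.0, and all-zero residuals
```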
Q: R^2?
A: Coefficient of determination.

Q: MSE?
A: Mean Squared Error.

Q: RMSE?
A: Root Mean Squared Error.

Q: Coefficient of determination?
A: An indicator of the goodness of our model's fit to the data; always between 0 and 1, and a higher value is preferred.

Q: Mean Squared Error?
A: A measure that evaluates the average of the squared deviations between the values of the target and the predicted values of the target. Smaller values of MSE are preferred; a value of 0 is ideal but not attainable in practice.

Q: Root Mean Squared Error?
A: RMSE is the average amount of deviation of the data points from the regression line.

Q: Adjusted R^2?
A: Explicitly accounts for the number of explanatory variables. It is common to use adjusted R^2 for model selection because it imposes a penalty for any additional explanatory variable included in the analysis; it only increases when a new variable added to the model contributes to the prediction.

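A small helper computing all three fit metrics on toy data (illustrative, not from the course material):

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and the coefficient of determination R^2."""
    n = len(y_true)
    sse = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    mse = sse / n                       # mean squared error
    rmse = math.sqrt(mse)               # root mean squared error
    mean_y = sum(y_true) / n
    sst = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1 - sse / sst                  # coefficient of determination
    return mse, rmse, r2

mse, rmse, r2 = regression_metrics([2, 4, 6, 8], [2.5, 3.5, 6.5, 7.5])
print(mse, rmse, r2)  # MSE 0.25, RMSE 0.5, R^2 0.95
```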
Q: Decision trees?
A: The repeated splitting of nodes until we reach pure subsets is the building block of the Classification and Regression Trees (CART) algorithm.

Q: When the target variable is categorical?
A: The decision tree is a classification tree.

Q: When the target is numerical?
A: The decision tree is a regression tree.

Q: Gini index?
A: Measures the degree of impurity of the set of classes of the target variable in a node.

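The Gini index for a node can be computed from the class proportions; a sketch (the function name is illustrative):

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions. 0 means a pure node."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(["yes", "yes", "yes"]))       # 0.0: a pure subset
print(gini_index(["yes", "no", "yes", "no"]))  # 0.5: maximally impure for two classes
```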
Q: K-means algorithm, step 1?
A: Randomly pick k centroids from the sample points as initial cluster centers.

Q: K-means algorithm, step 2?
A: Assign each sample to the nearest centroid.

Q: K-means algorithm, step 3?
A: Move each centroid to the center of the samples that were assigned to it.

Q: K-means algorithm, step 4?
A: Repeat steps 2 and 3 until the maximum number of iterations is reached.

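The four steps above can be sketched as a minimal k-means over toy 2-D points (the function name and data are illustrative; an early stop once the centroids stabilize is added for convenience):

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means over 2-D points, following the four steps above."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 1: random initial centroids
    for _ in range(max_iter):                    # step 4: repeat steps 2 and 3
        clusters = [[] for _ in range(k)]
        for p in points:                         # step 2: assign to nearest centroid
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [                        # step 3: move centroids to cluster means
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:           # extra: stop early once stable
            break
        centroids = new_centroids
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, k=2)))  # one centroid near (0.33, 0.33), one near (10.33, 10.33)
```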
Q: Elbow method?
A: Find the value of k where the decrease in inertia slows down as k increases.

Q: Inertia?
A: The sum of squared distances between the data points in each cluster and their cluster center.