Exam 2 Flashcards
A qualitative variable with more than 2 levels is best modeled by which type of distribution?
multinomial
A quantitative variable is best modeled by which type of distribution?
Gaussian
A qualitative variable with 2 levels is best modeled by which distribution?
binomial
The conjugate prior of a multinomial is which kind of distribution?
Dirichlet
The conjugate prior of a binomial is which type of distribution?
Beta
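A worked illustration of conjugacy follows. With a Beta(a, b) prior on a binomial success probability, observing k successes in n trials gives a Beta(a + k, b + n - k) posterior. The Dirichlet plays the same role for the multinomial: the observed category counts are added to the prior's parameters. This is a minimal sketch. The prior parameters and counts are made up, and the helper names are hypothetical.

```python
# Conjugate updates: the posterior stays in the same family as the prior.
# Prior parameters and observed counts below are illustrative only.

def beta_binomial_update(a, b, successes, trials):
    """Beta(a, b) prior + binomial data -> Beta posterior parameters."""
    return a + successes, b + (trials - successes)

def dirichlet_multinomial_update(alphas, counts):
    """Dirichlet(alphas) prior + multinomial counts -> Dirichlet posterior."""
    return [alpha + c for alpha, c in zip(alphas, counts)]

# Uniform Beta(1, 1) prior, then 7 successes in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = post_a / (post_a + post_b)  # mean of Beta(a, b) is a / (a + b)

# Uniform Dirichlet(1, 1, 1) prior over 3 categories, then counts [5, 3, 2].
post_alphas = dirichlet_multinomial_update([1, 1, 1], [5, 3, 2])

print(post_a, post_b, round(posterior_mean, 3))  # 8 4 0.667
print(post_alphas)                               # [6, 4, 3]
```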
Which type of ML algorithm would typically be used to predict housing prices?
Regression
Which type of ML algorithm would typically be used to predict whether a student is likely to be admitted to university (yes/no)?
Classification
In building a machine learning model with supervised learning, one column will be the _______________ column, and all others will be attributes used to predict that column.
Target
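In pandas, separating the target column from the attribute columns might look like the sketch below. The data frame and its column names (sqft, bedrooms, price) are hypothetical.

```python
import pandas as pd

# Hypothetical housing data: 'price' is the target, the rest are attributes.
df = pd.DataFrame({
    "sqft": [1400, 2000, 850],
    "bedrooms": [3, 4, 2],
    "price": [250000, 340000, 160000],
})

y = df["price"]                 # target column (a Series)
X = df.drop(columns=["price"])  # all other columns are the predictors

print(list(X.columns))  # ['sqft', 'bedrooms']
```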
Which of the following terms refers to a row in a data set? Check all that apply.
observation
instance
example
T/F: Categorical data (aka qualitative data) can take on real number values.
False
T/F: In regression, the target is quantitative data.
True
What is the shape of the following array (as would be returned by the .shape attribute)?
arr1 = np.array([1, 2, 3])
(3,)
A NumPy array was created with the following line of code. What is the shape of the array?
arr1 = np.array([(10,15,45), (21,17,9)])
(2, 3)
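Both shapes can be checked directly. In this sketch the second array is named arr2 (an assumption) so that it does not overwrite the first.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
print(arr1.shape)  # (3,)  one axis of length 3; the trailing comma marks a 1-tuple

arr2 = np.array([(10, 15, 45), (21, 17, 9)])
print(arr2.shape)  # (2, 3)  two rows (the tuples) of three elements each
```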
Which expression will sum the second column of a 2D array named arr1?
np.sum(arr1[:, 1])
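In arr1[:, 1], the colon keeps every row and the 1 selects column index 1. Indexing is zero-based, so that is the second column. The sample array below is an assumption, used only for illustration.

```python
import numpy as np

arr1 = np.array([(10, 15, 45), (21, 17, 9)])

# ':' keeps every row; 1 picks column index 1 (the second column, zero-based).
second_col = arr1[:, 1]
col_sum = np.sum(second_col)
print(second_col, col_sum)  # [15 17] 32

# Equivalent: sum every column at once along axis 0, then take index 1.
assert np.sum(arr1, axis=0)[1] == col_sum
```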
Which array reference would select element 23 from the following 2x2 NumPy array, named arr1?
18 23
21 9
arr1[0, 1]
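The array from the card can be rebuilt to confirm the [row, column] order. Both indices are zero-based.

```python
import numpy as np

arr1 = np.array([[18, 23],
                 [21, 9]])

# [row, column], both zero-based: row 0 is the first row, column 1 the second column.
print(arr1[0, 1])  # 23
print(arr1[1, 0])  # 21
```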
T/F:
The following two lines of code would retrieve the same column from a pandas data frame.
df['A']
df.A
True
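Bracket access and attribute access return the same Series when the column name is a valid Python identifier. The name must also not collide with an existing DataFrame attribute or method. When either condition fails, only bracket access works. Here is a small check with a made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "count": [4, 5, 6]})

# Same column, two spellings.
print(df["A"].equals(df.A))  # True

# 'count' collides with the DataFrame.count method, so attribute access
# returns the method, not the column; brackets still return the column.
print(callable(df.count))    # True
print(df["count"].tolist())  # [4, 5, 6]
```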
T/F:
The following line of code returns a pandas data frame consisting of just one column extracted from pandas data frame df:
df['A']
False
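The answer is False because a single column label returns a pandas Series. Passing a list of labels returns a DataFrame, even when the list has only one label. A quick check with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

col_series = df["A"]   # single label   -> pandas Series
col_frame = df[["A"]]  # list of labels -> one-column DataFrame

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(col_frame.shape)            # (3, 1)
```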
T/F:
The difference between categorical data and one-hot encoding is that categorical data can be represented in one column and one-hot encoding would require n columns, where n is the number of levels for the category.
True
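One way to see the one-column vs n-column difference is pandas' get_dummies. The color column and its three levels are made up for this sketch.

```python
import pandas as pd

# One categorical column with n = 3 levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one 0/1 indicator column per level (levels sorted alphabetically).
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
print(one_hot.shape)             # (4, 3)
```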
T/F:
The following line of code trains the kNN algorithm.
knn = KNeighborsClassifier(n_neighbors=7)
False
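The constructor only sets hyperparameters. Training happens when fit is called, which for kNN means storing the training data. Below is a minimal scikit-learn sketch with a tiny made-up one-feature data set. It uses at least 7 samples so that n_neighbors=7 is valid.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up data: one feature, two well-separated classes.
X = [[1], [2], [3], [4], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=7)  # configures the model only
print(hasattr(knn, "classes_"))            # False: nothing learned yet

knn.fit(X, y)                              # fits the model: stores the training data
print(hasattr(knn, "classes_"))            # True

pred = knn.predict([[2.5], [11.5]])
print(pred.tolist())                       # [0, 1]
```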
T/F:
Instead of using pandas or NumPy, we can use built-in Python lists for data and get similar execution times.
False
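The sketch below compares the same computation on a Python list and on a NumPy array. Absolute timings depend on the machine. The usual result is that the vectorized NumPy version runs much faster, but the code only prints the timings and does not assume a particular ratio.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit int64 avoids overflow on any platform

# Same computation: square every element and sum.
list_result = sum(x * x for x in py_list)
numpy_result = int(np.sum(np_arr * np_arr))
assert list_result == numpy_result

# Timings vary by machine; the vectorized NumPy version is typically much faster.
t_list = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
t_numpy = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=10)
print(f"list: {t_list:.4f}s  numpy: {t_numpy:.4f}s")
```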
Which metric quantifies TP / (TP + FN)?
Recall
How is recall calculated?
TP / (TP + FN)
Which metric tries to adjust for the likelihood that the classifier guessed correctly by chance?
Kappa
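This card is assumed to refer to Cohen's kappa. Kappa compares the observed accuracy p_o with the accuracy p_e expected by chance from the row and column totals:

kappa = (p_o - p_e) / (1 - p_e)

Here is a hand-rolled sketch for a binary confusion matrix. The counts are made up.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix."""
    total = tp + fp + fn + tn
    p_o = (tp + tn) / total  # observed agreement (accuracy)
    # Chance agreement: P(both say positive) + P(both say negative),
    # using the marginal rates of the predictions and the actual labels.
    pred_pos = (tp + fp) / total
    actual_pos = (tp + fn) / total
    p_e = pred_pos * actual_pos + (1 - pred_pos) * (1 - actual_pos)
    return (p_o - p_e) / (1 - p_e)

# Made-up counts for illustration.
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
print(round(kappa, 3))  # 0.7
```

In this example accuracy is 0.85, but chance agreement is already 0.5, so kappa is only 0.7.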
How is precision calculated?
TP / (TP + FP)
Which metric quantifies TP / (TP + FP)?
Precision
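Precision and recall share the same numerator, TP, but use different denominators. A minimal sketch with made-up confusion-matrix counts:

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that really is positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything that really is positive, the fraction the model found."""
    return tp / (tp + fn)

# Made-up confusion-matrix counts.
tp, fp, fn = 40, 10, 5
print(precision(tp, fp))         # 0.8
print(round(recall(tp, fn), 3))  # 0.889
```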
In Bayes’ theorem, P(data|class) is called the:
likelihood
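Bayes' theorem combines the likelihood P(data|class) with the prior P(class) and normalizes by the evidence P(data):

P(class|data) = P(data|class) P(class) / P(data)

The numeric sketch below uses two classes with made-up probabilities.

```python
# Made-up numbers for a two-class problem.
prior = {"spam": 0.2, "ham": 0.8}        # P(class)
likelihood = {"spam": 0.6, "ham": 0.05}  # P(data | class) for one observed data point

# Evidence P(data) = sum over classes of P(data | class) * P(class).
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print({c: round(p, 3) for c, p in posterior.items()})  # {'spam': 0.75, 'ham': 0.25}
```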
The naive Bayes algorithm is ‘naive’ because:
It assumes the predictors are conditionally independent of one another given the class.
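Under the independence assumption, the likelihood of a whole feature vector given a class is the product of the per-feature likelihoods. A class score is therefore P(class) times that product. The word probabilities in this sketch are hypothetical. The sketch sums log-probabilities instead of multiplying raw probabilities, which is a common way to avoid numerical underflow.

```python
import math

# Hypothetical per-word likelihoods P(word | class) and class priors.
prior = {"spam": 0.3, "ham": 0.7}
word_given_class = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.02, "meeting": 0.20},
}

def nb_log_score(words, cls):
    """log P(cls) + sum of log P(word | cls): the 'naive' independence product in log space."""
    return math.log(prior[cls]) + sum(math.log(word_given_class[cls][w]) for w in words)

message = ["free", "free"]
scores = {c: nb_log_score(message, c) for c in prior}
prediction = max(scores, key=scores.get)
print(prediction)  # spam
```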
T/F: Logistic regression is used for regression tasks where the target is a real number.
False
T/F: The terms ‘probability’ and ‘odds’ refer to the same thing mathematically.
False
What does “probability” refer to mathematically?
The proportion of all possible outcomes in which a given outcome occurs, written p. Range is 0 to 1.
What does “odds” refer to mathematically?
The ratio of the probability that the outcome occurs to the probability that it does not: p / (1 - p). Range is 0 to infinity.
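Converting between probability and odds follows directly from odds = p / (1 - p). The helper names in this sketch are hypothetical.

```python
def prob_to_odds(p):
    """odds = p / (1 - p); defined for 0 <= p < 1."""
    return p / (1 - p)

def odds_to_prob(odds):
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)

print(prob_to_odds(0.75))  # 3.0  (3 to 1 in favor)
print(prob_to_odds(0.5))   # 1.0  (even odds)
print(odds_to_prob(3.0))   # 0.75
```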
T/F: Logistic Regression and Naive Bayes are both considered to be linear classifiers.
True
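For logistic regression, "linear" describes the decision boundary. The model predicts the positive class when sigmoid(w · x + b) >= 0.5, which is the same condition as w · x + b >= 0. That condition defines a hyperplane in feature space. Equivalently, the log-odds are a linear function of x. The weights and bias in this sketch are hypothetical, not learned from data.

```python
import math

# Hypothetical learned weights and bias for two features.
w = [2.0, -1.0]
b = -1.0

def linear_score(x):
    """w . x + b, which equals the predicted log-odds of the positive class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_proba(x):
    return 1.0 / (1.0 + math.exp(-linear_score(x)))  # sigmoid

for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    p = predict_proba(x)
    # Thresholding p at 0.5 gives the same label as the sign of the linear score.
    assert (p >= 0.5) == (linear_score(x) >= 0)
    print(x, round(linear_score(x), 2), round(p, 3))
```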