turning term exam 1 Flashcards
data mining is a
process
the following stage in data mining involves digging beneath the surface to uncover the structure of the business problem and the data that are available, and then matching them to one or more data mining tasks for which we may have substantial science and technology to apply
data understanding
the following is not an example of machine learning tasks
calculating the annual profit or loss
"a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E"
the training data is referred to as
E
CRISP-DM is a codification for the data mining process which starts with the following stage
business and data understanding
supervised machine learning methods include the following
none of the above
business data science requires the combination of the know-how in the following
business domain expertise, mathematics, statistics, computer science
the diagram above depicts the following type of machine learning systems
unsupervised
the following machine learning algorithm attempts to find associations between entities based on transactions involving them
co-occurrence grouping
the following is not mentioned in chapter 1 of the data science for business book
IBM
in the diagrams the amount of shading corresponds to
total entropy
based on the diagrams, which single attribute would you select to split between edible and poisonous mushrooms
spore print color
entropy is a measure of
disorder (impurity)
a basket contains 10 apples and nothing else; a bowl contains 5 cherries and nothing else; the entropy values of the set of apples in the basket and the set of cherries in the bowl are
0 and 0 respectively
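To make the arithmetic concrete, here is a minimal sketch of the entropy computation (the data sets are hypothetical): a set containing only one class is perfectly pure and has entropy 0, while a 50/50 mix is maximally impure at 1 bit.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# A basket of 10 apples and a bowl of 5 cherries are each single-class sets,
# so both have entropy 0.
print(entropy([10]))    # 0.0
print(entropy([5]))     # 0.0
# A 50/50 mix of two classes is maximally impure: entropy 1 bit.
print(entropy([5, 5]))  # 1.0
```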
a supervised segmentation with tree structured modeling can be done by recursively selecting the best attribute from multiple attributes based on their
information gain
in the diagram which are considered as nodes
all of the above (employed; balance and age; class write off and class not write off)
the following is an alternative to the entropy measure of information
gini impurity
in a decision tree a terminal node is also known as a
leaf
the cellular phone churn prediction problem discussed toward the end of chapter 3 uses a historical data set of 20,000 customers; to measure the accuracy of the tree model, the authors used a training set consisting of
50% customers who churned and 50% customers who did not churn
an instance is also called a
feature vector
the objective function of support vector machines is based on the idea that
the wider the bar (margin) between the classes, the better
with linear regressions the goal is to find a model that gives the
minimum sum of squared errors
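For a single predictor, the least-squares parameters have a closed form; a sketch on toy (hypothetical) data that happens to lie exactly on a line, so the minimized sum of squared errors is 0:

```python
# Fit y = w0 + w1*x by least squares, i.e. minimize the sum of squared errors.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical, perfectly linear data (y = 2x)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# Closed-form slope: covariance of x and y over variance of x.
w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
w0 = y_bar - w1 * x_bar
sse = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))
print(w0, w1, sse)  # 0.0 2.0 0.0 for this data
```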
a model is a BLANK of reality created to serve a purpose
simplified representation
the following describes the parametric learning approach
start by specifying the structure of the model and then continue with calculating the best parameter values given a particular set of training data
the above is an example of the BLANK view
instance space
the basic linear model is not appropriate for estimating the class probability because the output of the linear function f(x) ranges
from negative infinity to positive infinity
the corresponding odds of a probability of 0.8 are
4
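A quick check of the arithmetic, plus the logistic function that motivates the two cards above: odds = p / (1 − p), so p = 0.8 gives 0.8 / 0.2 = 4, and the logistic function squashes a linear output from (−∞, +∞) into a valid probability in (0, 1).

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

def logistic(f):
    """Map a linear model output f(x) in (-inf, +inf) to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-f))

print(round(odds(0.8), 6))              # 4.0  (0.8 / 0.2)
print(round(logistic(math.log(4)), 6))  # 0.8  (log-odds of 4 maps back to p = 0.8)
```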
logistic regression models are used widely for
classification
support vector machines are used widely for
classification
f(x) = w0 + w1x1 + w2x2 + …
in the above general linear model the parameters are
w0,w1,w2
the variation in mouse size explained by weight divided by the variation in mouse size not explained by weight is called
F
BLANK tells us how much of the variation in mouse size can be explained by taking mouse weight into account
R squared
in support vector machines to make a threshold that is not so sensitive to outliers we must allow
none of the above
there is BLANK variation around the line that we fit by least squares
less
the sum of squares of the distance from the mean to each data point is called
SS(mean)
R squared is 0.6, which means that BLANK explains 60% of the variation in mouse size
mouse weight
the sum of the distances between the data and the line squared is
SS(fit)
in support vector machines we use BLANK to determine the number of misclassifications and observations to allow inside of the soft margin to get the best classification
cross validation
the variation around the mean can be calculated using
sum((data − mean)^2) / n
in the case of predicting the size of a mouse based on its weight, the following value of R squared will indicate a perfect prediction
1
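The SS(mean), SS(fit), and R squared cards above fit together as R² = (SS(mean) − SS(fit)) / SS(mean); a sketch on hypothetical size-vs-weight numbers, where SS(fit) = 0 would give the perfect R² of 1:

```python
ys    = [2.0, 4.0, 5.0, 7.0]  # observed mouse sizes (hypothetical)
preds = [2.5, 3.5, 5.5, 6.5]  # sizes predicted from weight by a fitted line

mean = sum(ys) / len(ys)
# SS(mean): sum of squared distances from the mean to each data point.
ss_mean = sum((y - mean) ** 2 for y in ys)
# SS(fit): sum of squared distances between the data and the line.
ss_fit = sum((y - p) ** 2 for y, p in zip(ys, preds))
# R squared: the fraction of the variation around the mean explained by the fit.
r2 = (ss_mean - ss_fit) / ss_mean
print(ss_mean, ss_fit, round(r2, 4))  # 13.0 1.0 0.9231
```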