turning term exam 1 Flashcards
data mining is a
process
the following stage in data mining involves digging beneath the surface to uncover the structure of the business problem and the data that are available, and then matching them to one or more data mining tasks for which we may have substantial science and technology to apply
data understanding
the following is not an example of machine learning tasks
calculating the annual profit or loss
"a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E"
the training data is referred to as
E
CRISP-DM is a codification for the data mining process which starts with the following stage
business and data understanding
supervised machine learning methods include the following
none of the above
business data science requires the combination of the know-how in the following
business domain expertise, mathematics, statistics, computer science
the diagram above depicts the following type of machine learning systems
unsupervised
the following machine learning algorithm attempts to find associations between entities based on transactions involving them
co-occurrence grouping
the following is not mentioned in chapter 1 of the data science for business book
IBM
in the diagrams the amount of shading corresponds to
total entropy
based on the diagrams, which single attribute would you select to split between edible and poisonous mushrooms
spore print color
entropy is a measure of
disorder (impurity)
a basket contains 10 apples and nothing else; a bowl contains 5 cherries and nothing else; the entropy values of the set of apples in the basket and the set of cherries in the bowl are
0 and 0 respectively
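To make the arithmetic concrete, here is a minimal sketch of the entropy computation (the data sets are hypothetical): a set containing only one class is perfectly pure and has entropy 0, while a 50/50 mix is maximally impure at 1 bit.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# A basket of 10 apples and a bowl of 5 cherries are each single-class sets,
# so both have entropy 0.
print(entropy([10]))    # 0.0
print(entropy([5]))     # 0.0
# A 50/50 mix of two classes is maximally impure: entropy 1 bit.
print(entropy([5, 5]))  # 1.0
```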
a supervised segmentation with tree structured modeling can be done by recursively selecting the best attribute from multiple attributes based on their
information gain
in the diagram which are considered as nodes
all of the above (employed; balance and age; class write off and class not write off)
the following is an alternative to the entropy measure of information
gini impurity
in a decision tree a terminal node is also known as a
leaf
the cellular phone churn prediction problem discussed toward the end of chapter 3 uses a historical data set of 20,000 customers; to measure the accuracy of the tree model, the authors used a training set consisting of
50% customers who churned and 50% customers who did not churn
an instance is also called a
feature vector
the objective function of support vector machines is based on the idea that
the wider the bar (margin) between the classes, the better
with linear regressions the goal is to find a model that gives the
minimum sum of squared errors
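For a single predictor, the least-squares parameters have a closed form; a sketch on toy (hypothetical) data that happens to lie exactly on a line, so the minimized sum of squared errors is 0:

```python
# Fit y = w0 + w1*x by least squares, i.e. minimize the sum of squared errors.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical, perfectly linear data (y = 2x)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# Closed-form slope: covariance of x and y over variance of x.
w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
w0 = y_bar - w1 * x_bar
sse = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))
print(w0, w1, sse)  # 0.0 2.0 0.0 for this data
```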
a model is a BLANK of reality created to serve a purpose
simplified representation
the following describes the parametric learning approach
start by specifying the structure of the model and then continue with calculating the best parameter values given a particular set of training data
the above is an example of the BLANK view
instance space
the basic linear model is not appropriate for estimating the class probability because the output of the linear function f(x) ranges
from negative infinity to positive infinity
the corresponding odds of a probability of 0.8 are
4
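A quick check of the arithmetic, plus the logistic function that motivates the two cards above: odds = p / (1 − p), so p = 0.8 gives 0.8 / 0.2 = 4, and the logistic function squashes a linear output from (−∞, +∞) into a valid probability in (0, 1).

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

def logistic(f):
    """Map a linear model output f(x) in (-inf, +inf) to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-f))

print(round(odds(0.8), 6))              # 4.0  (0.8 / 0.2)
print(round(logistic(math.log(4)), 6))  # 0.8  (log-odds of 4 maps back to p = 0.8)
```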
logistic regression models are used widely for
classification
support vector machines are used widely for
classification
f(x) = w0 + w1x1 + w2x2 + …
in the above general linear model the parameters are
w0,w1,w2
the variation in mouse size explained by weight divided by the variation in mouse size not explained by weight is called
F
BLANK tells us how much of the variation in mouse size can be explained by taking mouse weight into account
R squared
in support vector machines to make a threshold that is not so sensitive to outliers we must allow
none of the above
there is BLANK variation around the line that we fit by least squares
less
the sum of squares of the distance from the mean to each data point is called
SS(mean)
R squared is 0.6, which means that BLANK explains 60% of the variation in mouse size
mouse weight
the sum of the distances between the data and the line squared is
SS(fit)
in support vector machines we use BLANK to determine the number of misclassifications and observations to allow inside of the soft margin to get the best classification
cross validation
the variation around the mean can be calculated using
sum((data − mean)^2) / n
in the case of predicting the size of a mouse based on its weight, the following value of R squared will indicate a perfect prediction
1
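The SS(mean), SS(fit), and R squared cards above fit together as R² = (SS(mean) − SS(fit)) / SS(mean); a sketch on hypothetical size-vs-weight numbers, where SS(fit) = 0 would give the perfect R² of 1:

```python
ys    = [2.0, 4.0, 5.0, 7.0]  # observed mouse sizes (hypothetical)
preds = [2.5, 3.5, 5.5, 6.5]  # sizes predicted from weight by a fitted line

mean = sum(ys) / len(ys)
# SS(mean): sum of squared distances from the mean to each data point.
ss_mean = sum((y - mean) ** 2 for y in ys)
# SS(fit): sum of squared distances between the data and the line.
ss_fit = sum((y - p) ** 2 for y, p in zip(ys, preds))
# R squared: the fraction of the variation around the mean explained by the fit.
r2 = (ss_mean - ss_fit) / ss_mean
print(ss_mean, ss_fit, round(r2, 4))  # 13.0 1.0 0.9231
```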