Assignment Questions Flashcards
The marketing team requires your assistance in evaluating model XY, which has a total accuracy of 60%. This model correctly predicts 75% of the customers that will not adhere to a digital subscription. The new digital subscription will be rolled out to the customers for weekly distribution at a cost of $50 a year. The current value of the paper magazine is $60 per year. The marketing team has estimated that targeting a customer will cost $2 and will in addition incur the cost of a one-year discount of 10% of the value of the subscription. Transferring the magazines to a digital format has associated costs for the company, which should be distributed across the whole customer base in the first year of the transition. Different business models are being considered:
- Fixed yearly cost of $20 per customer;
- Pay per use.
Considering the fixed yearly cost case, what is the expected profit for model XY?
Considering the fixed yearly cost case, using model XY, above which probability should we target each consumer in the dataset?
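A minimal sketch of the expected-value reasoning behind a targeting threshold. It assumes a non-targeted customer contributes nothing and that the targeting cost is paid whether or not the customer responds. The function names and the figures are placeholders for illustration, not the assignment's answer; which of the costs in the question belong in the benefit term is a modelling choice left to the reader.

```python
def expected_value_of_targeting(p_respond, benefit_if_respond, cost_of_targeting):
    """Expected profit from targeting one customer.

    The targeting cost is paid whether or not the customer responds.
    """
    return p_respond * benefit_if_respond - cost_of_targeting


def targeting_threshold(benefit_if_respond, cost_of_targeting):
    """Response probability at which targeting exactly breaks even."""
    return cost_of_targeting / benefit_if_respond


# Placeholder figures (assumptions), not taken from the assignment.
threshold = targeting_threshold(benefit_if_respond=40.0, cost_of_targeting=2.0)
print(threshold)
```

Customers whose predicted probability lies above the threshold have a positive expected value and are worth targeting under these assumptions.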
Based on the information from the previous questions and Table 1, draw the full dendrogram when using complete linkage. Show the steps of the computations.
Build the dendrogram by repeatedly merging the two closest customers or clusters. Because complete linkage is used, after each merge the distance table is updated so that the distance from the new cluster to every other cluster or customer is the furthest (maximum) distance between their members.
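The merge-and-update procedure can be sketched as follows. This is an illustrative implementation, not the course's reference solution; the function name `complete_linkage` and the four-customer distance table are hypothetical and do not come from Table 1.

```python
from itertools import combinations


def complete_linkage(dist, labels):
    """Agglomerative clustering with complete linkage.

    dist: dict mapping frozenset({a, b}) -> distance between items a and b.
    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def cluster_distance(c1, c2):
        # Complete linkage: the furthest pair of members defines the distance.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the two clusters that are closest under that distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((set(c1), set(c2), cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical pairwise distances between four customers (not Table 1).
d = {
    frozenset("AB"): 1.0, frozenset("AC"): 4.0, frozenset("AD"): 5.0,
    frozenset("BC"): 3.0, frozenset("BD"): 6.0, frozenset("CD"): 2.0,
}
merges = complete_linkage(d, "ABCD")
for a, b, height in merges:
    print(sorted(a), sorted(b), height)
```

The heights at which the merges happen are the heights of the joins in the dendrogram.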
Explain the differences between hierarchical clustering and objective based clustering.
The main difference is that hierarchical clustering focuses on the similarities between individual instances and how those similarities link them together, while objective-based methods such as k-means focus on the clusters themselves. K-means computes the distance between each instance and the cluster centers. Hierarchical clustering computes the distance between all pairs of clusters on each iteration.
In objective-based clustering, the within-cluster distance is minimised and the distance between clusters is maximised.
Objective-based clustering can also take many forms, depending on the objective function used.
Objective-based methods must also be run several times, because the clustering result depends on the initial centroid locations; hierarchical clustering has no such dependence.
Finally, objective-based methods require the number of clusters to be defined in advance.
Given Table 2, plot the observations. Start by randomly assigning a cluster label to each observation, then perform five steps of k-means clustering.
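A rough sketch of the procedure this question asks for: random initial labels, followed by five rounds of "recompute centroids, then reassign points". The points below are hypothetical and are not Table 2; the function name, the `seed` parameter and the empty-cluster handling are assumptions made for the example.

```python
import random


def kmeans(points, k, steps=5, seed=0):
    """K-means on 2-D points, starting from random cluster labels."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(steps):
        # Update step: each centroid is the mean of the points assigned to it.
        centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:  # empty cluster: reseed it with a random point
                members = [rng.choice(points)]
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Assignment step: each point moves to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
    return labels, centroids


# Hypothetical observations forming two well-separated groups (not Table 2).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Changing `seed` changes the initial labels, which is the dependence on initialisation that makes repeated runs advisable for k-means.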
Consider the logistic regression model f(x) = −1.48 − 0.11x1 + 0.05x2, where x1 and x2 are ‘Years of Subscription’ and ‘Age’ respectively, trained on different data from that presented in Table 3. Draw the ROC curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve. Hint: see Figure 8.1. For these exercises, you are “ranking instead of classifying”.
Based on the same model, draw the profit curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve.
Based on the same model, draw the cumulative response curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve.
You just have to look at the confusion matrix and make the calculations.
Very straightforward if you have the cumulative response curve.
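"Ranking instead of classifying" can be sketched like this: score every customer with the model, sort by score, and lower the threshold one customer at a time. The sketch assumes f(x) is the log-odds and passes it through a sigmoid; since the sigmoid is monotone, the ranking is the same either way. The four customers are hypothetical and are not Table 3, and the function names are made up for the example.

```python
import math


def score(years, age):
    """Sigmoid of the linear model from the question: -1.48 - 0.11*x1 + 0.05*x2."""
    z = -1.48 - 0.11 * years + 0.05 * age
    return 1.0 / (1.0 + math.exp(-z))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold one instance at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical customers: (years of subscription, age, label); not Table 3.
customers = [(1, 60, 1), (2, 45, 1), (5, 50, 0), (8, 30, 0)]
scores = [score(years, age) for years, age, _ in customers]
labels = [c[2] for c in customers]
points = roc_points(scores, labels)
print(points)
```

A random classifier corresponds to the diagonal from (0, 0) to (1, 1). The same ranking gives the cumulative response curve if the x-axis is the fraction of customers targeted instead of the false positive rate.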
Classify David using a naive Bayes classifier, based on the information from all customers. David is 32 years old, enjoys fishing and has been a subscriber for 3 years. Show the steps of the computations. Explain the assumptions made.
Using the naive Bayes classifier you created in the previous question, classify the first three training examples. Does the model classify the first three training examples correctly?
Research the use of m-estimate of probability in the naive Bayes classifier. Classify David using the m-estimate of probability with an equivalent sample size m = 4. Use 3 different values for the prior estimate p for P (C = c|E).
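As a starting point, the m-estimate replaces the raw frequency n_c / n with (n_c + m·p) / (n + m), which acts like adding m virtual examples distributed according to the prior p. The sketch below uses hypothetical counts, not the assignment's table; the function name is an assumption.

```python
def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m).

    n_c: number of class examples that have the attribute value
    n:   number of examples of the class
    m:   equivalent sample size
    p:   prior estimate of the probability
    """
    return (n_c + m * p) / (n + m)


# Hypothetical counts: the value was never seen among 6 examples of the class.
for p in (0.25, 0.5, 0.75):
    print(p, m_estimate(0, 6, 4, p))
```

With m = 0 the estimate falls back to the plain frequency. With m > 0 an unseen attribute value no longer forces the whole naive Bayes product to zero.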
Using the Bayes’ optimal classifier you created in the previous question, classify the first three training examples. Does the model classify the first three training examples correctly?
Notice in the bottom example you need to find complete matches.
Explain what an appropriate baseline model is, based on the same concept as a naive Bayes classifier and based on the same concept as decision trees.
Naive Bayes: classify without using any attributes, i.e. from the class priors alone, which always predicts the majority class.
Decision tree: the most basic version of a tree, with a single root split.
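Both baselines can be sketched in a few lines. The function names, the `hobby` attribute and the toy rows are hypothetical, and the attribute for the root split is given here, not selected by information gain as a real tree learner would do.

```python
from collections import Counter


def prior_baseline(train_labels):
    """Naive Bayes with no attributes: always predict the most frequent class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: majority


def decision_stump(train_rows, train_labels, attribute):
    """One root split: predict the majority class within each attribute value."""
    by_value = {}
    for row, label in zip(train_rows, train_labels):
        by_value.setdefault(row[attribute], []).append(label)
    table = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
    fallback = Counter(train_labels).most_common(1)[0][0]
    # Unseen attribute values fall back to the overall majority class.
    return lambda row: table.get(row[attribute], fallback)


# Hypothetical training data.
rows = [{"hobby": "fishing"}, {"hobby": "fishing"}, {"hobby": "golf"}]
labels = ["yes", "yes", "no"]
baseline = prior_baseline(labels)
stump = decision_stump(rows, labels, "hobby")
print(baseline({"hobby": "golf"}), stump({"hobby": "golf"}))
```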
Research what a perceptron is and what the role of the activation function is.
A perceptron is a single-layer neural network. It is an algorithm for supervised learning of binary classifiers. A binary classifier is a function that decides whether an input, represented by a vector of numbers, belongs to some specific class. A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold, or -1 otherwise.
“Perceptron is just a node, it’s the most basic feedforward NN. It consists one layer of one node. It does binary classification.”
The activation function decides whether a neuron should be activated, based on the weighted sum of its inputs plus a bias. It is added to introduce non-linearity into the output of the neuron.
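A minimal sketch of a perceptron with a step activation, trained with the classic perceptron learning rule. The function names, the learning rate and the AND data set are choices made for the example.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum plus bias: +1 if positive, else -1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else -1


def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Perceptron rule: adjust the weights on every misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


# Logical AND with labels in {-1, +1}: linearly separable, so training converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Here the threshold is folded into the bias term, so the comparison is against zero.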
In neural networks training, what is an epoch and an iteration?
An epoch is one forward and one backward pass over all of the training data.
Each complete epoch consists of several iterations. The number of iterations is the number of batches (steps through partitions of the training data) needed to complete one epoch.
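The relationship between the two is simple arithmetic. The sketch below assumes the last, smaller batch still counts as an iteration; the function names and the sample sizes are illustrative.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every training example once."""
    return math.ceil(n_samples / batch_size)


def total_iterations(n_samples, batch_size, epochs):
    """Total number of weight updates over the whole training run."""
    return epochs * iterations_per_epoch(n_samples, batch_size)


# 1,000 examples in batches of 100: 10 iterations per epoch, 30 over 3 epochs.
print(iterations_per_epoch(1000, 100), total_iterations(1000, 100, 3))
```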
Research why the softmax function is used in the output layer for classification problems.
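As a starting point for this question: softmax turns a vector of raw output scores into positive values that sum to 1, so they can be read as class probabilities, while preserving the ordering of the scores. A minimal sketch with hypothetical scores for three classes:

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```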